Power Systems for AIX III - Advanced Administration and Problem Determination

April 5, 2017 | Author: gsghenea | Category: N/A

V5.3

Power Systems for AIX III: Advanced Administration and Problem Determination (Course code AN15)

Instructor Guide ERC 1.1

Trademarks

The reader should recognize that the following terms, which appear in the content of this training document, are official trademarks of IBM or other companies:

IBM® is a registered trademark of International Business Machines Corporation.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:

AIX®              AIX 5L™       DB2®
HACMP™            MWAVE®        POWER™
POWER4™           POWER5™       POWER5+™
POWER6™           POWER Gt1™    POWER Gt3™
Power Systems™    PowerVM™      pSeries®
Redbooks®         RS/6000®      SP™
System i®         System p®     System p5®
Tivoli®           WebSphere®    Workload Partitions Manager™

Adobe is either a registered trademark or a trademark of Adobe Systems Incorporated in the United States, and/or other countries. Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Linux® is a registered trademark of Linus Torvalds in the United States, other countries, or both. Windows is a trademark of Microsoft Corporation in the United States, other countries, or both. UNIX® is a registered trademark of The Open Group in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.

November 2009 edition

The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

© Copyright International Business Machines Corporation 2009. All rights reserved. This document may not be reproduced in whole or in part without the prior written permission of IBM. Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

Contents

Trademarks
Instructor course overview
Course description
Agenda

Unit 1. Advanced AIX administration overview
    Unit objectives
    Application outages
    Live Partition Mobility versus Live Application Mobility
    Maintenance window tasks
    Effective problem management
    Before problems occur
    Before problems occur: A few good commands
    Steps in problem resolution
    Progress and reference codes
    Working with AIX Support
    AIX Support test case data (1 of 2)
    AIX Support test case data (2 of 2)
    AIX software update hierarchy
    Relevant documentation
    Checkpoint
    Exercise 1: Advanced AIX administration overview
    Unit summary

Unit 2. The Object Data Manager
    Unit objectives
  2.1. Introduction to the ODM
    What is the ODM?
    Data managed by the ODM
    ODM components
    ODM database files
    Device configuration summary
    Configuration manager
    Location and contents of ODM repositories
    How ODM classes act together
    Data not managed by the ODM
    Let’s review: Device configuration and the ODM
    ODM commands
    Changing attribute values
    Using odmchange to change attribute values
  2.2. ODM database files
    Software vital product data
    Software states you should know about
    Predefined devices (PdDv)
    Predefined attributes (PdAt)
    Customized devices (CuDv)
    Customized attributes (CuAt)
    Additional device object classes
    Checkpoint
    Exercise 3: The Object Data Manager (ODM)
    Unit summary

Unit 3. Error monitoring
    Unit objectives
  3.1. Working with the error log
    Error logging components
    Generating an error report using SMIT
    The errpt command
    A summary report (errpt)
    A detailed error report (errpt -a)
    Types of disk errors
    LVM error log entries
    Maintaining the error log
    Exercise 2: Error monitoring (part 1)
  3.2. Error notification and syslogd
    Error notification methods
    Self-made error notification
    ODM-based error notification: errnotify
    syslogd daemon
    syslogd configuration examples
    Redirecting syslog messages to error log
    Directing error log messages to syslogd
    System hang detection
    Configuring shdaemon
    Exercise 2: Error monitoring (part 2)
  3.3. Resource monitoring and control
    Resource monitoring and control (RMC)
    RMC conditions property screen: General tab
    RMC conditions property screen: Monitored Resources tab
    RMC actions property screen: General tab
    RMC actions property screen: When in Effect tab
    RMC management
    Exercise 2: Error monitoring (part 3)
    Checkpoint
    Unit summary

Unit 4. Network Installation Manager basics
    Unit objectives
    NIM overview
    Machine roles
    Boot process for AIX installation (tape or CD)
    Boot process for AIX installation (network)
    NIM objects
    Listing NIM objects and their attributes
    NIM configuration
    resources objects
    resources objects: lpp_source
    resources objects: spot
    resources objects: mksysb
    networks objects
    machines objects
    Defining a machine object
    Define a client using SMIT
    NIM operations
    bos_inst operation
    More information about NIM
    Additional topics in NIM course
    Exercise 4 overview
    Checkpoint
    Unit summary

Unit 5. System initialization: Part I
    Unit objectives
  5.1. System startup process
    How does a System p server or LPAR boot?
    Loading of a boot image
    Contents of the boot logical volume (hd5)
  5.2. Unable to find boot image
    Working with bootlists
    Starting System Management Services
    Working with bootlists in SMS (1 of 2)
    Working with bootlists in SMS (2 of 2)
  5.3. Corrupted boot logical volume
    Boot device alternatives (1 of 2)
    Boot device alternatives (2 of 2)
    Accessing a system that will not boot
    Booting in maintenance mode
    Working in maintenance mode
    How to fix a corrupted BLV
    Checkpoint (1 of 2)
    Checkpoint (2 of 2)
    Exercise 3: System initialization: Part 1
    Unit summary

Unit 6. System initialization: Part II
    Unit objectives
  6.1. AIX initialization part 1
    System software initialization overview
    rc.boot 1
    rc.boot 2 (part 1)
    rc.boot 2 (part 2)
    rc.boot 3 (part 1)
    rc.boot 3 (part 2)
    rc.boot summary
    Fixing corrupted file systems and logs
    Let’s review: rc.boot (1 of 3)
    Let’s review: rc.boot (2 of 3)
    Let’s review: rc.boot (3 of 3)
  6.2. AIX initialization part 2
    Configuration manager
    Config_Rules object class
    cfgmgr output in the boot log using alog
    /etc/inittab file
    Boot problem management
    Let’s review: /etc/inittab file
    Checkpoint
    Exercise 4: System initialization part 2
    Unit summary

Unit 7. Disk management theory
    Unit objectives
  7.1. LVM data representation
    LVM terms
    LVM identifiers
    LVM data on disk control blocks
    LVM data in the operating system
    Contents of the VGDA
    VGDA example
    The logical volume control block (LVCB)
    How LVM interacts with ODM and VGDA
    ODM entries for physical volumes (1 of 3)
    ODM entries for physical volumes (2 of 3)
    ODM entries for physical volumes (3 of 3)
    ODM entries for volume groups (1 of 2)
    ODM entries for volume groups (2 of 2)
    ODM entries for logical volumes (1 of 2)
    ODM entries for logical volumes (2 of 2)
    ODM-related LVM problems
    Fixing ODM problems (1 of 2)
    Fixing ODM problems (2 of 2)
    Intermediate level ODM commands
    Exercise 7: LVM metadata and problems (parts 1 and 2)
  7.2. Failed disks: Mirroring and quorum issues
    Mirroring
    Stale partitions
    Mirroring rootvg
    VGDA count
    Quorum not available
    Nonquorum volume groups
    Forced vary on (varyonvg -f)
    Physical volume states
    Checkpoint
    Exercise 7: LVM Metadata and problems (parts 4 and 5)
    Unit summary

Unit 8. Disk management procedures
    Unit objectives
  8.1. Disk replacement techniques
    Disk replacement: Starting point
    Procedure 1: Disk mirrored
    Procedure 2: Disk still working
    Procedure 2: Special steps for rootvg
    Procedure 3: Disk in missing or removed state
    Procedure 4: Total rootvg failure
    Procedure 5: Total non-rootvg failure
    Frequent disk replacement errors (1 of 4)
    Frequent disk replacement errors (2 of 4)
    Frequent disk replacement errors (3 of 4)
    Frequent disk replacement errors (4 of 4)
  8.2. Export and import
    Exporting a volume group
    Importing a volume group
    importvg and existing logical volumes
    importvg and existing file systems (1 of 2)
    importvg and existing file systems (2 of 2)
    Checkpoint
    Exercise 8: Exporting and importing volume groups
    Unit summary

Unit 9. Install and backup techniques
    Unit objectives
  9.1. Alternate disk installation
    Topic 1 objectives
    Alternate disk installation
    Alternate mksysb disk installation (1 of 2)
    Alternate mksysb disk installation (2 of 2)
    Alternate disk rootvg cloning (1 of 2)
    Alternate disk rootvg cloning (2 of 2)
    Removing an alternate disk installation
    NIM alternate disk migration (nimadm)
    Exercise 9, topic 1: Alternate disk install
  9.2. Using multibos
    Topic 2 objectives
    multibos overview
    Active and standby BOS logical volumes
    Setting up a standby BOS
    Other multibos operations
    Exercise 9, topic 2: multibos
  9.3. JFS2 snapshot
    Topic 3 objectives
    JFS2 snapshot (1 of 2)
    JFS2 snapshot (2 of 2)
    JFS2 snapshot mechanism (1 of 2)
    JFS2 snapshot mechanism (2 of 2)
    JFS2 snapshot SMIT menu
    Creating snapshots (external)
    Creating snapshots (internal)
    Listing snapshots
    Using a JFS2 snapshot to recover
    Using a JFS2 snapshot to back up
    JFS2 snapshot space management
    Exercise 9, topic 3: JFS2 snapshot
    Checkpoint (1 of 4)
    Checkpoint (2 of 4)
    Checkpoint (3 of 4)
    Checkpoint (4 of 4)
    Unit summary

Unit 10. Workload partitions
    Unit objectives
  10.1. Workload partitions review
    Topic 1 objectives
    AIX workload partitions (WPAR) review
    System WPAR and application WPAR
    System WPAR file systems space
  10.2. WPAR Manager
    Topic 2 objectives
    Workload Partition Manager overview
    Workload Partition Manager main GUI
    WPAR Manager topology: Default configuration
    Installation and configuration: WPAR Manager
    Installation and configuration: WPAR agent
    Authentication and WPAR Manager
    WPAR Manager functional view
    Basic management
    Creating a WPAR
    WPAR monitoring and reporting
    Resources view
    Manual relocation or mobility
    Tasks activity and logging
    WPAR 1.2 log locations
AIX Advanced Administration Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

©Copyright IBM Corp. 2009

V5.3 Instructor Guide

TOC

10.3. Application mobility. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-61 Topic 3 objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-62 Application mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-64 WPAR Manager relocation support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-66 Compatibility issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-69 Live partition mobility versus live application mobility . . . . . . . . . . . . . . . . . . . . . 10-72 WPAR enhanced live mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-75 Steps for WPAR enhanced live mobility (WPAR Mgr GUI) . . . . . . . . . . . . . . . . . 10-78 Enhanced relocation workflow (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-81 Enhanced relocation workflow (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-83 Enhanced relocation error (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-85 Enhanced relocation error (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-87 Steps for WPAR enhanced live mobility (command line) . . . . . . . . . . . . . . . . . . 10-89 Enhanced live relocation: CLI (1 of 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-91 Enhanced live relocation: CLI (2 of 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-93 Enhanced live relocation: CLI (3 of 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-95 Enhanced live relocation: CLI (4 of 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-99 Steps for WPAR static relocation (WPAR Mgr GUI) . . . . . . . . . . . . . . . . . . . . . 
10-101 Steps for checkpoint and restart relocation: CLI . . . . . . . . . . . . . . . . . . . . . . . . 10-104 Checkpoint and restart relocation: CLI (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . 10-107 Checkpoint and restart relocation: CLI (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . 10-109 Checkpoint and restart relocation: CLI (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . 10-111 Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-114 Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-116 Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-118 Unit 11. The AIX system dump facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-1 Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2 System dumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-4 Types of dumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-6 How a system dump is invoked . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9 LED 888 code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12 When a dump occurs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15 The sysdumpdev command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-17 Dedicated dump device (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-23 Dedicated dump device (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-25 Estimating dump size . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-28 dumpcheck utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-31 Methods of starting a dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-34 Start a dump from a TTY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-38 Generating dumps with SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41 Dump-related LED codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-43 Copying system dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-46 Automatically reboot after a crash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-48 Sending a dump to IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-51 Use kdb to analyze a dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-56 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-60 Exercise 11: System dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-62 ©Copyright IBM Corp. 2009

Contents Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

ix

Instructor Guide

Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-64 Appendix A. Checkpoint solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1 Appendix B. Command summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1 Appendix C. AIX dump code and progress codes. . . . . . . . . . . . . . . . . . . . . . . . . . . C-1 Appendix D. Auditing security related events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1 Appendix E. Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-1


Instructor course overview

This is a five-day course for existing system administrators with at least six months of experience in AIX. It is assumed that the students have general administrative skills, such as installing the operating system, configuring and managing devices, working with volume groups (including logical volumes and file systems), adding and administering user accounts, and general day-to-day housekeeping skills.

The main goal of this course is to provide advanced AIX administration skills, including various tools and techniques for determining and solving problems, monitoring for problems, reducing the maintenance window for system updates, and minimizing downtime for system maintenance.


Course description

Power Systems for AIX III: Advanced Administration and Problem Determination

Duration: 5 days

Purpose

This course provides advanced AIX system administrator skills with a focus on availability and problem determination. It provides detailed knowledge of the ODM database, where AIX maintains much of its configuration information. It shows how to monitor for and deal with AIX problems. There is special focus on dealing with Logical Volume Manager problems, including procedures for replacing disks. Several techniques for minimizing the system maintenance window are covered. It also covers how to migrate AIX Workload Partitions to another system with minimal disruption. While the course includes some AIX 6.1 enhancements, most of the material is applicable to prior releases of AIX.

Audience

This is an advanced course for AIX system administrators, system support, and contract support individuals with at least six months of experience in AIX.

Prerequisites

You should have basic AIX System Administration skills. These skills include:
• Use of the Hardware Management Console (HMC) to activate a logical partition running AIX and to access the AIX system console
• Install an AIX operating system from an already configured NIM server
• Implementation of AIX backup and recovery
• Manage additional software and base operating system updates
• Familiarity with management tools such as SMIT
• Understand how to manage file systems, logical volumes, and volume groups
• Understand basic Workload Partition (WPAR) concepts and commands (recommended for the WPAR Manager content)
• Mastery of the UNIX user interface including use of the vi editor, command execution, input and output redirection, and the use of utilities such as grep

These skills could be developed through experience or by formal training. Recommended training courses to obtain these prerequisite skills are either of the following:
• Power Systems for AIX III: Advanced Administration and Problem Determination (AN12) and its prerequisites
• AIX System Administration I: Implementation (AU14) and its prerequisites. (Note that AU14 does not cover WPARs)

If the student has AIX system administration skills, but is not familiar with the LPAR environment, those skills may be obtained by attending either of the following:
• AU73/Q1373 System p Virtualization I: Planning and Configuration
• AN11 Power Systems Administration I: LPAR Configuration

Objectives

On completion of this course, students should be able to:
• Perform system problem determination and reporting procedures including analyzing error logs, creating dumps of the system, and providing needed data to the AIX Support personnel
• Examine and manipulate Object Data Manager databases
• Identify and resolve conflicts between the Logical Volume Manager (LVM) disk structures and the Object Data Manager (ODM)
• Complete a very basic configuration of Network Installation Manager to provide network boot support for either system installation or booting to maintenance mode
• Identify various types of boot and disk failures and perform the matching recovery procedures
• Implement advanced methods such as alternate disk install, multibos, and JFS2 snapshots to use a smaller maintenance window
• Install and configure Workload Partition Manager to support WPAR management and to implement Live Application Mobility (LAM)


Contents

• Overview of advanced administration techniques
• Error monitoring
• The Object Data Manager (ODM)
• Basic Network Installation Manager (NIM) configuration
• System initialization problem determination
• Disk management theory and procedures
• Advanced techniques for installation and backup
• Workload Partition (WPAR) Manager and Live Application Mobility
• The AIX system dump facility


Agenda

The estimated timings provided here are for content only. They assume the remainder of the day is consumed with hourly breaks and a one-hour lunch break. Most days are timed to allow class dismissal between 4 p.m. and 4:30 p.m., assuming a 9 a.m. to 5 p.m. class day. If the class runs more quickly than expected, most days have an optional lab for the students to work on, which will help fill in the time.

Day 1 (est 5:27)
(00:20) Welcome
(00:55) Unit 1 - Advanced AIX administration overview
(00:35) Exercise 1 - Problem diagnostic information
(01:10) Unit 2 - The Object Data Manager
(00:42) Exercise 2 - The Object Data Manager
(01:35) Unit 3 - Error monitoring
(00:45) Exercise 3 - Error monitoring

Day 2 (est 5:13)
(01:00) Unit 4 - Network Installation Manager basics
(00:55) Exercise 4 - Basic NIM configuration
(01:28) Unit 5 - System initialization: Part I
(00:50) Exercise 5 - System initialization: Part I
(00:18) (optional) Exercise 3 Part 3 - Using RMC to monitor resources on a system

Day 3 (est 5:20)
(01:24) Unit 6 - System initialization: Part II
(00:30) Exercise 6 - System initialization: Part II
(01:22) Unit 7 - Disk management theory
(00:33) Exercise 7 - LVM metadata and problems
(01:00) Unit 8 - Disk management procedures
(00:20) Exercise 8 parts 1 and 2 - Disk replacement techniques
(00:23) (optional) Exercise 7 part 5 - Manually fixing an LVM ODM problem

Day 4 (est 5:23)
(00:20) Unit 8, Part 2 - Export and import (to fix VGDA/ODM conflict)
(00:35) Exercise 8 parts 3 and 4 - Disk management procedures
(00:24) Unit 9 - Install and backup techniques
(00:12) Exercise 9, part 1 - Alternate disk copy (pre-clone)
(00:24) Unit 9, topic 2 - multibos
(00:30) Exercise 9, part 1 - Wait for clone completion (30 min clone)
(00:10) Exercise 9, part 1 - Alternate disk copy (post-clone)
(00:10) Exercise 9, part 2 - multibos (pre-clone)
(00:20) Unit 9, topic 3 - JFS2 snapshot
(00:37) Exercise 9, part 2 - Wait for clone completion (37 min clone)
(00:18) Exercise 9, part 2 - multibos (post-clone)
(00:18) Exercise 9, part 3 - JFS2 snapshot
(00:10) Unit 10, topic 1 - Workload partitions review
(00:50) Unit 10, topic 2 - WPAR Manager
(01:03) Exercise 10 part 1 - Installing WPAR Manager
(00:23) (optional) Exercise 7 part 3 - Using intermediate LVM commands

Day 5 (est 3:08)
(00:33) Exercise 10 part 2 - Create and activate a WPAR
(01:15) Unit 10, topic 3 - Application mobility
(00:13) Exercise 10 part 3 - Enhanced Live Application Mobility
(00:28) Exercise 10 part 4 - Working with static relocation
(00:30) Unit 11 - The AIX system dump facility
(00:15) Exercise 11 - System dump facility
(00:28) (optional) Exercise 10 part 4 - Working with static relocation
(00:30) Wrap up / Evaluations


Unit 1. Advanced AIX administration overview

What this unit is about

This unit introduces various AIX administration issues related to problem determination and handling system maintenance and backup in an efficient manner.

What you should be able to do

After completing this unit, you should be able to:
• List the steps of a basic methodology for problem determination
• List AIX features that assist in minimizing planned downtime or shortening the maintenance window
• Explain how to find documentation and other key resources needed for problem resolution

How you will check your progress

Accountability:
• Checkpoint questions
• Lab exercise

References

SG24-5496   Problem Solving and Troubleshooting in AIX 5L (Redbook)
SG24-5766   AIX 5L Differences Guide Version 5.3 Edition (Redbook)
SG24-7559   IBM AIX Version 6.1 Differences Guide (Redbook)


Unit objectives

After completing this unit, you should be able to:
• List the steps of a basic methodology for problem determination
• List AIX features that assist in minimizing planned downtime or shortening the maintenance window
• Explain how to find documentation and other key resources needed for problem resolution

Figure 1-1. Unit objectives


Instructor notes:

Purpose: List the objectives for this unit.

Additional information: The Problem Solving and Troubleshooting in AIX 5L Redbook listed under the References heading was last updated in May 2002. However, it appears that a more current Redbook dealing with this topic is not available.

Transition statement: Problem determination is an important part of system administration.


Application outages

• Functional or performance
• Avoid unplanned outages with best practices
  – Change control
  – Data security
  – Capacity planning
  – High availability design
• Avoid planned outages
  – Fall-over to backup server
  – Relocate application (LPAR or WPAR mobility)
• Use maintenance windows
  – Application stopped versus slow activity
  – Plan enough time for back-out or recovery
  – Minimize time needed
• Effective problem determination and recovery

Figure 1-2. Application outages

Notes:

Introduction

Providing system availability is a major responsibility of any system administrator. An outage may be caused by a functional problem (such as an application or system crash) or a server performance problem (business is seriously impacted due to poor response times or late jobs). There are many approaches to dealing with this.

Unplanned outages

When most of us think of availability, we think of unplanned outages. Regular hardware and software maintenance can often avoid these outages. Designing the computing facility to have redundant components (power, network adapters, network switches, storage, and more) can make the overall system resilient to the failure of individual components.

Performance problems are often the result of failing to do proper capacity planning, resulting in not enough resources (memory, processors, network bandwidth, or disk I/O bandwidth) to handle the increased workload. If there is no change control to manage what work is placed on a system, capacity planning is even more challenging. Furthermore, uncontrolled changes to a system result in uncontrolled exposure to possible outages created by those changes, and thus unplanned outages.

Computer viruses and other malicious attacks by computer hackers can also reduce system availability (in addition to the exposure of losing proprietary information). Good data security policies are essential.

Even when implementing good policies in these areas, some unplanned outages will still happen. In these situations, the system administrator needs to have a plan for minimizing the impact and recovering as quickly as possible. One common approach is to have an alternate system that can take over the work of the failed system. High Availability Cluster Multi-Processing (HACMP) provides a system for either concurrent processing by multiple systems, or an automated fall-over to a backup system, thus minimizing the impact of a server failure. Such server redundancy can be designed to work within a single facility or be divided between different geographical locations.

Obviously, rapid notification of a problem, effective and prompt diagnosis of the cause, and being able to quickly implement an effective solution will all contribute to a smaller mean time to recovery.
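The effect of a smaller mean time to recovery can be illustrated with a little arithmetic. The sketch below is not part of the course material; it assumes the commonly used steady-state relationship availability = MTBF / (MTBF + MTTR), and the function names and sample figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, given the mean time between
    failures (MTBF) and the mean time to recovery (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected unplanned downtime in one year for the same inputs."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative figures only: the same failure rate, with slow and
    # fast recovery.
    for mttr in (8.0, 1.0):
        a = availability(1000.0, mttr)
        d = downtime_hours_per_year(1000.0, mttr)
        print(f"MTTR {mttr:4.1f} h: availability {a:.4%}, "
              f"downtime {d:.1f} h/year")
```

With the failure rate held constant, shortening recovery is the only term that changes, which is why prompt notification and diagnosis translate directly into availability.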

Planned outages

By using change control, the risk associated with certain categories of potential unplanned outages can be managed by implementing the changes during planned windows of time when the impact of any unexpected problem (resulting from the change) is minimized. In addition, there are certain types of changes for which an outage is unavoidable.

Some facilities will implement multiple types of maintenance windows. One type would be frequent short maintenance windows for any administrative work that will compete with applications for resources (performance impact) or have a small chance of having a functional disruption. Another type would be a less frequent window in which any reboot of the system or any major change to the level of the operating system or major subsystems, such as database software, would be allowed.

Sometimes, the amount of time in a maintenance window is relatively small and the work has to be carefully planned. You also need to allow time to recover if anything goes wrong due to the maintenance. Any needed resources that can be pre-staged will help expedite the work. Any approach that can speed recovery after a problem occurs is also useful.

For systems which need to be up 24 hours a day, seven days a week, and every day of the year (24x7x365), even a short outage cannot be tolerated. In those situations, a method to non-disruptively move the applications to another system can be invaluable. If an HACMP cluster solution is already in place to handle unplanned outages, then this can be used to manually fall-over the services to another system while maintenance is being done. Other solutions are to use Live Partition Mobility or Live Application Mobility.


Instructor notes:

Purpose: Provide an overview of issues related to application availability.

Transition statement: Let’s briefly look at the use of LPAR and WPAR mobility to avoid application outages.


Live Partition Mobility versus Live Application Mobility

• Live Partition Mobility allows the migration of a running logical partition to another physical server.
  – Operating system, applications, and services are not stopped during the process
  – Requires POWER6, AIX 5.3, HMC, and VIO server
• Live Application Mobility allows moving a workload partition from one server to another.
  – Without requiring the workload running in the WPAR to be restarted
  – Provides outage avoidance and multi-system workload balancing
  – Requires AIX 6.1

[Diagrams: multiple systems (Server 1 and Server 2, each with a VIOS and partitions) managed by a single HMC and connected by a network; workload partitions (Billing, Data Mining, Test, EMail, App Srv, Web, Training, Dev) on AIX #1, #2, and #3 with a policy in Workload Partitions Manager]

Figure 1-3. Live Partition Mobility versus Live Application Mobility

Notes:

As the number of hosted partitions and applications increases, finding a maintenance window acceptable to all becomes increasingly difficult. Live partition or application mobility allows you to move your partitions around so that you can perform disruptive operations on the machine when it best suits you, rather than when it causes the least inconvenience to the users.

Live Partition Mobility

Live Partition Mobility provides the ability to move a running logical partition (including its operating system and applications) non-disruptively from one system to another. The migration operation, which takes just a few seconds, maintains complete system transactional integrity. The migration transfers the entire system environment, including processor state, memory, attached virtual devices, and connected users.

Live Application Mobility


Live Application Mobility (LAM) is a new capability that allows a client to relocate a running WPAR from one system to another, without requiring the workload running in the WPAR to be restarted. LAM is intended for use within a data center and requires the use of the new Licensed Program Product, the IBM AIX Workload Partitions Manager. Live Application Mobility differs significantly from Live Partition Mobility in that Live Partition Mobility is a feature of POWER6 processors. As such, it can be used on operating systems other than AIX 6, such as Linux or earlier AIX versions. On the other hand, WPAR is specifically a feature of AIX 6, but it can run on various hardware platforms (for example: POWER6, POWER5 or POWER5+, or POWER4 systems).


Instructor notes:

Purpose: Briefly introduce the types of mobility and their differences.

Details: Focus on why mobility is important and the high level differences between the two types. Other courses cover Live Partition Mobility and there is a later unit in this course that focuses on WPAR mobility.

Transition statement: Let’s look at the other factors in minimizing the time for completing maintenance tasks.


Maintenance window tasks

• Minimize time needed for tasks
• Operating system maintenance
  – Pre-staging of maintenance
  – Applying maintenance to alternate rootvg
  – Applying maintenance with alternate BLV
  – Reboot to use updated alternate
• System backups
  – Minimizing rootvg size
  – Snapshot techniques for user file systems

Figure 1-4. Maintenance window tasks

Notes:

Expediting work in the maintenance window

The quicker maintenance can be completed, the sooner you can get the system back up and head home (this is likely at night or on a weekend). More importantly, expediting the planned activities will allow more time to handle any problems that may arise.

Operating system maintenance

Ensure you have on hand whatever materials you will need for the job, such as the installation media. Eliminating the need to handle that media can be important. This can be done by pre-copying all of the needed filesets to disk storage. This could be on an NFS or NIM server (provided you have sufficient network bandwidth) or it could be a software repository on the system being updated. If using a software repository on the system which is being updated, it is recommended that the filesets be in a file system allocated out of a different volume group than the rootvg.


An important technique that we will cover is the use of alternate storage as the target of the software update. What we mean is that the updates are not made to the rootvg, but rather to a copy of the rootvg. This has two advantages.

First, there is no change being made to the active rootvg. For locations that make a distinction between changing the level of the operating system and simply doing work that has a performance impact, the actual time-consuming update activity can be done in a more frequently available window. Then, when a major maintenance window arrives, you only need to reboot to make it effective.

The second advantage, and to some the more important advantage, is the ease of recovery. If you find that there are serious problems with running under the new level of code, you only need to reboot back to the earlier code level, rather than recover from a mksysb or reject the entire update. Of course, the downside is that you will need to reboot to make the update effective, but this is something a major maintenance window should expect.

There are two techniques that we will cover. One technique is creating an alternate set of logical volumes that are copies of the rootvg BOS logical volumes. This is called multibos. The other technique is creating an alternate volume group which is a clone of the rootvg. In each case, you would apply the maintenance to the copy and then later reboot to make it effective.
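The update-the-copy pattern shared by both techniques can be pictured with a toy model. This is only a sketch of the idea, not AIX tooling: the `BootImage` and `System` classes, their methods, and the level labels are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BootImage:
    """One bootable copy of the operating system (hypothetical model)."""
    name: str
    level: str  # illustrative maintenance-level label


@dataclass
class System:
    active: BootImage
    standby: Optional[BootImage] = None

    def clone_active(self, name: str) -> None:
        """Create the alternate copy; the running image is untouched."""
        self.standby = replace(self.active, name=name)

    def update_standby(self, new_level: str) -> None:
        """Apply maintenance to the copy only."""
        if self.standby is None:
            raise RuntimeError("no alternate copy to update")
        self.standby = replace(self.standby, level=new_level)

    def reboot_to_standby(self) -> None:
        """Swap roles: the copy becomes active, the old image is kept."""
        if self.standby is None:
            raise RuntimeError("no alternate copy to boot from")
        self.active, self.standby = self.standby, self.active


if __name__ == "__main__":
    system = System(active=BootImage("original", "level-1"))
    system.clone_active("copy")
    system.update_standby("level-2")   # done ahead of the major window
    print(system.active.level)         # the running system is unchanged
    system.reboot_to_standby()         # make the update effective
    print(system.active.level)
    system.reboot_to_standby()         # back out: reboot to the old level
    print(system.active.level)
```

The point of the model is that back-out is the same cheap operation as activation, because the earlier image is preserved rather than overwritten.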

Expediting backups

Another common maintenance activity is backing up the system. Unless you have an application that is designed to manage a recovery process using fuzzy backups, you will need to quiesce the application activity long enough to be sure that there are no inconsistencies in the backup.

The term fuzzy backup refers to a backup taken while the application was making changes. A given transaction makes multiple data changes. Some of these changes are made before the affected data is backed up, while others are made after the affected data is backed up. Thus the backup has one piece of data that reflects the transaction and another piece of data that does not. The two pieces of data are inconsistent, and such a backup is referred to as fuzzy.

For the rootvg itself, size should be minimized. The rootvg should contain only what is needed for the OS. All user data and other non-essential files should be backed up and restored separately. An example is the standard location of a software repository: /usr/sys/inst.images. The software repository can be very large, and yet this common path resides in the /usr file system, which is in the rootvg. Placing the software repository in a separate file system with its own recovery plan (which could be using the original media as the backup) can help reduce backup and recovery time. Another common example is the /home file system. If users have vast amounts of data stored there, then overmounting it with a separate file system can again speed up working with the rootvg. There are other file systems, such as /tmp, whose contents could be left out of the system backup. The trick is that these would need to be excluded from the backup during mksysb execution (either not mounted or identified in /etc/exclude.rootvg), and then separately recovered from their own backup. Other user data will be in separate user volume groups.

With the emphasis on separate backups for non-BOS data, there comes a need to minimize how long the applications must be quiesced while still preserving data consistency. One technique that AIX provides is JFS2 snapshots, which allow us to quiesce the application only very briefly and still have a consistent picture of the data at a single point in time. We can then either use that snapshot of the data as its own backup, or base an actual backup upon that snapshot (in order to have off-site storage of the backup).

There are other facilities for capturing snapshots of data. Some are part of the storage subsystems and some are part of total storage solutions such as Tivoli Storage Manager. Our focus will be on the facility that is provided with AIX: JFS2 snapshot.
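The exclusion of files from a mksysb through /etc/exclude.rootvg can be previewed with a small sketch. It assumes, as the mksysb documentation describes, that each line of the exclude file is a grep pattern matched against path names written relative to "./". The sample paths, the temporary pattern file, and the filter_excluded function are made up for illustration; the sketch does not touch the real /etc/exclude.rootvg.

```shell
#!/bin/sh
# Sketch: preview which paths a set of exclude patterns would drop.
# Assumption: patterns are grep patterns applied to "./"-relative paths.

patterns="/tmp/exclude.sketch.$$"
cat > "$patterns" <<'EOF'
^./tmp/
^./usr/sys/inst.images/
EOF

# filter_excluded: read candidate paths on stdin, print those that survive
filter_excluded() {
    grep -v -f "$1"
}

kept=$(printf '%s\n' \
        ./etc/hosts \
        ./tmp/scratch.dat \
        ./usr/sys/inst.images/bos.rte \
        ./var/tmp/keep.me | filter_excluded "$patterns")
printf '%s\n' "$kept"
rm -f "$patterns"

# On AIX the same patterns would go into /etc/exclude.rootvg, and the
# backup would be taken with the -e flag, for example (not run here):
#   mksysb -e -i /dev/rmt0
```

Anchoring the pattern with ^./ matters: an unanchored /tmp/ would also drop ./var/tmp/keep.me, which is probably not what was intended.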


Instructor notes: Purpose — Cover approaches that reduce maintenance time. Details — Additional information — Transition statement — Actual hardware or software problems are also a concern for application availability. What do we need to do to better manage problems when they occur?


Effective problem management

• Keep system documentation current.
• Keep maintenance up to date.
• Use a problem determination methodology.
• If an AIX bug:
  – Collect problem information.
  – Open problem report with AIX Support.
  – Provide snap with information.

Figure 1-5. Effective problem management

Notes:

Obtaining and documenting information about your system

It is a good idea, whenever you approach a new system, to learn as much as you can about that system. It is also critical to document not only the physical resources and the devices, but also how the system has been configured (network, LVM, and more). Then this information will be ready when needed. Later in the course, we will suggest some ways to collect system information.

System maintenance

Sometimes code works well under normal testing or production circumstances, but reveals flawed logic when faced with an unanticipated situation. Alternatively, the flaw could be in some non-central aspect of the code that is not normally noticed. The number of facilities using this code is large enough that there is a good chance that one of them will detect and report the problem not long after release of the new code level. The fix for the code defect will usually come out in the next released fix pack. On the other hand, many facilities may not be affected by, or be concerned about, the code defect for months, until the circumstances arise in which it represents a problem. By installing newer service packs, a facility can benefit from the experience of others and avoid being impacted by known problems. Of course, there is always the possible exposure that a new fix pack will introduce new problems while solving many old ones. This course will cover some techniques to use in applying fix packs.

Problem determination

Once you find yourself impacted by what you believe to be a product defect, you will need to obtain prompt resolution. While there is no substitute for experience (the ability to recognize a situation and remember the details of how you dealt with it the last time a similar problem occurred), many problems are most effectively solved by following a well-developed problem determination methodology. This course will cover a basic problem determination methodology.

Problem reporting

When you find yourself impacted by what you believe to be a product defect, you will need to contact AIX Support. Before doing so, write up a description of the problem and the surrounding circumstances. When you open a new Problem Management Report (PMR) with AIX Support, you will be expected to provide a wealth of information to assist them in determining the cause of the problem. The snap command is a common tool for collecting a vast amount of information about the environment surrounding the problem. The course materials will cover these problem reporting procedures.


Instructor notes: Purpose — Introduce problem management. Details — Additional information — Transition statement — As just stated, keeping good documentation is important. Let’s take a closer look at this.


Before problems occur

• Effective problem determination starts with a good understanding of the system and its components.
• The more information you have about the normal operation of a system, the better.
  – System configuration
  – Operating system level
  – Applications installed
  – Baseline performance
  – Installation, configuration, and service manuals

System documentation

Figure 1-6. Before problems occur

Notes:

Obtaining and documenting information about your system

It is a good idea, whenever you approach a new system, to learn as much as you can about that system. It is also critical to document both logical and physical device information so that it is available when troubleshooting is necessary.

Information that should be documented

Examples of important items that should be determined and recorded include the following:
- Machine architecture (model, CPU type)
- Physical volumes (type and size of disks)
- Volume groups (names; just a bunch of disks (JBOD) or redundant array of independent disks (RAID))
- Logical volumes (mirrored or not, which VG, type)
- Filesystems (which VG, what applications)
- Memory (size) and paging spaces (how many, location)


Instructor notes: Purpose — Provide guidance on what needs to be known before things go wrong. Details — In the opinion of the developer of the current revision, this visual introduces one of the most important points that you as an instructor should make regarding “what it takes” to be a successful system administrator: In order to be successful at determining what has gone wrong (and how to respond) when there are system problems, the administrator must be extremely familiar with the characteristics of his or her system when it is functioning normally. Be sure to make this point! Use the student notes to guide the rest of your presentation. Additional information — Transition statement — Some of the commands you might want to start out with are discussed next.


Before problems occur: A few good commands

• lspv      Lists physical volumes, PVID, VG membership
• lscfg     Provides information regarding system components
• prtconf   Displays system configuration information
• lsvg      Lists the volume groups
• lsps      Displays information about paging spaces
• lsfs      Gives file system information
• lsdev     Provides device information
• getconf   Displays values of system configuration variables
• bootinfo  Displays system configuration information (unsupported)
• snap      Collects system data

Figure 1-7. Before problems occur: A few good commands

Notes:

A list of useful commands

The list of commands on the visual provides a starting point for use in gathering key information about your system. There are also many other commands that can help you in gathering important system information.

Sources of additional information

Be sure to check the man pages or the AIX Commands Reference for the correct syntax and option flags to use with these commands to obtain more specific information. There is no man page or entry in the AIX Commands Reference for the bootinfo command.
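One way to put these commands to work is to capture their output as a baseline while the system is healthy. The following is a minimal sketch, not a supported tool: the collect_baseline function, the output directory, and the one-file-per-command layout are all assumptions made for illustration, and the flags shown (lsps -a, lsdev -C, getconf -a, oslevel -s) are simply one reasonable choice. Commands that are not present on the host are noted and skipped.

```shell
#!/bin/sh
# Sketch: save the output of the documentation commands, one file per command.
# Assumptions: collect_baseline and its output layout are hypothetical.

collect_baseline() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    while read -r name args; do
        if command -v "$name" >/dev/null 2>&1; then
            # $args is left unquoted on purpose so that flags split into words;
            # stdin is redirected so the command cannot consume the list below
            "$name" $args > "$outdir/$name.out" 2>&1 </dev/null
        else
            echo "skipped: $name not found" > "$outdir/$name.out"
        fi
    done <<EOF
lspv
lscfg
prtconf
lsvg
lsps -a
lsfs
lsdev -C
getconf -a
oslevel -s
EOF
    return 0
}
```

A possible use is collect_baseline /var/adm/baseline/$(date +%Y%m%d) after each configuration change (the directory is hypothetical), so that a later diff between two baselines shows what changed.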


Instructor notes: Purpose — Point out just a few of the commands that are helpful in learning about the system and its configuration. Details — Present the information in the student notes. Provide board work to show some of the commands and the options that can be used with them. Additional information — The bootinfo command is not officially supported; but, in case students ask, here is some information regarding some of the most commonly used flags of this command:

-r   Displays real memory in KB
-p   Displays hardware platform (rs6k, rspc, chrp)
-y   Displays 32 if hardware is 32-bit or 64 if hardware is 64-bit
-K   Displays 32 if kernel is 32-bit or 64 if kernel is 64-bit
-z   Displays processor type (0=uniprocessor, 1=multiprocessor)

Transition statement — Let’s talk about what to do when things go wrong.


Steps in problem resolution

1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem

Figure 1-8. Steps in problem resolution

Notes:

The start-to-finish method

The start-to-finish method for resolving problems consists primarily of the following four major components:
- Identify the problem.
- Talk to users (to define the problem).
- Collect system data.
- Resolve (fix) the problem.

Step 1: Identify the problem

The first step in problem resolution is to find out what the problem is. It is important to understand exactly what the users of the system perceive the problem to be. A clear description of the problem typically gives clues as to its cause and aids in the choice of troubleshooting methods to apply.

Step 2: Gathering additional detail

A problem might be identified by just about anyone who uses or needs to interact with the system. If a problem is reported to you, it may be necessary to get details from the reporting user and then query others on the system in order to obtain additional details or to develop a clear picture of what happened. The users may be data entry staff, programmers, system administrators, technical support personnel, management, application developers, operations staff, network users, and so forth.

Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?

Keep them talking until the picture is clear. Ask as many questions as you need to in order to get the entire history of the problem.

Step 3: Collect system data

Some information about the system will have already been collected from the users during the process of defining the problem. By using various commands, such as lsdev, lspv, lsvg, lslpp, lsattr, and others, you can gather further information about the system configuration. You should also gather other relevant information by making use of available error reporting facilities, determining the state of the operating system, checking for the existence of a system dump, and inspecting the various available log files.
- How is the machine configured?
- What errors are being produced?
- What is the state of the OS?
- Is there a system dump?
- What log files exist?

SMIT and Web-based System Manager logs

If SMIT or the Web-based System Manager has been used, there will be additional logs that could provide further information. These log files are normally contained in the home directory of the root user and are named (by default) /smit.log for SMIT and /websm.log for the Web-based System Manager.


Step 4: Resolve the problem

After all the information is gathered, determine the procedures necessary to solve the problem. Keep a log of all actions you perform in trying to determine the cause of the problem, and any actions you perform to correct the problem.
- Use the information gathered.
- Keep a log of actions taken to correct the problem.
- Use the tools available: command documentation, downloadable fixes, and updates.
- Contact IBM Support, if necessary.

Resources for problem solving

A variety of resources, such as the documentation for individual commands, are available to assist you in solving problems with AIX 6 systems. The IBM System p and AIX Information Center is a Web site that serves as a focal point for all information pertaining to pSeries and AIX. It provides a link to the entire pSeries library. A message database is available to search on error numbers, error identifiers, and display codes (LED values). The Web site also contains FAQs, how-tos, a Troubleshooting Guide, and more.

Information Center URL

The URL for the IBM System p and AIX Information Center is as follows: http://publib16.boulder.ibm.com/pseries/index.htm


Instructor notes: Purpose — Provide the big picture for problem resolution. Details — Additional information — Transition statement — An important part of the problem description is the collection of generated messages or codes. Let’s look at some of the different types of codes and where we can look them up.


Progress and reference codes

• Progress codes
• System reference codes (SRCs)
• Service request numbers (SRNs)
• Obtained from:
  – Front panel of system enclosure
  – HMC or IVM (for logically partitioned systems)
  – Operator console message or diagnostics (diag utility)
• Online hardware and AIX documentation available at: http://publib.boulder.ibm.com/infocenter/systems
  – Select System Hardware > System i and System p
    • Popular links and effective searches available
  – Select Operating System > AIX 6.1 Information
    • Search for “message center”
    • Diagnostic Information for Multiple Bus Systems (SA38-0509)

Figure 1-9. Progress and reference codes

Notes:

Introduction

AIX provides progress and error indicators (display codes) during the boot process. These display codes can be very useful in resolving startup problems. Depending on the hardware platform, the codes are displayed on the console and the operator panel.

Operator panel

For non-LPAR systems, the operator panel is an LED display on the front panel. POWER4, POWER5, and POWER6-based systems can be divided into multiple logical partitions (LPARs). In this case, a system-wide LED display still exists on the front panel. However, the operator panel for each LPAR is displayed on the screen of the Hardware Management Console (HMC). The HMC is a separate system that is required when running multiple LPARs. Regardless of where the codes are displayed, they are often referred to as LED display codes.


Progress codes and other reference codes

Reference codes can have various sources:
- Diagnostics:
  • Diagnostics or error log analysis can provide Service Request Numbers (SRNs), which can be used to determine the source of a hardware or operating system problem.
- Hardware initialization:
  • System firmware sends boot status codes (called firmware checkpoints) to the operator panel. Once the console is initialized, the firmware can also send 8-digit error codes to the console.
- AIX initialization:
  • The rc.boot script and the device configuration methods send progress and error codes to the operator panel.

Codes from the hardware/firmware or from AIX initialization scripts fall into two categories:
- Progress codes: These are checkpoints indicating the stages in the initial program load (IPL) or boot sequence. They do not necessarily indicate a problem unless the sequence permanently stops on a single code or a rotating sequence of codes.
- System Reference Codes (SRCs): These are error codes indicating that a problem has originated in hardware, Licensed Internal Code (firmware), or in the operating system.

Documentation

Note: All information on Web sites and their design is based upon what was available at the time of this course revision. Web site URLs and the design of the related Web pages often change.

Online hardware documentation and AIX message codes are available at: http://publib.boulder.ibm.com/infocenter/systems
- Many of the codes you will deal with are actually hardware or firmware related. For those codes, you need to navigate to the infocenter that specializes in system hardware.
  • The content area has popular links for accessing code information, or you can use search strings such as: system reference codes, service request numbers, or service support troubleshooting.
- For AIX codes and messages, you will need to navigate to the Operating System infocenter for AIX.
  • From here you can use the search string AIX message center to obtain information on various codes (including the seven-digit message codes).
  • One very useful reference that you can find at the AIX infocenter is the RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems (SA38-0509). Chapter 30 has AIX diagnostic numbers and location codes. It provides descriptions for the numbers and characters that display on the operator panel and descriptions of the location codes used to identify a particular item.


Instructor notes: Purpose — Introduce some places to find information to help diagnose boot problems. Details — Additional information — In AIX V4, many problem solving procedures were described in the AIX Problem Solving Guide and Reference (SC23-2606). Transition statement — If you have a problem and you think it is a defect in the product what do you do? Call IBM.


Working with AIX Support

• Have needed information ready:
  – Name, phone #, customer #
  – Machine type, model, and serial #
  – AIX version, release, technology level, and service pack
  – Problem description, including error codes
  – Severity level: critical, significant impact, some impact, minimal
• 1-800-IBM-SERV (1-800-426-7378)
• Level 1 will collect information and assign PMR number
• Route to level 2 responsible for the product
• You may be asked to collect additional information to upload
• They may ask you to update to a specific TL or SP
  – APAR for your problem already addressed
  – Need to have a standard environment for them to investigate

Figure 1-10. Working with AIX Support

Notes:

If you believe that your problem is the result of a system defect, you can call AIX Support to request assistance. Before you call 1-800-IBM-SERV, it is a good idea to have certain information ready. They will want to verify your name against a list of names associated with your customer number, and validate that your customer number has support for the product in question. They will also need to know some details about the hardware and software environment in which the problem is occurring, such as your MTMS (machine type, model, serial), your AIX OS level, and the level of any other relevant software. Of course, you need to explain your problem, providing as much detail as possible, especially any error messages or codes.

The level 1 personnel will ask you for the severity of your problem.
• Severity level 1 (critical) indicates that the function does not work, your business is severely impacted, there is no workaround, and there needs to be an immediate solution. Be aware that, for severity level 1, you will be expected to be available 24x7 until the problem is resolved.
• Severity level 2 (significant impact) indicates that the function is usable but is limited in a way that your business is severely impacted.
• Severity level 3 (some impact) indicates that the program is usable with less significant features (not critical to operations) unavailable.
• Severity level 4 (minimal impact) indicates that the problem causes little impact on operations, or a reasonable circumvention to the problem has been implemented.

Level 1 will assign you a PMR number (actually a PMR and branch number combination) for tracking purposes. Each time you call about this problem in the future, you should have the PMR and branch numbers at hand.

Once the basic information has been collected, you are passed to the level 2 personnel for the product area in which you are having a problem. They will work with you in investigating the nature and cause of your problem. They will search the support database to see if it is a known problem that is either already being worked on or has a solution already developed. In many cases, they will request that you update to a specific technology level and service pack that already includes the fix. If they do not have a fix, they may still ask you to update your system and determine if the problem still exists. If the problem still exists, they now have a known software environment to work with. At this point they will often ask for a complete set of information from your system to be collected and uploaded to their server, to support their investigation. The basic tool for collecting your system information is the snap command.


Instructor notes: Purpose — Introduce the procedure for working with AIX Support. Details — Additional information — Transition statement — Let’s look at how we work with the snap command.


AIX Support test case data (1 of 2)

Run the following (or very similar) commands to gather snap information:

    # snap -a
    # snap -c

The snap -c step will create /tmp/ibmsupt/snap.pax.Z.

    # mv /tmp/ibmsupt/snap.pax.Z \
         PMR#.b.c.snap.pax.Z

Figure 1-11. AIX Support test case data (1 of 2)

Notes:

Overview of the snap command

The snap command is used to gather system configuration information useful in identifying and resolving system problems. The snap command can also be used to compress the snap information gathered into a pax file. The file may then be written to a device such as tape or DVD, or transmitted to a remote system. Refer to the man page for snap or the corresponding entry in the AIX Commands Reference manual for detailed information about the snap command and its various flags.


Discussion of command sequence shown on the visual

First, as illustrated on the visual, the -a flag of the snap command should be used to gather all system configuration information that can be gathered using snap. The output of this command will be written to the /tmp/ibmsupt directory.

Next, you should place any additional testcase data that you feel may be helpful in resolving the problem being investigated into the /tmp/ibmsupt/other subdirectory or into the /tmp/ibmsupt/testcase subdirectory. This additional information is then included (together with the information gathered directly by snap) in the compressed pax file created in the next step in this command sequence.

As shown, the -c flag of the snap command should then be used to create a compressed pax file containing all files contained in the /tmp/ibmsupt directory. The output file created by this command is /tmp/ibmsupt/snap.pax.Z.

Next, the /tmp/ibmsupt/snap.pax.Z output file should be renamed using the mv command to indicate the PMR number, branch number, and country number associated with the data in the file. For example, if the PMR number is 12345, the branch number is 567, and the country number is 890, the file should be renamed 12345.b567.c890.snap.pax.Z. (The country code for the United States is 000.)
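The renaming convention lends itself to a small helper. The sketch below assumes nothing beyond the naming pattern given in the example above; the function name pmr_snap_name is hypothetical, and the AIX commands appear only as comments because they must be run as root on the affected system.

```shell
#!/bin/sh
# Sketch: build the upload file name from PMR, branch, and country numbers.
# Assumption: pmr_snap_name is an illustrative helper, not part of AIX.

pmr_snap_name() {
    # $1 = PMR number, $2 = branch number, $3 = country number
    printf '%s.b%s.c%s.snap.pax.Z\n' "$1" "$2" "$3"
}

name=$(pmr_snap_name 12345 567 890)
echo "$name"

# On the affected AIX system the full sequence would then be (not run here):
#   snap -a
#   (copy any extra testcase data into /tmp/ibmsupt/testcase)
#   snap -c
#   mv /tmp/ibmsupt/snap.pax.Z "/tmp/ibmsupt/$name"
```

Generating the name from the three numbers, rather than typing it, avoids the most common mistake of dropping the b or c prefix.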


Instructor notes: Purpose — Cover how to create a snap. Details — Additional information — Transition statement — Once you have created a compressed snap file, you will need to upload it to AIX Support.


AIX Support test case data (2 of 2)

Upload the information you have captured:

    # ftp testcase.software.ibm.com
    User: anonymous
    Password:
    ftp> cd /aix/toibm
    ftp> bin
    ftp> put PMR#.b.c.snap.pax.Z
    ftp> quit

Figure 1-12. AIX Support test case data (2 of 2)

Notes:

Uploading data to AIX Support

AIX Support provides an anonymous FTP server for receiving your testcase data. The host name for that server is testcase.software.ibm.com. Once you log in to the server, change directory to /aix/toibm. Be sure to transfer the file in binary mode to avoid an undesirable attempt by FTP to convert the contents of the file. Then just put your file on the server and notify your support contact that the data is there.
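The interactive session can also be driven from a script by feeding the same commands to ftp -n, which suppresses the automatic login prompt. This is a sketch under assumptions: the function ftp_upload_commands is hypothetical, and sending an e-mail address as the anonymous password is a common convention rather than something stated in this course. The function only prints the command stream, so it can be reviewed before anything is sent.

```shell
#!/bin/sh
# Sketch: generate the ftp command stream for a non-interactive upload.
# Assumptions: ftp_upload_commands is illustrative; the password value
# (an e-mail address) is a convention for anonymous FTP, not a requirement
# taken from the course text.

ftp_upload_commands() {
    # $1 = local file to upload, $2 = password to send for the anonymous login
    printf 'user anonymous %s\n' "$2"
    printf 'cd /aix/toibm\n'
    printf 'binary\n'
    printf 'put %s\n' "$1"
    printf 'quit\n'
}

ftp_upload_commands 12345.b567.c890.snap.pax.Z admin@example.com

# To perform the upload for real (not run here):
#   ftp_upload_commands "$file" "$email" | ftp -n testcase.software.ibm.com
```

Keeping the binary command ahead of put mirrors the warning above about FTP converting the file contents.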


Instructor notes: Purpose — Cover how to upload the snap file. Details — Additional information — Transition statement — AIX Support provides software fixes for the reported problems. Your reported problem may already have an available fix. Let’s review the packaging and levels of AIX operating system updates.


AIX software update hierarchy IBM Power Systems

• Version and release (oslevel) – Requires new license and migration install

• Fileset updates (lslpp –L will show mod and fix levels) – Collected changes to files in a fileset – Related to APARs and PTFs – Only need to apply the new fileset

• Fix bundles – Collections of fileset updates

• Technology level and maintenance level (oslevel –r) – Fix bundle of enhancements and fixes

• Service packs (oslevel –s) – Fix bundle of important fixes

• Interim fixes – Special situation code replacements – Delay for normal PTF packaging is too slow – Managed with efix tool © Copyright IBM Corporation 2009

Figure 1-13. AIX software update hierarchy

Notes:

Version, release, mod, and fix

The oslevel command by default shows the version and release of the operating system. Changing these requires a new license and a disruption to the system (such as rebooting into installation and maintenance to do a migration install). The mod and fix levels in the oslevel -s output are normally displayed as zeros. The mod level displayed in the oslevel output should reflect the technology level.

The mod and fix levels are used to reflect changes to the many individual filesets that make up the operating system. These are best seen by browsing through the output of the lslpp -L report. These changes only require the administrator to install a Program Temporary Fix (PTF) in the form of a fix fileset. A given fix fileset can resolve one or more problems, or APARs (Authorized Program Analysis Reports).
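The distinction between a version or release change and a mod or fix change can be sketched in a few lines of Python. This is an illustration only: the function names and the sample level strings are assumptions, and the dotted four-field layout (version.release.mod.fix) is the form in which fileset levels are commonly displayed.

```python
def parse_vrmf(level):
    """Split a dotted level string into (version, release, mod, fix) integers."""
    parts = tuple(int(p) for p in level.split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four dotted fields, got {level!r}")
    return parts


def is_update_only(installed, candidate):
    """True when candidate keeps version and release but raises mod or fix."""
    old, new = parse_vrmf(installed), parse_vrmf(candidate)
    return old[:2] == new[:2] and new[2:] > old[2:]


print(parse_vrmf("6.1.4.2"))
print(is_update_only("6.1.3.0", "6.1.4.2"))   # same version.release, newer mod
print(is_update_only("5.3.0.0", "6.1.0.0"))   # version change, not a fileset update
```

Comparing the levels as integer tuples avoids the string-comparison trap where "6.1.10.0" would sort before "6.1.9.0".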

Fix bundles

It is useful to collect many accumulated PTFs and test them together. The resulting bundle can then be used as a baseline for a new cycle of enhancements and corrections. Testing the PTFs together often makes it possible to catch unexpected interactions between them.

There are two types of AIX fix bundles. One type is a Technology Level (TL) update (formerly known as a Maintenance Level, or ML). This is a major fix bundle which not only includes many fixes for code problems, but also includes minor functional enhancements. You can identify the current AIX technology level by running the oslevel -r command.

The other type is a Service Pack (SP). Service Packs are released more frequently than Technology Levels (between TL releases) and usually contain only needed fixes. You can identify the current AIX technology level and service pack by running the oslevel -s command.

For the oslevel command to reflect a new TL or SP, all related fileset fixes must be installed. If a single fileset update in the fix bundle is not installed, the TL or SP level will not change.
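The "all related fileset fixes must be installed" rule can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the fileset names, levels, and bundle contents are invented, the BASE-TL-SP-YYWW layout of the oslevel -s string (for example 6100-04-03-1009) is assumed, and the real oslevel command may determine the level differently in detail.

```python
def parse_oslevel_s(text):
    """Split an assumed 'BASE-TL-SP-YYWW' string into named fields."""
    base, tl, sp, build = text.strip().split("-")
    return {"base": base, "tl": int(tl), "sp": int(sp), "build": build}


def highest_complete_level(installed, bundles):
    """Return the newest bundle whose every fileset is installed at or above
    the required level. Bundles are listed from oldest to newest."""
    reached = None
    for name, required in bundles:
        if all(installed.get(fs, (0, 0, 0, 0)) >= lvl for fs, lvl in required.items()):
            reached = name
        else:
            break                     # one missing update keeps the level back
    return reached


# Hypothetical bundles and filesets, for illustration only.
bundles = [
    ("6100-03", {"example.fileset.a": (6, 1, 3, 0), "example.fileset.b": (6, 1, 3, 0)}),
    ("6100-04", {"example.fileset.a": (6, 1, 4, 0), "example.fileset.b": (6, 1, 4, 0)}),
]
installed = {"example.fileset.a": (6, 1, 4, 0), "example.fileset.b": (6, 1, 3, 1)}

print(parse_oslevel_s("6100-04-03-1009"))
print(highest_complete_level(installed, bundles))   # fileset b is still behind

installed["example.fileset.b"] = (6, 1, 4, 0)
print(highest_complete_level(installed, bundles))   # now the newer level is complete
```

In this model, updating only one of the two filesets leaves the reported level unchanged, which mirrors the behavior described above.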

Interim fixes

On rare occasions, a customer has an urgent situation which needs fixes for a problem so quickly that they cannot wait for the formal PTF to be released. In those situations, a developer may place one or more individual file replacements on an FTP server and allow the system administrator to download and install them. Originally, this would simply involve manually copying the new files over the old files. But this created problems, especially in identifying the state of a system which later experienced other (possibly related) problems, or in backing out the changes. Today, there is a better methodology for managing these interim fixes using the efix command.

Security alerts will often provide interim fixes for the identified security exposure. Depending upon your own risk analysis, you might immediately use the interim fix, or wait for the next service pack (which will include these security fixes).

The syntax and use of the efix command were covered in the prerequisite course.

Instructor notes: Purpose — Explain standard terminology for software updates Details — Additional information — Transition statement — Let’s look at how we obtain these updates.

Relevant documentation

• IBM System p and AIX Information Center entry page: http://publib.boulder.ibm.com/eserver
  – Links to:
    • IBM Systems Information Center
    • IBM Systems Hardware Information Center
    • IBM Systems Software Information Center
    • IBM System p and AIX Information Center
  – The System p and AIX Information Center has links for both:
    • AIX 5L Version 5.3
    • AIX Version 6.1
• IBM Redbooks home: http://www.redbooks.ibm.com

Figure 1-14. Relevant documentation

Notes:

IBM System p and AIX Information Center

Most software and hardware documentation for AIX 5L and AIX 6 systems can be accessed online using the IBM System p and AIX Information Center Web site: http://publib16.boulder.ibm.com/pseries/index.htm

IBM Systems Information Center

Hardware documentation for POWER5 processor-based systems can be accessed online using the IBM Systems Information Centers site.

IBM Redbooks

Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks Web site: http://www.redbooks.ibm.com

Instructor notes: Purpose — Identify URLs for hardware and software documentation. Details — Let students know that hard copy versions of the manuals can be ordered from their IBM marketing representative. Additional information — Transition statement — Let’s review what we have covered with some checkpoint questions.

Checkpoint

1. What are the four major problem determination steps?
   _________________________________________
   _________________________________________
   _________________________________________
   _________________________________________
2. Who should provide information about system problems?
   _________________________________________
   _________________________________________
3. True or False: If there is a problem with the software, it is necessary to get the next release of the product to resolve the problem.
4. True or False: Documentation can be viewed or downloaded from the IBM Web site.

Figure 1-15. Checkpoint

Instructor notes: Purpose — Discuss the first group of checkpoint questions. Details — A checkpoint solution is provided below:

Checkpoint solutions

1. What are the four major problem determination steps?
   Identify the problem
   Talk to users (to further define the problem)
   Collect system data
   Resolve the problem
2. Who should provide information about system problems?
   Always talk to the users about such problems in order to gather as much information as possible.
3. True or False: If there is a problem with the software, it is necessary to get the next release of the product to resolve the problem.
   False. In most cases, it is only necessary to apply fixes or upgrade microcode.
4. True or False: Documentation can be viewed or downloaded from the IBM Web site.
   True.

Additional information — Transition statement — Let’s take a look at what we have in the class lab environment.

Exercise 1: Advanced AIX administration overview

• Recording system information
• Finding reference code documentation
• Creating a snap file

Figure 1-16. Exercise 1: Advanced AIX administration overview

Instructor notes: Purpose — Introduce the exercise for this unit. Details — Additional information — Transition statement — Let’s summarize what we have covered in this unit.

Unit summary

Having completed this unit, you should be able to:
• List the steps of a basic methodology for problem determination
• List AIX features that assist in minimizing planned downtime or shortening the maintenance window
• Explain how to find documentation and other key resources needed for problem resolution

Figure 1-17. Unit summary

Instructor notes: Purpose — Remind the students of some of the key points in this unit. Details — Additional information — Transition statement — That is the end of this unit.

Unit 2. The Object Data Manager

What this unit is about

This unit describes the structure of the Object Data Manager (ODM). It shows the use of the ODM command line interface and explains the role of the ODM in device configuration. Specific information regarding the function and content of the most important ODM files is also presented.

What you should be able to do

After completing this unit, you should be able to:
• Describe the structure of the ODM
• Use the ODM command line interface
• Explain the role of the ODM in device configuration
• Describe the function of the most important ODM files

How you will check your progress

Accountability:
• Checkpoint questions
• Lab exercise

References

Online    AIX Version 6.1 Command Reference volumes 1-6
Online    AIX Version 6.1 General Programming Concepts: Writing and Debugging Programs
Online    AIX Version 6.1 Technical Reference: Kernel and Subsystems

Note: References listed as “online” above are available through the IBM Systems Information Center at the following address: http://publib.boulder.ibm.com/infocenter/systems

Unit objectives

After completing this unit, you should be able to:
• Describe the structure of the ODM
• Use the ODM command line interface
• Explain the role of the ODM in device configuration
• Describe the function of the most important ODM files

Figure 2-1. Unit objectives

Notes:

Importance of this unit

The ODM is a very important component of AIX and is one major feature that distinguishes AIX from other UNIX systems. This unit describes the structure of the ODM and explains how you can work with ODM files using the ODM command line interface.

It is also very important that you, as an AIX system administrator, understand the role of the ODM during device configuration. Thus, explaining the role of the ODM in this process is another major objective of this unit.

Instructor notes: Purpose — Present the objectives of this unit. Details — Explain that a good understanding of the ODM is very important and can help in analyzing problems. Point out that the ODM is mainly used for device configuration and that this is a major focus in this unit. Additional information — None. Transition statement — Let’s start with the basics, an introduction to the ODM.

2.1. Introduction to the ODM

Instructor topic introduction

What students will do — The students will learn the structure of the ODM and how they can work with ODM files to query system data. Additionally, students will be able to explain the role of the ODM in device configuration.
How students will do it — Through lecture and checkpoint questions.
What students will learn — Students will learn:
• How the ODM is used in AIX
• How the command line interface can be used to work with the ODM in a safe way
• How devices are configured in AIX
How this will help students on their job — A good understanding of the ODM makes many system problems much easier to solve.

What is the ODM?

• The Object Data Manager (ODM) is a database intended for storing system information.
• Physical and logical device information is stored and maintained through the use of objects with associated characteristics.

Figure 2-2. What is the ODM?

Instructor notes: Purpose — Introduce the Object Data Manager (ODM). Details — This visual has been intentionally kept simple. The goal here is to introduce the ODM; details will come later. Additional information — Transition statement — What kind of information is managed by the ODM?

Data managed by the ODM

[Diagram: the ODM at the center, surrounded by the types of data it manages: Devices; Software; System resource controller; SMIT menus; TCP/IP configuration; Error Log, Dump; NIM]

Figure 2-3. Data managed by the ODM

Notes:

System data managed by ODM

The ODM manages the following system data:
- Device configuration data
- Software Vital Product Data (SWVPD)
- System Resource Controller (SRC) data
- TCP/IP configuration data
- Error log and dump information
- NIM (Network Installation Manager) information
- SMIT menus and commands

Emphasis in this unit

Our main emphasis in this unit is on the use of ODM to store and manage information regarding devices and software products (software vital product data). During the course, many other ODM classes are described.

Instructor notes:
Purpose — Provide an overview of what data is stored in the ODM.
Details — Go quickly through the list and mention that the main emphasis in this unit is on devices and software vital product data. You might want to point out that the two “hands” on the visual “point” to the types of data that will be emphasized. Later on, you supply the corresponding ODM database files where the data is stored.
Additional information — You might mention that TCP/IP configuration can still be set up without using ODM. In this case, traditional ASCII files are used for storing TCP/IP data. To determine whether ODM is used for TCP/IP, use the following command:
# lsattr -El inet0
If the attribute bootup_option is set to no, ODM files are used. If it is set to yes, ODM will not be used.
Transition statement — Let’s define some key terminology we will need for our discussion of the ODM.
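The bootup_option check can be expressed as a small Python sketch that reads text in the style of lsattr -El output. The sample text below is an assumption about the column layout (attribute, value, description, user-settable flag) and is not captured from a real system; the function names are hypothetical.

```python
def lsattr_value(output, attribute):
    """Return the value column for one attribute from 'lsattr -El' style text."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == attribute:
            return fields[1]
    return None


def tcpip_uses_odm(output):
    """bootup_option=no means the ODM holds the TCP/IP configuration."""
    value = lsattr_value(output, "bootup_option")
    if value is None:
        raise ValueError("bootup_option not found in the supplied text")
    return value == "no"


# Illustrative sample only; the layout is assumed, not captured output.
SAMPLE = """\
bootup_option no   Use BSD-style Network Configuration True
hostname      sys1 Host Name                           True
"""

print(lsattr_value(SAMPLE, "bootup_option"))
print(tcpip_uses_odm(SAMPLE))
```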

ODM components

uniquetype          attribute     deflt      values
tape/scsi/scsd      block_size    none       0-2147483648,1
disk/scsi/osdisk    pvid          none
tty/rs232/tty       login         disable    enable, disable, ...

Figure 2-4. ODM components

Notes:

Completing the drawing on the visual

The drawing on the visual above identifies the basic components of ODM, but some terms have been intentionally omitted from the drawing. Your instructor will complete this drawing during the lecture. Please complete your own copy of the drawing by writing in the terms supplied by your instructor.

ODM data format

For security reasons, the ODM data is stored in binary format. To work with ODM files, you must use the ODM command line interface. It is not possible to update ODM files with an editor.

Instructor notes:
Purpose — Define the basic components of ODM.
Details — Complete the visual during the lesson. ODM components are:
• Object classes
  The ODM consists of many database files, where each file is called an object class.
• Objects
  Each object class consists of objects. Each object is one record in an object class.
• Descriptors
  The descriptors describe the layout of the objects. They determine the name and datatype of the fields that are part of the object class.
Additional information — This visual shows an extract from the ODM class PdAt. Do not explain the meaning of PdAt or the different fields on this page. Concentrate on the components of the ODM.
Transition statement — It is also important to understand how the terms predefined device information and customized device information are used when discussing the ODM.
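The relationship between object classes, objects, and descriptors can be modeled in a few lines of Python. This is a conceptual sketch only: the ObjectClass type and its methods are hypothetical and say nothing about how the ODM stores data on disk. The sample rows are taken from the PdAt extract on the visual.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectClass:
    """One ODM-like object class: descriptors define the layout of its objects."""
    name: str
    descriptors: dict                       # descriptor name -> Python type
    objects: list = field(default_factory=list)

    def add(self, **values):
        """Add one object (record) after checking it against the descriptors."""
        unknown = set(values) - set(self.descriptors)
        if unknown:
            raise KeyError(f"unknown descriptors: {sorted(unknown)}")
        for key, value in values.items():
            if not isinstance(value, self.descriptors[key]):
                raise TypeError(f"{key} must be {self.descriptors[key].__name__}")
        self.objects.append(values)

    def get(self, **criteria):
        """Return every object whose fields match all the given criteria."""
        return [o for o in self.objects
                if all(o.get(k) == v for k, v in criteria.items())]


pdat = ObjectClass("PdAt", {"uniquetype": str, "attribute": str,
                            "deflt": str, "values": str})
pdat.add(uniquetype="tape/scsi/scsd", attribute="block_size",
         deflt="none", values="0-2147483648,1")
pdat.add(uniquetype="tty/rs232/tty", attribute="login",
         deflt="disable", values="enable, disable, ...")

print(len(pdat.objects))
print(pdat.get(attribute="login")[0]["deflt"])
```

Here the class corresponds to one database file, each dictionary in objects is one record, and the descriptors dictionary plays the role of the field names and datatypes.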

ODM database files

Predefined device information            PdDv, PdAt, PdCn
Customized device information            CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules
Software vital product data              history, inventory, lpp, product
SMIT menus                               sm_menu_opt, sm_name_hdr, sm_cmd_hdr, sm_cmd_opt
Error log, alog, and dump information    SWservAt
System resource controller               SRCsubsys, SRCsubsvr, ...
Network Installation Manager (NIM)       nim_attr, nim_object, nim_pdattr

Figure 2-5. ODM database files

Notes:

Major ODM files

The table on the visual summarizes the major ODM files in AIX. As you can see, the files listed in this table are placed into several different categories.

Current focus

In this unit, we will concentrate on ODM classes that are used to store device information and software product data. At this point, we will narrow our focus even further and confine our discussion to ODM classes that store device information.

Predefined and customized device information

The first two rows in the table on the visual indicate that some ODM classes contain predefined device information and that others contain customized device information. What is the difference between these two types of information?

Predefined device information describes all supported devices. Customized device information describes all devices that are actually attached to the system.

It is very important that you understand the difference between these two information classifications. The classes themselves are described in more detail in the next topic of this unit.

Instructor notes: Purpose — Explain the difference between predefined and customized device information. Details — Do not introduce the other ODM classes on this visual. At this point, just provide the difference between Pd and Cu classes. Additional information — Note: In the activity at the end of this topic, students have to answer the following questions: What ODM class contains all supported devices on your system? What ODM class contains all configured devices on your system? Therefore, describe clearly the meaning of PdDv and CuDv at this point. Transition statement — The next visual shows just the ODM object classes used during the configuration of a device. It also introduces cfgmgr, the “configuration manager.”

Device configuration summary

[Diagram: Predefined databases (PdDv, PdCn, PdAt); Configuration Manager (cfgmgr) with Config_Rules; Customized databases (CuDep, CuDv, CuDvDr, CuAt, CuVPD)]

Figure 2-6. Device configuration summary

Notes:

ODM classes used during device configuration

The visual above shows the ODM object classes used during the configuration of a device.

Roles of cfgmgr and Config_Rules

When an AIX system boots, the Configuration Manager (cfgmgr) is responsible for configuring devices. There is one ODM object class which the cfgmgr uses to determine the correct sequence when configuring devices: Config_Rules. This ODM object class also contains information about various methods files used for device management.
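How a rules table can drive the configuration sequence is sketched below in Python. The field names phase, seq, and rule follow what odmget Config_Rules displays, but the rule paths are invented for illustration, and the treatment of seq 0 as "run last" is an assumption that should be checked against the AIX documentation.

```python
def rules_for_phase(rules, phase):
    """Order the rules of one boot phase by seq; seq 0 is assumed to run last."""
    selected = [r for r in rules if r["phase"] == phase]
    return sorted(selected, key=lambda r: (r["seq"] == 0, r["seq"]))


# Hypothetical rules; real systems list their own methods under
# /etc/methods and /usr/lib/methods.
rules = [
    {"phase": 2, "seq": 20, "rule": "/usr/lib/methods/example_b"},
    {"phase": 2, "seq": 0,  "rule": "/usr/lib/methods/example_last"},
    {"phase": 1, "seq": 10, "rule": "/etc/methods/example_first"},
    {"phase": 2, "seq": 10, "rule": "/usr/lib/methods/example_a"},
]

for r in rules_for_phase(rules, 2):
    print(r["seq"], r["rule"])
```

The sort key places every nonzero seq in ascending order and pushes any seq 0 entry to the end, which is the only ordering rule this sketch models.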

Instructor notes: Purpose — Summarize the device configuration classes. Details — Review the ODM object classes belonging to the predefined and customized databases. The role of the Config_Rules object class is covered here and on the next visual (and the associated student notes), but we will provide more detail about each of the other object classes shown later. Additional information — Transition statement — Let’s look at the device configuration process a little more closely.

Configuration manager

[Diagram: cfgmgr ("Plug and Play") with Config_Rules; Predefined classes (PdDv, PdAt, PdCn); Customized classes (CuDv, CuAt, CuDep, CuDvDr, CuVPD); Methods (Define, Configure, Change, Unconfigure, Undefine); Device Driver (Load, Unload)]

Figure 2-7. Configuration manager

Notes:

Importance of Config_Rules object class

Although cfgmgr gets credit for managing devices (adding, deleting, changing, and so forth), it is actually the Config_Rules object class that does the work through various methods files.

Instructor notes: Purpose — Describe the operation of cfgmgr and its interaction with the ODM. Details — Explain how the “plug and play” gets added. Additional information — Try entering the command odmget Config_Rules to find out more about the content of this object class. Note the frequent references to the directories /etc/methods and /usr/lib/methods. Although we have not discussed the odmget command yet, you could use the command odmget Config_Rules (as a sort of preview) and point out the references to the two directories as a demo. Transition statement — The ODM object classes are stored in three repositories.

Location and contents of ODM repositories

/etc/objrepos
    CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules
    history, inventory, lpp, product
    nim_*, SWservAt, SRC*

/usr/lib/objrepos (can be shared across the network)
    PdDv, PdAt, PdCn
    history, inventory, lpp, product
    sm_*

/usr/share/lib/objrepos (can be shared across the network)
    history, inventory, lpp, product

Figure 2-8. Location and contents of ODM repositories

Notes:

Introduction

To support diskless, dataless and other workstations, the ODM object classes are held in three repositories. Each of these repositories is described in the material that follows.

/etc/objrepos

This repository contains the customized devices object classes and the four object classes used by the Software Vital Product Database (SWVPD) for the / (root) part of the installable software product. The root part of the software contains files that must be installed on the target system. It is the part of the product that cannot be shared among machines; each client must have its own copy. Most of the software that requires a separate copy for each machine is associated with the configuration of the machine or product.

To access information in the other directories, this directory contains symbolic links to the predefined devices object classes. The links are needed because the ODMDIR variable points only to /etc/objrepos.

/usr/lib/objrepos

This repository contains the predefined devices object classes, SMIT menu object classes, and the four object classes used by the SWVPD for the /usr part of the installable software product. The object classes in this repository can be shared across the network by /usr clients, dataless and diskless workstations. Software installed in the /usr part can be shared among several machines with compatible hardware architectures.

/usr/share/lib/objrepos

This repository contains the four object classes used by the SWVPD for the /usr/share part of the installable software product. The /usr/share part of a software product contains files that are not hardware dependent. They can be shared among several machines, even if the machines have different hardware architectures. One example is the terminfo files that describe terminal capabilities. As terminfo is used on many UNIX systems, terminfo files are part of the /usr/share part of a system product.
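The placement of object classes across the three repositories can be captured in a small lookup table. The Python sketch below is built only from the lists on the visual; the function name is hypothetical, the wildcard entries (nim_*, SRC*, sm_*) are matched as shell-style patterns, and the symbolic links in /etc/objrepos are deliberately not modeled.

```python
from fnmatch import fnmatchcase

# Repository contents as listed on the visual (patterns kept as shown there).
REPOSITORIES = {
    "/etc/objrepos": ["CuDv", "CuAt", "CuDep", "CuDvDr", "CuVPD", "Config_Rules",
                      "history", "inventory", "lpp", "product",
                      "nim_*", "SWservAt", "SRC*"],
    "/usr/lib/objrepos": ["PdDv", "PdAt", "PdCn",
                          "history", "inventory", "lpp", "product", "sm_*"],
    "/usr/share/lib/objrepos": ["history", "inventory", "lpp", "product"],
}


def repositories_for(object_class):
    """Return the repositories that hold the named object class."""
    return [path for path, patterns in REPOSITORIES.items()
            if any(fnmatchcase(object_class, p) for p in patterns)]


print(repositories_for("PdDv"))
print(repositories_for("lpp"))
print(repositories_for("sm_menu_opt"))
```

The SWVPD classes appear in all three repositories because each one records a different part (root, /usr, /usr/share) of the installed software.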

lslpp options

The lslpp command can list the software recorded in the ODM. When run with the -l (lowercase L) flag, it lists each of the locations (/, /usr/lib, /usr/share/lib) where it finds the fileset recorded. This can be distracting if you are not concerned with these distinctions. Alternatively, you can run lslpp -L, which reports each fileset only once, without making distinctions between the root, usr, and share portions.

Instructor notes: Purpose — Describe the different directories that hold ODM data. Details — Describe what ODM files reside in /etc/objrepos, /usr/lib/objrepos and /usr/share/lib/objrepos. Explain the meaning of the root, /usr and /usr/share part of a software product and identify that /usr/lib/objrepos and /usr/share/lib/objrepos can be shared in a network. Additional information — Transition statement — It is important to understand how ODM classes interact.

How ODM classes act together

PdDv:
    type = "14106902"
    class = "adapter"
    subclass = "pci"
    prefix = "ent"
    DvDr = "pci/goentdd"
    Define = "/usr/lib/methods/define_rspc"
    Configure = "/usr/lib/methods/cfggoent"
    uniquetype = "adapter/pci/14106902"

(cfgmgr: PdDv -> CuDv)

CuDv:
    name = "ent1"
    status = 1
    chgstatus = 2
    ddins = "pci/goentdd"
    location = "02-08"
    parent = "pci2"
    connwhere = "8"
    PdDvLn = "adapter/pci/14106902"

PdAt:
    uniquetype = "adapter/pci/14106902"
    attribute = "jumbo_frames"
    deflt = "no"
    values = "yes,no"

(chdev -l ent1 -a jumbo_frames=yes: PdAt -> CuAt)

CuAt:
    name = "ent1"
    attribute = "jumbo_frames"
    value = "yes"
    type = "R"

Figure 2-9. How ODM classes act together

Notes:

Interaction of ODM classes

The visual above and the notes below summarize how ODM classes act together.

1. In order for a particular device to be defined in AIX, the device type must be predefined in ODM class PdDv.
2. A device can be defined by either the cfgmgr (if the device is detectable), or by the mkdev command. Both commands use the define method to generate an instance in ODM class CuDv. The configure method is used to load a specific device driver and to generate an entry in the /dev directory. Notice the link PdDvLn from CuDv back to PdDv.
3. At this point you only have default attribute values in PdAt which, in our example of a gigabit Ethernet adapter, means you could not use jumbo frames (default is no). If you change the attributes, for example, jumbo_frames to yes, you get an object describing the nondefault value in CuAt.
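The interaction described above can be sketched in Python: a customized value in CuAt, if present, overrides the default from PdAt, and the PdDvLn field links the device back to its predefined type. The parser and function names are hypothetical, and the stanza text is a shortened copy of the example on the visual; this is an illustration of the lookup order, not a replacement for the ODM commands.

```python
def parse_stanzas(text):
    """Parse 'Class:' headers followed by 'name = value' lines into dicts."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            current = {"_class": line[:-1]}
            objects.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return objects


def effective_value(objects, device, attribute):
    """Return the CuAt value if one exists, else the PdAt default."""
    cudv = next(o for o in objects
                if o["_class"] == "CuDv" and o["name"] == device)
    for o in objects:
        if (o["_class"] == "CuAt" and o["name"] == device
                and o["attribute"] == attribute):
            return o["value"]
    for o in objects:
        if (o["_class"] == "PdAt" and o["uniquetype"] == cudv["PdDvLn"]
                and o["attribute"] == attribute):
            return o["deflt"]
    raise KeyError(attribute)


BASE = '''
CuDv:
    name = "ent1"
    status = 1
    PdDvLn = "adapter/pci/14106902"

PdAt:
    uniquetype = "adapter/pci/14106902"
    attribute = "jumbo_frames"
    deflt = "no"
    values = "yes,no"
'''

CHANGED = BASE + '''
CuAt:
    name = "ent1"
    attribute = "jumbo_frames"
    value = "yes"
    type = "R"
'''

print(effective_value(parse_stanzas(BASE), "ent1", "jumbo_frames"))
print(effective_value(parse_stanzas(CHANGED), "ent1", "jumbo_frames"))
```

The second call models the state after the chdev command shown on the visual: the CuAt object now exists, so its value wins over the predefined default.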

Instructor notes: Purpose — Summarize how the basic ODM classes interact. Details — Explain the flow as described in student notes. Additional information — None. Transition statement — As you know, not all system data is managed by the ODM.

Data not managed by the ODM

Filesystem information        ?
User/security information     ?
Queues and queue devices      ?

Figure 2-10. Data not managed by the ODM

Notes:

Completion of this page

The visual above identifies some types of system information that are not managed by the ODM, but the names of the files that store these types of information have been intentionally omitted from the visual. Your instructor will complete this visual during the lecture. Please complete your own copy of the visual by writing in the file names supplied by your instructor.

Instructor notes:
Purpose — Review some files from the basic administration course.
Details — Ask the students the following questions:
1. Which file contains information about the file systems on your system?
   /etc/filesystems
2. Which file contains most of the basic information (such as home directory and shell) about the users on your system?
   /etc/passwd
   Which file contains user attributes like password rules?
   /etc/security/user
3. Where is information about your queues and queue devices stored?
   /etc/qconfig
Be sure to fill in the appropriate line on the visual as you give the answer to each question.
Additional information — Tell the students that this is only a subset of data that is not in ODM.
Transition statement — Let’s review some of the points we have covered so far in this unit.


Let’s review: Device configuration and the ODM

[Diagram with numbered blanks 1 to 5. It shows the device states Undefined, Defined, and Available, the AIX kernel (blank: D____ D____), and applications (blank: /____/_____).]

Figure 2-11. Let’s review: Device configuration and the ODM

Notes: Instructions Please answer the following questions by writing the answers on the picture above. If you are unsure about a question, leave it out.
1. Which command configures devices in an AIX system? (Note: This is not an ODM command.)
2. Which ODM class contains all devices that your system supports?
3. Which ODM class contains all devices that are configured in your system?
4. Which programs are loaded into the AIX kernel to control access to the devices?
5. If you have a configured tape drive rmt1, which special file do applications access to work with this device?


Instructor notes: Purpose — Provide information about what happens when a device is configured in AIX. Details — Give the students five minutes to answer the questions. Then, provide the following answers: 1. cfgmgr 2. PdDv 3. CuDv 4. Device Driver 5. /dev/rmt1 Additional information — Summarize the picture after the discussion: If a device is to be configured, it must first be part of the PdDv class. It is not possible to configure a device that is not defined/predefined in the corresponding Pd classes. If a device is in the defined state, you definitely have an object in ODM class CuDv. The difference between the defined state and the available state is that, in the defined state, no device driver has been loaded into the AIX kernel. In other words, the program that controls the device does not exist in the defined state. When a device is made available, the device driver is loaded into the kernel. Additionally, a special file is created in the /dev directory that applications need to access the device. All this is done dynamically without a need to recompile the AIX kernel (which historically had to be done on other UNIX systems). Historically, this has been one big advantage of AIX against other UNIX systems. Transition statement — Now, let’s look at some commands used to work with the ODM.


ODM commands

Object class: odmcreate, odmdrop
Descriptors: odmshow

  uniquetype         attribute    deflt     values
  tape/scsi/scsd     block_size   none      0-2147483648,1
  disk/scsi/osdisk   pvid         none
  tty/rs232/tty      login        disable   enable, disable, ...

Objects: odmadd, odmchange, odmdelete, odmget

Figure 2-12. ODM commands

Notes: Introduction Different commands are available for working with each of the ODM components: object classes, descriptors, and objects.

Commands for working with ODM classes 1. You can create ODM classes using the odmcreate command. This command has the following syntax: odmcreate descriptor_file.cre The file descriptor_file.cre contains the class definition for the corresponding ODM class. Usually, these files have the suffix .cre. The exercise for this unit contains an optional part that shows how to create self-defined ODM classes.


2. To delete an entire ODM class, use the odmdrop command. The odmdrop command has the following syntax: odmdrop -o object_class_name The name object_class_name is the name of the ODM class you want to remove. Be very careful with this command. It removes the complete class immediately.

A command for working with ODM descriptors To view the underlying layout of an object class, use the odmshow command: odmshow object_class_name The visual shows an extraction from ODM class PdAt, where four descriptors are shown (uniquetype, attribute, deflt, and values).

Commands for working with objects Usually, system administrators work with objects. The odmget command retrieves object information from an existing object class. To add new objects, use odmadd. To delete objects, use odmdelete. To change objects, use odmchange. Working on the object level is explained in more detail on the following pages.

The ODMDIR environment variable All ODM commands use the ODMDIR environment variable, which is set in the file /etc/environment. The default value of ODMDIR is /etc/objrepos.
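As a minimal sketch of how the repository directory is selected, the helper below prints the directory that would be in effect for the current shell. The function name odm_repo is hypothetical (it is not an AIX command), and the fallback simply mirrors the default stated in these notes.

```shell
#!/bin/sh
# odm_repo (hypothetical helper): print the object repository directory in
# effect, falling back to the default named in these notes when ODMDIR is
# unset or empty.
odm_repo() {
    printf '%s\n' "${ODMDIR:-/etc/objrepos}"
}

# Repository for the current shell.
odm_repo

# Override for a single call only; the subshell keeps the change local.
( ODMDIR=/usr/lib/objrepos; odm_repo )
```

On an AIX system the same idea applies to the real commands: setting ODMDIR=/usr/lib/objrepos before an odmget call directs that query at the classes stored under /usr/lib/objrepos.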


Instructor notes: Purpose — Introduce the ODM command line interface. Details — Explain briefly the different ODM commands. Introduce the ODMDIR variable that is used for all ODM commands. Additional information — Tell the students that for system developers, an ODM API is available. Transition statement — The commands for working with objects are the commands system administrators use most often, so let’s spend a little more time talking about how these commands work.


Changing attribute values

# odmget -q"uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
# vi file

PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = "512"
        values = "0-2147483648,1"
        width = ""
        type = "R"
        generic = "DU"
        rep = "nr"
        nls_index = 6

Modify deflt to 512

# odmdelete -o PdAt -q"uniquetype=tape/scsi/scsd and attribute=block_size"
# odmadd file

Figure 2-13. Changing attribute values

Notes: Discussion of command sequence on the visual The odmget command in the example will pick all the records from the PdAt class, where uniquetype is equal to tape/scsi/scsd and attribute is equal to block_size. In this instance, only one record should be matched. The information is redirected into a file which can be changed using an editor. In this example, the default value for the attribute block_size is changed to 512. Note: Before the new value of 512 can be added into the ODM, the old object (which had the block_size set to a null value) must be deleted, otherwise you would end up with two objects describing the same attribute in the database. The first object found will be used, and the results could be quite confusing. This is why it is important to delete an entry before adding a replacement record. The final operation is to add the file into the ODM.
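The editing step can also be done without an interactive editor. The following is a sketch only: set_deflt is a hypothetical helper, the stanza is sample text standing in for the file written by odmget, and the sed expression assumes the new value contains no "/" or "&" characters.

```shell
#!/bin/sh
# set_deflt (hypothetical helper): read an odmget stanza on stdin and
# rewrite its deflt line so that it carries the value given as $1.
set_deflt() {
    sed "s/^\([[:space:]]*deflt = \)\".*\"/\1\"$1\"/"
}

# Sample stanza standing in for the file produced by odmget.
cat <<'EOF' | set_deflt 512
PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = ""
        values = "0-2147483648,1"
EOF
```

On AIX, the output of set_deflt would be saved to a file and then loaded with the odmdelete and odmadd sequence shown on the visual.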


Need to use ODM commands The ODM objects are stored in a binary format; that means you need to work with the ODM commands to query or change any objects.

Possible queries As with any database, you can perform queries for records matching certain criteria. The tests are on the values of the descriptors of the objects. A number of tests can be performed: =, !=, >, >=, <, <=, and like.

Using odmchange to change attribute values

# odmget -q"uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
# vi file

PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = "512"
        values = "0-2147483648,1"
        width = ""
        type = "R"
        generic = "DU"
        rep = "nr"
        nls_index = 6

Modify deflt to 512

# odmchange -o PdAt -q"uniquetype=tape/scsi/scsd and attribute=block_size" file

Figure 2-14. Using odmchange to change attribute values

Notes: Another way of changing attribute values The series of steps shown on this visual shows how the odmchange command can be used instead of the odmadd and odmdelete steps shown in the previous example to modify attribute values.


Instructor notes: Purpose — Define how the odmchange command can be used instead of the odmadd and odmdelete commands. Details — Novice users should be encouraged to use odmdelete and odmadd commands rather than the odmchange command, which does the delete and the add operations all in one step. This is because with the odmchange command, you have to be very careful about the possibility of additional entries with the same field as the one you are using for searching, as you might end up changing more than you anticipated. Additional information — None Transition statement — Now, let’s look at some of the key ODM classes in more detail.
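One way to guard against changing more than intended is to count how many objects a query matches before running odmchange with the same criteria. The sketch below counts stanza headers in saved odmget output; count_stanzas is a hypothetical helper, and it assumes each object starts with a header line such as "PdAt:".

```shell
#!/bin/sh
# count_stanzas (hypothetical helper): count the objects in odmget output
# read from stdin by counting stanza header lines such as "PdAt:".
count_stanzas() {
    awk '/^[A-Za-z_][A-Za-z0-9_]*:[[:space:]]*$/ { n++ } END { print n + 0 }'
}

# Sample output standing in for a query that matched two objects.
cat <<'EOF' | count_stanzas
PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"

PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "mode"
EOF
```

If the count is larger than expected, the query can be tightened with a further test before any change is made.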


2.2. ODM database files Instructor topic introduction What students will do — Students will learn details regarding the function and layout of those ODM classes that were introduced in topic 1. By examining these classes, we: • Review the role of the ODM in device configuration. • Introduce the software vital product database and explain what state information students should know about. How students will do it — Through lecture, lab exercise, and checkpoint questions What students will learn — Students will be able to: • Discuss the function and layout of those ODM classes which are part of the Software Vital Product Database and those which are used during the configuration of a device. • Explain how ODM classes can be used to analyze system problems. How this will help students on their job — Many ODM-related problems are much easier to fix if one knows the key ODM classes and descriptors.


Software vital product data

lpp:
        name = "bos.rte.printers"
        size = 0
        state = 5
        ver = 6
        rel = 1
        mod = 0
        fix = 0
        description = "Front End Printer Support"
        lpp_id = 38

inventory:
        lpp_id = 38
        private = 0
        file_type = 0
        format = 1
        loc0 = "/etc/qconfig"
        loc1 = ""
        loc2 = ""
        size = 0
        checksum = 0

product:
        lpp_name = "bos.rte.printers"
        comp_id = "5765-C3403"
        state = 5
        ver = 6
        rel = 1
        mod = 0
        fix = 0
        ptf = ""
        prereq = "*coreq bos.rte 5.1.0.0"
        description = ""
        supersedes = ""

history:
        lpp_id = 38
        ver = 6
        rel = 1
        mod = 0
        fix = 0
        ptf = ""
        state = 1
        time = 1187714064
        comment = ""

Figure 2-15. Software vital product data

Notes: Role of installp command Whenever installing a product or update in AIX, the installp command uses the ODM to maintain the Software Vital Product Database (SWVPD).

Contents of SWVPD The following information is part of the SWVPD:
• The name of the software product (for example, bos.rte.printers)
• The version, release, modification, and fix level of the software product (for example, 5.3.0.10 or 6.1.0.0)
• The fix level, which contains a summary of fixes implemented in a product
• Any program temporary fix (PTF) that has been installed on the system
• The state of the software product:
  - Available (state = 1)
  - Applying (state = 2)
  - Applied (state = 3)
  - Committing (state = 4)
  - Committed (state = 5)
  - Rejecting (state = 6)
  - Broken (state = 7)
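When reading raw odmget output, the numeric state has to be translated by hand. The mapping listed above can be sketched as a small lookup; swvpd_state is a hypothetical helper name.

```shell
#!/bin/sh
# swvpd_state (hypothetical helper): translate the numeric state descriptor
# of an SWVPD object into the state name listed in these notes.
swvpd_state() {
    case $1 in
        1) echo Available ;;
        2) echo Applying ;;
        3) echo Applied ;;
        4) echo Committing ;;
        5) echo Committed ;;
        6) echo Rejecting ;;
        7) echo Broken ;;
        *) echo "unknown ($1)" ;;
    esac
}

# The lpp object on the visual has state = 5.
swvpd_state 5
```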

SWVPD classes The Software Vital Product Data is stored in the following ODM classes:

lpp

The lpp object class contains information about the installed software products, including the current software product state and description.

inventory

The inventory object class contains information about the files associated with a software product.

product

The product object class contains product information about the installation and updates of software products and their prerequisites.

history

The history object class contains historical information about the installation and updates of software products.


Instructor notes: Purpose — Introduce the software vital product database. Details — Explain what kind of data is stored in the ODM classes (version, release, and so forth) and the meaning of the shown ODM classes. Identify how the classes are linked together by the lpp_id descriptor. Note that the list of descriptors is not complete and that the slide only lists selected descriptors for teaching purposes. Additional information — At this point, you might introduce the lslpp command, which has options like -l, -h, -f and -w. This high-level command queries the software vital product database and shows most of this information. The flags (and the related object classes) are:
-L : list the filesets (lpp object class)
-d : list the fileset dependencies (product object class)
-p : list the fileset prerequisites (product object class)
-w : list the fileset for a given file (inventory object class)
-f : list the files for a given fileset (inventory object class)
-h : list the maintenance history for a fileset (history object class)
The commands used to produce the output on the visual are:
• lpp: odmget -q name=bos.rte.printers lpp
• product: odmget -q lpp_name=bos.rte.printers product
• inventory: odmget -q lpp_id=38 inventory | pg
  Since there are a number of files in the root file system for this fileset, there are a number of objects that match this query (hence the pg command). Note that there are also files in this fileset in the usr file system. To display these, set ODMDIR=/usr/lib/objrepos and rerun the last odmget command. (Note: ODMDIR defaults to /etc/objrepos.)
• history: odmget -q lpp_id=38 history
Transition statement — Let’s introduce the most important software states.


Software states you should know about

Applied
• Only possible for PTFs or updates
• Previous version stored in /usr/lpp/Package_Name
• Rejecting update recovers to saved version
• Committing update deletes previous version

Committed
• Removing committed software is possible
• No return to previous version

Applying, committing, rejecting, deinstalling
If installation was not successful:
a) installp -C
b) smit maintain_software

Broken
• Cleanup failed
• Remove software and reinstall

Figure 2-16. Software states you should know about

Notes: Introduction The AIX software vital product database uses software states that describe the status of an install or update package.

The applied and committed states When installing a program temporary fix (PTF) or update package, you can install the software into an applied state. Software in an applied state contains the newly installed version (which is active) and a backup of the old version (which is inactive). This gives you the opportunity to test the new software. If it works as expected, you can commit the software, which will remove the old version. If it does not work as planned, you can reject the software, which will remove the new software and reactivate the old version. Install packages cannot be applied. These will always be committed.


Once a product is committed, if you would like to return to the old version, you must remove the current version and reinstall the old version.

States indicating installation problems If an installation does not complete successfully, for example, if the power fails during the install, you may find software states like applying, committing, rejecting, or deinstalling. To recover from this failure, execute the command installp -C or use the SMIT fastpath smit maintain_software. Select Clean Up After Failed or Interrupted Installation when working in SMIT.

The broken state After a cleanup of a failed installation, you might detect a broken software status. In this case, the only way to recover from the failure is to remove and reinstall the software package.


Instructor notes: Purpose — Introduce the most important software states. Details — Explain the states using the information given in the student notes. Additional information — None Transition statement — Let’s explain ODM class PdDv.


Predefined devices (PdDv)

PdDv:
        type = "scsd"
        class = "tape"
        subclass = "scsi"
        prefix = "rmt"
        ...
        base = 0
        ...
        detectable = 1
        ...
        led = 2418
        setno = 54
        msgno = 0
        catalog = "devices.cat"
        DvDr = "tape"
        Define = "/etc/methods/define"
        Configure = "/etc/methods/cfgsctape"
        Change = "/etc/methods/chggen"
        Unconfigure = "/etc/methods/ucfgdevice"
        Undefine = "/etc/methods/undefine"
        Start = ""
        Stop = ""
        ...
        uniquetype = "tape/scsi/scsd"

Figure 2-17. Predefined devices (PdDv)

Notes: The predefined devices (PdDv) object class The Predefined Devices (PdDv) object class contains entries for all devices supported by the system. A device that is not part of this ODM class cannot be configured on an AIX system. Key attributes of objects in this class are described in the following paragraphs.

type This specifies the product name or model number, for example, 8 mm (tape).

class Specifies the functional class name. A functional class is a group of device instances sharing the same high-level function. For example, tape is a functional class name representing all tape devices.

subclass Device classes are grouped into subclasses. The subclass scsi specifies all tape devices that may be attached to a SCSI interface.

prefix This specifies the Assigned Prefix in the customized database, which is used to derive the device instance name and /dev name. For example, rmt is the prefix name assigned to tape devices. Names of tape devices would then look like rmt0, rmt1, or rmt2.

base This descriptor specifies whether a device is a base device or not. A base device is any device that forms part of a minimal base system. During system boot, a minimal base system is configured to permit access to the root volume group (rootvg) and hence to the root file system. This minimal base system can include, for example, the standard I/O diskette adapter and a SCSI hard drive. The device shown on the visual is not a base device. This flag is also used by the bosboot and savebase commands, which are introduced later in this course.

detectable This specifies whether the device instance is detectable or undetectable. A device whose presence and type can be determined by the cfgmgr, once it is actually powered on and attached to the system, is said to be detectable. A value of 1 means that the device is detectable, and a value of 0 that it is not (for example, a printer or tty).

led This indicates the value displayed on the LEDs when the configure method begins to run. The value stored is decimal, but the value shown on the LEDs is hexadecimal (2418 is 972 in hex).
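The decimal-to-hexadecimal conversion mentioned above can be checked with printf. This is a small sketch; led_hex is a hypothetical helper name.

```shell
#!/bin/sh
# led_hex (hypothetical helper): convert the decimal led descriptor stored
# in PdDv into the hexadecimal value shown on the LEDs.
led_hex() {
    printf '%X\n' "$1"
}

# The tape device on the visual has led = 2418.
led_hex 2418    # prints 972
```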

setno, msgno Each device has a specific description (for example, SCSI Tape Drive) that is shown when the device attributes are listed by the lsdev command. These two descriptors are used to look up the description in a message catalog.


catalog This identifies the filename of the national language support (NLS) catalog. The LANG variable on a system controls which catalog file is used to show a message. For example, if LANG is set to en_US, the catalog file /usr/lib/nls/msg/en_US/devices.cat is used. If LANG is de_DE, catalog /usr/lib/nls/msg/de_DE/devices.cat is used.
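The path construction described above can be sketched as follows. The helper catalog_path is hypothetical, and it only reproduces the directory pattern from the two examples in these notes; it does not check that the catalog file exists.

```shell
#!/bin/sh
# catalog_path (hypothetical helper): build the message catalog path from a
# catalog file name ($1) and a locale ($2, defaulting to the LANG variable).
catalog_path() {
    printf '/usr/lib/nls/msg/%s/%s\n' "${2:-${LANG:-}}" "$1"
}

catalog_path devices.cat en_US
catalog_path devices.cat de_DE
```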

DvDr This identifies the name of the device driver associated with the device (for example, tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device drivers are loaded into the AIX kernel when a device is made available.

Define This names the define method associated with the device type. This program is called when a device is brought into the defined state.

Configure This names the configure method associated with the device type. This program is called when a device is brought into the available state.

Change This names the change method associated with the device type. This program is called when a device attribute is changed through the chdev command.

Unconfigure This names the unconfigure method associated with the device type. This program is called when a device is unconfigured by rmdev -l name.

Undefine This names the undefine method associated with the device type. This program is called when a device is undefined by rmdev -l name -d.

Start, stop Few devices support a stopped state (only logical devices). A stopped state means that the device driver is loaded, but no application can access the device. These two attributes name the methods to start or stop a device.


uniquetype This is a key that is referenced by other object classes. Objects use this descriptor as a pointer back to the device description in PdDv. The key is a concatenation of the class, subclass, and type values.
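The concatenation can be sketched in both directions. The helper names make_uniquetype and split_uniquetype are hypothetical; the "/" separator follows the uniquetype values shown on the visuals.

```shell
#!/bin/sh
# make_uniquetype (hypothetical helper): join class, subclass, and type into
# the uniquetype key.
make_uniquetype() {
    printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# split_uniquetype (hypothetical helper): break a uniquetype key back into
# its three parts.
split_uniquetype() {
    printf '%s\n' "$1" | {
        IFS=/ read -r class subclass type
        printf 'class=%s subclass=%s type=%s\n' "$class" "$subclass" "$type"
    }
}

make_uniquetype tape scsi scsd
split_uniquetype tape/scsi/scsd
```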


Instructor notes: Purpose — Introduce object class PdDv. Details — Explain the different descriptors. Additional information — If you want, you can mention that there is an additional method for starting and stopping a device. To stop a device, issue the following command: # rmdev -l name -S Devices that support the stopped state are rare. Remember that physical devices do not support a stopped state. You can list the devices in the Predefined Devices object class using the following command: # lsdev -P Transition statement — Next class is PdAt.


Predefined attributes (PdAt)

PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = ""
        values = "0-2147483648,1"
        ...

PdAt:
        uniquetype = "disk/scsi/osdisk"
        attribute = "pvid"
        deflt = "none"
        values = ""
        ...

PdAt:
        uniquetype = "tty/rs232/tty"
        attribute = "term"
        deflt = "dumb"
        values = ""
        ...

Figure 2-18. Predefined attributes (PdAt)

Notes: The predefined attribute (PdAt) object class The Predefined Attribute (PdAt) object class contains an entry for each existing attribute for each device represented in the PdDv object class. An attribute is any device-dependent information, such as interrupt levels, bus I/O address ranges, baud rates, parity settings, or block sizes. The extract out of PdAt that is given on the visual shows three attributes (block size, physical volume identifier, and terminal name) and their default values. The meanings of the key fields shown on the visual are described in the paragraphs that follow.

uniquetype This descriptor is used as a pointer back to the device defined in the PdDv object class.


attribute This identifies the name of the attribute. This is the name that can be passed to the mkdev or chdev command. For example, to change the default name of dumb to ibm3151 for tty0, you can issue the following command: # chdev -l tty0 -a term=ibm3151

deflt This identifies the default value for an attribute. Nondefault values are stored in CuAt.

values This identifies the possible values that can be associated with the attribute name. For example, allowed values for the block_size attribute range from 0 to 2147483648, with an increment of 1.
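A range specification of this form can be checked mechanically. The sketch below assumes the "low-high,increment" reading given in these notes and does not handle other forms of the values descriptor, such as lists of keywords. The helper name in_range is hypothetical.

```shell
#!/bin/sh
# in_range (hypothetical helper): succeed if value $1 is allowed by a range
# specification $2 of the form "low-high,increment".
in_range() {
    value=$1
    spec=$2
    range=${spec%,*}    # part before the comma, for example 0-2147483648
    step=${spec#*,}     # part after the comma, for example 1
    lo=${range%-*}
    hi=${range#*-}
    [ "$value" -ge "$lo" ] && [ "$value" -le "$hi" ] &&
        [ $(( (value - lo) % step )) -eq 0 ]
}

# Check the block size used in the earlier examples.
if in_range 512 "0-2147483648,1"; then
    echo "512 is an allowed block_size"
fi
```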


Instructor notes: Purpose — Introduce ODM class PdAt. Details — Describe the four major fields of PdAt that are shown on the visual. Additional information — Describe the pvid attribute for disks. The default physical volume ID for a disk is none. For each disk, a physical volume ID must be generated when the disk is configured for the first time. To list the default attributes of a customized device, the high-level command is: # lsattr -D -l name To list the range of supported values for an attribute, the high-level command is: # lsattr -R -l name -a attribute Transition statement — The next ODM class is CuDv.


Customized devices (CuDv)

CuDv:
        name = "ent1"
        status = 1
        chgstatus = 2
        ddins = "pci/goentdd"
        location = "02-08"
        parent = "pci2"
        connwhere = "8"
        PdDvLn = "adapter/pci/14106902"

CuDv:
        name = "hdisk2"
        status = 1
        chgstatus = 2
        ddins = "scdisk"
        location = "01-08-01-8,0"
        parent = "scsi1"
        connwhere = "8,0"
        PdDvLn = "disk/scsi/scsd"

Figure 2-19. Customized devices (CuDv)

Notes: The customized devices (CuDv) object class The Customized Devices (CuDv) object class contains entries for all device instances defined in the system. As the name implies, a defined device object is an object that a define method has created in the CuDv object class. A defined device object may or may not have a corresponding actual device attached to the system. The CuDv object class contains objects that provide device and connection information for each device. Each device is distinguished by a unique logical name. The customized database is updated twice, during system bootup and at run time, to define new devices, remove undefined devices, and update the information for a device that has changed. The key descriptors in CuDv are described in the next few paragraphs.


name A customized device object for a device instance is assigned a unique logical name to distinguish the device from other devices. The visual shows two devices, an Ethernet adapter ent1 and a disk drive hdisk2.

status This identifies the current status of the device instance. Possible values are: - status = 0 - Defined - status = 1 - Available - status = 2 - Stopped

chgstatus This flag tells whether the device instance has been altered since the last system boot. The diagnostics facility uses this flag to validate system configuration. The flag can take these values: - chgstatus = 0 - New device - chgstatus = 1 - Don't care - chgstatus = 2 - Same - chgstatus = 3 - Device is missing
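The two numeric descriptors can be decoded with simple lookups that follow the value lists above. The helper names cudv_status and cudv_chgstatus are hypothetical.

```shell
#!/bin/sh
# cudv_status (hypothetical helper): translate the CuDv status descriptor.
cudv_status() {
    case $1 in
        0) echo "Defined" ;;
        1) echo "Available" ;;
        2) echo "Stopped" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# cudv_chgstatus (hypothetical helper): translate the CuDv chgstatus
# descriptor.
cudv_chgstatus() {
    case $1 in
        0) echo "New device" ;;
        1) echo "Don't care" ;;
        2) echo "Same" ;;
        3) echo "Device is missing" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# ent1 on the visual has status = 1 and chgstatus = 2.
printf 'ent1: %s, %s\n' "$(cudv_status 1)" "$(cudv_chgstatus 2)"
```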

ddins This descriptor typically contains the same value as the Device Driver Name descriptor in the Predefined Devices (PdDv) object class. It specifies the name of the device driver that is loaded into the AIX kernel.

location Identifies the AIX location of a device. The location code is a path from the system unit through the adapter to the device. In case of a hardware problem, the location code is used by technical support to identify a failing device.

parent Identifies the logical name of the parent device. For example, the parent device of hdisk2 is scsi1.


connwhere Identifies the specific location on the parent device where the device is connected. For example, the device hdisk2 uses the SCSI address 8,0.

PdDvLn Provides a link to the device instance's predefined information through the uniquetype descriptor in the PdDv object class.


Instructor notes: Purpose — Introduce ODM class CuDv. Details — Do not explain all shown descriptors from the visual. Concentrate on explaining the ones which are important (status and chgstatus). Discuss the objects shown in bold on the visual. The value chgstatus=2 means that the state of hdisk2 has not changed since last boot. The value chgstatus=1 would mean that the state of this device could not be determined by cfgmgr (for example, when dealing with a device that is attached using a serial or parallel port). Additional information — Ask students if anybody has seen the following message during system boot: A previously defined device could not be detected. Explain that this message is caused by a device that is defined in CuDv but is not physically present. For this device, the value of chgstatus is 3. To list the devices in the Customized Devices object class, the high-level command is: # lsdev -C Transition statement — The next class is CuAt.


Customized attributes (CuAt)

CuAt:
        name = "ent1"
        attribute = "jumbo_frames"
        value = "yes"
        ...

CuAt:
        name = "hdisk2"
        attribute = "pvid"
        value = "00c35ba0816eafe50000000000000000"
        ...

Figure 2-20. Customized attributes (CuAt)

Notes: The customized attribute (CuAt) object class The Customized Attribute (CuAt) object class contains customized device-specific attribute information. Devices represented in the Customized Devices (CuDv) object class have attributes found in the Predefined Attribute (PdAt) object class and the CuAt object class. There is an entry in the CuAt object class for attributes that take customized values. Attributes taking the default value are found in the PdAt object class. Each entry describes the current value of the attribute.
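The lookup order described above (customized value first, predefined default otherwise) can be sketched against saved odmget output. Both helper names are hypothetical. The sketch assumes the CuAt and PdAt files have already been filtered to the device of interest, that the attribute line precedes the value or deflt line in each stanza, and that values contain no spaces.

```shell
#!/bin/sh
# stanza_lookup ATTR FIELD (hypothetical helper): read odmget-style stanzas
# on stdin and print FIELD from the first stanza whose attribute is ATTR.
stanza_lookup() {
    awk -v attr="$1" -v field="$2" '
        $1 == "attribute" { cur = $3; gsub(/"/, "", cur) }
        $1 == field && cur == attr { v = $3; gsub(/"/, "", v); print v; exit }
    '
}

# effective_value ATTR CUAT_FILE PDAT_FILE (hypothetical helper): prefer the
# customized value and fall back to the predefined default.
effective_value() {
    v=$(stanza_lookup "$1" value < "$2")
    if [ -n "$v" ]; then
        printf '%s\n' "$v"
    else
        stanza_lookup "$1" deflt < "$3"
    fi
}

# Sample data copied from the CuAt and PdAt visuals in this unit.
tmp=$(mktemp -d)
cat > "$tmp/cuat" <<'EOF'
CuAt:
        name = "hdisk2"
        attribute = "pvid"
        value = "00c35ba0816eafe50000000000000000"
EOF
cat > "$tmp/pdat" <<'EOF'
PdAt:
        attribute = "pvid"
        deflt = "none"

PdAt:
        attribute = "term"
        deflt = "dumb"
EOF
effective_value pvid "$tmp/cuat" "$tmp/pdat"    # customized value from CuAt
effective_value term "$tmp/cuat" "$tmp/pdat"    # default from PdAt
rm -rf "$tmp"
```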

Discussion of examples on visual
The sample CuAt entries on the visual show two attributes that have customized values. The attribute jumbo_frames of adapter ent1 has been changed to yes. The attribute pvid shows the physical volume identifier that has been assigned to disk hdisk2.

Instructor notes:
Purpose — Introduce the CuAt ODM class.
Details — Explain that CuAt contains customized values. The default values are stored in PdAt.
Additional information — Mention the 16 zeros that are part of the pvid value. They are not shown with the lsdev command. The value of the pvid for disks is not set until the disk becomes part of a volume group. To list the effective attribute values for a customized device, the high-level command is:
# lsattr -E -l Name
To set an effective attribute value for a device, the high-level command is:
# chdev -l Name -a Attribute=Value
Transition statement — Let’s look at a few more ODM object classes.
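
The 16 padding zeros in the CuAt pvid value can be stripped with a few lines of shell when the stored value has to be compared with the shorter 16-digit form. This is a minimal sketch: short_pvid is a hypothetical helper name, and it assumes that a stored pvid is always 32 characters long, as in the sample on the visual.

```shell
# short_pvid is a hypothetical helper, not an AIX command.
# It drops the 16 trailing zeros from a 32-character CuAt pvid value.
short_pvid() {
    pvid=$1
    zeros=$(printf '%016d' 0)          # a string of 16 zeros
    if [ "${#pvid}" -eq 32 ]; then
        printf '%s\n' "${pvid%"$zeros"}"
    else
        printf '%s\n' "$pvid"          # unexpected length: leave unchanged
    fi
}

# On AIX, the stored value could be read with, for example:
#   odmget -q "name=hdisk2 and attribute=pvid" CuAt
#   lsattr -E -l hdisk2 -a pvid
short_pvid 00c35ba0816eafe50000000000000000
```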

Additional device object classes

PdCn:
        uniquetype = "adapter/pci/sym875"
        connkey = "scsi"
        connwhere = "1,0"

PdCn:
        uniquetype = "adapter/pci/sym875"
        connkey = "scsi"
        connwhere = "2,0"

CuDvDr:
        resource = "devno"
        value1 = "36"
        value2 = "0"
        value3 = "hdisk3"

CuDvDr:
        resource = "devno"
        value1 = "36"
        value2 = "1"
        value3 = "hdisk2"

CuDep:
        name = "rootvg"
        dependency = "hd6"

CuDep:
        name = "datavg"
        dependency = "lv01"

CuVPD:
        name = "hdisk2"
        vpd_type = 0
        vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280 *RL53343341*SN009DAFDF*ECH17 923D *P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A *Z20068*Z307220"

Figure 2-21. Additional device object classes

Notes:

PdCn
The Predefined Connection (PdCn) object class contains connection information for adapters (sometimes called intermediate devices). This object class also includes predefined dependency information. For each connection location, there are one or more objects describing the subclasses of devices that can be connected. The sample PdCn objects on the visual indicate that, at the given locations, all devices belonging to subclass scsi could be attached.

CuDep
The Customized Dependency (CuDep) object class describes device instances that depend on other device instances. This object class describes the dependence links between logical devices and physical devices as well as dependence links between logical devices, exclusively. Physical dependencies of one device on another device are recorded in the Customized Devices (CuDv) object class. The sample CuDep objects on the visual show the dependencies between logical volumes and the volume groups they belong to.

CuDvDr
The Customized Device Driver (CuDvDr) object class is used to create the entries in the /dev directory. Applications use these special files to access a device driver that is part of the AIX kernel. The attribute value1 is called the major number and is a unique key for a device driver. The attribute value2 is the minor number, which identifies a device instance or a certain operating mode of a device driver. The sample CuDvDr objects on the visual reflect the device driver for disk drives hdisk2 and hdisk3. The major number 36 specifies the driver in the kernel. In our example, the minor numbers 0 and 1 specify two different instances of disk drives, both using the same device driver. For other devices, the minor number may represent different modes in which the device can be used. For example, for a tape drive, the operating mode 0 would specify rewind on close, and the operating mode 1 would specify no rewind on close.
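
The relationship between CuDvDr objects and major and minor numbers can be made visible by reducing the stanzas to one line per device. The sketch below is illustrative only: list_devno is a hypothetical helper name, and it assumes the usual odmget stanza layout with one descriptor per line and values in double quotes.

```shell
# list_devno is a hypothetical helper, not an AIX command.
# It reads CuDvDr stanzas on stdin and prints "device major minor"
# for every object whose resource is "devno".
list_devno() {
    awk -F'"' '
        /resource =/ { is_devno = ($2 == "devno") }
        /value1 =/   { major = $2 }
        /value2 =/   { minor = $2 }
        /value3 =/   { if (is_devno) print $2, major, minor }
    '
}

# Possible use on AIX:
#   odmget -q "resource=devno" CuDvDr | list_devno
# The result can be compared with the numbers shown by: ls -l /dev/hdisk2
```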

CuVPD
The Customized Vital Product Data (CuVPD) object class contains vital product data (manufacturer of device, engineering level, part number, and so forth) that is useful for technical support. When an error occurs with a specific device, the vital product data is shown in the error log.

Instructor notes:
Purpose — Explain briefly the function of some additional ODM classes.
Details — Describe the ODM classes shown using the explanations in the student notes.
Additional information — None
Transition statement — We have reached a checkpoint.

Checkpoint

1. In which ODM class do you find the physical volume IDs of your disks?
________________________________________________

2. What is the difference between the states defined and available?
________________________________________________
________________________________________________

Figure 2-22. Checkpoint

Instructor notes:
Purpose —
Details —

Checkpoint solutions

1. In which ODM class do you find the physical volume IDs of your disks?
CuAt

2. What is the difference between the states defined and available?
When a device is defined, there is an entry in ODM class CuDv. When a device is available, the device driver has been loaded, and the driver can be accessed through the entries in the /dev directory.

Additional information —
Transition statement — Let’s reinforce what we have covered by working with the ODM in the lab.

Exercise 3: The Object Data Manager (ODM)

• Review of device configuration ODM classes
• Modifying a device default attribute
• Creating self-defined ODM classes (Optional)

Figure 2-23. Exercise 3: The Object Data Manager (ODM)

Instructor notes:
Purpose — Introduce the exercise.
Details —
Additional information —
Transition statement —

Unit summary

Having completed this unit, you should be able to:
• Describe the structure of the ODM
• Use the ODM command line interface
• Explain the role of the ODM in device configuration
• Describe the function of the most important ODM files

Figure 2-24. Unit summary

Notes:
The ODM is made up of object classes, each of which consists of individual objects and their descriptors. AIX offers a command line interface to work with the ODM files. The device information is held in the customized and the predefined databases (Cu*, Pd*).

Instructor notes:
Purpose — Review some of the key points covered in the unit.
Details — Present the highlights from the unit.
Additional information — None.
Transition statement — Let’s continue with the next unit.

Unit 3. Error monitoring

What this unit is about
This unit covers techniques for monitoring for problems and how to automate responses to those problems. Topics include an overview of the AIX Error Log facility (and how it can interact with the syslogd daemon), the Resource Monitoring and Control (RMC) facility, and the system hang (shdaemon) monitoring facility.

What you should be able to do
After completing this unit, you should be able to:
• Analyze error log entries
• Identify and maintain the error logging components
• Describe different error notification methods
• Log system messages using the syslogd daemon
• Monitor and take actions for threshold conditions using RMC
• Monitor and take actions for hang conditions using shdaemon

How you will check your progress
Accountability:
• Lab exercise
• Checkpoint questions

References
Online    AIX Version 6.1 General Programming Concepts: Writing and Debugging Programs (Chapter 5. Error-Logging Overview)
Online    AIX Version 6.1 Command Reference volumes 1-6

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems

Unit objectives

After completing this unit, you should be able to:
• Analyze error log entries
• Identify and maintain the error logging components
• Describe different error notification methods
• Log system messages using the syslogd daemon
• Monitor and take actions for threshold conditions using RMC
• Monitor and take actions for hang conditions using shdaemon

Figure 3-1. Unit objectives

Instructor notes:
Purpose — Introduce the topics to be covered in this unit.
Details — Use the student material to guide your presentation.
Additional information — None
Transition statement — Let’s discuss error logging first.

3.1. Working with the error log

Instructor topic introduction
What students will do — Identify the components of the error logging facility and create error reports.
How students will do it — Through lecture, lab exercise, and checkpoint questions.
What students will learn — How to create and read an error report, and when and how to maintain the error log.
How this will help students on their job — Being able to identify possible software and hardware errors and solutions will enhance the students' job performance and productivity.

Error logging components

[Diagram: error logging components. Kernel side: kernel module, errsave(), /dev/error (timestamp). User side: application errlog(), errlogger, errclear, errstop, error daemon (/usr/lib/errdemon), errlog (/var/adm/ras/errlog), error record template (/var/adm/ras/errtmplt), ODM classes CuDv, CuAt, and CuVPD, errpt, formatted output, SMIT, diagnostics, error notification (errnotify), console.]

Figure 3-2. Error logging components

Notes:

Detection of an error
The error logging process begins when an operating system module detects an error. The error-detecting segment of code then sends error information to either the errsave() kernel service or the errlog() application subroutine, which in turn writes the information to the /dev/error special file. A timestamp is added to the collected data at this point. The errdemon daemon constantly checks the /dev/error file for new entries, and when new data is written, the daemon conducts a series of operations.

Creation of error log entries
Before an entry is written to the error log, the errdemon daemon compares the label sent by the kernel or the application code to the contents of the Error Record Template Repository. If the label matches an item in the repository, the daemon collects additional data from other parts of the system.

To create an entry in the error log, the errdemon daemon retrieves the appropriate template from the repository, the resource name of the unit that caused the error, and the detail data. Also, if the error signifies a hardware-related problem and hardware vital product data (VPD) exists, the daemon retrieves the VPD from the ODM.

When you access the error log, either through SMIT or with the errpt command, the error log is formatted according to the error template in the error template repository and presented in either a summary or detailed report.

Most entries in the error log are attributable to hardware and software problems, but informational messages can also be logged, for example, by the system administrator.

The errlogger command
The errlogger command allows the system administrator to record messages of up to 1024 bytes in the error log. Whenever you perform a maintenance activity, such as clearing entries from the error log, replacing hardware, or applying a software fix, it is a good idea to record this activity in the system error log. The following example illustrates use of the errlogger command:
# errlogger system hard disk '(hdisk0)' replaced.
This message will be listed as part of the error log.
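
The 1024-byte limit can be checked before a message is handed to errlogger. The following is a sketch under stated assumptions: fits_errlogger_limit and log_maintenance are hypothetical helper names, and the wrapper simply refuses to log a message that is too long.

```shell
# fits_errlogger_limit is a hypothetical helper.
# It succeeds only if the message is at most 1024 bytes long.
fits_errlogger_limit() {
    len=$(printf '%s' "$1" | wc -c)
    [ "$((len))" -le 1024 ]
}

# log_maintenance is a hypothetical wrapper around errlogger.
log_maintenance() {
    if fits_errlogger_limit "$1"; then
        errlogger "$1"
    else
        echo "message longer than 1024 bytes, not logged" >&2
        return 1
    fi
}

# Possible use on AIX:
#   log_maintenance "system hard disk (hdisk0) replaced."
```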

Error log hardening
Under very rare circumstances, such as powering off the system exactly while the errdemon is writing into the error log, the error log may become corrupted. In AIX 5L V5.3, minor modifications were made to the errdemon to improve its robustness and to recover the error log file when it starts. When the errdemon starts, it checks the error log for consistency. First, it makes a backup copy of the existing error log file to /tmp/errlog.save, and then it corrects the error log file, while preserving consistent error log entries. In previous versions of AIX, the errdemon reset the log file if it was corrupted, instead of repairing it.

Instructor notes:
Purpose — Define the components of the error logging facility.
Details —
Additional information — See the AIX 5L Differences Guide Version 5.3 Edition Redbook (SG24-7463-00) for more information about error log hardening (also referred to as error log RAS). The following is a list of terms that you will probably refer to:

error ID: A 32-bit hexadecimal code used to identify a particular failure. Each error record template has a unique error ID.
error label: The mnemonic name for an error ID.
error log: The file that stores instances of errors and failures encountered by the system.
error log entry: A record in the system error log that describes a failure. Contains captured failure data.
error record template: A description of what will be displayed when the error log is formatted for a report, including information on the type and class of error, probable causes, and recommended actions. Collectively, the templates comprise the Error Record Template Repository.

Cover the diagram on the visual starting from the bottom, with the error being detected by errlog() or errsave() and an entry being made in /dev/error, up to the point where a user can look at the records of the error log either by going through SMIT or by executing the errpt command. An errpt command can be run from the shell or SMIT to format records in the errlog into readable reports. The ODM classes CuDv, CuAt, and CuVPD provide information for the detailed error reporting.
Transition statement — SMIT can be used to generate an error report.

Generating an error report using SMIT

# smit errpt

Generate an Error Report
...
  CONCURRENT error reporting?                     no
  Type of Report                                  summary
  Error CLASSES (default is all)                  []
  Error TYPES (default is all)                    []
  Error LABELS (default is all)                   []
  Error ID's (default is all)                     []
  Resource CLASSES (default is all)               []
  Resource TYPES (default is all)                 []
  Resource NAMES (default is all)                 []
  SEQUENCE numbers (default is all)               []
  STARTING time interval                          []
  ENDING time interval                            []
  Show only Duplicated Errors                     [no]
  Consolidate Duplicated Errors                   [no]
  LOGFILE                                         [/var/adm/ras/errlog]
  TEMPLATE file                                   [/var/adm/ras/errtmplt]
  MESSAGE file                                    []
  FILENAME to send report to (default is stdout)  []
...

Figure 3-3. Generating an error report using SMIT

Notes:

Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error report. Any user can use this screen. As shown on the visual, the screen includes a number of fields that can be used for report specifications. Some of these fields are described in more detail below.

CONCURRENT error reporting?
Yes means you want errors displayed or printed as the errors are entered into the error log (a sort of tail -f).

Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give comprehensive information. Intermediate reports display most of the error information. Summary reports contain concise descriptions of errors.

Error classes
Values are H (hardware), S (software), and O (operator messages created with errlogger). You can specify more than one error class.

Error types
Valid error types include the following:
- PEND - The loss of availability of a device or component is imminent.
- PERF - The performance of the device or component has degraded to below an acceptable level.
- TEMP - Recovered from condition after several attempts.
- PERM - Unable to recover from error condition. Error types with this value are usually the most severe errors and imply that you have a hardware or software defect. Error types other than PERM usually do not indicate a defect, but they are recorded so that they can be analyzed by the diagnostic programs.
- UNKN - Severity of the error cannot be determined.
- INFO - The error type is used to record informational entries.

Error labels
An error label is the mnemonic name used for an error ID.

Error IDs
An error ID is a 32-bit hexadecimal code used to identify a particular failure.

Resource classes
Means device class for hardware errors (for example, disk).

Resource types
Indicates device type for hardware (for example, 355 MB).

Resource names
Provides the common device name (for example, hdisk0).

Starting and ending time interval
The format mmddhhmmyy (month, day, hour, minute, year) can be used to select only errors from the log that are time stamped between the two values.
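
The mmddhhmmyy format is easy to misread, so a short shell sketch may help when building or checking interval values. The helper names decode_errpt_ts and now_errpt_ts are hypothetical, and the decoder assumes a year between 2000 and 2099.

```shell
# decode_errpt_ts is a hypothetical helper.
# It converts an mmddhhmmyy timestamp into "20yy-mm-dd hh:mm".
decode_errpt_ts() {
    printf '%s\n' "$1" | awk '{
        printf "20%s-%s-%s %s:%s\n",
            substr($0, 9, 2), substr($0, 1, 2), substr($0, 3, 2),
            substr($0, 5, 2), substr($0, 7, 2)
    }'
}

# now_errpt_ts is a hypothetical helper that prints the current time
# in mmddhhmmyy format.
now_errpt_ts() {
    date +%m%d%H%M%y
}

# Possible use on AIX, selecting all of October 10, 2007:
#   errpt -s 1010000007 -e 1010235907
decode_errpt_ts 1010130907    # prints: 2007-10-10 13:09
```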

Show only duplicated errors
Yes will report only those errors that are exact duplicates of previous errors generated during the interval of time specified. The default time interval is 100 milliseconds. This value can be changed with the errdemon -t command. The default for the Show only Duplicated Errors option is no.

Consolidate duplicated errors
Yes will report only the number of duplicate errors and timestamps of the first and last occurrence of that error. The default for the Consolidate Duplicated Errors option is no.

File name to send reports to
The report can be sent to a file. The default is to send the report to stdout.

Instructor notes:
Purpose — Explain how an error report can be generated through SMIT.
Details —
Additional information — This option will allow you to produce a detailed or summary report. Examples of both will be given. Mention all the different fields that can be used to generate specific searches and reports. Note that the report can be sent to a file, which is defined by the last option. The Show only Duplicated Errors option in the Generate an Error Report screen was introduced in AIX 5L V5.1. Examples of duplicate errors might include floppy drive not ready, external drive not ready, or Ethernet card unplugged.
Transition statement — Instead of using SMIT, you can also generate a report from the command line. Let's see how this can be done.

The errpt command

• Summary report:
  # errpt
• Intermediate report:
  # errpt -A
• Detailed report:
  # errpt -a
• Summary report of all hardware errors:
  # errpt -d H
• Detailed report of all software errors:
  # errpt -a -d S
• Concurrent error logging ("Real-time" error logging):
  # errpt -c > /dev/console

Figure 3-4. The errpt command

Notes:

Types of reports available
The errpt command generates a report of logged errors. Three different layouts can be produced, depending on the option that is used:
- A summary report gives an overview (default).
- An intermediate report only displays the values for the LABEL, Date/Time, Type, Resource Name, Description and Detailed Data fields. Use the option -A to specify an intermediate report.
- A detailed report shows a detailed description of all the error entries. Use the option -a to specify a detailed report.

The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two examples illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S) errors.

Input file used
The errpt command queries the error log file /var/adm/ras/errlog to produce the error report.

The -c option
If you want to display the error entries concurrently, that is, at the time they are logged, you must execute errpt -c. In the example on the visual, we direct the output to the system console.

The -D flag
Duplicate errors can be consolidated using errpt -D. When used with the -a option, errpt -D reports only the number of duplicate errors and the timestamp for the first and last occurrence of the identical error.

The -P flag
Shows only errors which are duplicates of the previous error. The -P flag applies only to duplicate errors generated by the error log device driver.

Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or the man page for errpt) for a complete description.

Instructor notes:
Purpose — Introduce the errpt command.
Details — Describe using the information in the student notes.
Additional information —
Transition statement — Now that we know how we can formulate a report, let’s look at examples of summary and detailed reports. Let’s start with the summary report.

A summary report (errpt)

# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
192AC071   1010130907 T O errdemon       ERROR LOGGING TURNED OFF
C6ACA566   1010130807 U S syslog         MESSAGE REDIRECTED FROM SYSLOG
A6DF45AA   1010130707 I O RMCdaemon      The daemon is started.
2BFA76F6   1010130707 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   1010130707 T O errdemon       ERROR LOGGING TURNED ON
192AC071   1010123907 T O errdemon       ERROR LOGGING TURNED OFF
AA8AB241   1010120407 T O OPERATOR       OPERATOR NOTIFICATION
C6ACA566   1010120007 U S syslog         MESSAGE REDIRECTED FROM SYSLOG
2BFA76F6   1010094907 T S SYSPROC        SYSTEM SHUTDOWN BY USER
EAA3D429   1010094207 U S LVDD           PHYSICAL PARTITION MARKED STALE
EAA3D429   1010094207 U S LVDD           PHYSICAL PARTITION MARKED STALE
F7DDA124   1010094207 U H LVDD           PHYSICAL VOLUME DECLARED MISSING

Error Type:
• P: Permanent, Performance, or Pending
• T: Temporary
• I: Informational
• U: Unknown

Error Class:
• H: Hardware
• S: Software
• O: Operator
• U: Undetermined

Figure 3-5. A summary report (errpt)

Notes:

Content of summary report
By default, the errpt command creates a summary report which gives an overview of the different error entries. One line per error is fine to get a feel for what is there, but you need more details to understand problems.

Need for detailed report
The example shows different hardware and software errors that occurred. To get more information about these errors, you must create a detailed report.
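
Because the summary report has one entry per line with the class in the fourth column, a quick tally per error class can show where to look first. This is only a sketch: count_by_class is a hypothetical helper name, and it assumes the column layout of the summary report with a single header line.

```shell
# count_by_class is a hypothetical helper, not an AIX command.
# It reads an errpt summary report on stdin, skips the header line,
# and prints one "class count" line per error class.
count_by_class() {
    awk 'NR > 1 { count[$4]++ }
         END { for (c in count) print c, count[c] }' | sort
}

# Possible use on AIX:
#   errpt | count_by_class
```

A class with many entries can then be examined in detail, for example with errpt -a -d H for hardware errors.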

Instructor notes:
Purpose — Discuss the summary error report.
Details — Use the information in the student notes and the information given under “Additional Information” below to guide your explanation.
Additional information — The first field indicates the error ID. The error ID is unique for a kind of error, not for each entry (that is, each instance of an error). The next field is the timestamp in the format mmddhhmmyy (month, day, hour, minute, and year, as previously discussed). The third field specifies the type of error; possible values are defined at the bottom of the visual. There is a problem with this field because there are three possible values that begin with the letter P. As this is a one-letter field, you cannot tell exactly what type of error you are dealing with until you view the detailed report. The next field defines the class; again, the possible values are given at the bottom of the visual. The last two fields give the resource name of the component that is causing the problem and a description of the error.
Transition statement — Let’s look at a detailed report.

A detailed error report (errpt -a)

LABEL:            LVM_SA_PVMISS
IDENTIFIER:       F7DDA124

Date/Time:        Wed Oct 10 09:42:20 CDT 2007
Sequence Number:  113
Machine Id:       00C35BA04C00
Node Id:          rt1s3vlp2
Class:            H
Type:             UNKN
WPAR:             Global
Resource Name:    LVDD
Resource Class:   NONE
Resource Type:    NONE
Location:

Description
PHYSICAL VOLUME DECLARED MISSING

Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE

Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000

Figure 3-6. A detailed error report (errpt -a)

Notes:

Content of detailed error report
As previously mentioned, detailed error reports are generated by issuing the errpt -a command. The first half of the information displayed is obtained from the ODM (CuDv, CuAt, CuVPD) and is very useful because it shows clearly which part causes the error entry. The next few fields explain probable reasons for the problem, and actions that you can take to correct the problem. The last field, SENSE DATA, is a detailed report about which part of the device is failing. For example, with disks, it could tell you which sector on the disk is failing. This information can be used by IBM support to analyze the problem.

Interpreting error classes and types
The values shown for error class and error type provide information that is useful in understanding a particular problem:
1. The combination of an error class value of H and an error type value of PERM indicates that the system encountered a problem with a piece of hardware and could not recover from it.
2. The combination of an error class value of H and an error type value of PEND indicates that a piece of hardware may become unavailable soon due to the numerous errors detected by the system.
3. The combination of an error class value of S and an error type of PERM indicates that the system encountered a problem with software and could not recover from it.
4. The combination of an error class value of S and an error type of TEMP indicates that the system encountered a problem with software. After several attempts, the system was able to recover from the problem.
5. An error class value of O indicates that an informational message has been logged.
6. An error class value of U indicates that an error class could not be determined.
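
The combinations listed above can be written down as a small lookup, which may be handy when scanning many entries. This is an illustrative sketch only: explain_error is a hypothetical helper name, and combinations that the list does not cover fall through to a generic answer.

```shell
# explain_error is a hypothetical helper, not an AIX command.
# It takes an error class and an error type and restates the
# interpretations from the list above.
explain_error() {
    case "$1/$2" in
        H/PERM) echo "hardware problem, no recovery" ;;
        H/PEND) echo "hardware may become unavailable soon" ;;
        S/PERM) echo "software problem, no recovery" ;;
        S/TEMP) echo "software problem, recovered after several attempts" ;;
        O/*)    echo "informational message" ;;
        U/*)    echo "error class could not be determined" ;;
        *)      echo "no summary available, see the detailed report" ;;
    esac
}

explain_error H PERM    # prints: hardware problem, no recovery
```

On AIX, entries of one combination could be selected with, for example, errpt -d H -T PERM.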

Link between error log and diagnostics
In AIX 5L V5.1 and later, there is a link between the error log and diagnostics. Error reports include the diagnostic analysis for errors that have been analyzed. Diagnostics, and the diagnostic tool diag, will be covered in a later unit.

Instructor notes:
Purpose — Explain the information that is obtained from a detailed report.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — Disk errors are frequently seen in the error log. There are many different types of disk errors. Let’s identify the different types and find out the severity of each.

Types of disk errors

Error Label            Error Type  Recommendations
DISK_ERR1              P           Failure of physical volume media
                                   Action: Replace device as soon as possible
DISK_ERR2, DISK_ERR3   P           Device does not respond
                                   Action: Check power supply
DISK_ERR4              T           Error caused by bad block or occurrence of a recovered error
                                   Rule of thumb: If disk produces more than one DISK_ERR4 per week, replace the disk
SCSI_ERR*              P           SCSI communication problem
(SCSI_ERR10)                       Action: Check cable, SCSI addresses, terminator

Error Types: P = Permanent, T = Temporary

Figure 3-7. Types of disk errors


Notes:

Common disk errors

The following list explains the most common disk errors you should know about:

1. DISK_ERR1 is caused by wear and tear of the disk. Remove the disk from the system as soon as possible and replace it with a new one. Follow the procedures that you have learned earlier in this course.
2. DISK_ERR2 and DISK_ERR3 error entries are mostly caused by a loss of electrical power.
3. DISK_ERR4 is the most interesting one, and the one that you should watch out for, as it indicates bad blocks on the disk. Do not panic if you get a few entries of this type in the log. What you should be aware of is the number of DISK_ERR4 errors and their frequency. The more you get, the closer you are getting to a disk failure. You want to act before that happens, so monitor the error log closely.
4. Sometimes SCSI errors are logged, mostly with the LABEL SCSI_ERR10. They indicate that the SCSI controller is not able to communicate with an attached device. In this case, check the cable (and the cable length), the SCSI addresses, and the terminator.

DISK_ERR5 errors

A very infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not match any of the above DISK_ERRx symptoms). You need to investigate further by running the diagnostic programs, which can detect and produce more information about the problem.
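The DISK_ERR4 rule of thumb (more than one entry per week) lends itself to a quick count per disk. The sketch below works on errpt summary-style lines read from standard input; the function names and the sample lines (identifiers, timestamps, disk names) are made up for illustration.

```shell
#!/bin/sh
# count_per_resource: read errpt summary lines on stdin (header first)
# and print "<resource> <count>" per resource, sorted by name.
count_per_resource() {
    awk 'NR > 1 && NF >= 5 { n[$5]++ }
         END { for (r in n) print r, n[r] }' | sort
}

# over_threshold LIMIT: from "<resource> <count>" lines, print the
# resources whose count exceeds LIMIT.
over_threshold() {
    awk -v limit="$1" '$2 > limit { print $1 }'
}

# Hypothetical sample standing in for one week of DISK_ERR4 entries.
sample='IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
00000001   1104093009 T H hdisk1         DISK OPERATION ERROR
00000001   1102171509 T H hdisk1         DISK OPERATION ERROR
00000001   1101080209 T H hdisk0         DISK OPERATION ERROR'

# Disks with more than one entry in the sample
printf '%s\n' "$sample" | count_per_resource | over_threshold 1
```

On a real system the input would come from errpt itself, for example errpt -J DISK_ERR4 -s <start date> piped into count_per_resource, where the start date (in errpt's mmddhhmmyy form) marks the beginning of the week of interest.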


Instructor notes:
Purpose: Define the different types of disk errors.
Details: Explain using the information in the student notes.
Additional information: Explain each type of error in turn. Disk errors 1, 2, and 4 return sense data which can be analyzed by the diagnostic programs to provide extra information regarding the nature of the error and its severity. DISK_ERR4 is by far the most common error generated, and it is the least severe. It indicates that a bad block has been detected during a read or write request to the disk drive.

Bad block relocation and mirroring

When a disk drive is formatted for the first time, a portion of the drive (about 5% in the case of IBM drives) is set aside for bad block relocation. The format itself also masks and readdresses existing bad blocks so that the medium is clean and ready for use. During use, however, any disk drive can develop bad blocks that can be attributed to deterioration caused by the setting and resetting of magnetic charges on the medium. Bad blocks may be discovered during any read or write operation, triggering DISK_ERR4 entries, but they can only be actually relocated during a write operation.

At the software level, if your hardware does not support bad block relocation, you can set logical volume bad block relocation. If a bad block is detected during a read or write operation, its physical location is recorded in the logical volume device driver (LVDD) defects directory. This directory is reviewed during each read or write request. Most hardware does support bad block relocation, and so the logical volume attribute is irrelevant.

Bad blocks are never a problem when mirrored logical volumes are used. Either a read or write request is completed on the mirror copy that is undamaged, and the damaged block is always relocated. When a read requests a damaged block, the logical volume manager converts the request to a write request and relocates the block with values derived from the good copy. All this occurs without intervention or special configuration.
Transition statement: Let's show the most important error entries the logical volume manager creates.


LVM error log entries

Error Label                  Class and Type   Recommendations
LVM_BBEPOOL,                 S,P              No more bad block relocation
LVM_BBERELMAX,                                Action: Replace disk as soon as possible.
LVM_HWFAIL
LVM_SA_STALEPP               S,P              Stale physical partition
                                              Action: Check disk, synchronize data (syncvg).
LVM_SA_QUORCLOSE             H,P              Quorum lost, volume group closing
                                              Action: Check disk, consider working without quorum.

Error Classes: H = Hardware, S = Software
Error Types: P = Permanent, T = Temporary

Figure 3-8. LVM error log entries


Notes:

Important LVM error codes

The visual shows some very important LVM error codes you should know. All of these errors are permanent errors that cannot be recovered. Very often these errors are accompanied by hardware errors such as those shown on the previous page.

Immediate response to errors

Errors, such as those shown on the visual, require your immediate intervention.
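As a rough illustration of the actions named on the visual, the sketch below prints the commands an administrator might review after these entries appear. The run helper and the volume group name datavg are assumptions made for this example; with DRY_RUN left at 1 nothing is executed.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default here),
# otherwise execute it. Hypothetical helper for this sketch.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

vg=datavg    # hypothetical volume group name

# After LVM_SA_STALEPP: check the disks, then resynchronize
# the stale partitions of the volume group.
run lsvg -p "$vg"
run syncvg -v "$vg"

# After LVM_SA_QUORCLOSE: one option to consider is turning off
# quorum checking for the volume group.
run chvg -Qn "$vg"
```

Whether working without quorum is appropriate depends on how the volume group is mirrored, so treat the last command as something to evaluate rather than a default response.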


Instructor notes:
Purpose: Introduce some important LVM errors.
Details: Review the different terms and the errors that are produced by LVM.
Additional information: None
Transition statement: Let's see how to maintain the error log.


Maintaining the error log

# smit errdemon

Change / Show Characteristics of the Error Log

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

  LOGFILE                [/var/adm/ras/errlog]
* Maximum LOGSIZE        [1048576]               #
  Memory Buffer Size     [32768]                 #
  ...

# smit errclear

Clean the Error Log

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

  Remove entries older than this number of days   [30]   #
  Error CLASSES                                   [ ]    +
  Error TYPES                                     [ ]    +
  ...
  Resource CLASSES                                [ ]    +
  ...

==> Use the errlogger command as a reminder

= all other NIM machines

• Client
  – File sets:
    • bos.sysmgt.nim.client
  – Can initiate pull installations from a server

• Server
  – Any machine, master or client
  – Serves NIM resources to clients, thus requires adequate disk space and throughput

Figure 4-3. Machine roles


Notes: There are three basic roles that a machine can assume in the NIM environment: master, client, and resource server. There can be only one master machine in a NIM environment; all other machines are clients. Any machine, master or client, can be a resource server.

NIM software

All machines in the NIM environment must install bos.sysmgt.nim.client. The master machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.

Master

The NIM master manages all other machines that participate in the NIM environment. The NIM database is stored on the NIM master. The NIM master is fundamental for all of the operations in the NIM environment and must be set up and operational before performing any NIM operations. The master can initiate a software installation to a client, which is called a push installation. Also, the NIM master is the only machine that is given the permissions and ability to execute NIM operations on other machines within the NIM environment. The rsh command is used to remotely execute commands on clients, which allows the NIM master to install to a number of clients with one NIM operation. With AIX 5.3 or AIX 6.1, nimsh can be used as an alternative to rsh.

Client

All other machines in a NIM environment are clients. Clients can request a software installation from a server machine (pull installation).

Server

Any machine, the master or a client, can be configured by the master as a server for a particular software resource. Most often, the master is also the server. However, if your environment has many nodes or consists of a complex network environment, you may want to configure some nodes to act as servers to improve installation performance. Servers must have adequate disk space for the resources they will be providing. They also need network connections to the client machines they serve and sufficient bandwidth to respond to the expected volume.


Instructor notes:
Purpose: Explain machine roles in more detail.
Details:
Additional information:
Transition statement: To better understand how NIM manages a network installation, it is useful to first review the components of a regular installation from tape or optical media.


Boot process for AIX installation (tape or CD)

1. Power on machine
2. Load boot image: IPL ROM loads boot image from media into memory
3. Transfer control to mini-runtime environment
4. Invoke boot script
5. Configure devices for installation
   – Tape: boot image fully responsible for configuring devices
   – CD: SPOT programs in /usr on CD are used to configure devices
6. Install script runs: installation images on media

Figure 4-4. Boot process for AIX installation (tape or CD)


Notes: To understand how NIM works, we need to understand what happens when we install AIX on a system. We start by reviewing what happens when we boot from CD or tape to install AIX.

Power on

A Power machine must be booted or reset in order to install the AIX Base Operating System (BOS).

Load boot image into memory

The machine's Initial Program Load (IPL) Read Only Memory (ROM) locates a boot image and loads the image into memory. The boot image contains a miniature runtime environment (the kernel and a file system containing libraries and key programs).


Where is the boot image?

When booting from a hard disk, the boot image is retrieved from the system's hard disk. When a machine is being installed for the first time, it obviously cannot retrieve a boot image from the hard disk. The boot image must therefore be available on the tape or CD.

Transfer control to mini-runtime environment

Control is passed to the kernel, and the file system in the boot image is mounted from memory.

Invoke boot script and configure devices needed for installation

The kernel initializes and eventually runs the boot script (rc.boot), which configures devices that are needed for the installation such as keyboards, displays, and disks.

Configuring devices

In order to keep the boot image small, not all of the software needed to configure devices is included in the boot image. These additional files are contained in a small usr directory tree called a Shared Product Object Tree or SPOT. The boot script mounts this usr directory tree on /SPOT in the memory file system. The SPOT is mounted directly from the CD-ROM.

Note: Since tape devices do not support file system operations, the SPOT files are included in the boot image in the case of booting from a tape drive.

Install script

Once the devices have been configured, rc.boot invokes the BOS installation program (bi_main), and installs AIX from the installation images on the tape or CD.


Instructor notes:
Purpose: Review the flow and components of an AIX installation from tape or optical media.
Details:
Additional information:
Transition statement: If we next look at how a network install is handled, we will see that there are many similarities with a regular installation, of course with some significant variations.


Boot process for AIX installation (network)

1. Power on machine
2. Load boot file: IPL ROM loads boot image from server using BOOTP
   – Client (en0) sends a bootp request over the network; bootpd on the NIM server uses /etc/bootptab to return the boot file name
   – Client retrieves the boot file using tftp
3. Transfer control to mini-runtime environment
4. Invoke boot script
5. Configure devices for installation: NIM SPOT resource is NFS mounted to help configure devices
6. Install script runs: installation images NFS mounted from server

Figure 4-5. Boot process for AIX installation (network)


Notes: Booting over the network, using NIM, is essentially the same as booting from CD or tape, except that the boot file, the SPOT, and the installation images come from the server system over the network.

Load boot image into memory

If the client system is booting from the network, the IPL ROM sends a bootp request to the NIM server for the name of a boot file. The NIM server then uses the /etc/bootptab file to determine the boot file name and returns that name to the client system. Finally, the client system uses tftp to download the boot file from the NIM server over the network.
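To make the role of /etc/bootptab more concrete, the sketch below pulls individual fields out of one colon-separated entry. The entry shown is illustrative only: the host name, addresses, and path are made up, and the tag names (bf for boot file, sa for server address, and so on) should be checked against the bootptab on your own NIM master. bootptab_field is a hypothetical helper.

```shell
#!/bin/sh
# bootptab_field FIELD: read one bootptab entry on stdin and print
# the value of the given tag (for example bf for the boot file).
bootptab_field() {
    tr ':' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Illustrative entry; every value here is an assumption.
entry='client1:bf=/tftpboot/client1:ip=10.31.192.20:ht=ethernet:sa=10.31.192.10:sm=255.255.240.0:'

printf '%s\n' "$entry" | bootptab_field bf    # boot file name returned to the client
printf '%s\n' "$entry" | bootptab_field sa    # server the client then contacts with tftp
```

When a network boot fails early, comparing the bf value with what actually exists under /tftpboot on the server is a reasonable first check.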

Invoke boot script and configure devices needed for installation

When booting over the network, the SPOT is mounted from the server using the Network File System (NFS).

Invoke install script

When booting over the network, the install script installs AIX using installation images which are NFS mounted from the NIM server.


Instructor notes:
Purpose: Provide a description of the components and flow of a network installation using NIM.
Details:
Additional information:
Transition statement: In order for NIM to manage this install process, it needs to have objects that describe the machines and resources involved. Let's take a high level look at what these are.


NIM objects

• NIM objects stored in ODM

• Object classes
  – Networks
  – Machines
  – Resources

• Group objects
  – mac_group
  – res_group

Figure 4-6. NIM objects


Notes: NIM is made up of various components, called objects. There are three classes of objects: machines, networks, and resources. All information about the NIM environment is stored in Object Data Manager (ODM) databases on the NIM master system.

Network objects

Network objects are objects in the NIM database that represent information about each Local Area Network (LAN) that is part of the NIM environment. These objects and some of their attributes reflect the physical characteristics of the network. NIM network objects are not used to perform management tasks in the overall network environment; they are only used to represent the physical network topology of the NIM environment. In other words, if something changes in the physical network environment, you must remember to make the change in the NIM database as well.


There are five types of networks supported by NIM: Token-Ring, Ethernet, ATM, FDDI, and generic. These network types are represented as network objects in the NIM environment.

Machine objects

Machines in the NIM environment are simply the machines that will be managed by NIM.

Resource objects

All operations on clients in the NIM environment require one or more NIM resources. NIM resource objects represent the files, directories, and devices that are used in order to support each type of NIM operation. Some resources are AIX filesets (or devices which contain filesets) that can be installed on a client machine. Other resources are scripts or configuration files that are used in the installation process. The location and other attributes for these resources are stored as resource objects in the NIM database.

Group objects

NIM supports two types of group objects:
- mac_group: A machine group is a group of machine objects. You can use a machine group to simplify performing a NIM operation on multiple machines.
- res_group: A resource group is a group of resource objects. If you have a set of resources that you typically want to use at the same time, you can create a resource group to simplify allocating those resources.


Instructor notes:
Purpose: Describe the NIM objects.
Details:
Additional information:
Transition statement: It is useful to be able to list the existing defined objects and their attributes. Let's look at the lsnim command that provides this information. Then we will explain the meaning and use of the displayed attributes for each type of object. Later, we will cover how to create these objects.


Listing NIM objects and their attributes

• To list all defined NIM objects
  – lsnim

  master        machines     master
  boot          resources    boot
  nim_script    resources    nim_script
  ent0          networks     ent
  ...

• To list attributes of a NIM object
  – lsnim -l

  # lsnim -l ent0
  ent0:
     class      = networks
     type       = ent
     Nstate     = ready for use
     prev_state = information is missing from this object's definition
     net_addr   = 10.31.192.0
     snm        = 255.255.240.0
     routing1   = default 10.31.192.1

Figure 4-7. Listing NIM objects and their attributes


Notes: The lsnim command is used to list various types of NIM information. You have the opportunity to experiment with lsnim in the exercise.

Listing objects and attributes

When used without any argument, lsnim displays all the currently defined NIM objects. Using -l, you can get a long listing of an individual object.
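When scripting against the long listing, it can help to pull out a single attribute. The sketch below parses the "name = value" layout shown on the visual; nim_attr and sample_lsnim_output are hypothetical helpers, and the sample text simply repeats the ent0 listing from the visual so that the example runs without a NIM master.

```shell
#!/bin/sh
# nim_attr ATTR: read "lsnim -l <object>" style output on stdin and
# print the value of the named attribute.
nim_attr() {
    awk -v key="$1" '$1 == key && $2 == "=" {
        sub(/^[^=]*= */, "")    # drop everything up to the "="
        print
    }'
}

# Stand-in for "lsnim -l ent0", copied from the visual.
sample_lsnim_output() {
    cat <<'EOF'
ent0:
   class      = networks
   type       = ent
   Nstate     = ready for use
   prev_state = information is missing from this object's definition
   net_addr   = 10.31.192.0
   snm        = 255.255.240.0
   routing1   = default 10.31.192.1
EOF
}

sample_lsnim_output | nim_attr Nstate
sample_lsnim_output | nim_attr routing1
```

On a NIM master the same filter would be fed by the real command, for example lsnim -l ent0 piped into nim_attr Nstate.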


Instructor notes:
Purpose: Explain how to use the lsnim command to display objects and their attributes.
Details: Keep the focus on these uses of lsnim. The listing of all NIM objects and the listing of attributes for a particular object are the two most common uses of lsnim. The other lsnim options are better left to the NIM course.
Additional information:
Transition statement: We will now discuss the various NIM objects in the context of configuring NIM. Let's start with a summary of the basic NIM configuration procedure.


NIM configuration

• Configure master
  – Install master NIM file sets.
  – Run nimconfig.

• Define resources
  – Create real resource with full path.
  – Create resource object to represent.

• Define networks
  – How do clients on networks access the master.

• Define clients
  – Able to relate network address of the client with object name

• Allocate resources to clients
  – Different operations need different resources.

• NIM operations on clients
  – Setting up for operation
  – Initiating operation

Figure 4-8. NIM configuration


Notes:

Installing NIM

The NIM filesets that need to be installed on a machine designated to act as NIM master are:
- bos.sysmgt.nim.client
- bos.sysmgt.nim.master
- bos.sysmgt.nim.spot

Configure master

Configuring the master machine consists of installing the master filesets and running nimconfig. You must specify the primary network interface and a NIM network name for the network which is attached to the primary interface. There are several optional attributes which can be specified.


nimconfig creates the NIM database and the /etc/niminfo configuration file. It also starts the NIM daemon (nimesis) and creates an entry in /etc/inittab so that nimesis is started on every boot of the master machine.
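As a sketch of the master configuration step, the function below prints a nimconfig invocation for review rather than running it. The network name master_net and the interface en0 are hypothetical, nimconfig_cmd is a helper written for this example, and a given AIX level may expect further attributes beyond the two required ones named in the notes.

```shell
#!/bin/sh
# nimconfig_cmd NETNAME PIF: print the nimconfig command that names
# the NIM network and the master's primary interface.
nimconfig_cmd() {
    printf 'nimconfig -a netname=%s -a pif_name=%s\n' "$1" "$2"
}

nimconfig_cmd master_net en0

# Commands one might use afterwards to confirm the results described
# in the notes (printed only, not executed here).
cat <<'EOF'
lssrc -s nimesis
cat /etc/niminfo
grep nim /etc/inittab
EOF
```

The three follow-up commands correspond to the NIM daemon, the configuration file, and the inittab entry that nimconfig is described as creating.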

Create NIM objects

Next you need to create the NIM objects:
- resources: Specify the directories and files needed by NIM.
- networks: You have already defined the master's primary network (nimconfig). If some of your clients are connected to separate networks or subnets, you need to define these networks and routes for the master to communicate with all the clients and routes for any servers to communicate with their clients.
- clients: Specify the client machines you are installing using NIM.

Allocate resources

Once the resource and machine objects are defined, you need to decide what operation you want to perform on your client machine. For each operation, there are different resources needed. Next you need to allocate the resource to your client. This identifies which resource object will be used to implement the client operation. There are two ways in which this is done:
- Use the nim -o allocate operation (or equivalent SMIT dialog) to relate the resource to the machine.
- Use a SMIT dialog which prompts for the resources to allocate as part of the machine operation definition.
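The command-line form of allocation can be sketched as follows. The helper nim_allocate_cmd and the object names (client1, spot_aix61, lpp_aix61) are made up for illustration; the function only prints the command so that it can be reviewed before use.

```shell
#!/bin/sh
# nim_allocate_cmd CLIENT TYPE=RESOURCE...
# Print a "nim -o allocate" command that relates each named
# resource to the given client machine object.
nim_allocate_cmd() {
    client=$1
    shift
    cmd="nim -o allocate"
    for pair in "$@"; do
        cmd="$cmd -a $pair"
    done
    printf '%s %s\n' "$cmd" "$client"
}

nim_allocate_cmd client1 spot=spot_aix61 lpp_source=lpp_aix61
```

After a real allocation, lsnim -l on the client object is one way to confirm which resources are now associated with it.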

Perform the operation on the client

There are many different operations that you might perform on a client. You might install an operating system, install maintenance, provide support for a maintenance boot or a diagnostic boot, and more. There are usually two phases related to an operation:
- The NIM setup in which the NIM server is configured to support the task you want to perform on the client
- The initiation of that task

The task can be initiated from the client; or, provided that the client machine has already been configured as a NIM client, the NIM master can initiate the task.
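For the operating system installation case, a master-initiated (push) operation might look like the sketch below. The helper bos_inst_cmd and the object names are assumptions; source=rte and accept_licenses=yes are shown as commonly used attributes, and the exact set needed depends on your resources and AIX level.

```shell
#!/bin/sh
# bos_inst_cmd CLIENT SPOT LPP_SOURCE
# Print a bos_inst operation that names its resources directly
# on the command line, for review before it is run on the master.
bos_inst_cmd() {
    printf 'nim -o bos_inst -a source=rte -a spot=%s -a lpp_source=%s -a accept_licenses=yes %s\n' \
        "$2" "$3" "$1"
}

bos_inst_cmd client1 spot_aix61 lpp_aix61
```

Printing the command first keeps the setup phase visible: the resources are tied to the client as part of the operation, and the installation itself still has to be initiated from the master or from the client.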


Instructor notes:
Purpose: Provide a high level look at the NIM configuration steps.
Details: This is mainly a menu of topics covered on the following slides, except for the fileset installation and nimconfig execution; so, cover those with this visual, but just cover the rest at a high level to provide an understanding of the sequence.
Additional information: For resources, note that there are special location requirements when installing High Availability Management Server (HA MS), an optional feature of Cluster System Management (CSM).
Transition statement: As you can see, once we have configured the NIM master, much of the work with NIM is the definition of the machines and resources. Let's take a closer look at what needs to be defined and how that is done, beginning with the NIM resources.


resources objects

• Object types
  – boot           Represents the network boot image resource
  – nim_script     Directory for customization scripts created by NIM
  – spot           Shared Product Object Tree - equivalent to /usr filesystem
  – lpp_source     Source device for software product images
  – bosinst_data   Config file used during base system installation
  – image_data     Config file used during base system installation
  – mksysb         A mksysb image
  – script         A user created script which is executed on a client to perform customization
  – resolv_conf    Configuration file for name-server information
  – . . . (additional resource types)

• Attributes
  – location             Directory path
  – server               Machine which serves this resource
  – Rstate, prev_state   Status attributes
  – . . . (additional attributes)

Figure 4-9. resources objects


Notes: Resources are the files and directories that NIM uses to install software on the clients.

Resource types

Resource types identify the different types of files used by NIM. For example:
- An lpp_source resource is a directory containing product images to be installed.
- A spot resource contains the files used during the boot operation.
- A script resource is a user definable script which can be used to perform customization on a newly installed client.
- A mksysb resource is a backup image that can be used to install a client.


Resource attributes

Attributes for resources identify where the resource can be found, its status, and so forth:
- location defines the directory path to the resource.
- server identifies which machine serves the resource.
- Rstate indicates whether a resource is available for clients to use.
- prev_state indicates the previous value of Rstate.

Additional resource types and attributes

There are a number of different resource types, each having its own set of attributes. lsnim is probably the easiest way to get information about NIM attributes.


Instructor notes:
Purpose: Cover resources objects and their attributes.
Details: Note the variety of resources and that the attributes basically map between the resource name and the location of the file or directory that contains that resource. Be careful not to pre-teach the details on resources covered on later visuals, such as lpp_source, spot, or mksysb. These are covered after the discussion of operations, so they can be discussed in the context of those operations (in particular, the bos_inst operation).
Additional information:
Transition statement: Let's take a closer look at the resource types that we will need to define to support a NIM installation of an AIX operating system, starting with the lpp_source.


resources objects: lpp_source

• lpp_source
  – Directory containing software product images
  – Supports NIM install operations (bos_inst and cust)
  – Also used for creation of spot resource

• Defining an lpp_source:
  # nim -o define -t lpp_source \
        -a server= \
        -a location= \
        [ optional attributes ] \

• # smit nim_mkres

(Diagram: lppsource directory tree with aix61-00-00 and aix61-01-00, containing bos filesets)

Figure 4-10. resources objects: lpp_source


Notes:

lpp_source

When a resource of this type is defined, it represents a directory in which software product images are stored. lpp_source resources are used to support NIM install operations. An lpp_source can also be used as the source for the creation of a SPOT. When you perform a NIM install operation and have allocated an lpp_source resource to the client, NIM NFS mounts the lpp_source directory on the client, and then invokes the installp command on the client to install from the directory. When installp finishes, NIM automatically unmounts the resource.

simages attribute

This attribute is used to indicate that an lpp_source resource contains the set of installable images to which NIM requires access to perform its basic functionality. This basic set of images is referred to as support images or simages. NIM automatically manages the use of this attribute as part of the management of an lpp_source. NIM adds this attribute to the definition of an lpp_source when it provides the required simages, and NIM removes this attribute from the object's definition if a required image becomes unavailable. Some NIM operations require access to an lpp_source that has this attribute as part of its definition, so having this attribute can be important. Perform the check operation on the lpp_source to have NIM check to see whether the simages requirement has been fulfilled. If it has, NIM adds this attribute to the lpp_source definition.

Defining an lpp_source resource
You can use the command line or SMIT to define an lpp_source. The visual shows how the required attributes would be specified on the command line.

Required attributes are:
- server= NIM name for the machine which serves this resource
- location= Directory where the lpp_source files are located

There are a number of optional attributes, including:
- source= If you already have a directory that contains the software images, the source attribute is not required. If you want NIM to create a directory and populate it for you, the source attribute specifies the directory or device which contains the software images to be copied into the lpp_source directory.
- packages= Use the packages attribute if you only want NIM to copy specific packages from the source.

The final argument is the name of the NIM object:
- The last argument on the nim command line is the name of the object you are operating on, in this case, the name of the lpp_source resource we are creating.

Notes:
- If you add or remove an installable image from the lpp_source, perform the check operation on that object so that NIM rebuilds the .toc (table of contents) file, which resides in the lpp_source directory. This is important, as the installp command uses the .toc to determine which images are available.
- Starting at AIX 5L Version 5.3, there is an update operation, which allows you to update an lpp_source resource by adding and removing packages. Previously, you could copy packages into an lpp_source directory, or remove packages from it, and then run nim -o check to update the lpp_source attributes. SMIT also allowed you to add packages to an lpp_source through the smit nim_bffcreate fast path. However, that SMIT function does not check whether the lpp_source is allocated or locked, nor does it update the simages attribute when finished. The update operation was created to address this situation.
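As a minimal sketch of the command-line form described above, the following script assembles the commands to define an lpp_source from installation media, run the check operation, and add packages with the update operation. The resource name lpp_6100, the directory /export/lpp_source/lpp_6100, the server name master, and the device /dev/cd0 are hypothetical values chosen for illustration. By default the script only prints each command; it would execute them only if NIM=nim were set on a NIM master.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# On a NIM master, set NIM=nim in the environment to run the commands.
NIM=${NIM:-echo nim}

# Hypothetical example values (not taken from the course material).
LPP=lpp_6100
LPP_DIR=/export/lpp_source/lpp_6100
MEDIA=/dev/cd0

define_lpp_source() {
    # source= asks NIM to copy the images from MEDIA into location=
    $NIM -o define -t lpp_source -a server=master \
        -a location="$LPP_DIR" -a source="$MEDIA" "$LPP"
}

check_lpp_source() {
    # check rebuilds the .toc and re-evaluates the simages attribute
    $NIM -o check "$LPP"
}

update_lpp_source() {
    # update operation (AIX 5L V5.3 and later): add packages from MEDIA
    $NIM -o update -a packages=all -a source="$MEDIA" "$LPP"
}

define_lpp_source
check_lpp_source
update_lpp_source
```

Keeping the object name as the last argument of every call mirrors the rule stated in the notes: the final argument always names the NIM object being operated on.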


Instructor notes: Purpose — Cover the definition of the lpp_source. Details — Additional information — Transition statement — Once we have an lpp_source, we next need to use the lpp_source to generate a matching SPOT. Let’s look at how that is done.


resources objects: spot

• spot
  – /usr directory tree used during network boot
  – Matching network boot images generated:
    - /tftpboot/...
• Defining a SPOT
  # nim -o define -t spot \
        -a server=ServerName \
        -a location=Directory \
        -a source=Source \
        [ optional attributes ] \
        SpotName
• # smit nim_mkres

[Diagram labels: lppsource, spot, spot61-00-00, spot61-01-00, usr, bin, include, lib, etc]

Figure 4-11. resources objects: spot

Notes:
SPOT
• Components
- A /usr file system
A Shared Product Object Tree (SPOT) is a directory containing AIX code that is equivalent in content to the code that resides in a /usr file system on a system running AIX. The NIM SPOT creation process restores files from AIX filesets into the directory in which the SPOT resides. The SPOT is NFS-mounted on a booting client to provide necessary device support for the boot process.

Boot image: As part of the creation of a SPOT resource, NIM also creates network boot images. The network boot images are constructed in /tftpboot on the same machine on which the SPOT is created. The boot images are constructed with code from the newly created SPOT. The boot images are also sometimes called spot files. The client locates its boot image through a BOOTP request, and the boot image file is then transferred to the client system using TFTP. Since one SPOT can potentially support several types of machines, several boot image files may be created. The naming convention identifies each boot image as SPOTName.Platform.KernelType.NetworkType, where:
• Platform identifies which architecture this boot image supports: chrp, rspc, and so forth.
• KernelType specifies whether this boot image contains a multi-processor (mp) or uni-processor (up) kernel.
• NetworkType identifies the network type: ent, tok, and so forth.
These days, the only combination most of us work with is: chrp.mp.ent. During a network boot, the boot image is transferred over the network and loaded into the client’s memory.

- /tftpboot
It is good practice to make /tftpboot a separate file system. This removes the risk of filling the root file system. If you are supporting multiple AIX versions on multiple machine types or multiple network types, this directory can get quite large.
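To make the boot image naming convention concrete, here is a small sketch that splits a boot image file name into its fields. It assumes the name ends in the three dot-separated fields described above (platform, kernel type, network type) and that everything before them is the SPOT name; the file name used is a hypothetical example.

```shell
#!/bin/sh
# Split a boot image file name into its fields, working from the right
# so that a SPOT name containing dots would still be handled.
parse_boot_image() {
    name=$1
    network=${name##*.}     # last field, for example ent or tok
    rest=${name%.*}
    kernel=${rest##*.}      # mp or up
    rest=${rest%.*}
    platform=${rest##*.}    # chrp, rspc, and so forth
    spot=${rest%.*}         # assumed: the remainder is the SPOT name
    printf 'spot=%s platform=%s kernel=%s network=%s\n' \
        "$spot" "$platform" "$kernel" "$network"
}

# Hypothetical file name as it might appear in /tftpboot
parse_boot_image spot61-01-00.chrp.mp.ent
# prints: spot=spot61-01-00 platform=chrp kernel=mp network=ent
```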

• Defining a SPOT resource
- Command line:
The visual shows the nim syntax to define a spot. The -t flag identifies the type of object you wish to define. In addition, you must specify the following required attributes:
• server= NIM name for the machine which serves this resource
• location= Directory (on the server) where the SPOT files are located
• source= This attribute points to the location of the files used to create the SPOT resource. This can be an existing lpp_source resource, a device name (for example: /dev/cd0), or a directory which contains the source filesets used to create the SPOT. Most commonly, the lpp_source resource is created first and then the spot is created from the lpp_source.
• The last argument on the nim command line is the name of the object you are operating on, in this case, the name of the SPOT resource we are creating.

- Optional attributes
There can be a number of optional attributes, including:
• installp_flags= NIM calls installp to create the SPOT. By default, NIM uses the -agX flags when calling installp. You can use installp_flags to specify the options you require.
• auto_expand={yes|no} Indicates whether file systems should be automatically expanded if additional space is needed.

- Defining a SPOT using SMIT
The visual shows the SMIT fast path for defining resource objects. SMIT opens with a window that allows you to select which type of resource you want to define. Once you select a resource type, SMIT opens a window with the necessary fields to specify the resources and attributes for that type of object, in this case, a SPOT.
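The SPOT definition described above could look like the following sketch, which builds a SPOT from an existing lpp_source and then lists the new object. The names lpp_6100 and spot_6100, the server name master, and the directory /export/spot are hypothetical. The commands are only printed unless NIM=nim and LSNIM=lsnim are set on a NIM master.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
NIM=${NIM:-echo nim}
LSNIM=${LSNIM:-echo lsnim}

define_spot() {
    # $1 = source (an lpp_source name, a device, or a directory)
    # $2 = name of the SPOT object to create (always the last argument)
    $NIM -o define -t spot -a server=master \
        -a location=/export/spot -a source="$1" \
        -a auto_expand=yes "$2"
}

show_object() {
    # lsnim -l lists the attributes of a NIM object
    $LSNIM -l "$1"
}

define_spot lpp_6100 spot_6100
show_object spot_6100
```

Creating the SPOT from the lpp_source, rather than directly from media, follows the common practice mentioned in the notes and keeps the two resources at matching levels.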


Instructor notes: Purpose — Cover how to define a SPOT. Details — Additional information — Transition statement — While we can use an lpp_source and matching SPOT to install a new operating system, quite often the network installs are actually recoveries of mksysb images. This is either to recover a lost rootvg or to clone an AIX image to other machines or LPARs. Let’s see how we define a mksysb resource.


resources objects: mksysb

• mksysb
  – Identifies a mksysb system backup image file
  – Used for bos_inst operations
• Defining a mksysb
  # nim -o define -t mksysb \
        -a server=ServerName \
        -a location=BackupFile \
        [ optional attributes ] \
        MksysbName
• # smit nim_mkres

Figure 4-12. resources objects: mksysb

Notes:
mksysb
A mksysb resource represents a system backup image file created using the mksysb command. A mksysb resource can be used as the source of the BOS run-time files when a bos_inst is performed.

Defining a mksysb resource
You can use the command line or SMIT to define a mksysb. You can use an existing mksysb image, or you can have nim create one for you. (nim calls mksysb to create the new backup.)

Required attributes are:
- server= NIM name for the machine which serves this resource
- location= If the system backup image already exists, enter the name of the file where the image resides. If you are creating the system backup image as part of this operation, enter the name of the file where you want the image placed after it is created.

There are a number of optional attributes, including:
- mk_image={yes|no} If the backup file already exists, specify no (the default). If you want nim to create a new backup file, specify yes.
- source= If you want nim to create a backup image for you, specify the NIM name of the machine you want to back up.
- mksysb_flags= You can use this attribute to specify optional flags for the mksysb command, if needed.
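The two ways of defining a mksysb resource can be sketched as follows: one registers an existing backup file, the other asks NIM to create the backup from a client. The client name lpar1, the file /export/mksysb/lpar1.mksysb, the server name master, and the resource name mksysb_lpar1 are hypothetical. The commands are only printed unless NIM=nim is set on a NIM master.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
NIM=${NIM:-echo nim}

define_existing_mksysb() {
    # $1 = existing backup file on the server, $2 = resource name
    # mk_image is left at its default (no) because the file already exists
    $NIM -o define -t mksysb -a server=master -a location="$1" "$2"
}

define_mksysb_from_client() {
    # $1 = NIM name of the machine to back up
    # $2 = file in which to place the new backup
    # $3 = resource name
    $NIM -o define -t mksysb -a server=master \
        -a location="$2" -a mk_image=yes -a source="$1" "$3"
}

define_existing_mksysb /export/mksysb/lpar1.mksysb mksysb_lpar1
define_mksysb_from_client lpar1 /export/mksysb/lpar1.mksysb mksysb_lpar1
```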


Instructor notes: Purpose — Cover the definition of a mksysb resource. Details — Additional information — Transition statement — Once we have our resources defined, we next need to define the machines we want to manage and the networks to which they are connected. Let’s start with the networks object.


networks objects

• Object types
  – ent       Ethernet network
  – fddi      FDDI network
  – tok       Token ring network
  – atm       ATM network (no network boot capability)
  – generic   Generic network (no network boot capability)
• Attributes
  – net_addr             Network address for a network
  – snm                  Subnetmask for a network
  – routing              Routing information for a network
  – Nstate, prev_state   Status attributes
  – . . . (Additional attributes)

[Diagram labels: master, router, client]

Figure 4-13. networks objects

Notes: In order to perform certain NIM operations, the NIM master must be able to supply information necessary to configure client network interfaces. The NIM master must also be able to verify that client machines can access all the resources provided by the NIM server. To avoid the overhead of repeatedly specifying network information for each individual client, NIM network objects are used to represent the networks in a NIM environment.

Network types
NIM supports the four network types shown in the visual, plus a generic type. Network boot support is provided for Ethernet, Token-Ring, and FDDI. Network boot operations are not supported on ATM or generic networks. NIM supports both standard Ethernet and IEEE 802.3 Ethernet networks.

Network attributes
Network attributes include the network address, subnet mask, routes, and status. The Nstate attribute indicates whether the object definition of the network is complete. NIM requires that all networks be able to communicate with the NIM master, either by the master being directly connected to them or by having a NIM route to a network to which the master connects.

Routing
NIM routing information represents standard TCP/IP routing information for the networks that are part of a NIM environment. This information defines the gateways that are used to establish communication between the master machine and the clients. The routing attribute defines a route and includes:
- A destination (default or a NIM network name)
- A gateway address
If needed, multiple routes can be created and are numbered routing1, routing2, and so forth.
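As an illustration of the network attributes and the routing attribute just described, the sketch below defines an Ethernet network object with a default route. The addresses and the name network1 are hypothetical example values. The command is only printed unless NIM=nim is set on a NIM master; note that the printed form loses the quotes around the routing1 value.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
NIM=${NIM:-echo nim}

define_ent_network() {
    # $1 = network address, $2 = subnet mask,
    # $3 = gateway address for the default route, $4 = NIM network name
    # routing1 holds a destination ("default") followed by the gateway
    $NIM -o define -t ent -a net_addr="$1" -a snm="$2" \
        -a routing1="default $3" "$4"
}

define_ent_network 192.168.10.0 255.255.255.0 192.168.10.1 network1
```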

Additional attributes
There are a number of other attributes for each network object. lsnim is probably the easiest way to get information about NIM attributes.

Other network information
The ring_speed (for token-ring) and cable_type (for Ethernet) are not attributes of the network objects; they are attributes of the machine objects.


Instructor notes: Purpose — Cover networks objects and their attributes. Details — Point out that we do not usually define a network object directly. Instead, the information provided when defining a machine is used to either match to an existing network object or to create a new network object. The most important point to make is that the networking information is from the perspective of the machine being defined. In the network diagram shown in the visual, when defining the client, it is the router interface which is in the network to the right that needs to be defined as the gateway. The network option is defining how the client would network boot in order to send a bootp request to the NIM server. Additional information — Unlike other network adapters, ATM adapters cannot be used to boot a machine. This means that installing a machine over an ATM network requires special processing (refer to the AIX Installation Guide and Reference, Chapter 20. Basic NIM Operations and Configuration for instructions). The generic network type is used to represent all other network types where network boot support is not available. For clients on generic networks, NIM operations that require a network boot, such as bos_inst and diag, are not supported. However, non-booting operations, such as cust and maint, are allowed. Transition statement — Next, let’s look at the machines object.


machines objects

• Object types
  – master
  – standalone
  – diskless
  – dataless
• Attributes
  – platform                      Architecture
  – netboot_kernel                Up or mp
  – if                            Network interface information
  – serves                        Resource served by this machine
  – Cstate, prev_state, Mstate    Status attributes
  – . . . (additional attributes)

[Diagram labels: Master, Standalone, Diskless, Dataless]

Figure 4-14. machines objects

Notes:
NIM supports four types of machines: the master type and three types of clients: stand-alone, diskless, and dataless.

Master
The master machine is defined by installing the master fileset, and then performing some quick configuration. There can only be one master in the NIM environment. Once a machine is defined as the master, it can participate in NIM operations.

Stand-alone clients
Stand-alone clients have local disk resources. They are installed from the NIM server, but once installed, they boot and operate from their local disks.

Diskless clients
Diskless clients have no disks of their own. They run entirely using resources from the NIM server.

Dataless clients
Dataless machines use a local disk only for paging space and the /tmp and /home file systems. All of the other storage is provided over the network by the NIM server.

Machine attributes
Each machine object belongs to one of the four machine object types. Additionally, machine objects store other attributes about the machine. The visual shows a few of them:
- The platform attribute describes the machine architecture (chrp, rspc, and so forth).
- netboot_kernel indicates which type of kernel is required, uni-processor (up) or multi-processor (mp).
- if is used to provide information about a machine’s network interfaces. If there are multiple interfaces, they are numbered: if1, if2, and so forth. This attribute includes the NIM network this interface connects to, the host name, the MAC address, and the network type.
- The serves attribute identifies resources that are served by this machine. If the machine serves several resources, there will be a serves attribute for each resource.
- Cstate indicates the NIM operation that is currently being performed on a machine, or that no NIM operations are currently being performed.
- prev_state shows the previous Cstate.
- Mstate shows the execution state for a machine.
Note: NIM attempts to keep the value of this attribute synchronized with the machine's execution state, but NIM does not guarantee its accuracy. Perform the check operation on the machine for NIM to attempt to determine the machine's execution state.

Additional attributes There are a number of other attributes for each machine object. lsnim is probably the easiest way to get information about NIM attributes.


Instructor notes: Purpose — Explain machine objects and their attributes. Details — Keep the focus on standalone and master. Diskless and dataless clients are rarely used these days; only provide the briefest of definitions for them and move on. Additional information — Transition statement — Let’s discuss how we would define a machine object, first from the command line, and then using SMIT.


Defining a machine object

• # nim -o define -t standalone -a platform=PlatformType \
        -a netboot_kernel=NetbootKernelType \
        -a if1=InterfaceDescription \
        -a net_definition=DefinitionName \
        -a cable_type1=TypeValue \
        MachineName

• Example:
  # nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
        -a cable_type1="N/A" -a connect=nimsh -a platform=chrp -a netboot_kernel=mp \
        lpar1

• # smit nim
    Perform NIM Administrative Tasks
      Manage Machines
        Define a Machine

Figure 4-15. Defining a machine object

Notes:
Follow these steps to add a client with the network information using SMIT:
• On the NIM master, add a standalone client to the NIM environment by using SMIT (nim_mkmac is the fast path).
• Specify the host name of the client.
- This is the name translation of the IP address of the install adapter of this machine. By default, this also becomes the hostname of this client when the client is installed. If using DNS, enter the long host name here (lpar1.my.company.com).
• The next SMIT screen displayed depends on whether NIM already has information about the client's network. Supply the values for the required fields or accept the defaults. Use the help information and the LIST option to help you specify the correct values to add the client machine.

The if1 quoted value in the example has multiple space-delimited fields, as follows:
- network1 is the network object name.
- lpar1 is the hostname.
- 0 is the placeholder for the MAC address.
- ent0 is the physical adapter used by the client to reach the master.
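The structure of the if1 value can be illustrated with a short sketch that splits the quoted string into its four space-delimited fields. The function name describe_if is a hypothetical helper written for this illustration and is not part of NIM.

```shell
#!/bin/sh
# Split an if1 value into its four space-delimited fields.
describe_if() {
    # Unquoted expansion is intentional: it splits the value on spaces
    # and reassigns the function's positional parameters.
    set -- $1
    printf 'network=%s hostname=%s mac=%s adapter=%s\n' "$1" "$2" "$3" "$4"
}

# The if1 value from the example on the visual
describe_if "network1 lpar1 0 ent0"
# prints: network=network1 hostname=lpar1 mac=0 adapter=ent0
```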


Instructor notes: Purpose — Cover how to define standalone machines. Details — Additional information — Transition statement — An easy way to define a machine is to use SMIT. The visual shows the SMIT menu path to use, but let’s look at the resulting dialog panel.


Define a client using SMIT

                              Define a Machine

* NIM Machine Name                              [lpar1]
* Machine Type                                  [standalone]    +
* Hardware Platform Type                        [chrp]          +
  Kernel to use for Network Boot                [mp]            +
  Communication Protocol used by client         [nimsh]         +
  Primary Network Install Interface
*   Cable Type                                   N/A            +
    Network Speed Setting                       []              +
    Network Duplex Setting                      []              +
*   NIM Network                                  network1
*   Host Name                                    lpar1
    Network Adapter Hardware Address            [0]
    Network Adapter Logical Device Name         [ent0]
  IPL ROM Emulation Device                      []              +/
  CPU Id                                        []
  Machine Group                                 []              +
  Comments                                      []

Figure 4-16. Define a client using SMIT

Notes:
NIM Machine Name/Host Name - There are two names given to your client: a NIM name and a hostname. The NIM name is what is used when performing operations on this client. The hostname becomes the system-wide hostname of this client and is also the name associated with the client's adapter that NIM uses to do the client install. In our case, we used a short name on the prior panel. Hence, the NIM name and hostname are identical. If we had used a long name on the prior panel, then we would see the long name for the hostname and the short name for the NIM name. For example, if we put lpar1.my.company.com on the prior panel, then the hostname would be lpar1.my.company.com and the NIM name would be lpar1.

Machine Type - Only one client machine type is used anymore: standalone.

Hardware Platform Type - You can choose between chrp, rspc or the really old classical rs6k. Since the chrp architecture came out in the mid 90s, most folks are using that today. If you want to double-check what architecture your client is using, run the command: getconf -a | grep MACHINE_ARCHITECTURE. On older AIX release levels, try the bootinfo -p command.

Kernel Type - If a client machine is running the 64-bit kernel, then mp should be chosen. However, if the client is running the 32-bit kernel, either the up or mp kernel may be chosen. To determine which kernel the client is currently running, run the ls -l /usr/lib/boot/unix command. Notice whether it is linked to the 64, up, or mp kernel in that same directory. The getconf -a command can also be run to determine whether the machine is capable of running an mp kernel. An MP_CAPABLE setting of 1 means yes. On older releases, run the bootinfo -z command to find out if the machine can handle mp. A setting of 1 again means yes. Starting with version 6.1, AIX only uses a 64-bit kernel.

Communication Protocol - Either the less secure remote shell protocol (rsh) or the newer nimsh protocol (which is available in AIX 5L 5.3 and later versions of AIX) may be used. Note: Each client can have a different setting.

Cable Type - Most configurations today are set to N/A (not applicable), as modern adapters either autosense the connection type or support only a single type (such as twisted pair or fiber). This can be double-checked by running the lsattr -El entX command and noticing whether the cable_type field is shown. If not, then setting it to N/A should work. If running twisted pair cable, then setting it to tp should work.

Network Speed/Duplex - These settings are only used when performing a push boot operation on the client. If not set, the current SMS speed/duplex settings for your install adapter are used.

NIM Network - This is the NIM network to which the client is assigned.

Hardware Address - This is the MAC address of the client. It is only needed for BOOTP broadcast operations. This MAC address, if ever needed, can be retrieved by looking at your client's Remote IPL SMS menus.

Logical Device Name - This is the name of the physical network adapter over which you plan to install. For example, it might be ent0 or ent1. This adapter receives the hostname you have set above on this screen in the Host Name field when the client is installed.

IPL ROM Emulation - This is only set for machines that do not support network boot. Please see online documentation for details.

CPU_Id - This is the machine ID retrieved from running the uname command on the client. It will be used to uniquely identify this client in the future. You do not have to set this; NIM will configure it.

Machine Group - You can assign a client to a machine group.

Command Line - The equivalent NIM command for the above operation is:
nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
    -a cable_type1="N/A" -a connect=nimsh \
    -a platform=chrp -a netboot_kernel=mp lpar1
Use the lsnim -q define -t standalone command for more information or see your nim man page.


Instructor notes: Purpose — Cover how to use the SMIT dialog panel to define a machine. Details — Additional information — Transition statement — Now that we have all of our objects defined, we only need to relate what resources are used with a machine, and then set up the NIM support to a particular operation. Let’s look at the various NIM operations and how they relate to the resource allocations.


NIM operations

• Operations on clients
  – bos_inst
    • rte
    • mksysb
  – cust
  – maint
  – diag
  – maint_boot
• Procedure
  – Allocate resources to clients (for intended operation)
  – Perform operation
  – Unallocate resources
• Other NIM object operations
  – define, change, remove, allocate, deallocate, maint, lslpp, lppchk, check, and so forth

Figure 4-17. NIM operations

Notes:
Operations on clients
NIM supports several different types of operations to install and manage software on NIM clients. In addition, there are operations to manage the NIM objects themselves. For the purposes of this class, we are primarily interested in three client operations:
- bos_inst Allows you to install AIX on a client
- cust and maint Allows you to update and maintain AIX software
- maint_boot Allows you to boot a client to maintenance mode over the network

bos_inst
A bos_inst operation is used to perform a Basic Operating System (BOS) installation on a client. There are two types of bos_inst operations: rte and mksysb.

bos_inst: rte installations
An rte install instructs the BOS installation process to install AIX from the images in the lpp_source resource specified for the operation. The default bos_inst operation is rte (runtime environment).

bos_inst: mksysb installations
A mksysb bos_inst operation installs the client from a mksysb resource. A mksysb resource is a system backup image created using the mksysb command (or the SMIT or WebSM interfaces to the mksysb command). Installing a system from backup reduces, and often eliminates, repetitive installation and configuration tasks. For example, a backup installation can copy optional software installed on the source system, in addition to the Base Operating System. The backup image also transfers many user configuration settings. If you have many clients with the same software configuration, you could use one mksysb image as the source to install all of them.

bos_inst customization
The NIM installation process provides the ability to invoke a customization script after AIX is installed on the system. This is done by allocating a script resource to the client before performing the bos_inst. That script could be used to perform such customization as setting passwords, changing network addresses, and so forth.

cust
This NIM operation performs software customization on a running NIM client. You can use the cust operation to:
- Update existing software
- Install additional software
- Run a customization script

maint
This NIM operation performs software maintenance operations on clients, such as committing applied software, removing software, and so forth.


diag
This NIM operation enables the client to boot to diagnostics over the network.

maint_boot
This operation enables the client to boot to maintenance mode over the network.

Procedure for operations
In order to perform a NIM operation on a client machine, there are a number of steps which must be performed:
1) Allocate the required resources to the client machine.
• This makes the resources available to the client. You can explicitly allocate the resources before you perform the NIM operation, or you can allocate the resources at the same time you perform the operation.
• Allocation usually involves NFS exporting the resource’s directory so the client can NFS mount it over the network.
• The initial boot image is actually transferred using tftp. To provide this network boot image, an entry is created in the /etc/bootptab file and files are created in the /tftpboot directory.
2) Perform the operation.
3) Deallocate resources.
• While a resource is allocated to a client, the resource is locked to block any changes. After the operation completes, the resources should be deallocated from the machine so they can be freed again for updates or changes.
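The three-step procedure above might be sketched as follows for an rte installation with explicit allocation. The client name lpar1 and the resource names spot_6100 and lpp_6100 are hypothetical. The commands are only printed unless NIM=nim is set on a NIM master, and on a real system the deallocate step would be run only after the operation has completed.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
NIM=${NIM:-echo nim}
CLIENT=lpar1    # hypothetical NIM client name

allocate_resources() {
    # Step 1: make the resources available to the client
    $NIM -o allocate -a spot=spot_6100 -a lpp_source=lpp_6100 "$CLIENT"
}

perform_operation() {
    # Step 2: enable the operation; boot_client=no leaves the
    # network boot to be initiated later
    $NIM -o bos_inst -a source=rte -a boot_client=no "$CLIENT"
}

release_resources() {
    # Step 3: deallocate everything allocated to the client
    $NIM -o deallocate -a subclass=all "$CLIENT"
}

allocate_resources
perform_operation
release_resources
```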

Other NIM object operations
In addition to operations which directly affect NIM clients, there are a number of NIM operations used for managing NIM objects. In addition to the obvious (define, change, remove, allocate, and deallocate), you can also:
- Update or add software to a spot or lpp_source resource. (cust operation)
- Perform software maintenance on a spot or lpp_source resource. (maint operation)
- List LPP information in a resource. (lslpp operation)
- Verify software packages in a spot or lpp_source resource. (lppchk operation)
- Check the status of a NIM object. (check operation) The actual tasks performed by the check operation differ depending on which type of object you are operating on.


Instructor notes: Purpose — Cover the various NIM operations that can be performed on machines. Details — Additional information — Transition statement — Let’s take a closer look at the most common NIM operation: setting up for installation of an operating system.


bos_inst operation

• Command line
  # nim -o bos_inst \
        -a lpp_source=LppSourceName \
        -a spot=SpotName \
        -a source={rte|mksysb} \
        -a mksysb=MksysbName \
        -a boot_client={yes|no} \
        [optional attributes] \
        ClientName
• # smit nim_bosinst

Figure 4-18. bos_inst operation

Notes:
bos_inst
Configuring NIM to perform a bos_inst can be done from the command line or through SMIT. There are two steps: allocating resources to the client and enabling the bos_inst. It is also possible to combine these steps into one command:
# nim -o bos_inst -a lpp_source=LppSourceName \
      -a spot=SpotName \
      [additional resources] \
      [-a source={rte|mksysb}] \
      [additional attributes] \
      ClientName
If you use SMIT to enable a bos_inst, SMIT opens a series of windows to prompt you for the required information and then displays a window where you can set additional optional attributes.


Required information
The required information for a bos_inst operation is:
- As always, the last argument specifies the NIM object you want to operate on. In this case, this is the target client machine that you wish to install.
- spot= Specifies the SPOT resource you wish to use.
- lpp_source= This is the name of the lpp_source resource you wish to use for the installation. In AIX 5L V5.3 and later, this attribute is not required for a mksysb install (see note below).

Optional information
Optional attributes include:
- source={rte|mksysb} mksysb= If you do not specify the source attribute, nim performs an rte bos_inst. If you set source=mksysb, then you must use the mksysb attribute to specify the name of the mksysb resource you wish to use.
Note: In most cases, you must still include an lpp_source resource, even if you are doing a mksysb install. With AIX 5L V5.3 and later, if you have created a mksysb that includes all devices, you do not need to specify an lpp_source.
- boot_client={yes|no} When set to yes, the master attempts to reboot the client machine automatically for reinstallation. For this option to succeed, the client must be running and initialized as a NIM client or have rhosts permissions granted to the master. If set to no, the server is configured to support the network boot, and the actual boot must be initiated later.
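The one-step form of bos_inst can be sketched for both installation types. The resource names lpp_6100, spot_6100, and mksysb_lpar1 and the client name lpar1 are hypothetical, and the mksysb variant assumes a backup that includes all devices, so that no lpp_source is specified. The commands are only printed unless NIM=nim is set on a NIM master.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
NIM=${NIM:-echo nim}

bos_inst_rte() {
    # $1 = lpp_source, $2 = SPOT, $3 = target client (last argument)
    $NIM -o bos_inst -a source=rte -a lpp_source="$1" -a spot="$2" \
        -a boot_client=no "$3"
}

bos_inst_mksysb() {
    # $1 = mksysb resource, $2 = SPOT, $3 = target client
    # Assumes the mksysb includes all devices, so lpp_source is omitted
    $NIM -o bos_inst -a source=mksysb -a mksysb="$1" -a spot="$2" \
        -a boot_client=no "$3"
}

bos_inst_rte lpp_6100 spot_6100 lpar1
bos_inst_mksysb mksysb_lpar1 spot_6100 lpar1
```

With boot_client=no, the master is only prepared to answer the client's network boot request; the boot itself still has to be started from the client.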


Instructor notes: Purpose — Cover how to set up for installing an operating system on a machine. Details — Additional information — Note: In the CSM environment, boot_client is normally set to no and the client is rebooted using the netboot script from the management server. Transition statement — Having run a bos_inst operation on a machine object, NIM is now prepared to respond to a network boot request from that machine. Network booting an LPAR is something that was covered in previous courses, so we will not repeat that discussion here (though the later lab exercises will have you practice this). This unit was just a high level introduction to NIM. To properly use NIM, there is much more you will need to understand. Let’s look at how you can build your skills beyond what has been taught here.


More information about NIM IBM Power Systems

•Documentation –NIM from A to Z in AIX 4.3 (http://www.redbooks.ibm.com/ ) –AIX Version 6.1 Installation Guide and Reference

•IBM Training class (AU08) –AIX 5L Network Installation Manager (NIM) (http://www.ibm.com/services/learning/index.html )

•EZ NIM –nim_master_setup, nim_client_setup

Figure 4-19. More information about NIM

Notes: More information about NIM NIM is a very powerful tool; it can be used in many different ways. In this topic, we introduced some basic NIM concepts and terminology. If you plan to make use of NIM in your cluster, we strongly recommend that you get more information so that you can use NIM most effectively.

Documentation and Redbook

The following books provide in-depth information about using NIM:
- AIX Version 6.1 Installation and migration
- AIX 5L Version 5.2 AIX Installation in a Partitioned Environment
- SG24-5524 NIM from A to Z in AIX 4.3 (Redbook: http://www.redbooks.ibm.com/)

The AIX documents listed above can be obtained by visiting the pSeries library at: http://www-1.ibm.com/servers/eserver/pseries/library

Classes You should also consider the following class. - AU08 AIX 5L Network Installation Management (NIM) (IBM Learning Services training course: http://www.ibm.com/services/learning/index.html)

EZNIM

The SMIT EZNIM feature helps the system administrator by organizing the commonly used NIM operations and simplifying frequently used advanced NIM operations. Features of SMIT EZNIM include:
- Task-oriented menus
- Automatic resource naming that includes the level of the software used to create the resources
- The ability to review, whenever possible, what steps will take place before executing a task

Use the smit eznim fast path to open the EZNIM main menu.

nim_master_setup

SMIT EZNIM has a command line equivalent: the nim_master_setup command. For reference, here is the nim_master_setup usage message:

# nim_master_setup -h
Usage nim_master_setup: Setup and configure NIM master.
nim_master_setup [-a mk_resource={yes|no}]
                 [-a file_system=]
                 [-a volume_group=]
                 [-a disk=]
                 [-a device=]
                 [-B] [-v]

  -B    Do not create mksysb resource.
  -v    Enable debug output.

Default values:
  mk_resource = yes
  file_system = /export/nim
  volume_group = rootvg
  device = /dev/cd0


nim_master_setup example

Here is an example to give you an idea of the NIM resources created by nim_master_setup (or EZNIM):

# nim_master_setup -a file_system=/csminstall/nim \
  [-a volume_group=othervg] -B

Since we did not specify the device attribute, nim_master_setup will use /dev/cd0 as the source to create the lpp_source. You can use -v for debug output or tail -f /var/adm/ras/nim.setup to get more information.

In this example, we show the output of various commands to illustrate what nim_master_setup has done.

# lsnim
master             machines     master
boot               resources    boot
nim_script         resources    nim_script
master_net         networks     ent
master_net_conf    resources    resolv_conf
bid_ow             resources    bosinst_data
520lpp_res         resources    lpp_source
520spot_res        resources    spot
basic_res_grp      groups       res_group

# df -k
...
/dev/lv10    1474560    491980    67%    11578    4%    /csminstall/nim
/dev/lv11      49152     47572     4%       17    1%    /tftpboot

# lsnim -l master_net
master_net:
   class      = networks
   type       = ent
   Nstate     = ready for use
   prev_state = ready for use
   net_addr   = 9.41.90.0
   snm        = 255.255.255.0
   routing1   = default 9.41.90.1

# lsnim -l master_net_conf
master_net_conf:
   class       = resources
   type        = resolv_conf
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /csminstall/nim/resolv.conf
   alloc_count = 0
   server      = master

Note: resolv.conf is the same as the file on the NIM master.

# lsnim -l bid_ow
bid_ow:
   class       = resources
   type        = bosinst_data
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /csminstall/nim/bid_ow
   alloc_count = 0
   server      = master

Note: bosinst.data is created with the following settings:
   install_method=overwrite
   prompt=no
   existing_system_overwrite=yes
   run_startup=yes
   accept_license=yes
   desktop=cde
   all_devices_kernels=yes
   . . .

# lsnim -l 520lpp_res
520lpp_res:
   class       = resources
   type        = lpp_source
   arch        = power
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /csminstall/nim/lpp_source/520lpp_res
   simages     = yes
   alloc_count = 0
   server      = master

# lsnim -l 520spot_res
520spot_res:
   class         = resources
   type          = spot
   plat_defined  = chrp
   arch          = power
   bos_license   = yes
   Rstate        = ready for use
   prev_state    = verification is being performed
   location      = /csminstall/nim/spot/520spot_res/usr
   version       = 5
   release       = 2
   mod           = 0
   oslevel_r     = 5200-01
   alloc_count   = 0
   server        = master
   Rstate_result = success
   mk_netboot    = yes
   mk_netboot    = yes

# lsnim -l basic_res_grp
basic_res_grp:
   class   = groups
   type    = res_group
   member1 = bid_ow
   member2 = 520lpp_res
   member3 = 520spot_res
   member4 = master_net_conf
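When checking several resources after nim_master_setup, it can help to pull just the Rstate out of the lsnim -l output. The following is a minimal sketch, assuming the output layout shown in this example (an object name followed by a colon, then indented "attribute = value" lines). The function name nim_rstate is hypothetical, and the sample input is copied from the listing above so the sketch runs without a NIM master.

```shell
#!/bin/sh
# Sketch: read "lsnim -l <resource>" output on stdin and print "<name> <Rstate>".
# nim_rstate is a hypothetical helper name, not an AIX command.
nim_rstate() {
    awk '
        # Unindented line ending in ":" is the object header, e.g. "520lpp_res:"
        /^[^ ].*:$/ { name = $0; sub(/:$/, "", name) }
        # Match only the Rstate attribute, not Rstate_result
        $1 == "Rstate" {
            state = $0
            sub(/^ *Rstate *= */, "", state)
            print name, state
        }
    '
}

# Sample input taken from the example output above.
nim_rstate <<'EOF'
520lpp_res:
   class       = resources
   type        = lpp_source
   Rstate      = ready for use
   prev_state  = unavailable for use
EOF
```

On a NIM master the same function could be fed directly, for example with lsnim -l 520lpp_res | nim_rstate.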


Instructor notes: Purpose — Cover sources for additional NIM information. Details — Additional information — Transition statement — One of the sources for additional NIM skills is the Network Installation Management course. Let’s briefly look at some of the additional skills covered in that course.


Additional topics in NIM course IBM Power Systems

• Push operations and unattended installations • lppsource and SPOT management issues • Problem determination • Customization scripts • Resource creation (lppsource, mksysb) options • Group definitions • Client software maintenance and bundles • Alternate disk migration • Security and networking issues • NIM based backup, recovery, and cloning

Figure 4-20. Additional topics in NIM course


Instructor notes: Purpose — Explain what additional skills are covered in the NIM course. Details — Additional information — Transition statement — Having explained how to do a basic installation and configuration of NIM, let’s actually implement this in the lab exercise.


Exercise 4 overview IBM Power Systems

• Configure an LPAR to be a NIM Master – Using an image which has an lpp_source subdirectory

• Set up for a network installation of a client

Figure 4-21. Exercise 4 overview


Instructor notes: Purpose — Direct the students to practice NIM in the lab exercise. Details — Additional information — Transition statement — Let’s review what we have covered with a checkpoint question.


Checkpoint IBM Power Systems

1. True or False: NIM can be used to fix an LPAR which fails to boot because of a problem with the /etc/inittab.

Figure 4-22. Checkpoint


Instructor notes:

Checkpoint solutions

1. True or False: NIM can be used to fix an LPAR which fails to boot because of a problem with the /etc/inittab.
   Answer: True, by using the maint_boot operation.


Unit summary IBM Power Systems

Having completed this unit, you should be able to: • Configure an AIX partition for use as a NIM master • Set up NIM to support the installation of AIX onto a client

Figure 4-23. Unit summary


Instructor notes: Purpose — Review material through the use of checkpoint questions. Details — Additional information — Transition statement — Let’s move on to the next unit.


Unit 5. System initialization: Part I

What this unit is about

This unit describes the boot process up to the point of loading the boot logical volume. It describes the content of the boot logical volume and how it can be re-created if it is corrupted. It also explains the meaning of the LED codes and how to analyze them to fix boot problems.

What you should be able to do

After completing this unit, you should be able to:
• Describe the boot process through to the loading of the boot logical volume
• Describe the contents of the boot logical volume
• Interpret LED codes displayed during system boot and at system halt
• Re-create the boot logical volume on a system which is failing to boot
• Adjust the bootlist for the desired order of search
• Describe the features of a service processor

How you will check your progress

Accountability:
• Checkpoint questions
• Exercise

References

Online       AIX Version 6.1 Operating system and device management
SA38-0509    RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems (at http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp)
SG24-5496    Problem Solving and Troubleshooting in AIX 5L (Redbook)

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems

Unit objectives IBM Power Systems

After completing this unit, you should be able to:
• Describe the boot process through to the loading of the boot logical volume
• Describe the contents of the boot logical volume
• Re-create the boot logical volume on a system which is failing to boot
• Interpret LED codes during boot
• Adjust the bootlist for the desired order of search

Figure 5-1. Unit objectives

Notes: Introduction Hardware and software problems might cause a system to stop during the boot process. This unit describes the boot process of loading the boot image from the boot logical volume and provides the knowledge a system administrator needs to analyze the boot problem.


Instructor Notes: Purpose — Present the objectives of this unit. Details — Describe that boot errors are very frequent errors. Fixing these problems requires a good knowledge of the boot process. Additional information — Transition statement — Let’s start with an overview about the AIX boot process.


5.1. System startup process Instructor topic introduction What students will do — The students will identify the boot process of loading the boot logical volume. Additionally, students will be able to explain how bootlists are managed on the different hardware architectures and how to create a new boot logical volume. How students will do it — Through lecture and review questions. What students will learn — Students will: • Discover how an AIX system boots • Identify how the boot logical volume is used during the boot process • Identify how to manage bootlists • Identify how to create a new boot logical volume How this will help students on their job — By having a good understanding of the boot process, solving any boot problem is much easier.


How does a System p server or LPAR boot? IBM Power Systems

Boot steps and possible failures:
1. Check and initialize the hardware (POST). Possible failure: hardware error; unlikely with LPAR.
2. Locate boot image using the boot list. Possible failure: unable to find any boot image.
3. Load and pass control to boot image. Possible failure: boot image corrupted.
4. Start AIX software initialization.

Figure 5-2. How does a System p server or LPAR boot?

Notes: Check and initialize hardware (POST) After powering on a machine, the hardware is checked and initialized. This phase is called the Power On Self Test (POST). The goal of the POST is to verify the functionality of the hardware.

Locate and load the boot image After the POST is complete, a boot image is located from the bootlist and is loaded into memory. During a normal boot, the location of the boot image is usually a hard drive. Besides hard drives, the boot image could be loaded from tape, CD-ROM, or the network. This is the case when booting into maintenance mode. If working with the Network Installation Manager (NIM), the boot image is loaded through the network.


To use an alternate boot location, you must invoke the appropriate bootlist by pressing function keys during the boot process. There is more information on bootlists later in the unit.

Last steps Passing control to the operating system means that the AIX kernel (which has just been loaded from the boot image) takes over from the system firmware that was used to find and load the boot image. The operating system is then responsible for completing the boot sequence. The components of the boot image are discussed later in this unit. All devices are configured during the boot process. This is performed in different phases of the boot by the cfgmgr utility. Towards the end of the boot sequence, the init process is started and processes the /etc/inittab file.


Instructor notes: Purpose — Introduce the AIX boot process. Keep this at the overview level. Details — Additional information — You might mention at this point that logical key switches are used to determine which bootlist is used. If you press F5 or numeric 5, the system tries to boot from a default bootlist that contains the diskette, CD-ROM, hard drive, and network. If it boots from the hard drive, it will load AIX diagnostics rather than perform a normal boot. Transition statement — Let’s show how the boot image is loaded from the boot logical volume when booting from disk.


Loading of a boot image IBM Power Systems

[Diagram labels: Firmware; Boot devices: (1) Diskette, (2) CD-ROM, (3) Internal disk, (4) Network; RAM; Bootstrap code; Boot Logical Volume (hd5); hdisk0; Boot controller]

Figure 5-3. Loading of a boot image

Notes: Introduction This visual shows how the boot logical volume is found during the AIX boot process. Machines use one or more bootlists to identify a boot device. The bootlist is part of the firmware.

Bootstrap code System p and pSeries systems can manage several different operating systems. The hardware is not bound to the software. The first block of the boot disk contains bootstrap code that is loaded into RAM during the boot process. This part is sometimes referred to as System Read Only Storage (ROS). The bootstrap code gets control. The task of this code is to locate the boot logical volume on the disk, and load the boot image. In some technical manuals, this second part is called the Software ROS. In the case of AIX, the boot image is loaded.


Compression of boot image To save disk space, the boot image is compressed on the disk. During the boot process the boot image is uncompressed and the AIX kernel gets boot control.


Instructor notes: Purpose — Explain the loading of a boot image. Details — Additional information — Explain that many different error situations may come up. The bootlist might be incorrect, the disk could be damaged, the boot record might be wrong and more. LED codes should help to analyze the different kind of errors. Transition statement — Let’s see what parts belong to the boot logical volume.


Contents of the boot logical volume (hd5) IBM Power Systems

AIX Kernel

RAMFS

Reduced ODM

Figure 5-4. Contents of the boot logical volume (hd5)

Notes: AIX kernel The AIX kernel is the core of the operating system and provides basic services like process, memory, and device management. The AIX kernel is always loaded from the boot logical volume. There is a copy of the AIX kernel in the hd4 file system (under the name /unix), but this program has no role in system initialization. Never remove /unix, because it is used for rebuilding the kernel in the boot logical volume.

RAMFS This RAMFS is a reduced or miniature root file system which is loaded into memory and used as if it were a disk-based file system. The contents of the RAMFS are slightly different depending on the type of system boot:

Type of boot and contents of the RAM file system:
- Boot from system hard disk: Programs and data necessary to access rootvg and bring up the rest of AIX. When booted in service mode, it will boot a diagnostics facility.
- Boot from the Installation CD-ROM: Programs and data necessary to install AIX or perform software maintenance.
- Boot from Diagnostics CD-ROM: Programs and data necessary to execute standalone diagnostics.

Reduced ODM The boot logical volume contains a reduced copy of the ODM. During the boot process, many devices are configured before hd4 is available. For these devices, the corresponding ODM files must be stored in the boot logical volume.


Instructor notes: Purpose — Describe the components of the BLV. Details — Introduce the different components as described in the student material. Describe that the AIX kernel from the BLV is used during the boot process. Additional information — Describe what the term reduced ODM means. Explain that device support is available only for devices that are marked as base devices in PdDv. The protofiles (in /usr/lib/boot and /usr/lib/boot/protoext) are used by the bosboot command to determine which files should be put into the RAMFS image that is included in the boot image. Transition statement — Many system boot problems involve being unable to locate a good boot image. In order to fix these problems, we often need to boot into special modes. Let’s look at what determines which boot device is used.


5.2. Unable to find boot image Instructor topic introduction What students will do — Learn how to control the boot sequence and how to boot to SMS mode. How students will do it — Lecture, discussion, review questions, and lab exercises What students will learn — How to boot to SMS mode and fix bootlist problems How this will help students on their job — They will be better prepared to deal with problems booting a system.


Working with bootlists IBM Power Systems

• Normal bootlist:
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5

• Customizable service bootlist:
# bootlist -m service -o
cd0
hdisk0 blv=hd5
ent0

• Service bootlist over network:
# bootlist -m service ent0 gateway=192.168.1.1 \
  bserver=192.168.10.3 client=192.168.1.57

Figure 5-5. Working with bootlists

Notes: Introduction You can use the command bootlist or diag from the command line to change or display the bootlists. You can also use the System Management Services (SMS) programs. SMS is covered on the next visual.

bootlist command The bootlist command is the easiest way to change the bootlist. The first example shows how to change the bootlist for a normal boot. In this example, we boot either from hdisk0 or hdisk1. To query the bootlist, you can use the -o option. The second example shows how to display the customizable service bootlist.
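As a small illustration of querying the bootlist, the sketch below compares the device order reported by bootlist -o with an expected order. It is not from the course material: the helper name bootlist_matches is hypothetical, and it assumes the output format shown on the visual (one device per line, optionally followed by attributes such as blv=hd5). The sample input is copied from the visual so the sketch runs without AIX.

```shell
#!/bin/sh
# Sketch: read "bootlist -m <mode> -o" output on stdin and compare the device
# order with an expected, space-separated list. Hypothetical helper name.
# usage: bootlist -m normal -o | bootlist_matches "hdisk0 hdisk1"
bootlist_matches() {
    expected=$1
    # Keep only the device name (first field) of each non-empty line.
    actual=$(awk 'NF > 0 { printf "%s%s", sep, $1; sep = " " } END { print "" }')
    if [ "$actual" = "$expected" ]; then
        echo "bootlist OK: $actual"
    else
        echo "bootlist differs: expected '$expected', found '$actual'"
        return 1
    fi
}

# Sample input taken from the normal bootlist shown on the visual.
bootlist_matches "hdisk0 hdisk1" <<'EOF'
hdisk0 blv=hd5
hdisk1 blv=hd5
EOF
```

A check like this could be run after changing the bootlist to confirm that the intended order was stored.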


The bootlist command also allows you to specify IP parameters when using a network adapter:

# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3 \
  client=192.168.1.57

Using the service bootlist in this way allows you to boot to maintenance or diagnostics using a NIM server without having to use SMS to specify the network adapter as the boot device.
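Because a mistyped address in the service bootlist is only noticed at the next network boot, it can be useful to validate the IP parameters first. The following dry-run sketch is an illustration only: the function names is_ipv4 and build_net_bootlist_cmd are hypothetical, and the helper prints the bootlist command instead of running it.

```shell
#!/bin/sh
# Sketch: validate the IP parameters, then print (not run) the bootlist command.
# Both function names are hypothetical helpers, not AIX commands.

# Succeeds if $1 looks like a dotted-decimal IPv4 address (four fields, 0-255).
is_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    '
}

# usage: build_net_bootlist_cmd ADAPTER GATEWAY BSERVER CLIENT
build_net_bootlist_cmd() {
    adapter=$1
    gateway=$2
    bserver=$3
    client=$4

    if [ -z "$adapter" ]; then
        echo "usage: build_net_bootlist_cmd ADAPTER GATEWAY BSERVER CLIENT" >&2
        return 1
    fi
    for addr in "$gateway" "$bserver" "$client"; do
        if ! is_ipv4 "$addr"; then
            echo "invalid IPv4 address: '$addr'" >&2
            return 1
        fi
    done

    echo "bootlist -m service $adapter gateway=$gateway bserver=$bserver client=$client"
}

# Example using the addresses from the text.
build_net_bootlist_cmd ent0 192.168.1.1 192.168.10.3 192.168.1.57
```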

Types of bootlists

The normal bootlist is used during a normal boot. The default bootlist (hard coded in the firmware) is called when F5 or numeric 5 is pressed during the boot sequence.

Most machines, in addition to the default bootlist and the customized normal bootlist, allow for a customized service bootlist. This is set using mode service with the bootlist command. The service bootlist is called when F6 is pressed during boot. For POWER5 and POWER6 systems, the numeric 6 key is used.

For machines which are partitioned into logical partitions, the HMC is used to boot the partitions. The HMC lets you specify the boot mode, which eliminates the need to time the pressing of special keys. Pressing either 5/F5 or 6/F6 causes a service mode boot, and a service mode boot using a boot logical volume results in booting to diagnostics. For this reason, the HMC refers to these options as booting to diagnostics with either the default bootlist or the stored (customizable) bootlist.

Here is a list summarizing the boot modes and the manual keys associated with them (this may vary depending on the model of your machine):
- F1 (graphic console) or 1 (ASCII console and newer models): Start an SMS (System Management Services) mode boot.
- F5 (graphic console) or 5 (ASCII console and newer models): Start a service mode boot using the default service bootlist (which searches the removable media first).
- F6 (graphic console) or 6 (ASCII console and newer models): Start a service mode boot using the customized service bootlist.

You may find variations on the different models of AIX systems. Refer to the User’s Guide for your specific model at: http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.pseries.doc/hardware.htm


Instructor notes:
Purpose: Describe how to work with the bootlists.
Details:
Additional information: The bootlist command accepts one more mode, called both. As you might suspect, the both mode sets the service and normal bootlists at the same time to the same value.
Transition statement: The SMS programs provide another method to set a bootlist. Let’s take a look at how to start SMS.


Starting System Management Services IBM Power Systems

• Reboot or power on the system
• Press F1 or numeric 1 or specify SMS on HMC activate

[Screen illustration: the console fills with rows of "IBM" during firmware startup, then shows the key prompts "1 = SMS Menu", "5 = Default Boot List", "6 = Stored Boot List", and "8 = Open Firmware Prompt", followed by the device names Memory, Keyboard, Network, SCSI, ...]

Figure 5-6. Starting System Management Services

Notes: SMS ASCII and graphic modes You can also change the bootlist with the System Management Services (SMS). The SMS programs are integrated into the hardware (they reside in NVRAM). The visual shows how to start the System Management Services in ASCII mode seen on newer systems. There is an equivalent graphic menu seen on older systems. During system boot, shortly before the firmware looks for a boot image, it discovers some basic hardware on the system. At this point the LED usually will display a value of E1F1. As the devices are discovered, either a text name or graphic icon for the resource will display on the screen. The second device discovered is usually the keyboard. When the keyboard is discovered, a unique double beep tone is usually sounded. Having discovered the keyboard, the system is ready to accept input that will override the default behavior of conducting a normal boot. Once the last icon or name is displayed, the system starts to use the bootlist to find the boot image and it is too late to change it.


One of the keyboard actions you may do during this brief period of time is to press the F1 (or numeric 1) key to request that the system boot using SMS firmware code.

SMS on LPAR systems

To start SMS using a partition profile under a POWER4 HMC: From the Server and Partition: Server Management application, select the profile for the partition and change the boot mode to SMS. Then activate the partition using this profile. Be sure to check the Open Terminal box when activating.

To start SMS using the Advanced option for power-on under a POWER5 or POWER6 HMC: Activate the partition using the SMS boot mode. To do this, click the Advanced button when activating the partition and select SMS in the Boot Mode drop-down. Do not forget to choose to open a terminal window. The partition stops at the SMS menu.
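Where the HMC command line is used instead of the graphical interface, the same activation can be scripted. The following is a minimal sketch, assuming that the HMC commands chsysstate (with its -b sms boot mode) and mkvterm are available at your HMC level. The managed system, partition, and profile names are hypothetical placeholders, and the run helper is an addition for this sketch that only prints each command unless DRY_RUN=0 is set.

```shell
# Sketch only: intended for the HMC command line, not the AIX partition.
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

MS="Server-9117-MMA"           # managed system name (hypothetical)
LPAR="aixlpar1"                # partition name (hypothetical)
PROFILE="default"              # partition profile name (hypothetical)

# Activate the partition and request the SMS boot mode.
run chsysstate -r lpar -m "$MS" -o on -n "$LPAR" -f "$PROFILE" -b sms

# Open a virtual terminal so that the SMS menu can be used.
run mkvterm -m "$MS" -p "$LPAR"
```

With the default DRY_RUN=1 the script only echoes the two commands, which makes it safe to review before use.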


Instructor notes: Purpose — Explain how to start SMS. Details — Additional information — Transition statement — How do you change the bootlist in SMS?


Working with bootlists in SMS (1 of 2)
IBM Power Systems

System Management Services Main Menu
  1. Select Language
  2. Setup Remote IPL (Initial Program Load)
  3. Change SCSI Settings
  4. Select Console
  5. Select Boot Options
  ===> 5

Multiboot
  1. Select Install/Boot Device
  2. Configure Boot Device Order
  3. Multiboot Startup
  ===> 2

Configure Boot Device Order
  1. Select 1st Boot Device
  2. Select 2nd Boot Device
  3. Select 3rd Boot Device
  4. Select 4th Boot Device
  5. Select 5th Boot Device
  6. Display Current Setting
  7. Restore Default Setting
  ===> 1

Select Device Type
  1. Diskette
  2. Tape
  3. CD/DVD
  4. IDE
  5. Hard Drive
  6. Network
  7. None
  8. List All Devices
  ===> 8

Figure 5-7. Working with bootlists in SMS (1 of 2)


Notes:

Working with the bootlist

In the System Management Services main menu, choose Select Boot Options to work with the bootlist. The menu differs between models and firmware levels, but the one shown here is fairly standard and is the one the firmware uses when booting a logical partition.

The next screen is the Multiboot menu. It lets you specify a device to boot from right now, modify the customized bootlists (with the intent of booting using one of them), or request that you be prompted at each boot for the device to boot from (the multiboot option). The focus here is the second option, which modifies the customized bootlist.

The Configure Boot Device Order panel lets you either display or modify the bootlist. You select the position in the bootlist that you want to modify, and the firmware then prompts you to identify the device to use in that position.


Select the device type. If you do not have many bootable devices, it is often easier to use the List All Devices option.

It is important to understand that when SMS is used to modify the bootlist, both the normal bootlist and the service bootlist are modified. If you want them to differ, you need to recustomize them later, when you have a command prompt (such as in multiuser mode).
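Recustomizing the two lists from a command prompt might look like the following sketch. The device names hdisk0 and cd0 are assumptions for illustration, and the run helper is an addition for this sketch that only prints each command unless DRY_RUN=0 is set.

```shell
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Normal bootlist: boot from the internal disk only (hdisk0 is an assumption).
run bootlist -m normal hdisk0

# Service bootlist: try the CD drive first, then the disk (cd0 is an assumption).
run bootlist -m service cd0 hdisk0

# Display both lists to confirm that they now differ.
run bootlist -m normal -o
run bootlist -m service -o
```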


Instructor notes:

Purpose — Show how to change the bootlist in SMS.

Details — When you use SMS to change the bootlist, you change both the normal and the service customizable bootlists. After fixing the problem at hand, you may wish to use the bootlist command to recustomize them if you want them to be different.

Additional information — The following keys are used (follow with the HMC identifying text):
- F1 or numeric 1: Start System Management Services
- F5 or numeric 5: Boot in diagnostic mode, use default bootlist
- F6 or numeric 6: Boot in diagnostic mode, use nondefault bootlist

The default bootlist is set to diskette, CD-ROM, internal disk, and any communication adapter. To boot diagnostics from disk, do not insert a CD, and request the default bootlist (press the appropriate key, F5 or numeric 5, or specify it with the HMC).

The other options: Boot versus Multiboot. Under Select Boot Options, there is a multiboot mode item. This is a toggle that turns multiboot mode on or off. If you turn it on, the system boots to an SMS menu every time you boot the system in normal mode, so that you can choose where to boot from each time. For example, you might have different versions of AIX on different hard disks and want to alternate between them. If an SMS menu is displayed when performing a normal boot, this might be the reason.

Transition statement — Once we have selected the category of boot device, we need to select the particular device we wish to use in the identified position in the bootlist. Let’s see how we do this.


Working with bootlists in SMS (2 of 2)
IBM Power Systems

Select Device
(columns: Device Number, Current Position, Device Name)
  1. IBM 10/100/1000 Base-TX PCI-X Adapter ( loc=U789D.001.DQDWAYT-P1-C5-T1 )
  2. SAS 73407 MB Harddisk, part=2 (AIX 6.1.0) ( loc=U789D.001.DQDWAYT-P3-D1 )
  3. SATA CD-ROM ( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
  4. None
  ===> 2

Select Task
  SAS 73407 MB Harddisk, part=2 (AIX 6.1.0) ( loc=U789D.001.DQDWAYT-P3-D1 )
  1. Information
  2. Set Boot Sequence: Configure as 1st Boot Device
  ===> 2

Current Boot Sequence
  1. SAS 73407 MB Harddisk, part=2 (AIX 6.1.0) ( loc=U789D.001.DQDWAYT-P3-D1 )
  2. None
  3. None
  4. None

Figure 5-8. Working with bootlists in SMS (2 of 2)


Notes:

Selecting bootlist devices

For each position in the bootlist, you can select a device. The location code shown with each device lets you uniquely identify devices that might otherwise be confused. Once you have selected a device, you need to “set” that selection. You can repeat this for each position. Alternatively, you can clear a position by specifying None for it.

Exiting SMS always triggers a boot attempt. If you have not specified a particular device for this boot, the system uses the bootlist you have set in SMS.
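The location codes are easier to match in SMS if they were recorded while AIX was still running. The following sketch shows one way to collect them; hdisk0 is an assumed device name, and the run helper is an addition for this sketch that only prints each command unless DRY_RUN=0 is set.

```shell
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# List the disks that AIX has configured.
run lsdev -Cc disk

# Show the physical location code of one disk (hdisk0 is an assumption).
run lscfg -l hdisk0

# Show the device that the system last booted from.
run bootinfo -b
```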


Instructor notes: Purpose — Complete the walkthrough of how to change a bootlist in SMS. Details — Additional information — Transition statement — Let’s next discuss how to handle a corruption of the boot logical volume.


5.3. Corrupted boot logical volume

Instructor topic introduction

What students will do — The students will learn to boot to maintenance mode, access the rootvg, and repair a corrupted BLV.

How students will do it — Through lecture and lab exercise.

What students will learn — Students will:
• Learn how to boot a system in maintenance mode
• Learn how to select the correct disk to be accessed
• Learn how to rebuild the BLV

How this will help students on their job — They will learn how to fix situations where the system will not boot due to a corrupted BLV.


Boot device alternatives (1 of 2)
IBM Power Systems

• Boot device is either:
  – First one found with a boot image in bootlist
  – Device specified in SMS Select Install/Boot Device
• If boot device is removable media (CD, DVD, tape): boots to the Install and Maintenance menu
• If the boot device is a network adapter: boot result depends on NIM configuration for client machine:
  – nim -o bos_inst : Install and Maintenance menu
  – nim -o maint_boot : Maintenance menu
  – nim -o diag : Diagnostic menu

Figure 5-9. Boot device alternatives (1 of 2)


Notes:

Boot alternatives

The device the system boots from is the first one in the designated bootlist on which it finds a boot image. Whenever the effective boot device is bootable media, such as a mksysb tape/CD/DVD or installation media, the system boots to the Install and Maintenance menu.

If the boot device is a network adapter, the mode of boot depends on the configuration of the NIM server that services the network boot request. If the NIM server is configured to support an AIX installation or a mksysb recovery, the system boots to Install and Maintenance. If it is configured to serve a maintenance image, the system boots to a Maintenance menu (a submenu of Install and Maintenance). If it is configured to serve a diagnostic image, the system boots to a diagnostic mode.

There are other ways to boot to a diagnostic utility. If the boot device is a CD drive with a diagnostic CD in it, the system boots into that diagnostic utility. If a service mode boot is requested and the boot device is a hard drive with a boot logical volume, the system boots into the diagnostic utilities.

The system can be signaled which bootlist to use during the boot process. The default is to use the normal bootlist and boot in normal mode. This can be changed during the window of opportunity between the moment the system discovers the keyboard and the moment it commits to the default boot mode. The signal can be generated from the system console (which may be an HMC-provided virtual terminal) or from a workstation attached to the service processor (such as an HMC), which can simulate a keyboard signal at the right moment.

The keyboard signal can vary from firmware to firmware, but the most common is a numeric 5 to indicate that the firmware should use the default service bootlist and a numeric 6 to indicate that it should use the customizable (stored) service bootlist. Either signal results in a service mode boot, which, as stated above, leads to diagnostic mode when booting from a boot logical volume on your hard drive.

With an HMC, you can specify which signal to send as part of the LPAR activation. Even if you forget to override the default boot mode (usually normal, to multiuser), you can still use the virtual console keyboard as described to override it once the keyboard has been discovered.


Instructor notes: Purpose — Explain how the boot mode is controlled. Details — Additional information — Transition statement — Let’s continue to look at the factors that affect boot behavior.


Boot device alternatives (2 of 2)
IBM Power Systems

• If boot device is a disk, boot depends on “service key” usage:
  – Normal mode boot: boot to multi-user
  – Service mode boot: Diagnostic menu
  – Two types of service mode boots:
    • Requesting default service bootlist (key 5 or F5)
    • Requesting customized service bootlist (key 6 or F6)
• HMC advanced boot options support all of the above:
  – Normal boot
  – Diagnostic with default bootlist
  – Diagnostic with stored bootlist

Figure 5-10. Boot device alternatives (2 of 2)


Notes:

Booting off a disk with a boot logical volume (BLV)

When the boot device is a disk on your system, the disk must have a valid boot logical volume for the boot to succeed. The result depends on the boot mode. In normal mode, the system boots into multiuser mode (the default run level in the inittab). In a service mode boot (using either the default bootlist or the customizable service bootlist), the system runs a diagnostics program and presents a diagnostics menu.

Note that with the HMC advanced activation options, you can set the boot mode and, for service mode, which bootlist to use: default or stored (customized service).
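The combinations described on the two "Boot device alternatives" visuals can be summarized as a small lookup. The function below is purely an illustrative sketch: boot_result and its argument names are hypothetical and not part of AIX, and the special case of a diagnostic CD is deliberately not modeled.

```shell
# boot_result DEVICE_TYPE MODE [NIM_OPERATION]
#   DEVICE_TYPE:   media | network | disk
#   MODE:          normal | service   (only relevant for disk)
#   NIM_OPERATION: bos_inst | maint_boot | diag   (only relevant for network)
boot_result() {
    case "$1" in
        media)
            # Installation media or a mksysb tape/CD/DVD.
            echo "Install and Maintenance menu" ;;
        network)
            # The result follows the operation configured on the NIM server.
            case "$3" in
                bos_inst)   echo "Install and Maintenance menu" ;;
                maint_boot) echo "Maintenance menu" ;;
                diag)       echo "Diagnostic menu" ;;
                *)          echo "depends on NIM configuration" ;;
            esac ;;
        disk)
            # Keys 5 and 6 both request a service mode boot.
            case "$2" in
                normal)  echo "multiuser mode" ;;
                service) echo "Diagnostic menu" ;;
                *)       echo "unknown boot mode" ;;
            esac ;;
        *)
            echo "unknown device type" ;;
    esac
}

boot_result disk service                  # prints "Diagnostic menu"
boot_result network service maint_boot    # prints "Maintenance menu"
```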


Instructor notes: Purpose — Continue covering the factors that affect boot behavior. Details — Additional information — Transition statement — Let’s use what we have just learned to effect a boot to maintenance mode.


Accessing a system that will not boot
IBM Power Systems

Boot the system from the BOS CD-ROM, tape, or network device (NIM)
  (HMC Advanced Activate options: Default bootlist)

Select maintenance mode
  Maintenance
  1. Access a Root Volume Group
  2. Copy a System Dump to Media
  3. Access Advanced Maintenance
  4. Install from a System Backup

Perform corrective actions

Recover data

Figure 5-11. Accessing a system that will not boot


Notes:

Introduction

The visual shows an overview of how to access a system that will not boot normally. Maintenance mode can be started from an AIX CD, an AIX bootable tape (such as a mksysb), or a network device that has been prepared to access a NIM master. The devices that contain the boot media must be included in the bootlists.

Boot into maintenance mode

To boot into maintenance mode:
- AIX 5L V5.3 and AIX 6.1 systems support the bootlist command and booting from a mksysb tape, but the tape device is, by default, not part of the boot sequence.
- If you plan to boot from media in an LPAR environment, check that the device adapter slot is allocated to the LPAR in question. If it is not, you may need to update the partition profile to allocate that device. If the device is currently allocated to another LPAR, you first need to deallocate it from that LPAR. Use a dynamic LPAR operation on the HMC to allocate that slot.
- If using the default bootlist, the sequence is fixed and the CD drive is the first practical device.
- If using a tape drive or a network adapter as your boot device and not selecting a boot device through SMS for this particular boot, you need to use one of the customizable bootlists, usually the service bootlist.
- Verify your bootlist, but do not forget that some machines do not have a service bootlist. Check that your boot device is part of the bootlist:
  # bootlist -m service -o
- If you want to boot from your internal tape device, you need to change the bootlist because the tape device by default is not part of the bootlist. For example:
  # bootlist -m service rmt0 hdisk0
- Whichever bootlist you are using, insert the boot media (either tape or CD) into the drive.
- Power on the system (or activate the LPAR). The system begins booting from the installation media. After several minutes, c31 is displayed in the LED/LCD panel (or as the reference code on the HMC display), which means that the software is prompting on the console for input. For an LPAR, you need to have the virtual console started to interact with the prompts.
- Normally, you are prompted to select the console device and then the language. After making these selections, you see the Installation and Maintenance menu.

For partitioned systems with an HMC, you would normally use the HMC to access SMS and then select the bootable device, which bypasses the bootlist.

You can also use a NIM server to boot to maintenance. For this, you need to place your system’s network adapter in your customized service bootlist before any other bootable devices, or use SMS to specifically request a boot over that adapter (the latter option is most common).

Here is an example of setting the service bootlist:
# bootlist -m service ent0 gateway=192.168.1.1 \
  bserver=192.168.10.3 client=192.168.1.57

You would also need to set up the NIM server to provide a boot image for a maintenance boot. For example, at the NIM server:
# nim -o maint_boot -a spot=<spot name> <client name>
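On the NIM master, enabling and then checking a maintenance boot might look like the following sketch. The client and SPOT object names are hypothetical, the exact state text reported by lsnim is not shown because it varies, and the run helper is an addition for this sketch that only prints each command unless DRY_RUN=0 is set.

```shell
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

CLIENT="lpar57"                # NIM machine object name (hypothetical)
SPOT="spot_aix61"              # SPOT resource name (hypothetical)

# Enable a maintenance boot for the client.
run nim -o maint_boot -a spot="$SPOT" "$CLIENT"

# Inspect the client definition; its state attributes show the pending operation.
run lsnim -l "$CLIENT"

# If the maintenance boot is no longer wanted, reset the client's NIM state.
run nim -o reset "$CLIENT"
```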

Installation and Maintenance

>>> 1 Start Install Now with Default Settings
    2 Change/Show Installation Settings and Install
    3 Start Maintenance Mode for System Recovery
    4 Configure Network Disks (iSCSI)

>>> Choice [1]: 3

Maintenance

Type the number of your choice and press Enter.

>>> 1 Access a Root Volume Group
    2 Copy a System Dump to Removable Media
    3 Access Advanced Maintenance Functions
    4 Erase Disks
    5 Configure Network Disks (iSCSI)
    6 Install from a System Backup

Choice [1]: 1


Figure 5-12. Booting in maintenance mode


Notes:

First steps

When booting in maintenance mode, you first have to identify the system console that will be used, for example your virtual console (vty), graphic console (lft), or serial attached console (a tty attached to the S1 port). After you select the console, the Installation and Maintenance menu is shown. Because we want to work in maintenance mode, we use selection 3 to start the Maintenance menu. In a network boot using NIM, the console goes straight to the Maintenance menu.

From this point, we access our rootvg to execute any system recovery steps that may be necessary.


Instructor notes: Purpose — Explain the first maintenance menus that are shown. Details — Describe how to start up the maintenance mode. Additional information — You could, optionally, briefly explain the other steps that can be executed from the Maintenance menu: copying a dump to removable media such as a tape, accessing an advanced maintenance shell when no rootvg is available, and restoring a mksysb tape. Transition statement — Let’s describe how to access the rootvg.


Working in maintenance mode
IBM Power Systems

Access a Root Volume Group

Type the number for a volume group to display the logical volume information and press Enter.

1) Volume Group 00c35ba000004c00000001153ce1c4b0 contains these disks:
     hdisk1 70006 02-08-00
     hdisk0 70006 02-08-00

Choice: 1

Volume Group Information

Volume Group ID 00c35ba000004c00000001153ce1c4b0 includes the following logical volumes:
  hd5 hd6 hd8 hd4 hd2 hd9var hd3 hd1 hd10opt

Type the number of your choice and press Enter.

1) Access this Volume Group and start a shell
2) Access this Volume Group and start a shell before mounting filesystems
99) Previous Menu

Choice [99]: 1

Figure 5-13. Working in maintenance mode


Notes:

Select the correct volume group

When accessing the rootvg in maintenance mode, you need to select the volume group that is the rootvg. The example lists a single volume group that contains two disks; on a system with more volume groups, each one is listed. Note that only the volume group IDs are shown, not the volume group names. Check with your system documentation that you select the correct disk. Rely less on the physical volume name and more on the PVID, VGID, or SCSI ID.

After you select the volume group, the list of logical volumes it contains is shown. This is how you confirm that you have selected rootvg. Two selections are then offered:
- Access this Volume Group and start a shell
- Access this Volume Group and start a shell before mounting file systems
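Because the maintenance menu shows only IDs, it helps to record the identifiers of rootvg while the system is still healthy. A minimal sketch of the commands that could be used for this follows; the run helper is an addition for this sketch that only prints each command unless DRY_RUN=0 is set.

```shell
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Physical volume names with their PVIDs and owning volume groups.
run lspv

# Volume group details, including the volume group identifier (VGID).
run lsvg rootvg

# Logical volumes in rootvg; hd5 is the boot logical volume.
run lsvg -l rootvg

# Disks that belong to rootvg.
run lsvg -p rootvg
```

Keeping this output with the system documentation makes it possible to compare the VGID with the one offered in the Access a Root Volume Group menu.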


Access this volume group and start a shell

When you choose this selection, the rootvg is activated (varyonvg command) and all file systems belonging to the rootvg are mounted. A shell is started, which can be used to execute any system recovery steps. Typical scenarios where this selection must be chosen are:
- Changing a forgotten root password
- Recreating the boot logical volume
- Changing a corrupted bootlist

Access this volume group and start a shell before mounting file systems

When you choose this selection, the rootvg is activated, but the file systems belonging to the rootvg are not mounted. A typical scenario where this selection is chosen is when a corrupted file system needs to be repaired with the fsck command. Repairing a corrupted file system is only possible if the file system is not mounted.

Another scenario might be a corrupted hd8 transaction log. Any changes that take place in the superblock or i-nodes are stored in the log logical volume. When these changes are written to disk, the corresponding transaction log entries are removed from the log logical volume. A corrupted transaction log must be reinitialized with the logform command, which is only possible when no file system is mounted. After initializing the log device, you need to run a file system repair for all file systems that use this transaction log. Beginning with AIX 5L V5.1, you have to explicitly specify the file system type, JFS or JFS2:

# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# exit

Keep in mind that the US keyboard layout is used in the maintenance shell. You can enable command retrieval and editing with set -o emacs or set -o vi.
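The log reinitialization and the file system checks can also be written as a loop, which makes it harder to skip a file system. This is a sketch only: the function name is hypothetical, the logical volume list assumes the default rootvg layout shown in this unit, and the run helper is an addition that only prints each command unless DRY_RUN=0 is set. Remember that logform discards the log contents and can therefore result in data loss.

```shell
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

FSTYPE=jfs2                                    # or jfs, matching your file systems
LOGDEV=/dev/hd8                                # log logical volume
FS_LVS="hd1 hd2 hd3 hd4 hd9var hd10opt"        # assumed default rootvg layout

# Hypothetical helper: reinitialize the log, then check every file system.
repair_rootvg_filesystems() {
    # logform asks for confirmation when it is really executed.
    run logform -V "$FSTYPE" "$LOGDEV"
    for lv in $FS_LVS; do
        run fsck -y -V "$FSTYPE" "/dev/$lv"
    done
}

repair_rootvg_filesystems
```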


Instructor notes: Purpose — Describe how to access the rootvg. Details — Additional information — Describe that the logform command can result in data loss. Transition statement — Let’s check where to find information about boot errors.


How to fix a corrupted BLV
IBM Power Systems

1. Boot to maintenance mode from bootable media: CD, tape, or NIM

2. Select volume group that contains hd5
   Maintenance
   1 Access a Root Volume Group

3. Rebuild BLV
   # bosboot -ad /dev/hdisk0
   # shutdown -Fr

Figure 5-14. How to fix a corrupted BLV


Notes:

Maintenance mode

If the boot logical volume is corrupted (for example, because of bad blocks on the disk), the machine will not boot. To fix this situation, you must boot your machine in maintenance mode from a CD or tape. If NIM has been set up for the machine, you can also boot it from a NIM master in maintenance mode. NIM is actually a common way to do special boots in a logical partition environment.

Recreating the boot logical volume

After booting from CD, tape, or NIM, an Installation and Maintenance menu is shown and you can start maintenance mode, as described for Figures 5-12 and 5-13. After accessing the rootvg, you can repair the boot logical volume with the bosboot command. You need to specify the corresponding disk device, for example hdisk0:

# bosboot -ad /dev/hdisk0

It is important that you do a proper shutdown. All changes need to be written from memory to disk.

The bosboot command requires that the boot logical volume (hd5) exists. If you ever need to re-create the BLV from scratch, for example because it was deleted by mistake or the LVCB of hd5 has been damaged, follow these steps:

1. Boot your machine in maintenance mode from CD or tape (F5 or 5), or use F1 or 1 to access the System Management Services (SMS) and select the boot device.
2. Remove the old hd5 logical volume.
   # rmlv hd5
3. Clear the boot record at the beginning of the disk.
   # chpv -c hdisk0
4. Create a new hd5 logical volume. It must be in rootvg, be one physical partition in size, and use outer edge as its intrapolicy. Specify boot as the logical volume type.
   # mklv -y hd5 -t boot -a e rootvg 1
5. Run the bosboot command as described on the visual.
   # bosboot -ad /dev/hdisk0
6. Check the actual bootlist.
   # bootlist -m normal -o
7. Write data immediately to disk.
   # sync
   # sync
8. Shut down and reboot the system.
   # shutdown -Fr

By using the internal command ipl_varyon -i, you can check the state of the boot record.
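The command steps for re-creating the BLV can be collected into one reviewable sequence. The sketch below assumes that the boot disk is hdisk0 and that you are already in a maintenance shell with rootvg accessed; the function name is hypothetical, and the run helper is an addition that only prints each command unless DRY_RUN=0 is set, so nothing is removed or rebooted by default.

```shell
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only (default), 0 = execute

# Helper added for this sketch: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

DISK=hdisk0                    # boot disk (assumption)

# Hypothetical helper that mirrors steps 2 through 8 of the procedure.
recreate_blv() {
    run rmlv hd5                               # remove the old BLV (asks for confirmation)
    run chpv -c "$DISK"                        # clear the boot record on the disk
    run mklv -y hd5 -t boot -a e rootvg 1      # new BLV: type boot, outer edge, 1 PP
    run bosboot -ad "/dev/$DISK"               # build the boot image and write it to hd5
    run bootlist -m normal -o                  # check the normal bootlist
    run sync                                   # flush changes to disk
    run sync
    run shutdown -Fr                           # fast shutdown and reboot
}

recreate_blv
```

Reviewing the printed commands first, and only then rerunning with DRY_RUN=0, keeps a typing mistake from removing the wrong logical volume.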


Instructor notes: Purpose — Describe the bosboot command. Details — Describe the steps that are necessary to recreate the boot logical volume. Point the students to the coverage of working in maintenance mode in this unit. Describe that an hd5 boot logical volume must exist on the system. Additional information — Be careful to use the correct AIX installation CD to boot your machine. Consider a system that was installed from AIX base media and then had patches applied to the OS. The patches change both kernel routines and libc, which makes the installation CDs unsuitable for booting the system into maintenance mode and accessing the disks. The reason is that the boot uses /unix and the libraries from the CD. As long as only the CD versions are in use, they all match and there is no issue. When the rootvg is activated, however, the root (/) file system from the CD is overlaid with the root (/) file system from the disks, and any reference to /unix is then resolved to the disk. If that /unix does not match the kernel that was actually booted from the CD, bad things will happen. The same applies to the libraries being referenced. Transition statement — Let’s describe how to work with bootlists.


Checkpoint (1 of 2)
IBM Power Systems

1. True or False: You must have AIX loaded on your system to use the System Management Services programs.

2. Your AIX system is currently powered off. AIX is installed on hdisk1 but the bootlist is set to boot from hdisk0. How can you fix the problem and make the machine boot from hdisk1?
   __________________________________________________
   __________________________________________________

3. Your machine is booted and at the # prompt. What is the command that will display the normal bootlist?
   ______________________________
   How could you change the normal bootlist?
   ______________________________

4. What command is used to build a new boot image and write it to the boot logical volume?
   _____________________________________

5. What script controls the boot sequence?
   _________________

Figure 5-15. Checkpoint (1 of 2)


Notes:


Instructor notes: Purpose — Review and test the students’ understanding of this first part of the unit. Details — A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions (1 of 2)
IBM Power Systems

1. True or False: You must have AIX loaded on your system to use the System Management Services programs.
   False. SMS is part of the built-in firmware.

2. Your AIX system is currently powered off. AIX is installed on hdisk1 but the bootlist is set to boot from hdisk0. How can you fix the problem and make the machine boot from hdisk1?
   You need to boot the SMS programs and set the new boot list to include hdisk1.

3. Your machine is booted and at the # prompt. What is the command that will display the normal bootlist?
   # bootlist -om normal
   How could you change the normal bootlist?
   # bootlist -m normal device1 device2

4. What command is used to build a new boot image and write it to the boot logical volume?
   bosboot -ad /dev/hdiskx

5. What script controls the boot sequence?
   rc.boot

Additional information — Transition statement — Let’s continue to the second section, solving boot problems.


Checkpoint (2 of 2)
IBM Power Systems

6. True or False: During the AIX boot process, the AIX kernel is loaded from the root file system.

7. How do you boot an AIX machine into maintenance mode?
   ________________________________________________
   ________________________________________________

8. Your machine keeps rebooting and repeating the POST. What could be the reason for this?
   _________________________________________________
   _________________________________________________

Figure 5-16. Checkpoint (2 of 2)


Notes:


Instructor notes: Purpose — Review and test the students’ understanding of this unit. Details — A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions (2 of 2)
IBM Power Systems

6. True or False: During the AIX boot process, the AIX kernel is loaded from the root file system.
   False. The AIX kernel is loaded from hd5.

7. How do you boot an AIX machine into maintenance mode?
   You need to boot from an AIX CD, mksysb, or NIM server.

8. Your machine keeps rebooting and repeating the POST. What could be the reason for this?
   Invalid boot list, corrupted boot logical volume, or hardware failures of boot device.

Additional information — Transition statement — Now, let’s do an exercise.


Exercise 3: System initialization: Part I
IBM Power Systems

• Work with bootlists and identify information on your system
• Identify LVM information from your system
• Repair a corrupted boot logical volume

Figure 5-17. Exercise 3: System initialization: Part I

Notes: Introduction This exercise can be found in your Student Exercise Guide.

Instructor notes:
Purpose: Introduce the exercise.
Details:
Additional information:
Transition statement: Let’s summarize.

Unit summary

Having completed this unit, you should be able to:
• Describe the boot process through to the loading of the boot logical volume
• Describe the contents of the boot logical volume
• Re-create the boot logical volume on a system that is failing to boot
• Interpret LED codes during boot
• Adjust the bootlist for the desired order of search

Figure 5-18. Unit summary

Notes: During the boot process, the kernel from the boot image is loaded into memory. Boot devices and sequences can be updated using the bootlist command, the diag command, and SMS. The boot logical volume contains an AIX kernel, an ODM, and a RAM file system (that contains the boot script rc.boot that controls the AIX boot process). The boot logical volume can be recreated using the bosboot command.
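The two commands named in these notes can be combined into a short recovery sketch. This is a minimal illustration, not the course's exercise solution: the disk name hdisk0 and the helper names run and rebuild_blv are assumptions, and the helper only prints the commands unless RUN=1 is set on an AIX system.

```shell
# Dry-run helper: prints each command; set RUN=1 to execute it (AIX only, as root).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# rebuild_blv DISK: re-create the boot image on DISK and confirm the boot list.
rebuild_blv() {
    disk=$1
    run bosboot -ad "/dev/$disk"      # write a new boot image to the boot logical volume
    run bootlist -m normal "$disk"    # make DISK the first normal-mode boot device
    run bootlist -m normal -o         # display the resulting boot list
}

rebuild_blv hdisk0   # hdisk0 is an assumed disk name
```

On a system that no longer boots, these commands would be issued from maintenance mode after accessing the rootvg.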

Instructor notes:
Purpose: Summarize the unit.
Details: Present the highlights from the unit.
Additional information:
Transition statement: Let’s continue with the next unit.

Unit 6. System initialization: Part II

What this unit is about
This unit describes the final stages of the boot process and outlines how devices are configured for the system. It also describes common boot errors and how to analyze them to fix boot problems.

What you should be able to do
After completing this unit, you should be able to:
• Identify the steps in system initialization from loading the boot image to boot completion
• Identify how devices are configured during the boot process
• Analyze and solve boot problems

How you will check your progress
Accountability:
• Checkpoint questions
• Lab exercise

References
Online       AIX Version 6.1 Operating system and device management
SA38-0509    RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems (at http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp)
SG24-5496    Problem Solving and Troubleshooting in AIX 5L (Redbook)

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems

Unit objectives

After completing this unit, you should be able to:
• Identify the steps in system initialization from loading the boot image to boot completion
• Identify how devices are configured during the boot process
• Analyze and solve boot problems

Figure 6-1. Unit objectives

Notes: Introduction There are many reasons for boot failures. The hardware might be damaged or, due to user errors, the operating system might not be able to complete the boot process. A good knowledge of the AIX boot process is a prerequisite for all AIX system administrators.

Instructor notes:
Purpose: Present the objectives of this unit.
Details: Explain that boot errors are very common and that fixing them requires a good knowledge of the boot process.
Transition statement: Let's start with an overview of the AIX boot process.

6.1. AIX initialization part 1

Instructor topic introduction
What students will do: The students will identify the boot process after the AIX kernel has been loaded from the boot logical volume. Additionally, students will be able to explain how devices are configured by the cfgmgr.
How students will do it: Through discussion, lecture, and checkpoint questions.
What students will learn: Students will:
• Detect how AIX boots after loading the AIX kernel
• Identify the role of the rc.boot script
• Identify how the ODMs in hd4 and hd5 are synchronized
• Identify how the cfgmgr is used to configure devices
How this will help students on their job: Many boot errors occur during the AIX boot process. By having a good knowledge of this process, fixing any boot problem is much easier.

System software initialization overview

1. Load kernel and pass control
2. Restore RAM file system (etc, dev, mnt, usr) from boot image
3. Start init process (from RAMFS)
   • rc.boot 1: Configure base devices
   • rc.boot 2: Activate rootvg
4. Start "real" init process (from rootvg)
   • /etc/inittab: rc.boot 3: Configure remaining devices

Figure 6-2. System software initialization overview

Notes:
Boot sequence
The visual shows the boot sequence after loading the AIX kernel from the boot image. The AIX kernel gets control and executes the following steps:
1. The kernel restores a RAM file system into memory by using information provided in the boot image. At this stage the rootvg is not available, so the kernel needs to work with commands provided in the RAM file system. You can consider this RAM file system as a small AIX operating system.
2. The kernel starts the init process that was provided in the RAM file system (not from the root file system). This init process executes the boot script rc.boot.
3. rc.boot controls the boot process. In the first phase (it is called by init with rc.boot 1), the base devices are configured. In the second phase (rc.boot 2), the rootvg is activated (or varied on).
4. After activating the rootvg at the end of rc.boot 2, the kernel overmounts the RAM file system with the file systems from rootvg. The init from the boot image is replaced by the init from the root file system, hd4.
5. This init processes the /etc/inittab file. From this file, rc.boot is called a third time (rc.boot 3) and all remaining devices are configured.
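To make step 5 concrete, the sketch below extracts the command field from an inittab-style entry. The helper name inittab_command is hypothetical, and the sample line is an assumption of what a typical entry for the third rc.boot call looks like; check /etc/inittab on your own system.

```shell
# inittab_command ID: read inittab-style lines on stdin and print the
# command field (everything after the third colon) of the entry named ID.
inittab_command() {
    awk -F: -v id="$1" '$1 == id { sub(/^[^:]*:[^:]*:[^:]*:/, ""); print; exit }'
}

# Sample line (an assumption, not copied from a real system).
sample='brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot'

printf '%s\n' "$sample" | inittab_command brc
```

On AIX, the lsitab command prints a single inittab entry by identifier, so its output could be piped into the same helper.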

Instructor notes:
Purpose: Introduce the AIX software boot process. Keep this on the overview level.
Details: Explain as described in the student notes.
Additional information: Underline that at the beginning of the boot process, no rootvg is available. Before activating the rootvg, all devices that are needed to vary on the rootvg must be configured.
Transition statement: Let’s look at what rc.boot is doing.

rc.boot 1

Step                                                   LED    Failure LED
Process 1 init                                         F05    c06
restbase (boot image ODM to RAM file system ODM)       510    548
cfgmgr -f (Config_Rules phase=1)
bootinfo -b                                            511

rootvg is not active.
Devices to activate rootvg are configured!

Figure 6-3. rc.boot 1

Notes:
rc.boot phase 1 actions
The init process started from the RAM file system executes the boot script rc.boot 1. If init fails for some reason (for example, a bad boot logical volume), c06 is shown on the LED display. The following steps are executed when rc.boot 1 is called:
1. The restbase command is called, which copies the ODM from the boot image into the RAM file system. After this step, an ODM is available in the RAM file system. The LED shows 510 if restbase completes successfully; otherwise LED 548 is shown.
2. When restbase has completed successfully, the configuration manager, cfgmgr, is run with the option -f (first). cfgmgr reads the Config_Rules class and executes all methods that are stored under phase=1. Phase 1 configuration methods result in the configuration of base devices into the system, so that the rootvg can be activated in the next rc.boot phase.
3. Base devices are all devices that are necessary to access the rootvg. If the rootvg is stored on hdisk0, all devices from the motherboard to the disk itself must be configured in order to be able to access the rootvg.
4. At the end of rc.boot 1, the system determines the last boot device by calling bootinfo -b. The LED shows 511.
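On a running system, the information used in this phase can be inspected after the fact. The following is a minimal sketch under the assumption that you are on AIX with access to the ODM; the helper names run and show_phase1 are hypothetical, and the commands are only printed unless RUN=1 is set.

```shell
# Dry-run helper: prints each command; set RUN=1 to execute it (AIX only).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

show_phase1() {
    # List the methods that cfgmgr -f runs (phase 1 entries in Config_Rules).
    run odmget -q phase=1 Config_Rules
    # Show the device the system last booted from.
    run bootinfo -b
}

show_phase1
```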

Instructor notes:
Purpose: Explain rc.boot 1.
Details: When init starts, F05 will be shown on the PCI machines.
Note: The LED values shown in the visual are sample LED values and may differ between architectures. Remind students that some LED values tend to be different on different machines. Not all values are shown here.
Additional information: Underline the following important information:
1. When rc.boot 1 executes, there is no access to rootvg.
2. When rc.boot 1 is finished, all devices are configured to activate the rootvg in rc.boot 2.
Transition statement: Let’s see what happens in rc.boot 2.

rc.boot 2 (part 1)

Step                                                        LED    Failure LED
rc.boot 2                                                   551
ipl_varyon                                                  517    552, 554, 556
fsck -f /dev/hd4                                                   555
mount /dev/hd4 /                                                   557
fsck -f /dev/hd2; mount /usr                                       518
fsck -f /dev/hd9var; mount /var; copycore; umount /var             518
swapon /dev/hd6

rootvg: hd4 (/), hd2 (/usr), hd9var (/var), hd6
copycore: if dump, copy
RAM file system (/): dev, etc, mnt, usr, var

Figure 6-4. rc.boot 2 (part 1)

Notes:
rc.boot phase 2 actions (part 1)
rc.boot is run for the second time and is passed the parameter 2. The LED shows 551. The following steps take place in this boot phase:
1. The rootvg is varied on with ipl_varyon, a special version of the varyonvg command designed to handle rootvg. If ipl_varyon completes successfully, 517 is shown on the LED; otherwise 552, 554, or 556 is shown and the boot process stops.
2. The root file system, hd4, is checked by fsck. The option -f means that the file system is checked only if it was not unmounted cleanly during the last shutdown. This improves the boot performance. If the check fails, LED 555 is shown.
3. Afterwards, /dev/hd4 is mounted directly onto the root (/) in the RAM file system. If the mount fails, for example due to a corrupted JFS log, the LED 557 is shown and the boot process stops.
4. Next, /dev/hd2 is checked and mounted (again with option -f, so it is checked only if the file system was not unmounted cleanly). If the mount fails, LED 518 is displayed and the boot stops.
5. Next, the /var file system is checked and mounted. This is necessary at this stage, because the copycore command checks if a dump occurred. If a dump exists in a paging space device, it will be copied from the dump device, /dev/hd6, to the copy directory, which is by default /var/adm/ras. /var is unmounted afterwards.
6. The primary paging space /dev/hd6 is made available.

Special root syntax in RAMFS
Once the disk-based root file system is mounted over the RAMFS, a special syntax is used in rc.boot to access the RAMFS files:
• RAMFS files are accessed using a prefix of /../ . For example, to access the fsck command in the RAMFS (before the /usr file system is mounted), rc.boot uses /../usr/sbin/fsck.
• Disk-based files are accessed using normal AIX file syntax. For example, to access the fsck command on the disk (after the /usr file system is mounted), rc.boot uses /usr/sbin/fsck.
Note: This syntax only works during the boot process. If you boot from the CD-ROM into maintenance mode and need to mount the root file system by hand, you will need to mount it over another directory, such as /mnt, or you will be unable to access the RAMFS files.
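As a study aid, the LED codes from the phase 2 steps above can be collected into a small lookup. This is only a sketch of the codes listed in these notes; the function name rc_boot2_led is hypothetical, and LED values can differ between machine types.

```shell
# rc_boot2_led CODE: print the phase 2 step associated with an LED code,
# using only the codes listed in the notes above.
rc_boot2_led() {
    case "$1" in
        551)         echo "rc.boot 2 started" ;;
        517)         echo "ipl_varyon completed successfully" ;;
        552|554|556) echo "ipl_varyon failed: rootvg could not be varied on" ;;
        555)         echo "fsck of /dev/hd4 failed" ;;
        557)         echo "mount of /dev/hd4 failed" ;;
        518)         echo "mount of /usr failed" ;;
        *)           echo "not a phase 2 code covered here: $1"; return 1 ;;
    esac
}

rc_boot2_led 557
```

A machine that stops at 557, for example, points at the mount of /dev/hd4, which is the case the log repair procedure later in this unit addresses.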

Instructor notes:
Purpose: Describe the first part of rc.boot 2.
Details: Introduce this boot phase as described in the student material.
Additional information: Beginning with AIX 5L V5.1, the rootvg file system is mounted directly over the root directory in the RAMFS. This simplifies several steps during phase 2 and eliminates the need to remount the rootvg file systems at the end of phase 2. In many reference documents, LED 518 is defined as indicating that the /usr file system could not be mounted using the network. This is incorrect. LED 518 is displayed any time /usr cannot be mounted.
Transition statement: Let’s describe the second part of rc.boot 2.

rc.boot 2 (part 2)

1. swapon /dev/hd6
2. Copy RAM /dev files to disk: mergedev
3. Copy RAM ODM files to disk: cp /../etc/objrepos/Cu* /etc/objrepos
4. mount /var
5. Copy boot messages to alog
6. Kernel removes RAMFS

rootvg: hd4 (/, with dev and etc ODM), hd2 (/usr), hd9var (/var), hd6
RAM file system (/): dev, etc (ODM), mnt, usr, var

Figure 6-5. rc.boot 2 (part 2)

Notes:
rc.boot phase 2 actions (part 2)
After the paging space /dev/hd6 has been made available, the following tasks are executed in rc.boot 2:
1. To understand this step, remember two things:
   - /dev/hd4 is mounted onto root (/) in the RAM file system.
   - In rc.boot 1, the cfgmgr has been called and all base devices are configured. This configuration data has been written into the ODM of the RAM file system.
   Now, mergedev is called and all /dev files from the RAM file system are copied to disk.
2. All customized ODM files from the RAM file system ODM are copied to disk as well. At this stage, both ODMs (in hd5 and hd4) are in sync.
3. The /var file system (hd9var) is mounted.
4. All messages during the boot process are copied into a special file. You must use the alog command to view this file:
   # alog -t boot -o
   As no console is available at this stage, all boot information is collected in this file.
When rc.boot 2 is finished, the /, /usr, and /var file systems in rootvg are active.

Final stage
At this stage, the AIX kernel removes the RAM file system (returns the memory to the free memory pool) and starts the init process from the / file system in rootvg.
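When the boot log is long, it can help to filter it for suspicious lines. The sketch below is one possible filter, not an AIX facility: the function name boot_failures, the keyword list, and the sample log text are all assumptions made for illustration.

```shell
# boot_failures: read boot log text on stdin and print lines that look
# like failures. The keyword list is an assumption, not an AIX convention.
boot_failures() {
    grep -i -E 'fail|error|cannot' || true
}

# Hypothetical log excerpt, used here only to exercise the filter.
sample='Starting the sync daemon
mount: cannot mount /dev/hd9var on /var
System initialization completed.'

printf '%s\n' "$sample" | boot_failures
```

On AIX, the real boot log would be supplied with: alog -t boot -o | boot_failures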

Instructor notes:
Purpose: Describe the second part of rc.boot 2.
Details:
Additional information:
Transition statement: Let’s describe rc.boot 3.

rc.boot 3 (part 1)

Process 1 init. Here, we work with rootvg.
1. /etc/inittab: /sbin/rc.boot 3 (LED 553)
2. fsck -f /dev/hd3; mount /tmp (LED 517)
3. syncvg rootvg &
4. Normal: cfgmgr -p2; Service: cfgmgr -p3 (Config_Rules phase=2 or phase=3)
5. cfgcon (LEDs c31, c32, c33, c34); rc.dt boot
6. savebase (/etc/objrepos ODM to hd5 ODM)

Figure 6-6. rc.boot 3 (part 1)

Notes:
rc.boot phase 3 actions (part 1)
At this boot stage, the /etc/init process is started. It reads the /etc/inittab file (LED 553 is displayed) and executes the commands line by line. It runs rc.boot for the third time, passing the argument 3 that indicates the last boot phase. rc.boot 3 executes the following tasks:
1. The /tmp file system is checked and mounted.
2. The rootvg is synchronized by syncvg rootvg. If rootvg contains any stale partitions (for example, because a disk that is part of rootvg was not active), these partitions are updated and synchronized. syncvg is started as a background job.
3. The configuration manager is called again. If the key switch or boot mode is normal, the cfgmgr is called with option -p2 (phase 2). If the key switch or boot mode is service, the cfgmgr is called with option -p3 (phase 3).
4. The configuration manager reads the ODM class Config_Rules and executes all methods for either phase=2 or phase=3. All remaining devices that are not base devices are configured in this step.
5. The console is configured by cfgcon. The numbers c31, c32, c33, or c34 are displayed depending on the type of console:
   - c31: Console not yet configured. Provides instructions to select a console.
   - c32: Console is an lft terminal.
   - c33: Console is a tty.
   - c34: Console is a file on the disk.
   If CDE is specified in /etc/inittab, the CDE will be started and you get a graphical boot on the console.
6. To synchronize the ODM in the boot logical volume with the ODM from the / file system, savebase is called.
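The two decisions described in steps 3 and 5 can be written down as simple lookups. This is a sketch for review purposes only; the function names cfgmgr_phase_flag and console_led are hypothetical and are not part of rc.boot.

```shell
# cfgmgr_phase_flag MODE: print the cfgmgr option rc.boot 3 uses for the
# given boot mode ("normal" or "service"), as described in the notes.
cfgmgr_phase_flag() {
    case "$1" in
        normal)  echo "-p2" ;;
        service) echo "-p3" ;;
        *)       echo "unknown boot mode: $1" >&2; return 1 ;;
    esac
}

# console_led CODE: describe the console type behind a cfgcon LED code.
console_led() {
    case "$1" in
        c31) echo "console not yet configured" ;;
        c32) echo "console is an lft terminal" ;;
        c33) echo "console is a tty" ;;
        c34) echo "console is a file on the disk" ;;
        *)   echo "not a cfgcon code: $1" >&2; return 1 ;;
    esac
}

echo "cfgmgr $(cfgmgr_phase_flag normal)"
console_led c33
```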

Instructor notes:
Purpose: Describe the first part of rc.boot 3.
Details: Describe as explained in the student notes.
Additional information: Underline the savebase command that is necessary to synchronize the ODMs from hd4 and hd5.
Transition statement: Let’s describe the second part of rc.boot 3.

rc.boot 3 (part 2)

1. savebase (/etc/objrepos ODM to hd5 ODM)
2. syncd 60; errdemon
3. Turn off LEDs
4. rm /etc/nologin
5. chgstatus=3 in CuDv? If yes: A device that was previously detected could not be found. Run "diag -a".
6. System initialization is completed.
7. Execute next line in /etc/inittab

Figure 6-7. rc.boot 3 (part 2)

Notes:
rc.boot phase 3 actions (part 2)
After the ODMs have been synchronized again, the following steps take place:
1. The syncd daemon is started. All data that is written to disk is first stored in a cache in memory before it is written to the disk. The syncd daemon writes the data from the cache to the disk every 60 seconds. Another daemon process, the errdemon daemon, is started. This process allows errors triggered by applications or the kernel to be written to the error log.
2. The LED display is turned off.
3. If the file /etc/nologin exists, it will be removed. If a system administrator creates this file, a login to the AIX machine is not possible. During the boot process /etc/nologin will be removed.
4. If devices exist that are flagged as missing in CuDv (chgstatus=3), a message is displayed on the console. For example, this could happen if external devices are not powered on during system boot.
5. The last message, System initialization completed, is written to the console. rc.boot 3 is finished. The init process executes the next command in /etc/inittab.
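The missing-device check in step 4 can be reproduced by hand from ODM data. The sketch below parses odmget-style stanza output; the function name missing_devices is hypothetical, and the sample stanzas (device names and field values) are invented for illustration.

```shell
# missing_devices: read odmget stanza output on stdin and print the name
# of every CuDv object whose chgstatus is 3 (missing).
missing_devices() {
    awk '
        $1 == "name"                 { gsub(/"/, "", $3); name = $3 }
        $1 == "chgstatus" && $3 == 3 { print name }
    '
}

# Hypothetical stanzas, shaped like odmget output for CuDv.
sample='CuDv:
    name = "hdisk0"
    status = 1
    chgstatus = 2

CuDv:
    name = "rmt0"
    status = 0
    chgstatus = 3'

printf '%s\n' "$sample" | missing_devices
```

On AIX, the input would come from odmget CuDv, or the ODM can do the filtering itself with odmget -q chgstatus=3 CuDv.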

Instructor notes:
Purpose: Describe the second part of rc.boot 3.
Details: Describe as explained in the student notes.
Additional information: The /etc/nologin file is used to prevent logging in to a system. Just the existence of the file is needed. If any text is placed in the file, this information will be displayed when a user attempts to log in.
Transition statement: Let’s summarize the rc.boot script.

rc.boot summary

rc.boot 1
  Where from: /dev/ram0
  Action: restbase; cfgmgr -f
  Phase Config_Rules: 1

rc.boot 2
  Where from: /dev/ram0
  Action: ipl_varyon rootvg; mount /, /usr, /var file systems; merge /dev; copy ODM

rc.boot 3
  Where from: rootvg
  Action: mount /tmp; cfgmgr -p2 or cfgmgr -p3; savebase
  Phase Config_Rules: 2 (normal) or 3 (service)

Figure 6-8. rc.boot summary

Notes:
Summary
During rc.boot 1, all base devices are configured. This is done by cfgmgr -f, which executes all phase 1 methods from Config_Rules. During rc.boot 2, the rootvg is varied on. All /dev files and the customized ODM files from the RAM file system are merged to disk. During rc.boot 3, all remaining devices are configured by cfgmgr -p2 (normal boot) or cfgmgr -p3 (service boot). The configuration manager reads the Config_Rules class and executes the corresponding methods. To synchronize the ODMs, savebase is called, which writes the ODM from the disk back to the boot logical volume.

Instructor notes:
Purpose: Summarize the rc.boot script.
Details: Describe the highlights from the rc.boot phases that are shown in the table.
Additional information:
Transition statement: Let’s look at a common problem in rc.boot phase 2: the failure to mount the file systems because of corruption.

Fixing corrupted file systems and logs

• Boot to maintenance mode
• Access rootvg without mounting file systems
• Rebuild file system log and run fsck:
  # logform -V jfs2 /dev/hd8
  # fsck -y -V jfs2 /dev/hd1
  # fsck -y -V jfs2 /dev/hd2
  # fsck -y -V jfs2 /dev/hd3
  # fsck -y -V jfs2 /dev/hd4
  # fsck -y -V jfs2 /dev/hd9var
  # fsck -y -V jfs2 /dev/hd10opt
  # fsck -y -V jfs2 /dev/hd11admin

Figure 6-9. Fixing corrupted file systems and logs

Notes:
JFS log or JFS2 log corrupt?
To fix a corrupted JFS or JFS2 log, boot in maintenance mode and access the rootvg, but do not mount the file systems. In the maintenance shell, issue the logform command and do a file system check for all file systems that use this JFS or JFS2 log. Keep in mind what file system type your rootvg had: JFS or JFS2.

For JFS:
# logform -V jfs /dev/hd8
# fsck -y -V jfs /dev/hd1
# fsck -y -V jfs /dev/hd2
# fsck -y -V jfs /dev/hd3
# fsck -y -V jfs /dev/hd4
# fsck -y -V jfs /dev/hd9var
# fsck -y -V jfs /dev/hd10opt
# fsck -y -V jfs /dev/hd11admin
# exit

For JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# fsck -y -V jfs2 /dev/hd11admin
# exit

The logform command initializes a new JFS transaction log and this may result in loss of data because JFS transactions may be destroyed. Your machine will boot after the JFS log has been repaired.
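Because the same command sequence applies to both file system types, it can be expressed as a loop. The sketch below is a convenience wrapper under stated assumptions: the helper names run and repair_rootvg_logs are hypothetical, the logical volume list is the one shown above, and nothing is executed unless RUN=1 is set in a maintenance shell.

```shell
# Dry-run helper: prints each command; set RUN=1 to execute it (AIX only).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# repair_rootvg_logs FSTYPE: re-create the log on /dev/hd8, then check
# every rootvg file system that uses it. FSTYPE is jfs or jfs2.
repair_rootvg_logs() {
    fstype=$1
    run logform -V "$fstype" /dev/hd8
    for lv in hd1 hd2 hd3 hd4 hd9var hd10opt hd11admin; do
        run fsck -y -V "$fstype" "/dev/$lv"
    done
}

repair_rootvg_logs jfs2
```

Reviewing the printed commands before setting RUN=1 is a deliberate design choice here, since logform discards the existing log contents.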

Instructor notes:
Purpose: Explain how to fix a corrupted file system.
Details: Point out that a common cause of this type of corruption is the use of the HMC shutdown immediate option for an LPAR with a running operating system. This is the equivalent of cutting power to a computer while the operating system is running, which does not allow for a proper shutdown. An administrator should always use (when possible) the HMC OS shutdown option or issue the shutdown command from the LPAR command prompt.
Additional information:
Transition statement: Let’s review the phases of rc.boot.

Let’s review: rc.boot (1 of 3)

[Diagram: rc.boot 1 with empty boxes (1) through (5) to be filled in.]

Figure 6-10. Let’s review: rc.boot (1 of 3)

Notes:
Instructions
Using the following questions, put the solutions into the visual.
1. Who calls rc.boot 1? Is it:
   • /etc/init from hd4
   • /etc/init from the RAMFS in the boot image
2. Which command copies the ODM files from the boot image into the RAM file system?
3. Which command triggers the execution of all phase 1 methods in Config_Rules?
4. Which ODM files contain the devices that have been configured in rc.boot 1?
   • ODM files in hd4
   • ODM files in RAM file system
5. How can you determine the last boot device?

Instructor notes:
Purpose: Review and test the students' understanding of rc.boot phase 1.
Details: This is the first of three reviews. You can review each one separately, or have the students do all three, then review them all.

Let’s review solution: rc.boot (1 of 3)

rc.boot 1
(1) /etc/init from RAMFS in the boot image
(2) restbase
(3) cfgmgr -f
(4) ODM files in RAM file system
(5) bootinfo -b

Additional information:
Transition statement: Now, let’s review rc.boot phase 2.

Let’s review: rc.boot (2 of 3)

[Diagram: rc.boot 2 flow with empty boxes (1) through (8) to be filled in, and LED 557.]

Figure 6-11. Let’s review: rc.boot (2 of 3)

Notes:
Instructions
Please order the following seven expressions in the correct sequence.
1. Turn on paging.
2. Merge RAM /dev files.
3. Copy boot messages to alog.
4. Activate rootvg.
5. Mount /var; copy dump; unmount /var.
6. Mount /dev/hd4 onto / in RAMFS.
7. Copy RAM ODM files.
Finally, answer the following question. Put the answer in box 8: Your system stops booting with an LED 557. Which command failed?

Instructor notes:
Purpose: Review and test the students' understanding of rc.boot phase 2.
Details: This is the second of three reviews. You can review each one separately, or have the students do all three, then review them all.

Let’s review solution: rc.boot (2 of 3)

rc.boot 2
(1) Activate rootvg
(2) Mount /dev/hd4 on / in RAMFS
(3) Mount /var; copy dump; unmount /var
(4) Turn on paging
(5) Merge RAM /dev files
(6) Copy RAM ODM files
(7) Copy boot messages to alog
(8) LED 557: mount /dev/hd4

Additional information: Question 8 is important for the lab. The command that failed is the mount of /dev/hd4. One reason for this might be a damaged log logical volume.
Transition statement: Now, let’s review rc.boot phase 3.

Let’s review: rc.boot (3 of 3)

• From which file is rc.boot 3 started: _________________
• /sbin/rc.boot 3
• fsck -f ________; mount ________
• s_______ ________ &
• ________ -p2; ________ -p3
• Start Console: _____; Start CDE: _______
• Update ODM in BLV: _________
• sy____ ___; err_______
• Turn off ____
• rm _________
• Missing devices? _________=3 ______ ?
• Execute next line in _____________

Figure 6-12. Let’s review: rc.boot (3 of 3)

Notes: Instructions Please complete the missing information in the picture. Your instructor will review the activity with you.

© Copyright IBM Corp. 2009

Unit 6. System initialization: Part II Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

6-33

Instructor Guide

Instructor notes: Purpose — Review and test the students understanding of rc.boot phase 3. Details — This is the last of three reviews. You can review each one separately, or have the students do all three, then review them all.

Let’s review solution: rc.boot (3 of 3) IBM Power Systems

savebase

/etc/inittab

syncd 60 errdemon

/sbin/rc.boot3 fsck -f /dev/hd3 mount /tmp

Turn off LED

rm /etc/nologin

syncvg rootvg &

chgstatus=3 CuDv ?

cfgmgr -p2 cfgmgr -p3

Execute next line in /etc/inittab

Start Console: cfgcon Start CDE: rc.dt boot © Copyright IBM Corporation 2009

Additional information:
Transition statement: Now, let’s switch over to the next topic.


6.2. AIX initialization part 2
Instructor topic introduction
What students will do: The students will review important components from the AIX software boot process.
How students will do it: Through lecture, exercise, and checkpoint questions.
What students will learn: Students will:
• Review the configuration manager (cfgmgr)
• Review the Config_Rules object class
• Identify the boot alog
• Review the /etc/inittab file
• Identify important LED codes
How this will help students on their job: The components that are introduced or reviewed are vital for the AIX operating system. A good knowledge of these components is a prerequisite for all system administrators.


Configuration manager IBM Power Systems

[Diagram: cfgmgr reads the predefined database (PdDv, PdAt, PdCn, Config_Rules) and maintains the customized database (CuDv, CuAt, CuDep, CuDvDr, CuVPD) by running device methods: Define, Configure, Change, Unconfigure, Undefine. Configure loads the device driver; Unconfigure unloads it.]

Figure 6-13. Configuration manager


Notes: When the Configuration manager is invoked During system boot, the configuration manager is invoked to configure all devices detected as well as any device whose device information is stored in the configuration database. At run time, you can configure a specific device by directly invoking the cfgmgr command. If you encounter problems during the configuration of a device, use cfgmgr -v. With this option, cfgmgr shows the devices as they are configured.

Automatic configuration Many devices are automatically detected by the configuration manager. For this to occur, device entries must exist in the predefined device object classes. The configuration manager uses the methods from PdDv to manage the device state, for example, to bring a device into the defined or available state.


Installing new device support cfgmgr can be used to install new device support. If you invoke cfgmgr with the -i flag, the command attempts to install device software support for each newly detected device. High-level device commands like mkdev invoke methods and allow the user to add, delete, show, or change devices and their attributes.

Define method When a device is defined through its define method, the information from the predefined database for that type of device is used to create the information describing the device specific instance. This device specific information is then stored in the customized database.

Configure method steps
The process of configuring a device is often device-specific. The configure method for a kernel device must:
1. Load the device driver into the kernel.
2. Pass device-dependent information describing the device instance to the driver.
3. Create a special file for the device in the /dev directory.
Many devices, such as logical volumes or volume groups, are not physical devices; these are pseudodevices. For this type of device, the configured state is not as meaningful. However, a pseudodevice still has a configure method, which either simply marks the device as configured or performs more complex operations to determine whether any devices are attached to it.

Configuration order The configuration process requires that a device be defined or configured before a device attached to it can be defined or configured. At system boot time, the configuration manager configures the system in a hierarchical fashion. First the motherboard is configured, then the buses, then the adapters that are attached, and finally the devices that are connected to the adapters. The configuration manager then configures any pseudodevices (volume groups, logical volumes, and so forth) that need to be configured.


Instructor notes:
Purpose: Summarize how cfgmgr works.
Details: Explain that cfgmgr can detect devices automatically. The devices must be defined in the predefined ODM classes. When they get defined, they are stored in the customized ODM classes. cfgmgr is method or rule driven. It just uses methods to define or configure a device. These methods are device specific and are listed in PdDv. During the boot process, cfgmgr uses the Config_Rules class to configure the devices in the correct sequence. Note that the actual Config_Rules object class has more objects in each phase than are listed in the visual.
Additional information: The output from the configuration manager is viewable in the boot alog. During run time, cfgmgr can be started with the -v flag to get more information about the devices that are configured.
Transition statement: Let’s have a look at the Config_Rules ODM class.


Config_Rules object class IBM Power Systems

phase  seq  boot  rule
1      10   0     /etc/methods/defsys
1      12   0     /usr/lib/methods/deflvm
(phase 1 rules are run by cfgmgr -f)

2      10   0     /etc/methods/defsys
2      12   0     /usr/lib/methods/deflvm
2      19   0     /etc/methods/ptynode
2      20   0     /etc/methods/startlft
(phase 2 rules are run by cfgmgr -p2, normal boot)

3      10   0     /etc/methods/defsys
3      12   0     /usr/lib/methods/deflvm
3      19   0     /etc/methods/ptynode
3      20   0     /etc/methods/startlft
3      25   0     /etc/methods/starttty
(phase 3 rules are run by cfgmgr -p3, service boot)


Figure 6-14. Config_Rules object class


Notes: Introduction The Config_Rules ODM object class is used by cfgmgr during the boot process. The phase attribute determines when the respective method is called.

Phase 1 All methods with phase=1 are executed when cfgmgr -f is called. The first method that is started is /etc/methods/defsys, which is responsible for the configuration of all base devices. The second method, /usr/lib/methods/deflvm, loads the logical volume device driver (LVDD) into the AIX kernel. If you have devices that must be configured in rc.boot 1, that is, before the rootvg is active, you need to place phase 1 configuration methods into Config_Rules. A bosboot is required afterwards.


Phase 2 All methods with phase=2 are executed when cfgmgr -p2 is called. This takes place in the third rc.boot phase, when the key switch is in normal position or for a normal boot on a PCI machine. The seq attribute controls the sequence of the execution: The lower the value, the higher the priority.

Phase 3 All methods with phase=3 are executed when cfgmgr -p3 is called. This takes place in the third rc.boot phase, when the key switch is in service position, or a service boot has been issued on a PCI system.

Sequence number Each configuration method has an associated sequence number. When executing the methods for a particular phase, cfgmgr sorts the methods based on the sequence number. The methods are then invoked, one by one, starting with the smallest sequence number. Methods with a sequence number of zero are invoked last, after those with non-zero sequence numbers.
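The ordering rule described above can be sketched in plain shell. This is only an illustration on sample text, not how cfgmgr is implemented: the rows mirror the phase 2 rules from the visual, and the row with seq 0 (including its path) is hypothetical, added to show that zero sorts last.

```shell
#!/bin/sh
# Sample rules as "phase seq boot_mask rule".
# The seq 0 row is hypothetical; it is not part of the visual.
rules='2 20 0 /etc/methods/startlft
2 0 0 /usr/local/methods/hypothetical_last
2 10 0 /etc/methods/defsys
2 19 0 /etc/methods/ptynode
2 12 0 /usr/lib/methods/deflvm'

# order_rules PHASE: print the rule paths of one phase in invocation order.
order_rules() {
    printf '%s\n' "$rules" |
    awk -v phase="$1" '$1 == phase {
        # Non-zero seq values go to group 0, seq 0 goes to group 1 (last).
        group = ($2 == 0) ? 1 : 0
        print group, $2, $4
    }' |
    sort -k1,1n -k2,2n |
    awk '{ print $3 }'
}

ordered=$(order_rules 2)
printf '%s\n' "$ordered"
```

On a live system the rows would come from the Config_Rules object class rather than from a shell variable; the sample data is used here so the sketch runs anywhere.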

Boot mask
Each configuration method has an associated boot mask:
- If the boot_mask is zero, the rule applies to all types of boot.
- If the boot_mask is non-zero, the rule only applies to the boot type specified. For example, if boot_mask = DISK_BOOT, the rule is only used for boots from disk, whereas NETWORK_BOOT only applies when booting through the network.


Instructor notes:
Purpose: Explain how cfgmgr uses the Config_Rules object class.
Details: Review the methods that are called when cfgmgr is executed. Explain as described in the notes. Keep this on an easy level. Note that we are only showing a sampling of the objects in this object class.
Additional information: If you have devices that must be configured before the rootvg is active, you need to add these configuration methods to the Config_Rules object class. You also have to ensure the methods you want run are included in the RAMFS image created by bosboot. This involves adding information to the “proto” files that are in /usr/lib/boot. While the visual shows cfgmgr verbose output in the log, the log would also have output from other commands executed and status messages written by the rc.boot script for each step that has been outlined in the lecture.
Transition statement: Let’s introduce the boot alog.


cfgmgr output in the boot log using alog IBM Power Systems

# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
attempting to configure device 'bus0'
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
attempting to configure device 'bus1'
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****

Figure 6-15. cfgmgr output in the boot log using alog


Notes:
The boot log
Because no console is available during the boot phase, the boot messages are collected in a special file, which, by default, is /var/adm/ras/bootlog. As shown in the visual, you have to use the alog command to view the contents of this file. To view the boot log, issue the command as shown, or use the smit alog fastpath.
If you have boot problems, it is always a good idea to check the boot alog file for potential boot error messages. All output from cfgmgr is shown in the boot log, as well as other information that is produced in the rc.boot script.
The default boot log file size in AIX 5L V5.1 (8 KB) was too small to capture the entire output of a system boot. The default boot log size is 32 KB in AIX 5L V5.2 and 128 KB in AIX 5L V5.3 and AIX 6.1. If you want to increase the size of the boot log, for example to 256 KB, issue the following command:
# print "Resizing boot log" | alog -C -t boot -s 262144
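When checking the boot log for problems, the entries of interest are usually those with a non-zero return code. The following is a minimal sketch that filters text in the format shown in the visual. It runs on sample input; the failing scsi0 entry, its method name cfgexample, and the return code 61 are invented for illustration and do not come from a real log.

```shell
#!/bin/sh
# failed_methods: read boot-log text on stdin and print each device whose
# configuration method reported a non-zero return code.
failed_methods() {
    awk -v q="'" '
        /^attempting to configure device/ {
            dev = $NF
            gsub(q, "", dev)          # strip the single quotes around the name
        }
        /^return code = / {
            if ($NF != 0) print dev, "rc=" $NF
        }
    '
}

# Sample input modeled on the visual; the scsi0 failure is invented.
sample="attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
attempting to configure device 'scsi0'
invoking /usr/lib/methods/cfgexample -l scsi0
return code = 61"

failures=$(printf '%s\n' "$sample" | failed_methods)
printf '%s\n' "$failures"
```

On an AIX system, the same function could be fed from the real log with alog -t boot -o | failed_methods, assuming the log lines follow the layout shown in the visual.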


Instructor notes:
Purpose: Describe the alog command to identify boot messages.
Details: Describe how boot messages produced during the boot process are written to an alog file. Show how the alog command can be used. The boot log shows more than the output of cfgmgr -v during rc.boot execution. The various rc.boot steps that we have covered also write messages to the boot log.
Additional information: Describe how the boot log, /var/adm/ras/bootlog, might be increased to a bigger size. This often had to be done prior to AIX 5L V5.2, as the default size of 8 KB was very small. To display the size of the log, run:
# alog -t boot -L
The alog is circular, meaning the oldest information is automatically overwritten by the newest information.
Transition statement: Let’s review the /etc/inittab file.


/etc/inittab file IBM Power Systems

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunab
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # ru
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability

Do not use an editor to change /etc/inittab. Use mkitab, chitab, rmitab instead.

Figure 6-16. /etc/inittab file


Notes: Purpose of /etc/inittab The /etc/inittab file supplies information for the init process. Note how the rc.boot script is executed out of the inittab file to configure all remaining devices in the boot process.

Modifying /etc/inittab Do not use an editor to change the /etc/inittab file. One small mistake in /etc/inittab, and your machine will not boot. Instead use the commands mkitab, chitab, and rmitab to edit /etc/inittab. The advantage of these commands is that they always guarantee a non-corrupted /etc/inittab file. If your machine stops booting with an LED 553, this indicates a bad /etc/inittab file in most cases.


Consider the following examples:
- To add a line to /etc/inittab, use the mkitab command. For example:
  # mkitab "myid:2:once:/usr/local/bin/errlog.check"
- To change /etc/inittab so that init will ignore the line tty1, use the chitab command:
  # chitab "tty1:2:off:/usr/sbin/getty /dev/tty1"
- To remove the line tty1 from /etc/inittab, use the rmitab command. For example:
  # rmitab tty1

Viewing /etc/inittab
The lsitab command can be used to view the /etc/inittab file. For example:
# lsitab dt
dt:2:wait:/etc/rc.dt
If you issue lsitab -a, the complete /etc/inittab file is shown.
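Each /etc/inittab record has the form identifier:runlevels:action:command. The sketch below checks that shape for a single line. It is an illustration of the record format only and is not the validation that mkitab or chitab performs; the function name check_itab_line is made up for this example.

```shell
#!/bin/sh
# check_itab_line LINE: return 0 if LINE looks like id:runlevels:action:command.
check_itab_line() {
    printf '%s\n' "$1" | awk -F: '
        {
            ok = (NF >= 4 && $1 != "")
            # Action keywords accepted by init.
            valid = " respawn wait once boot bootwait powerfail powerwait off hold ondemand initdefault sysinit "
            if (index(valid, " " $3 " ") == 0) ok = 0
            # initdefault carries no command; every other action needs one.
            if ($3 != "initdefault" && $4 == "") ok = 0
            exit (ok ? 0 : 1)
        }'
}

for line in "myid:2:once:/usr/local/bin/errlog.check" \
            "init:2:initdefault:" \
            "rcnfs:2:wait::/etc/rc.nfs"; do
    if check_itab_line "$line"; then
        echo "ok:  $line"
    else
        echo "bad: $line"
    fi
done
```

The third sample line is rejected because the doubled colon leaves the command field empty, which is the kind of small mistake that hand editing can introduce.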

The shdaemon daemon Another daemon started with /etc/inittab is shdaemon. This daemon provides a SMIT-configurable mechanism to detect certain types of system hangs and initiate the configured action. The shdaemon daemon uses a corresponding configuration program named shconf. The system hang detection feature uses the shdaemon entry in the /etc/inittab file, as shown in the visual, with an action field that is set to off by default. Using the shconf command or SMIT (fastpath: smit shd), you can enable this daemon and configure the actions it takes when certain conditions are met. shdaemon is described in the next visual.

telinit and run levels
Use the telinit command to signal the init daemon:
- To tell the init daemon to re-read /etc/inittab, use:
  # telinit q
- To tell the init daemon to reset the environment to match a different (or the same) run level, use:
  # telinit n
  (where n is the desired run level)
- To query the current run level, use:
  # who -r


Instructor notes:
Purpose: Describe the /etc/inittab file and some important commands to view and manipulate this file.
Details: Show that rc.boot is executed out of /etc/inittab. Describe that it is risky to edit the /etc/inittab file. It is always better to use the commands described in the notes.
Additional information: Point out that a corrupted /etc/inittab file is indicated by LED 553. The students will see this in their exercise. The mkitab, chitab, and rmitab commands provide automatic syntax checking. The line must match the proper format for /etc/inittab. There is a -i option with mkitab to insert the new line anywhere in the /etc/inittab file. Without the -i, the line will be appended to the end of the file.
Transition statement: Let’s describe the basics for system hang detection.


Boot problem management IBM Power Systems

Check: Bootlist wrong?
LED: LED codes cycle
User action: Power on, press F1, select Multi-Boot, select the correct boot device.

Check: /etc/inittab corrupt? /etc/environment corrupt?
LED: 553
User action: Access the rootvg. Check /etc/inittab (empty, missing or corrupt?). Check /etc/environment.

Check: Boot logical volume or boot record corrupt?
LED: 20EE000B
User action: Access the rootvg. Re-create the BLV:
# bosboot -ad /dev/hdiskx

Check: JFS/JFS2 log corrupt?
LED: 551, 552, 554, 555, 556, 557
User action: Access rootvg before mounting the rootvg file systems. Re-create the JFS/JFS2 log:
# logform -V jfs /dev/hd8
or
# logform -V jfs2 /dev/hd8
Run fsck afterwards.

Check: Superblock corrupt?
LED: 552, 554, 556
User action: Run fsck against all rootvg file systems. If fsck indicates errors (not an AIX file system), repair the superblock as described in the notes.

Check: rootvg locked?
LED: 551
User action: Access rootvg and unlock the rootvg:
# chvg -u rootvg

Check: ODM files missing?
LED: 523 - 534
User action: ODM files are missing or inaccessible. Restore the missing files from a system backup.

Check: Mount of /usr or /var failed?
LED: 518
User action: Check /etc/filesystems. Check network (remote mount), file systems (fsck) and hardware.

Figure 6-17. Boot problem management


Notes: Introduction The visual shows some common boot errors that might happen during the AIX software boot process.

Bootlist wrong? If the bootlist is wrong, the system cannot boot. This is easy to fix. Boot in SMS and select the correct boot device. Keep in mind that only hard disks with boot records are shown as selectable boot devices.


/etc/inittab corrupt? /etc/environment corrupt? An LED of 553 usually indicates a corrupted /etc/inittab file, but in some cases a bad /etc/environment may also lead to a 553 LED. To fix this problem, boot in maintenance mode and check both files. Consider using a mksysb to retrieve these files from a backup tape.

Boot logical volume or boot record corrupt? The next thing to try if your machine does not boot, is to check the boot logical volume. To fix a corrupted boot logical volume, boot in maintenance mode and use the bosboot command: # bosboot -ad /dev/hdisk0

JFS log or JFS2 log corrupt?
To fix a corrupted JFS or JFS2 log, boot in maintenance mode and access the rootvg, but do not mount the file systems. In the maintenance shell, issue the logform command and do a file system check for all file systems that use this JFS or JFS2 log. Keep in mind what file system type your rootvg had: JFS or JFS2.

For JFS:
# logform -V jfs /dev/hd8
# fsck -y -V jfs /dev/hd1
# fsck -y -V jfs /dev/hd2
# fsck -y -V jfs /dev/hd3
# fsck -y -V jfs /dev/hd4
# fsck -y -V jfs /dev/hd9var
# fsck -y -V jfs /dev/hd10opt
exit

For JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
exit

The logform command initializes a new JFS transaction log and this may result in loss of data because JFS transactions may be destroyed. Your machine will boot after the JFS log has been repaired.


Superblock corrupt?
Another thing you can try is to check the superblocks of your rootvg file systems. If you boot in maintenance mode and you get error messages like Not an AIX file system or Not a recognized file system type, it is probably due to a corrupt superblock in the file system. Each file system has two superblocks. Executing fsck should automatically recover the primary superblock by copying from the backup superblock. The following is provided in case you need to do this manually.
For JFS, the primary superblock is in logical block 1 and a copy is in logical block 31. To manually copy the superblock from block 31 to block 1 for the root file system (in this example), issue the following command:
# dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4
For JFS2, the locations are different. To manually recover the primary superblock from the backup superblock for the root file system (in this example), issue the following command:
# dd count=1 bs=4k skip=15 seek=8 if=/dev/hd4 of=/dev/hd4
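To see what the skip and seek operands do without touching a real logical volume, the JFS form of the command can be tried on a scratch file. This is a sketch under stated assumptions: the file is a temporary image of zeros, the marker text BACKUP-SB merely stands in for a backup superblock, and conv=notrunc is added because a regular file (unlike a logical volume device) would otherwise be truncated when opened for output.

```shell
#!/bin/sh
# Scratch-file demonstration only; no file system is involved.
img=$(mktemp)

# Build a 32-block image of zeros using 4 KB blocks (blocks 0 to 31).
dd if=/dev/zero of="$img" bs=4k count=32 2>/dev/null

# Place a marker at the start of block 31, standing in for the backup copy.
printf 'BACKUP-SB' | dd of="$img" bs=4k seek=31 conv=notrunc 2>/dev/null

# Same operands as the JFS command in the notes: read one 4 KB block at
# block 31 (skip=31) and write it at block 1 (seek=1).
dd count=1 bs=4k skip=31 seek=1 if="$img" of="$img" conv=notrunc 2>/dev/null

# Read back the first 9 bytes of block 1 (byte offset 4096).
primary=$(dd if="$img" bs=1 skip=4096 count=9 2>/dev/null)
echo "block 1 now starts with: $primary"

rm -f "$img"
```

With bs=4k, skip counts input blocks and seek counts output blocks, so skip=31 seek=1 corresponds to byte offsets 126976 and 4096.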

rootvg locked? Many LVM commands place a lock into the ODM to prevent other commands from working at the same time. If a lock remains in the ODM due to a crash of a command, this may lead to a hanging system. To unlock the rootvg, boot in maintenance mode and access the rootvg with file systems. Issue the following command to unlock the rootvg: # chvg -u rootvg

ODM files missing? If you see LED codes in the range 523 to 534, ODM files are missing on your machine. Use a mksysb tape of the system to restore the missing files.

Mount of /usr or /var failed? An LED of 518 indicates that the mount of the /usr or /var file system failed. If /usr is mounted from a network, check the network connection. If /usr or /var are locally mounted, use fsck to check the consistency of the file systems. If this does not help, check the hardware by running diagnostics from the Diagnostics CD.
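The LED codes from the visual can be collected into a small lookup helper for quick reference. This is a sketch that restates the table in this unit only; the function name led_hint is made up, and codes outside the table fall through to a default message.

```shell
#!/bin/sh
# led_hint CODE: print the likely cause listed in this unit for a boot LED code.
led_hint() {
    case "$1" in
        553)             echo "Check /etc/inittab and /etc/environment" ;;
        20EE000B)        echo "Boot logical volume or boot record corrupt" ;;
        551)             echo "JFS/JFS2 log corrupt or rootvg locked" ;;
        552|554|556)     echo "JFS/JFS2 log or superblock corrupt" ;;
        555|557)         echo "JFS/JFS2 log corrupt" ;;
        518)             echo "Mount of /usr or /var failed" ;;
        52[3-9]|53[0-4]) echo "ODM files missing or inaccessible" ;;
        *)               echo "Not covered in this unit" ;;
    esac
}

for code in 553 557 530 518 999; do
    printf '%s: %s\n' "$code" "$(led_hint "$code")"
done
```

The helper only narrows down where to look; the corrective steps remain the ones described in the notes above.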


Instructor notes:
Purpose: Describe some common causes of boot problems.
Details: Describe as explained in the student notes. Describe the meaning of 553 and 557 as they are part of the exercise.
Additional information:
Transition statement: Let’s review the /etc/inittab file, which was described in the basic administration course.


Let's review: /etc/inittab file IBM Power Systems

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
fbcheck:2:wait:/usr/sbin/fbcheck
srcmstr:2:respawn:/usr/sbin/srcmstr
cron:2:respawn:/usr/sbin/cron
rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait:/etc/rc.nfs
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon
dt:2:wait:/etc/rc.dt
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check


Figure 6-18. Let’s review: /etc/inittab file


Notes:
Instructions
Answer the following questions as they relate to the /etc/inittab file shown in the visual:

1. Which process is started by the init process only one time? The init process does not wait for the initialization of this process.

2. Which process is involved in print activities on an AIX system?

3. Which line is ignored by the init process?

4. Which line determines that multiuser mode is the initial run level of the system?


5. Where is the System Resource Controller started?

6. Which line controls network processes?

7. Which component allows the execution of programs at a certain date or time?

8. Which line executes /etc/firstboot, if it exists?

9. Which script controls starting of the CDE desktop?

10. Which line is executed in all run levels?

11. Which line takes care of varying on the volume groups, activating paging spaces, and mounting file systems that are to be activated during boot?


Instructor notes:
Purpose: Review the /etc/inittab file, which was described in the basic administration course.
Details: Give the students 10 minutes to answer the questions, then review them. When reviewing the answers, complete the empty boxes in the visual with the highlighted expressions. After reviewing all questions, the completed visual should look like the following table:

Let's review solution: /etc/inittab file IBM Power Systems

/etc/inittab line                              Purpose
init:2:initdefault:                            Determine initial run level
brc::sysinit:/sbin/rc.boot 3                   Start up last boot phase
rc:2:wait:/etc/rc                              Multiuser initialization
fbcheck:2:wait:/usr/sbin/fbcheck               Execute /etc/firstboot, if it exists
srcmstr:2:respawn:/usr/sbin/srcmstr            Start the System Resource Controller
cron:2:respawn:/usr/sbin/cron                  Start the cron daemon
rctcpip:2:wait:/etc/rc.tcpip                   Start up communication daemon processes
rcnfs:2:wait:/etc/rc.nfs                       (nfsd, biod, ypserv, and so forth)
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon     Start up spooling subsystem
dt:2:wait:/etc/rc.dt                           Start up CDE desktop
tty0:2:off:/usr/sbin/getty /dev/tty1           Line ignored by init
myid:2:once:/usr/local/bin/errlog.check        Process started only one time


Additional information:
1. The myid line is started only one time. The action once tells the init process to start the process and not to wait for its initialization. When the process ends, it will not be restarted.
2. qdaemon. The qdaemon controls the queueing subsystem in AIX. It manages jobs in queues and their assignment to the different queues in the system.


3. The tty0 line is ignored by init. The action off tells the init process to ignore this line. However, if you change the action of a running entry to off and issue the command telinit q, the init process sends a SIGTERM signal to the process. If the process still exists after 20 seconds, init sends a SIGKILL signal to it.
4. The init line determines the initial run level. The init command uses this entry to determine which run level to enter initially. Run level 2 means multiuser. Run levels 1, s, m, and M mean single-user mode, often called maintenance mode. brc runs at all run levels.
5. The srcmstr line starts the System Resource Controller.
6. The rctcpip line starts the communication daemon processes (inetd, named, and so forth). The rcnfs line starts the NFS daemon processes (nfsd, biod, ypserv, and so forth). TCP/IP and NFS daemons are started in these scripts. Typical examples are inetd (which controls all socket-based communication), biod (for the NFS client), nfsd (for the NFS server), and ypserv (for the NIS server).
7. The cron line starts the cron daemon. The cron daemon runs shell commands at specified dates and times. Use the crontab command to administer cron jobs.
8. The fbcheck line executes /etc/firstboot, if it exists. This script is used after the installation of an AIX system to start any customization steps after the reboot of the system. The program install_assist is an example of a program that is started after the installation.
9. The dt line controls the startup of the CDE desktop. This script controls the startup of the graphical desktop.
10. The brc line starts the last boot phase. This process starts rc.boot 3, which is responsible for the final configuration of all devices on a system. This script is executed in all run levels and before the console is configured (sysinit).
11. The rc line activates volume groups, paging spaces, and file systems during boot:
- Vary on all automatic volume groups.
- Activate all automatic paging spaces.
- Mount all file systems marked mount=true in /etc/filesystems.
Transition statement: Let’s answer some checkpoint questions.


Checkpoint IBM Power Systems

1. From where is rc.boot 3 run? ___________________________________________________

2. Your system stops booting with LED 557: In which rc.boot phase does the system stop? _________ What are some reasons for this problem? _____________________________________________ _____________________________________________ _____________________________________________

3. Which ODM file is used by the cfgmgr during boot to configure the devices in the correct sequence? _____________________

4. What does the line init:2:initdefault: in /etc/inittab mean? ___________________________________________________ ___________________________________________________

Figure 6-19. Checkpoint


Notes:


Instructor notes:
Purpose: Review and test the students’ understanding of this unit.
Details: A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions IBM Power Systems

1. From where is rc.boot 3 run? From the /etc/inittab file in rootvg

2. Your system stops booting with LED 557:
In which rc.boot phase does the system stop? rc.boot 2
What are some reasons for this problem?
- Corrupted BLV
- Corrupted JFS log
- Damaged file system

3. Which ODM file is used by the cfgmgr during boot to configure the devices in the correct sequence? Config_Rules

4. What does the line init:2:initdefault: in /etc/inittab mean? This line is used by the init process to determine the initial run level (2=multiuser).

Additional information:
Transition statement: Now, let’s do an exercise.


Exercise 6: System initialization: Part 2 IBM Power Systems

• Repair a corrupted log logical volume
• Analyze and fix a boot failure


Figure 6-20. Exercise 4: System initialization part 2

AN151.0

Notes: Introduction This exercise can be found in your Student Exercise Guide.


Instructor notes: Purpose — Prepare the students for the lab. Details — Additional information — Transition statement — Let’s summarize.


Unit summary IBM Power Systems

Having completed this unit, you should be able to:
• Identify the steps in system initialization from loading the boot image to boot completion
• Identify how devices are configured during the boot process
• Analyze and solve boot problems

Figure 6-21. Unit summary

Notes:
• After the boot image is loaded into RAM, the rc.boot script is executed three times to configure the system.
• During rc.boot 1, devices to varyon the rootvg are configured.
• During rc.boot 2, the rootvg is varied on.
• In rc.boot 3, the remaining devices are configured.
• Processes defined in the /etc/inittab file are initiated by the init process.

Instructor notes: Purpose — Summarize the unit. Details — Present the highlights from the unit. Additional information — Transition statement — Let’s continue with the next unit.


Unit 7. Disk management theory What this unit is about This unit explains concepts important for understanding and working with the logical volume manager (LVM) used in AIX.

What you should be able to do
After completing this unit, you should be able to:
• Explain where LVM information is stored
• Solve ODM-related LVM problems
• Manage volume group quorum issues
• Explain the physical volume states used by the LVM

How you will check your progress Accountability: • Checkpoint questions • Lab exercises

References
Online: AIX Version 6.1 Command Reference, volumes 1-6
Online: AIX Version 6.1 Operating system and device management
Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems
GG24-4484-00 AIX Storage Management (Redbook)
SG24-5422-00 AIX Logical Volume Manager from A to Z: Introduction and Concepts (Redbook)
SG24-5433-00 AIX Logical Volume Manager from A to Z: Troubleshooting and Commands (Redbook)


Unit objectives IBM Power Systems

After completing this unit, you should be able to:
• Explain where LVM information is stored
• Solve ODM-related LVM problems
• Manage volume group quorum issues
• Explain the physical volume states used by the LVM

Figure 7-1. Unit objectives

Notes: Purpose of this unit Basic LVM concepts are introduced in the basic system administration course. In this unit, we will review these basic concepts and expand your knowledge of LVM.


Instructor notes: Purpose — Present the objectives of this unit. Details — The AIX Storage Management Redbook listed under “References” was published in 1994, but it is still useful. Transition statement — Let’s start with a review of LVM terms.


7.1. LVM data representation Instructor topic introduction What students will do — The students will learn where LVM information is kept and which part of this information resides in the ODM. How students will do it — Through lecture, exercise, and checkpoint questions What students will learn — Students will learn: • Where LVM information is stored • Which LVM information resides in the ODM and on disk control blocks • How to solve ODM-related problems How this will help students on their job — Knowing where LVM data is stored in AIX will make it easier for the students to analyze and avoid LVM errors.


Review: LVM terms IBM Power Systems

Physical Partitions

Logical Partitions

Physical Volumes

Logical Volume

Volume Group

Figure 7-2. LVM terms

Notes: Introduction This visual and the associated student notes will provide a review of basic LVM terms.

Volume groups, physical volumes, and physical partitions
A volume group (VG) consists of one or more physical volumes (PV) that are divided into physical partitions (PP). When a volume group is created, a physical partition size has to be specified. This physical partition size is the smallest allocation unit for the LVM. The partition size is specified in units of megabytes from 1 (1 MB) through 131,072 (128 GB). The physical partition size must be equal to a power of 2 (example 1, 2, 4, 8). The default physical partition size values for normal and big volume groups (more on these later) will be the lowest value that can be used to remain within a limitation of 1016 physical partitions per PV. The default value for scalable volume groups (introduced in AIX 5L V5.3) will be the lowest value that can be used to accommodate 2040 physical partitions per PV. There is no actual limit on the number of physical partitions per physical volume for scalable volume groups, although there is currently a limit of 2 M physical partitions for the entire volume group.
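The default-size rule described above can be illustrated with a short sketch. This is not the mkvg implementation: the function name is hypothetical, a single disk is assumed, and the real command may take other factors (such as every disk in the volume group) into account.

```python
def default_pp_size_mb(disk_size_mb: int, max_pps_per_pv: int = 1016) -> int:
    """Smallest power-of-2 PP size (in MB) that keeps the PP count within the limit.

    Hypothetical helper for illustration only; it models the rule stated in
    the notes for a single disk.
    """
    pp_size = 1  # MB, the smallest PP size the notes allow
    while disk_size_mb // pp_size > max_pps_per_pv:
        pp_size *= 2
        if pp_size > 131072:  # 128 GB, the largest PP size the notes allow
            raise ValueError("disk too large for the given PP limit")
    return pp_size


# A 70 GB disk (71680 MB) in a normal or big VG (1016 PPs per PV)
print(default_pp_size_mb(71680))
# The same disk in a scalable VG (default chosen to accommodate 2040 PPs per PV)
print(default_pp_size_mb(71680, 2040))
```

Under these assumptions the first call yields 128 (64 MB partitions would need 1120 PPs, which exceeds 1016) and the second yields 64.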

Logical volumes and logical partitions The LVM provides logical volumes (LVs), that can be created, extended, moved and deleted at run time. Logical volumes may span several disks, which is one of the biggest advantages of the LVM. Logical volumes contain the JFS and JFS2 file systems, paging spaces, journal logs, the boot logical volumes or nothing (when used as a raw logical volume). Logical volumes are divided into logical partitions (LPs), where each logical partition is associated with at least one physical partition.


Instructor notes: Purpose — Introduce some basic LVM terms. Details — Use the student notes to guide your presentation. Additional information — If no PP size is specified when creating the VG, the mkvg command attempts to figure out an appropriate PP size based on the disks in the volume group. Transition statement — Let’s look at the unique identifiers used by LVM for the volume groups, logical volumes, and physical volumes.


LVM identifiers IBM Power Systems

Goal: Unique worldwide identifiers for
• Volume groups
• Hard disks
• Logical volumes

# lsvg rootvg
...
VG IDENTIFIER: 00c35ba000004c00000001157f54bf78          (32 bytes long)

# lspv
hdisk0    00c35ba07b2e24f0    rootvg    active           (32 bytes long; 16 are shown)
...

# lslv hd4
LOGICAL VOLUME: hd4          VOLUME GROUP: rootvg
LV IDENTIFIER: 00c35ba000004c00000001157f54bf78.4 ...    (VGID.minor number)
...

# uname -m
00C35BA04C00

Figure 7-3. LVM identifiers


Notes: Use of identifiers The LVM uses identifiers for disks, volume groups, and logical volumes. As volume groups could be exported and imported between systems, these identifiers must be unique worldwide. AIX generated identifiers are based on the CPU ID of the creating host and a timestamp.

Volume group identifiers As shown on the visual, the volume groups identifiers (VGID) have a length of 32 bytes.


Disk identifiers
Hard disk identifiers have a length of 32 bytes, but currently the last 16 bytes are unused and are all set to 0 in the ODM. Notice that, as shown on the visual, only the first 16 bytes of this identifier are displayed in the output of the lspv command. In a SAN environment, path management needs a method for recognizing that a disk discovered over two different paths is actually the same disk. Some storage solutions in an AIX environment use the PVID for this purpose. Other storage solutions use an IEEE volume identifier (ieee_volname) or a UDID unique identifier (unique_id) for this purpose. Each of these would be attributes of the disk in the ODM. The PVID attribute is set the first time a disk is assigned to a volume group. If you ever have to manually update the disk identifiers in the ODM, do not forget to add 16 zeros to the physical volume ID.

Logical volume identifiers The logical volume identifiers consist of the volume group identifier, a period, and the minor number of the logical volume.
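As a minimal sketch of the LVID layout just described (volume group identifier, a period, then the minor number), the following splits an identifier into its two parts. The function name is hypothetical, and the length check assumes the 32-character VGID shown on the visual.

```python
def split_lvid(lvid: str) -> tuple[str, int]:
    """Split an LV identifier of the form VGID.minor into (VGID, minor number)."""
    vgid, _, minor = lvid.rpartition(".")
    if len(vgid) != 32 or not minor.isdigit():
        raise ValueError(f"not a VGID.minor identifier: {lvid!r}")
    return vgid, int(minor)


# LV identifier of hd4 as shown on the visual
vgid, minor = split_lvid("00c35ba000004c00000001157f54bf78.4")
print(vgid, minor)
```

For the hd4 example this gives the rootvg VGID from the lsvg output and the minor number 4.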


Instructor notes: Purpose — Introduce the LVM identifiers. Details — Explain using the information provided in the student notes. Emphasize that these identifiers are important, since the logical name we use may not be associated and in various problem scenarios we will need to work with the unique identifier instead. Additional information — Be sure to explain that physical volume IDs are 32 bytes long. The last 16 bytes are currently set to zeros. That is important for the lab. Transition statement — Let’s talk about where LVM stores its information.


LVM data on disk control blocks IBM Power Systems

• Volume Group Descriptor Area (VGDA)
  – Most important data structure of LVM
  – Global to the volume group (same on each disk)
  – One or two copies per disk
• Volume Group Status Area (VGSA)
  – Tracks the state of mirrored copies
  – One or two copies per disk
• Logical Volume Control Block (LVCB)
  – Has historically occupied the first 512 bytes of each logical volume
  – Contains LV attributes (policies, number of copies)
  – Scalable VGs: The information is merged into VGDA

Figure 7-4. LVM data on disk control blocks

Notes:
Disk control blocks used by LVM
The LVM uses three different disk control blocks:
1. The Volume Group Descriptor Area (VGDA) is the most important data structure of the LVM. A redundant copy is kept on each disk that is contained in a volume group. Each disk contains the complete allocation information of the entire volume group.
2. The Volume Group Status Area (VGSA) tracks the status of all physical volumes in the volume group (active or missing) and the state of all allocated physical partitions in the volume group (active or stale). Each disk in a volume group contains a VGSA.
3. The Logical Volume Control Block (LVCB) traditionally resides in the first 512 bytes of each logical volume. If raw devices are used (for example, many database systems use raw logical volumes), be careful that these programs do not destroy the LVCB. However, LVCB is not kept at this location in scalable volume groups, but instead is kept in the same reserved disk area as the VGDA. Also, the administrator of a big VG can use the -T option of the mklv command to request that the LVCB not be stored in the beginning of the LV.

VGSA for scalable volume groups
The VGSA for scalable VGs consists of three areas: PV missing area (PVMA), mirror write consistency dirty bit area (MWC_DBA), and PP status area (PPSA).
- PV missing area: The PVMA tracks if any of the disks are missing
- MWC dirty bit area: The MWC_DBA holds the status for each LV if passive mirror write consistency is used
- PP status area: The PPSA logs any stale PPs
The overall size reserved for the VGSA is independent of the configuration parameters of the scalable VG and stays constant. However, the size of the contained PPSA changes in proportion to the configured maximum number of PPs.

LVCB-related considerations
For standard VGs, the LVCB resides in the first block of the user data within the LV. Big VGs keep additional LVCB information in the VGDA. The LVCB structure on the first LV user block and the LVCB structure within the VGDA are similar but not identical. If a logical volume in a big VG was created with the -T O option of the mklv command, no LVCB will occupy the first block of the LV. With scalable VGs, logical volume control information is no longer stored on the first user block of any LV. Therefore, no precautions have to be taken when using raw logical volumes, because there is no longer a need to preserve the information held by the first 512 bytes of the logical device.


Instructor notes: Purpose — Introduce the disk control blocks. Details — Explain using the information in the student notes. Additional information — None Transition statement — Let’s see which other locations are used to store LVM data.


LVM data in the operating system IBM Power Systems

• Object Data Manager (ODM)
  – Physical volumes, volume groups, and logical volumes are represented as devices (customized devices)
  – CuDv, CuAt, CuDvDr, CuDep
• AIX files
  – /etc/vg/vgVGID: Handle to the VGDA copy in memory
  – /dev/hdiskX: Special file for a disk
  – /dev/VGname: Special file for administrative access to a VG
  – /dev/LVname: Special file for a logical volume
  – /etc/filesystems: Used by the mount command to associate LV name, file system log, and mount point

Figure 7-5. LVM data in the operating system

Notes: LVM information stored in the ODM Physical volumes, volume groups, and logical volumes are handled as devices in AIX. Every physical volume, volume group, and logical volume is defined in the customized object classes in the ODM.

LVM information stored in AIX files As shown on the visual, many AIX files also contain LVM-related data. The VGDA is always stored by the kernel in memory to increase performance. This technique is called a memory-mapped file. The handle is always a file in the /etc/vg directory. This filename always reflects the volume group identifier.


Instructor notes: Purpose — Describe where LVM data is stored. Details — Explain using the information in the student notes. Keep this on an overview level. Additional information — None Transition statement — Let's see what’s stored in the VGDA.


Contents of the VGDA IBM Power Systems

Header Time Stamp
  • Updated when VG is changed
Physical Volume List
  • PVIDs only (no PV names)
  • VGDA count and PV state
Logical Volume List
  • LVIDs and LV names
  • Number of copies
Physical Partition Map
  • Maps LPs to PPs
Trailer Time Stamp
  • Must contain same value as header time stamp

Figure 7-6. Contents of the VGDA

Notes: Introduction The table on the visual shows the contents of the VGDA. The individual items listed are discussed in the paragraphs that follow.

Time stamps The time stamps are used to check if a VGDA is valid. If the system crashes while changing the VGDA, the time stamps will differ. The next time the volume group is varied on, this VGDA is marked as invalid. The latest intact VGDA will then be used to overwrite the other VGDAs in the volume group.
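The time stamp check can be pictured with a small model. This is only a sketch of the idea described above, not LVM code: the class, its field names, and the integer time stamps are hypothetical stand-ins for the on-disk structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VgdaCopy:
    disk: str        # hypothetical label for the disk holding this copy
    header_ts: int   # header time stamp
    trailer_ts: int  # trailer time stamp


def is_consistent(copy: VgdaCopy) -> bool:
    """A copy is treated as valid only if header and trailer time stamps match."""
    return copy.header_ts == copy.trailer_ts


def latest_intact(copies: list[VgdaCopy]) -> Optional[VgdaCopy]:
    """Return the newest valid copy, the one that would overwrite the others."""
    intact = [c for c in copies if is_consistent(c)]
    return max(intact, key=lambda c: c.header_ts) if intact else None


copies = [
    VgdaCopy("hdisk1", header_ts=1007, trailer_ts=1007),
    VgdaCopy("hdisk2", header_ts=1008, trailer_ts=1007),  # update was interrupted
    VgdaCopy("hdisk3", header_ts=1005, trailer_ts=1005),  # intact but older
]
print(latest_intact(copies).disk)
```

In this model the hdisk2 copy is rejected because its time stamps differ, and the hdisk1 copy is chosen as the latest intact one.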


Physical volume list The VGDA contains the physical volume list. Note that no disk names are stored, only the unique disk identifiers are used. For each disk, the number of VGDAs on the disk and the physical volume state is stored. We will talk about physical volume states later in this unit.

Logical volume list The VGDA contains a record of the logical volumes that are part of the volume group. It stores the LV identifiers and the corresponding logical volume names. Additionally, the number of copies is stored for each LV.

Physical partition map The most important data structure is the physical partition map. It maps each logical partition to a physical partition. The size of the physical partition map is determined at volume group creation time.


Instructor notes: Purpose — Describe the contents of the VGDA. Details — Use the student notes to guide your explanation. The students do not need to know the detailed structure of the VGDA, this is just to reinforce the concepts of the type of information maintained in the VGDA, and that the time stamps help identify a VGDA copy that is out of date. Additional information — The -d flag of the mkvg command is ignored in AIX 5L V5.2, AIX 5L V5.3, and AIX 6.1. Transition statement — Let’s have a look into the VGDA.


VGDA example IBM Power Systems

# lqueryvg -p hdisk1 -At
Max LVs:          256
PP Size:          20                                              1: ____________
Free PPs:         12216
LV count:         3                                               2: ____________
PV count:         1                                               3: ____________
Total VGDAs:      2                                               4: ____________
MAX PPs per PV:   32768
MAX PVs:          1024
Logical:          00c35ba000004c00000001157fcf6bdf.1   lv00   1   5: ____________
                  00c35ba000004c00000001157fcf6bdf.2   lv01   1
                  00c35ba000004c00000001157fcf6bdf.3   lv02   1
Physical:         00c35ba07fcf6b93   2   0                        6: ____________ (first column)   7: ____________ (second column)

Figure 7-7. VGDA example

Notes: The lqueryvg command The lqueryvg command is a low-level command that shows an extract from the VGDA on a specified disk, for example, hdisk1. In the command shown on the visual, -p hdisk1 means to read the VGDA on hdisk1, -A means to display all available information, and -t means to display descriptive tags. The visual only shows selected fields from the report; a more complete example output is below in these notes.

Interpreting lqueryvg output
As an exercise in interpreting the output of lqueryvg, match each of the following expressions to the appropriate numbered location on the visual.
a. VGDA count on this disk
b. 2 VGDAs in VG
c. 3 LVs in VG
d. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
e. LVIDs (VGID.minor_number)
f. 1 PV in VG
g. PVIDs

Output of lqueryvg on AIX 6.1
The output of lqueryvg on recent AIX versions gives more information than shown in the example on the visual. An example of lqueryvg output (for the rootvg disk) from an AIX 6.1 system is given below:
Max LVs:          256
PP Size:          24
Free PPs:         590
LV count:         10
PV count:         1
Total VGDAs:      2
Conc Allowed:     0
MAX PPs per PV    1016
MAX PVs:          32
Quorum (disk):    1
Quorum (dd):      1
Auto Varyon ?:    1
Conc Autovaryo    0
Varied on Conc    0
Logical:          00c35ba000004c00000001157f54bf78.1    hd5        1
                  00c35ba000004c00000001157f54bf78.2    hd6        1
                  00c35ba000004c00000001157f54bf78.3    hd8        1
                  00c35ba000004c00000001157f54bf78.4    hd4        1
                  00c35ba000004c00000001157f54bf78.5    hd2        1
                  00c35ba000004c00000001157f54bf78.6    hd9var     1
                  00c35ba000004c00000001157f54bf78.7    hd3        1
                  00c35ba000004c00000001157f54bf78.8    hd1        1
                  00c35ba000004c00000001157f54bf78.9    hd10opt    1
                  00c35ba000004c00000001157f54bf78.10   hd11admin  1
Physical:         00c35ba07b2e24f0   2   0
Total PPs:        767
LTG size:         128
HOT SPARE:        0
AUTO SYNC:        0
VG PERMISSION:    0
SNAPSHOT VG:      0
IS_PRIMARY VG:    0
PSNFSTPP:         4352
VARYON MODE:      0
VG Type:          0
Max PPs:          32512
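Because lqueryvg reports the PP size as an exponent of 2 (in bytes), a short worked conversion may help when reading this output. The sketch below uses the PP Size, Total PPs, and Free PPs values from the AIX 6.1 example; the function name is hypothetical.

```python
def pp_size_mb(exponent: int) -> int:
    """Convert the lqueryvg 'PP Size' exponent (bytes = 2**exponent) to megabytes."""
    if exponent < 20:
        raise ValueError("PP size cannot be smaller than 1 MB (2**20 bytes)")
    return 2 ** (exponent - 20)


# Values taken from the AIX 6.1 example output
size_mb = pp_size_mb(24)      # PP Size: 24
total_mb = 767 * size_mb      # Total PPs: 767
free_mb = 590 * size_mb       # Free PPs: 590
print(size_mb, total_mb, free_mb)
```

With these numbers the PP size is 16 MB, so the disk provides 12272 MB in total, of which 9440 MB is free. The visual's value of 20 converts to 1 MB in the same way.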


Instructor notes: Purpose — Examine a VGDA. Details — Implement this page as a sort of activity. Give the students 10 minutes to order the expressions. Then review the page:
1. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
2. 3 LVs in VG
3. 1 PV in VG
4. 2 VGDAs in VG
5. LVIDs (VGID.minor_number)
6. PVIDs
7. VGDA count on this disk
Additional information — The lqueryvg command displays the PP size as the value of the exponent in the power of 2 expression specifying the number of bytes in a PP. In the example on the visual, the value of 20 given for PP Size means that the PP size is 2^20 bytes, which is the same as saying 1 MB. In the AIX 6.1 example in the student notes, the value of 24 shown for PP Size means that the PP size is 2^24 bytes, which is 16 MB. The best resource for information about intermediate-level LVM commands such as lqueryvg, lvm_query, and getlvcb is the IBM Redbook AIX Logical Volume Manager from A to Z: Troubleshooting and Commands (SG24-5433-00). The output of lqueryvg might vary a bit, depending on the version of AIX. The notes include an example of output from this command from an AIX 6.1 system. Another command that can be used to examine the VGDA is readvgda, or readvgda_svg if you want to read the VGDA for a scalable volume group. You might mention that this VGDA seems to belong to a scalable volume group (1024 MAX PVs) and not a normal volume group (MAX LVs: 256, MAX PVs: 32) or a big volume group (MAX LVs: 512, etc.). Transition statement — Let’s look at the LVCB.


The logical volume control block (LVCB) IBM Power Systems

# getlvcb -AT hd2
AIX LVCB
intrapolicy = c
copies = 1
interpolicy = m
lvid = 00c35ba000004c00000001157f54bf78.5
lvname = hd2
label = /usr
machine id = 35BA04C00
number lps = 102
relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs =
time created = Mon Oct 8 11:16:49 2007
time modified = Mon Oct 8 07:00:09 2007

Figure 7-8. The logical volume control block (LVCB)

Notes: The LVCB and the getlvcb command The LVCB stores attributes of a logical volume. The getlvcb command queries an LVCB.

Example on visual
In the example on the visual, the getlvcb command is used to obtain information from the logical volume hd2. The information displayed includes the following:
- Intrapolicy, which specifies what strategy should be used for choosing physical partitions on a physical volume. The five general strategies are edge (sometimes called outer-edge), inner-edge, middle (sometimes called outer-middle), inner-middle, and center (c = Center).
- Number of copies (1 = No mirroring)
- Interpolicy, which specifies the number of physical volumes to extend across (m = Minimum).
- LVID
- LV name (hd2)
- Number of logical partitions (102)
- Can the partitions be reorganized? (relocatable = y)
- Each mirror copy on a separate disk (strict = y)
- Number of disks involved in striping (stripe width)
- Stripe size
- Logical volume type (type = jfs2)
- JFS file system information
- Creation and last update time
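When several LVCBs need to be compared, the "key = value" layout of the getlvcb report is easy to turn into a dictionary. The sketch below assumes that layout exactly as it appears on the visual (an abbreviated copy is embedded as sample text); the function name is hypothetical and other AIX levels may print additional or different fields.

```python
SAMPLE = """\
AIX LVCB
intrapolicy = c
copies = 1
interpolicy = m
lvid = 00c35ba000004c00000001157f54bf78.5
lvname = hd2
label = /usr
number lps = 102
type = jfs2
fs =
time created = Mon Oct 8 11:16:49 2007
"""


def parse_lvcb(text: str) -> dict[str, str]:
    """Collect 'key = value' lines into a dict; lines without '=' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


lvcb = parse_lvcb(SAMPLE)
print(lvcb["lvname"], lvcb["label"], lvcb["number lps"], lvcb["type"])
```

All values are kept as strings, so a field such as number lps would need an explicit int() conversion before doing arithmetic with it.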


Instructor notes: Purpose — Describe the LVCB. Details — Explain that the LVCB stores LV attributes. Do not explain each attribute shown; just do a short overview of the LVCB. Additional information — If your logical volume interpolicy is set to maximum, the getlvcb command will show interpolicy = x. The values for intrapolicy are:
ie    inner edge
im    inner middle
c     center
m     outer middle
e     outer edge

Transition statement — Let’s identify how LVM uses the ODM and the VGDA/LVCB.


How LVM interacts with ODM and VGDA IBM Power Systems

[Diagram labels: high-level commands (mkvg, extendvg, mklv, crfs, chfs, rmlv, reducevg, ...); ODM and /etc/filesystems ("Match IDs by name", "Update"); VGDA and LVCB ("Change, using low-level commands"); importvg; exportvg]

Figure 7-9. How LVM interacts with ODM and VGDA

Notes: High-level commands Most of the LVM commands that are used when working with volume groups, physical, or logical volumes are high-level commands. These high-level commands (like mkvg, extendvg, mklv, and others listed on the visual) are implemented as shell scripts and use names to reference a certain LVM object. The ODM is consulted to match a name, for example, rootvg or hdisk0, to an identifier.

Interaction with disk control blocks and the ODM
The high-level commands call intermediate or low-level commands that query or change the disk control blocks VGDA or LVCB. Additionally, the ODM has to be updated; for example, to add a new logical volume. The high-level commands contain signal handlers to clean up the configuration if the program is stopped abnormally. If a system crashes, or if high-level commands are stopped by kill -9, the system can end up in a situation where the VGDA/LVCB and the ODM are not in sync. The same situation may occur when low-level commands are used incorrectly.
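To make "not in sync" concrete, the sketch below compares two views of the same volume group: the LVID-to-name pairs recorded in the VGDA and the pairs known to the ODM. Everything here is hypothetical (the function, the dictionary layout, and the sample data, including the name lvdata); how the two views would be collected on a real system is outside the scope of the sketch.

```python
def compare_lv_records(vgda: dict[str, str], odm: dict[str, str]) -> dict[str, list[str]]:
    """Compare LVID -> LV name mappings taken from the VGDA and from the ODM."""
    shared = set(vgda) & set(odm)
    return {
        "missing_in_odm": sorted(set(vgda) - set(odm)),   # on disk, unknown to ODM
        "unknown_to_vgda": sorted(set(odm) - set(vgda)),  # in ODM, not on disk
        "name_differs": sorted(k for k in shared if vgda[k] != odm[k]),
    }


VGID = "00c35ba000004c00000001157fcf6bdf"
vgda_view = {f"{VGID}.1": "lv00", f"{VGID}.2": "lv01", f"{VGID}.3": "lv02"}
odm_view = {f"{VGID}.1": "lv00", f"{VGID}.3": "lvdata"}  # illustrative damage

report = compare_lv_records(vgda_view, odm_view)
print(report)
```

In this made-up case the ODM has lost the entry for minor number 2 and carries a different name for minor number 3, which is the kind of mismatch the rest of this unit addresses.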

The importvg and exportvg commands The visual shows two very important commands that are explained in detail later. The command importvg imports a complete new volume group based on a VGDA and LVCB on a disk. The command exportvg removes a complete volume group from the ODM.

VGDA and LVCB corruption
The focus in this course is on situations where the ODM is corrupted and we assume that the LVM control data (for example, the VGDA or the LVCB) are correct. If an attempted execution of LVM commands (for example: lsvg, varyonvg, reducevg) results in a failure with core dump, that could be an indication that the LVM control data on one of the disks has become corrupted. In this situation, do not attempt to resync the ODM using the procedures covered in this unit. They will not solve the problem, and you will likely propagate the corruption to other disks in the volume group, making the situation worse. In most cases, you will need to recover from a volume group backup. If recovery from backup is not a viable option, it is suggested that you work with AIX Support in dealing with the problem.


Instructor notes: Purpose — Explain how LVM interacts with ODM and VGDA/LVCB. Details — Use the student notes to guide your explanation. Additional information — The commands exportvg/importvg are covered later in this course. Therefore, just mention briefly what these commands do. Transition statement — Let’s see how the LVM-related ODM entries look. This is important, because you will have to repair ODM entries in the next part of the exercise we started earlier. We will start with the entries that store information about physical volumes.


ODM entries for physical volumes (1 of 3) IBM Power Systems

# odmget -q "name like hdisk[02]" CuDv
CuDv:
        name = "hdisk0"
        status = 1
        chgstatus = 2
        ddins = "scsidisk"
        location = ""
        parent = "vscsi0"
        connwhere = "810000000000"
        PdDvLn = "disk/vscsi/vdisk"
CuDv:
        name = "hdisk2"
        status = 1
        chgstatus = 0
        ddins = "scdisk"
        location = "01-08-01-8,0"
        parent = "scsi1"
        connwhere = "8,0"
        PdDvLn = "disk/scsi/scsd"

Figure 7-10. ODM entries for physical volumes (1 of 3)

Notes:
CuDv entries for physical volumes
The CuDv object class contains information about each physical volume.

Key attributes
Remember the most important attributes:
- status = 1 means the disk is available
- chgstatus = 2 means the status has not changed since last reboot
- location specifies the location code of the device
- parent specifies the parent device
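For scripted checks of these attributes, the stanza output of odmget can be parsed into dictionaries. The following is a minimal sketch that assumes the stanza layout shown on the visual (a class name followed by a colon, then attribute = value lines, with strings in double quotes). The function name is hypothetical and only the two status meanings given in these notes are interpreted.

```python
SAMPLE = '''\
CuDv:
        name = "hdisk0"
        status = 1
        chgstatus = 2
        location = ""
        parent = "vscsi0"
CuDv:
        name = "hdisk2"
        status = 1
        chgstatus = 0
        location = "01-08-01-8,0"
        parent = "scsi1"
'''


def parse_odmget(text: str) -> list[dict]:
    """Parse odmget-style stanzas into a list of dicts, one per object."""
    objects: list[dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            objects.append({"class": line[:-1]})  # start of a new stanza
        elif "=" in line and objects:
            key, _, value = line.partition("=")
            value = value.strip()
            if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
                parsed = value[1:-1]              # quoted string attribute
            else:
                try:
                    parsed = int(value)           # unquoted numeric attribute
                except ValueError:
                    parsed = value                # keep anything else as text
            objects[-1][key.strip()] = parsed
    return objects


for disk in parse_odmget(SAMPLE):
    available = disk["status"] == 1      # status = 1: disk is available
    unchanged = disk["chgstatus"] == 2   # chgstatus = 2: unchanged since last reboot
    print(disk["name"], available, unchanged, disk["parent"])
```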


Physical versus virtual disks
The two disks have different device drivers and different Predefined Device object class links. This is because hdisk2 is a physical disk which has been directly allocated to the logical partition (which this example came from), while hdisk0 is a virtual disk which is mapped through the Advanced Power Virtualization feature to a backing physical disk which is allocated to a Virtual I/O Server partition on the same machine. The virtual disk does not have an AIX location code. Rather, its location is the physical location code of its parent virtual SCSI adapter (vscsi0) supplemented with the LUN for the backing device, which is recorded in the connwhere field. The physical location code of the parent adapter is recorded in the CuVPD object for the adapter.


Instructor notes:
Purpose: Explain that information about all disks is stored in the CuDv object class.
Details: Use the student notes to guide your explanation.
Additional information: None
Transition statement: Let’s look at CuAt.


ODM entries for physical volumes (2 of 3) IBM Power Systems

# odmget -q "name=hdisk0 and attribute=pvid" CuAt

CuAt:
        name = "hdisk0"
        attribute = "pvid"
        value = "00c35ba07b2e24f00000000000000000"
        type = "R"
        generic = "D"
        rep = "s"
        nls_index = 11

To create or recover a missing pvid attribute object:

# chdev -l hdisk# -a pv=yes

Figure 7-11. ODM entries for physical volumes (2 of 3)


Notes:

The pvid attribute
The disk’s most important attribute is its PVID. In the ODM, the PVID is stored as a string of 32 hexadecimal digits, of which the last 16 are set to zeros. Whenever you must manually update a PVID in the ODM, you must specify the complete 32-digit PVID of the disk.

Other information stored in CuAt
Other attributes of physical volumes (for example, the size of the disk) may be stored in CuAt.
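When a PVID is available only in its short 16-digit form, the padded ODM form can be derived by appending zeros. The sketch below assumes that the short form is a 16-digit hexadecimal string (as typically printed by lspv); the function name pvid_to_odm is hypothetical.

```shell
# Pad a 16-digit PVID to the 32-digit form stored in the ODM
# by appending 16 zeros. Rejects anything that is not 16 hex digits.
pvid_to_odm() {
    pvid=$1
    case $pvid in
        ""|*[!0-9a-fA-F]*)
            echo "invalid PVID: $pvid" >&2
            return 1 ;;
    esac
    if [ "${#pvid}" -ne 16 ]; then
        echo "expected 16 hexadecimal digits, got ${#pvid}" >&2
        return 1
    fi
    printf '%s0000000000000000\n' "$pvid"
}
```

For the disk on the visual, pvid_to_odm 00c35ba07b2e24f0 prints the value held in the pvid attribute of hdisk0.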


Instructor notes:
Purpose: Explain that the PVID is stored in CuAt.
Details: Use the student notes to guide your explanation.
Additional information: None
Transition statement: Let’s look at CuDvDr.


ODM entries for physical volumes (3 of 3) IBM Power Systems

# odmget -q "value3 like hdisk[03]" CuDvDr

CuDvDr:
        resource = "devno"
        value1 = "17"
        value2 = "0"
        value3 = "hdisk0"

CuDvDr:
        resource = "devno"
        value1 = "36"
        value2 = "0"
        value3 = "hdisk3"

# ls -l /dev/hdisk[03]
brw-------  1 root  system  17, 0 Oct 08 06:17 /dev/hdisk0
brw-------  1 root  system  36, 0 Oct 08 09:19 /dev/hdisk3

Figure 7-12. ODM entries for physical volumes (3 of 3)


Notes:

Major and minor numbers
The ODM class CuDvDr is used to store the major and minor numbers of the devices. The output shown on the visual, for example, indicates that CuDvDr has stored the major number 17 (value1) and the minor number 0 (value2) for hdisk0. The major numbers for the two disks are different because hdisk0 is a virtual disk, served from a Virtual I/O Server partition, while hdisk3 is a physical disk allocated to this logical partition.

Special files
Applications and system programs use special files to access a device. For example, the visual shows the special files used to access hdisk0 (/dev/hdisk0) and hdisk3 (/dev/hdisk3).
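Because the special files are built from CuDvDr, the two views can be compared line by line. The following sketch normalizes both outputs to "major minor name"; the function names are hypothetical, and the input layouts are assumed to match the odmget and ls -l output shown on the visual.

```shell
# Print "major minor name" for each devno stanza in odmget CuDvDr output.
devno_from_odm() {
    awk '{ gsub(/"/, "") }
         $1 == "value1" { major = $3 }
         $1 == "value2" { minor = $3 }
         $1 == "value3" { print major, minor, $3 }'
}

# Print "major minor name" for each block or character special file
# in "ls -l" output (the comma after the major number is dropped).
devno_from_ls() {
    awk '/^[bc]/ {
        gsub(/,/, " ")
        name = $NF
        sub(/.*\//, "", name)
        print $5, $6, name
    }'
}
```

On AIX, the outputs of odmget -q "value3 like hdisk[03]" CuDvDr | devno_from_odm and ls -l /dev/hdisk[03] | devno_from_ls should then be identical; a difference suggests that the ODM and /dev are out of step.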


Instructor notes:
Purpose: Explain that major and minor numbers are stored in CuDvDr.
Details: Explain that this ODM class is used to build the special files in /dev. If it seems appropriate for the particular group of students you are teaching, you might provide the major number (17) and minor number (0) for hdisk0 (as given in the student notes) and then ask the students what the major number (36) and minor number (0) are for hdisk3.
Additional information: None
Transition statement: Let’s see how volume group information is stored in the ODM.


ODM entries for volume groups (1 of 2) IBM Power Systems

# odmget -q "name=rootvg" CuDv

CuDv:
        name = "rootvg"
        status = 0
        chgstatus = 1
        ddins = ""
        location = ""
        parent = ""
        connwhere = ""
        PdDvLn = "logical_volume/vgsubclass/vgtype"

# odmget -q "name=rootvg" CuAt

CuAt:
        name = "rootvg"
        attribute = "vgserial_id"
        value = "00c35ba000004c00000001157f54bf78"
        type = "R"
        generic = "D"
        rep = "n"
        nls_index = 637

(output continues on the next visual)

Figure 7-13. ODM entries for volume groups (1 of 2)


Notes:

CuDv entries for volume groups
Information indicating the existence of a volume group is stored in CuDv, which means all volume groups must have an object in this class. The visual shows an example of a CuDv entry for rootvg.

VGID
One of the most important pieces of information about a volume group is the VGID. As shown on the visual, this information is stored in CuAt.

Disks belonging to a volume group
An entry for each disk that belongs to a volume group is stored in CuAt. That is shown on the next visual.


Instructor notes:
Purpose: Describe how volume group information is stored in CuDv and CuAt.
Details: Use the student notes to guide your explanation.
Additional information: None
Transition statement: The CuAt output continues on the next visual.


ODM entries for volume groups (2 of 2) IBM Power Systems

# odmget -q "name=rootvg" CuAt
...
CuAt:
        name = "rootvg"
        attribute = "timestamp"
        value = "470a1bc9243ed693"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0

CuAt:
        name = "rootvg"
        attribute = "pv"
        value = "00c35ba07b2e24f00000000000000000"
        type = "R"
        generic = ""
        rep = "sl"
        nls_index = 0

Figure 7-14. ODM entries for volume groups (2 of 2)


Notes:

Disks belonging to a volume group
The CuAt object class contains an object for each disk that belongs to a volume group. The visual shows an example of a CuAt object for a disk in rootvg.

Length of PVID
Remember that the PVID is stored as a 32-digit hexadecimal string, where the last 16 digits are set to zeros.


Instructor notes:
Purpose: Describe additional objects for volume groups in CuAt.
Details: Use the student notes to guide your explanation. Emphasize that PVIDs for disks are stored with a length of 32 hexadecimal digits. Ensure the students understand that a CuAt object is created for each disk in a volume group. For example, if there were two physical volumes in rootvg, there would be two entries with name = "rootvg" and attribute = "pv" in CuAt.
Additional information:
Transition statement: Let’s consider logical volumes.


ODM entries for logical volumes (1 of 2) IBM Power Systems

# odmget -q "name=hd2" CuDv

CuDv:
        name = "hd2"
        status = 0
        chgstatus = 1
        ddins = ""
        location = ""
        parent = "rootvg"
        connwhere = ""
        PdDvLn = "logical_volume/lvsubclass/lvtype"

# odmget -q "name=hd2" CuAt

CuAt:
        name = "hd2"
        attribute = "lvserial_id"
        value = "00c35ba000004c00000001157f54bf78.5"
        type = "R"
        generic = "D"
        rep = "n"
        nls_index = 648

Other attributes include intra, stripe_width, type, and so on.

Figure 7-15. ODM entries for logical volumes (1 of 2)


Notes:

CuDv entries for logical volumes
The CuDv object class contains an entry for each logical volume.

Attributes of a logical volume
Attributes of a logical volume, for example, its LVID (lvserial_id), are stored in the object class CuAt. Other attributes that belong to a logical volume are the intra-physical policy (intra), stripe_width, type, size, and label.
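The lvserial_id value on the visual has the form VGID.minor: the part before the dot matches the vgserial_id of rootvg, and the part after it matches the minor number of the logical volume's special file. As a small illustration (the function names are hypothetical), the two parts can be separated with shell parameter expansion:

```shell
# Split an LVID of the form VGID.minor into its two parts.
lvid_vgid()  { printf '%s\n' "${1%.*}"; }    # text before the last dot
lvid_minor() { printf '%s\n' "${1##*.}"; }   # text after the last dot
```

For hd2 on the visual, lvid_vgid returns the VGID of rootvg and lvid_minor returns 5.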


Instructor notes:
Purpose: Explain how logical volume data is stored in the ODM.
Details: Use the student notes to guide your explanation.
Additional information: Remind the students that the LVID is created from the VGID and the minor number of the special file entry of the logical volume.
Transition statement: The CuDvDr and CuDep object classes also contain logical volume data.


ODM entries for logical volumes (2 of 2) IBM Power Systems

# odmget -q "value3=hd2" CuDvDr

CuDvDr:
        resource = "devno"
        value1 = "10"
        value2 = "5"
        value3 = "hd2"

# ls -l /dev/hd2
brw-------  1 root  system  10, 5 08 Jan 06:56 /dev/hd2

# odmget -q "dependency=hd2" CuDep

CuDep:
        name = "rootvg"
        dependency = "hd2"

Figure 7-16. ODM entries for logical volumes (2 of 2)


Notes:

CuDvDr logical volume objects
Each logical volume has an object in CuDvDr that is used to create the special file entry for that logical volume in /dev. As an example, the sample output on the visual shows the CuDvDr object for hd2 and the corresponding special file /dev/hd2 (major number 10, minor number 5).

CuDep logical volume entries
The ODM class CuDep (customized dependencies) stores dependency information for software devices. For example, the sample output on the visual indicates that the logical volume hd2 is contained in the rootvg volume group.


Instructor notes:
Purpose: Continue the explanation of where logical volume data is stored in the ODM.
Details: Explain logical volume objects in CuDvDr and CuDep.
Additional information: None
Transition statement: What are reasons for ODM-related LVM problems?


ODM-related LVM problems IBM Power Systems

[Diagram: high-level commands (signal handler, lock) updating the ODM and the VGDA/LVCB in two numbered steps]

What can cause problems?
• kill -9, shutdown, system crash
• Improper use of low-level commands
• Hardware changes without or with wrong software actions
• Full root file system

Figure 7-17. ODM-related LVM problems


Notes:

Normal functioning of high-level commands
As already mentioned, most of the time administrators use high-level commands to create or update volume groups or logical volumes. These commands use signal handlers to set up a proper cleanup in case of an interruption. Additionally, LVM commands use a locking mechanism to block other commands while a change is in progress.

Causes of problems
The signal handlers used by high-level LVM commands do not work with a kill -9, a system shutdown, or a system crash. You might end up in a situation where the VGDA has been updated, but the change has not been stored in the ODM. Problems might also occur because of the improper use of low-level commands or hardware changes that are not followed by correct administrator actions.


Another common problem is ODM corruption caused by performing LVM operations while the root file system (which contains /etc/objrepos) is full. Always check the free space in the root file system before attempting LVM recovery operations.
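The free-space check can be scripted so that it runs before any recovery step. The following is a minimal sketch: the function names and the default threshold of 10240 KB are arbitrary assumptions, and df -kP is used because its POSIX output format places the available space in the fourth column.

```shell
# Print the free space of the root file system in KB.
# df -P forces the portable one-line-per-file-system format.
root_free_kb() {
    df -kP / | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least the given number of KB is free in /.
# The default of 10240 KB is an illustrative value, not a recommendation.
root_has_space() {
    min_kb=${1:-10240}
    [ "$(root_free_kb)" -ge "$min_kb" ]
}
```

A recovery script could then begin with: root_has_space || { echo "root file system is nearly full" >&2; exit 1; }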


Instructor notes:
Purpose: Explain how ODM-related problems might come up.
Details: Explain the student material.
Additional information: None
Transition statement: Let’s identify ways that ODM problems can be fixed.


Fixing ODM problems (1 of 2) IBM Power Systems

If the ODM problem is not in the rootvg, for example in volume group homevg, do the following:

# varyoffvg homevg
# exportvg homevg              (Remove complete volume group from the ODM)
# importvg -y homevg hdiskX    (Import volume group and create new ODM objects)

Figure 7-18. Fixing ODM problems (1 of 2)


Notes:

Determining which volume group has the problem
If you detect ODM problems, you must first determine whether the affected volume group is the rootvg. Because the rootvg cannot be varied off, the procedure given here applies only to non-rootvg volume groups.

Steps in ODM repair procedure (for problem not in rootvg)
1. Vary off the volume group with the varyoffvg command. This requires that all file systems in the volume group be unmounted first.
2. Export the volume group with the exportvg command. This command removes the complete volume group from the ODM. The VGDA and LVCB are not touched by exportvg.
3. Import the volume group with the importvg command. Specify the volume group name with option -y; otherwise AIX creates a new volume group name. You need to specify only one intact physical volume of the volume group that you import. The importvg command reads the VGDA and LVCB on that disk and creates completely new ODM objects.

Note that the data cannot be used while this procedure repairs the corruption, even if the file systems were mounted and accessible despite the problem. The logical volumes must be closed before the volume group can be varied offline.
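The three steps can be wrapped in a small function so that they are always run in the same order. This is a sketch under stated assumptions, not a tested recovery tool: the names run and reimport_vg are hypothetical, file systems are assumed to be unmounted already, and by default the commands are only printed (DRY_RUN=1) so the sequence can be reviewed first.

```shell
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rebuild the ODM objects of a non-rootvg volume group.
#   $1 = volume group name, $2 = one intact physical volume of that group
reimport_vg() {
    vg=$1
    pv=$2
    if [ -z "$vg" ] || [ -z "$pv" ] || [ "$vg" = rootvg ]; then
        echo "usage: reimport_vg VGName hdiskX (not for rootvg)" >&2
        return 1
    fi
    run varyoffvg "$vg" &&
    run exportvg "$vg" &&
    run importvg -y "$vg" "$pv"
}
```

For example, reimport_vg homevg hdisk2 prints the three commands from the visual; setting DRY_RUN=0 on an AIX system would execute them.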


Instructor notes:
Purpose: Describe how to fix ODM problems in non-rootvg volume groups.
Details: Explain the student material.
Additional information: None
Transition statement: Let’s discuss how to fix ODM problems in rootvg.


Fixing ODM problems (2 of 2) IBM Power Systems

If the ODM problem is in the rootvg, try using the rvgrecover procedure:

PV=hdisk0
VG=rootvg
cp /etc/objrepos/CuAt /etc/objrepos/CuAt.$$
cp /etc/objrepos/CuDep /etc/objrepos/CuDep.$$
cp /etc/objrepos/CuDv /etc/objrepos/CuDv.$$
cp /etc/objrepos/CuDvDr /etc/objrepos/CuDvDr.$$
lqueryvg -Lp $PV | awk '{print $2}' | while read LVname; do
    odmdelete -q "name=$LVname" -o CuAt
    odmdelete -q "name=$LVname" -o CuDv
    odmdelete -q "value3=$LVname" -o CuDvDr
done
odmdelete -q "name=$VG" -o CuAt
odmdelete -q "parent=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDep
odmdelete -q "dependency=$VG" -o CuDep
odmdelete -q "value1=10" -o CuDvDr
odmdelete -q "value3=$VG" -o CuDvDr
importvg -y $VG $PV     # ignore lvaryoffvg errors
varyonvg $VG

• Uses odmdelete to “export” rootvg
• Uses importvg to import rootvg

Figure 7-19. Fixing ODM problems (2 of 2)


Notes:

Problems in rootvg
For ODM problems in rootvg, finding a solution is more difficult because rootvg cannot be varied off or exported. However, it may be possible to fix the problem using one of the techniques described below.

The rvgrecover procedure
If you detect ODM problems in rootvg, you can try using the procedure called rvgrecover. You may want to code this as a script (shown on the visual) in /bin and mark it executable. The rvgrecover procedure removes all ODM entries that belong to your rootvg by using odmdelete, which is the same way exportvg works. After deleting all ODM objects that belong to rootvg, it imports the rootvg by reading the VGDA and LVCB from the boot disk. This results in completely new ODM objects that describe your rootvg.

RAM disk maintenance mode
With the rootvg, the corruption problem may prevent a normal boot to multiuser mode. Thus, you may need to handle this situation in RAM disk maintenance mode (boot into maintenance mode from the CD-ROM or NIM). Before attempting this, you should make sure you have a current mksysb backup. Use the following steps (which are similar to those in the rvgrecover script shown on the visual) to recover the rootvg volume group after booting to maintenance mode and mounting the file systems.

Step 1: Delete all of the ODM information about logical volumes. Get the list of logical volumes from the VGDA of the physical volume.
    # lqueryvg -p hdisk0 -L | awk '{print $2}' \
    | while read LVname; do
    > odmdelete -q "name=$LVname" -o CuAt
    > odmdelete -q "name=$LVname" -o CuDv
    > odmdelete -q "value3=$LVname" -o CuDvDr
    > done

Step 2: Delete the volume group information from ODM.
    # odmdelete -q "name=rootvg" -o CuAt
    # odmdelete -q "parent=rootvg" -o CuDv
    # odmdelete -q "name=rootvg" -o CuDv
    # odmdelete -q "name=rootvg" -o CuDep
    # odmdelete -q "dependency=rootvg" -o CuDep
    # odmdelete -q "value1=10" -o CuDvDr
    # odmdelete -q "value3=rootvg" -o CuDvDr

Step 3: Add the volume group associated with the physical volume back to the ODM.
    # importvg -y rootvg hdisk0

Step 4: Recreate the device configuration database in the ODM from the information on the physical volume.
    # varyonvg -f rootvg

This assumes that hdisk0 is part of rootvg.

In CuDvDr:
    value1 = major number
    value2 = minor number
    value3 = object name for major/minor number

rootvg always has value1 = 10.

The steps can also be used to recover other volume groups by substituting the appropriate physical volume and volume group information. It is suggested that this example be made a script.
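One cautious way to turn the steps into a script is to generate the odmdelete commands first and review them before running anything. The sketch below only prints commands; the function name gen_odm_cleanup is hypothetical, and the major number for a volume group other than rootvg is an input that must be looked up (for example in CuDvDr), not something the sketch determines.

```shell
# Print (do not run) the odmdelete commands that remove one volume group
# and its logical volumes from the ODM, in the order used in the steps.
#   $1 = volume group name
#   $2 = major number of the volume group (10 for rootvg)
# Logical volume names are read from standard input, one per line.
gen_odm_cleanup() {
    vg=$1
    major=$2
    if [ -z "$vg" ] || [ -z "$major" ]; then
        echo "usage: gen_odm_cleanup VGName major < lv_names" >&2
        return 1
    fi
    while read -r lv; do
        [ -n "$lv" ] || continue
        printf 'odmdelete -q "name=%s" -o CuAt\n' "$lv"
        printf 'odmdelete -q "name=%s" -o CuDv\n' "$lv"
        printf 'odmdelete -q "value3=%s" -o CuDvDr\n' "$lv"
    done
    printf 'odmdelete -q "name=%s" -o CuAt\n' "$vg"
    printf 'odmdelete -q "parent=%s" -o CuDv\n' "$vg"
    printf 'odmdelete -q "name=%s" -o CuDv\n' "$vg"
    printf 'odmdelete -q "name=%s" -o CuDep\n' "$vg"
    printf 'odmdelete -q "dependency=%s" -o CuDep\n' "$vg"
    printf 'odmdelete -q "value1=%s" -o CuDvDr\n' "$major"
    printf 'odmdelete -q "value3=%s" -o CuDvDr\n' "$vg"
}
```

On AIX the logical volume list could be supplied as in step 1, for example: lqueryvg -p hdisk0 -L | awk '{print $2}' | gen_odm_cleanup rootvg 10. The printed commands can then be checked before they are executed, followed by the importvg and varyonvg steps.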


Instructor notes:
Purpose: Describe how to fix ODM problems in rootvg by using the rvgrecover script and other techniques.
Details: Explain the student material. Ensure students understand that they do not need to reboot in maintenance mode to fix non-rootvg inconsistencies. Remind them of the importance of backing up rootvg (if possible) before attempting repair on rootvg.
Additional information: The AIX 4.3 Problem Solving Guide and Reference was published in 1997. The man page entries (and corresponding entries in the AIX 6.1 Commands Reference) for redefinevg and synclvodm are brief but helpful.
Transition statement:


Intermediate level ODM commands IBM Power Systems

• High level LVM commands may not be a viable option.
  - ODM corruption prevents high level commands from running.
  - varyoffvg and exportvg will disrupt availability.

• redefinevg -d
  - Identifies and reenters PV data for the VG in the ODM
  - Checks for inconsistencies between LVM data areas and ODM
  - Recovers some, but not all of the LV data

• synclvodm
  - Synchronizes the VGDA, LVCB, ODM, and special device files
  - Volume group must be active
  - First run the redefinevg command if ODM does not have the minimum required information about the volume group.

Figure 7-20. Intermediate level ODM commands


Notes:

Overview
There are situations where you are unable to run the exportvg or importvg commands because they depend on finding a minimal level of information in the ODM. Even if these high level LVM commands can be run, they require that the volume group be taken offline, which would be disruptive. In these situations it is useful to know some intermediate level LVM commands. These commands are primarily intended to be used by the high level LVM commands, but they can be useful in solving tough problems.

The synclvodm command
Syntax: synclvodm [ ...]
Use of the synclvodm command is yet another way that you might be able to fix ODM problems in rootvg. If, for some reason, the ODM is not consistent with on-disk information, the synclvodm command can be used to resynchronize the database. It synchronizes or rebuilds the LVCB, the ODM, and the VGDAs. The volume group must be active for the resynchronization to occur. If logical volume names are specified, only the information related to those logical volumes is updated; otherwise, all logical volumes are synchronized.

The synclvodm command, by itself, can do a fairly complete job of resynchronizing the ODM with the LVM data areas on the disk. It also synchronizes the information between the LVM data areas. As such, it can worsen a situation where only one disk in the volume group has corrupted data areas.

The synclvodm command depends upon a minimal amount of information in the ODM; most importantly, the ODM needs to know the volume group name plus the physical volume and logical volume memberships.

The redefinevg command
The redefinevg command redefines the set of physical volumes of the given volume group in the device configuration database. If inconsistencies occur between the physical volume information in the ODM and the on-disk metadata, the redefinevg command determines which physical volumes belong to the specified volume group and re-enters this information in the ODM. The redefinevg command checks for inconsistencies by reading the reserved areas of all the configured physical volumes attached to the system.

It is sometimes necessary to run the redefinevg command to obtain the minimum information about the volume group. It creates new ODM objects for the provided volume group name and uses the LVM data areas on the specified disk to obtain the correct LVM information. The redefinevg command is not designed to fully rebuild all of the logical volume information. Thus, after running the redefinevg command, it is often necessary to run the synclvodm command to obtain the rest of the logical volume information.

These commands can be run with the volume group still online. The ODM corruption may prevent any attempt to vary it offline.


Instructor notes:
Purpose: Explain the use of LVM intermediate level commands.
Details: Note that there is an optional part of the exercise where they explore the use of these intermediate level commands.
Additional information:
Transition statement: Let’s try some of this in a lab exercise.


Exercise 7: LVM metadata and problems (parts 1 and 2) IBM Power Systems

• Part 1: Fixing LVM ODM problems using exportvg and importvg
• Part 2: Fixing LVM ODM problems using the rvgrecover procedure
• Part 3 (optional): Using intermediate LVM commands

Figure 7-21. Exercise 7: LVM metadata and problems (parts 1 and 2)


Notes:

Goals for part 1 of this exercise
At the end of this part of this exercise, you should be able to:
- Analyze an LVM-related ODM problem
- Fix an LVM-related ODM problem associated with the rootvg


Instructor notes:
Purpose: Transition to the lab.
Details: Explain the goals of this part of the exercise.
Additional information: None
Transition statement: (Use this statement after students have completed the exercise.) Let’s move on to the final portion of this unit. We will start by looking at mirroring in more detail and learning how to set it up.


7.2. Failed disks: Mirroring and quorum issues

Instructor topic introduction
What students will do: The students will review how mirroring helps with failed disks, learn what the term quorum means, how it relates to situations where a volume group is mirrored, and what physical volume states are defined by LVM.
How students will do it: Through lecture, exercise, and checkpoint questions
What students will learn: Students will learn:
• How mirrored volume groups handle failed disk situations
• What stale partitions are
• How the quorum mechanism works
• What physical volume states are defined by LVM
How this will help students on their job: By learning about these advanced topics, students will be able to increase the availability or the performance of AIX systems.


Mirroring IBM Power Systems

[Diagram: logical partitions of a mirrored logical volume mapped to physical partitions on hdisk0, hdisk1, and hdisk2; VGSA]

LP:   5
PP1:  hdisk0, 5
PP2:  hdisk1, 8
PP3:  hdisk2, 9

Figure 7-22. Mirroring


Notes:

Using mirroring to increase availability
The visual above shows a mirrored logical volume, where each logical partition is mirrored to three physical partitions. More than three copies are not possible. If one of the disks fails, there are still two copies of the data available. Mirroring is therefore used to increase the availability of a system or a logical volume.

Role of VGSA
The information about the mirrored partitions is stored in the VGSA, which is contained on each disk. In the example shown on the visual, we see that logical partition 5 points to physical partition 5 on hdisk0, physical partition 8 on hdisk1, and physical partition 9 on hdisk2.

Historical information
In AIX 4.1/4.2, the maximum number of mirrored partitions on a disk was 1016. AIX 4.3 and subsequent releases allow more than 1016 mirrored partitions on a disk.


Instructor notes:
Purpose: Review the concept of mirroring.
Details: This is a review of concepts that were covered in the prerequisite course. Mirroring is being discussed to support the discussion of failed disks and why we turn off quorum checking. Use the student notes to guide your explanation.
Additional information: None
Transition statement: Let’s describe what stale partitions are.


Stale partitions IBM Power Systems

[Diagram: mirrored logical volume on hdisk0, hdisk1, and hdisk2, with a stale partition on hdisk2]

After repair of hdisk2:
• varyonvg VGName (calls syncvg -v VGName)
• Only stale partitions are updated

Figure 7-23. Stale partitions


Notes:

How data becomes stale
If a disk that contains a mirrored logical volume (such as hdisk2 on the visual) fails, the data on the failed disk becomes stale (not current, not up-to-date).

How state information is kept
State information (active or stale) is kept for each physical partition. A physical volume is shown as stale (lsvg VGName) as long as it has at least one stale partition.

Updating stale partitions
After a disk with stale partitions has been repaired (for example, after a power failure), issue the varyonvg command, which starts the syncvg command to synchronize the stale partitions. The syncvg command runs as a background job that updates all stale partitions in the volume group.

Always use the varyonvg command to update stale partitions. After a power failure, a disk forgets its reservation. The syncvg command cannot reestablish the reservation, whereas varyonvg does this before calling syncvg. The term reservation means that a disk is reserved for one system. The disk driver puts the disk in a state where you can work with the disk (at the same time, the control LED of the disk turns on).

The varyonvg command works even if the volume group is already varied on or if the volume group is the rootvg.
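To decide whether a resynchronization is needed, the stale counters can be extracted from lsvg output. The sketch below assumes that lsvg VGName prints a line containing both "STALE PVs:" and "STALE PPs:" followed by their counts; that layout and the function name stale_counts are assumptions and should be checked against the output on your system.

```shell
# Print the stale PV and stale PP counts found in "lsvg VGName" output
# read from standard input, one "label count" pair per line.
stale_counts() {
    awk '/STALE PVs:/ {
        for (i = 1; i < NF; i++)
            if ($i == "STALE" && ($(i + 1) == "PVs:" || $(i + 1) == "PPs:"))
                print $(i + 1), $(i + 2)
    }'
}
```

On AIX this could be used as lsvg rootvg | stale_counts; a nonzero PPs: value indicates that varyonvg should be run to resynchronize the stale partitions.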


Instructor notes:
Purpose: Explain what stale partitions are.
Details: Explain using the information in the student notes. The prerequisite course discusses stale partitions as a stage in the creation of mirroring. Remind them that this can also happen as a result of disk failure. Once a disk is recovered, the syncvg command will need to be run to resynchronize the copies.
Additional information: Explain that using varyonvg is better than using syncvg directly. The varyonvg command works if the VG is already varied on and if the VG is the rootvg.
Transition statement: Let’s see how mirrored LVs can be created.


Mirroring rootvg

[Diagram: mirrorvg copies the rootvg logical volumes (hd9var, hd8, hd5, ..., hd1) from hdisk0 to hdisk1.]

1. extendvg
2. chvg -Qn
3. mirrorvg -s
4. syncvg -v
5. bosboot -a
6. bootlist
7. shutdown -Fr
8. bootinfo -b

• Make a copy of all rootvg LVs using mirrorvg and place copies on the second disk
• Execute bosboot and change your bootlist

Figure 7-24. Mirroring rootvg


Notes:

Reason to mirror rootvg
If your rootvg is on one disk, that disk is a single point of failure: if it fails, your machine is no longer available. If you mirror rootvg to a second (or third) disk and one disk fails, another disk still contains the mirrored rootvg. Mirroring therefore increases the availability of your system.

Procedure for mirroring rootvg
The following steps show how to mirror the rootvg.

- Add the new disk to the volume group (for example, hdisk1):
  # extendvg [ -f ] rootvg hdisk1
- If you use one mirror disk, be sure that a quorum is not required for varyon:
  # chvg -Qn rootvg
- Add the mirrors for all rootvg logical volumes:
  # mklvcopy hd1 2 hdisk1
  # mklvcopy hd2 2 hdisk1
  # mklvcopy hd3 2 hdisk1
  # mklvcopy hd4 2 hdisk1
  # mklvcopy hd5 2 hdisk1
  # mklvcopy hd6 2 hdisk1
  # mklvcopy hd8 2 hdisk1
  # mklvcopy hd9var 2 hdisk1
  # mklvcopy hd10opt 2 hdisk1
  # mklvcopy hd11admin 2 hdisk1
  If you have other logical volumes in your rootvg, be sure to create copies for them as well.
  An alternative to running multiple mklvcopy commands is to use mirrorvg. This command was added in AIX V4.2 to simplify mirroring VGs. By default, the mirrorvg command disables quorum and mirrors the existing LVs in the specified VG. To mirror rootvg, use the command:
  # mirrorvg -s rootvg
- Now synchronize the new copies you created:
  # syncvg -v rootvg
- Because we want to be able to boot from different disks, we need to use bosboot:
  # bosboot -a
  As hd5 is mirrored, there is no need to do it for each disk.
- Update the bootlist. In case of a disk failure, we must be able to boot from different disks.
  # bootlist -m normal hdisk1 hdisk0
  # bootlist -m service hdisk1 hdisk0
- Reboot the system:
  # shutdown -Fr
- Check that the system boots from the first boot disk:
  # bootinfo -b
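The mirrorvg variant of the procedure above can be sketched as a small dry-run script. This is only an illustration: the `run` helper, the `DRY_RUN` variable, and the `mirror_rootvg` function are hypothetical names introduced here, not AIX facilities. By default the script only prints the commands, so it can be reviewed on any system before anything is executed on AIX. The reboot and the bootinfo check are deliberately left as manual steps.

```shell
#!/bin/sh
# Sketch only: prints the rootvg mirroring commands from the notes above.
# DRY_RUN, run, and mirror_rootvg are illustrative names, not AIX settings.
DRY_RUN=${DRY_RUN:-1}

run() {
    # In dry-run mode, echo the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mirror_rootvg() {
    new_disk=$1   # disk that receives the mirror copies, for example hdisk1
    old_disk=$2   # disk that currently holds rootvg, for example hdisk0

    run extendvg rootvg "$new_disk"      # add the new disk to rootvg
    run chvg -Qn rootvg                  # quorum not required for varyon
    run mirrorvg -s rootvg               # create the copies, skip the sync
    run syncvg -v rootvg                 # synchronize the new copies
    run bosboot -a                       # rebuild the boot image
    run bootlist -m normal "$new_disk" "$old_disk"
    run bootlist -m service "$new_disk" "$old_disk"
    # Remaining manual steps: shutdown -Fr, then bootinfo -b after the reboot.
}

mirror_rootvg hdisk1 hdisk0
```

Setting `DRY_RUN=0` on an AIX system would execute the commands instead of printing them; review the printed plan first.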


Instructor notes: Purpose — Review how to mirror rootvg. Details — Again, this is review from the prerequisite course. Use the information in the student material to guide your presentation. You may wish to ask the students why the mirroring of the rootvg has extra steps. One of the steps is the turning off of quorum checking; this can be used as a segue into the next visuals on quorum. Additional information — When mirroring rootvg, hd6 should be mirrored because the paging space availability is critical to keeping the system online. hd6 serves both as paging space and as the default dump device. In AIX V4.3.3 and subsequent releases, there is no problem with mirroring dump devices. In releases prior to 4.3.3, dump devices did not work correctly if mirrored. On these older releases, a separate dump device should be created and not mirrored. Before 4.3.3, if the dump device was mirrored, when the dump occurred, the data would be written to one copy of the mirror. Even though only one copy was updated, no partitions would be marked stale. When the machine rebooted, the system would attempt to move the dump data from hd6 and write it to /var/adm/ras (by default). Since LVM would think the mirror was in sync, it would read the data from all copies of hd6, causing the dump to become corrupted. In 4.3.3 and subsequent releases, it is possible to read a specified copy of a mirror. Basically, if working with releases prior to 4.3.3, the dump area should be separate and not mirrored. With 4.3.3 and subsequent releases, it is safe to leave hd6 as the dump device and mirror it. Also, with mirrorvg, quorum is turned off by default. Use -Q to leave quorum enabled. The -s option prevents the sync from occurring. If you use -s, make sure syncvg is run eventually to sync the mirrors. Transition statement — Let’s show another way to mirror the rootvg.


VGDA count

Two-disk volume group (PV1 holds two VGDAs, PV2 holds one):
• Loss of PV1: Only 33% of VGDAs available (no quorum)
• Loss of PV2: 66% of VGDAs available (quorum)

Three-disk volume group (PV1, PV2, and PV3 hold one VGDA each):
• Loss of 1 PV: 66% of VGDAs still available (quorum)

Figure 7-25. VGDA count


Notes:

Reservation of space for VGDAs
Each disk that is contained in a volume group contains at least one VGDA. The LVM always reserves space for two VGDAs on each disk.

Volume groups containing two disks
If a volume group consists of two disks, one disk contains two VGDAs and the other disk contains only one (as shown on the visual). If the disk with the two VGDAs fails, only 33% of the VGDAs remain available. Quorum requires that more than 50% of the VGDAs be available, so in this case the quorum is not fulfilled.

Volume groups containing more than two disks
If a volume group consists of more than two disks, each disk contains one VGDA. In a three-disk volume group, if one disk fails, 66% of the VGDAs are still available and the quorum is fulfilled.
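The quorum arithmetic can be checked with a few lines of shell. This is a minimal sketch of the "more than 50% of VGDAs" rule only; `has_quorum` is a hypothetical helper name, and the VGDA counts are passed in by hand rather than read from the LVM.

```shell
#!/bin/sh
# has_quorum TOTAL AVAILABLE
# Prints "yes" when more than 50% of the VGDAs are available, "no" otherwise.
# Integer arithmetic: available/total > 1/2 is the same as available*2 > total.
has_quorum() {
    total=$1
    available=$2
    if [ $((available * 2)) -gt "$total" ]; then
        echo yes
    else
        echo no
    fi
}

has_quorum 3 1   # two-disk VG, disk with two VGDAs lost: prints "no"
has_quorum 3 2   # two-disk VG, disk with one VGDA lost:  prints "yes"
has_quorum 3 2   # three-disk VG, one disk lost:          prints "yes"
```

Note that exactly 50% (for example, 2 of 4 VGDAs) does not satisfy the rule, because quorum needs strictly more than half.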


Instructor notes: Purpose — Describe how VGDAs are stored on disks in a volume group and how these VGDAs are involved in determining whether quorum exists. Details — Use the information in the student material to guide your presentation. Additional information — None Transition statement — Let’s discuss what happens if a quorum is not available.


Quorum not available

datavg: hdisk1 holds two VGDAs, hdisk2 holds one VGDA.
If hdisk1 fails, datavg has no quorum.

• VG not active:
  # varyonvg datavg
  FAILS
• VG active: closed during operation
  - No more access to LVs
  - LVM_SA_QUORCLOSE in error log

Figure 7-26. Quorum not available


Notes:

Introduction
What happens if quorum checking is enabled for a volume group and a quorum is not available? Consider the following example (illustrated on the visual and discussed in the following paragraphs): In a two-disk volume group datavg, the disk hdisk1 is not available due to a hardware defect. hdisk1 is the disk that contains the two VGDAs; that means the volume group does not have a quorum of VGDAs.

Result if volume group not varied on
If the volume group is not varied on and the administrator tries to vary on datavg, the varyonvg command will fail.


Volume group already varied on
If the volume group is already varied on when quorum is lost, the LVM will deactivate the volume group. There is no more access to any logical volume that is part of this volume group. At this point, the system sometimes shows strange behavior. This situation is posted to the error log, which shows an error entry LVM_SA_QUORCLOSE.

After losing the quorum, the volume group may still be listed as active (lsvg -o); however, all application data access and LVM functions requiring data access to the volume group will fail. The volume group is dropped from the active list as soon as the last logical volume is closed. You can still use fuser -k /dev/LVname and umount /dev/LVname, but no data is actually written to the disk.


Instructor notes: Purpose — Describe the quorum mechanism. Details — Describe what happens when the quorum is not available. Make sure they understand the difference between quorum checking of an active VG and the quorum mechanisms involved with trying to vary on an inactive VG. Additional information — Some of this discussion applies to rootvg. However, there are some differences, as we will see later. Transition statement — Let’s describe how to set up nonquorum volume groups.


Nonquorum volume groups

With single mirroring, always disable the quorum:
• chvg -Qn datavg
• varyoffvg datavg
• varyonvg datavg

Additional considerations for rootvg:
• chvg -Qn rootvg
• bosboot -ad /dev/hdiskX
• Reboot

Turning off the quorum checking:
- Requires 100% VGDAs for normal varyonvg
- Allows volume group to stay active if quorum is lost

Figure 7-27. Nonquorum volume groups


Notes:

Loss of quorum in a nonquorum volume group
When a nonquorum volume group loses its quorum, it will not be deactivated. It will be active until it loses all of its physical volumes.

Recommendations when using single mirroring
When working with single mirroring, always disable quorum checking using the command chvg -Qn. For data volume groups, you must vary off and vary on the volume group to make the change effective.

Recommendations for rootvg
When turning off the quorum checking for rootvg, you must do a bosboot (or a savebase) to reflect the change in the ODM in the boot logical volume. Afterwards, reboot the machine.

Varying on a nonquorum volume group
It is important to know that turning off the quorum checking does not allow a varyonvg without a quorum. It just prevents the closing of an active volume group when it loses its quorum.


Instructor notes: Purpose — Describe nonquorum volume groups. Details — Cover the material in the student notes. Additional information — None Transition statement — What can you do if a varyonvg fails?


Forced vary on (varyonvg -f)

datavg: hdisk1 holds two VGDAs and is shown as "removed"; hdisk2 holds one VGDA.

# varyonvg datavg
Fails (even when quorum disabled)

Check the reason for the failure (cable, adapter, power), before doing the following:

# varyonvg -f datavg
Failure accessing hdisk1. Set PV STATE to removed.
Volume group datavg is varied on.

Figure 7-28. Forced vary on (varyonvg -f)


Notes:

When normal vary on may fail
If the quorum of VGDAs is not available during vary on, the varyonvg command fails, even when quorum is disabled. In fact, when quorum is disabled, the varyonvg command requires that 100% of the VGDAs be available instead of more than 50%.

Doing a forced vary on
Before doing a forced vary on (varyonvg -f), always check the reason for the failure. If the physical volume appears to be permanently damaged, use a forced varyonvg. All physical volumes that are missing during this forced vary on will be changed to physical volume state removed. This means that all the VGDA and VGSA copies will be removed from these physical volumes. Once this is done, these physical volumes will no longer take part in quorum checking, nor will they be allowed to become active within the volume group until you return them to the volume group.

Change in VGDA distribution
In the example on the visual, the active disk hdisk2 becomes the disk with the two VGDAs. This does not change, even if the failed disk can be brought back.

Quorum checking on
With quorum checking on, you always need more than 50% of the VGDAs available (except to vary on rootvg).

Quorum checking off
With quorum checking off, you have to distinguish between an already active volume group and a volume group that is being varied on. An active volume group will be kept open as long as there is at least one VGDA available. A normal varyonvg, however, requires 100% of the VGDAs. Set MISSINGPV_VARYON=true in /etc/environment if a volume group needs to be varied on with missing disks at boot time. When using varyonvg -f or MISSINGPV_VARYON=true, you take full responsibility for the volume group integrity.
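The vary-on rules for both settings can be summarized in a small decision helper. This is a sketch of the rules as stated in these notes, not of the LVM's internal logic: `can_varyon` is a hypothetical function name, the VGDA counts are supplied by hand, and the rootvg exception, varyonvg -f, and MISSINGPV_VARYON are not modeled.

```shell
#!/bin/sh
# can_varyon QUORUM_CHECKING TOTAL_VGDAS AVAILABLE_VGDAS
# QUORUM_CHECKING is "on" or "off".
# Prints whether a normal (non-forced) varyonvg is expected to succeed:
#   quorum checking on  -> more than 50% of the VGDAs must be available
#   quorum checking off -> 100% of the VGDAs must be available
can_varyon() {
    checking=$1
    total=$2
    available=$3
    case "$checking" in
        on)
            if [ $((available * 2)) -gt "$total" ]; then
                echo "varyon ok"
            else
                echo "varyon fails"
            fi
            ;;
        off)
            if [ "$available" -eq "$total" ]; then
                echo "varyon ok"
            else
                echo "varyon fails"
            fi
            ;;
        *)
            echo "usage: can_varyon on|off TOTAL AVAILABLE" >&2
            return 1
            ;;
    esac
}

can_varyon on 3 2    # quorum VG, one VGDA missing
can_varyon off 3 2   # nonquorum VG, one VGDA missing
```

The second call illustrates the point that is easy to miss: disabling quorum checking makes a normal vary on stricter, not more lenient.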


Instructor notes: Purpose — Describe forced vary on of a volume group. Details — Use the student notes to guide your explanation. Additional information — Since AIX 4.3.1, you can change varyon behavior with the MISSINGPV_VARYON variable. AIX 4.2.1 needed >50% of VGDAs to varyon and < AIX 4.2.1 needed 100%. Transition statement — Let’s discuss what’s meant by physical volume state.


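Physical volume states

[State diagram with the PV states active, missing, and removed:]
• varyonvg VGName: disks that can be accessed are active
• Quorum ok? Disks that cannot be accessed are missing
• Quorum lost? Disks that cannot be accessed are missing; varyonvg -f VGName changes them to removed
• missing to active: Hardware repair
• removed to active: Hardware repair followed by: varyonvg VGName, chpv -v a hdiskX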

Figure 7-29. Physical volume states


Notes:

Introduction
This page introduces physical volume states (not device states). Physical volume states can be displayed with lsvg -p VGName.

Active state
If a disk can be accessed during a varyonvg, it gets a PV state of active.

Missing state
If a disk cannot be accessed during a varyonvg, but quorum is available, the failing disk gets a PV state of missing. If the disk can be repaired, for example, after a power failure, you just have to issue a varyonvg VGName to bring the disk into the active state again. Any stale partitions will be synchronized.


Removed state
If a disk cannot be accessed during a varyonvg and the quorum of disks is not available, you can issue a varyonvg -f VGName, a forced vary on of the volume group. The failing disk gets a PV state of removed, and it will not be used for quorum checks any longer.

Recovery after repair
If you are able to repair the disk (for example, after a power failure), executing a varyonvg alone does not bring the disk back into the active state. It maintains the removed state. At this stage, you have to announce the fact that the failure is over by using the following command:
# chpv -va hdiskX
This defines the disk hdiskX as active. Note that you have to do a varyonvg VGName afterwards to synchronize any stale partitions.

The chpv -vr command
The opposite of chpv -va is chpv -vr, which brings the disk into the removed state. This works only when all logical volumes have been closed on the disk that will be defined as removed. Additionally, chpv -vr does not work when the quorum will be lost in the volume group after removing the disk.
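The recovery steps for each physical volume state can be written down as a lookup. The sketch below only prints the commands named in these notes for a repaired disk; `recovery_commands` is a hypothetical helper, and the state is passed in by hand (on a real system it would be read from lsvg -p VGName).

```shell
#!/bin/sh
# recovery_commands STATE VGNAME PVNAME
# Prints the commands that bring a repaired disk back into use,
# depending on the PV state it was left in.
recovery_commands() {
    state=$1
    vg=$2
    pv=$3
    case "$state" in
        active)
            # Nothing to do: the disk is already in use.
            ;;
        missing)
            # A normal vary on reactivates the disk and syncs stale partitions.
            echo "varyonvg $vg"
            ;;
        removed)
            # First declare the disk active again, then vary on to sync.
            echo "chpv -va $pv"
            echo "varyonvg $vg"
            ;;
        *)
            echo "unknown PV state: $state" >&2
            return 1
            ;;
    esac
}

recovery_commands removed datavg hdisk1
```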


Instructor notes: Purpose — Introduce physical volume states. Details — Use the student notes to guide your presentation. Distinguish between PV states and device states. Additional information — None Transition statement — It is time for a checkpoint.


Checkpoint

1. True or False: All LVM information is stored in the ODM.
2. True or False: You detect that a physical volume hdisk1 that is contained in your rootvg is missing in the ODM. This problem can be fixed by exporting and importing the rootvg.

Figure 7-30. Checkpoint


Notes:


Instructor notes: Purpose — Discuss the checkpoint questions. Details — A “Checkpoint Solution” is given below:

Checkpoint solutions

1. True or False: All LVM information is stored in the ODM.
   False. Information is also stored in other AIX files and in disk control blocks (like the VGDA and LVCB).
2. True or False: You detect that a physical volume hdisk1 that is contained in your rootvg is missing in the ODM. This problem can be fixed by exporting and importing the rootvg.
   False. Use the rvgrecover procedure instead. This script creates a complete set of new rootvg ODM entries.

Additional information — None Transition statement — Let’s move on to an exercise.


Exercise 7: LVM metadata and problems (parts 4 and 5)

• Part 4: Working with quorum issues
• Part 5 (optional): Manually fixing an LVM ODM problem

Figure 7-31. Exercise 7: LVM Metadata and problems (parts 4 and 5)


Notes:

Objectives for part 4 of this exercise
At the end of the exercise, you should be able to:
- Create a two-disk volume group
- Deal with situations where there is a loss of quorum

Objectives for optional part 5 of this exercise
At the end of the exercise, you should be able to:
- Manually rebuild missing LVM-related ODM objects


Instructor notes: Purpose — Prepare the students for the lab. Details — Use this visual as a transition to the lab. Provide the goals of the lab at this point. If there is extra time after completing part 4, the students can either work on part 5 or go back and work on any of the previous optional parts. Additional information — None Transition statement — Let’s review some of the key points from this unit.


Unit summary

Having completed this unit, you should be able to:
• Explain where LVM information is stored
• Solve ODM-related LVM problems
• Manage volume group quorum issues
• Explain the physical volume states used by the LVM

Figure 7-32. Unit summary


Notes:
• The LVM information is held in a number of different places on the disk, including the ODM and the VGDA.
• ODM-related problems can be solved by:
  - exportvg/importvg (non-rootvg VGs)
  - rvgrecover (rootvg)
  - LVM intermediate commands
  - Manually fixing using ODM commands
• Quorum means that more than 50% of VGDAs must be available.
• Quorum enforcement should be disabled when dealing with a two-disk mirrored VG.


Instructor notes: Purpose — Summarize key points from the unit. Details — Present the highlights from the unit. Additional information — None Transition statement — Let’s continue with the next unit.


Unit 8. Disk management procedures

What this unit is about
This unit describes different disk management procedures:
• Disk replacement procedures
• Procedures to solve problems caused by an incorrect disk replacement
• Managing situations where duplicate file systems or logical volumes complicate the import of a volume group

What you should be able to do
After completing this unit, you should be able to:
• Replace a disk under different circumstances
• Recover from a total volume group failure
• Rectify problems caused by incorrect actions that have been taken to change disks
• Manage importvg issues

How you will check your progress
Accountability:
• Lab exercises
• Checkpoint questions

References
Online      AIX Version 6.1 Command Reference volumes 1-6
Online      AIX Version 6.1 Operating system and device management
Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems
GG24-4484   AIX Storage Management (Redbook)
SG24-5432   AIX Logical Volume Manager from A to Z: Introduction and Concepts (Redbook)
SG24-5433   AIX Logical Volume Manager from A to Z: Troubleshooting and Commands (Redbook)


Unit objectives

After completing this unit, you should be able to:
• Replace a disk under different circumstances
• Recover from a total volume group failure
• Rectify problems caused by incorrect actions that have been taken to change disks
• Manage importvg issues

Figure 8-1. Unit objectives


Notes:

Introduction
This unit presents many disk management procedures that are very important for any AIX system administrator.


Instructor notes: Purpose — Present the objectives of this unit. Details — Additional information — Transition statement — Let's start with the disk replacement procedures.


8.1. Disk replacement techniques Instructor topic introduction What students will do — The students will identify how to replace a disk under different conditions. How students will do it — Through lecture, lab exercise, and checkpoint questions What students will learn — Students will: • Identify how to replace a disk under different conditions • Recover from a total volume group failure How this will help students on their job — Replacing a disk is not always an easy job. System administrators must know the procedures to replace a disk without corrupting the systems.


Disk replacement: Starting point

[Flowchart] A disk must be replaced ...
• Disk mirrored? Yes: Procedure 1
• No. Disk still working? Yes: Procedure 2
• No. Volume group lost? No: Procedure 3
• Yes. rootvg: Procedure 4; not rootvg: Procedure 5

Figure 8-2. Disk replacement: Starting point


Notes:

Reasons to replace a disk
Many reasons might require the replacement of a disk, for example:
- Disk too small
- Disk too slow
- Disk produces many DISK_ERR4 log entries

Flowchart
Before starting the disk replacement, always follow the flowchart that is shown in the visual. This will help you whenever you have to replace a disk.
1. If the disk that must be replaced is completely mirrored onto another disk, follow procedure 1.
2. If a disk is not mirrored, but still works, follow procedure 2.
3. If you are absolutely sure that a disk failed and you are not able to repair the disk, do the following:
   - If the volume group can be varied on (normal or forced), use procedure 3.
   - If the volume group is totally lost after the disk failure, that is, the volume group cannot be varied on (either normal or forced):
     • If the volume group is rootvg, follow procedure 4.
     • If the volume group is not rootvg, follow procedure 5.
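The flowchart can be expressed as a short decision function, which may help when walking through it with students. This is a sketch only: `choose_procedure` and its four yes/no arguments are hypothetical names introduced here, and the answers must be supplied by the administrator after inspecting the system.

```shell
#!/bin/sh
# choose_procedure MIRRORED WORKING VG_LOST IS_ROOTVG
# Each argument is "yes" or "no". Prints the procedure number (1 to 5)
# that the disk replacement flowchart selects.
choose_procedure() {
    mirrored=$1    # is the disk completely mirrored onto another disk?
    working=$2     # can the disk still be accessed?
    vg_lost=$3     # is the volume group lost (no normal or forced vary on)?
    is_rootvg=$4   # is the affected volume group rootvg?

    if [ "$mirrored" = yes ]; then
        echo 1
    elif [ "$working" = yes ]; then
        echo 2
    elif [ "$vg_lost" = no ]; then
        echo 3
    elif [ "$is_rootvg" = yes ]; then
        echo 4
    else
        echo 5
    fi
}

choose_procedure no no yes no   # failed disk, datavg lost: prints 5
```

The order of the tests mirrors the flowchart: later questions are only asked when the earlier ones were answered "no".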


Instructor notes: Purpose — Provide considerations before a disk replacement. Details — Explain as described in the student material. Additional information — This flowchart is a method to offer disk replacement procedures for many types of disk failures. It is not guaranteed that 100% of all disk failures are covered. A good way to distinguish between the various procedures is to focus on where we recover the data from: 1. Procedure 1 - We synchronize from a remaining good mirror copy. 2. Procedure 2 - We migrate the data off the suspect disk to the new disk before removing the suspect disk. 3. Procedure 3 - We recover the data from the filesystem backup(s) (or LV backup provided by the using application). 4. Procedure 4 - We recover using the mksysb backup of the rootvg. 5. Procedure 5 - We recover using the savevg backup for the non-rootvg. Transition statement — Let’s start with procedure 1.


Procedure 1: Disk mirrored

1. Remove all copies from disk:
   # unmirrorvg vg_name hdiskX
2. Remove disk from volume group:
   # reducevg vg_name hdiskX
3. Remove disk from ODM:
   # rmdev -l hdiskX -d
4. Connect new disk to system
   May have to shut down if not hot-pluggable
5. Add new disk to volume group:
   # extendvg vg_name hdiskY
6. Create new copies:
   # mirrorvg vg_name hdiskY
   # syncvg -v vg_name

Figure 8-3. Procedure 1: Disk mirrored


Notes:

When to use this procedure
Use procedure 1 when the disk that must be replaced is mirrored.

Disk state
This procedure requires that the disk state of the failed disk be either missing or removed. Refer to Physical volume states in Unit 7: Disk management theory for more information on disk states. Use lspv hdiskX to check the state of your physical volume. If the disk is still in the active state, you cannot remove any copies or logical volumes from the failing disk. In this case, one way to bring the disk into a removed or missing state is to run the reducevg -d command or to do a varyoffvg and a varyonvg on the volume group by rebooting the system. Disable the quorum check if you have only two disks in your volume group.


The goal and how to do it
The goal of each disk replacement is to remove all logical volumes from a disk.
1. Start removing all logical volume copies from the disk. Use either the SMIT fastpath smit unmirrorvg or the unmirrorvg command as shown in the visual. This will unmirror each logical volume that is mirrored on the disk. If you have additional unmirrored logical volumes on the disk, you have to either move them to another disk (migratepv), or remove them if the disk cannot be accessed (rmlv).
2. If the disk is completely empty, remove the disk from the volume group. Use SMIT fastpath smit reducevg or the reducevg command.
3. After the disk has been removed from the volume group, you can remove it from the ODM. Use the rmdev command as shown in the visual. If the disk must be removed from the system and is not hot-pluggable, shut down the machine and then remove it.
4. Connect the new disk to the system and reboot your system. The cfgmgr will configure the new disk. If using hot-pluggable disks, a reboot is not necessary.
5. Add the new disk to the volume group. Use either the SMIT fastpath smit extendvg or the extendvg command.
6. Finally, create new copies for each logical volume on the new disk. Use either the SMIT fastpath smit mirrorvg or the mirrorvg command. Synchronize the volume group (or each logical volume) afterwards, using the syncvg command.
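The six steps above can be collected into a printed plan that is reviewed before anything is typed on the system. This is a sketch under assumptions: `procedure1_plan` is a hypothetical helper, it only prints text and executes nothing, and the physical disk swap appears as a comment line because it cannot be scripted. The `-v` flag on syncvg selects volume-group synchronization, as in the rootvg mirroring procedure in Unit 7.

```shell
#!/bin/sh
# procedure1_plan VGNAME OLD_DISK NEW_DISK
# Prints the command sequence for replacing a mirrored disk (procedure 1).
# Nothing is executed; the output is meant to be reviewed by the administrator.
procedure1_plan() {
    vg=$1
    old=$2
    new=$3
    printf '%s\n' \
        "unmirrorvg $vg $old" \
        "reducevg $vg $old" \
        "rmdev -l $old -d" \
        "# connect the new disk (shut down first if it is not hot-pluggable); cfgmgr configures it" \
        "extendvg $vg $new" \
        "mirrorvg $vg $new" \
        "syncvg -v $vg"
}

procedure1_plan datavg hdisk3 hdisk4
```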


Instructor notes: Purpose — Explain Procedure 1. Details — Additional information — When you read the student notes you might think the following: Removing a logical volume from a disk that fails is not possible. The important thing is: it is possible, but it requires the disk to be either in a missing or removed state. If the disk is active, the LVM does not allow you to unmount a file system or remove a logical volume from the failing disk. Now the problem is: how do you bring a disk into the missing or removed state? The answer is that you have to do a reducevg -d or to force a new varyonvg, either in a normal or a forced mode. Because you cannot do a varyoffvg when file systems are mounted (and you cannot unmount them from the failing disk), the only way to recover from this bad situation is to reboot your system. This might cause other problems if the failing disk is in rootvg and the quorum has not been disabled in a two-disk volume group. Transition statement — Let’s describe procedure 2.


Unit 8. Disk management procedures Course materials may not be reproduced in whole or in part without the prior written permission of IBM.


Procedure 2: Disk still working

1. Connect new disk to system.
2. Add new disk to volume group:
   # extendvg vg_name hdiskY
3. Migrate old disk to new disk: (*)
   # migratepv hdiskX hdiskY
4. Remove old disk from volume group:
   # reducevg vg_name hdiskX
5. Remove old disk from ODM:
   # rmdev -l hdiskX -d

(*) Is the disk in rootvg? See next visual for further considerations.

Figure 8-4. Procedure 2: Disk still working

AN151.0

Notes: When to use this procedure

Procedure 2 applies to a disk replacement where the disk is unmirrored but can still be accessed. If the disk that must be replaced is in rootvg, follow the instructions on the next visual.

The goal and how to do it

The goal is the same as always. Before we can replace a disk, we must remove everything from the disk.

1. Shut down your system if you need to physically attach a new disk to the system. Boot the system so that cfgmgr will configure the new disk.
2. Add the new disk to the volume group. Use either the SMIT fastpath smit extendvg or the extendvg command.


3. Before executing the next step, it is necessary to distinguish between the rootvg and a non-rootvg volume group.
   - If the disk that is replaced is in rootvg, execute the steps that are shown on the visual Procedure 2: Special Steps for rootvg.
   - If the disk that is replaced is not in the rootvg, use the migratepv command:
     # migratepv hdisk_old hdisk_new
     This command moves all logical volumes from one disk to another. You can do this during normal system activity. The command migratepv requires that the disks are in the same volume group.
4. If the old disk has been completely migrated, remove it from the volume group. Use either the SMIT fastpath smit reducevg or the reducevg command.
5. If you need to remove the disk from the system, remove it from the ODM using the rmdev command as shown. Finally, remove the physical disk from the system.
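Because migratepv requires both disks to be in the same volume group, it can help to confirm this before step 3. The sketch below is one possible check, under the assumption that the plain lspv listing has the disk name in the first column and the volume group in the third. The helper names vg_of and same_vg, and the sample listing with made-up PVIDs, are hypothetical.

```shell
#!/bin/sh
# Print the volume group of one disk, given an `lspv` listing on stdin.
vg_of() {
    awk -v d="$1" '$1 == d { print $3 }'
}

# Succeed only if both disks are listed in the same volume group.
same_vg() {
    listing=$1 a=$2 b=$3
    vg_a=$(printf '%s\n' "$listing" | vg_of "$a")
    vg_b=$(printf '%s\n' "$listing" | vg_of "$b")
    [ -n "$vg_a" ] && [ "$vg_a" != "None" ] && [ "$vg_a" = "$vg_b" ]
}

# Illustrative listing (made-up PVIDs) in the layout assumed above.
sample='hdisk2 00c8b12c4bd5e5b6 datavg active
hdisk3 00c8b12c4bd5e9a1 datavg active
hdisk4 00c8b12c4bd5f0c2 rootvg active'

if same_vg "$sample" hdisk2 hdisk3; then
    echo "migratepv hdisk2 hdisk3"
else
    echo "run extendvg first: the disks are not in the same volume group"
fi
```

On a live system the listing would come from the lspv command itself rather than from a sample string.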


Instructor notes: Purpose — Explain procedure 2. Details — Describe the procedure as explained in the student material. Additional information — Make it clear to the students that step 3 is different for rootvg. Transition statement — Let’s describe the special considerations for rootvg.


Procedure 2: Special steps for rootvg

1. Connect new disk to system.
2. Add new disk to volume group.
3. Disk contains hd5?
   # migratepv -l hd5 hdiskX hdiskY
   # bosboot -ad /dev/hdiskY
   # chpv -c hdiskX
   # bootlist -m normal hdiskY
   Migrate old disk to new disk:
   # migratepv hdiskX hdiskY
4. Remove old disk from volume group.
5. Remove old disk from ODM.

Figure 8-5. Procedure 2: Special steps for rootvg

AN151.0

Notes: Additional steps for rootvg

Procedure 2 requires some additional steps if the disk that must be replaced is in rootvg.

1. Connect the new disk to the system as described in procedure 2.
2. Add the new disk to the volume group. Use smit extendvg or the extendvg command.
3. This step requires special considerations for rootvg:
   - Check whether your disk contains the boot logical volume. The default location for the boot logical volume is /dev/hd5. Use the command lspv -l to check the logical volumes on the disk that must be replaced.


     If the disk contains the boot logical volume, migrate the logical volume to the new disk and update the boot logical volume on the new disk. To avoid a potential boot from the old disk, clear the old boot record by using the chpv -c command. Then, change your bootlist:
     # migratepv -l hd5 hdiskX hdiskY
     # bosboot -ad /dev/hdiskY
     # chpv -c hdiskX
     # bootlist -m normal hdiskY
   - If the disk contains the primary dump device, you must deactivate the dump before migrating the corresponding logical volume:
     # sysdumpdev -p /dev/sysdumpnull
   - Migrate the complete old disk to the new one:
     # migratepv hdiskX hdiskY
     If the primary dump device has been deactivated, you have to activate it again:
     # sysdumpdev -p /dev/hdX
4. After the disk has been migrated, remove it from the root volume group.
   # reducevg rootvg hdiskX
5. If the disk must be removed from the system, remove it from the ODM (use the rmdev command), shut down your AIX, and remove the disk from the system afterwards.
   # rmdev -l hdiskX -d
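The hd5 check in step 3 can be sketched as a small filter over lspv -l output. This is an illustration under assumptions, not a definitive implementation: the helper names has_boot_lv and boot_lv_steps are hypothetical, the sample listing is made up, and the boot steps are only printed, never run.

```shell
#!/bin/sh
# Succeed if the `lspv -l` listing on stdin contains the boot logical volume hd5.
has_boot_lv() {
    awk '$1 == "hd5" { found = 1 } END { exit !found }'
}

# Print (not run) the extra rootvg steps for moving hd5 from disk $1 to disk $2.
boot_lv_steps() {
    old=$1 new=$2
    echo "migratepv -l hd5 $old $new"   # move the boot logical volume
    echo "bosboot -ad /dev/$new"        # rebuild the boot image on the new disk
    echo "chpv -c $old"                 # clear the boot record on the old disk
    echo "bootlist -m normal $new"      # boot from the new disk from now on
}

# Illustrative `lspv -l hdisk0` output; the values are made up.
sample='hdisk0:
LV NAME   LPs  PPs  DISTRIBUTION        MOUNT POINT
hd5       1    1    01..00..00..00..00  N/A
hd6       8    8    00..08..00..00..00  N/A
hd4       2    2    00..00..02..00..00  /'

if printf '%s\n' "$sample" | has_boot_lv; then
    boot_lv_steps hdisk0 hdisk1
fi
```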


Instructor notes: Purpose — Describe the special considerations for rootvg. Details — Describe as provided in the student material. Additional information — Transition statement — Let’s describe procedure 3.


Procedure 3: Disk in missing or removed state

1. Identify all LVs and file systems on failing disk:
   # lspv -l hdiskY
2. Unmount all file systems on failing disk:
   # umount /dev/lv_name
3. Remove all file systems and LVs from failing disk:
   # smit rmfs
   # rmlv lv_name
4. Remove disk from volume group:
   # reducevg vg_name hdiskY
5. Remove disk from system:
   # rmdev -l hdiskY -d
6. Add new disk to volume group:
   # extendvg vg_name hdiskZ
7. Recreate all LVs and file systems on new disk:
   # mklv -y lv_name
   # smit crfs
8. Restore file systems from backup:
   # restore -rvqf /dev/rmt0

Disk state callouts on the visual:
# lspv hdiskY ... PV STATE: missing
# lspv hdiskY ... PV STATE: removed

© Copyright IBM Corporation 2009

Figure 8-6. Procedure 3: Disk in missing or removed state

AN151.0

Notes: When to use this procedure Procedure 3 applies to a disk replacement where a disk could not be accessed but the volume group is intact. The failing disk is either in a state (not device state) of missing (normal varyonvg worked) or removed (forced varyonvg was necessary to bring the volume group online). If the failing disk is in an active state (this is not a device state), this procedure will not work. In this case, one way to bring the disk into a removed or missing state is to run the reducevg -d command or to do a varyoffvg and a varyonvg on the volume group by rebooting the system. The reboot is necessary because you cannot vary off a volume group with open logical volumes. Because the failing disk is active, there is no way to unmount file systems.
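Whether Procedure 3 applies can be read from the PV STATE field of lspv output, as the visual shows. A minimal sketch of that check follows; the helper names pv_state and procedure3_applies are hypothetical, and the sample output is abbreviated and made up.

```shell
#!/bin/sh
# Extract the value of the "PV STATE:" field from `lspv hdiskN` output on stdin.
pv_state() {
    awk '$1 == "PV" && $2 == "STATE:" { print $3 }'
}

# Procedure 3 only applies to a disk that is missing or removed.
procedure3_applies() {
    case $1 in
        missing|removed) return 0 ;;
        *)               return 1 ;;
    esac
}

# Abbreviated, made-up `lspv hdisk4` output.
sample='PHYSICAL VOLUME:    hdisk4    VOLUME GROUP:    datavg
PV STATE:           missing
STALE PARTITIONS:   0         ALLOCATABLE:     yes'

state=$(printf '%s\n' "$sample" | pv_state)
if procedure3_applies "$state"; then
    echo "hdisk4 is $state: Procedure 3 can start"
else
    echo "hdisk4 is $state: bring the disk into a missing or removed state first"
fi
```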


Procedure steps

If the failing disk is in a missing or removed state, start the procedure:

1. Identify all logical volumes and file systems on the failing disk. Use commands like lspv, lslv or lsfs to provide this information. These commands will work on a failing disk.
2. If you have mounted file systems on logical volumes on the failing disk, you must unmount them. Use the umount command.
3. Remove all file systems from the failing disk using smit rmfs or the rmfs command. If you remove a file system, the corresponding logical volume and the stanza in /etc/filesystems are removed as well.
4. Remove the remaining logical volumes (those not associated with a file system) from the failing disk using smit rmlv or the rmlv command.
5. Remove the disk from the volume group, using the SMIT fastpath smit reducevg or the reducevg command.
6. Remove the disk from the ODM and from the system using the rmdev command.
7. Add the new disk to the system and extend your volume group. Use the SMIT fastpath smit extendvg or the extendvg command.
8. Recreate all logical volumes and file systems that have been removed due to the disk failure. Use smit mklv, smit crfs or the commands directly.
9. Due to the total disk failure, you lost all data on the disk. This data has to be restored, either by the restore command or any other tool you use to restore data (for example, Tivoli Storage Manager) from a previous backup.
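Steps 2 through 7 above map directly onto commands. The dry-run sketch below prints them for one file system and one raw logical volume; it is an illustration only. The function name procedure3_plan and all example names are assumptions, and steps 8 and 9 are left out because the mklv, crfs, and restore parameters depend on the original layout and on the backup that is available.

```shell
#!/bin/sh
# Dry-run sketch of Procedure 3, steps 2 to 7. Nothing is executed.
# Arguments (hypothetical examples): volume group, failed disk, new disk,
# one file system mount point, one logical volume without a file system.
procedure3_plan() {
    vg=$1 old=$2 new=$3 fs=$4 rawlv=$5
    echo "umount $fs"          # 2. unmount the file system
    echo "rmfs $fs"            # 3. remove the file system, its LV, and its stanza
    echo "rmlv $rawlv"         # 4. remove a logical volume that has no file system
    echo "reducevg $vg $old"   # 5. remove the failed disk from the volume group
    echo "rmdev -l $old -d"    # 6. delete the disk definition from the ODM
    echo "extendvg $vg $new"   # 7. add the replacement disk
}

procedure3_plan datavg hdisk4 hdisk6 /data rawlv01
```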


Instructor notes: Purpose — Describe procedure 3. Details — Describe as explained in the student notes. Additional information — This procedure requires the volume group to be brought online, either by a varyonvg or a varyonvg -f. If it is forced, the failed disk will be in a removed state. Use lspv to analyze physical volume states. If it is a normal varyonvg, the disk will be in a missing state. Note that removing logical volumes is possible on a disk that could not be accessed. Transition statement — Let’s describe procedure 4.


Procedure 4: Total rootvg failure

1. Replace bad disk.
2. Boot in maintenance mode.
3. Restore from a mksysb tape.
4. Import each volume group into the new ODM (importvg) if needed.

[Diagram: rootvg (hdiskX, hdiskY; contains OS logical volumes), datavg (hdiskZ), mksysb tape]

© Copyright IBM Corporation 2009

Figure 8-7. Procedure 4: Total rootvg failure

AN151.0

Notes: When to use this procedure

Procedure 4 applies to a total rootvg failure. This situation might come up when your rootvg consists of one disk that fails, or when your rootvg is installed on two disks and the disk that contains operating system logical volumes (for example, /dev/hd4) fails.

Procedure steps Follow these steps: 1. Replace the bad disk and boot your system in maintenance mode. 2. Restore your system from a mksysb tape.


If any rootvg file systems were not mounted when the mksysb was made, those file systems are not included on the backup image. You will need to create and restore those as a separate step.

If your mksysb tape does not contain user volume group definitions (for example, you created a volume group after saving your rootvg), you have to import the user volume group after restoring the mksysb tape. For example:
# importvg -y datavg hdisk9
Only one disk from the volume group (in our example, hdisk9) needs to be selected. Export and import of volume groups is discussed in more detail in the next topic.
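After the mksysb restore, disks that belong to a volume group the new ODM does not know about are candidates for importvg. A sketch of how such disks might be picked out of a plain lspv listing follows. It assumes the listing shows None in the volume group column and none in the PVID column where nothing is assigned; the helper name unassigned_disks, the sample listing, and the volume group name datavg are hypothetical.

```shell
#!/bin/sh
# List disks that have a PVID but no volume group in an `lspv` listing on stdin.
unassigned_disks() {
    awk '$2 != "none" && $3 == "None" { print $1 }'
}

# Illustrative listing with made-up PVIDs.
sample='hdisk0 00c8b12c4bd5e5b6 rootvg active
hdisk1 none None
hdisk9 00c8b12c4bd5f0c2 None'

for d in $(printf '%s\n' "$sample" | unassigned_disks); do
    # Only printed; the administrator still has to know which volume group
    # name belongs to which disk.
    echo "importvg -y datavg $d"
done
```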


Instructor notes: Purpose — Explain how to recover a total rootvg failure. Details — Describe as explained in the student notes. Additional information — Transition statement — Let’s describe procedure 5.


Procedure 5: Total non-rootvg failure

1. Export the volume group from the system:
   # exportvg vg_name
2. Check /etc/filesystems.
3. Remove bad disk from ODM and the system:
   # rmdev -l hdiskX -d
4. Connect the new disk.
5. If volume group backup is available (savevg):
   # restvg -f /dev/rmt0 hdiskY
6. If no volume group backup is available: Recreate ...
   - Volume group (mkvg)
   - Logical volumes and file systems (mklv, crfs)
   Restore data from a backup:
   # restore -rqvf /dev/rmt0

© Copyright IBM Corporation 2009

Figure 8-8. Procedure 5: Total non-rootvg failure

AN151.0

Notes: When to use this procedure Procedure 5 applies to a total failure of a non-rootvg volume group. This situation might come up if your volume group consists of only one disk that fails. Before starting this procedure, make sure this is not just a temporary disk failure (for example, a power failure).

Procedure steps

Follow these steps:

1. To fix this problem, export the volume group from the system. Use the command exportvg as shown. During the export of the volume group, all ODM objects that are related to the volume group will be deleted.
2. Check your /etc/filesystems. There should be no references to logical volumes or file systems from the exported volume group.


3. Remove the bad disk from the ODM (use rmdev as shown). Shut down your system and remove the physical disk from the system.
4. Connect the new drive and boot the system. The cfgmgr will configure the new disk.
5. If you have a volume group backup available (created by the savevg command), you can restore the complete volume group with the restvg command (or the SMIT fastpath smit restvg). All logical volumes and file systems are recovered. If you have more than one disk that should be used during restvg, you must specify these disks:
   # restvg -f /dev/rmt0 hdiskY hdiskZ
   The savevg and restvg commands will be discussed in a future chapter.
6. If you have no volume group backup available, you have to recreate everything that was part of the volume group. Recreate the volume group (mkvg or smit mkvg), all logical volumes (mklv or smit mklv) and all file systems (crfs or smit crfs). Finally, restore the lost data from backups, for example with the restore command or any other tool you use to restore data in your environment.
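The check of /etc/filesystems in step 2 can be sketched as a search for stanzas whose dev line still points at a logical volume of the exported volume group. This is a minimal illustration, assuming the usual stanza layout (a header line ending in a colon, followed by "name = value" lines). The helper name stanzas_using_lv is hypothetical, and the sketch reads a scratch copy rather than the live file.

```shell
#!/bin/sh
# Print the name of every stanza in file $1 whose dev attribute is /dev/$2.
stanzas_using_lv() {
    file=$1 lv=$2
    awk -v dev="/dev/$lv" '
        /^\// && /:[ \t]*$/ { sub(/:[ \t]*$/, ""); stanza = $0; next }
        $1 == "dev" && $2 == "=" && $3 == dev { print stanza }
    ' "$file"
}

# Scratch file with two made-up stanzas, standing in for /etc/filesystems.
tmp=${TMPDIR:-/tmp}/filesystems.$$
cat > "$tmp" <<'EOF'
/data:
        dev     = /dev/lv10
        vfs     = jfs
        log     = /dev/loglv01

/home:
        dev     = /dev/hd1
        vfs     = jfs2
EOF

stanzas_using_lv "$tmp" lv10   # a leftover reference to the exported volume group
rm -f "$tmp"
```

Any stanza this reports would need to be removed before the volume group is recreated or restored.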


Instructor notes: Purpose — Explain procedure 5. Details — Describe as explained in the student notes. Additional information — Transition statement — Let’s discuss some common disk replacement failures.


Frequent disk replacement errors (1 of 4)

[Diagram: rootvg migration between hdiskX and hdiskY]

Boot problems after migration:
• Firmware LED codes cycle or boots to SMS multiboot menu

Fix:
• Check bootlist (SMS menu)
• Check bootlist (bootlist)
• Recreate boot logical volume (bosboot)

© Copyright IBM Corporation 2009

Figure 8-9. Frequent disk replacement errors (1 of 4)

AN151.0

Notes: Possible problem after rootvg migration

A common problem seen after a migration of the rootvg is that the machine will not boot. The LED codes may cycle. This loop indicates that the firmware is not able to find bootstrap code to boot from. At some firmware levels, the system will boot to SMS mode when unable to find a valid boot image. At the newest firmware level, the system console prompts whether you wish to continue looping or boot to SMS.

This problem is usually easy to fix:
- Check your bootlist by either:
  • Booting in SMS (F1) and checking your bootlist
  • Booting in maintenance mode and checking your bootlist using the bootlist command
- If the bootlist is correct, update the boot logical volume using the bosboot command.
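From maintenance mode, the bootlist check and the bosboot fix described above might look like the following dry-run sketch. It only prints the commands. The function name boot_fix_plan and the disk name hdisk1 are assumptions for illustration.

```shell
#!/bin/sh
# Print (not run) the checks and fixes for a rootvg that will not boot
# after a migration. The argument is the disk that now holds the boot image.
boot_fix_plan() {
    newdisk=$1
    echo "bootlist -m normal -o"          # display the current normal-mode bootlist
    echo "bootlist -m normal $newdisk"    # correct it if the new disk is not listed
    echo "bosboot -ad /dev/$newdisk"      # recreate the boot image on the new disk
}

boot_fix_plan hdisk1
```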


Instructor notes: Purpose — Show what might happen after a rootvg migration. Details — Explain as described in the student notebook. Additional information — On a MicroChannel system, you get alternating LED codes 223-229. Modern systems stop at SMS. Transition statement — Let’s explain another disk replacement error.


Frequent disk replacement errors (2 of 4)

datavg:
  hdisk4  PVID: ...221...
  hdisk5  PVID: ...555...

VGDA:
  ...
  physical: ...221...
            ...555...

ODM:
  CuAt: name = "hdisk4" attribute = "pvid" value = "...221..." ...
  CuAt: name = "hdisk5" attribute = "pvid" value = "...555..." ...

hdisk5 is removed from ODM and from the system, but not from the volume group:
# rmdev -l hdisk5 -d

© Copyright IBM Corporation 2009

Figure 8-10. Frequent disk replacement errors (2 of 4)

AN151.0

Notes: The problem Another frequent error occurs when the administrator removes a disk from the ODM (by executing rmdev) and physically removes the disk from the system, but does not remove entries from the volume group descriptor area (VGDA). The VGDA stores information about all physical volumes of the volume group. Each disk has at least one VGDA. Disk information is also stored in the ODM, for example, the physical volume identifiers are stored in the ODM class CuAt. Note: Throughout this discussion the physical volume ID (PVID) is abbreviated in the visuals for simplicity. The physical volume ID is actually 32 characters. What happens if a disk is removed from the ODM but not from the volume group?


Instructor notes: Purpose — Introduce the VGDA corruption if a disk is removed from the ODM but not from the volume group. Details — Describe as explained in the student notes. Additional information — It is not possible to remove a disk from the ODM as long as it has open logical volumes. If any process is using a logical volume from a disk, you cannot remove the disk with rmdev. Transition statement — Let’s describe the fix for this error.


Frequent disk replacement errors (3 of 4)

datavg:
  hdisk4  PVID: ...221...

VGDA:
  ...
  physical: ...221...
            ...555...

ODM (after # rmdev -l hdisk5 -d):
  CuAt: name = "hdisk4" attribute = "pvid" value = "...221..." ...

Fix:
# reducevg datavg ...555...
Use PVID instead of disk name.

© Copyright IBM Corporation 2009

Figure 8-11. Frequent disk replacement errors (3 of 4)

AN151.0

Notes: The fix

After a disk is removed from the ODM, the VGDA on the other disks in the volume group still contains a reference to the removed disk. In early AIX versions, the fix for this problem was difficult. You had to add ODM objects that described the attributes of the removed disk.

This problem can now be fixed by executing the reducevg command. Instead of specifying the disk name, the physical volume ID of the removed disk is specified. Execute the lspv command to identify the missing disk. Write down the physical volume ID of the missing disk and compare this ID with the contents of the VGDA. Use the following command to query the VGDA on a disk:
# lqueryvg -p hdisk4 -At
(Use any disk from the volume group.)
If you are sure that you found the missing PVID, pass this PVID to the reducevg command.
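The comparison described above comes down to finding the PVID that the VGDA still lists but the system no longer knows. The sketch below assumes the two PVID lists have already been copied out by hand (one ID per line) from the lqueryvg and lspv output; it does not parse either command. The helper name missing_pvids and the PVIDs are made up for illustration.

```shell
#!/bin/sh
# Print every PVID that appears in list $1 (from the VGDA) but not in
# list $2 (PVIDs of the disks the system still knows).
missing_pvids() {
    vgda_ids=$1 odm_ids=$2
    for id in $vgda_ids; do
        case " $(echo $odm_ids) " in
            *" $id "*) ;;              # still known to the system
            *) echo "$id" ;;           # only in the VGDA: the removed disk
        esac
    done
}

# Made-up PVIDs, one per line.
vgda='00c8b12c4bd50221
00c8b12c4bd50555'
odm='00c8b12c4bd50221'

for id in $(missing_pvids "$vgda" "$odm"); do
    echo "reducevg datavg $id"         # printed only; verify the PVID first
done
```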


Instructor notes: Purpose — Describe how to fix this VGDA corruption. Details — Additional information — Transition statement — Let’s explain other errors that might come up after a disk replacement.


Frequent disk replacement errors (4 of 4)

# lsvg -p datavg
unable to find device id ...734... in device configuration database

ODM failure. Analyze failure:
1. Typo in command?
2. Analyze the ID of the device: Which PV or LV causes problems?

ODM problem in rootvg?
- Yes: rvgrecover
- No: Export and import volume group

© Copyright IBM Corporation 2009

Figure 8-12. Frequent disk replacement errors (4 of 4)

AN151.0

Notes: ODM failure After an incorrect disk replacement, you might detect ODM failures. For example, when issuing the command lsvg -p datavg, a typical error message could be: unable to find device id 00837734 in device configuration database In this case, a device could not be found in the ODM.

Analyze the failure Before trying to fix it, check the command you typed in. Maybe it just contains a typo. Find out what device corresponds to the ID that is shown in the error message.


Fix the ODM problem We have already discussed two ways to fix an ODM problem: - If the ODM problem is related to the rootvg, execute the rvgrecover procedure. - If the ODM problem is not related to the rootvg, export the volume group and import it again. Export and import will be explained in more detail in the next topic.


Instructor notes: Purpose — Provide how to fix ODM failures (this is a kind of a review page). Details — Additional information — Transition statement — Next topic is export and import.


8.2. Export and import Instructor topic introduction What students will do — The students will identify how to export and import volume groups. How students will do it — Through lecture, lab exercise, and checkpoint questions What students will learn — Students will: • Identify how to export a volume group • Identify how to import a volume group How this will help students on their job — Export and import are important features in AIX. They can be used to easily transfer data between systems and provide a method to fix ODM problems.

© Copyright IBM Corp. 2009

Unit 8. Disk management procedures Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

8-39

Instructor Guide

Exporting a volume group

[Diagram: system moon with disk hdisk9 in volume group myvg, containing lv10, lv11, and loglv01]

To export a volume group:
1. Unmount all file systems from the volume group:
   # umount /dev/lv10
   # umount /dev/lv11
2. Vary off the volume group:
   # varyoffvg myvg
3. Export volume group:
   # exportvg myvg

The complete volume group is removed from the ODM.

© Copyright IBM Corporation 2009

Figure 8-13. Exporting a volume group

AN151.0

Notes: The scenario The exportvg and importvg commands can be used to fix ODM problems. These commands also provide a way to transfer data between different AIX systems. This visual provides an example of how to export a volume group. The disk, hdisk9, is connected to the system moon. This disk belongs to the myvg volume group. This volume group needs to be transferred to another system.

Procedure to export a volume group

Execute the following steps to export the volume group:

1. Unmount all file systems from the volume group. In the example, there are three logical volumes in myvg: lv10, lv11, and loglv01. The loglv01 logical volume is the JFS log device for the file systems in myvg, which is closed when all file systems are unmounted.


2. When all logical volumes are closed, use the varyoffvg command to vary off the volume group.
3. Finally, export the volume group, using the exportvg command. After this point, the complete volume group (including all file systems and logical volumes) is removed from the ODM.
4. After exporting the volume group, the disks in the volume group can be transferred to another system.
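The export steps can be written down as a small dry-run sketch that accepts any number of logical volumes. It prints the commands and runs nothing. The function name export_plan is hypothetical; myvg, lv10, and lv11 are taken from the example on the visual.

```shell
#!/bin/sh
# Print (not run) the export sequence for a volume group.
# Usage: export_plan vg_name lv_name [lv_name ...]
export_plan() {
    vg=$1
    shift
    for lv in "$@"; do
        echo "umount /dev/$lv"   # 1. unmount each file system in the volume group
    done
    echo "varyoffvg $vg"         # 2. vary off once all logical volumes are closed
    echo "exportvg $vg"          # 3. remove the volume group from the ODM
}

export_plan myvg lv10 lv11
```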


Instructor notes: Purpose — Explain how to export a volume group. Details — Additional information — Transition statement — Let’s describe how to import a volume group.


Importing a volume group

[Diagram: system mars with disk hdisk3 in volume group myvg, containing lv10, lv11, and loglv01]

To import a volume group:
1. Configure the disks.
2. Import the volume group:
   # importvg -y myvg hdisk3
3. Mount the file systems:
   # mount /dev/lv10
   # mount /dev/lv11

The complete volume group is added to the ODM.

© Copyright IBM Corporation 2009

Figure 8-14. Importing a volume group

AN151.0

Notes: Procedure to import a volume group

To import a volume group into a system, for example into a system named mars, execute the following steps:

1. Connect all disks (in our example we have only one disk) and reboot the system so that cfgmgr will configure the added disks.
2. You only have to specify one disk (using either hdisk# or the PVID) in the importvg command. Because all disks contain the same VGDA information, the system can determine this information by querying any VGDA from any disk in the volume group. If you do not specify the option -y, the command will generate a new volume group name. The importvg command generates completely new ODM entries.


   In AIX V4.3 and subsequent releases, the volume group is automatically varied on.
3. Finally, mount the file systems.


Instructor notes: Purpose — Explain how to import a volume group. Details — Be sure to let students know that as of AIX 5L V5.2, importvg was enhanced to accept a PVID as a command line argument. For example: importvg -y myvg 0001810fd3838c5e Beginning with AIX 5L V5.3, the default algorithm for the importvg command was enhanced to reduce the execution time while maintaining a maximum of integrity protection. It is no longer the default to scan every disk of a system for an import operation. Beginning with AIX 5L V5.3, the importvg command uses the redefinevg command to get all the PVIDs by reading the VGDA of the disk that is related to the given volume group. Then, only the initial LVM records for those physical volumes are examined. The default method of previous AIX releases used to read the LVM record of every disk in the system trying to match the disks that are listed in the VGDA. Beginning with AIX 5L V5.3, this method will be an error path to try other disks in the system, if needed. Additional information — Prior to AIX V4.3, you had to check whether the volume group is varied on after the importvg. If the volume group is not automatically varied on, execute the varyonvg command to vary on the volume group. Transition statement — What happens if logical volumes already exist during the importvg?


importvg and existing logical volumes

[Diagram: system mars with hdisk2 (datavg) and hdisk3 (myvg); both volume groups contain logical volumes named lv10 and lv11]

# importvg -y myvg hdisk3
importvg: changing LV name lv10 to fslv00
importvg: changing LV name lv11 to fslv01

importvg can also accept the PVID in place of the hdisk name.

© Copyright IBM Corporation 2009

Figure 8-15. importvg and existing logical volumes

AN151.0

Notes: Renaming logical volumes If you are importing a volume group with logical volumes that already exist on the system, the importvg command renames the logical volumes from the volume group that is being imported. The logical volumes /dev/lv10 and /dev/lv11 exist in both volume groups. During the importvg command, the logical volumes from myvg are renamed to /dev/fslv00 and /dev/fslv01.
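Because scripts and /etc/filesystems entries may still refer to the old names, it can be useful to record which logical volumes were renamed. The sketch below turns the messages shown on the visual into an old-to-new list. The helper name renamed_lvs is hypothetical, and the message text is taken from the visual; that importvg always words its message exactly this way is an assumption.

```shell
#!/bin/sh
# Turn "importvg: changing LV name OLD to NEW" lines on stdin into "OLD NEW" pairs.
renamed_lvs() {
    awk '$1 == "importvg:" && $2 == "changing" && $3 == "LV" && $4 == "name" {
        print $5, $7
    }'
}

# The two messages from the visual.
sample='importvg: changing LV name lv10 to fslv00
importvg: changing LV name lv11 to fslv01'

printf '%s\n' "$sample" | renamed_lvs
```

On a live system the input would be the captured output of the importvg command rather than a sample string.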


Instructor notes: Purpose — Explain what happens if a logical volume already exists on a system during the import. Details — Additional information — Transition statement — Let’s describe what happens if a file system already exists during an import.


importvg and existing file systems (1 of 2)

Already on the system (datavg):
  /dev/lv10:     /home/sarah
  /dev/lv11:     /home/michael
  /dev/loglv00:  log device

Imported volume group (myvg):
  /dev/lv23:     /home/peter
  /dev/lv24:     /home/michael
  /dev/loglv01:  log device

# importvg -y myvg hdisk3
Warning: mount point /home/michael already exists in /etc/filesystems
# umount /home/michael
# mount -o log=/dev/loglv01 /dev/lv24 /home/michael

© Copyright IBM Corporation 2009

Figure 8-16. importvg and existing file systems (1 of 2)

AN151.0

Notes: Using umount and mount

If a file system (for example /home/michael) already exists on a system, you run into problems when you mount the file system that was imported. One method to get around this problem is to:

1. Unmount the file system that exists on the system. For example, /home/michael from datavg.
2. Mount the imported file system. Note that you have to specify the:
   - Log device (-o log=/dev/loglv01)
   - Logical volume name (/dev/lv24)
   - Mount point (/home/michael)


If the file system type is jfs2, you have to specify this as well (-V jfs2). You can get this information by running the command getlvcb lv24 -At Another method is to add a new stanza to the /etc/filesystems file. This is covered in the next visual.
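The umount and mount method described in these notes can be sketched as a small helper that assembles the mount command. This is only an illustration: the function name imported_mount_cmd is hypothetical, and the helper prints the command instead of running it, so it can be tried safely on any system.

```shell
#!/bin/sh
# Sketch only: prints the mount command for an imported file system
# instead of running it. The function name is hypothetical.
# usage: imported_mount_cmd <vfs-type> <log-device> <logical-volume> <mount-point>
imported_mount_cmd() {
    vfs=$1
    log=$2
    lv=$3
    mp=$4
    if [ "$vfs" = "jfs2" ]; then
        # JFS2 has to be named explicitly (-V jfs2), as the notes point out
        echo "mount -V jfs2 -o log=$log $lv $mp"
    else
        echo "mount -o log=$log $lv $mp"
    fi
}

# The example from the visual (a JFS file system in myvg)
imported_mount_cmd jfs /dev/loglv01 /dev/lv24 /home/michael
```

Before the printed command is run on a real system, the file system that already occupies the mount point has to be unmounted (umount /home/michael).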


Instructor notes: Purpose — Describe what happens if a file system already exists during the import. Details — Additional information — Transition statement — Let’s see how to add a stanza to the /etc/filesystems file.


importvg and existing file systems (2 of 2) IBM Power Systems

# vi /etc/filesystems

/home/michael:
        dev     = /dev/lv11
        vfs     = jfs
        log     = /dev/loglv00
        mount   = false
        options = rw
        account = false

/home/michael_moon:
        dev     = /dev/lv24
        vfs     = jfs
        log     = /dev/loglv01
        mount   = false
        options = rw
        account = false

# mount /home/michael
# mount /home/michael_moon        (Mount point must exist)

datavg
/dev/lv10:    /home/sarah
/dev/lv11:    /home/michael
/dev/loglv00: log device

hdisk3 (myvg)
/dev/lv23:    /home/peter
/dev/lv24:    /home/michael
/dev/loglv01: log device

Figure 8-17. importvg and existing file systems (2 of 2)

Notes: Create a new stanza in /etc/filesystems
If you need both file systems (the imported and the one that already exists) mounted at the same time, you must create a new stanza in /etc/filesystems. In our example, we create a second stanza for our imported logical volume, /home/michael_moon. The fields in the new stanza are:
- dev specifies the logical volume, in our example /dev/lv24.
- vfs specifies the file system type, in our example a journaled file system.
- log specifies the JFS log device for the file system.
- mount specifies whether this file system should be mounted by default. The value false specifies no default mounting during boot. The value true indicates that a file system should be mounted during the boot process.
- options specifies that this file system should be mounted with read and write access.
- account specifies whether the file system should be processed by the accounting system. A value of false indicates no accounting.
Before mounting the file system /home/michael_moon, the corresponding mount point must be created.
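The stanza fields described above can be generated rather than typed by hand. The following is a minimal sketch, not a supported tool: the function name fs_stanza is hypothetical, and the values for mount, options, and account are fixed to the ones used in the visual.

```shell
#!/bin/sh
# Sketch only: prints an /etc/filesystems stanza to standard output.
# Nothing is written to /etc/filesystems. The function name is hypothetical.
# usage: fs_stanza <mount-point> <logical-volume> <vfs-type> <log-device>
fs_stanza() {
    printf '%s:\n' "$1"
    printf '\tdev\t\t= %s\n' "$2"
    printf '\tvfs\t\t= %s\n' "$3"
    printf '\tlog\t\t= %s\n' "$4"
    printf '\tmount\t\t= false\n'
    printf '\toptions\t\t= rw\n'
    printf '\taccount\t\t= false\n'
}

# Stanza for the imported logical volume from the visual
fs_stanza /home/michael_moon /dev/lv24 jfs /dev/loglv01
```

On a real system, the output would be appended to /etc/filesystems after saving a copy of the file, followed by mkdir /home/michael_moon and mount /home/michael_moon.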


Instructor notes: Purpose — Describe how to add a stanza to /etc/filesystems. Details — Additional information — To discover the information contained in the newly imported volume group, use the standard LVM tools: To see the logical volume names: lsvg -l myvg To see details of the logical volumes: lslv lvname These commands will assist in creating the new stanza in /etc/filesystems. Transition statement — Let’s introduce the learn option of importvg.


Checkpoint IBM Power Systems

1. Although everything seems to be working fine, you detect error log entries for disk hdisk0 in your rootvg. The disk is not mirrored to another disk. You decide to replace this disk. Which procedure would you use to migrate this disk? __________________________________________________ __________________________________________________

2. You detect an unrecoverable disk failure in volume group datavg. This volume group consists of two disks that are completely mirrored. Because of the disk failure you are not able to vary on datavg. How do you recover from this situation? __________________________________________________ __________________________________________________

3. After disk replacement, you recognize that a disk has been removed from the system but not from the volume group. How do you fix this problem? __________________________________________________ __________________________________________________

Figure 8-18. Checkpoint

Notes:


Instructor notes: Purpose — Review and test the students' understanding of this unit. Details — A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions IBM Power Systems

1. Although everything seems to be working fine, you detect error log entries for disk hdisk0 in your rootvg. The disk is not mirrored to another disk. You decide to replace this disk. Which procedure would you use to migrate this disk? Procedure 2: Disk still working. There are some additional steps necessary for hd5 and the primary dump device hd6.

2. You detect an unrecoverable disk failure in volume group datavg. This volume group consists of two disks that are completely mirrored. Because of the disk failure you are not able to vary on datavg. How do you recover from this situation? Forced varyon: varyonvg -f datavg. Use procedure 1 for mirrored disks.

3. After disk replacement, you recognize that a disk has been removed from the system but not from the volume group. How do you fix this problem? Use PVID instead of disk name: reducevg vg_name PVID

Additional information — Transition statement — Now, let’s do an exercise.


Exercise 8: Exporting and importing volume groups IBM Power Systems

• Disk replacement • Export and import a volume group • Analyze import messages (optional)

Figure 8-19. Exercise 8: Exporting and importing volume groups

Notes: Introduction This exercise can be found in your Student Exercise Guide.


Instructor notes: Purpose — Introduce the exercise. Details — Additional information — Transition statement — Let’s summarize the unit.


Unit summary IBM Power Systems

Having completed this unit, you should be able to:
• Replace a disk under different circumstances
• Recover from a total volume group failure
• Rectify problems caused by incorrect actions that have been taken to change disks
• Manage importvg issues

Figure 8-20. Unit summary

Notes: Different procedures are available that can be used to fix disk problems under any circumstance:
Procedure 1: Mirrored disk
Procedure 2: Disk still working (rootvg specials)
Procedure 3: Total disk failure
Procedure 4: Total rootvg failure
Procedure 5: Total non-rootvg failure

exportvg and importvg can be used to easily transfer volume groups between systems.


Instructor notes: Purpose — Summarize the unit. Details — Present the highlights from the unit. Additional information — Transition statement — Let’s continue with the next unit.


Unit 9. Install and backup techniques

What this unit is about
This unit describes techniques to reduce the size of a maintenance window. Specific techniques are taught for installing system updates or using AIX facilities to back up JFS2 file systems.

What you should be able to do
After completing this unit, you should be able to:
• Apply maintenance using the alternate disk copy technique
• Apply maintenance using the multibos technique
• Use JFS2 snapshot to back up file system data

How you will check your progress
Accountability:
• Checkpoint questions
• Lab exercise

References
Online       AIX Version 6.1 Command Reference volumes 1-6
Online       AIX Version 6.1 Operating system and device management
Online       AIX Version 6.1 Installation and migration

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems

SG24-2014    AIX Version 4.3 Differences Guide (Redbook)
SG24-5765    AIX 5L Differences Guide: V 5.2 Edition (Redbook)
SG24-7463    AIX 5L Differences Guide: V 5.3 Edition (Redbook)
SG24-7414    AIX 5L Differences Guide: V 5.3 Addendum (Redbook)
SG24-7559    IBM AIX Version 6.1 Differences Guide (Redbook)


Unit objectives IBM Power Systems

After completing this unit, you should be able to: • Use alternate disk installation techniques for applying AIX maintenance • Use multibos to apply AIX maintenance • Use JFS2 snapshot to back up file system data

Figure 9-1. Unit objectives

Notes:


Instructor notes: Purpose — List this unit’s objectives. Details — Additional information — In the previous unit, students learned when volume group backups must be used after a disk failure. This unit will explain how to back up rootvg and non-rootvg volume groups. Transition statement — Let’s start with the mksysb command.


9.1. Alternate disk installation Instructor topic introduction What students will do — The students will identify how alternate disk installation techniques can be used. How students will do it — Through lecture and activity What students will learn — Students will learn how to handle alternate disk installation techniques. How this will help students on their job — Being able to work with alternate disk installation allows students to handle the installation of large facilities. Systems can be installed over a longer period of time while the systems are still running at the same version. The switchover can then happen at the same time.


Topic 1 objectives IBM Power Systems

After completing this topic, you should be able to: • Install a mksysb onto an alternate disk • Clone an existing rootvg to an alternate disk • Remove an alternate disk

Figure 9-2. Topic 1 objectives

Notes:


Instructor notes: Purpose — Cover the unit objectives. Details — Additional information — Transition statement —


Alternate disk installation IBM Power Systems

# smit alt_install

Installing a mksysb on another disk:
# smit alt_mksysb
-OR-
# alt_disk_mksysb

Cloning the running rootvg to another disk:
# smit alt_clone
-OR-
# alt_disk_copy

Figure 9-3. Alternate disk installation

Notes: Benefits of alternate disk installation
Alternate disk installation lets you install the operating system while the system is still up and running, which reduces installation or upgrade downtime considerably. It also allows large facilities to better manage an upgrade because systems can be installed over a longer period of time. The systems keep running at the previous version while the new version is installed, and the switch to the newer version can then happen on all systems at the same time.

When to use an alternate disk installation Alternate disk installation can be used in one of two ways: - Installing a mksysb image on another disk - Cloning the current running rootvg to an alternate disk


Filesets
An alternate disk installation uses the following filesets:
- bos.alt_disk_install.boot_images must be installed for alternate disk mksysb installations
- bos.alt_disk_install.rte must be installed for rootvg cloning and alternate disk mksysb installations

How to use alternate disk installation
All modes of alternate disk installations are available through the SMIT fastpath: smit alt_install. To focus on installing a mksysb image on an alternate disk, you can either use the SMIT fastpath: smit alt_mksysb or directly run the command: alt_disk_mksysb. To focus on cloning the running rootvg to an alternate disk, you can either use the SMIT fastpath: smit alt_clone or directly run the command: alt_disk_copy.

How current commands relate to the pre-AIX 5L V5.3 command
Prior to AIX 5L V5.3, all alternate disk functions were invoked through a single command: alt_disk_install. The use of the alt_disk_install command is still supported, but it now simply invokes the new replacement commands to do the actual work. The following three commands were added in AIX 5L V5.3:
- alt_disk_copy will create copies of rootvg on an alternate set of disks.
- alt_disk_mksysb will install an existing mksysb on an alternate set of disks.
- alt_rootvg_op will perform Wake, Sleep, and Customize operations.

The alt_disk_install module will continue to ship as a wrapper to the new modules. However, it will not support any new functions, flags, or features. The following table displays how the existing operation flags for alt_disk_install will map to the new modules. The alt_disk_install command will now call the new modules after printing an attention notice that it is obsolete. All other flags will apply as currently defined.

alt_disk_install Command Arguments    New Commands
-C args disk                          alt_disk_copy args -d disks
-d mksysb args disks                  alt_disk_mksysb -m mksysb args -d disks
-W args disk                          alt_rootvg_op -W args -d disk
-S args                               alt_rootvg_op -S args
-P2 args disks                        alt_rootvg_op -C args -d disks
-X args                               alt_rootvg_op -X args
-v args disk                          alt_rootvg_op -v args -d disk
-q args disk                          alt_rootvg_op -q args -d disk
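When old scripts that call alt_disk_install are reviewed, the table above can be treated as a simple lookup. The sketch below encodes only the operation flag to command mapping from the table. The function name alt_new_command is hypothetical, and the argument rewriting (for example, adding -d before the disk names) is left out.

```shell
#!/bin/sh
# Sketch only: maps an alt_disk_install operation flag to the command that
# replaces it, following the table above. The function name is hypothetical.
# usage: alt_new_command <operation-flag>
alt_new_command() {
    case $1 in
        -C) echo "alt_disk_copy" ;;
        -d) echo "alt_disk_mksysb" ;;
        -W|-S|-P2|-X|-v|-q) echo "alt_rootvg_op" ;;
        *)  echo "unknown operation flag: $1" >&2
            return 1 ;;
    esac
}

alt_new_command -C
alt_new_command -X
```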


Instructor notes: Purpose — Introduce alternate disk installation. Details — Alternate disk installation has been available since AIX V4.3. Additional information — Transition statement — Let’s discuss alternate mksysb disk installation.


Alternate mksysb disk installation (1 of 2) IBM Power Systems

hdisk0 • rootvg (AIX 5L V5.3)
hdisk1 • AIX 6.1

# alt_disk_mksysb -m /dev/rmt0 -d hdisk1

Example installs an AIX 6.1 mksysb on hdisk1
• Bootlist will be set to alternate disk (hdisk1)
• Changing the bootlist allows you to boot different AIX levels (hdisk0 boots AIX 5L V5.3, hdisk1 boots AIX 6.1)

Figure 9-4. Alternate mksysb disk installation (1 of 2)

Notes: Introduction An alternate mksysb installation involves installing a mksysb image that has already been created from another system onto an alternate disk of the target system. The mksysb image must have been created on a system running AIX V4.3 or subsequent versions of the operating system.

Example
In the example, an AIX V6.1 mksysb tape image is installed on an alternate disk, hdisk1, by executing the following command:
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
The system now contains two rootvgs on different disks. In the example, one rootvg is at AIX 5L V5.3 (hdisk0) and the other is at AIX 6.1 (hdisk1).


Which disk does the system use to boot?
The alt_disk_mksysb command changes the bootlist by default. During the next reboot, the system will boot from the new rootvg. If you do not want to change the bootlist, use the option -B of alt_disk_mksysb. By changing the bootlist, you determine which AIX version you want to boot.

Filesets within the mksysb being installed
The mksysb image used for the installation must be created on a system that has either the same hardware configuration as the target system, or must have all the device and kernel support installed for a different machine type or platform. In the latter case, the following filesets must be contained in the mksysb:
- devices.*
- bos.mp
- bos.up
- bos.64bit (if necessary)

alt_disk_mksysb options
The alt_disk_mksysb command has the following options:
-m device
-d target disks
-B : do not change the bootlist
-i image.data
-s script
-R resolv.conf
-p platform
-L mksysb_level
-n : remain a nim client
-P phase
-c console
-r : reboot after install
-k : keep mksysb device customization
-y : import non-rootvg volume groups
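As an illustration of how the -m, -d, and -B options combine, the sketch below assembles an alt_disk_mksysb command line. It is a dry run that only prints the command. The function name mksysb_install_cmd and its keep-bootlist keyword are hypothetical and not part of AIX.

```shell
#!/bin/sh
# Sketch only: prints an alt_disk_mksysb command line instead of running it.
# The function name and the "keep-bootlist" keyword are hypothetical.
# usage: mksysb_install_cmd <mksysb-device-or-image> <target-disk> [keep-bootlist]
mksysb_install_cmd() {
    cmd="alt_disk_mksysb -m $1 -d $2"
    if [ "$3" = "keep-bootlist" ]; then
        # -B leaves the bootlist unchanged; by default it is set to the new disk
        cmd="$cmd -B"
    fi
    echo "$cmd"
}

# The example from the visual, then the same install without a bootlist change
mksysb_install_cmd /dev/rmt0 hdisk1
mksysb_install_cmd /dev/rmt0 hdisk1 keep-bootlist
```

After a real installation, bootlist -m normal -o displays the current bootlist, which is a quick way to confirm whether the alternate disk will be used at the next reboot.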


Instructor notes: Purpose — Introduce alternate mksysb disk installation. Details — Additional information — Transition statement — Let’s introduce the SMIT interface.


Alternate mksysb disk installation (2 of 2) IBM Power Systems

# smit alt_mksysb

                Install mksysb on an Alternate Disk

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
* Target Disk(s) to install                       [hdisk1]       +
* Device or image name                            [/dev/rmt0]    +
  Phase to execute                                 all           +
  image.data file                                 []             /
  Customization script                            []             /
  Set bootlist to boot from this disk
  on next reboot?                                  yes           +
  Reboot when complete?                            no            +
  Verbose output?                                  no            +
  Debug output?                                    no            +
  resolv.conf file                                []             /

Figure 9-5. Alternate mksysb disk installation (2 of 2)

Notes: SMIT panel example The alternate disk install function can also be executed from the user-friendly smit dialog panel.


Instructor notes: Purpose — Describe the SMIT interface for alternate mksysb disk installation. Details — Keep it very brief - we only want to show that this can be easily executed from a SMIT dialog panel. Additional information — The installation on the alternate disk is broken into three phases:
1. Phase 1 creates the altinst_rootvg volume group, the alt_ logical volumes, the /alt_inst file systems and restores the mksysb data.
2. Phase 2 runs any specified customization script and copies a resolv.conf file, if specified.
3. Phase 3 unmounts the /alt_inst file systems, renames the file systems and logical volumes and varies off the altinst_rootvg. It sets the bootlist and reboots, if specified.
Each phase can be run separately. Phase 3 must be run to get a usable rootvg volume group.
Transition statement — Let’s describe alternate disk rootvg cloning.
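To make the idea of running the phases separately concrete, the following sketch prints one alt_disk_mksysb invocation per phase. It is a dry run under stated assumptions: the function name phased_mksysb_cmds is hypothetical, and the use of 1, 2, and 3 as values for the -P option should be verified against the man page before use.

```shell
#!/bin/sh
# Sketch only: prints one alt_disk_mksysb invocation per phase instead of
# running anything. The function name is hypothetical; the -P values 1, 2
# and 3 are an assumption to check against the man page.
# usage: phased_mksysb_cmds <mksysb-device-or-image> <target-disk>
phased_mksysb_cmds() {
    for phase in 1 2 3; do
        echo "alt_disk_mksysb -P $phase -m $1 -d $2"
    done
}

phased_mksysb_cmds /dev/rmt0 hdisk1
```

Splitting the work this way would let phase 1 (the long restore) run ahead of time, with phase 3 deferred until shortly before the maintenance window.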


Alternate disk rootvg cloning (1 of 2) IBM Power Systems

hdisk0 • rootvg (AIX 6.1 TL01)
Clone
hdisk1 • rootvg (AIX 6.1 TL02)

# alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1

• Example creates a copy of the current rootvg on hdisk1
• Installs a maintenance level on the clone (AIX 6.1 TL02)
• Changing the bootlist allows you to boot different AIX levels (hdisk0 boots AIX 6.1 TL01, hdisk1 boots AIX 6.1 TL02)

Figure 9-6. Alternate disk rootvg cloning (1 of 2)

Notes: Benefits of cloning rootvg Cloning the rootvg to an alternate disk can have many advantages. One advantage is having an online backup available, in case of a disk failure. Another benefit of rootvg cloning is in applying new maintenance levels or updates. A copy of the rootvg is made to an alternate disk (in our example hdisk1) followed by the installation of a maintenance level on that copy. The active system runs uninterrupted during this time. When it is rebooted, the system will boot from the newly updated rootvg for testing. If the maintenance level causes problems, the old rootvg can be retrieved by simply resetting the bootlist and rebooting.

Example
In the example, rootvg, which resides on hdisk0, is cloned to the alternate disk hdisk1. Additionally, a new maintenance level will be applied to the cloned version of AIX.
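The clone-and-update step and the fallback path described in these notes can be sketched as two small helpers. Both are dry runs that only print commands. The function names clone_cmd and fallback_cmds are hypothetical, while the commands they print are the ones shown in this topic's visuals.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# The function names are hypothetical.

# usage: clone_cmd <bundle> <image-location> <target-disk>
clone_cmd() {
    # Clone rootvg to the target disk and install the bundle on the clone
    echo "alt_disk_copy -b $1 -l $2 -d $3"
}

# usage: fallback_cmds <original-disk>
fallback_cmds() {
    # Point the bootlist back at the original rootvg and reboot
    echo "bootlist -m normal $1"
    echo "reboot"
}

clone_cmd update_all /dev/cd0 hdisk1
fallback_cmds hdisk0
```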


Instructor notes: Purpose — Introduce alternate disk rootvg cloning. Details — Additional information — The alt_disk_copy options are (see man page):
-b bundle name
-f APAR_list file
-F list_of_APARs
-l path to location of installp images
-w list_of_filesets_to_install
-d target disks
-B : do not change bootlist
-r : reboot after cloning
-s script
-P phases
-R resolv.conf
-W filesets
Transition statement — Let’s show the SMIT fastpath.


Alternate disk rootvg cloning (2 of 2) IBM Power Systems

# smit alt_clone

                Clone the rootvg to an Alternate Disk

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
* Target Disk(s) to install                       [hdisk1]       +
  Phase to execute                                 all           +
  image.data file                                 []             /
  Exclude list                                    []             /

  Bundle to install                               [update_all]   +
   -OR-
  Fileset(s) to install                           []

  Fix bundle to install                           []
   -OR-
  Fixes to install                                []

  Directory or Device with images                 [/dev/cd0]
  (required if filesets, bundles or fixes used)
  ...
  Customization script                            []             /
  Set bootlist to boot from this disk
  on next reboot?                                  yes           +
  Reboot when complete?                            no            +
  ...

Figure 9-7. Alternate disk rootvg cloning (2 of 2)

Notes: Example with SMIT
The SMIT fastpath for alternate disk rootvg cloning is smit alt_clone. The target disk in the example is hdisk1, which means the rootvg will be copied to that disk. If you specify a bundle, a fileset or a fix, then the installation or the update takes place on the clone, not in the original rootvg. By default, the bootlist will be set to the new disk. Changing the bootlist allows you to boot from the original rootvg or the cloned rootvg.


Instructor notes: Purpose — Describe the SMIT fastpath for alternate disk rootvg cloning. Details — Keep it very brief. Additional information — Transition statement — Let’s show how to remove an alternate disk installation.


Removing an alternate disk installation IBM Power Systems

Original: hdisk0 • rootvg (AIX 6.1 TL01)
Clone:    hdisk1 • rootvg (AIX 6.1 TL02)

Booted from the original (removes the clone):
# bootlist -m normal hdisk0
# reboot
# lsvg
rootvg
altinst_rootvg
# alt_rootvg_op -X

Booted from the clone (removes the original):
# bootlist -m normal hdisk1
# reboot
# lsvg
rootvg
old_rootvg
# alt_rootvg_op -X old_rootvg

• alt_rootvg_op -X removes the volume group definition from the ODM
• Do not use exportvg to remove the alternate volume group

Figure 9-8. Removing an alternate disk installation

Notes: Removing the alternate rootvg If you have created an alternate rootvg with alt_disk_mksysb or alt_disk_copy, but no longer wish to use it, first boot your system from the original disk (in our example, hdisk0) then use alt_rootvg_op. When executing lsvg to list the volume groups in the system, the alternate rootvg is shown with the name altinst_rootvg. To remove the alternate rootvg, do not use the exportvg command. Simply run the following command: # alt_rootvg_op -X This command removes the altinst_rootvg definition from the ODM database. If exportvg is run by accident, you must recreate the /etc/filesystems file before rebooting the system. The system will not boot without a correct /etc/filesystems.


Removing the original rootvg
If you have created an alternate rootvg with alt_disk_mksysb or alt_disk_copy, and no longer wish to use the original disk, first boot your system from the cloned disk (in our example, hdisk1) and then use the alt_rootvg_op command to remove it. When executing lsvg to list the volume groups in the system, the original rootvg is shown with the name old_rootvg. To remove the original rootvg, do not use the exportvg command. Simply run the following command:
# alt_rootvg_op -X old_rootvg
This command removes the old_rootvg definition from the ODM database. If exportvg is run by accident, you must recreate the /etc/filesystems file before rebooting the system. The system will not boot without a correct /etc/filesystems.
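Which removal command applies depends on which volume group name lsvg reports. As a rough illustration, the helper below reads a list of volume group names (such as lsvg output) on standard input and prints the matching alt_rootvg_op command. The function name removal_cmd is hypothetical, and nothing is removed by the sketch itself.

```shell
#!/bin/sh
# Sketch only: reads volume group names on standard input and prints the
# alt_rootvg_op command that would remove the inactive rootvg copy.
# The function name is hypothetical; nothing is executed.
# usage: lsvg | removal_cmd
removal_cmd() {
    while read -r vg; do
        case $vg in
            altinst_rootvg) echo "alt_rootvg_op -X" ;;
            old_rootvg)     echo "alt_rootvg_op -X old_rootvg" ;;
        esac
    done
}

# Simulated lsvg output after booting from the clone
printf 'rootvg\nold_rootvg\n' | removal_cmd
```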


Instructor notes: Purpose — Describe how to remove an alternate disk installation. Details — Additional information — Transition statement — You may have noted that, up to this point, we only talked about applying maintenance to an existing version and release of AIX, but not about migrating to a new version and release. To use the alternate disk capabilities with a migration install, you need to use NIM. Let’s look at this briefly.


NIM alternate disk migration (nimadm) IBM Power Systems

• alt_disk_copy does not support migrating to a new version or release of AIX.
• nimadm uses a NIM server to migrate to an alternate disk.

NIM server
NIM client: lpar1
  hdisk0 • rootvg (AIX 5.3)
  Clone
  hdisk1 • rootvg (AIX 6.1)

# nimadm -c lpar1 -s spot1 -l lpp1 -d "hdisk0 hdisk1" -Y

Figure 9-9. NIM alternate disk migration (nimadm)

Notes: What is nimadm? The nimadm command (Network Install Manager Alternate Disk Migration) is a utility that allows the system administrator to create a copy of rootvg to a free disk (or disks) and simultaneously migrate it to a new version or release level of AIX. The nimadm command uses NIM resources to perform this function.

Advantages of nimadm There are several advantages to using the nimadm command over a conventional migration: • Reduced downtime. The migration is performed while the system is up and functioning normally. There is no requirement to boot from install media, and the majority of processing occurs on the NIM master.


• The nimadm command facilitates quick recovery in the event of migration failure. Since the nimadm command uses alt_disk_install to create a copy of rootvg, all changes are performed to the copy (altinst_rootvg). In the event of a serious migration installation failure, the failed migration is cleaned up and there is no need for the administrator to take further action. In the event of a problem with the new (migrated) level of AIX, the system can be quickly returned to the pre-migration operating system by booting from the original disk.
• The nimadm command allows a high degree of flexibility and customization in the migration process. This is done with the use of optional NIM customization resources: image_data, bosinst_data, exclude_files, pre-migration script, installp_bundle, and post-migration script.

Details of using NIM to perform an alternate disk migration are not covered in this course.


Instructor notes: Purpose — Introduce the use of nimadm. Details — The intent is only to make the students aware of this NIM capability. You do not even need to cover the displayed example of the nimadm command. Instead, refer them to the full NIM training course. It is important that they understand that an alternate disk migration cannot be done without using NIM. Additional information — Transition statement — Let’s do a review of this section.


Exercise 9, topic 1: Alternate disk install IBM Power Systems

• Clone the existing rootvg • Apply a new service pack • Alternate boot between different levels


Figure 9-10. Exercise 9, topic 1: Alternate disk install


9.2. Using multibos Instructor Topic Introduction What students will do — The students will learn how to set up and use multibos to work with an alternate BOS. How students will do it — Through lecture and lab exercise What students will learn — Students will learn how to set up and use multibos to work with an alternate BOS. How this will help students on their job — An alternate BOS provides a tool for making AIX system modification (such as applying a new technology level) without any effect on the functionality of the active BOS. When the next maintenance window arrives, a quick reboot can be used to switch over to the new technology level. If there is an unexpected problem, a quick reboot returns the system to the prior state.


Topic 2 objectives

After completing this topic, you should be able to:
• Clone an active BOS to a standby BOS
• Customize a standby BOS
• Alternate boot between an active BOS and a standby BOS
• Mount a standby BOS
• Start a standby BOS shell

Figure 9-11. Topic 2 objectives


Instructor notes: Purpose — Cover the objectives for the topic. Details — Additional information — Transition statement — Let’s look at what multibos is and what it provides.


multibos overview

• Two alternate AIX base operating systems (BOS) in a single rootvg
• Standby BOS created as copy of active BOS
• Modify standby BOS without affecting active BOS
  – Apply maintenance to standby BOS
  – Mount and modify standby BOS
  – Start interactive shell working in standby BOS
• Can alternate on reboot which BOS is active

Figure 9-12. multibos overview

Notes:
Overview
The main purpose of using multibos is to provide the kind of alternate BOS (base operating system) capability that is available with the alternate disk technology, without having to use another disk. The operating system filesets do not occupy enough space to justify allocating another entire disk for that purpose. With multibos, you can have the two BOS versions on the same disk. This is accomplished by creating copies of the base operating system logical volumes that an OS update affects (the active BOS), with a different file name path. Note that these copies are in the one and only rootvg. Another advantage of multibos is that the cloning operation has lower overhead, since it does not need to clone all the LVs in the rootvg.

Once you have created the alternate BOS, changes, such as applying maintenance, can be made to these copies without changing the level of code being used in the active BOS. In addition to applying maintenance, you can access and make configuration changes to the standby BOS through two techniques: mounting the standby BOS and starting an interactive shell (chroot) for the standby BOS. When you would like to test the standby BOS, you simply reboot using the standby copy of the boot logical volume (BLV). If there is a problem with the changes that were made, configure the bootlist to use the original BLV, and a reboot will return you to the original version of the BOS.


Instructor notes: Purpose — Provide an overview of multibos function and purpose. Details — Additional information — Transition statement — Let’s first look at the file system structure of the alternate BOS, when created.


Active and standby BOS logical volumes

[Figure: Active BOS logical volumes: BLV (hd5), / (hd4), jfslog (hd8), home (hd1), opt (hd10opt), usr (hd2), var (hd9var), tmp (hd3). Standby BOS logical volumes (under bos_inst, if mounted): bos_inst (bos_hd4), opt (bos_hd10opt), usr (bos_hd2), var (bos_hd9var), BLV (bos_hd5), jfslog (bos_hd8).]

Figure 9-13. Active and standby BOS logical volumes

Notes:
Standby BOS structure
The standby BOS needs to mimic the file system structure of the live BOS, yet we do not want it to replace the active file systems. To handle this, multibos creates a logical volume to match each of the BOS logical volumes. This includes not only the file systems, but also the JFS logs and the boot logical volume. The names are modified by prepending a prefix of bos_ to the standard logical volume names. For the standby BOS file systems, the file system mount point is changed to have a root path of /bos_inst/. If we mount the standby BOS, then we will use this modified path (beginning with /bos_inst). If we use the chrooted shell access or if we reboot to make the standby BOS the active BOS, then the (formerly standby BOS) file systems will have a root path of /.
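As a minimal sketch of the naming convention described above, the following shell functions derive the standby names from the active ones. The function names standby_lv and standby_mount are hypothetical helpers invented for illustration; they only manipulate strings and do not call multibos or query the system.

```shell
# Hypothetical helper: standby LV name = "bos_" prefix + active LV name.
standby_lv() {
    printf 'bos_%s\n' "$1"
}

# Hypothetical helper: path at which a standby file system appears when the
# standby BOS is mounted (root path /bos_inst instead of /).
standby_mount() {
    case "$1" in
        /) printf '/bos_inst\n' ;;
        *) printf '/bos_inst%s\n' "$1" ;;
    esac
}

standby_lv hd4          # prints bos_hd4
standby_mount /usr      # prints /bos_inst/usr
```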


Instructor notes: Purpose — Explain the structure of the standby BOS. Details — Additional information — Transition statement — Next, we will look at how we actually create a standby BOS using the multibos command.


Setting up a standby BOS

• multibos -s -X
• Pre-validate that there is sufficient rootvg free space
• Uses default image.data (can customize with -i)
• Special logical volumes and file systems created for the standby BOS
  – bos_ (logical volume name prefix)
  – /bos_inst/ (mount point root path)
• Copies BOS file systems
  – backup and restore
• Non-BOS logical volumes are shared
• Optional post-creation customization script
• Bootlist updated (-t will block)
  – 1st: standby BOS
  – 2nd: active BOS

Figure 9-14. Setting up a standby BOS

Notes:
multibos space prerequisite
Since multibos needs sufficient space in rootvg to replicate the BOS logical volumes, you must ensure that there is enough free space in the rootvg to accommodate this. Display the current space used by these BOS logical volumes (remember that user-defined LVs, even if in the rootvg, will not be cloned). Then check that there is enough free space on the rootvg disk. Note that the clone, by default, uses the default /image.data file. This means that the cloned LVs are placed on the same disk as the source LVs. If you need to obtain space by extending the VG, then you will need to customize the image.data file that is used. The creation of the standby BOS will require additional space in the active BOS during the operation. As such, it is recommended that you allow the multibos command to increase the size of file systems as needed (using the -X flag).
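The free-space check can be estimated with simple arithmetic. The sketch below assumes that you have read the PP SIZE and FREE PPs values from the lsvg rootvg output and the sizes of the BOS logical volumes yourself. The function names and the sample numbers are illustrative only, and the estimate ignores the temporary space that the operation needs in the active BOS.

```shell
# needed_pps SIZE_MB PP_SIZE_MB: physical partitions needed, rounded up.
needed_pps() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# has_room FREE_PPS PP_SIZE_MB LV_SIZE_MB...
# Succeeds if the listed LV sizes fit in the free physical partitions.
has_room() {
    free=$1; pp=$2; shift 2
    total=0
    for mb in "$@"; do
        total=$(( total + $(needed_pps "$mb" "$pp") ))
    done
    [ "$total" -le "$free" ]
}

# Example values (assumptions): 100 free PPs of 64 MB each, and four BOS
# file systems of 2048, 512, 256, and 128 MB.
if has_room 100 64 2048 512 256 128; then
    echo "enough free PPs for the standby BOS"
else
    echo "not enough free PPs"
fi
```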


image.data customization
If you want to change any characteristics of the cloned rootvg logical volumes or file systems, you can create a copy of the image.data file, edit the copy, and then specify that the multibos command should use your edited copy (by using the -i flag). For example, if you wanted the cloned LVs to be placed on a disk that was added to the rootvg, you would first run the mkszfile command (to obtain a current capture of the characteristics), copy the created /image.data to a different name, and edit it to specify that the cloned LVs should be placed on the additional disk. Then you need to point to that new file by running multibos -Xs with the -i flag and the path of the edited file.

Which LVs are cloned?
The multibos facility does not clone all the LVs in the rootvg (unlike the alternate disk facility). Some of the system-defined logical volumes and all user-defined logical volumes are accessed in common between the active BOS and the standby BOS. The logical volumes which are cloned are:
• /dev/hd5 (BLV)
• /dev/hd4 (root file system)
• /dev/hd2 (/usr)
• /dev/hd9var (/var)
• /dev/hd10opt (/opt)

Tasks of multibos standby BOS creation
The multibos command, when requested to create a standby BOS, will:
• Collect the meta information about the rootvg
• Create and define the standby logical volumes and file systems
• Use the backup and restore commands to copy the files from the active BOS file systems to the standby file systems
• Set the bootlist to have the standby BOS BLV first and the active BLV second
• Run a post-creation customization script, if provided by the administrator


Instructor notes: Purpose — Explain how to set up a standby BOS using the multibos command. Details — Additional information — Transition statement — Let’s briefly look at some of the operations that we can execute once we have a standby BOS created.


Other multibos operations

• Customizing a standby BOS
  – multibos -c { -a | -b | -f } -l device
  – Can combine with standby BOS creation
• Mounting and unmounting a standby BOS
  – multibos -m (mounts to /bos_inst/)
  – multibos -u
• Standby BOS shell
  – multibos -S
  – exit returns to active shell environment
• Booting to either standby BOS or active BOS
  – bootlist -m normal hdisk# blv=blv#
  – shutdown -Fr
• Removing a standby BOS
  – multibos -R

Figure 9-15. Other multibos operations

Notes:
Customizing standby BOS
You can use the multibos customization operation, with the -c flag, to update the standby BOS. The customization operation requires a source for the fix filesets (the -l flag, with a device or directory) and at least one installation option (installation by bundle, installation by fix, or update_all). The customization operation performs the following steps:
1. The standby BOS file systems are mounted, if not already mounted.
2. If you specify an installation bundle with the -b flag, the installation bundle is installed using the geninstall utility. The installation bundle syntax should follow geninstall conventions. If you specify the -p preview flag, geninstall will perform a preview operation.
3. If you specify a fix list with the -f flag, the fix list is installed using the instfix utility. The fix list syntax should follow instfix conventions. If you specify the -p preview flag, then instfix will perform a preview operation.
4. If you specify the update_all function with the -a flag, it is performed using the install_all_updates utility. If you specify the -p preview flag, then install_all_updates performs a preview operation. Note: It is possible to perform one, two, or all three of the installation options during a single customization operation.
5. The standby boot image is created and written to the standby BLV using the AIX bosboot command. You can block this step with the -N flag. You should only use the -N flag if you are an experienced administrator and have a good understanding of the AIX boot process.
6. Upon exit, if standby BOS file systems were mounted in step 1, they are unmounted.
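The rule that a customization needs a source plus at least one installation option can be sketched as a small dry-run helper. The function build_customize_cmd is hypothetical; it only validates its arguments and prints a multibos command line built from the flags named above, without running anything. The device /dev/cd0 is an example value.

```shell
# Hypothetical dry-run helper: validate the options and print the command.
build_customize_cmd() {
    src=""; install=""; preview=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -l)    [ $# -ge 2 ] || return 2
                   src=$2; shift 2 ;;
            -a)    install="$install -a"; shift ;;
            -b|-f) [ $# -ge 2 ] || return 2
                   install="$install $1 $2"; shift 2 ;;
            -p)    preview=" -p"; shift ;;
            *)     echo "unknown option: $1" >&2; return 2 ;;
        esac
    done
    # A source and at least one installation option are both required.
    if [ -z "$src" ] || [ -z "$install" ]; then
        echo "need -l SOURCE and at least one of -a, -b FILE, -f FILE" >&2
        return 1
    fi
    echo "multibos -c$install -l $src$preview"
}

build_customize_cmd -l /dev/cd0 -a -p   # prints: multibos -c -a -l /dev/cd0 -p
```

Printing the command first makes it easy to review before running it by hand, preferably with the -p preview flag on the first attempt.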

Mounting and unmounting standby BOS
It is possible to access and modify the standby BOS by mounting its file systems over the standby BOS file system mount points. The multibos mount operation, using the -m flag, mounts all standby BOS file systems in the appropriate order. The multibos unmount operation, using the -u flag, unmounts all standby BOS file systems in the appropriate order.

Standby BOS shell
The multibos shell operation, using the -S flag, enables you to start a limited interactive chroot shell with standby BOS file systems. This shell allows access to standby files using standard paths. For example, /bos_inst/usr/bin/ls maps to /usr/bin/ls within the shell. The active BOS files are not visible outside of the shell, unless they have been mounted over the standby file systems. Limit shell operations to changing data files, and do not make persistent changes to the kernel, process table, or other operating system structures. Only use the BOS shell if you are experienced with the chroot environment. The multibos shell operation performs the following steps:
1. The standby BOS file systems are mounted, if they are not already.
2. The chroot utility is called to start an interactive standby BOS shell. The shell runs until an exit occurs.
3. If standby BOS file systems were mounted in step 1, they are unmounted.

Alternate boot
The bootlist command supports multiple BLVs. As an example, to boot from disk hdisk0 and BLV bos_hd5, you would enter the following:
# bootlist -m normal hdisk0 blv=bos_hd5
After the system is rebooted from the standby BOS, the standby BOS logical volumes are mounted over the usual BOS mount points, such as /, /usr, /var, and so on. The set of BOS objects, such as the BLV, logical volumes, file systems, and so on that are currently booted are considered the active BOS, regardless of logical volume names. The previously active BOS becomes the standby BOS in the existing boot environment.
Some installations have been blocked from alternating the BLV. When they tried to set the bootlist to the standby BLV, they received the following error:
0514-226 bootlist: Invalid attribute value for blv
This is an indication that either the BLV or the ODM entry for it is corrupted. A suggested solution is to rebuild the standby BLV. This requires a special bosboot flag:
# bosboot -sd /dev/ipldevice -M standby -l bos_hd5
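Because the active and standby roles swap after each alternate boot, it can help to compute which BLV is the other one before changing the bootlist. The following is a minimal sketch: other_blv and switch_cmd are hypothetical helper names, the caller supplies the name of the BLV that is currently booted, and the command is only printed, not executed.

```shell
# Hypothetical helper: given the BLV currently booted, name the other one.
other_blv() {
    case "$1" in
        hd5)     echo bos_hd5 ;;
        bos_hd5) echo hd5 ;;
        *)       echo "unexpected BLV name: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the bootlist command that selects the other BLV.
switch_cmd() {   # switch_cmd DISK CURRENT_BLV
    target=$(other_blv "$2") || return 1
    echo "bootlist -m normal $1 blv=$target"
}

switch_cmd hdisk0 hd5   # prints: bootlist -m normal hdisk0 blv=bos_hd5
```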

Removing standby BOS
The remove operation, using the -R flag, deletes all standby BOS objects, such as BLV, logical volumes, file systems, and so on. You can use the remove operation to make room for a new standby BOS, or to clean up a failed multibos installation. The remove operation performs standby tag verification on each object before removing it. The remove operation will only act on BOS objects that multibos created, regardless of name or label. You always have the option of removing additional BOS objects using standard AIX utilities, such as rmlv, rmfs, rmps, and so on. The multibos remove operation performs the following steps:
1. All boot references to the standby BLV are removed.
2. The bootlist is set to the active BLV. You can skip this step using the -t flag.
3. Any mounted standby BOS file systems are unmounted.
4. Standby file systems are removed.
5. Remaining standby logical volumes are removed.


Instructor notes: Purpose — Explain the various standby BOS operations. Details — Provide a brief description of what each of these options provides and why students might want to use it. Do not spend too much time here; they will experience these first-hand in the lab exercises. Additional information — Transition statement —


Exercise 9, topic 2: multibos IBM Power Systems

• Clone the active BOS • Apply a new service pack • Alternate boot between different levels


Figure 9-16. Exercise 9, topic 2: multibos


9.3. JFS2 snapshot Instructor topic introduction What students will do — The students will identify how creating a JFS2 snapshot can be useful as a filesystem backup tool. How students will do it — Through lecture and checkpoint questions What students will learn — Students will learn how to create and manage JFS2 snapshots. How this will help students in their job — Understanding how to use a JFS2 snapshot can allow for consistent backup with only a very brief quiesce of application activity.


Topic 3 objectives

After completing this topic, you should be able to:
• Create either an internal or external JFS2 snapshot
• List existing JFS2 snapshots
• Recover lost or corrupted files from a JFS2 snapshot
• Remove a JFS2 snapshot
• Increase the size of an external JFS2 snapshot

Figure 9-17. Topic 3 objectives


Instructor notes: Purpose — Cover the topic objectives. Details — Additional information — Transition statement — Let’s define a JFS2 snapshot and what it provides us.


JFS2 snapshot (1 of 2)

• A point-in-time image of a JFS2 file system
  – Source file system is called the snapped file system (snappedFS).
  – Snapshot creation is very quick and requires little space.
  – A single snappedFS can have multiple snapshots, each taken at a different point in time.
• A snapshot image of a JFS2 file system can be used to:
  – Restore files from a known point in time.
  – Access files or directories as they were at the time of the snapshot.
  – Back up a mounted snapshot to tape, DVD, or a remote server.

Figure 9-18. JFS2 snapshot (1 of 2)

Notes:
JFS2 snapshot
A point-in-time image of a JFS2 file system is called a snapshot. The file system which is the source of this point-in-time image is referred to as the snapped file system or snappedFS. The snapshot view of the data remains static and retains the same security permissions that the original snappedFS had when the snapshot was made. Also, a JFS2 snapshot can be created without unmounting or quiescing the file system (though it may be advisable for some applications to briefly quiesce during the snapshot). A snapshot can be used to access files or directories as they existed when the snapshot was taken. It can also be used to create a backup of the file system as of the point in time at which the snapshot was taken.


Instructor notes: Purpose — Describe the JFS2 snapshot function. Details — Additional information — Transition statement — Let’s see how to create a JFS2 snapshot.


JFS2 snapshot (2 of 2)

• Snapshot stays stable while snappedFS is changing.
• Using snapshot reduces application downtime.
  – Automatically freezes I/O while snapshot is created.
  – If intolerant of fuzzy backups, briefly quiesce the application.
• A snapshot typically needs 2% - 6% of snappedFS space requirements. There are two options:
  – Separate logical volume (ppsize unit of allocation)
  – Allocate space out of snappedFS (called an internal snapshot)
• At snapshot creation, only structure information is included.
• When a write or delete occurs in the snappedFS, the affected blocks are copied into existing snapshots.

Figure 9-19. JFS2 snapshot (2 of 2)

Notes:
How the JFS2 snapshot works
During creation of a snapshot, the snappedFS I/O will be momentarily frozen, and all new writes are blocked. This ensures that the snapshot really is a consistent view of the file system at the time of the snapshot. When a snapshot is initially created, only structure information is included. When a write or delete occurs, the affected blocks are copied into the snapshot file system. Every read of the snapshot requires a lookup to determine whether the needed block should be read from the snapshot or from the snappedFS. For instance, the block will be read from the snapshot file system if the block has been changed since the snapshot took place. If the block is unchanged since the snapshot, it will be read from the snappedFS.
There are two types of JFS2 snapshots: internal and external. A JFS2 internal snapshot uses space within the snappedFS. A JFS2 external snapshot is created in a logical volume separate from the file system. The external snapshot can be mounted separately from the file system at its own unique mount point. A given file system can use either internal or external snapshots; it cannot mix the two types.

Space requirements for a snapshot
Typically, a snapshot will need 2-6% of the space needed for the snappedFS. In the case of a highly active snappedFS, this estimate could rise to 15%. This space is needed if a block in the snappedFS is either written to or deleted. If this happens, the block is copied to the snapshot. Any blocks associated with new files written after the snapshot was taken will not be copied to the snapshot, as they were not current at the time of the snapshot and therefore not relevant. If the snapshot runs out of space, all snapshots associated with the snappedFS will be discarded and an entry will be made in the AIX error log. If a snapshot file system fills up before a backup is taken, the backup is not complete and will have to be rerun from a new snapshot, with possibly a larger size, to allow for changes in the snappedFS.
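The percentages above translate into a size estimate with integer arithmetic. The following is a minimal sketch; snapshot_size_mb is a hypothetical helper name and the 10 GB file system size is an example value, not a recommendation.

```shell
# snapshot_size_mb FS_SIZE_MB PERCENT: size estimate in MB, rounded up.
snapshot_size_mb() {
    echo $(( ($1 * $2 + 99) / 100 ))
}

fs_mb=10240                      # example: a 10 GB snappedFS
snapshot_size_mb "$fs_mb" 2      # prints 205
snapshot_size_mb "$fs_mb" 6      # prints 615
snapshot_size_mb "$fs_mb" 15     # prints 1536 (highly active snappedFS)
```

Since running out of space discards every snapshot of the snappedFS, rounding up and choosing the higher percentage is the safer direction when in doubt.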


Instructor notes: Purpose — Continue basic discussion of a JFS2 snapshot. Details — Additional information — A JFS2 snapshot is a file system that maps its contents to the contents of the source snappedFS. If the snappedFS is not modified, the snapshot does not store any of the files in its own physical partition allocations, and has content which is identical to the snappedFS. If the snappedFS is modified, the original values of the affected blocks are saved in the allocated storage of the snapshot file system. When the snapshot is read, it either retrieves the data from the snappedFS (if the data has not been modified) or it retrieves the data from its own disk storage (if the snappedFS data was changed). So, the snapshot always gives us the state of the data at the time the snapshot was created, but only uses enough storage to hold the data that has been changed in the snappedFS. When allocating space for a snapshot logical volume, we can typically allocate as little as 2-6% of the size of the snappedFS (depending on the volatility of the snappedFS). Note that when compared to using split mirror copies, the JFS2 snapshot has very little overhead. We do not have to create a total copy of the existing data when creating the snapshot (as we do in creating mirror copies), and instead of doing a resync of the data before the next backup (as we need to do with the split mirror when rejoining), we simply eliminate the snapshot and create a new one when needed for the next backup. Transition statement — Let’s take a closer look at the mechanism behind a JFS2 snapshot.


JFS2 snapshot mechanism (1 of 2)

[Figure: the snappedFS (top) and the snapshot (bottom), each showing inode1 and inode2. Initially, the snapshot only points to data extents in snappedFS.]

Figure 9-20. JFS2 snapshot mechanism (1 of 2)

Notes:
Data blocks in snappedFS
The diagram, at the top, shows two inodes anchoring file data blocks. The inode accesses the data blocks through a binary tree structure.

Data blocks in JFS2 snapshot
The diagram, at the bottom, shows the structure initially created in a JFS2 snapshot. The snapshot has the metadata, but all of the pointers refer back to the snappedFS data blocks. Thus, the snapshot requires very little space. Initially, data retrieved from a mounted snapshot is identical to the current data in the snappedFS.


Instructor notes: Purpose — Explain how a snapshot accesses the snappedFS data blocks. Details — Additional information — Transition statement — Let’s look at what happens as data blocks in the snappedFS are modified.


JFS2 snapshot mechanism (2 of 2)

[Figure: the snappedFS (top) and the snapshot (bottom), each showing inode1 and inode2. Original of modified data copied to snapshot.]

Figure 9-21. JFS2 snapshot mechanism (2 of 2)

Notes:
Data blocks in snappedFS after data changes
In the diagram, at the top, some of the data blocks have been modified. Because the kernel file system logic knows that there is a snapshot for this file system, it copies the original data blocks to the snapshot before modifying (or deleting) those data blocks in the snappedFS.

Data blocks in JFS2 snapshot after data changes
The diagram, at the bottom, shows that the inode tree structure points to the copies of the original data (now stored in the snapshot) rather than referring back to the snappedFS data blocks. This ensures that access to the snapshot always returns the original data (from the time the snapshot was created) for the snappedFS.
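The copy-before-modify behavior can be imitated with ordinary files. The following is a toy model only, not how JFS2 is implemented: two temporary directories stand in for the snappedFS and for the snapshot's own storage, each file stands in for one data block, and the function names are hypothetical. Deletes are left out to keep the sketch short.

```shell
fs=$(mktemp -d)      # stands in for the snappedFS
snap=$(mktemp -d)    # stands in for the snapshot's own storage

# At creation, record structure only: the list of blocks that exist.
take_snapshot() {
    ls "$fs" > "$snap/.index"
}

# write_block NAME DATA: save the original before its first change.
write_block() {
    if grep -qxF "$1" "$snap/.index" && [ ! -e "$snap/$1" ]; then
        cp "$fs/$1" "$snap/$1"
    fi
    printf '%s\n' "$2" > "$fs/$1"
}

# read_snapshot NAME: use the saved copy if the block changed, else the
# snappedFS; blocks created after the snapshot are not part of it.
read_snapshot() {
    grep -qxF "$1" "$snap/.index" || return 1
    if [ -e "$snap/$1" ]; then cat "$snap/$1"; else cat "$fs/$1"; fi
}

printf 'old\n' > "$fs/b1"   # block that exists before the snapshot
take_snapshot
write_block b1 new          # original "old" is saved to the snapshot first
write_block b2 extra        # new block: nothing to save
read_snapshot b1            # prints old
cat "$fs/b1"                # prints new
```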


Instructor notes: Purpose — Explain how the original version of the data is copied to the snapshot when modified. Details — Additional information — Transition statement — Let’s look at how we can implement the JFS2 snapshot, starting with the SMIT facility.


JFS2 snapshot SMIT menu

# smit jfs2
Enhanced Journaled File Systems
Move cursor to desired item and press Enter.
. . .
List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot

Figure 9-22. JFS2 snapshot SMIT menu


Notes: The various JFS2 snapshot operations can be executed from SMIT dialog panels. Shown is the SMIT JFS2 menu, with selective display of only those menu items which are JFS2 snapshot related.


Instructor notes: Purpose — Show how all the JFS2 snapshot functions can be accessed from the SMIT JFS2 menu. Details — Additional information — Transition statement — Let’s first look at how we create an external snapshot.


Creating snapshots (external)

# snapshot -o snapfrom=snappedFS -o size=Size
# snapshot -o snapfrom=/home/myfs -o size=16M
-OR-
# smit crsnapj2
Create Snapshot for an Enhanced Journaled File System in New Logical Volume
                                        [Entry Fields]
  File System Name                      /home/myfs
  SIZE of snapshot
    Unit Size                           Megabytes      +
*   Number of units                     [500]          #

Creating a snapshot as part of the mount option:
# mount -o snapto=/dev/mysnaplv /home/myfs

Figure 9-23. Creating snapshots (external)

AN151.0

Notes:

Creating an external snapshot for a JFS2 file system that is already mounted

When creating a new external snapshot, you must provide the size of the new logical volume, unless you use a pre-existing logical volume. To create a snapshot for a mounted JFS2 file system in a new logical volume, specify the size:

# snapshot -o snapfrom=snappedFS -o size=Size

For example:

# snapshot -o snapfrom=/home/myfs -o size=16M

This creates a 16 MB logical volume and then creates a snapshot for the /home/myfs file system on that new logical volume.

Creating an internal snapshot for a JFS2 file system that is already mounted

To create an internal snapshot for a mounted JFS2 file system, specify a snapshot name:

# snapshot -o snapfrom=snappedFS -n snapshotname

For example:

# snapshot -o snapfrom=/home/myfs -n mysnap

This creates a snapshot named mysnap that is internal to the snappedFS /home/myfs.

Creating an internal snapshot for a JFS2 file system that is not mounted

You cannot use internal snapshots unless the file system was enabled to support them when it was created. To enable internal snapshot support (at creation time only):

# crfs -a isnapshot=yes ....

The mount option -o snapto can be used to create a snapshot for a JFS2 file system that is not currently mounted:

# mount -o snapto=snapshotLV snappedFS MountPoint

or

# mount -o snapto=snapshotname snappedFS MountPoint

If the snapto value starts with a slash, it is assumed to be the special device file of an existing logical volume in which the snapshot should be created. If the snapto value does not start with a slash, it is assumed to be the name of an internal snapshot to be created. For example:

# mount -o snapto=/dev/mysnaplv /dev/fslv00 /home/myfs

This mounts the file system contained on /dev/fslv00 at the mount point /home/myfs and then creates a snapshot for the /home/myfs file system in the logical volume /dev/mysnaplv.
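The leading-slash rule for the snapto value can be sketched as a small shell function. This is only an illustration of the rule as stated in these notes; the function name snapto_kind is hypothetical and is not an AIX command.

```shell
# Hypothetical helper (not part of AIX): report how a snapto value would be
# interpreted, following the leading-slash rule described in the notes.
snapto_kind() {
    case "$1" in
        /*) echo "external" ;;   # special device file of an existing logical volume
        *)  echo "internal" ;;   # name of an internal snapshot to be created
    esac
}

snapto_kind /dev/mysnaplv
snapto_kind mysnap
```

A check like this can be useful in a wrapper script to confirm which kind of snapshot a given mount command is about to create.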


Creating a snapshot using an existing logical volume

If you want to control the details of the logical volume that holds an external snapshot, create the snapshot in an existing logical volume:

# snapshot -o snapfrom=snappedFS snapshotLV

For example:

# snapshot -o snapfrom=/home/myfs /dev/mysnaplv

This creates a snapshot for the /home/myfs file system on the /dev/mysnaplv logical volume, which already exists.


Instructor notes: Purpose — Describe how to create a JFS2 snapshot Details — Additional information — Transition statement — Let’s take a look at how you create a JFS2 internal snapshot.


Creating snapshots (internal)

# snapshot -o snapfrom=snappedFS -n snapshotName
# snapshot -o snapfrom=/home/myfs -n mysnap

-OR-

# smit crintsnapj2
  Create Snapshot for an Enhanced Journaled File System in File System
                                          [Entry Fields]
  File System Name                        /home/myfs
* Snapshot Name                           [mysnap]

Internal snapshot attribute must be set to yes on creation of the file system:
# smit crfs   (in dialog panel: Allow Internal Snapshots [yes])
-or-
# crfs -a isnapshot=yes

Figure 9-24. Creating snapshots (internal)

Notes:

Internal JFS2 snapshot considerations:
• Internal snapshots are preserved when the logredo command runs on a JFS2 file system with an internal snapshot.
• Internal snapshots are removed if the fsck command has to modify a JFS2 file system to repair it.
• If an internal snapshot runs out of space, or if a write to an internal snapshot fails, all internal snapshots for that snappedFS are marked invalid. Further access to the internal snapshots will fail. These failures write an entry to the error log.
• Internal snapshots are not separately mountable.
• Internal snapshots are not compatible with AIX releases prior to AIX 6.1. A JFS2 file system created to support internal snapshots cannot be modified on an earlier release of AIX.
• A JFS2 file system with internal snapshots cannot be defragmented.

Instructor notes: Purpose — Explain how to create a JFS2 internal snapshot. Details — Additional information — Transition statement — Later, we will want to identify if a file system has a snapshot and obtain information about those snapshots.


Listing snapshots

# smit lssnap   (and select file system from list)

-OR-

# snapshot -q /home/myfs2
Snapshots for /home/myfs2
Current  Name      Time
         mysnap    Wed 19 Nov 08:44:33 2008
         mysnap2   Fri 21 Nov 09:33:33 2008
*        mysnap3   Mon 24 Nov 14:03:18 2008

# snapshot -q /home/myfs
Snapshots for /home/myfs
Current  Location     512-blocks  Free    Time
*        /dev/fslv06  262144      261376  Wed May 6 18:15:11 2009

Figure 9-25. Listing snapshots

Notes: The snapshot -q option can be used to display the snapshots related to the specified file system. If the file system uses internal snapshots, the report provides the snapshot names and creation times. The * indicates the current snapshot. If the file system uses external snapshots, the report provides, for each snapshot, the logical volume special device file, the snapshot size, the amount of free space in the snapshot, and the creation time.
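As a rough sketch, the external snapshot report can be reduced to a percentage of free space with a short awk filter. The function name snap_free_pct is hypothetical, and the filter assumes the column order shown in Figure 9-25 (optional *, Location, 512-blocks, Free, Time); the real report layout may differ between AIX levels.

```shell
# Hypothetical helper: read "snapshot -q" output for external snapshots on
# standard input and print each snapshot device followed by the whole-number
# percentage of its space that is still free.
# Assumed column order (from Figure 9-25): [*] Location 512-blocks Free Time
snap_free_pct() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\/dev\//) {
                size = $(i + 1); free = $(i + 2)
                if (size > 0) printf "%s %d\n", $i, int(free * 100 / size)
                break
            }
        }
    }'
}

# Sample line taken from the figure. On a live system the input would come
# from: snapshot -q /home/myfs
echo "*  /dev/fslv06  262144  261376  Wed May 6 18:15:11 2009" | snap_free_pct
```

Header lines are skipped because none of their fields begin with /dev/.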


Instructor notes: Purpose — Explain how to display snapshot information. Details — Additional information — Transition statement — Let’s look at how we can use an existing snapshot to recover files which were inadvertently deleted or incorrectly modified.


Using a JFS2 snapshot to recover

• Recover entire file system to point of snapshot creation:
  # umount /home/myfs
  # rollback /home/myfs /dev/mysnaplv     (for external)
  # rollback -n mysnap /home/myfs         (for internal)

• Recover individual files from JFS2 snapshot image:
  – Mount the snapshot (if external):
    # mount -v jfs2 -o snapshot /dev/mysnaplv /mntsnapshot
  – Change to the directory that contains the snapshot:
    # cd /mntsnapshot
    # cd /home/myfs/.snapshot/mysnap      (if internal)
  – Copy the accurate file to overwrite the corrupted one:
    # cp myfile /home/myfs                (Copies only the file named myfile)

Figure 9-26. Using a JFS2 snapshot to recover

Notes:

rollback

The rollback command is an interface to revert a JFS2 file system to a point-in-time snapshot. The file system named by the snappedFS parameter must be unmounted before the rollback command is run, and it remains inaccessible for the duration of the command. Any snapshots taken after the specified snapshot (snapshotObject for external or snapshotName for internal) are removed. For external snapshots, the associated logical volumes are also removed.

Recover individual files

To restore individual files to their original state, mount the snapshot and then manually copy the files back. If the snapshot is internal, no mount is necessary. Instead, change directory to the explicit path of the snapshot (/snappedFS-mount-point/.snapshot/snapshot-name).

As with any file copying, be careful about changing the nature of the file (ownership, permission, sparseness, and so on). Using the backup and restore utilities to copy the files is often a safer technique.
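The path to an internal snapshot can be built from the mount point and the snapshot name. The sketch below shows only that string construction; the function name snap_dir is hypothetical, and the commented cp -p line is one possible way to keep mode and timestamps (and ownership where permitted) during the copy, not a statement of the recommended procedure.

```shell
# Hypothetical helper: print the directory through which an internal snapshot
# is reached, given the snappedFS mount point and the snapshot name.
snap_dir() {
    # ${1%/} drops one trailing slash so /home/myfs/ and /home/myfs match
    printf '%s/.snapshot/%s\n' "${1%/}" "$2"
}

snap_dir /home/myfs mysnap

# On a live system, a single file could then be copied back, for example:
#   cp -p "$(snap_dir /home/myfs mysnap)/myfile" /home/myfs/myfile
```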


Instructor notes:
Purpose: Explain how to use a JFS2 snapshot to recover data.
Details:
Additional information:
Transition statement: While using a snapshot directly to recover data is useful, it does not address a situation in which the disk holding the snappedFS is lost, much less a site disaster recovery situation. Let’s look at how we can use a snapshot as a stable source for a backup to media or to a network server.


Using a JFS2 snapshot to back up

• The JFS2 snapshot can be a stable source for backup to media.

• Mount the external snapshot and use relative path backup:
  # mount -v jfs2 -o snapshot /dev/mysnaplv /mntsnapshot
  # cd /mntsnapshot
  # find . | backup -i -f /servermnt/backup52

• cd to internal snapshot and use relative path backup:
  # cd /home/myfs/.snapshot/mysnap
  # find . | backup -i -f /servermnt/backup52

• To create snapshot and backup in one operation:
  # backsnap -m MountPoint -s Size BackupOptions snappedFS
  # backsnap -n snapshotname BackupOptions snappedFS
  # backsnap -m /mntsnapshot -s size=16M -i -f/dev/rmt0 \
    /home/myfs

Figure 9-27. Using a JFS2 snapshot to back up

Notes:

Creating a snapshot and backup in one operation

The backsnap command provides an interface to create a snapshot for a JFS2 file system and perform a backup of the snapshot. The command syntax is:

# backsnap -m MountPoint -s Size BackupOptions snappedFS

For example:

# backsnap -m /mntsnapshot -s size=16M -i -f/dev/rmt0 \
  /home/myfs

This creates a 16 MB logical volume and creates a snapshot for the /home/myfs file system on it. It then mounts the snapshot logical volume on /mntsnapshot. The remaining arguments are passed to the backup command. In this case, the files and directories in the snapshot are backed up by name (-i) to /dev/rmt0.
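To make the mapping between the arguments and the example explicit, here is a dry-run sketch that only prints a backsnap command line in the external-snapshot form used above. The function name backsnap_cmd and its argument order are hypothetical; nothing is executed, and only the flags already shown in the example are used.

```shell
# Hypothetical dry-run helper: print, but do not run, a backsnap command line
# in the external-snapshot form shown in the notes.
#   $1 = mount point for the snapshot   $2 = snapshot size (for example 16M)
#   $3 = backup device                  $4 = snappedFS
backsnap_cmd() {
    echo "backsnap -m $1 -s size=$2 -i -f$3 $4"
}

backsnap_cmd /mntsnapshot 16M /dev/rmt0 /home/myfs
```

Printing the command first gives an administrator a chance to review the size and device before running it for real.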


Instructor notes: Purpose — Explain how to use snapshot with a backup utility. Details — Additional information — Transition statement — If you make a mistake and underestimate how quickly data is modified or deleted, then you can have space allocation problems related to the JFS2 snapshot allocation. Let’s look at how to monitor and manage that situation.


JFS2 snapshot space management

• List snapshots for the snappedFS:
  # snapshot -q snappedFS

• External snapshot:
  – The snapshot report identifies the size and amount of free space.
  – If snapshot needs more space:
    # snapshot -o size=+1 snapshotLV

• Internal snapshot:
  – Shares logical volume with the snappedFS
    # df -m snappedFS
  – If snappedFS is out of space, try to free up space, possibly by deleting old snapshots.
    # snapshot -d -n snapshot_name snappedFS

Figure 9-28. JFS2 snapshot space management

Notes: It is useful to be able to identify situations where a snapshot is growing large. If a snapshot runs out of space, all snapshots are invalidated and become unusable. With internal snapshots, the snapshots can also contribute to the entire file system running out of space.

To monitor an external snapshot, use the query option of the snapshot command. An alternative is to mount the snapshot and use the df command, but that is more complicated. If an external snapshot needs more room, you can dynamically increase the size of the snapshot logical volume by using the size option of the snapshot command.

For an internal snapshot, there is no mechanism for identifying the space usage of the snapshots. Instead, you monitor the size of the snappedFS. When a file system is running out of space, one way to free space is to delete old snapshots. Keeping many generations of snapshots can be useful, but it can also be expensive in terms of space usage.
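A monitoring script might compare the free space reported by snapshot -q against a threshold and suggest the grow command when it is too low. The sketch below is a dry run under stated assumptions: the function name snap_grow_hint and the threshold argument are hypothetical, the size and free values are assumed to be the 512-block figures from the report, and the size=+1 increment is copied from the slide rather than chosen for any particular workload.

```shell
# Hypothetical helper: given a snapshot logical volume, its size and free
# space in 512-byte blocks, and a minimum free percentage, print the grow
# command from the slide when free space is below the threshold.
# Nothing is executed; the caller decides whether to run the command.
snap_grow_hint() {
    lv=$1
    size=$2
    free=$3
    min_pct=$4
    pct=$(( free * 100 / size ))      # whole-number percentage free
    if [ "$pct" -lt "$min_pct" ]; then
        echo "snapshot -o size=+1 $lv"
    else
        echo "ok: $lv has ${pct}% free"
    fi
}

snap_grow_hint /dev/fslv06 262144 261376 20
snap_grow_hint /dev/fslv06 262144 26214 20
```

Because a full external snapshot invalidates all snapshots for the file system, a threshold well above zero leaves time to react.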


Instructor notes: Purpose — Explain how to manage snapshot space allocation issues. Details — Additional information — Transition statement — Let’s review what we have covered.


Exercise 9, topic 3: JFS2 snapshot

• Create a JFS2 snapshot
• Recover files from the snapshot

Figure 9-29. Exercise 9, topic 3: JFS2 snapshot

Notes:


Instructor notes: Purpose — Details — Additional information — Transition statement —


Checkpoint (1 of 4)

1. Name the two ways alternate disk installation can be used.
   _______________________________________________________
   _______________________________________________________

2. What are the advantages of alternate disk rootvg cloning?
   _______________________________________________________
   _______________________________________________________

3. How do you remove an alternate rootvg?
   _______________________________________________________

4. Why should you not use exportvg with an alternate disk VG?
   _______________________________________________________

Figure 9-30. Checkpoint (1 of 4)

Notes:


Instructor notes: Purpose — Details —

Checkpoint solutions (1 of 4)

1. Name the two ways alternate disk installation can be used.
   • Installing a mksysb image on another disk
   • Cloning the current running rootvg to an alternate disk

2. What are the advantages of alternate disk rootvg cloning?
   • Creates an online backup
   • Allows maintenance and updates to software on the alternate disk, helping to minimize down time

3. How do you remove an alternate rootvg?
   alt_disk_install -X

4. Why should you not use exportvg with an alternate disk VG?
   This will remove rootvg related entries from /etc/filesystems.

Additional information — Transition statement —


Checkpoint (2 of 4)

5. True or False: multibos provides for booting between alternate operating system environments within a single rootvg.

6. True or False: A standby BOS can only be accessed by changing the bootlist and then rebooting.

7. True or False: New fixpacks can be applied to a standby BOS with only a performance impact to the active BOS during the operation.

Figure 9-31. Checkpoint (2 of 4)

Notes:


Instructor notes: Purpose — Details —

Checkpoint solutions (2 of 4)

5. True or False: multibos provides for booting between alternate operating system environments within a single rootvg.

6. True or False: A standby BOS can only be accessed by changing the bootlist and then rebooting.

7. True or False: New fixpacks can be applied to a standby BOS with only a performance impact to the active BOS during the operation.

Additional information — Transition statement —


Checkpoint (3 of 4)

8. True or False: Creating a JFS2 snapshot requires a long time and a lot of disk space.

9. What is needed to change from external snapshots to internal snapshots?
   _______________________________________________________
   _______________________________________________________

10. How can we tell if an external snapshot is about to fill up?
    _______________________________________________________
    _______________________________________________________

Figure 9-32. Checkpoint (3 of 4)

Notes:


Instructor notes:
Purpose: Review and test the students' understanding of this section.
Details: A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions (3 of 4)

8. True or False: Creating a JFS2 snapshot requires a long time and a lot of disk space.

9. What is needed to change from external snapshots to internal snapshots?
   If the file system is already enabled for internal snapshots, delete all external snapshots and start creating internal snapshots. If it is not already enabled, you must additionally back up and delete the file system, then redefine it with internal snapshots enabled (isnapshot=yes) and restore from the backup.

10. How can we tell if an external snapshot is about to fill up?
    Run snapshot -q filesystem-name. The amount of free space will be listed.

Additional information — Transition statement —


Checkpoint (4 of 4)

11. Which two alternate disk installation techniques are available?
    ____________________________________________
    ____________________________________________

12. True or False: multibos requires cloning all of the logical volumes in the active rootvg.

13. True or False: JFS2 snapshots require little or no quiescing of application activity to obtain a stable point in time image of the snapped file system.

Figure 9-33. Checkpoint (4 of 4)

Notes:


Instructor notes:
Purpose: Review and test the students' understanding of this unit.
Details: A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions (4 of 4)

11. Which two alternate disk installation techniques are available?
    • Installing a mksysb on another disk
    • Cloning the rootvg to another disk

12. True or False: multibos requires cloning all of the logical volumes in the active rootvg.

13. True or False: JFS2 snapshots require little or no quiescing of applications to obtain a stable point in time image of the snapped file system.

Additional information — Transition statement — Now, let’s summarize the unit.


Unit summary

Having completed this unit, you should be able to:
• Use alternate disk installation techniques for applying AIX maintenance
• Use multibos to apply AIX maintenance
• Use JFS2 snapshot to back up file system data

Figure 9-34. Unit summary

Notes:

Alternate disk installation techniques are available:
• Installing a mksysb onto an alternate disk
• Cloning the current rootvg onto an alternate disk

An alternate BOS can be created and maintenance applied.

JFS2 snapshots are a great way to capture a file system image at a point in time with minimal impact to the application.


Instructor notes: Purpose — Summarize the unit. Details — Present the highlights from the unit. Additional information — Transition statement — Let’s continue with the next unit.


Unit 10. Workload partitions

What this unit is about
This unit covers advanced aspects of AIX Workload Partition management. It teaches the installation and use of the AIX Workload Partition Manager to define and manage WPARs. It then teaches how to use WPAR Manager to relocate a WPAR to a different AIX system.

What you should be able to do
After completing this unit, you should be able to:
• Describe WPAR Manager concepts
• Install WPAR Manager and Agent Manager on server LPAR
• Install WPAR Agent on client LPAR
• Create, start, and manage a WPAR
• Relocate a WPAR from source client LPAR to destination LPAR client

How you will check your progress
Accountability:
• Checkpoint
• Machine exercises

Reference
Online      AIX Version 6.1 Command Reference volumes 1-6
Online      AIX Version 6.1 IBM Workload Partitions for AIX
Online      AIX Version 6.1 IBM PowerVM Workload Partitions Manager for AIX
SG24-7656   Workload Partition Management in IBM AIX Version 6.1 (Redbook)

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems


Unit objectives

After completing this unit, you should be able to:
• Describe WPAR Manager concepts
• Install WPAR Manager and Agent Manager on server LPAR
• Install WPAR Agent on client LPAR
• Create, start, and manage a WPAR
• Relocate a WPAR from source client LPAR to destination LPAR client

Figure 10-1. Unit objectives

Notes:


Instructor notes: Purpose — Explain unit objectives. Details — Additional information — Transition statement — Let’s first review some WPAR basics.


10.1. Workload partitions review

Instructor topic introduction
What students will do: Review some basic WPAR concepts taught in the prerequisite AN12 course.
How students will do it: Listen to the lecture, ask questions.
What students will learn: They will reinforce prerequisite knowledge they should already have learned.
How this will help students on their job: This topic reinforces concepts necessary for understanding the material taught in the unit.


Topic 1 objectives

After completing this topic, you should be able to:
• Explain the primary benefits of using WPARs
• Explain the difference between a system WPAR and an application WPAR
• Explain the difference between and reasons for shared namefs mounts and private JFS2 mounts for system WPARs

Figure 10-2. Topic 1 objectives

Notes:


Instructor notes: Purpose — Explain the objectives of this topic. Details — Additional information — Transition statement — Let’s start with a review of what a WPAR provides.


AIX workload partitions (WPAR) review

• WPARs reduce administration
  – By reducing the number of AIX images to maintain
• Each WPAR is isolated
  – Appears as a separate instance of AIX
  – Regulated share of system resources
  – May have unique network and file systems
  – Separate administrative and security domain
• WPARs can be relocated
  – Load balancing
  – Server maintenance

[Diagram: an AIX 6 instance hosting workload partitions labeled Billing, Application Server, Web Server, Test, and BI]

Figure 10-3. AIX workload partitions (WPAR) review

Notes:

Introduction

Workload Partition (WPAR) is a software-based virtualization capability of AIX 6 that reduces the number of AIX operating system images that need to be maintained when consolidating multiple workloads on a single server. WPARs provide a way for clients to run multiple applications inside the same instance of an AIX operating system while providing security and administrative isolation between applications. WPARs complement logical partitions and can be used in conjunction with them.

WPARs can improve administrative efficiency by reducing the number of AIX operating system instances that must be maintained, and can increase the overall utilization of systems by consolidating multiple workloads on a single system. They are designed to improve cost of ownership.

WPARs allow users to create multiple software-based partitions on top of a single AIX instance. This approach enables high levels of flexibility and capacity utilization for applications executing heterogeneous workloads, and simplifies patching and other operating system maintenance tasks.

WPARs provide unique partitioning values.
• Smaller number of OS images to maintain
• Performance-efficient partitioning through sharing of application text and kernel data and text
• Fine-grain partition resource controls
• Simple, lightweight, centralized partition administration

WPARs enable multiple instances of the same application to be deployed across partitions.
• Many WPARs running DB2, WebSphere, or Apache in the same AIX image
• Different capability from other partitioning technologies
• Greatly increases the ability to consolidate workloads, because the same application is often used to provide different business services
• Enables the consolidation of separate, discrete workloads that require separate instances of databases or applications into a single system or LPAR
• Reduces costs through optimized placement of workloads between systems to yield the best performance and resource utilization

WPAR technology enables the consolidation of diverse workloads on a single server, increasing server utilization rates.
• Hundreds of WPARs can be created, far exceeding the capability of other partitioning technologies.
• WPARs support fast provisioning and fast resource adjustments in response to both normal and unexpected demands. WPARs can be created and resource controls modified in seconds.
• WPAR resource controls enable the over-provisioning of resources. If a WPAR is below its allocated levels, the unused allocation is automatically available to other WPARs.
• WPARs support the live migration of a partition in response to normal or unexpected demands.
• All of the above capabilities enable more consolidation on a single server or LPAR.

WPARs enable development, test, and production cycles of one workload to be placed on a single system.
• Different levels of applications (production1, production2, test1, test2) may be deployed in separate WPARs.
• Quick and easy rollout to, and rollback from, production environments
• Reduced costs through the sharing of hardware resources
• Reduced costs through the sharing of software resources such as the operating system, databases, and tools

A WPAR supports the control and management of its resources: CPU, memory, and processes. You can assign specific fractions of CPU and memory to each WPAR; this is done by the Workload Manager (WLM) running in the global environment. Most resource controls are similar to those supported by WLM. You can specify shares_CPU, which is the number of processor shares available to a workload partition, or you can specify minimum and maximum percentages. The same is true for memory utilization. There are also WPAR limits for runaway situations (for example, total processes).

When you create a WPAR, a WLM class with the same name as the WPAR is created. All processes running in the partition inherit this classification. You can see the statistics and classes using the wlmstat command, which has been enhanced to display WPAR statistics; for example, wlmstat -@ 2 shows the WPAR classes. Note that you cannot use WLM inside the WPAR to manage its resources.
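To make the share-based controls concrete: under WLM share semantics, a WPAR's target for a resource is its number of shares divided by the total shares of all active WPARs. The sketch below works through that arithmetic in shell. The WPAR names and share values are hypothetical, and the mkwpar, chwpar, and wlmstat lines in the comments are shown only for orientation (they are not executed here); check the options against the documentation for your AIX level before using them.

```shell
#!/bin/sh
# Orientation only (AIX 6.1 global environment, hypothetical WPAR "wparA"):
#   mkwpar -n wparA -R active=yes shares_CPU=20 CPU=5%-30%,50% totalProcesses=512
#   chwpar -R shares_CPU=40 wparA
#   wlmstat -@ 2

# share_pct SHARES TOTAL_ACTIVE_SHARES
# Prints the integer percentage of a resource targeted for a WPAR that
# holds SHARES out of TOTAL_ACTIVE_SHARES.
share_pct() {
    if [ "$2" -le 0 ]; then
        echo "total active shares must be greater than zero" >&2
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}

# Hypothetical values: wparA=20, wparB=30, wparC=50 shares, all active.
echo "wparA target: $(share_pct 20 100)%"
# If wparC is stopped, only 50 shares remain active, so wparA's target rises.
echo "wparA target with wparC stopped: $(share_pct 20 50)%"
```

The second call illustrates the over-provisioning point made above: capacity that an inactive WPAR does not claim is redistributed among the WPARs that remain active.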


Instructor notes: Purpose — Review what a WPAR provides. Details — Additional information — Transition statement — Next, we need to be clear about the difference between a system WPAR and an application WPAR.


System WPAR and application WPAR

• System WPAR
  – Autonomous virtual system environment
    • Shared file systems (with the global environment): /usr and /opt
    • Private file systems for the WPAR's own use: /, /var, and /tmp
    • Unique set of users, groups, and network addresses
  – Can be accessed through:
    • Network protocols (for example: telnet or ssh)
    • Log in from the global environment using the clogin command
  – Can be stopped and restarted

• Application WPAR
  – Isolates an individual application
  – Lightweight; quick to create and remove
    • Created with the wparexec command (create and run)
    • Removed when stopped (stop and remove)
    • Stopped when the application finishes
  – Shares file systems and devices with the global environment
  – No user login capabilities

Figure 10-4. System WPAR and application WPAR

Notes:

System workload partition

System workload partitions are autonomous virtual system environments with their own private root file systems, users and groups, login, network space, and administrative domain. A system WPAR represents a partition within the operating system, isolating runtime resources such as memory, CPU, user information, or file systems to specific application processes. Each system WPAR has its own unique set of users, groups, and network addresses. The systems administrator accesses the WPAR through the administrator console or through regular network tools such as telnet or ssh. Inter-process communication for a process in a WPAR is restricted to processes in the same WPAR.

System workload partitions provide a complete virtualized OS environment in which multiple services and applications run. A system WPAR takes longer to create than an application WPAR because its file systems must be built. The system WPAR is removed only when requested. It has its own root user, users, and groups, and its own system services such as inetd, cron, and syslog. A system WPAR does not share writable file systems with other workload partitions or the global environment. It is integrated with role-based access control (RBAC).

Application workload partition

• Normal WPAR except that there is no file system isolation
• Login not supported
• Internal mounts not supported
• Target: Lightweight process group for mobility

Application workload partitions do not provide the highly virtualized system environment offered by system workload partitions. Rather, they provide an environment for segregating applications and their resources to enable checkpoint, restart, and relocation at the application level. The application WPAR represents a shell or envelope around a specific application process or processes that leverage shared system resources. It is lightweight (that is, quick to create and remove, and light on resources) because it uses the file system and device resources of the global environment. Once the application process or processes finish, the WPAR is stopped. The user cannot log in to the application WPAR using telnet or ssh from the global environment. If you need to access the application in some way, this must be achieved through some application-provided mechanism. All file systems are shared with the global environment. If an application uses devices, it uses global environment devices.

The wparexec command builds and starts an application workload partition, or creates a specification file to simplify the creation of future application workload partitions. An application workload partition is an isolated execution environment that might have its own network configuration and resource control profile. Although the partition shares the global environment file system space, the processes running therein are visible only to other processes in the same partition. This isolated environment allows process monitoring and the gathering of resource, accounting, and auditing data for a predetermined cluster of applications.

The wparexec command invokes and monitors a single application within this isolated environment. The wparexec command returns synchronously with the return code of this tracked process only when all of the processes in the workload partition terminate. For example, if the tracked process creates a daemon and exits with return code 0, the wparexec command blocks until the daemon and all of its children terminate, and then exits with return code 0, regardless of the return code of the daemon or its children.
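The wparexec behavior described above can be sketched as a small wrapper. This is a minimal illustration, not course material: the function name run_app_wpar and the WPAR name and command in the usage comment are hypothetical, and the wrapper simply refuses to run where wparexec is not installed (that is, anywhere other than an AIX 6.1 global environment).

```shell
#!/bin/sh
# run_app_wpar NAME /full/path/to/command [ARGS...]
# Runs the command in an application WPAR called NAME.
run_app_wpar() {
    name=$1
    shift
    if ! command -v wparexec >/dev/null 2>&1; then
        echo "wparexec not found: this is not an AIX 6.1 global environment" >&2
        return 127
    fi
    # wparexec blocks until every process in the WPAR has terminated and
    # then returns the exit code of the tracked (initial) process.
    wparexec -n "$name" "$@"
}

# Hypothetical usage on AIX:
#   run_app_wpar demo_wpar /usr/bin/ps -ef
#   echo "tracked process exit code: $?"
```

Because the return code is that of the tracked process only, a script that cares about the health of a daemon started by that process needs its own check; the wrapper's exit status will not reflect it.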


Instructor notes: Purpose — Review the difference between a system WPAR and an application WPAR. Details — Additional information — Transition statement — In the lab, we will be creating a relocatable system WPAR. To do this, we need to be very clear about how shared and private file systems are accessed in the WPAR.


System WPAR file systems space

• AIX 6 global environment

{sys02_p2} / # mount
  node       mounted        mounted over        vfs       date        options
-------- ---------------  ------------------  ------ ------------ ---------------
         /dev/hd4         /                   jfs    Aug 27 14:05 rw,log=/dev/hd8
         /dev/hd2         /usr                jfs    Aug 27 14:05 rw,log=/dev/hd8
         /dev/hd9var      /var                jfs    Aug 27 14:06 rw,log=/dev/hd8
         /dev/hd3         /tmp                jfs    Aug 27 14:06 rw,log=/dev/hd8
         /dev/hd1         /home               jfs    Aug 27 14:06 rw,log=/dev/hd8
         /proc            /proc               procfs Aug 27 14:06 rw
         /dev/hd10opt     /opt                jfs    Aug 27 14:06 rw,log=/dev/hd8
         /dev/fslv01      /wpars/wparA        jfs2   Sep 03 14:55 rw,log=INLINE
         /dev/fslv02      /wpars/wparA/home   jfs2   Sep 03 14:55 rw,log=INLINE
         /opt             /wpars/wparA/opt    namefs Sep 03 14:55 ro
         /proc            /wpars/wparA/proc   namefs Sep 03 14:55 rw
         /dev/fslv03      /wpars/wparA/tmp    jfs2   Sep 03 14:55 rw,log=INLINE
         /usr             /wpars/wparA/usr    namefs Sep 03 14:55 ro
         /dev/fslv04      /wpars/wparA/var    jfs2   Sep 03 14:55 rw,log=INLINE

• System WPAR
  – /usr → namefs, nfs mount or local
  – /opt → namefs, nfs mount or local
  – /proc → namefs

{Marie} / # mount
  node       mounted        mounted over    vfs       date        options
-------- ---------------  ---------------  ------ ------------ ---------------
         /dev/fslv01      /                jfs2   Sep 03 14:55 rw,log=INLINE
         /dev/fslv02      /home            jfs2   Sep 03 14:55 rw,log=INLINE
         /opt             /opt             namefs Sep 03 14:55 ro
         /proc            /proc            namefs Sep 03 14:55 rw
         /dev/fslv03      /tmp             jfs2   Sep 03 14:55 rw,log=INLINE
         /usr             /usr             namefs Sep 03 14:55 ro
         /dev/fslv04      /var             jfs2   Sep 03 14:55 rw,log=INLINE

Figure 10-5. System WPAR file systems space

Notes:

Storage-level access in a system WPAR is primarily through a set of file systems assigned to the WPAR at creation and mounted within the WPAR during activation. By default, a system WPAR operates within a localized view of these file systems:

/ /usr /opt /tmp /var /home

Each WPAR must have a writable / (root) directory. The other system directories (/tmp, /var, /home) may be simple subdirectories under that / directory, or they may be separate file systems mounted under /. The default storage model is to have each of these system directories established as separate file systems mounted into the WPAR. These may also be NFS-mounted from an NFS server.
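One way to read the mount output in Figure 10-5 is by vfs type: namefs entries are views of global environment file systems, while jfs2 entries belong to the WPAR. The sketch below labels each line accordingly. The function name classify_wpar_mounts is hypothetical, and the parsing assumes locally mounted file systems with an empty node column, as in the figure; NFS-mounted file systems (for example, in a relocatable WPAR) fill the node column and would need different field positions.

```shell
#!/bin/sh
# classify_wpar_mounts: reads `mount` output captured inside a system WPAR
# on standard input and prints "<mount point> shared|private" per line.
# Assumes an empty node column, so the fields are:
#   $1 = mounted device or directory, $2 = mount point, $3 = vfs type
classify_wpar_mounts() {
    awk '
        # Skip the column header and the dashed separator line.
        $1 == "node" || $1 ~ /^-+$/ { next }
        NF >= 3 { print $2, ($3 == "namefs" ? "shared" : "private") }
    '
}

# Sample input copied from the WPAR view in Figure 10-5.
classify_wpar_mounts <<'EOF'
  node       mounted        mounted over    vfs       date        options
-------- ---------------  ---------------  ------ ------------ ---------------
         /dev/fslv01      /                jfs2   Sep 03 14:55 rw,log=INLINE
         /dev/fslv02      /home            jfs2   Sep 03 14:55 rw,log=INLINE
         /opt             /opt             namefs Sep 03 14:55 ro
         /proc            /proc            namefs Sep 03 14:55 rw
         /dev/fslv03      /tmp             jfs2   Sep 03 14:55 rw,log=INLINE
         /usr             /usr             namefs Sep 03 14:55 ro
         /dev/fslv04      /var             jfs2   Sep 03 14:55 rw,log=INLINE
EOF
```

For the figure's data this reports /opt, /proc, and /usr as shared and /, /home, /tmp, and /var as private.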


Instructor notes: Purpose — Review the difference between shared and private file systems. Details — Point out that the non-shared file systems are normally JFS2 file systems defined and mounted in the global environment and made to look like the base AIX file systems through the use of chroot. For a relocatable system WPAR, these same file systems will be NFS-mounted instead. The shared file systems are namefs-mounted whether or not the WPAR is relocatable. Additional information — Transition statement — Having completed our brief review, let’s spend a short time going into the details of WPAR resource management.


10.2. WPAR Manager

Instructor topic introduction
What students will do — Learn how to install, configure, and use the WPAR Manager components.
How students will do it — The students will attend lecture, participate in discussion, and practice implementing WPAR Manager with hands-on labs.
What students will learn — Students will learn to install and configure WPAR Manager components.
How this will help students on their job — The WPAR Manager is an invaluable tool when working with many WPARs on many systems. It also provides support for Live Application Mobility, which improves system availability and supports workload rebalancing. Installation and configuration are prerequisite skills for using the tool.


Topic 2 objectives

After completing this topic, you should be able to:
• Describe WPAR Manager concepts and components
• Install WPAR Manager
• Access WPAR Manager GUI
• Create and manage WPARs from WPAR Manager
• Perform WPAR mobility and advanced operations

Figure 10-6. Topic 2 objectives

Notes:


Instructor notes: Purpose — Cover the objectives of the topic. Details — Additional information — Transition statement — Let’s begin with an overview of the WPAR Manager function.


Workload Partition Manager overview

• Provides for centralized management of WPARs across multiple servers and makes infrastructure optimization easier
• WPAR Manager components required:
  – One server LPAR running as manager
  – One agent on each managed LPAR containing WPARs
• Browser-based single GUI for WPAR management:
  – Basic lifecycle administration
    • Create, view, modify, start, stop, and remove
  – Advanced management
    • Manual relocation, mobility
    • Checkpoint, restart
    • Automated relocation, policy driven
    • Monitoring, performance reporting
    • Global load balancing
    • Recovery

[Diagram: a browser connects to the Web server of the Workload Partition Manager on LPAR1 (management server), which communicates with a WPAR Agent on LPAR X and a WPAR Agent on LPAR Y. The managed LPARs host WPAR1, WPAR2, and WPAR3 and WPAR A, WPAR B, and WPAR C.]

Figure 10-7. Workload Partition Manager overview

Notes:

IBM AIX 6.1 Workload Partition Manager (WPAR Manager) is a platform management solution that provides a centralized point of control for managing workload partitions (WPARs) across a collection of managed systems running AIX. It is an optional product, part of the IBM Systems Director family, designed to facilitate the management of WPARs and application mobility, and to provide advanced features such as policy-based mobility, which automates WPAR relocation based on current performance state. The Workload Partition Manager is a separate product, not part of AIX.

By deploying the WPAR Manager, users can take full advantage of WPAR technology through the following features:
• Basic life cycle management: Create, start, stop, and delete WPAR instances
• Manual WPAR mobility: User-initiated relocation of WPAR instances
• Creation and administration of mobility policies: User-defined policies governing automated relocation of WPAR instances based on performance state
• Creation of compatibility criteria on a per-WPAR basis: User-defined criteria based on compatibility test results gathered by the WPAR Manager
• Administration of migration domains: Creation and management of server groups, associated with specific WPAR instances, that establish which servers are appropriate relocation targets
• Server profile ranking: User-defined rankings of servers for WPAR relocation based on performance state
• Reports based on historical performance: Performance metrics gathered by WPAR Manager for both servers and WPAR instances
• Event logs and error reporting: Detailed information related to actions taken during WPAR relocation events and other system operations
• Inventory and automated discovery: Complete inventory of WPAR instances deployed on all servers with WPAR Manager agents installed, whether created by the WPAR Manager or through the CLI on the local system console

Workload Partition Manager helps with resource optimization. Physical servers can be consolidated and deconsolidated dynamically. At application granularity, this makes fuller use of the already powerful virtualization capability (APV or PowerVM) of AIX and System p. Applications that require less than 1/10 of a processor to run can be consolidated, using WPARs, into a global LPAR that distributes CPU and other system resources among the workloads at a finer grain. This provides better use of systems and future cost savings to the enterprise.


Instructor notes: Purpose — Provide a basic definition of WPAR Manager function and value. Details — The WPAR Manager is a management system designed to provide a centralized interface for administration of WPAR instances across multiple systems. Additional information — The WPAR Manager feature improves the flexibility of WPAR management across several systems, especially in a virtualized environment. Its main benefit is automated workload balancing for WPARs across a farm of servers. It may be considered a software feature for WPAR virtualization in addition to the hardware PowerVM virtualization tools. The WPAR Manager software product is part of the Director family. The WPAR Manager product provides the Metacluster Checkpoint and Restart (MCR) software required for WPAR relocation and all other agents. It provides a built-in database capability (using Apache Derby) and also a limited-use copy of DB2, which is recommended for large environments. Explain to students that the WPAR Manager licensed program product (LPP) is not part of AIX. That means WPAR relocation cannot be performed without the WPAR Manager LPP. The MCR fileset (mcr.rte) provides the chkptwpar, restartwpar, and movewpar commands, corresponding to the checkpoint, restart, and relocation functions. Transition statement — Let’s next look at the graphical user interface to the WPAR Manager.


Workload Partition Manager main GUI

• Access the WPAR Manager from a browser (browser-based console) using a system anywhere on the network.
• WPAR Manager console default Web address:
  – Public: http://<hostname>:14080/ibm/console
  – Secured: https://<hostname>:14443/ibm/console
• Single point of control for managing:
  – System WPARs
  – Application WPARs
• WPAR Manager is licensed
  – Covers all embedded technologies and products:
    • Agent services
    • Database
    • MetaCluster Checkpoint Restart (MCR)
  – Customer required to accept license agreement on all installp filesets

Figure 10-8. Workload Partition Manager main GUI

Notes:

The WPAR Manager for AIX server is a Java application running on the management server. The WPAR Manager user interface provides a browser-driven interface to the WPAR management server. The user interface displays information that has been collected through the agents, and also provides management capabilities such as creation, deletion, and relocation of WPARs. The agent is based on Common Agent Services (CAS) technology. Many of these tasks can also be accomplished from the command line interface.

Automated WPAR mobility is another key to optimizing uptime and lowering cost: applications can be relocated during maintenance windows, or set up for proactive failover when there are indications of degradation (predictive failure analysis). This allows non-interruptive maintenance, with zero downtime for server fixes and upgrades, through virtual server or application relocation. This is clearly something that needs to be tested before going into a production environment, and it is not a replacement for high availability software such as HACMP or similar products.

WPAR Manager adds to the flexibility and power of the IBM UNIX offering as it becomes a more highly available solution. Other factors that promote AIX availability are more dynamic allocation and reallocation, along with the configuration of virtual servers, storage, and network resources.

Optimization of performance: applications or virtual servers can be scaled up or down, based on actual throughput demand and performance requirements. Sharing of application text, and of kernel data and text, through WPAR technology improves the efficiency of partitioning.

To use your browser with the WPAR management console, you must use Firefox 1.5+ or Internet Explorer (IE) version 6+, and JavaScript must be enabled in the browser. Because IE does not have native support for Scalable Vector Graphics (SVG), the Adobe SVG plug-in is needed; it can be downloaded from http://www.adobe.com/svg/viewer/install/main.html.


Instructor notes: Purpose — Explain the WPAR Manager user interface. Details — Additional information — Transition statement — Let’s take a closer look at the components which will need to be installed and what roles they play.


WPAR Manager topology: Default configuration

• WPAR Manager has three components:
  – WPAR Manager (resource manager)
  – Agent Manager
  – WPAR Agent (common agent)

[Diagram: a browser-based console connects to the WPAR Manager system, which hosts the Agent Manager and the WPAR Manager (resource manager) with its database (DB). Labeled flows: agent discovery, database access, agent registration, and SSL manager-to-agent communication. Each managed system/LPAR runs a WPAR Agent with MCR for mobility operations, and an NFS server provides the NFS exports for mobility.]

Figure 10-9. WPAR Manager topology: Default configuration

Notes:

The figure shows the basic configuration of the installed components. Deploying management software usually requires a server that hosts the management software and an agent that has to be installed on each server that is to be managed. The WPAR Manager is composed of three components:
• The WPAR Manager (resource manager) is the back-end part containing the database and Web server. It is a server component.
• The Agent Manager handles communication between the WPAR Manager LPAR and the WPAR clients. It is a server component.
• The WPAR Agent runs on the client LPARs that host WPARs. It is the client component.

To simplify the installation, by default, WPAR Manager and CAS Agent Manager are installed on the same system (the management server), and CAS Agent and WPAR Agent are installed on every server that will be managed.

The goal of Common Agent Services (CAS) is to minimize the complexity of deploying management software by reducing the effort needed for deployment and by using system resources more effectively.

During WPAR Agent registration, you have to provide the hostname of the CAS Agent Manager. The WPAR Manager then instructs the WPAR Agent to send it the information, in the format of an XML document, at a regular interval (the default is 1 minute). This is a system-wide value for all servers managed by the WPAR Manager. The information received by the WPAR Manager is maintained in database tables. With this information, the WPAR management console allows us to monitor various aspects of the managed WPAR, such as:
• WPAR status:
  – WPAR name
  – Operational state
  – Type
  – Last modification time

A significant number of performance metrics are also sent by the WPAR Agent to the WPAR Manager at a regular interval.

Communication between the client and manager agents also provides for checking application health inside a WPAR:
• The administrator can provide scripts to check the health of the application running in the WPAR.


Instructor notes: Purpose — Cover the WPAR Manager components and what roles they play. Details — WPAR Manager uses the information in the registry on the Agent Manager to discover new managed systems. The Agent Manager is the server component of the common agent services. It provides authentication and authorization services, enables secure connections between managed systems in your deployment, and maintains the registry of the managed systems and the software running on those systems. It also handles queries from the resource managers against the database. The agent manager has the following components:

• The agent manager service
The agent manager service serves as a certificate and registration authority to provide authentication and authorization using X.509 digital certificates and the Secure Sockets Layer (SSL) protocol. It also handles requests for registry information from common agents and resource managers. Resource managers and common agents must register with the agent manager service before they can use its services to communicate with each other. This registration is password protected, and there are separate passwords for the common agents and the resource managers. For WPAR Manager, you only need to specify the registration password for the common agents. The password for the resource manager is automatically generated during the configuration of WPAR Manager.

• The registry
The registry is the database that contains the current configuration of all known common agents and resource managers. The registry contains information such as:
- The identity, digital certificates, and communication information for each resource manager
- Basic configuration information for each common agent, for example, hardware type and operating system version
- The status of each common agent
- The last error or, optionally, a configurable number of errors, reported by each common agent

WPAR Manager listens on ports 14080 and 14443 and communicates with ports 9511, 9512, and 9513 on the Agent Manager and with port 9510 on the WPAR Agent. Notice that these are default ports, which can be overridden by the user during the configuration of WPAR Manager.

Additional information — Transition statement — The first task is to install and configure the WPAR Manager.


Installation and configuration: WPAR Manager

• Check prerequisites and prepare
  – Check AIX 6.1 version: oslevel -r
    • WPAR Manager 1.2 requires 6100-02
  – Check Java version 5: lslpp -lq 'Java5*'
  – Check file system space
• Install WPAR Manager and CAS Agent Manager
  – Install WPAR Manager fileset, wparmgt.mgr, using SMIT
• Install and configure DB2 (optional)
  – WPAR Manager 1.2 uses the Apache Derby database by default
• Configure WPAR Manager and CAS Agent Manager
  – Start /opt/IBM/WPAR/manager/bin/WPMConfig.sh -i console
• Verify WPAR Manager installation
  – Check WPAR Manager and CAS Agent Manager daemons are active

Figure 10-10. Installation and configuration: WPAR Manager

Notes:

This scenario is an example of a completely new installation, with no existing CAS Agent Manager or DB2 server in the environment. It lists the steps for installing WPAR Manager, CAS Agent Manager, and (optionally) DB2 on the server machine.

Prerequisites

Required free space:
- /tmp: 175 MB
- /opt: 700 MB
- /home: 800 MB
- /var: 200 MB

If using DB2 instead of the default Apache Derby database, an additional 2 GB of disk space is recommended:
- 1.5 GB in /home
- 500 MB in /opt

The minimum memory requirements (in addition to existing memory usage) are:
- 125 MB for WPAR Manager
- 60 MB for the CAS Agent Manager
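The prerequisite checks above lend themselves to a small pre-installation script. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the level string is expected in the 6100-NN form printed by oslevel -r on AIX 6.1, and free_mb relies on the AIX df -m layout, where the third column is free space in MB (other platforms order the columns differently).

```shell
#!/bin/sh
# meets_min_tl LEVEL MIN_TL
# LEVEL is a string such as 6100-02. Succeeds if the technology level
# is at least MIN_TL; returns 2 if LEVEL is not an AIX 6.1 level string.
meets_min_tl() {
    case $1 in
        6100-[0-9][0-9]) ;;
        *) return 2 ;;
    esac
    tl=${1#6100-}
    tl=${tl#0}          # drop one leading zero, so 02 becomes 2
    [ "$tl" -ge "$2" ]
}

# free_mb PATH: free space in MB, assuming the AIX `df -m` column layout.
free_mb() {
    df -m "$1" | awk 'NR == 2 { print int($3) }'
}

# check_space PATH REQUIRED_MB FREE_MB
check_space() {
    if [ "$3" -ge "$2" ]; then
        echo "ok: $1 has $3 MB free (need $2)"
    else
        echo "short: $1 has $3 MB free (need $2)"
        return 1
    fi
}

# Hypothetical sample values; on AIX use level=$(oslevel -r) and
# check_space /tmp 175 "$(free_mb /tmp)", and so on for /opt, /home, /var.
level=6100-02
if meets_min_tl "$level" 2; then
    echo "AIX level $level: ok for WPAR Manager 1.2"
else
    echo "AIX level $level: below 6100-02"
fi
check_space /tmp 175 300
```

Passing the free-space figure as an argument keeps the comparison logic independent of the platform-specific df parsing.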

Install filesets

During wparmgt.mgr.rte fileset installation, three prerequisites are also installed: lwi.runtime, tivoli.tivguid, and wparmgt.cas.agentmgr.

Configure WPAR Manager

The WPAR Manager Configurator can be used in three modes:
i. Graphical mode (GUI; this is the default mode)
ii. Console mode (text)
iii. Quiet mode (uses a response file)

Here is the command syntax to start each mode:
• /opt/IBM/WPAR/manager/bin/WPMConfig.sh
• /opt/IBM/WPAR/manager/bin/WPMConfig.sh -i console
• /opt/IBM/WPAR/manager/bin/WPMConfig.sh -i silent \
  -f /opt/IBM/WPAR/manager/config/wpmInstall.properties

Console mode for text input is convenient because it can be started from any user interface:
/opt/IBM/WPAR/manager/bin/WPMConfig.sh -i console

You are guided through several menus to enter parameters such as the LOCALE variable, communication ports, manager hostname, and agent manager password. The following actions are performed:
- Start CAS Agent Manager and WPAR Manager
- Register WPAR Manager with CAS Agent Manager
- Set WPAR Manager to autostart at reboot (/etc/inittab file)
- Set CAS Agent Manager to autostart at reboot (/etc/inittab file)

Once configured, the WPAR Manager daemon should be active. You can manage the daemon by using the wparmgr command:
- To verify, use /opt/IBM/WPAR/manager/bin/wparmgr status
- To start, use /opt/IBM/WPAR/manager/bin/wparmgr start
- To stop, use /opt/IBM/WPAR/manager/bin/wparmgr stop


Once configured, the CAS Agent Manager daemon should also be active. You can manage this daemon using the agentmgr command: - To verify, use /opt/IBM/WPAR/manager/bin/agentmgr status - To start, use /opt/IBM/WPAR/manager/bin/agentmgr start - To stop, use /opt/IBM/WPAR/manager/bin/agentmgr stop You can also use the Web browser to verify the installation by testing that it can connect to both CAS Agent Manager and WPAR Manager. - To verify the connection to the CAS Agent Manager: http://:9513/AgentMgr/Info - To verify the connection to the WPAR Manager: http://:14080/ibm/console Installing and configuring (optional) DB2 This is not necessary if you will be configuring WPAR Manager to use the default Apache Derby database that comes with it. For environments with large numbers of WPARs, it is recommended to use a DB2 database. Install the wparmgt.db fileset. This is a limited use packaging of DB2 for use with WPAR Manager. Optionally, you could use an existing DB2 9.1 instance. In that case, you need to work with the database administrator to create and populate the database catalog and schema. Start /opt/IBM/WPAR/manager/db/bin/DBInstall.sh. You can specify the following options: –dbinstallerdir -dbpassword The following actions are performed the by DBInstall.sh script: - Verify that port 50000, which will be used for DB2, is not already in use. - Verify that there is enough space in /tmp and /opt/IBM/WPAR/manager/db2. - Run db2setup to install DB2. - Verify that there is enough space in /home/db2wmgt (instance owner home). - Run db2isetup to create db2 instance db2wmgt and database WPARMGTDB. - Create and populate tables, indexes, views, and triggers that WPAR Manager will use. - Set the database to automatically start when the system starts. 
You can also view the detailed information in the log file at /var/opt/IBM/WPAR/manager/logs/install/WPMDBI.log
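The first check that DBInstall.sh performs can be approximated by hand before starting the installer. The sketch below is an assumption about how such a pre-check might look, not the script's actual logic. The helper port_in_use is hypothetical; it reads netstat -an output from standard input and expects the local address in the fourth column, as in AIX-style entries such as "*.50000".

```shell
#!/bin/sh
# Illustrative pre-check (hypothetical helper): succeed if the given TCP port
# appears as a listener in `netstat -an` output read from standard input.
# Local addresses ending in ".PORT" (AIX style) or ":PORT" are both accepted.
port_in_use() {
    awk -v p="$1" '
        $NF == "LISTEN" {
            n = split($4, a, /[.:]/)      # last piece of the local address is the port
            if (a[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the manager system (50000 is the DB2 port named in the text):
#   netstat -an | port_in_use 50000 && echo "port 50000 is already in use"
```

Only listening sockets are considered here, because the concern is whether another service already owns the port that DB2 will try to bind.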


Instructor notes: Purpose — Explain steps for installing and configuring the manager components. Details — Additional information — Transition statement — Having installed and configured the manager components, the next task is to install and configure the agent components.


Installation and configuration: WPAR agent IBM Power Systems

• Check hardware/software requirements
  – Processor architecture: Any IBM System p server supported on AIX 6.1

• Install WPAR Agent packages:
  – wparmgt.agent
  – Three filesets are prerequisites and are generally installed by default: bos.wpars, Java5.sdk, perfagent.tools

• Post install configuration
  – To register with the WPAR Manager and Agent Manager:
    # cd /opt/IBM/WPAR/agent/bin
    # ./configure-agent -amhost -prompt

• Verify WPAR Agent installation:
  – WPAR Agent daemon is active
  – WPAR management console can discover the Agent LPAR

© Copyright IBM Corporation 2009

Figure 10-11. Installation and configuration: WPAR agent

AN151.0

Notes:
Hardware and software requirements
Any system running AIX 6.1 can run a WPAR Agent. The WPAR Manager version 1.2 on the server system can work with a WPAR Agent that is either version 1.1 or version 1.2. However, the WPAR Manager 1.2 enhanced capabilities (especially the enhanced live relocation) are only available with the version 1.2 agent. WPAR Manager 1.2 requires at least AIX 6.1 TL2 (preferably with the most recent service pack).

Install the packages
Select the wparmgt.agent and the mcr.rte packages for installation. The prerequisite software is normally installed by default when installing the operating system. Notice that, besides wparmgt.agent, three other prerequisite filesets are also installed:
- wparmgt.cas.agent: This fileset contains the CAS Agent function.
- tivoli.tivguid (co-requisite for wparmgt.cas.agent): The GUID is a 32-hexadecimal digit number that is used to uniquely identify a common agent or a resource manager in the environment. Example:
  /usr/tivoli/guid/tivguid -show
  Guid:35.df.20.82.fe.67.11.db.95.b3.08.63.09.03.05.90
- mcr.rte: Metacluster Checkpoint and Restart (MCR) is the software that provides the capability to checkpoint a workload partition (capture the entire state of the running applications to be relocated) to a statefile and restart the applications, using the information in that statefile, on another logical partition or managed system. MCR is provided as part of the WPAR Manager LPP, and installs to /opt/mcr.

Configuring the agent
One of the files installed is a configure-agent script, which configures the agent, starts the daemon, and registers with the agent manager. You need to know the address of the server where you installed the WPAR Manager server components and the password you set when you configured the manager. The script prompts you for that password.

To configure the WPAR Agent, log in as root to the managed system and run:
/opt/IBM/WPAR/agent/bin/configure-agent -amhost -prompt

You will be asked to enter the Agent Registration password. This is the password that you provided during the WPMConfig step. (If you use an existing CAS Agent Manager, you will need to obtain this agent registration password from the CAS administrator.)

On successful registration, the following files are created on the managed system in the /opt/IBM/WPAR/agent/cas/runtime/agent/cert directory:
- CertificateRevocationList
- agentKeys.jks
- agentTrust.jks
- pwd

Verifying
Once the agent has been configured, you can verify that the agent daemon is running by using the following command:
# wparagent status

The real test is to use the WPAR Manager web interface to discover the managed systems. The successful discovery of the agent completes the validation, and enables the WPAR Manager to create, activate, and relocate WPARs on the agent system.
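Before moving on to discovery, it can be useful to confirm that registration left its files behind. The sketch below assumes only the certificate directory and the four file names given in these notes; the helper name check_agent_certs is hypothetical, and the optional directory argument exists purely so the check can be tried against a copy.

```shell
#!/bin/sh
# Illustrative check (hypothetical helper): confirm that the registration
# files named in the notes exist in the agent certificate directory.
CERT_DIR=/opt/IBM/WPAR/agent/cas/runtime/agent/cert

check_agent_certs() {
    dir=${1:-$CERT_DIR}
    missing=0
    for f in CertificateRevocationList agentKeys.jks agentTrust.jks pwd; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Return 0 only when all four files are present.
    return $missing
}
```

A non-zero return suggests that configure-agent did not complete registration, in which case the agent logs are the next place to look.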


Instructor notes: Purpose — Cover the steps involved in the agent installation and configuration. Details — Additional information — The common agent has a “heartbeat” function that sends periodic status and configuration reports to the agent manager. The frequency of this update can be set, or the update can be turned off. The common agent functionality for WPAR Manager is in the fileset wparmgt.cas.agent. The WPAR Agent listens on port 9510 and communicates with ports 9511, 9512, and 9513 on the CAS Agent Manager. Notice that these are default ports, which can be overridden by the user during the configuration of WPAR Manager. Transition statement — Now that we have installed and configured the software, we are ready to use the WPAR Manager graphical interface. To do this, you will need to log in to the WPAR Manager. Let’s look at some of the possible roles that can be assigned to an AIX user who is logging in.


Authentication and WPAR Manager IBM Power Systems

• Authentication
  – Any user with a user ID and password on the local AIX system hosting the WPAR Manager application can authenticate to WPAR Manager, but the actions available in the interface differ depending on the role assigned to the user

• WPAR Manager roles
  – Administrator: Can define roles for other users
  – WPARAdministrator: Provides access to all WPAR Manager management actions
  – WPARUser:
    • Provides access to all basic WPAR actions
    • Does not provide access to high-level administrative tasks:
      • Discovering, modifying, and deleting managed systems
      • Creating or modifying relocation policies
      • Modifying general WPAR settings
  – WPARMonitor:
    • Provides read-only access to managed systems, WPARs, and WPAR groups
    • Does not allow you to make any changes to the environment

Figure 10-12. Authentication and WPAR Manager

Notes:
Some additional setup may be considered after installation, such as creating additional user accounts for WPAR Manager access. You can assign the appropriate role to each AIX user. During WPAR Manager installation, the root user is mapped to the Administrator and WPARAdministrator roles. ID-to-application role mappings can be performed either with the lwiMapRole.sh script or in the user interface, using the Console User Authority window.

Full accessibility support for screen readers should be enabled from the Configure WPAR Manager > User Preferences panel.

WPAR Manager uninstall is done through the following steps:
1. Connect as root to the server partition and remove the WPAR Manager and CAS Agent Manager filesets using SMIT.
2. Run “/cdrom/db2/DBUninstall.sh db2wmgt” to remove the DB2 database.
3. Connect as root to the client partition and remove the WPAR Agent filesets using SMIT. See the installation slides to determine the list of filesets to uninstall.

You can determine the WPAR Manager and Agent versions by looking at the following files:
• /opt/IBM/WPAR/manager/version.properties
• /opt/IBM/WPAR/agent/version.properties

License files are located in /usr/swlag//WPARManager_110* and /opt/mcr/mcr.rte.copyright.
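The version check described above can be wrapped in a few lines of shell. This is a minimal sketch: show_wpar_versions is a hypothetical helper name, and because the internal format of version.properties is not described in these notes, the files are simply printed verbatim. The optional argument is a path prefix intended only for trying the helper against a copy of the files.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print each version.properties
# file that is readable, and say so when one is absent.
show_wpar_versions() {
    root=${1:-}   # optional path prefix, useful for testing against a copy
    for f in /opt/IBM/WPAR/manager/version.properties \
             /opt/IBM/WPAR/agent/version.properties; do
        if [ -r "$root$f" ]; then
            echo "== $f"
            cat "$root$f"
        else
            echo "== $f (not present on this system)"
        fi
    done
}
```

On a managed system that hosts only the agent, the manager file is expected to be reported as not present, and the reverse holds on a manager-only system.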


Instructor notes: Purpose — Explain WPAR Manager authentication and roles. Details — Additional information — Transition statement — Once logged in, what are some of the functions that are available?


WPAR Manager functional view IBM Power Systems

[Diagram: The WPAR Manager functional components (basic management, relocation, compatibility, global load balancing, monitoring and reporting, and recovery) work with a WPAR Agent on LPAR 1, which hosts a system WPAR, and a WPAR Agent on LPAR 2, which hosts an application WPAR.]

Figure 10-13. WPAR Manager functional view

Notes:
The basic management features are used for standard operations such as create, view, modify, start, stop, and remove. WPAR Manager also provides a deployment operation, which copies a WPAR definition to the managed server. The deployment option enables you to build a WPAR profile on the WPAR Manager with or without creating it on the client LPAR.

Global load balancing: By automating the relocation of a WPAR to a managed system that is better suited to its current workload, WPAR Manager effectively load balances all systems under its control. The goal is not only better performance for each application but also better utilization across the entire IT enterprise. The global load balancing feature of the WPAR Manager is based on the concepts of WPAR group, server group, server ranking profile, and WPAR relocation policy.


The balancer component can be used in semi-automatic or automatic mode for relocation. Manual mode is also available. With manual relocation, the WPAR Manager assists the administrator by providing WPAR and LPAR performance metrics and by automatically handling compatibility checks.

The WPAR Manager workload balancer has three major functions: relocation analysis, relocation workflow management, and relocation recovery. This component uses the collected monitoring information to analyze the current utilization of all WPARs and determine whether any WPAR needs to be relocated. If there are multiple events, it also prioritizes the order of the relocation events. It selects the most appropriate managed system as the target for relocation, based on the user-defined policy for each WPAR group. In short, this component determines which WPAR to relocate, and when and where to relocate it.

Notice that, by contrast, Partition Load Manager (PLM) moves idle resources across the LPARs of a single server.

Monitoring and reporting: The monitoring and reporting feature collects performance metrics into the database. This information contains:
• The WPAR Agent's GUID. WPAR Manager uses the GUID to identify which client the information comes from.
• Global environment performance metrics
• A list of WPARs
• Performance metrics for each active WPAR

This not only allows the administrator who uses the WPAR Management console to monitor the performance of each WPAR and managed system, but also enables the WPAR Manager to know which WPAR may need to be relocated and which logical partition or managed system is the most suitable candidate to host the WPAR.


Instructor notes: Purpose — Identify a list of functions provided through the WPAR Manager interface. Details — Do not preteach all of the items here. This is mostly a list of what is covered on the following visuals. The exception is the load balancing function. For that, just briefly define what it is and explain that: a. We do not have time to cover the configuration of that capability b. The use of the load balancing requires extensive understanding of AIX performance issues. Additional information — Transition statement — Let’s take a look at the first item in this list.


Basic management IBM Power Systems

Figure 10-14. Basic management

Notes:
Open your browser of choice and enter https://:14443/ibm/console. This opens the user login page of the GUI. Type in the user name and password and click the Log in button. Clicking Guided Activities opens a drop-down menu with two options:
1. Create Workload Partition
2. Create WPAR Group

The first option, Create Workload Partition, opens the entry page of the wizard. The welcome page gives you an opportunity to choose between the default wizard interface and the advanced interface (which lets you jump between the tasks using tabs). Next, the list of tasks for this activity is displayed. The following panels then guide you through all the options and parameters to define the WPAR.


Instructor notes: Purpose — Discuss basic management using the WPAR Manager interface. Details — While the visual focuses on the Guided Activities menu, most of the basic management actually occurs in the Resource Views. There, you can select a managed system or a WPAR and select the action to perform against the selected entity. Using a resource view is best covered on a later visual. For this visual, the main focus is the use of the Guided Activities menu to invoke the Create Workload Partition guided wizard. Additional information — Transition statement — Let’s see what we get if we click the Create Workload Partition menu item.


Creating a WPAR IBM Power Systems

Welcome General Filesystems Options Network Routing Resource Controls Security Advanced settings Summary

Figure 10-15. Creating a WPAR

Notes:
After selecting the option Create Workload Partition, this task list appears and you are guided through all the steps for defining the properties. You will be guided through a wizard to:
• Provide a name and description for the new partition
• Select whether this will be a system or application partition
• Specify whether the partition can be relocated from one system to another
• Set up network addresses and settings
• Set up WPAR properties, Role Based Access Control (RBAC), resource controls
• Review or change settings for file systems and paths
• Choose whether to deploy and start partition immediately or at a later time
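For comparison with the wizard, the same choices can be expressed with the AIX command line. The helper below is a hypothetical dry run: it only prints commands, and the WPAR name, host name, interface en0, and addresses are placeholders. The mkwpar flags shown (-n, -h, -N, and -c for a checkpointable WPAR) are used here as a sketch; a relocatable system WPAR also needs its private file systems on NFS, and those mount specifications are deliberately left out.

```shell
#!/bin/sh
# Illustrative dry run (hypothetical helper): print AIX commands that roughly
# correspond to the wizard choices. Nothing is executed.
# Arguments: name, host name, IP address, netmask, relocatable (yes/no)
print_wpar_create() {
    name=$1
    host=$2
    addr=$3
    mask=$4
    relocatable=$5
    # en0 is a placeholder interface name
    opts="-n $name -h $host -N interface=en0 address=$addr netmask=$mask"
    if [ "$relocatable" = yes ]; then
        opts="$opts -c"   # flag the WPAR as checkpointable
    fi
    echo "mkwpar $opts"
    echo "startwpar $name"
    echo "lswpar $name"
}
```

Printing rather than executing keeps the sketch safe to try anywhere; the printed lines can be reviewed and then run as root in the global environment.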


Instructor notes: Purpose — Illustrate the start of the Create Workload Partition wizard. Details — We will not cover the details of using the wizard in this lecture. Instead, the lab exercise will walk the students through the process, step-by-step. Note that they should already be familiar with using the AIX command line interface to define and activate a new WPAR. Additional information — Transition statement — Once we have a WPAR created, we will want to monitor the resource usage.


WPAR monitoring and reporting IBM Power Systems

• WPAR performance and managed server metrics

Figure 10-16. WPAR monitoring and reporting

Notes: Performance metrics are sent by the WPAR Agent to WPAR Manager at a regular interval. This not only allows the administrator who uses the WPAR Management console to monitor the performance of each WPAR and managed system, but also enables the WPAR Manager to know which WPAR may need to be relocated and which logical partition or managed system is the most suitable candidate to host the WPAR.


Instructor notes: Purpose — Explain how WPAR manager collects and displays metrics. Details — Additional information — Transition statement — Most of the management of workload partitions will be done by locating an existing WPAR on a resource view and selecting an action to be performed. Let’s see what that looks like.


Resources view IBM Power Systems

Figure 10-17. Resources view

Notes: To see the resources defined or discover others, move the mouse to the upper left corner and click Resource Views. It has a drop-down with three options: Managed Systems, Workload Partitions, and Workload Groups. Each view provides a list of known resources in that category. For example, the Workload Partitions view has a list of known workload partitions. From a list, you can select a resource, such as a particular WPAR, and then select an action from the Actions menu. For example, you can activate, deactivate, or remove a known workload partition.


Instructor notes: Purpose — Describe WPAR Manager Resource Views. Details — Additional information — Transition statement — One of the most important WPAR Manager abilities is to manage the relocation of workload partitions. We can select a known workload partition from the Workload Partitions resource view and then request a relocation.


Manual relocation or mobility IBM Power Systems

Figure 10-18. Manual relocation or mobility

Notes:
The relocate wizard prompts you for relocation options and then manages the relocation process with minimal effort on the part of the administrator. It suggests a compatible and optimal target system, but allows you to pick your own.

Best practice is to test for compatibility before attempting relocation. The Compatibility item is directly under the Relocate item in the Actions menu. Compatibility analysis determines whether it is safe to relocate a WPAR from one machine to another. Both software and hardware compatibility tests are run, and both critical and optional test cases are included. Even if we do not pretest for compatibility, the relocation wizard automatically verifies compatibility before executing a relocation.

Details of the relocation process and compatibility rules will be covered in the next lecture topic.


Instructor notes: Purpose — Explain how WPAR Manager can initiate a WPAR relocation. Details — Do not preteach too much here. The next topic goes into the details of WPAR relocation. Additional information — Transition statement — Regardless of what specific activities we invoke in WPAR Manager, it is useful to be able to track the progress of that action, or later be able to examine a log to investigate any problems.


Tasks activity and logging IBM Power Systems

• Access Task activity and details from the main GUI by selecting Task activity in the Monitoring section
  – Gives tasks and operations details with executed command, output, and errors

• Logging mechanisms are enabled by default
  – WPAR Manager logging can be controlled from GUI or command line
  – WPAR Agent logging can be controlled by modifying a property file
  – MCR logging can be controlled through the GUI configuration page

• Old performance data can be removed automatically after a number of days
• Metric collection interval can be changed (default is 60 seconds)

Figure 10-19. Tasks activity and logging

Notes:
There are three basic categories of tracking information available with the WPAR Manager components:
- Detailed task monitoring
- Log information from the various components
- Performance metrics

The task monitoring is easily available through the WPAR Manager interface. It can be viewed either while the activity is in progress or at any time after the activity has completed. It enumerates the task history (with the status of each task). The task details, for a given task, list the operations that implement that task. For some operations, you can obtain the command executed along with any STDOUT and STDERR output that was written.

The logs are mostly for detailed problem determination when an activity fails. If you are working with AIX Support on a problem, the support staff will likely ask for these logs to be collected and included in the snap. There are separate logs for each component, collected from several servers. The logs are not easily read; some of them are in a tagged HTML format.

The performance metrics are supported by the WPAR Manager GUI. The WPAR Manager collects and can later display the performance metrics. The graphing of the metrics is especially useful.


Instructor notes: Purpose — Discuss the logging mechanisms. Details — Additional information — Transition statement — Let’s look at the locations of the logs for the various components.


WPAR Manager 1.2 log locations IBM Power Systems

• WPAR Manager log files:
  – /var/opt/IBM/WPAR/manager/lwi/logs/derby.log
  – /var/opt/IBM/WPAR/manager/lwi/logs/error-log.#.html
  – /var/opt/IBM/WPAR/manager/lwi/logs/trace-log.#.html

• WPAR Agent Manager log files:
  – /var/opt/IBM/WPAR/manager/cas/agentmgr/logs/derby.log
  – /var/opt/IBM/WPAR/manager/cas/agentmgr/logs/error-log.#.html
  – /var/opt/IBM/WPAR/manager/cas/agentmgr/logs/trace-log.#.html

• WPAR Agent log files:
  – /var/opt/IBM/WPAR/agent/logs/mcr/.log
  – /var/opt/IBM/WPAR/agent/logs/WPARAgent.*

• Common Agent log files:
  – /var/opt/tivoli/ep/logs/error-log.#.html
  – /var/opt/tivoli/ep/logs/trace-log.#.html

Figure 10-20. WPAR 1.2 log locations

Notes: This lists the location of the logs for the three main WPAR Manager components. The WPAR Agent Manager log files would be on the WPAR Manager server system. The WPAR Agent and Common Agent log files would be on the systems managed by the WPAR Manager server. Remember that when diagnosing a WPAR relocation problem, there are two platforms with agents working to implement the relocation activity.
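When logs must be gathered from several systems, a quick inventory of what actually exists on each one can save time. The sketch below uses only the directories named on the visual; the helper name list_wpar_logs and the optional path-prefix argument (for trying it against a copied directory tree) are assumptions for illustration.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): list the log files that exist in
# the directories named on the visual. Directories that are absent on this
# system (for example, manager logs on an agent-only LPAR) are skipped.
list_wpar_logs() {
    root=${1:-}   # optional path prefix, useful for testing against a copy
    # find recurses, so the agent logs directory also covers its mcr subdirectory
    for d in /var/opt/IBM/WPAR/manager/lwi/logs \
             /var/opt/IBM/WPAR/manager/cas/agentmgr/logs \
             /var/opt/IBM/WPAR/agent/logs \
             /var/opt/tivoli/ep/logs; do
        [ -d "$root$d" ] || continue
        find "$root$d" -type f -print
    done
}
```

For a relocation problem, running this on the manager system and on both the source and target systems gives the full set of candidate files.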


Instructor notes: Purpose — Point out the locations of the various logs. Details — Additional information — There are differences in the way workload partitions are created by the workload partition agent versus the command line, and these differences restrict the ability to relocate application WPARs. This limitation does not exist for system WPARs. Application WPARs created in WPAR Manager and those created from the command line are not compatible for relocation. The following limitations were identified in WPAR Manager version 1.1. It is not clear if they have changed in WPAR Manager 1.2:
• Processes with deleted POSIX shm maps cannot be paused.
• Application WPARs created from the command line cannot be relocated using the WPAR Manager, and application WPARs created from the WPAR Manager cannot be relocated from the command line.
• Processes launched inside a system WPAR using clogin cannot be paused.
• Processes running in an unlinked working directory cannot be paused.
• Checkpoint fails if processes or threads are stopped in the WPAR.
• Checkpoint supports memory regions created using mmap (MAP_SHARED) only for files opened in read-only mode or anonymous mmaps.
• Fixed resource controls on WPARs that belong to a WPAR group that has policy-enabled relocation are not taken into account to determine the performance state of the WPAR group.
Transition statement —


10.3. Application mobility

Instructor topic introduction
What students will do — Learn to relocate WPARs from one system to another.
How students will do it — Lecture discussions and lab exercises
What students will learn — To relocate WPARs from one system to another
How this will help students on their job — This will enable them to avoid maintenance disruptions and load balance work between systems.


Topic 3 objectives IBM Power Systems

After completing this topic, you should be able to:
• Explain the role of Application Mobility
• Explain the NFS role in Live Application Mobility (LAM)
• List the LAM requirements for the WPAR and the logical partitions, and validate that the requirements are met
• Migrate a live system WPAR from one logical partition to another
• Explain WPAR Manager support for static relocation

Figure 10-21. Topic 3 objectives

Notes:


Instructor notes: Purpose — Discuss the objectives of the topic. Details — Additional information — Transition statement — Let’s define what we mean by Application mobility.


Application mobility IBM Power Systems

• Moving a workload partition between logical partitions
  – Typically, the LPARs are on different servers

• Empty a machine for application outage avoidance
  – Upgrade machine
  – Upgrade firmware
  – Machine repair
  – Upgrade AIX version and release

• Multi-system workload balancing
  – From overloaded system to system with extra capacity
  – Application consolidation: from many systems to one system
  – WPAR Manager provides for automated relocation
    • Can provide relocation policies with thresholds to trigger relocation

Figure 10-22. Application mobility

Notes:
Outage avoidance
Hardware components of an IT infrastructure might need to undergo maintenance operations that require the component to be powered off. If an application is not part of a cluster of servers designed to provide continuous availability, then using a WPAR to host it can help to reduce interruptions of availability. Using the live application mobility feature, the applications that are executing on a physical server can be temporarily moved to another server, without an application blackout period, for the time required to perform the physical maintenance operations on the server.

Workload sizing and balancing
Using the mobility feature of WPARs, server sizing and planning can be based on the overall resources of a group of servers, rather than being performed server by server. It is possible to allocate applications to one server up to 100% of its resources. When an application grows and requires resources that can no longer be provided by the server, the application can be moved to a different server with spare capacity.


Instructor notes: Purpose — Define Live Application Mobility. Details — Additional information — Transition statement — Let’s compare this with Live Partition Mobility.


WPAR Manager relocation support IBM Power Systems

• Live relocation (WPAR Manager 1.1 and 1.2)
  – Private file systems on common NFS server
  – Checkpoint with creation of statefile on NFS server
  – Restart on target using statefile on NFS server
  – Longer application freeze period (up to 30 secs)
  – No GUI support under WPAR Manager 1.2

• Enhanced live relocation (WPAR Manager 1.2)
  – Private file systems on common NFS server
  – New dynamic transfer of state and memory
  – Only a second or two of application freeze

• Static relocation (WPAR Manager 1.2)
  – File systems must be local
  – WPAR stopped, if not already
  – savewpar and restwpar (backup on common NFS server)
  – Fewer compatibility rules between systems
  – Can be implemented through CLI without WPAR Manager

Figure 10-23. WPAR Manager relocation support

Notes:
Live relocation
Live relocation was implemented under WPAR Manager version 1.1 and is still supported by WPAR Manager 1.2. It uses the checkpoint command to pause the WPAR and capture all of its state information in a collection of state files. Not only did the private file systems need to be on an NFS server (common to both source and target systems), but the state file was also passed through the NFS server. At the target system, a clone of the source WPAR needed to be defined, and the state file was used to restart the WPAR. The time between WPAR pause and WPAR restart was long enough that connections from application clients or peers could time out. On the other hand, the supporting line commands are fully documented, and some administrators find the checkpoint-based live relocation to be more reliable than the enhanced live relocation. While WPAR Manager version 1.2 still supports the command line implementation, the WPAR Manager GUI uses enhanced live relocation.

Enhanced live relocation

WPAR Manager 1.2 has improved the technology used to relocate active WPARs. The command line interface implementation has fewer steps. The enhanced live relocation is also referred to as asynchronous live relocation, because of the use of memory transfer technologies (similar to what is used in live partition mobility). The important result of the enhancement is a much shorter period of application freeze, thus avoiding most connection outages.

The WPAR Manager GUI orchestrates the agents to carry out the relocation using the MCR commands. The main command is the movewpar command. While it is possible to use a command line interface on the source and target LPARs to implement enhanced live relocation, the movewpar command is not officially documented. Information on how you might use the movewpar command is provided only in the WPAR Redbook. The intent is that you would use the WPAR Manager GUI.

In both live relocation and enhanced live relocation, the WPAR processes (when they restart) expect to be in an execution environment that looks the same on the target as it did on the source system. This expectation is expressed as a series of compatibility requirements between the source and target systems.

Static relocation

If the WPAR does not need to stay active with its applications running during relocation, then you can relocate a WPAR without the use of WPAR Manager. You simply need to save the WPAR on the source system and restore the WPAR on the target system. In WPAR Manager version 1.2, the WPAR Manager GUI will implement this type of relocation with a single request. This is referred to as a static relocation. The important requirements are that the private file systems must be local (rather than NFS mounted) and that the WPAR backup files must be on an NFS server common to both source and target.

Since there are no running WPAR processes in a static relocation, the compatibility requirements are far fewer than in the live relocation scenario.
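As an illustration of the save-and-restore sequence for a static relocation, the following is a minimal dry-run sketch. It only prints the commands and does not run them. The backup path /nfs/wparbackups/wparA.bak and the function name are hypothetical, and the use of stopwpar and startwpar around savewpar and restwpar is an assumption about a typical sequence, not a procedure taken from the course material.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the commands for a static relocation.
# $1 = WPAR name
# $2 = backup file in a directory on the NFS server common to both systems
#      (hypothetical path in the example call below)
static_relocation_plan() {
    wpar_name=$1
    backup_file=$2
    echo "source: stopwpar $wpar_name"
    echo "source: savewpar -f $backup_file $wpar_name"
    echo "target: restwpar -f $backup_file"
    echo "target: startwpar $wpar_name"
}

static_relocation_plan wparA /nfs/wparbackups/wparA.bak
```

Printing the plan first makes it easy to review the sequence before running each command by hand on the appropriate system.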

Instructor notes: Purpose — Compare the different types of relocation support. Details — Additional information — Transition statement — Let’s talk about the importance of compatibility between the source and the target systems.

Compatibility issues

• WPAR Mobility requires the departure and arrival systems to be compatible.
  – Requirements for live relocation much greater than static
  – WPAR Manager provides tools to validate compatibility.

• Software compatibility
  – WPAR Manager agent levels
  – Global environment AIX operating system levels
  – Any other binaries which are in a namefs mounted file system

• Hardware compatibility
  – Server processor type
  – Devices and hardware features

Figure 10-24. Compatibility issues

Notes:

WPAR mobility across systems requires the departure and arrival systems to be compatible. This includes both software and hardware compatibility.

Software compatibility

The software levels on both the departure and arrival systems must match. This is absolutely necessary because the application binaries are not saved in the checkpoint state; the application is instead restarted using the arrival system's application binaries.

Hardware compatibility

The hardware characteristics of the departure and arrival systems must be compatible. This ensures that an application that is aware of system hardware characteristics will continue to see the same features after migration to the remote system.

For WPAR Live Application Mobility, each machine or LPAR needs to be configured in the same way for the WPAR. This includes:
• The file systems needed by the application
• Similar network functionalities (meaning the same subnet, because of routing implications)
• Enough space to handle the data created during the migration process
• Same OS level (same technology level)

Some WPAR mobility considerations
• Currently, the only supported mechanism to make the WPAR file systems available on multiple systems is NFS. Each machine or LPAR has to be able to mount the same WPAR file systems.
• In an application WPAR environment, applications using PTY devices have to ensure that both master and slave users are in the same WPAR.
• Applications that bind to CPUs will have to unbind for the duration of the event. This can be done by registering for DR Migration event notifications.
• Applications opening files using the O_DEFER flag will not be mobile.
• Processes launched inside a system WPAR using clogin cannot be relocated.

Instructor notes: Purpose — Discuss compatibility levels. Details — Note that there is a more detailed list of compatibility requirements later in the unit. Additional information — Transition statement — Let’s briefly compare WPAR Live Application mobility with Live Partition Mobility.

Live partition mobility versus live application mobility

• Live partition mobility: Migration of a running logical partition to another physical server
  – Operating system, applications, and services are not stopped during the process
  – Requires POWER6, AIX 5.3 and VIO server

• Live application mobility: Moving a workload partition from one server to another
  – Without requiring the workload running in the WPAR to be restarted
  – Provides outage avoidance and multi-system workload balancing
  – Requires AIX 6.1

[Diagram, upper part: partitions (P1, P2, P3, P5) on Server 1 and Server 2, each server with a VIOS, as multiple systems managed by a single HMC over the HMC network. Lower part: workload partitions (EMail, Web, App Srv, Dev, Billing, Data Mining, Training, Test) spread across AIX #1, AIX #2, and AIX #3 under the policy control of Workload Partitions Manager.]

Figure 10-25. Live partition mobility versus live application mobility

Notes:

Overview

Live Partition Mobility and Live Application Mobility are capabilities that enable users to move workloads between systems with no (or limited) application downtime. Both types of mobility allow organizations to move workloads from busy servers to less busy ones in order to improve overall performance and system utilization (based on requirements at a particular time). They can also be used to enable a maintenance window on a machine without necessarily needing any application downtime. This is accomplished by moving the work (either WPARs or entire LPARs) off the machine needing the maintenance and then later returning the work to that same machine after the maintenance is completed. The only interruption of service would be due to network latency. If sufficient bandwidth is available, a delay of at most a few seconds can typically be expected.

Live Application Mobility

Application Mobility is a capability that allows a client to relocate a running WPAR from one system to another, without requiring the workload running in the WPAR to be restarted. Application Mobility is intended for use within a data center and requires the use of the new Licensed Program Product, the IBM AIX Workload Partitions Manager.

Live Application Mobility differs significantly from Live Partition Mobility in that Live Partition Mobility is a feature of POWER6 processors. As such, it can be used with operating systems other than AIX 6, such as Linux or earlier AIX versions. In contrast, Workload Partitions is a feature of AIX 6 specifically and can run on a variety of hardware (for example, POWER6, POWER5, or POWER5+ systems).

Live Partition Mobility

Partition mobility enables the movement of full partitions between systems, which not only enables better optimization of your IT environment by balancing workload, but also helps to eliminate the need for planned outages for system upgrades.

An active migration moves the definition of a logical partition from one system to another along with its network and disk configuration. The operating system, the applications, and the services they provide are not stopped during the process. The physical memory content of the logical partition is copied from system to system, allowing the transfer to be imperceptible to users.

During an active migration, the applications continue to handle their normal workload. Disk data transactions, running network connections, user contexts, and the complete environment are migrated without any loss, and migration can be activated at any time on any production partition.

Instructor notes: Purpose — Explain the difference between Live Partition Mobility and Live Application Mobility. Details — Additional information — Transition statement — How is it possible to move a WPAR between systems without disrupting the execution of the programs?

WPAR enhanced live mobility

1. Issue movewpar
2. Send WPAR spec file to target and save page and segment table on NFS
3. Target receives spec file, creates WPAR. State of T
4. Get WPAR memory page and segment table from NFS
5. Source state: T -> M
6. Transmit Memory data
7. Source state: M -> T
8. Transfer complete, target state T -> A
9. Source state T -> D

Figure 10-26. WPAR enhanced live mobility

Notes:

WPAR enhanced live mobility is implemented as follows:
1. Our WPAR is active, and we issue movewpar on the target (or Arriving) system. The WPAR changes state to T, and starts the move processes.
2. We send our WPAR spec file to the Arriving system. At the same time, we start saving page and segment table information on our NFS server.
3. When our Arriving system receives the spec file, it creates the WPAR. When this is done, the new WPAR changes state to T.
4. We are ready to receive memory data from the source (or Departing) system, and we can get page and segment table information from the NFS server.
5. While this happens on the Arriving system, our Departing system changes state from T to Moving (M).
6. The M state is only shown while our memory data is transmitted to the Arriving system.
7. As soon as this is finished, we change the state to T.
8. When our Arriving system has received all needed data, it starts and changes state from T to A.
9. At the same time, our Departing system changes state to D.

Fine grained mobility

Just one of many WPARs in a system may be moved. You can create and migrate a WPAR containing just an application set (for example, DB2). This is very different from mobility in other partitioning technologies.

Workload partitions are mobile, and mobility provides support for common application features such as:
• Advisory locks including NFSv3 locks
• Network connections (TCP, UDP, UNIX)
• Pseudo terminals
• Memory, IPC
• Most system calls transparent
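To make the single-letter state codes in the nine steps easier to follow, here is a small illustrative helper that maps each code to a name and prints the expected sequence on each side. Only "Moving (M)" is named in the notes; the names used for A, D, and T are the usual lswpar state names and should be treated as an assumption here. The function name is hypothetical.

```shell
#!/bin/sh
# Map the single-letter WPAR state codes used in the steps to names.
# Only M (Moving) is spelled out in the notes; A, D, and T are assumed
# to be Active, Defined, and Transitional.
wpar_state_name() {
    case "$1" in
        A) echo "Active" ;;
        D) echo "Defined" ;;
        T) echo "Transitional" ;;
        M) echo "Moving" ;;
        *) echo "Unknown" ;;
    esac
}

# Expected sequence on the Departing (source) system: A, T, M, T, D.
for state in A T M T D; do
    printf 'source: %s (%s)\n' "$state" "$(wpar_state_name "$state")"
done

# Expected sequence on the Arriving (target) system: T, A.
for state in T A; do
    printf 'target: %s (%s)\n' "$state" "$(wpar_state_name "$state")"
done
```

During a real relocation, the state column of lswpar output on each system could be compared against these sequences.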

Instructor notes: Purpose — Discuss some of the mechanisms involved in relocating a WPAR. Details — Additional information — Transition statement — Let’s look at the steps involved in relocating a WPAR using the WPAR Manager graphic interface.

Steps for WPAR enhanced live mobility (WPAR Mgr GUI)

1. On the NFS server, create and export WPAR file systems
2. Create WPAR on source system (with checkpointable flag)
3. Start and use the WPAR
4. Identify compatible target system
5. Invoke Relocation task for WPAR to target system

[Diagram: WPAR Mgr relocates WPAR ServerA from the source system to the target system; both systems use the NFS server.]

Figure 10-27. Steps for WPAR enhanced live mobility (WPAR Mgr GUI)

Notes:

Here is an overview of the steps needed to create a checkpointable workload partition and then relocate it from system 1 to system 2.

1. Create the file system structure on the NFS server for the WPAR:
   - /
   - /tmp
   - /home
   - /var
   - (and optionally /usr and /opt; not recommended)
   Export the file systems with root access to both of the global environments and to the WPAR.
2. Create the WPAR as checkpointable (mkwpar -c).
3. Start the WPAR (startwpar).
4. Identify a compatible target system (WPAR Manager will provide a list of registered systems and their compatibility with the global system of the WPAR). Alternatively, prior to attempting relocation, you can select the WPAR and click the Compatibility item in the Action menu. This will list the known managed systems and their compatibility status for the selected WPAR. If you attempt a relocation with a non-compatible system, WPAR Manager will fail the relocation and identify the compatibility issue.
5. Select the WPAR in the WPAR Manager GUI and select the Relocate task from the actions menu. WPAR Manager will manage the entire relocation process and provide a listing of the steps taken and the status of the relocation task.

Instructor notes: Purpose — Cover how to relocate a WPAR using the WPAR Manager GUI. Details — Additional information — Transition statement — Let’s look at the steps the wizard goes through to relocate an active WPAR.

Enhanced relocation workflow (1 of 2)

Workflow (departure server / arrival server):
– Verify that the WPAR is active. / Verify that the WPAR does not exist.
– Lock the WPAR on the source. / Lock the WPAR name on the target.
– Get the departure server properties. / Get arrival server properties.
– Verify the compatibility of departure and arrival servers.
– Get the WPAR properties. / Use WPAR properties to deploy WPAR.
– Pause WPAR and send WPAR state. / Receive WPAR state from the departure server.
– Transfer WPAR memory between systems and restart WPAR.
– Verify that WPAR relocation was successful and WPAR is healthy.
– Remove WPAR.
– Verify that WPAR does not exist.
– Get WPAR properties.
– Unlock the WPAR name. / Unlock the WPAR.

Figure 10-28. Enhanced relocation workflow (1 of 2)

Notes: When all operations are run through the guided Relocation wizard, the WPAR Manager orchestrator automatically performs all the steps of the workflow. The relocation is performed with a minimal application downtime and interruption to the end user. WPAR Manager uses a WPAR lock mechanism during the relocation process. Locks are not used for manual relocation when performed from the command line. Details about relocation steps have been described in the previous WPAR topic. Relocation process details and performed steps can be checked from the monitoring task menu.

Instructor notes: Purpose — Cover the steps in relocating an active WPAR, using the wizard. Details — Additional information — Transition statement — We can track these steps in the WPAR Manager graphic interface in the Task Details panel. Let’s see what this looks like after a successful relocation.

Enhanced relocation workflow (2 of 2)

Figure 10-29. Enhanced relocation workflow (2 of 2)

Notes: During or after a relocation task, you can examine the individual operations and their status by using the WPAR Manager Task Details panel, as shown here. If there is any problem with the relocation, the first point of failure in the workflow will be identified as the failed operation.

Instructor notes: Purpose — Illustrate the ability to track the tasks during a relocation. Details — Do not spend much time on this. Just note that this can be an invaluable tool if there are problems in completing a relocation. Additional information — Transition statement — We just looked at what a successful relocation would look like. What if the relocation task failed? How would we diagnose the problem?

Enhanced relocation error (1 of 2)

Figure 10-30. Enhanced relocation error (1 of 2)

Notes: If there is a problem with the relocation task, the Task Details list of operations will identify what operation failed. When you click on the name of the operation it will bring up the details about that particular operation.

Instructor notes: Purpose — Illustrate what a task failure would look like. Details — Additional information — Transition statement — If we click the failed operation name, what will we see?

Enhanced relocation error (2 of 2)

Figure 10-31. Enhanced relocation error (2 of 2)

Notes:

The Operations Details panel provides additional information about the operation. If the operation involved the execution of a line command, then there will be three tabs in the operation details:
- Command: The full syntax of the issued command with options and arguments
- Output: The standard output from the command
- Error: The standard error from the command

Obviously, the standard error listing is very useful in diagnosing what caused the task failure. In the displayed example, the enhanced live relocation failed because the NFS server setup was not complete; the NFS server had not identified the relocation target system global environment as having root access to the exported file system.

Instructor notes: Purpose — Illustrate the Operation Details panel. Details — Additional information — Transition statement — Let’s see how we would implement an enhanced live relocation from the command line interface.

Steps for WPAR enhanced live mobility (command line)

1. On the NFS server, create and export WPAR file systems.
2. Create WPAR on the source system (with checkpointable flag).
3. Start and use WPAR.
4. Generate a spec file for the WPAR.
5. Ensure the target system is compatible.
6. Create WPAR on the target system using the spec file.
7. Start a migration server on the source system.
   – Record the reported connection key value
8. Start migration on the target system (using connection key).

[Diagram: WPAR ServerA is relocated from the source (lparX) to the target (lparY); both LPARs use the NFS server.]

Figure 10-32. Steps for WPAR enhanced live mobility (command line)

Notes:

In comparison to the graphic interface, the command line interface (CLI) requires more work from the administrator, but has the advantage that it can be embedded in a shell script for flexible automation. Some of the main differences are:
- You have to manually determine the compatibility of the two servers.
- You are responsible for creating (but not activating) a WPAR on the arriving system that is exactly the same as the one to be relocated. The exact match is typically ensured by creating and then using a specification file for the WPAR.
- You have to start a mobility server on the departure system and then start a mobility client on the arrival system, which will connect to the mobility server.

The movewpar command that is the core of this capability is not officially documented. There is no man page, nor does the WPAR Manager product documentation mention the command line approach. The information here is from the Redbook on the topic. The documented command line approach uses checkpoint and restart (covered later).

Instructor notes: Purpose — Provide an overview of the command line interface method for enhanced live application mobility. Details — Additional information — Transition statement — Let’s examine this procedure, step-by-step.

Enhanced live relocation: CLI (1 of 4)

• Create the file systems structure on the NFS server for the WPAR.
  – /, /tmp, /home, /var, and any application file systems
  – Optionally: /usr, /opt (but not recommended)

• Export the NFS file systems with root access to both global environments and to the WPAR (step 1).

    # exportfs
    /export/wpars -sec=sys,access=lparX:wparA:lparY,root=lparX:wparA:lparY
    /export/wpars_home -sec=sys,rw,access=lparX:wparA:lparY,root=lparX:wparA:lparY
    /export/wpars_tmp -sec=sys,rw,access=lparX:wparA:lparY,root=lparX:wparA:lparY
    /export/wpars_var -sec=sys,rw,access=lparX:wparA:lparY,root=lparX:wparA:lparY

Figure 10-33. Enhanced live relocation: CLI (1 of 4)

Notes:

The NFS aspects of using the command line interface are no different from using the WPAR Manager GUI. Live relocation requires that any file systems that are private to the WPAR be stored externally with common access from both the source and target systems. Currently, only NFS is supported for this requirement.

The file systems /usr and /opt can optionally be NFS mounted from the NFS server, but managing these as private file systems can cause problems in both management and performance. It is not recommended.

Once the file systems have been defined and mounted on the NFS server, they need to be defined as NFS exported file systems with the two LPARs (global environments) and the WPAR having read-write access as root.
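Because the same three clients (both global environments and the WPAR) must appear in both the access list and the root list of every export, the option string is easy to mistype. The following is a minimal sketch of a helper that builds that string from a list of client names and prints one line per exported directory in the format shown on the slide. The function name is hypothetical, and the sketch only prints text; it does not change any NFS configuration.

```shell
#!/bin/sh
# Build the export option string for a list of client names.
# Every client is given both access and root access, as on the slide.
nfs_export_opts() {
    client_list=""
    for client in "$@"; do
        if [ -z "$client_list" ]; then
            client_list=$client
        else
            client_list="$client_list:$client"
        fi
    done
    printf '%s\n' "-sec=sys,rw,access=$client_list,root=$client_list"
}

# Print one line per exported directory, using the names from the slide.
for export_dir in /export/wpars /export/wpars_home /export/wpars_tmp /export/wpars_var; do
    printf '%s %s\n' "$export_dir" "$(nfs_export_opts lparX wparA lparY)"
done
```

Generating all lines from one client list keeps the access and root lists identical across the WPAR file systems, which avoids the kind of incomplete NFS setup shown in the relocation error example.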

Instructor notes: Purpose — Cover the set up of the NFS server to support relocation. Details — Additional information — Transition statement — Let’s look at the creation and activation of a relocatable WPAR.

Enhanced live relocation: CLI (2 of 4)

• Create a checkpointable WPAR on the source LPAR (step 2):

    # mkwpar -c -r -o /tmp/mkwparA.log -R active=yes \
          -M directory=/ vfs=nfs host=nfsserver \
             dev=/export/wpars \
          -M directory=/home vfs=nfs host=nfsserver \
             dev=/export/wpars_home \
          -M directory=/tmp vfs=nfs host=nfsserver \
             dev=/export/wpars_tmp \
          -M directory=/var vfs=nfs host=nfsserver \
             dev=/export/wpars_var \
          -n wparA

• Start the WPAR on the source LPAR (step 3):

    # startwpar wparA

• Generate a spec file for the active WPAR (step 4):

    # mkwpar -w -o /tmp/wparA_specfile -e wparA

Figure 10-34. Enhanced live relocation: CLI (2 of 4)

Notes:

The -c option of the mkwpar command specifies that the WPAR created will be checkpointable. The mount specifications match the NFS exports you set up earlier.

When you use the command line to implement the relocation, you need to define a clone of the WPAR on the target system. The easiest and safest way to do this is to generate a specification file.
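Since each -M mount specification must line up with one of the NFS exports, it can help to generate the mkwpar command line rather than type it. The sketch below assembles and prints the command from the slide without running it. The function names are hypothetical, the export naming pattern (/export/wpars and /export/wpars_<dir>) is taken from the slide, and the -o option shown on the slide is left out for brevity.

```shell
#!/bin/sh
# Print the -M mount specification for one WPAR directory.
# $1 = directory inside the WPAR, $2 = NFS server, $3 = exported path
wpar_mount_spec() {
    printf '%s\n' "-M directory=$1 vfs=nfs host=$2 dev=$3"
}

# Assemble (but do not run) a mkwpar command line like the one on the slide.
# $1 = WPAR name, $2 = NFS server host name
build_mkwpar_cmd() {
    wpar_name=$1
    nfs_host=$2
    cmd="mkwpar -c -r -R active=yes"
    cmd="$cmd $(wpar_mount_spec / "$nfs_host" /export/wpars)"
    for dir in home tmp var; do
        cmd="$cmd $(wpar_mount_spec "/$dir" "$nfs_host" "/export/wpars_$dir")"
    done
    printf '%s\n' "$cmd -n $wpar_name"
}

build_mkwpar_cmd wparA nfsserver
```

The printed command can be reviewed against the exportfs listing on the NFS server before it is run on the source LPAR.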

Instructor notes: Purpose — Discuss the creation and activation of a relocatable WPAR. Details — Additional information — Transition statement — Before you attempt a relocation, you should first check the compatibility of the two systems.

Enhanced live relocation: CLI (3 of 4)

• Check compatibility of the target system (must be done manually) (step 5):
  – Processor type equal to source
  – Processor speed and memory should equal or exceed source
  – Matches source on hardware features and devices
  – File systems must match the source system
  – bos.rte.libc must match what is on the source
  – Following filesets must have the same VRMF:
    • bos.rte
    • bos.wpars
    • mcr.rte
  – Any other software that is used by WPAR must match
  – Date and time should match (use xntpd or timed)
  – Administrator can provide additional tests

Figure 10-35. Enhanced live relocation: CLI (3 of 4)

Notes: System compatibility System compatibility is strictly related to the relocation type. Live application mobility is the process of relocating a WPAR while preserving the state of the application stack. Static application mobility is defined as a shutdown of the WPAR on the departure node and the clean start of the WPAR on the arrival node while preserving the file system state. Live relocation requires a more extensive compatibility testing than static relocation. Therefore, it is possible that two systems could be incompatible for live relocation, but compatible for static relocation. Compatibility is evaluated on the following criteria: - Hardware levels (the two systems must have identical processor types) - Installed hardware features - Installed devices (as seen by the LPARs involved) - Operating system levels and patch levels

© Copyright IBM Corp. 2009

Unit 10. Workload partitions Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

10-95

Instructor Guide

- Other software or file systems installed with the operating system (same V.R.M.F: version, release, modification, and fix)
- Additional user-selected compatibility testing for application mobility

Compatibility testing includes critical tests and optional tests. These compatibility tests help to determine if a WPAR can be relocated from one managed system to another. For each relocation type, live or static, there is a set of critical tests that must pass for one managed system to be considered compatible with another.

For live relocation, the critical compatibility tests check the following compatibility criteria:
- The operating system type must be the same on the arrival system and the departure system.
- The operating system version and level must be the same on the arrival system and the departure system.
- The processor class on the arrival system must be at least as high as the processor class of the departure system.
- The version, release, modification, and fix level of the bos.rte fileset must be the same on the arrival system and the departure system.
- The version, release, modification, and fix level of the bos.wpars fileset must be the same on the arrival system and the departure system.
- The version, release, modification, and fix level of the mcr.rte fileset must be the same on the arrival system and the departure system.
- The bos.rte.libc file must be the same on the arrival system and the departure system.
- There must be at least as many storage keys on the arrival system as on the departure system.

Note: The critical tests for static relocation are a subset of the tests for live relocation. The only critical test for static relocation is that the bos.rte.libc file must be the same on the arrival system and the departure system.

In addition to these critical tests, you can choose to add optional tests for determining compatibility. These optional tests are selected as part of the WPAR group policy for the WPAR you are planning to relocate, and are taken into account for both types of relocation. Two managed systems might be compatible for one WPAR and not for another, depending on which WPAR group the WPAR belongs to and which optional tests were selected as part of the WPAR group policy. Critical tests are always applied in determining compatibility regardless of the WPAR group to which the WPAR belongs.

You can choose from optional tests to check the following compatibility criteria:
- NTP must be enabled on the arrival system and the departure system.
- The amount of physical memory on the arrival system must be at least as high as the amount on the departure system.
- The processor speed for the arrival system must be at least as high as the processor speed for the departure system.
- The version, release, modification, and fix level of the xlC.rte fileset must be the same on the arrival system and the departure system.
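The "must be the same" fileset tests above amount to an exact comparison of two V.R.M.F strings. The following is only a minimal sketch of that idea, not how WPAR Manager implements its tests. The function names and the sample levels are hypothetical; on a real system the levels would have to be collected from each LPAR (for example from lslpp output).

```shell
#!/bin/sh
# same_level DEPARTURE_VRMF ARRIVAL_VRMF
# Succeeds only when both levels are non-empty and identical in all
# four fields (version.release.modification.fix).
same_level() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# check_fileset NAME DEPARTURE_VRMF ARRIVAL_VRMF
# Prints one PASS or FAIL line for the named fileset.
check_fileset() {
    if same_level "$2" "$3"; then
        echo "PASS $1 $2"
    else
        echo "FAIL $1 departure=$2 arrival=$3"
    fi
}

# Hypothetical sample levels, for illustration only.
check_fileset bos.rte   6.1.3.0 6.1.3.0
check_fileset bos.wpars 6.1.3.0 6.1.3.1
```

A difference in any single field, including the fix level, fails the comparison, which matches the wording of the critical tests.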


Instructor notes: Purpose — Explain the compatibility requirements of two servers to support relocation of a WPAR. Details — Additional information — Transition statement — Let’s look at the final steps involved in invoking the relocation between the servers.


Enhanced live relocation: CLI (4 of 4) IBM Power Systems

• Create the WPAR on the target system using the spec file. (step 6)

# mkwpar -p -f /tmp/wparA -specfile

• Start a migration server on the source system. (step 7)

# /opt/mcr/bin/movewpar -s wparA
Connection key: 49ff5ba00000838c
mcr: Migration server started successfully for wpar wparA

• Migrate the active WPAR using the key on the target system. (step 8)

# /opt/mcr/bin/movewpar -k 49ff5ba00000838c \
    wparA

Figure 10-36. Enhanced live relocation: CLI (4 of 4)


Notes: Creating a workload partition can be long and complex, especially when you have to create many of them. A specification file can be used instead and specified as an argument of the mkwpar command (mkwpar -f wpar.specfile). You can also create a spec file from an existing workload partition. The file /etc/wpars/xxx.cf contains the configuration for the WPAR xxx. You can use an existing specification file to create the next WPAR, to create a near clone WPAR, and to document a current WPAR configuration.

To do an enhanced live migration, there needs to be a migration server running on the source system. When you start a migration server (using the movewpar command) to support the WPAR which you intend to relocate, the server generates a connection key. Record the connection key value; this key is needed when you request the actual migration.

To run the actual migration, you start a migration client at the target system (using the movewpar command) and provide it with the server to connect to, the WPAR to migrate, and the connection key to use. The migration then proceeds just as if you had requested it from the WPAR Manager GUI.
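Recording the connection key by hand is error prone, so a small filter can pull it out of the migration server output. This is a minimal sketch that assumes the output format is exactly as shown on the visual (a line starting with "Connection key:"); the function name extract_connection_key is hypothetical, and the sample text stands in for real movewpar output.

```shell
#!/bin/sh
# extract_connection_key: read migration server output on stdin and
# print only the value from the "Connection key:" line.
extract_connection_key() {
    sed -n 's/^Connection key: *//p'
}

# Sample output copied from the visual. On a real source system the
# input would be the output of: /opt/mcr/bin/movewpar -s wparA
sample_output='Connection key: 49ff5ba00000838c
mcr: Migration server started successfully for wpar wparA'

key=$(printf '%s\n' "$sample_output" | extract_connection_key)

# Show the client command that would then be run on the target system.
echo "/opt/mcr/bin/movewpar -k $key wparA"
```

The sketch only prints the client command; the key still has to be carried to the target system by whatever means you normally use.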


Instructor notes: Purpose — Discuss the relocation of WPAR. Details — Additional information — Transition statement — Let’s review what we have covered.


Steps for WPAR static relocation (WPAR Mgr GUI) IBM Power Systems

1. On the NFS server, create and export a file system for backup.
2. Create the WPAR on the source system (with local file systems).
3. Start and use the WPAR.
4. Quiesce and stop applications (if needed).
5. Invoke the static relocation task to a compatible target system.

[Figure: WPAR Mgr drives the relocation. On the source system, stopwpar and savewpar back up WPAR ServerA (local file systems) to /var/adm/WPAR on the NFS server (backup). On the target system, restwpar and startwpar bring up WPAR ServerA relocated (local file systems).]

Figure 10-37. Steps for WPAR static relocation (WPAR Mgr GUI)


Notes:

NFS for static relocation

A major requirement for WPAR Manager version 1.2 static relocation is the use of an NFS server to hold the backup images. Based upon prior WPAR backups, you should know how large the allocated file system on the NFS server will need to be. Both the source and target LPARs will need to have root read-write access to this NFS file system. You need to manually define the mount of this NFS file system in both the source and target systems’ global environments, using the expected mount point of /var/adm/WPAR (though that can be modified as a WPAR Manager application configuration setting).

WPAR definition

The WPAR does not have to be checkpoint-enabled, since you are not going to use live relocation. On the other hand, the WPAR must not use NFS for its private file systems. Static relocation expects to back up these file systems to NFS and then restore them from NFS.

WPAR state prior to and after static relocation

The WPAR Manager static relocation task will stop the WPAR on the source system, if it is currently active. In most situations, you will want to be sure you have safely quiesced your applications and then shut down the WPAR prior to invoking relocation. The WPAR Manager static relocation task will restart the WPAR on the target system after relocation only if it needed to stop the WPAR on the source system.

WPAR Manager GUI

As before, selecting the WPAR and clicking Relocate from the task menu will start the relocation task. You can, optionally, first click the Compatibility menu item in order to discover valid candidate targets, but the relocation task will automatically check the compatibility of the target you designate. After the compatibility check, the relocation task will:
- Collect the WPAR properties (to use in later redeployment)
- Stop the WPAR (if it is active)
- Execute savewpar on the source system
- Execute restwpar on the target system
- Start the WPAR (if this task stopped it)
- Remove the backup image
- Remove the WPAR definition from the source system
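The task list above can be pictured as a sequence of ordinary WPAR commands. The sketch below only prints such a sequence; it does not run anything, and it is an assumed manual equivalent rather than what WPAR Manager actually executes. The function name, the image file name, and the choice of an active WPAR (so that the stop and start steps apply) are all assumptions made for illustration.

```shell
#!/bin/sh
# static_relocation_plan WPAR BACKUP_DIR
# Prints, in order, the commands a manual static relocation might use.
# Assumes the WPAR is active, so it is stopped first and started after.
static_relocation_plan() {
    wpar=$1
    image=$2/$wpar.bff    # hypothetical backup image name
    echo "source: stopwpar $wpar"
    echo "source: savewpar -f $image $wpar"
    echo "target: restwpar -f $image"
    echo "target: startwpar $wpar"
    echo "source: rm -f $image"
    echo "source: rmwpar $wpar"
}

# Example using the WPAR name and NFS mount point from the notes above.
static_relocation_plan ServerA /var/adm/WPAR
```

Because the backup directory is the NFS mount shared by both global environments, the same image path is valid on the source and on the target.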


Instructor notes: Purpose — Discuss static relocation. Details — The use of savewpar and restwpar was covered in the prerequisite course. The only difference here is that we are restoring on a different system and the image files are stored on an NFS server, which allows the WPAR Manager GUI to easily execute the relocation as a single task. Additional information — Transition statement — While the enhanced live relocation technology has many advantages, the older non-enhanced technology still has its uses and is still supported.


Steps for checkpoint and restart relocation: CLI IBM Power Systems

1. On the NFS server, create and export the WPAR file systems.
2. Create the WPAR on the source system (with the checkpointable flag).
3. Start the WPAR.
4. Checkpoint the WPAR (store the state file on the NFS server).
5. Create the WPAR on the target system (with the checkpointable flag).
6. Restart the WPAR on the target system using the state file.

[Figure: WPAR wparB on the source system (lparX) is relocated to the target system (lparY). Both systems use the NFS server.]

Figure 10-38. Steps for checkpoint and restart relocation: CLI


Notes: Here is an overview of the steps needed to implement live relocation using the checkpoint and restart technologies.

1. Create the file system structure on the NFS server for the WPAR:
   • /
   • /tmp
   • /home
   • /var
   • optionally /usr and /opt (not recommended)
   Then export the file systems with root access to both global environments and to the WPAR. Figure 10-39 shows an example.
2. Create the WPAR as checkpointable (mkwpar -c).
3. Start the WPAR (startwpar).
4. Checkpoint the running WPAR using the chkptwpar command. Specify the state file name and the -k option in the command (which kills the running WPAR). That requires an empty directory in which the state file will be created (this directory must be accessible from both systems).
5. Create the WPAR on the target system using the mkwpar command (checkpointable).
6. Restart this WPAR using the state file previously created during the checkpoint operation. Verify that the application is still running after the relocation.


Instructor notes: Purpose — Provide an overview of the checkpoint and restart based live relocation procedure. Details — Additional information — Transition statement — Let’s look at the steps in detail.


Checkpoint and restart relocation: CLI (1 of 3) IBM Power Systems

• Create the file systems structure on the NFS server for the WPAR. (step 1)
  – /, /tmp, /home, /var, and any application file systems
  – Optionally: /usr, /opt
• Export the file systems with root access to both global environment and WPAR.

{nfsserver} /# exportfs
/export/wpars -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_home -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_tmp -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_var -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_cpr -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY

Figure 10-39. Checkpoint and restart relocation: CLI (1 of 3)


Notes: Both live relocation and enhanced live relocation require that the WPAR private file systems be served from an NFS server. Be sure that the file systems allocated on the NFS server are large enough. When configuring NFS to export these file systems, be sure to provide root read-write access to the WPAR and to the global environment on both the source and target systems. The only difference between the enhanced live relocation setup and the live relocation setup is that the live relocation setup requires an additional file system to hold the state files. This is shown in the visual as the /export/wpars_cpr line in the exportfs report.
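Every export in the visual carries the same access list, so the entries can be generated rather than typed. The following is a minimal sketch that prints entries in the same form as the exportfs report in the visual; it does not modify any NFS configuration. The function name export_line is hypothetical, and the host names are the ones used in the example.

```shell
#!/bin/sh
# export_line DIRECTORY HOST...
# Prints one export entry granting read-write and root access to every
# host listed (both global environments and the WPAR itself).
export_line() {
    dir=$1
    shift
    hosts=$(echo "$*" | tr ' ' ':')
    echo "$dir -sec=sys,rw,access=$hosts,root=$hosts"
}

# One entry per WPAR file system, plus wpars_cpr for the state files.
for fs in wpars wpars_home wpars_tmp wpars_var wpars_cpr; do
    export_line "/export/$fs" lparX wparB lparY
done
```

For enhanced live relocation the wpars_cpr entry would simply be left out of the loop, since no state file system is needed.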


Instructor notes: Purpose — Cover the NFS setup for live application mobility. Details — Additional information — Transition statement — Once the NFS server is setup, we next need to define and start the WPAR on the source system.


Checkpoint and restart relocation: CLI (2 of 3) IBM Power Systems

• Create the WPAR on source – checkpointable. (step 2)

{lparX} /: mkwpar -c -r -o /tmp/mkwparB.log -R active=yes \
    -M directory=/ vfs=nfs host=nfsserver dev=/export/wpars \
    -M directory=/home vfs=nfs host=nfsserver \
       dev=/export/wpars_home \
    -M directory=/tmp vfs=nfs host=nfsserver \
       dev=/export/wpars_tmp \
    -M directory=/var vfs=nfs host=nfsserver \
       dev=/export/wpars_var \
    -M directory=/cpr vfs=nfs host=nfsserver \
       dev=/export/wpars_cpr \
    -n wparB -o wparB.spec

• Start the WPAR. (step 3)

{lparX} /: startwpar wparB

Figure 10-40. Checkpoint and restart relocation: CLI (2 of 3)


Notes: One important step in a WPAR migration is to create a checkpoint of the system WPAR. That requires an empty directory in which the statefile will be created. In our example, an empty directory named /export/wpars_cpr is created on the NFS shared file system. It must be mountable from both AIX systems and visible from inside and outside the WPAR. The -c option of the mkwpar command specifies that the WPAR being created will be checkpointable.
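Since the checkpoint needs an empty directory, it can help to verify that before starting. This is a minimal sketch of such a check using only standard shell tools; the function name dir_is_empty is hypothetical, and the demonstration uses a temporary directory rather than the real state file directory.

```shell
#!/bin/sh
# dir_is_empty DIR
# Succeeds when DIR exists, is a directory, and has no entries
# (hidden files count as entries).
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Demonstration on a scratch directory, standing in for the cpr directory.
demo=${TMPDIR:-/tmp}/cpr_demo.$$
mkdir "$demo"
dir_is_empty "$demo" && echo "ready for checkpoint: $demo"
touch "$demo/leftover"
dir_is_empty "$demo" || echo "not empty: $demo"
rm -rf "$demo"
```

In practice the check would be pointed at the state file directory as seen from the global environment, before running the checkpoint command.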


Instructor notes: Purpose — Cover the definition and starting of the WPAR. Details — Additional information — Transition statement — Next, let’s begin the relocation process.


Checkpoint and restart relocation: CLI (3 of 3) IBM Power Systems

• Checkpoint the running WPAR. (step 4)
  – Specify the statefile name and the -k option

{lparX} /: /opt/mcr/bin/chkptwpar -d \
    /wpars/wparB/cpr/wparB.statefile \
    -o /wpars/wparB/cpr/wparB.statefile.log -k wparB

• Create the WPAR on the target system – checkpointable. (step 5)

{lparY} /: mkwpar -f /cpr/wparB.spec

• Restart the WPAR on target system using the statefile. (step 6)

{lparY} /: /opt/mcr/bin/restartwpar -d /wpars/wparB/cpr/wparB.statefile \
    -o /wpars/wparB/cpr/wparB.statefile.log wparB

Figure 10-41. Checkpoint and restart relocation: CLI (3 of 3)


Notes: WPAR live relocation uses the following approach:
• Freezing running applications and other services within a WPAR
• Performing a checkpoint which saves all execution state to a checkpoint file
• Restoring the execution state on a different but compatible system or LPAR
• Restarting the applications and other services from the restored execution state

This is primarily done by saving the runtime state of the WPAR and its processes and then reconstructing the state using the configuration and the saved runtime state. The restarted application resumes at the point where the checkpoint was done and the state of its objects like memory, file objects, network connections and IPC objects is restored without loss of any data.

Checkpoint and restart

Live relocation depends upon the process of saving (through a checkpoint operation) an application or system service's complete execution state and then restarting that application or system service at a later time or on a different machine utilizing the previously saved state. The application should continue from the previously saved state as if no checkpoint and restart operation happened. Applications running inside a WPAR are checkpointed by sending them a special signal (outside the process signal range) which loads the system checkpoint handler and does the checkpoint. During a checkpoint, applications and the network are frozen while the state is being saved. After a checkpoint, an application may resume or it can be restarted on another system.

Creating the WPAR on the target

The restart requires an already defined WPAR that matches the definition of the WPAR being relocated. The easiest way to ensure that the target uses the correct definition is to use a specification file generated from the source system WPAR definition. A specification file can be used as an option of the mkwpar command (mkwpar -f wpar.specfile). Since the source WPAR was defined as checkpointable, the spec file will create the target WPAR as checkpointable.


Instructor notes: Purpose — Describe the checkpoint and restart steps. Details — Additional information — Transition statement — Let’s review what we have covered with some checkpoint questions.


Checkpoint (1 of 2) IBM Power Systems

1. What are the three forms of file system access within a WPAR?
2. True or False: For live application mobility, the WPAR must be checkpoint enabled.

Figure 10-42. Checkpoint (1 of 2)


Notes: Write down your answers here:


Instructor notes: Purpose — Details —

Checkpoint solutions (1 of 2) IBM Power Systems

1. What are the three forms of file system access within a WPAR?
   - Shared-system: /usr and /opt are shared read-only from the global environment through namefs mounts.
   - NFS hosted: /usr and /opt file systems are NFS mounted from a host system.
   - Non shared: /var, /home, /tmp, and / are separate local file systems (jfs/jfs2) within the WPAR.
2. True or False: For live application mobility, the WPAR must be checkpoint enabled.

Additional information — Transition statement —


Checkpoint (2 of 2) IBM Power Systems

3. True or False: WPAR Manager is part of AIX 6.
4. What are the two types of WPAR relocation supported by the WPAR Manager version 1.2 GUI?
5. True or False: WPAR Manager is able to manage WPARs in LPARs for several servers over the same network.

Figure 10-43. Checkpoint (2 of 2)


Notes: Write down your answers here:


Instructor notes: Purpose — Details —

Checkpoint solutions (2 of 2) IBM Power Systems

3. True or False: WPAR Manager is part of AIX 6.
   WPAR Manager is a separate product that is part of the IBM Systems Director family.
4. What are the two types of WPAR relocation supported by the WPAR Manager version 1.2 GUI?
   Enhanced live relocation and static relocation
5. True or False: WPAR Manager is able to manage WPARs in LPARs for several servers over the same network.
   WPAR Manager provides centralized management of WPARs for all client servers on the same network.

Additional information — Transition statement —


Unit summary IBM Power Systems

Having completed this unit, you should be able to:
• Describe WPAR Manager concepts
• Install WPAR Manager and Agent Manager on server LPAR
• Install WPAR Agent on client LPAR
• Create, start, and manage a WPAR
• Relocate a WPAR from source client LPAR to destination LPAR client

Figure 10-44. Unit summary


Notes:


Unit 11. The AIX system dump facility

What this unit is about

This unit explains how to maintain the AIX system dump facility and how to obtain a system dump.

What you should be able to do

After completing this unit, you should be able to:
• Explain what is meant by a system dump
• Determine and change the primary and secondary dump devices
• Create a system dump
• Execute the snap command
• Use the kdb command to check a system dump

How you will check your progress

Accountability:
• Checkpoint questions
• Lab exercise

References

Online    AIX Version 6.1 Command Reference volumes 1-6
Online    AIX Version 6.1 Kernel Extensions and Device Support Programming Concepts (Chapter 16. Debug Facilities)
Online    AIX Version 6.1 Operating system and device management (section on System Startup)

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems


Unit objectives IBM Power Systems

After completing this unit, you should be able to:
• Explain what is meant by a system dump
• Determine and change the primary and secondary dump devices
• Create a system dump
• Execute the snap command
• Use the kdb command to check a system dump

Figure 11-1. Unit objectives


Notes:

Importance of this unit

If an AIX kernel (the major component of your operating system) crashes, routines used to create a system dump are invoked. This dump can be used to analyze the cause of the system crash. As an administrator, you have to know what a dump is, how the AIX dump facility is maintained, and how a dump can be obtained. You also need to know how to use the snap command to package the dump before sending it to IBM.


Instructor notes: Purpose — Present the objectives for this unit. Details — Use the information in the student notes to emphasize the importance of this unit. Also, be sure to set expectations regarding this unit: The purpose of this unit is to show the students how to maintain/configure the system dump facility and obtain a system dump. We are not trying to teach the students anything about analyzing a dump in this unit. (Students interested in learning how to analyze system dumps should attend QT080 - AIX 5L System Dump Analysis. Note that completion of either QT070 - AIX 5L Kernel Internals: Concepts or Q1335 - AIX Kernel Internals is a prerequisite for taking QT080.) Additional information — None Transition statement — So, what is a system dump?


System dumps IBM Power Systems

• What is a system dump?
• What is a system dump used for?

Figure 11-2. System dumps


Notes:

What is a system dump?

A system dump is a snapshot of the operating system state at the time of a crash or a manually-initiated dump. When a manually-initiated or unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary (or secondary) dump device. These areas include kernel memory, as well as other areas registered in a structure called the Master Dump Table by kernel modules or kernel extensions.

What is a system dump used for?

The system dump facility provides a mechanism to capture sufficient information about the AIX kernel for later expert analysis. Once the preserved image is written to disk, the system will be booted and returned to production. The dump is then typically submitted to IBM for analysis.


Instructor notes: Purpose — Present an overview of system dumps. Details — We will talk more about the primary and secondary dump devices (mentioned in the student notes on the current page) later. Additional information — Transition statement — Let’s look at the different types of dumps which are available in AIX.


Types of dumps IBM Power Systems

• Traditional:
  – AIX generates dump prior to halt
• Firmware assisted (fw-assist):
  – POWER6 firmware generates dump in parallel with AIX V6 halt process
  – Defaults to same scope of memory as traditional
  – Can request a full system dump
• Live Dump Facility:
  – Selective dump of registered components without need for a system restart
  – Can be initiated by software or by operator
  – Controlled by livedumpstart and dumpctrl
  – Written to a file system rather than a dump device

Figure 11-3. Types of dumps


Notes:

Overview

In addition to the traditional dump function, AIX 6.1 introduces two new types of dumps.

Traditional dumps

Traditionally, AIX alone handled system dump generation and the only way to get a dump was to halt the system either due to a crash or through operator request. In a logical partition it will only dump the memory that is allocated to that partition.

Firmware assisted dumps (fw-assist)

With AIX 6.1 and POWER6 hardware, you can configure the dump facility to have the firmware of the hardware platform handle the dump generation. The main advantage to this is that the operating system can start its reboot while the firmware handles the dumping of the memory contents.

In its default mode, it will capture the same scope of memory as the traditional dump, but it can be configured for a full memory dump. If, for some reason (such as memory restrictions), a configured or requested firmware assisted dump is not possible, then the traditional dump facility will be invoked. More details on the configuration and initiation of firmware assisted dumps will be covered later in the context of the sysdumpdev and sysdumpstart commands.

Live dump facility

AIX 6.1 also introduces a new live dump capability. If a system component is designed to use this facility, a restricted scope dump of the related memory can be captured without the need to halt the system. If an individual component is having problems (such as being hung), a livedumpstart command may be run to dump the needed diagnostic information. The management of live dumps (such as enabling a component or controlling the dump directory) is handled with the dumpctrl command. The use and management of live dumps require a knowledge of system components which is beyond the scope of this class. Only use these commands under the direction of AIX Support line personnel.


Instructor notes: Purpose — Explain the different types of dumps Details — This is only an overview of the dump types. Do not go into much detail here. There are two main reasons for introducing these dump types. First, they will likely hear them referred to as being in AIX 6.1 and this will help clarify what these are about. Second, they will see references to the firmware assisted dumps when we look at the smit panels and line commands for dump management, later in the unit. Additional information — Transition statement — Let’s look at ways a system dump might be created.


How a system dump is invoked IBM Power Systems

A system dump copies kernel data structures to a dump device. It can be invoked:
• At unexpected system halt
• Through command
• Through SMIT
• Through keyboard or reset button
• Through remote reboot facility
• Through HMC reset/dump

Figure 11-4. How a system dump is invoked


Notes:

Creating a system dump

A system dump might be created in one of several ways:
- An AIX system will generate a system dump automatically when a sufficiently severe system error is encountered.
- A set of special keys on the Low Function Terminal (LFT) graphics console keyboard can invoke a system dump when your machine's mode switch is set to the Service position or the Always Allow System Dump option is set to true.
- On systems running versions of AIX 5L prior to AIX 5L V5.3, a dump can also be invoked when the Reset button is pressed when your machine's mode switch is set to the Service position or the Always Allow System Dump option is set to true. In AIX 5L V5.3 and AIX 6.1, the system will always dump when the Reset button is pressed, providing the dump device is non-removable.
- For logical partitions running AIX, the HMC can issue a restart with dump request, which is the functional equivalent of the previously described reset button triggered dump.
- The superuser can issue a command directly, or through SMIT, to invoke a system dump.
- The remote reboot facility can also be used to create a system dump.

Analysis of system dump

Usually, for persistent problems, the raw dump data is placed on portable media, such as tape, and sent to AIX Support for analysis. The raw dump data can be formatted into readable output through the kdb command.

The sysdumpdev command

The default system dump configuration of the system can be altered with the sysdumpdev command. For example, using this command, you can configure system dumps to occur regardless of key mode switch position, which is handy for PCI-bus systems, as they do not have a key mode switch.

System dumps in an LPAR environment

In an LPAR environment, a dump can be initiated from the Hardware Management Console (HMC). We will discuss this point in more detail later in this unit.


Instructor notes: Purpose — Explain ways to obtain a system dump. Details — Your system generates a system dump when a severe error occurs. System dumps can also be user-initiated by users with root authority. A system dump creates a picture of your system's memory contents. System administrators and programmers can generate a dump and analyze its contents when debugging new applications. When the system halts with a flashing 888 followed by a 102, then you know that a system dump has occurred. When this happens, you must turn your system off and back on in order to get your system back again. You might want to recap the different uses of the Reset button on the classical RS/6000 that we have considered so far in this course: 1. Reboot the system (in normal or service mode). 2. Formulate the Service Request Number (SRN) and the location code (in normal mode). 3. Cause a dump (in service mode). Keep in mind that there is no system key on PCI-based systems, so (prior to AIX 5L V5.3) you had to use the -K option of the sysdumpdev command (or use SMIT) to set Always Allow System Dump to true in order to enable use of the Reset button to force a dump on such systems. We will discuss how a dump can be initiated using the other methods listed later in this unit. Additional information — In an LPAR environment, a dump can be initiated from the Hardware Management Console (HMC) by choosing Dump from the Restart Options when using the Restart Partition menu selection in the Server Management application. The Dump option is the equivalent of pressing the Reset button on an eServer non-LPAR system. The partition will initiate a system dump to the primary dump device if configured to do that. Otherwise, the partition will simply reboot. Transition statement — Let’s see what happens when a dump occurs. It may be the result of a system crash.

Unit 11. The AIX system dump facility

LED 888 code

[Visual: flowchart of the code sequence that follows an 888 code]
- Software: reset displays 102, then reset for the crash code, then reset for the dump code.
- Hardware: reset displays 103, then reset twice for the SRN (yyy-zzz), reset once for the FRU, and reset eight times for the location code (optional codes for hardware failure).

Figure 11-5. LED 888 code

Notes:

What is the 888 code?
One type of error you may encounter is an LED 888 code. When displayed on a physical operator panel display, the 888 will often be flashing on and off, so you will hear this referred to as a flashing 888, even though an HMC does not flash the number. An 888 code indicates that you have lost your system and that additional information is available as a series of display values. Either a hardware or software problem has been detected and a diagnostic message is ready to be read. A series of resets will walk through the sequence of code values. Record, in sequence, every code displayed after the 888. On systems with no HMC and a three-digit or a four-digit operator panel, you may need to press the system's reset button to view the additional digits after the 888. When working with an HMC, you will need to do the virtual equivalent by requesting a reset operation. Stop recording when the 888 digits reappear.


102 code
A 102 code indicates that a system dump has occurred because the AIX kernel crashed. You may need to press the reset button to obtain the crash code and then the dump code. System dumps are covered in more detail in the rest of this unit.

103 code
A 103 usually indicates a hardware error. In an HMC managed LPAR environment, hardware errors are reported through the service focal point of the HMC; thus, you should not expect to see an 888-103 sequence in an LPAR reference code field on the HMC. Working with the HMC facilities is covered in the LPAR training (either AU730 or AN301). If you do have an 888-103 sequence, pressing the reset button twice will get a Service Request Number, which may be used by IBM support to analyze the problem. In case of a hardware failure, additional resets would retrieve the sequence number of the Field Replaceable Unit (FRU) and a location code. The location code identifies the physical location of a device.
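The two follow-up codes described above can be summarized as a small lookup. This is only an illustrative sketch: explain_888_followup is a hypothetical helper name, not an AIX command, and it covers only the two codes discussed in these notes.

```shell
#!/bin/sh
# explain_888_followup CODE: describe the first code displayed after a
# flashing 888, following the notes above. Hypothetical helper name.
explain_888_followup() {
    case "$1" in
        102) echo "software: system dump occurred; reset for crash code, then dump code" ;;
        103) echo "hardware: reset twice for SRN, then for FRU and location code" ;;
        *)   echo "not covered in these notes: $1"; return 1 ;;
    esac
}

explain_888_followup 102
explain_888_followup 103
```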


Instructor notes:
Purpose: Introduce what an 888 display code means.
Details: Describe what students have to do when an 888 display occurs. Emphasize that, in an HMC managed LPAR environment, they should only see the 888-102 sequence. The focus here is on crashes which result in dumps (the left side of the diagram).
Additional information:
Transition statement: Whether the dump comes from an unintended system crash or an administrator request, where is it stored and how do we access it?


When a dump occurs

[Visual: when the AIX kernel crashes, the dump is written to the primary dump device (/dev/hd6). At the next boot, the dump is copied into the copy directory as /var/adm/ras/vmcore.0.]

Figure 11-6. When a dump occurs

Notes:

Primary dump device
If an AIX kernel crash (system-initiated or user-initiated) occurs, kernel data is written to the primary dump device, which is, by default, /dev/hd6, the primary paging device. Note that, after a kernel crash, AIX may need to be rebooted. If the autorestart system attribute is set to TRUE, the system will automatically reboot after a crash.

The copy directory
During the next boot, the dump is copied (remember: rc.boot 2) into a dump directory; the default is /var/adm/ras. The dump file name is vmcore.x, where x indicates the number of the dump (for example, 0 indicates the first dump).


Instructor notes:
Purpose: Describe what happens if a dump occurs.
Details: Base your presentation on the material in the student notes.
Additional information: None
Transition statement: Let's find out where all this information is written and how you can customize this.


The sysdumpdev command

List dump values:

    # sysdumpdev -l
    primary              /dev/hd6
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    FALSE
    dump compression     ON
    type of dump         traditional

Deactivate primary dump device (temporary):

    # sysdumpdev -p /dev/sysdumpnull

Change secondary dump device (permanent):

    # sysdumpdev -P -s /dev/rmt0

Display information about last dump:

    # sysdumpdev -L
    Device name:          /dev/hd6
    Major device number:  10
    Minor device number:  2
    Size:                 9507840 bytes
    Date/Time:            Tue Oct 5 20:41:56 PDT 2007
    Dump status:          0

Figure 11-7. The sysdumpdev command

Notes:

Primary and secondary dump devices
There are two system dump devices:
- Primary - Usually used when you wish to save the dump data
- Secondary - Can be used to discard dump data (using /dev/sysdumpnull)
Use the sysdumpdev command or SMIT to query or change the primary and secondary dump devices. Make sure you know what the primary and secondary dump devices on your system are set to. Your dump device can be a portable medium, such as a tape drive. AIX 5L and AIX 6.1 use /dev/hd6 (paging) as the default primary dump device.
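When checking these settings from a script, it can help to pick a single value out of the sysdumpdev -l listing. The following is a minimal sketch, not an IBM-supplied tool: dump_setting is a hypothetical helper name, and the sketch assumes the two-column label and value layout shown on the visual. The sample listing is embedded so the sketch can be tried on any system.

```shell
#!/bin/sh
# dump_setting LABEL: read "sysdumpdev -l" style output on stdin and print
# the value for LABEL. Hypothetical helper; assumes the value is the last
# whitespace-separated field on the line that starts with LABEL.
dump_setting() {
    awk -v label="$1" '
        # The line must begin with the label, followed by a blank or tab.
        index($0, label) == 1 && substr($0, length(label) + 1, 1) ~ /[ \t]/ {
            print $NF
            exit
        }'
}

# Sample listing copied from the visual.
sample_listing='primary              /dev/hd6
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional'

printf '%s\n' "$sample_listing" | dump_setting "primary"
printf '%s\n' "$sample_listing" | dump_setting "copy directory"

# On an AIX system, the same function could read live output instead:
#   sysdumpdev -l | dump_setting "primary"
```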


Flags for sysdumpdev command
Flags for the sysdumpdev command include the following:

-l    Lists current values of dump-related settings

-e    Estimates the size of a dump

-p    Specifies primary dump device

-C    Turns on compression (default in AIX 5L V5.3 and not an option in AIX 6.1, where dumps are always compressed)

-c    Turns off compression (not an option in AIX 6.1)

-s    Specifies secondary dump device

-P    Makes change of primary or secondary dump device permanent

-d directory
      Specifies the directory the dump is copied to at system boot. If the copy fails at boot time, the -d flag indicates that the system dump should be ignored (force copy flag = FALSE).

-D directory
      Specifies the directory the dump is copied to at system boot. If the copy fails at boot time, using the -D flag allows you to copy the dump to external media (force copy flag = TRUE).

-K    Allows the reset button or the dump key sequences to force a dump when the key mode switch is in the normal position, or on a machine without a key mode switch. Note: On a machine without a key mode switch, a dump cannot be forced with the key sequence unless this value is set. This is also true of the reset button prior to AIX 5.3.

-f { disallow | allow | require }
      Specifies whether the firmware-assisted full memory system dump is allowed, required, or not allowed. The -f flag has the following variables:
      - The disallow variable specifies that the full memory system dump mode is not allowed (it is the selective memory mode).
      - The allow variable specifies that the full memory system dump mode is allowed but is performed only when the operating system cannot properly handle the dump request.
      - The require variable specifies that the full memory system dump mode is allowed and is always performed.

-t { traditional | fw-assisted }
      Specifies the type of dump to perform. The -t flag has the following variables:
      - The traditional variable specifies performing a traditional system dump. In this dump type, the dump data is saved before system reboot.
      - The fw-assisted variable specifies performing a firmware-assisted system dump. In this dump type, the dump data is saved in parallel with the system reboot. You can use the firmware-assisted system dump only on PHYP platforms with various restrictions on memory size. When the fw-assisted system dump type is not allowed at configuration time, or is not enforced at dump request time, a traditional system dump is performed. In addition, because the scratch area is only reserved at initialization, a configuration change from traditional system dump to firmware-assisted system dump is not effective before the system is rebooted.

-z    Writes to standard output the string containing the size of the dump in bytes and the name of the dump device, if a new dump is present.

Dump status values
Status values, as reported by sysdumpdev -L, correspond to dump LED codes (listed in full later) as follows:

    Status   LED code   Meaning
    0        0c0        dump completed
    -1       0c8        no primary dump device
    -2       0c4        partial dump
    -3       0c5        dump failed to start

Note: If the value of Dump status is -3, Size usually shows as 0, even if some data was written.
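The status-to-LED correspondence listed above can be expressed as a small lookup, for example when post-processing sysdumpdev -L output in a script. This is a sketch only: status_to_led is a hypothetical helper name, and it covers just the four values listed in this unit.

```shell
#!/bin/sh
# status_to_led STATUS: translate a "Dump status" value reported by
# sysdumpdev -L into the LED code and meaning listed in the notes above.
# Hypothetical helper; any other value is reported as unknown.
status_to_led() {
    case "$1" in
        0)  echo "0c0 dump completed" ;;
        -1) echo "0c8 no primary dump device" ;;
        -2) echo "0c4 partial dump" ;;
        -3) echo "0c5 dump failed to start" ;;
        *)  echo "unknown status: $1"; return 1 ;;
    esac
}

status_to_led 0
status_to_led -2
```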

Examples on visual
The examples on the visual illustrate use of several of the sysdumpdev flags discussed in the preceding material.

Dump information in the error log
System dumps are usually recorded in the error log with the DUMP_STATS label. Here, the Detail Data section will contain the information that is normally given by the sysdumpdev -L command: the major device number, minor device number, size of the dump in bytes, time at which the dump occurred, dump type (that is, primary or secondary), and the dump status code.


DVD support for system dumps (AIX 5L V5.3 and later)
AIX 5L V5.3 added the ability to send the system dump to DVD media. The DVD device could be used as a primary or secondary dump device. In order to get this functionality, the target DVD device should be DVD-RAM or writable DVD. Remember to insert an empty writable DVD in the drive when using the sysdumpdev command, or when you require the dump to be copied to the DVD at boot time after a crash. If the DVD media is not present, the commands will give error messages or will not recognize the device as suitable for system dump copy.

Display of extra dump information on TTY (AIX 5L V5.3 and later)
During the creation of the system dump, AIX 5L V5.3 or later displays additional information on the console TTY about the progress of the system dump, as illustrated in the following sample output:

    # sysdumpstart -p
    Preparing for AIX System Dump . . .
    Dump Started .. Please wait for completion message
    AIX Dump .. 23330816 bytes written - time elapsed is 47 secs
    Dump Complete .. type=4, status=0x0, dump size:23356416 bytes
    Rebooting . . .

At this time, the kernel debugger and the 32-bit kernel need to be enabled to see this function, and the functionality has been checked only on the S1 port. However, this limitation may change in the future.


Verbose flag for sysdumpdev (AIX 5L V5.3 and later)
Following a system crash, a system dump may itself crash or fail without a single byte of data being written to the dump device, for example, because of a power off or disk errors. For cases where a failed dump does not include the dump minimal table, it is very useful to save some traceback information in the NVRAM. Starting with AIX 5L V5.3, the dump procedure is enhanced to use the NVRAM to store minimal dump information. If the dump fails, you can use the sysdumpdev -vL command (-v is the new verbose flag) to check the reason for the failure.


Instructor notes:
Purpose: Discuss the sysdumpdev command and its various options.
Details: When you install the operating system, the dump device is automatically configured for you. By default the primary device is /dev/hd6, which is a paging logical volume, and the secondary device is /dev/sysdumpnull. If a dump occurs to paging, the system will automatically copy the dump when the system is rebooted. By default, the dump gets copied to the /var/adm/ras directory. We will look at this in detail later in this unit.
The recommended size for the dump device is at least a quarter of the size of real memory. In problem situations where the current dump device does not meet this recommendation, it is advisable to create a temporary dump logical volume of the size required and manually recreate the environment in which a previous dump occurred. If the dump device is not large enough, the system will produce a partial dump only. It is possible, but extremely unlikely, that a support center can determine the cause of the crash from a partial dump. The -e flag can be used as a starting point to determine how big the dump device should be.
Discussion items: What is the advantage of having two dump areas? Answer: The second area serves as backup media.
Additional information: Historic note: For systems that were migrated from AIX V3.2 to AIX V4 or later, the primary dump device is set to what it was formerly, /dev/hd7.
Transition statement: For systems with more than 4 GB of memory, a dedicated dump device is created at installation time.


Dedicated dump device (1 of 2)

Servers with real memory > 4 GB will have a dedicated dump device created at installation time.

    System memory size                    Dump device size
    4 GB to, but not including, 12 GB     1 GB
    12 GB to, but not including, 24 GB    2 GB
    24 GB to, but not including, 48 GB    3 GB
    48 GB and up                          4 GB

Figure 11-8. Dedicated dump device (1 of 2)

Notes:

Creation of dedicated dump device
Servers with more than 4 GB of real memory will have a dedicated dump device created at installation time. This dedicated dump device is automatically created; no user intervention is required. As indicated on the visual, the size of the dump device that will be created depends on the system memory size.

Default name of dedicated dump device
The default name of the dump device logical volume is lg_dumplv.
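The size table on the visual can be written as a simple range check. The sketch below is illustrative only: default_dumplv_size_gb is a hypothetical helper name, it takes memory in whole gigabytes, and it follows the table boundaries as printed (4 GB is treated as the start of the first range).

```shell
#!/bin/sh
# default_dumplv_size_gb MEM_GB: print the dedicated dump device size (GB)
# that the table on the visual gives for MEM_GB of real memory. Prints
# nothing and returns 1 below 4 GB, where the table lists no dedicated
# dump device. Hypothetical helper, not an AIX command.
default_dumplv_size_gb() {
    mem_gb=$1
    if   [ "$mem_gb" -lt 4 ];  then return 1
    elif [ "$mem_gb" -lt 12 ]; then echo 1
    elif [ "$mem_gb" -lt 24 ]; then echo 2
    elif [ "$mem_gb" -lt 48 ]; then echo 3
    else                            echo 4
    fi
}

for mem in 8 16 32 64; do
    echo "$mem GB memory -> $(default_dumplv_size_gb "$mem") GB dump device"
done
```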


Instructor notes:
Purpose: Explain that a dedicated dump device is created for systems with more than 4 GB of main memory.
Details: Point out that the size of the dedicated dump device depends on the amount of physical memory on this system and mention the default name of the dedicated dump device.
Additional information:
Transition statement: You can specify the name and size of the dedicated dump device instead of using the defaults we have just discussed.


Dedicated dump device (2 of 2)

/bosinst.data

    ...
    control_flow:
        CONSOLE = /dev/vty0
        ...
    large_dumplv:
        DUMPDEVICE = /dev/lg_dumplv
        SIZEGB = 1

Figure 11-9. Dedicated dump device (2 of 2)

Notes:

The bosinst.data file
The bosinst.data file contains stanzas which direct the actions of the Base Operating System (BOS) install program. After an initial installation, you can change many aspects of the default behavior of the BOS install program by editing the bosinst.data file and using it (for example, on a supplementary diskette) with your installation media.

The large_dumplv stanza
The optional large_dumplv stanza in bosinst.data can be used to specify characteristics to be used if a dedicated dump device is created. A dedicated dump device is only created for systems with 4 GB or more of memory. The following characteristics can be specified in the large_dumplv stanza:
- DUMPDEVICE: Specifies the name of the dedicated dump device
- SIZEGB: Specifies the size of the dedicated dump device in gigabytes
If the stanza is not present, the dedicated dump device is created when required, using the default values previously discussed.


Instructor notes:
Purpose: Describe how the bosinst.data file can be used to specify the name and size of a dedicated dump device.
Details:
Additional information:
Transition statement: It is important to determine the estimated size of a system dump for your machine.


Estimating dump size

Estimate dump size:

    # sysdumpdev -e
    0453-041 estimated dump size in bytes: 52428800

Turn on dump compression (in AIX 6.1, dumps are always compressed):

    # sysdumpdev -C
    # sysdumpdev -e
    0453-041 estimated dump size in bytes: 10485760

Use this information to size the /var file system.

Figure 11-10. Estimating dump size

Notes:

Sizing the /var file system
You should size the /var file system so that there is enough free space to hold the dump information should your machine ever crash.

Estimating the space needed to hold a system dump
The sysdumpdev -e command will provide an estimate of the amount of disk space needed for system dump information. The sizes of the dump device and of the copy directory you will require are directly related to the amount of RAM on your machine. The more RAM on the machine, the more disk space will be needed. Machines with 16 GB of RAM may need 2 GB of dump space.


Dump compression
In AIX V4.3.2, an option was added to compress the dump data before it is written. Dump compression is on by default in AIX 5L V5.3. To turn on dump compression, enter sysdumpdev -C. This will significantly reduce the amount of space needed for dump information. To turn off compression, enter sysdumpdev -c. Starting with AIX 6.1, dumps are always compressed; thus the -C and -c flags to control compression are no longer valid options of the sysdumpdev command.
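To compare estimates with and without compression in a script, the byte count can be extracted from the sysdumpdev -e message and converted to megabytes. The following is a sketch under stated assumptions: estimate_bytes and bytes_to_mb are hypothetical helper names, and the message format is assumed to match the 0453-041 lines shown on the visual, which are used here as sample input.

```shell
#!/bin/sh
# estimate_bytes: read "sysdumpdev -e" output on stdin and print the byte
# count, assuming the "estimated dump size in bytes: N" message format.
estimate_bytes() {
    sed -n 's/^.*estimated dump size in bytes: *\([0-9][0-9]*\).*$/\1/p'
}

# bytes_to_mb BYTES: convert to whole megabytes, rounding up.
bytes_to_mb() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Sample messages from the visual (before and after turning on compression).
uncompressed=$(echo "0453-041 estimated dump size in bytes: 52428800" | estimate_bytes)
compressed=$(echo "0453-041 estimated dump size in bytes: 10485760" | estimate_bytes)

echo "uncompressed: $(bytes_to_mb "$uncompressed") MB"
echo "compressed: $(bytes_to_mb "$compressed") MB"

# On an AIX system, the live estimate could be read the same way:
#   sysdumpdev -e | estimate_bytes
```

With the sample values from the visual, the estimate drops from 50 MB to 10 MB once compression is on.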


Instructor notes:
Purpose: Show how to estimate the disk space needed for a system dump.
Details: The command sysdumpdev -e will estimate the dump size. It is just an estimate. To be safe, the disk space should be larger than the estimate. Also, if the system has dumped in the past, the size of the past dump can provide more guidance on sizing the dump device. This can be seen using the command sysdumpdev -L (mentioned earlier in the unit). In AIX V4.3.2, the ability to compress the dump was introduced. Turning on dump compression will reduce the space needed significantly. Dump compression is on by default in AIX 5L V5.3. Dumps are always compressed in AIX 6.1 and later.
You should mention a few other points about dump devices:
• If a paging device (like hd6) is used for dumps, it must be part of rootvg.
• The primary dump device must always be in the rootvg.
• The secondary dump device may be outside rootvg as long as it is not a paging device.
• Prior to 4.3.3, dump devices should not be mirrored. The dump information was written to only one mirror and the mirror was not marked stale. On reboot, the dump file was built by reading from both copies of the mirror, even though only one copy held the correct information. This created a corrupted dump file. In 4.3.3, this was corrected by allowing the dump file to be read only from the good copy.
• AIX at V5.3 and later allows a DVD device to be used as a primary or secondary dump device.
Additional information:
Transition statement: Let's look at a new feature in AIX 5L that checks dump space sizes.


dumpcheck utility

• The dumpcheck utility will do the following when enabled:
  - Estimate the dump or compressed dump size using sysdumpdev -e.
  - Find the dump logical volumes and copy directory using sysdumpdev -l.
  - Estimate the primary and secondary dump device sizes.
  - Estimate the copy directory free space.
  - Report any problems in the error log file.

Figure 11-11. dumpcheck utility

Notes:

Function of the dumpcheck utility
AIX 5L V5.1 introduced the /usr/lib/ras/dumpcheck utility. This utility is used to check the disk resources used by the system dump facility. The command logs an error if either the largest dump device is too small to receive the dump, or there is insufficient space in the copy directory when the dump device is a paging space.
- If the dump device is a paging space, dumpcheck will verify that the free space in the copy directory is large enough to copy the dump.
- If the dump device is a logical volume, dumpcheck will verify that it is large enough to contain a dump.
- If the dump device is a tape, dumpcheck will exit without a message.
Any time a problem is found, dumpcheck will (by default) log an entry in the error log. If the -p flag is present, dumpcheck will display a message to stdout.


Example of dumpcheck use
The following example illustrates use of the dumpcheck utility and shows sample output from this command:

    # /usr/lib/ras/dumpcheck -p
    There is not enough free space in the file system containing the copy directory to accommodate the dump.
    File system name                     /var/adm/ras
    Current free space in kb             117824
    Current estimated dump size in kb    161996

Note that, since the -p flag was used in this example, the output from dumpcheck was written to stdout.

Enabling and disabling dumpcheck
In order to be effective, the dumpcheck utility must be enabled. Verify that dumpcheck has been enabled by using the following command:

    # crontab -l | grep dumpcheck
    0 15 * * * /usr/lib/ras/dumpcheck >/dev/null 2>&1

By default, it is set to run at 3 p.m. each afternoon. Enable the dumpcheck utility by using the -t flag. This will create an entry in the root crontab if none exists. In this example, the dumpcheck utility is set to run at 2 p.m.:

    # /usr/lib/ras/dumpcheck -t "0 14 * * *"

For the best results, set dumpcheck to run when the system is heavily loaded. This will identify the maximum size the dump will take. As previously mentioned, the time is set for 3 p.m. by default. If you use the -p flag in the crontab entry, root will be sent a mail message with the standard output of the dumpcheck command.
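The comparison behind the sample dumpcheck message can be illustrated with the two numbers it reports. This is only a sketch of the arithmetic, not how dumpcheck is implemented: copy_dir_has_room is a hypothetical helper name, and the values are the ones from the sample output.

```shell
#!/bin/sh
# copy_dir_has_room FREE_KB DUMP_KB: succeed when the free space (KB) in the
# file system holding the copy directory is at least the estimated dump
# size (KB). Illustrative helper, not part of dumpcheck itself.
copy_dir_has_room() {
    [ "$1" -ge "$2" ]
}

# Values from the sample dumpcheck -p output.
free_kb=117824
dump_kb=161996

if copy_dir_has_room "$free_kb" "$dump_kb"; then
    echo "copy directory can hold the dump"
else
    echo "short by $(( dump_kb - free_kb )) KB"
fi
```

For the sample values, the file system would need to grow by at least 44172 KB before the check passes.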


Instructor notes:
Purpose: Discuss the dumpcheck command.
Details: Emphasize that (by default) any problems found by dumpcheck will be written to the error log. So, it is important to check the error log.
Additional information: If compression is turned off, dumpcheck does not work and gives the error message: 0453-062 Could not change the user Set attributes.
Transition statement: We have mentioned several ways in which a system dump can be initiated. Let's discuss this subject in more detail.


Methods of starting a dump

• Automatic invocation of dump routines by system
• Using the sysdumpstart command or SMIT
  Option: -p (send to primary dump device)
  Option: -s (send to secondary dump device)
  Option: -t (use traditional dump)
  Option: -f (select scope of dump)
• Using a special key sequence on the LFT
  (to primary dump device)
  (to secondary dump device)
• Using the Reset button
• Using the Hardware Management Console (HMC)
  - Restart LPAR with the Dump option
• Using the remote reboot facility

Figure 11-12. Methods of starting a dump

Notes:

Ways to obtain a system dump
A system dump may be automatically created by the system. In addition, there are several ways for a user to invoke a system dump. The most appropriate method to use depends on the condition of the system.

Automatic invocation of dump routines
If there is a kernel panic, the system will automatically dump the contents of real memory to the primary dump device.

Using the sysdumpstart command or SMIT
One method a superuser can use to invoke a dump is to run the sysdumpstart command or invoke it through SMIT (fastpath smit dump).


The sysdumpstart flags are used as follows:
- The -p flag specifies a dump to the primary dump device.
- The -s flag specifies a dump to the secondary dump device.
- The -t flag changes the default type from fw-assisted to traditional.
- The -f flag changes the scope of the dump (it interacts with the configuration set up with sysdumpdev):
  - disallow - Do not allow a full memory dump.
  - require - Require a full memory dump.

Using a special key sequence
If the system has halted, but the keyboard will still accept input, a dump to the primary dump device can be forced by pressing the Ctrl-Alt-NumPad1 key sequence on the LFT keyboard. The key combination Ctrl-Alt-NumPad2 on the LFT can be used to initiate a system dump to the secondary dump device. This method can only be used when your machine's mode switch (if your machine has such a switch) is set to the Service position or the Always Allow System Dump option is set to true. The Always Allow System Dump option can be set to true using SMIT or by using sysdumpdev -K.

Using the Reset button
On systems running versions of AIX 5L prior to AIX 5L V5.3, a dump can also be invoked when the Reset button is pressed and when your machine's mode switch is set to the Service position or the Always Allow System Dump option is set to true. In AIX 5L V5.3 and later, the system will always dump when the Reset button is pressed, provided the dump device is non-removable. This method can be used if the keyboard is no longer accepting input. Note that pressing the Reset button twice will cause the system to reboot.

Using the hardware management console
In an LPAR environment, a dump can be initiated from the Hardware Management Console (HMC) by choosing Dump from the Restart Options (accessed through the Restart Partition menu selection in the Server Management application). The Dump option is the equivalent of pressing the physical Reset button on a non-LPAR system. The partition will initiate a system dump to the primary dump device if configured to do that. Otherwise, the partition will simply reboot.

Using the remote reboot facility
The remote reboot facility can also be used to obtain a system dump. This capability will be further discussed shortly.


Obtaining a useful system dump
Bear in mind that if your system is still operational, a dump taken at this time will not assist in problem determination. A relevant dump is one taken at the time of the system halt.


Instructor notes:

Purpose: Describe the different methods that can be used to initiate a dump.

Details: Do not start a dump if the flashing 888 number shows on the LED. This number could indicate that a dump has already occurred on your system. You can determine this from the LED code that is displayed after the flashing 888. If it is 102, your system has already created a system dump and has written the information to the primary dump device. If you start your own dump before copying the information in your dump device, your new dump may overwrite the existing information.

You are allowed two system dumps with the names vmcore.0 and vmcore.1. If another system dump occurs, the names will be vmcore.1 and vmcore.2, with system dump vmcore.0 removed.

A user-initiated dump is different from a dump initiated by an unexpected system halt because the user can designate which dump device to use. When the system halts unexpectedly, a system dump is initiated automatically to the primary dump device.

Here is some additional information about some of the methods listed on the visual:

• Command line: This method uses the sysdumpstart command. Note, however, that this command is only available if you install the Software Service Aids (bos.sysmgt.serv_aid) package. You must have root authority to run this command. First, you might want to check the current settings of your system dump devices by using the sysdumpdev -l command. Then initiate the dump with sysdumpstart -p (for the primary device) or sysdumpstart -s (for the secondary device). Note that if the LED display is blank on an RS/6000 with an LED, the dump was not started. Try again using a different method. On an RS/6000 system without an LED, there is no way to tell if a dump has started, is in progress, or has finished.

• Using SMIT: The SMIT screen that allows you to do this is shown on a subsequent visual.

• Using a special key sequence: If you have an LFT, you can initiate a dump to either the primary or the secondary device by using one of the key sequences specified. The NUMPAD, which is referred to in the student notes, is the set of number keys on the right-hand side of the keyboard.

• Using the Reset button: This procedure works for all system configurations and will work in circumstances where other methods for starting a dump will not. The system writes the dump information to the primary dump device. On systems running versions of AIX prior to AIX 5L V5.3, ensure that always allow dump is set to TRUE. To set always allow dump to TRUE, execute the sysdumpdev -K command (or use SMIT).

Additional information:

Transition statement: Let’s discuss the remote reboot facility in a little more detail.
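The command-line method described in the instructor notes above can be sketched as a small helper. This is a minimal sketch, not part of the course material: start_dump_cmd is a hypothetical function name, and it only prints the sysdumpstart invocation instead of running it, because starting a dump is disruptive and requires root authority.

```shell
#!/bin/sh
# Hypothetical helper: maps a dump target to the sysdumpstart flag named
# in the notes (-p for the primary device, -s for the secondary device).
start_dump_cmd() {
    case $1 in
        primary)   echo "sysdumpstart -p" ;;
        secondary) echo "sysdumpstart -s" ;;
        *) echo "usage: start_dump_cmd primary|secondary" >&2; return 1 ;;
    esac
}

# Step 1: review the current dump device settings before starting a dump.
echo "sysdumpdev -l"
# Step 2: print the command that would start a dump to the primary device.
start_dump_cmd primary
```

On a real system the printed commands would be reviewed and then run by root; the helper itself never touches the dump facility.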


Start a dump from a TTY

[Visual: a terminal attached to serial port S1]

   login: #dump#>1

   Add a TTY
   ...
   REMOTE Reboot ENABLE:   dump
   REMOTE Reboot STRING:   #dump#
   ...

Figure 11-13. Start a dump from a TTY

Notes:

The remote reboot facility

The remote reboot facility allows the system to be rebooted through a native (integrated) serial port. The system is rebooted when the reboot_string is received at the port. This facility is useful when the system does not otherwise respond but is capable of servicing serial port interrupts. Remote reboot can be enabled on only one native serial port at a time. An important feature of the remote reboot facility is that it can be configured to obtain a system dump prior to rebooting.


Configuring the remote reboot facility

Two native serial port attributes control the operation of remote reboot:
- reboot_enable
- reboot_string

Use of these attributes is discussed in the following paragraphs.

reboot_enable

The value of this attribute (referred to as REMOTE Reboot ENABLE in SMIT) indicates whether this port is enabled to reboot the machine on receipt of the remote reboot_string, and if so, whether to take a system dump prior to rebooting:
- no: Indicates remote reboot is disabled
- reboot: Indicates remote reboot is enabled
- dump: Indicates remote reboot is enabled, and, prior to rebooting, a system dump will be taken on the primary dump device

reboot_string

This attribute (referred to as REMOTE Reboot STRING in SMIT) specifies the remote reboot_string that the serial port will scan for when the remote reboot feature is enabled. When the remote reboot feature is enabled, and the reboot_string is received on the port, a '>' character is transmitted, and the system is ready to reboot. If a '1' character is received, the system is rebooted (and a system dump may be started, depending on the value of the reboot_enable attribute); any character other than '1' aborts the reboot process.

The reboot_string has a maximum length of 16 characters and must not contain a space, colon, equal sign, null, new line, or Ctrl-\ character.
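The constraints on reboot_string can be checked before the value is handed to SMIT or chdev. The following is a minimal sketch under stated assumptions: valid_reboot_string is a hypothetical helper, not an AIX command, and the rejection of an empty string is an assumption that the notes do not state.

```shell
#!/bin/sh
# Hypothetical helper: checks a candidate reboot_string against the rules
# in the notes (at most 16 characters; no space, colon, equal sign,
# new line, or Ctrl-\ character).
valid_reboot_string() {
    s=$1
    nl='
'
    fs=$(printf '\034')      # Ctrl-\ is ASCII 0x1C (octal 034)
    # Assumption: an empty string is not a usable reboot_string.
    [ -n "$s" ] || return 1
    [ "${#s}" -le 16 ] || return 1
    # A null byte cannot occur inside a shell variable, so it needs no test.
    case $s in
        *" "*|*:*|*=*|*"$nl"*|*"$fs"*) return 1 ;;
    esac
    return 0
}

valid_reboot_string '#dump#' && echo "accepted: #dump#"
valid_reboot_string 'bad string' || echo "rejected: contains a space"
```

The check is deliberately conservative: it only mirrors the documented rules and does not try to predict what the TTY driver itself would accept.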

Enabling remote reboot

Remote reboot can be enabled through SMIT or the command line. For SMIT, the path System Environments -> Manage Remote Reboot Facility may be used for a configured TTY. Alternatively, when configuring a new TTY, remote reboot may be enabled from the Add a TTY or Change/Show Characteristics of a TTY menus. These menus are accessed through the path Devices -> TTY.

From the command line, the mkdev or chdev command is used to enable remote reboot.
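For an already configured TTY, the chdev form might look like the output of the sketch below. The attribute names reboot_enable and reboot_string come from the notes; the device name tty0 and the helper name remote_reboot_cmd are assumptions made for illustration. The helper only prints the command so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a chdev command that sets
# the two remote reboot attributes on an existing TTY.
remote_reboot_cmd() {
    dev=$1
    mode=$2
    string=$3
    case $mode in
        no|reboot|dump) ;;
        *) echo "reboot_enable must be no, reboot, or dump" >&2; return 1 ;;
    esac
    printf "chdev -l %s -a reboot_enable=%s -a reboot_string='%s'\n" \
        "$dev" "$mode" "$string"
}

# tty0 is an assumed device name; substitute the TTY on your native serial port.
remote_reboot_cmd tty0 dump '#dump#'
```

After running the printed command as root, lsattr -El tty0 should list the new attribute values.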


Instructor notes:

Purpose: Explain how to start a dump from a TTY.

Details: Base your explanation on the material in the student notes.

Additional information: As mentioned in the student notes, the values for REMOTE Reboot ENABLE are:

   no       Remote reboot is disabled
   reboot   Remote reboot is enabled
   dump     Remote reboot is enabled and a dump will occur prior to reboot

There is a good discussion of the remote reboot facility (starting on page 24) in the AIX 5L Version 5.3 System Management Guide: Operating System and Devices.

Transition statement: Let’s look at the dump interface of SMIT.


Generating dumps with SMIT

# smit dump

                              System Dump

Move cursor to desired item and press Enter

   Show Current Dump Devices
   Show Information About the Previous System Dump
   Show Estimated Dump Size
   Change the Type of Dump
   Change the Full Memory Dump Mode
   Change the Primary Dump Device
   Change the Secondary Dump Device
   Change the Directory to which Dump is Copied on Boot
   Start a Dump to the Primary Dump Device
   Start a Traditional System Dump to the Secondary Dump Device
   Copy a System Dump from a Dump Device to a File
   Always ALLOW System Dump
   Check Dump Resources Utility
   Change/Show Global System Dump Properties
   Change/Show Dump Attributes for a Component
   Change Dump Attributes for multiple Components

Figure 11-14. Generating dumps with SMIT

Notes:

Using the SMIT dump interface

You can use the SMIT dump interface to work with the dump facility. The menu items that show or change the dump information use the sysdumpdev command.

The Always ALLOW System Dump option

A very important item on the menu shown on the visual is Always ALLOW System Dump. If you set this option to yes, the CTRL-ALT-1 (numpad) and CTRL-ALT-2 (numpad) key sequences will start a dump even when the key mode switch is in Normal position. On systems running versions of AIX prior to AIX 5L V5.3, setting this item to yes also enables use of the Reset button to start a system dump.


Instructor notes:

Purpose: Introduce the SMIT dump interface.

Details: Do not go into too much detail here. Just mention that SMIT uses the sysdumpdev command for many of the items and that they were covered earlier. Explain that PCI machines should always allow a system dump. Historically, MCA machines could put the physical key in the service mode to achieve this. This setting was created specifically for PCI machines.

While there are three new AIX 6.1 items at the bottom of the menu, they are for the component live dump facility. If asked about them, be ready to place them in context, but avoid getting into the details, which are outside the scope of this course. On the other hand, Change the Type of Dump and Change the Full Memory Dump Mode are new items with AIX 6.1 that relate to the firmware-assisted dump capabilities we previously introduced. The name of the menu item Start a Dump to the Secondary Dump Device has changed in AIX 6.1 to Start a Traditional System Dump to the Secondary Dump Device in order to distinguish this from the firmware-assisted dump.

Additional information: None

Transition statement: Let’s discuss dump-related LED codes.


Dump-related LED codes

   Code   Meaning
   0c0    Dump completed successfully
   0c1    I/O error occurred during the dump
   0c2    Dump started by user
   0c4    Dump completed unsuccessfully, not enough space on dump device, partial dump available
   0c5    Dump failed to start, unexpected error occurred when attempting to write to dump device; for example, tape not loaded
   0c6    Secondary dump started by user
   0c8    Dump disabled, no dump device configured
   0c9    System-initiated panic dump started
   0cc    Failure writing to primary dump device, switched over to secondary

Figure 11-15. Dump-related LED codes

Notes:

System-initiated dumps

If a system dump is initiated through a kernel panic, the LEDs on an RS/6000 will display 0c9 while the dump is in progress, and then either a flashing 888 or a steady 0c0. All of the LED codes following the flashing 888 (remember: you must use the Reset button) should be recorded and passed to IBM. While rotating through the 888 sequence, you will encounter one of the codes shown. The code you want to see is 0c0, indicating that the dump completed successfully.

User-initiated dumps

For user-initiated system dumps to the primary dump device, the LED codes should indicate 0c2 for a short period, followed by 0c0 upon completion.


Other common LED codes

Other common codes include the following:

   0c1   An I/O error occurred during the dump.

   0c4   Indicates that the dump routine ran out of space on the specified device. It may still be possible to examine and use the data on the dump device, but this tells you that you should increase the size of your dump device.

   0c5   Check the availability of the medium to which you are writing the dump (for example, whether the tape is in the drive and write enabled).

   0c6   This is used to indicate a dump request to the secondary device.

   0c7   A network dump is in progress, and the host is waiting for the server to respond. The value in the three-digit display should alternate between 0c7 and 0c2 or 0c9. If the value does not change, then the dump did not complete due to an unexpected error.

   0c8   You have not defined a primary or secondary dump device. The system dump option is not available. Enter the sysdumpdev command to configure the dump device.

   0c9   A dump started by the system is in progress. Wait for one minute for the dump to complete and for the three-digit display value to change. If the three-digit display value changes, find the new value on the list. If the value does not change, then the dump did not complete due to an unexpected error.

   0cc   This code indicates that the dump could not be written to the primary dump device. Therefore, the secondary dump device will be used. This code was introduced quite some time ago (with AIX V4.2.1).
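When LED codes are recorded by hand, a small lookup can help translate them afterwards. The following sketch simply encodes the codes listed on the visual and in the notes above; led_meaning is a hypothetical helper name, and the short descriptions are paraphrases of the table.

```shell
#!/bin/sh
# Hypothetical helper: translates a dump-related LED code into a short
# description. Codes are matched case-insensitively (0C0 and 0c0).
led_meaning() {
    code=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $code in
        0c0) echo "Dump completed successfully" ;;
        0c1) echo "I/O error occurred during the dump" ;;
        0c2) echo "Dump started by user" ;;
        0c4) echo "Partial dump: not enough space on dump device" ;;
        0c5) echo "Dump failed to start" ;;
        0c6) echo "Secondary dump started by user" ;;
        0c7) echo "Network dump in progress, waiting for server" ;;
        0c8) echo "Dump disabled, no dump device configured" ;;
        0c9) echo "System-initiated panic dump started" ;;
        0cc) echo "Primary device failed, switched to secondary" ;;
        *)   echo "Not a dump-related code in this table"; return 1 ;;
    esac
}

led_meaning 0C0
led_meaning 0c4
```

Codes outside the table (for example, 888 or 102) return a non-zero status so that a calling script can fall back to the other LED references.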


Instructor notes:

Purpose: List the different dump LED codes that will be seen under different circumstances.

Details: Go through the list and highlight first the codes that will be seen if a system-initiated dump occurs and then if a user-initiated dump occurs. Refer to the student notes for a detailed description of the commonly seen codes.

Additional information: While the dump is occurring, the 0c2 or 0c9 code is displayed. How long the dump takes to complete is dependent on how large the dump is. Small dumps should take less than 30 seconds; large dumps may take several minutes. On machines with two-line front panel displays (LEDs), the second line will display the number of bytes written so far to the dump device. This provides an indication to you that the dump is still proceeding well, and it also gives you an idea of how much more data has to be written (if you have a record of a past sysdumpdev -e).

Transition statement: Having caused a dump, the next issue you have to consider is how you are going to retrieve the dump from your system.


Copying a system dump

[Flowchart]

   Dump occurs
        |
   rc.boot 2
        |
   Is there sufficient space in /var to copy dump to?
        |
        +-- yes --> Dump copied to /var/adm/ras --> Boot continues
        |
        +-- no  --> (Forced copy flag = TRUE) Display the copy dump to tape menu --> Boot continues

Figure 11-16. Copying system dump

Notes:

Sufficient space in /var

For an RS/6000 with an LED, after a crash, if the LED displays 0c0, then you know that a dump occurred and that it completed successfully. At this point, unless you have set the autorestart system attribute to true, you have to reboot your system. If there is enough space to copy the dump from the paging space to the /var/adm/ras directory, then it will be copied directly.


Insufficient space in /var/adm/ras

If, however, at bootup, the system determines that there is not enough space to copy the dump to /var, the /sbin/rc.boot script (which is executed at bootup) will call the /lib/boot/srvboot script. This script in turn calls on the copydumpmenu command, which is responsible for displaying the following menu, which can be used to copy the dump to removable media:

                Copy a System Dump to Removable Media

   The system dump is 583973 bytes and will be copied from /dev/hd6
   to media inserted into the device from the list below.

   Please make sure that you have sufficient blank, formatted
   media before proceeding.

   Step One:  Insert blank media into the chosen drive.
   Step Two:  Type the number for that device and press Enter.

          Device type        Path Name
   >>> 1  tape/scsi/8mm      /dev/rmt0
       2  Diskette Drive     /dev/fd0

      88  Help?
      99  Exit

   >>> Choice [1]
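The boot-time decision described above (copy the dump if /var has room, otherwise offer the removable-media menu when the forced copy flag is TRUE) can be summarized in a few lines. This is only an illustrative sketch of the flow, not the logic of rc.boot itself: dump_copy_action is a hypothetical name, and the behavior when the forced copy flag is FALSE is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the decision in the flowchart.
# Usage: dump_copy_action NEED_KB FREE_KB FORCECOPY
dump_copy_action() {
    if [ "$1" -le "$2" ]; then
        echo "copy dump to /var/adm/ras"
    elif [ "$3" = "TRUE" ]; then
        echo "display copy-to-removable-media menu"
    else
        # Assumption: with the flag FALSE, the dump is not copied at boot.
        echo "skip copy"
    fi
    # In every case, the boot then continues.
}

dump_copy_action 571 204800 TRUE   # enough space in /var
dump_copy_action 571 100 TRUE      # too little space, flag set
```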


Automatically reboot after a crash

# smit chgsys

          Change/Show Characteristics of Operating System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

   Maximum number of PROCESSES allowed per user        [128]
   Maximum number of pages in block I/O BUFFER CACHE   [20]
   Automatically REBOOT system after a crash           false
   ...
   Enable full CORE dump                               false
   Use pre-430 style CORE dump                         false

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 11-17. Automatically reboot after a crash

Notes:

Specifying automatic reboot using SMIT

If you want your system to reboot automatically after a dump, you must set the kernel parameter autorestart to true. This can easily be done with the SMIT fastpath smit chgsys. The corresponding menu item is Automatically REBOOT system after a crash. Note that the default value is true in AIX 5L V5.2 and later.

Specifying automatic reboot using the chdev command

If you do not want to use SMIT to specify automatic reboot after a system dump, execute the following command:

   # chdev -l sys0 -a autorestart=true


Checking the size of /var

If you specify an automatic reboot, you should verify that the /var file system is large enough to store a system dump.
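One way to make this check concrete is to compare the estimated dump size (reported in bytes by sysdumpdev -e) with the free space in /var (reported in KB by df -k). The sketch below keeps the comparison in a pure function so that it can be tried anywhere; dump_fits is a hypothetical name, and the commented commands for gathering the two numbers assume particular output layouts that should be verified on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a dump of ESTIMATED_BYTES would fit
# into FREE_KB kilobytes of free space.
# Usage: dump_fits ESTIMATED_BYTES FREE_KB
dump_fits() {
    need_kb=$(( ($1 + 1023) / 1024 ))   # round bytes up to whole KB
    [ "$need_kb" -le "$2" ]
}

# On AIX, the inputs might be gathered roughly like this (assumed output
# formats: the size is the last field of sysdumpdev -e, and Free is the
# third column of df -k):
#   est=$(sysdumpdev -e | awk '{print $NF}')
#   free=$(df -k /var | awk 'NR==2 {print $3}')

if dump_fits 583973 1024; then
    echo "/var has room for the estimated dump"
else
    echo "/var is too small for the estimated dump"
fi
```

The sample values (583973 bytes, 1024 KB free) are illustrative only; the byte count is borrowed from the menu example earlier in this unit.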


Instructor notes:

Purpose: Describe how to set up an automatic reboot after a crash.

Details: Base your explanation on the material in the student notes.

Additional information: None

Transition statement: Let’s discuss the snap command.


Sending a dump to IBM

• Copy all system configuration data including a dump onto tape:

     # snap -a -o /dev/rmt0

• Label tape with:
   - Problem Management Record (PMR) number
   - Command used to create tape
   - Block size of tape

• Support Center uses kdb to examine the dump

Figure 11-18. Sending a dump to IBM

Notes:

Collecting system data

Before sending a dump to the IBM Support Center, use the snap command to collect system data. The command /usr/sbin/snap -a -o /dev/rmt0 will collect all the necessary data. In AIX 5L V5.2 and subsequent versions, pax is used to write the data to tape. The Support Center will need the information collected by snap in addition to the dump and kernel. Do not send just the dump file vmcore.x without the corresponding AIX kernel. Without the corresponding kernel, analysis is not possible.

Use of the kdb command

The AIX Systems Support Center will analyze the contents of the dump using the kdb command. The kdb command uses the kernel that was active on the system at the time of the halt.


Purpose of snap command

The snap command was developed by IBM to simplify gathering configuration information. It provides a convenient method of sending lslpp and errpt output to the support centers. It gathers system configuration information and compresses the information to a pax file. The file can then be written to disk or tape.

Flags for snap command

Some useful flags for the snap command are the following:

   -a   Copies all system configuration information to /tmp/ibmsupt directory tree
   -c   Creates a compressed pax image (snap.pax.Z) of all files in the /tmp/ibmsupt directory tree or other named output directory
   -f   Gathers file system information
   -g   Gathers general information
   -k   Gathers kernel information
   -D   Gathers dump and /unix
   -t   Creates tcpip.snap file; gathers TCP/IP information

AIX 5L V5.3 snap enhancements

AIX 5L V5.3 extended the functionality of snap in using external scripts, letting snap split up the output pax file into smaller pieces, or extending the collected data. The next few paragraphs provide additional details regarding these new capabilities.

Extending snap to run external scripts

Scripts that the snap command is to run can be specified in three different ways:
- Specifying the name of a script in the /usr/lib/ras/snapscripts directory that snap should call
- Specifying the all keyword, which indicates that snap should call all scripts in the /usr/lib/ras/snapscripts directory
- Specifying the name of a file that contains the list of scripts (one per line) that snap should call. The syntax file: is used in this case.


The snapsplit command

The snapsplit command was introduced in AIX 5L V5.3. It is used to split a snap output file into smaller files, which is useful for dealing with very large snap files. It breaks the file down into files of a specific size that are multiples of 1 MB. Furthermore, it will combine these files into the original file when called with the -u option. Refer to the man page for snapsplit (or the corresponding entry in the AIX Commands Reference manual) for additional information regarding this command.

Splitting the snap output file from the snap command

There is a new flag for the snap command, -O megabytes, introduced in AIX 5L V5.3, that enables you to split the snap output file. The snap command calls the snapsplit command. You can use the flag as follows to split the large snap output into smaller 4 MB files:

   # snap -a -c -O 4
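The two snap invocations shown in this unit (write everything to tape, or create a compressed image split into pieces) can be assembled by a small wrapper. This is a sketch under stated assumptions: snap_cmd is a hypothetical helper that only prints the command, and the flags used (-a, -o, -c, -O) are the ones described in the notes above.

```shell
#!/bin/sh
# Hypothetical helper: prints the snap command for one of two targets.
#   snap_cmd tape DEVICE       collect all data and write it to DEVICE
#   snap_cmd split MEGABYTES   collect all data, compress it, and split
#                              the output into pieces of MEGABYTES MB
snap_cmd() {
    case $1 in
        tape)  echo "snap -a -o $2" ;;
        split) echo "snap -a -c -O $2" ;;
        *) echo "usage: snap_cmd tape DEVICE | split MEGABYTES" >&2
           return 1 ;;
    esac
}

snap_cmd tape /dev/rmt0
snap_cmd split 4
```

Printing instead of executing keeps the sketch safe to try on any system; on AIX the printed command would be run as root.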


Instructor notes:

Purpose: Explain how the system dump should be prepared before it is sent to the IBM Support Center.

Details: Provide the students with as much of the following information as you think is appropriate:

The information gathered with the snap command can be used to identify and resolve system problems. You must have root authority to execute this command. If you use the -a flag, then you need approximately 8 MB of temporary disk space to collect all the system information, including the contents of the error log (covered in a previous unit).

The -g flag gathers the following information:
• Error report
• Copy of the customized ODM
• Trace file
• User environment
• Amount of physical memory and paging space
• Device and attribute information
• Security user information

The output from the -g flag is written to /tmp/ibmsupt/general/general.snapfile. However, you can specify another directory using the -d flag. The execution of snap appends information to the previously created files. Use the -r flag to remove previously gathered and saved information.

Before you send your media to the support center, ensure you call them and obtain a Problem Management Record (PMR) number, which will be used to track the status of your problem. Ensure you label the media with this number, and also the other pieces of information listed, to help the support team act quickly on your problem.

There is not much left for you to do after this, apart from waiting for a response from the Support Center. However, you may want to have a look at your dump to try to analyze it yourself. The tool that is used by the support center to analyze your dump is called kdb (crash prior to AIX 5L V5.1), which is also available on the system; however, the output from the command is very user unfriendly. Most people do not bother with this.

See the student notes for the AIX 5L V5.3 enhancements.

Additional information: In AIX 5L, the pax command was enhanced to allow archiving of large files, such as dumps. The tar command, which was used prior to AIX 5L, does not support files larger than 2 GB. If the file to be archived is larger than 2 GB, the only option is pax.

Transition statement: Let's take a brief look at kdb to see how it can be used.


Use kdb to analyze a dump

   /unix (Kernel)        /var/adm/ras/vmcore.x (Dump file)

   # uncompress /var/adm/ras/vmcore.x.Z
     or
   # dmpuncompress /var/adm/ras/vmcore.x.BZ
   # kdb /var/adm/ras/vmcore.x /unix
   > status
   > stat
   (further sub-commands for analyzing)
   > quit

/unix kernel must be the same as on the failing machine

Figure 11-19. Use kdb to analyze a dump

Notes:

Function of the kdb command

The kdb command is an interactive tool used for operating system analysis. Typically, kdb is used to examine kernel dumps in a system postmortem state. However, a live running system can also be examined with kdb, although due to the dynamic nature of the operating system, the various tables and structures often change while they are being examined, and this precludes extensive analysis.

Examining an active system

To examine an active system, you would simply run the kdb command without any arguments.


Analyzing a system dump

For a dead system, a dump is analyzed using the kdb command with file name arguments, as illustrated on the visual.

To use kdb, the vmcore file must first be uncompressed. After a crash, the dump file is typically in a compressed format. If the dump file has a .Z suffix (vmcore.x.Z), use the uncompress command. In AIX 6.1, the dump file ends in a .BZ suffix and you must use the dmpuncompress command to process this file. If you wish to leave the original compressed file intact (rather than replacing it with the uncompressed file), then use the -p option of the dmpuncompress command.

   # uncompress /var/adm/ras/vmcore.x.Z
     or
   # dmpuncompress /var/adm/ras/vmcore.x.BZ

Once the dump is uncompressed, you would analyze it with the kdb command:

   # kdb /var/adm/ras/vmcore.x /unix
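Choosing between uncompress and dmpuncompress depends only on the file suffix, so the choice can be scripted. The sketch below is illustrative: dump_uncompress_cmd is a hypothetical helper that prints the matching command rather than running it, and the use of -p for .BZ files (to keep the compressed original) is a choice made here, not a requirement.

```shell
#!/bin/sh
# Hypothetical helper: prints the command that matches the dump file's
# suffix (.BZ -> dmpuncompress, .Z -> uncompress).
dump_uncompress_cmd() {
    case $1 in
        *.BZ) echo "dmpuncompress -p $1" ;;   # -p keeps the compressed file
        *.Z)  echo "uncompress $1" ;;
        *)    echo "# $1 is not compressed; no action needed" ;;
    esac
}

dump_uncompress_cmd /var/adm/ras/vmcore.0.BZ
dump_uncompress_cmd /var/adm/ras/vmcore.0.Z
```

Once the printed command has been run, the uncompressed vmcore file can be passed to kdb together with the matching /unix.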

Potential problems when using kdb

If the copy of /unix does not match the dump file, the following output will appear on the screen:

   WARNING: dumpfile does not appear to match namelist

If the dump itself is corrupted in some way, then the following will appear on the screen:

   ... dump /var/adm/ras/vmcore.x corrupted

Useful subcommands

Examining a system dump requires an in-depth knowledge of the AIX kernel. However, there are two subcommands that might be useful to you:
- The subcommand status displays the processes/threads that were active on the CPUs when the crash occurred
- The subcommand stat shows the machine status when the dump occurred

To exit the kdb debug program, type quit at the > prompt.

Creating a sample system dump

The following example stops your running machine and creates a system dump:

   # cat /unix > /dev/mem

Do not execute this command in your production environment.

The LEDs displayed are 888, 102, 300, 0C0:
- Refer to earlier material for discussion of the 888 code
- LED 102 indicates that “a dump has occurred”
- LED 300 stands for crash code “Data Storage Interrupt (DSI)”
- LED 0C0 means “Dump completed successfully”


Instructor notes:

Purpose: Introduce the kdb command.

Details: Cover the information in the student notes. You might also want to make some of the points mentioned below:

kdb is an interactive utility for examining an operating system image, a core image, or the running kernel. It also interprets and formats control structures in the system and provides certain miscellaneous functions useful for examining a dump.

In order to analyze the dump, you must execute the kdb command against /unix, and it must be the /unix of the system that had the problem. To make any change to code, you must have the AIX source code, which is not held by customers, so there is not much more that you can do. Generally speaking, it is best left for the IBM Support Center to handle the dump.

The last thing you want to do is send a dump to the IBM Support Center and find out that they cannot do anything about it because it is a partial dump. Get it right from the start.

Additional information: Prior to AIX 5L V5.1, the crash command was used instead of the kdb command.

Transition statement: We have reached a checkpoint.


Checkpoint

1. If your system has less than 4 GB of main memory, what is the default primary dump device? Where do you find the dump file after reboot?
   _________________________________________________________
   _________________________________________________________

2. How do you turn on dump compression?
   _________________________________________________________

3. What command can be used to initiate a system dump?
   _________________________________________________________

4. If the copy directory is too small, will the dump, which is copied during the reboot of the system, be lost?
   _________________________________________________________
   _________________________________________________________

5. Which command should you execute to collect system data before sending a dump to IBM?
   _________________________________________________________

Figure 11-20. Checkpoint


Instructor notes:

Purpose: Present the checkpoint questions.

Details: A “Checkpoint Solution” is given below:

Checkpoint solutions

1. If your system has less than 4 GB of main memory, what is the default primary dump device? Where do you find the dump file after reboot?
   The default primary dump device is /dev/hd6. The default dump file is /var/adm/ras/vmcore.x, where x indicates the number of the dump.

2. How do you turn on dump compression?
   sysdumpdev -C (Dump compression is on by default in AIX 5L V5.3 and cannot be turned off in AIX 6.1)

3. What command can be used to initiate a system dump?
   sysdumpstart

4. If the copy directory is too small, will the dump, which is copied during the reboot of the system, be lost?
   If the force copy flag is set to TRUE, a special menu is shown during reboot. From this menu, you can copy the system dump to portable media.

5. Which command should you execute to collect system data before sending a dump to IBM?
   snap

Additional information: Here are a couple of points you might want to make when going over the answers to the checkpoint:
• If there is 4 GB or more of memory, then a dedicated dump logical volume is created.
• Dump compression can be turned off with the -c flag of sysdumpdev.

Transition statement: Let’s switch over to the lab.


Exercise 11: System dump

• Working with the AIX Dump Facility

Figure 11-21. Exercise 11: System dump

Notes:

Objectives for this exercise

At the end of the exercise, you should be able to:
- Initiate a system dump
- Use the snap command


Instructor notes:

Purpose: Transition to the exercise for this unit.

Details:

Additional information:

Transition statement: Let’s recall some of the key points from this unit.


Unit summary

Having completed this unit, you should be able to:
• Explain what is meant by a system dump
• Determine and change the primary and secondary dump devices
• Create a system dump
• Execute the snap command
• Use the kdb command to check a system dump

Figure 11-22. Unit summary

Notes: When a dump occurs, kernel and system data are copied to the primary dump device. By default, the system has a primary dump device (/dev/hd6) and a secondary device (/dev/sysdumpnull). During reboot, the dump is copied to the copy directory (/var/adm/ras). A system dump should be retrieved from the system using the snap command. The Support Center uses the kdb debugger to examine the dump.
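The commands named in these notes can be combined into a short check. The following is a minimal sketch, assuming an AIX system and root authority; the function name dump_report is hypothetical and not part of the course material. On any other system the function only prints a message.

```shell
#!/bin/sh
# Hypothetical helper: report the dump configuration and gather data for support.
dump_report() {
    if [ "$(uname -s)" = "AIX" ]; then
        sysdumpdev -l      # list the primary and secondary dump devices and the copy directory
        sysdumpdev -L      # show information about the most recent system dump
        snap -gfkD         # collect general, file system, kernel, and dump data (root only)
    else
        echo "Not an AIX system: sysdumpdev and snap are not available here"
    fi
}

dump_report
```

By default, snap places what it collects under /tmp/ibmsupt, so check the free space in /tmp before running it.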


Instructor notes:
Purpose: Summarize the unit.
Details: Present the highlights from the unit.
Additional information: You might want to note that, if the system has 4 GB or more of main memory, then a dedicated dump logical volume is created. So, the default primary dump device actually depends on the amount of physical memory installed in the system.
Transition statement: This brings us to the end of this course. Thank you.


Appendix A. Checkpoint solutions

Unit 1:

Checkpoint solutions

1. What are the four major problem determination steps?
Identify the problem
Talk to users (to further define the problem)
Collect system data
Resolve the problem
2. Who should provide information about system problems?
Always talk to the users about such problems in order to gather as much information as possible.
3. True or False: If there is a problem with the software, it is necessary to get the next release of the product to resolve the problem.
False. In most cases, it is only necessary to apply fixes or upgrade microcode.
4. True or False: Documentation can be viewed or downloaded from the IBM Web site.


Unit 2:

Checkpoint solutions

1. In which ODM class do you find the physical volume IDs of your disks?
CuAt

2. What is the difference between the states: defined and available?
When a device is defined, there is an entry in ODM class CuDv. When a device is available, the device driver has been loaded. The device driver can be accessed by the entries in the /dev directory.


Unit 3:

Checkpoint solutions

1. Which command generates error reports? Which flag of this command is used to generate a detailed error report?
errpt
errpt -a

2. Which type of disk error indicates bad blocks?
DISK_ERR4

3. What do the following commands do?
errclear: Clears entries from the error log.
errlogger: Is used by root to add entries into the error log.

4. What does the following line in /etc/syslog.conf indicate?
*.debug errlog
All syslogd entries are directed to the error log.

5. What does the descriptor en_method in errnotify indicate?
It specifies a program or command to be run when an error matching the selection criteria is logged.
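The error log commands from these answers can be tried together. This is a minimal sketch, assuming an AIX system; the function name error_log_report is hypothetical, and the commands that change the log are left as comments because they need root authority.

```shell
#!/bin/sh
# Hypothetical helper: show recent error log entries.
error_log_report() {
    if [ "$(uname -s)" = "AIX" ]; then
        errpt | head -n 20        # summary report, one line per entry
        errpt -a | head -n 60     # detailed report (the -a flag)
        # errlogger "text" would add an operator entry to the error log (root only)
        # errclear 30 would delete entries older than 30 days (root only)
    else
        echo "Not an AIX system: errpt is not available here"
    fi
}

error_log_report
```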


Unit 4:

Checkpoint solutions

1. True or False: NIM can be used to fix an LPAR which fails to boot because of a problem with the /etc/inittab.
maint_boot


Unit 5:

Checkpoint solutions (1 of 2)

1. True or False: You must have AIX loaded on your system to use the System Management Services programs.
False. SMS is part of the built-in firmware.

2. Your AIX system is currently powered off. AIX is installed on hdisk1 but the bootlist is set to boot from hdisk0. How can you fix the problem and make the machine boot from hdisk1?
You need to boot the SMS programs and set the new boot list to include hdisk1.

3. Your machine is booted and at the # prompt. What is the command that will display the normal bootlist?
# bootlist -om normal
How could you change the normal bootlist?
# bootlist -m normal device1 device2

4. What command is used to build a new boot image and write it to the boot logical volume?
bosboot -ad /dev/hdiskx

5. What script controls the boot sequence?
rc.boot


Checkpoint solutions (2 of 2)

6. True or False: During the AIX boot process, the AIX kernel is loaded from the root file system.
False. The AIX kernel is loaded from hd5.

7. How do you boot an AIX machine into maintenance mode?
You need to boot from an AIX CD, mksysb, or NIM server.

8. Your machine keeps rebooting and repeating the POST. What could be the reason for this?
Invalid boot list, corrupted boot logical volume, or hardware failures of boot device.
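Before changing a boot list or rebuilding a boot image, it helps to look at the current state. This is a minimal, display-only sketch, assuming an AIX system with the boot logical volume named hd5; the function name boot_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: display boot-related settings without changing anything.
boot_report() {
    if [ "$(uname -s)" = "AIX" ]; then
        bootlist -m normal -o     # display the normal boot list
        bootinfo -b               # show the device used for the last boot
        lslv -m hd5               # show which disk holds the boot logical volume
    else
        echo "Not an AIX system: bootlist is not available here"
    fi
}

boot_report
```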


Unit 6:

Let's review solution: rc.boot (1 of 3)

rc.boot 1
(1) /etc/init from RAMFS in the boot image
(2) restbase
(3) cfgmgr -f
(4) ODM files in RAM file system
(5) bootinfo -b


Let's review solution: rc.boot (2 of 3)

[Diagram: rc.boot 2 sequence. Steps numbered in the extracted text: (1) Activate rootvg; (4) Turn on paging; (5) Merge RAM /dev files. Other steps shown, with numbering (2), (3), (6), (7), (8) not recoverable: Mount /dev/hd4 on / in RAMFS; Mount /var, copy dump, unmount /var; Copy RAM ODM files; Copy boot messages to alog; mount /dev/hd4 (LED 557).]


Let's review solution: rc.boot (3 of 3)

[Diagram: rc.boot 3 sequence. Steps shown:]
• /etc/inittab: /sbin/rc.boot 3
• fsck -f /dev/hd3, mount /tmp
• syncvg rootvg &
• cfgmgr -p2, cfgmgr -p3
• Start Console: cfgcon; Start CDE: rc.dt boot
• savebase
• syncd 60, errdemon
• Turn off LED
• rm /etc/nologin
• chgstatus=3 CuDv ?
• Execute next line in /etc/inittab


Let's review solution: /etc/inittab file

init:2:initdefault:

Determine initial run-level

brc::sysinit:/sbin/rc.boot 3

Startup last boot phase

rc:2:wait:/etc/rc

Multiuser initialization

fbcheck:2:wait:/usr/sbin/fbcheck

Execute /etc/firstboot, if it exists

srcmstr:2:respawn:/usr/sbin/srcmstr

Start the System Resource Controller

cron:2:respawn:/usr/sbin/cron

Start the cron daemon

rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait:/etc/rc.nfs

Startup communication daemon processes (nfsd, biod, ypserv, and so forth)

qdaemon:2:wait:/usr/bin/startsrc -sqdaemon

Startup spooling subsystem

dt:2:wait:/etc/rc.dt

Startup CDE desktop

tty0:2:off:/usr/sbin/getty /dev/tty1

Line ignored by init

myid:2:once:/usr/local/bin/errlog.check

Process started only one time
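Each entry above has four colon-separated fields: identifier, run level, action, and command. The following minimal sketch splits a few of the entries from this slide with awk; the variable sample and the function describe_inittab are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sample entries copied from the slide above; the real file is /etc/inittab.
sample='init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check'

# Print identifier, run level, and action for each entry.
# An empty run-level field (as in brc) is shown as "-".
describe_inittab() {
    printf '%s\n' "$1" |
        awk -F: '{ rl = ($2 == "" ? "-" : $2); printf "%s runlevel=%s action=%s\n", $1, rl, $3 }'
}

describe_inittab "$sample"
```

On a running AIX system, lsitab -a lists the real entries in the same format.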


Checkpoint solutions

1. From where is rc.boot 3 run?
From the /etc/inittab file in rootvg

2. Your system stops booting with LED 557:
In which rc.boot phase does the system stop?
rc.boot 2
What are some reasons for this problem?
Corrupted BLV
Corrupted JFS log
Damaged file system

3. Which ODM file is used by the cfgmgr during boot to configure the devices in the correct sequence?
Config_Rules

4. What does the line init:2:initdefault: in /etc/inittab mean?
This line is used by the init process to determine the initial run level (2=multiuser).


Unit 7:

Checkpoint solutions

1. True or False: All LVM information is stored in the ODM.
False. Information is also stored in other AIX files and in disk control blocks (like the VGDA and LVCB).

2. True or False: You detect that a physical volume hdisk1 that is contained in your rootvg is missing in the ODM. This problem can be fixed by exporting and importing the rootvg.
False. Use the rvgrecover procedure instead. This script creates a complete set of new rootvg ODM entries.


Unit 8:

Checkpoint solutions

1. Although everything seems to be working fine, you detect error log entries for disk hdisk0 in your rootvg. The disk is not mirrored to another disk. You decide to replace this disk. Which procedure would you use to migrate this disk?
Procedure 2: Disk still working. There are some additional steps necessary for hd5 and the primary dump device hd6.

2. You detect an unrecoverable disk failure in volume group datavg. This volume group consists of two disks that are completely mirrored. Because of the disk failure you are not able to vary on datavg. How do you recover from this situation?
Forced varyon: varyonvg -f datavg. Use procedure 1 for mirrored disks.

3. After disk replacement, you recognize that a disk has been removed from the system but not from the volume group. How do you fix this problem?
Use PVID instead of disk name: reducevg vg_name PVID


Unit 9:

Checkpoint solutions (1 of 4)

1. Name the two ways alternate disk installation can be used.
Installing a mksysb image on another disk
Cloning the current running rootvg to an alternate disk

2. What are the advantages of alternate disk rootvg cloning?
Creates an online backup
Allows maintenance and updates to software on the alternate disk, helping to minimize down time

3. How do you remove an alternate rootvg?
alt_disk_install -X

4. Why should you not use exportvg with an alternate disk VG?
This will remove rootvg related entries from /etc/filesystems.


Checkpoint solutions (2 of 4)

5. True or False: multibos provides for booting between alternate operating system environments within a single rootvg.

6. True or False: A standby BOS can only be accessed by changing the bootlist and then rebooting.

7. True or False: New fixpacks can be applied to a standby BOS with only a performance impact to the active BOS during the operation.


Checkpoint solutions (3 of 4)

8. True or False: Creating a JFS2 snapshot requires a long time and a lot of disk space.

9. What is needed to change from external snapshots to internal snapshots?
If internal snapshots are already enabled, delete all external snapshots and start creating internal snapshots. If not already enabled, you additionally need to back up and delete the file system before redefining it with internal snapshots enabled (isnapshot=yes) and restoring from backup.

10. How can we tell if an external snapshot is about to fill up?
Run snapshot -q filesystem-name. The amount of free space will be listed.


Checkpoint solutions (4 of 4)

11. Which two alternate disk installation techniques are available?
Installing a mksysb on another disk
Cloning the rootvg to another disk

12. True or False: multibos requires cloning all of the logical volumes in the active rootvg.

13. True or False: JFS2 snapshots require little or no quiescing of applications to obtain a stable point in time image of the snapped file system.


Unit 10:

Checkpoint solutions (1 of 2)

1. What are the three forms of file system access within a WPAR?
Shared-system: /usr and /opt are shared read-only from the global environment through namefs mounts.
NFS hosted: /usr and /opt file systems are NFS mounted from a host system.
Non shared: /var, /home, /tmp, and / are separate local file systems (jfs/jfs2) within the WPAR.

2. True or False: For live application mobility, the WPAR must be checkpoint enabled.


Checkpoint solutions (2 of 2)

3. True or False: WPAR Manager is part of AIX 6.
WPAR Manager is a separate product that is part of the IBM Systems Director family.

4. What are the two types of WPAR relocation supported by the WPAR Manager version 1.2 GUI?
Enhanced live relocation and static relocation

5. True or False: WPAR Manager is able to manage WPARs in LPARs for several servers over the same network.
WPAR Manager provides centralized management of WPARs for all client servers on the same network.


Unit 11:

Checkpoint solutions

1. If your system has less than 4 GB of main memory, what is the default primary dump device? Where do you find the dump file after reboot?
The default primary dump device is /dev/hd6. The default dump file is /var/adm/ras/vmcore.x, where x indicates the number of the dump.

2. How do you turn on dump compression?
sysdumpdev -C (Dump compression is on by default in AIX 5L V5.3 and cannot be turned off in AIX 6.1.)

3. What command can be used to initiate a system dump?
sysdumpstart

4. If the copy directory is too small, will the dump, which is copied during the reboot of the system, be lost?
If the force copy flag is set to TRUE, a special menu is shown during reboot. From this menu, you can copy the system dump to portable media.

5. Which command should you execute to collect system data before sending a dump to IBM?
snap


Appendix E:

Checkpoint solutions

1. What diagnostic modes are available?
Concurrent
Maintenance
Service (standalone)

2. How can you diagnose a communication adapter that is used during normal system operation?
Use either maintenance or service mode.


Appendix B. Command summary

Startup, Logoff, and Shutdown

<Ctrl>d (exit)

Log off the system (or the current shell).

shutdown

Shuts down the system by disabling all processes. If in single-user mode, you may want to use the -F option for a fast shutdown. The -r option reboots the system. The user must be root or a member of the shutdown group.

Directories

mkdir

Make directory

cd

Change the directory. The default is $HOME directory.

rmdir

Remove a directory (beware of files starting with “.”).

rm

Remove file; -r option removes directory and all files and subdirectories recursively.

pwd

Print working directory: shows name of current directory

ls

List files
• -a (all)
• -l (long)
• -d (directory information)
• -r (reverse alphabetic)
• -t (time changed)
• -C (multi-column format)
• -R (recursively)
• -F (places / after each directory name and * after each executable file)

Files - Basic

cat

List files contents (concatenate). This can open a new file with redirection, for example, cat > newfile. Use <Ctrl>d to end input.

chmod

Change the permission mode for files or directories.
• chmod who=+-permission files or directories (r, w, x = permissions and u, g, o, a = who)
• Can use + or - to grant or revoke specific permissions
• Can also use numerics, 4 = read, 2 = write, 1 = execute
• Can sum them, first is user, next is group, last is other
• For example, chmod 746 file1 is user = rwx, group = r, other = rw

chown

Change owner of a file, for example, chown owner file

chgrp

Change group of files

cp

Copy file

mv

Move or rename file

pg

List file content by screen (page)
• h (help)
• q (quit)
• <Enter> (next page)
• f (skip 1 page)
• l (next line)
• d (next 1/2 page)
• $ (last page)
• p (previous file)
• n (next file)
• . (redisplay current page)
• /string (find string forward)
• ?string (find string backward)
• -# (move backward # pages)
• +# (move forward # pages)

.

Current directory

..

Parent directory

rm

Remove (delete) files (-r option removes directory and all files and subdirectories)

head

Print first several lines of a file

tail

Print last several lines of a file

wc

Report number of lines (-l), words (-w), characters (-c) in files, no options gives lines, words, and characters

su

Switch user

id

Displays your user ID environment, user name and ID, group names and IDs

tty

Displays the device that is currently active. Very useful for XWindows where there are several pts devices that can be created. It is nice to know which one you have active. who am i will do the same.
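The numeric and symbolic chmod forms listed earlier in this section can be compared directly. This is a minimal sketch that works in a scratch directory; the names workdir and file1 are illustrative, and it assumes mktemp -d is available on the system.

```shell
#!/bin/sh
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
touch "$workdir/file1"

# 7 = 4+2+1 (rwx) for user, 4 (r--) for group, 6 = 4+2 (rw-) for other
chmod 746 "$workdir/file1"
mode_numeric=$(ls -l "$workdir/file1" | cut -c1-10)

# The same permissions in symbolic notation: who=permissions
chmod u=rwx,g=r,o=rw "$workdir/file1"
mode_symbolic=$(ls -l "$workdir/file1" | cut -c1-10)

echo "$mode_numeric $mode_symbolic"
rm -r "$workdir"
```

Both forms should leave the file with the same permission string in the first column of ls -l.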


Files - Advanced

awk

Programmable text editor / report writer

banner

Display banner (can redirect to another terminal nn with > /dev/ttynn)

cal

Calendar (cal month year)

cut

Cut out specific fields from each line of a file.

diff

Differences between two files

find

Find files anywhere on disks. Specify location by path (will search all subdirectories under specified directory).
• -name fl (file names matching fl criteria)
• -user ul (files owned by user ul)
• -size +n (or -n) (files larger (or smaller) than n blocks)
• -mtime +x (-x) (files modified more (less) than x days ago)
• -perm num (files whose access permissions match num)
• -exec (execute a command with results of find command)
• -ok (execute a command interactively with results of find command)
• -o (logical or)
• -print (display results. Usually included.)

find syntax: find path expression action
For example:
• find / -name "*.txt" -print
• find / -name "*.txt" -exec li -l {} \;
(Executes li -l where names found are substituted for {}. The ; indicates the end of the command to be executed, and \ removes its usual interpretation by the shell as a command terminator.)

grep

Search for pattern, for example, grep pattern files. pattern can include regular expressions.
• -c (count lines with matches, but do not list them)
• -l (list names of files with matches, but do not list the lines)
• -n (list line numbers with lines)
• -v (list lines that do not match the pattern)

Expression metacharacters:
• [ ] matches any one character inside.
• with a - in [ ] will match a range of characters
• ^ matches BOL when ^ begins the pattern.
• $ matches EOL when $ ends the pattern.
• . matches any single character. (same as ? in shell)
• * matches 0 or more occurrences of the preceding character. (Note: ".*" is the same as "*" in the shell).

sed

Stream (text) editor, used with editing flat files

sort

Sort and merge files -r (reverse order); -u (keep only unique lines)
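The find and grep options summarized in this section combine naturally. This is a minimal sketch on scratch files; all file names and contents are made up for illustration, and it assumes mktemp -d is available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'alpha\nbeta\nalpha beta\n' > "$workdir/a.txt"
printf 'gamma\n' > "$workdir/b.txt"
printf 'alpha\n' > "$workdir/c.log"

# find: -name selects by file name, -exec runs a command on each match;
# grep -l prints only the names of files that contain the pattern
txt_count=$(find "$workdir" -name "*.txt" -exec grep -l "alpha" {} \; | wc -l | tr -d ' ')

# grep -c counts matching lines, -v selects lines that do NOT match,
# ^ anchors at the beginning of a line, $ anchors at the end
match_lines=$(grep -c "alpha" "$workdir/a.txt")
nonmatch_lines=$(grep -vc "alpha" "$workdir/a.txt")
exact_lines=$(grep -c '^alpha$' "$workdir/a.txt")

echo "$txt_count $match_lines $nonmatch_lines $exact_lines"
rm -r "$workdir"
```

Only a.txt is counted by the find command, because b.txt lacks the pattern and c.log does not match the -name expression.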

Editors

ed

Line editor

vi

Screen editor

INed

LPP editor

emacs

Screen editor +

Shells, Redirection, and Pipelining

< (read)

Redirect standard input, for example, command < file reads input for command from file.

> (write)

Redirect standard output, for example, command > file writes output for command to file overwriting contents of file.

>> (append)

Redirect standard output, for example, command >> file appends output for command to the end of file.

2>

Redirect standard error (to append standard error to a file, use command 2>> file)
Combined redirection examples:
• command < infile > outfile 2> errfile
• command >> appendfile 2>> errfile < infile

;

Command terminator used to string commands on single line

|

Pipe information from one command to the next command. For example, ls | cpio -o > /dev/fd0 passes the results of the ls command to the cpio command.

\

Continuation character to continue command on a new line, will be prompted with > for command continuation

tee

Reads standard input and writes it to both standard output and a file; for example, ls | tee ls.save | sort results in ls output going to ls.save and being piped to the sort command
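The redirection operators and tee from this section can be seen working together. This is a minimal sketch in a scratch directory; every file name is hypothetical, and it assumes mktemp -d is available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'pear\napple\n' > infile     # > creates or overwrites the file
printf 'fig\n' >> infile            # >> appends to the end of the file

# < reads standard input from infile; tee saves a copy in sorted.save
# while the same data continues down the pipe to wc
sort < infile | tee sorted.save | wc -l > count.out

# 2> sends standard error to err.txt, separate from standard output
ls no_such_file > out.txt 2> err.txt || true
if [ -s err.txt ]; then err_captured=yes; else err_captured=no; fi

line_count=$(tr -d ' ' < count.out)
first_sorted=$(head -n 1 sorted.save)
echo "$line_count $first_sorted $err_captured"

cd / && rm -r "$workdir"
```

The error message from ls lands in err.txt while out.txt stays empty, which shows that the two streams are handled independently.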


Metacharacters

*

Any number of characters (0 or more)

?

Any single character

[abc]

Any single character from the list

[a-c]

Any single character from the range

!

Not any of the following characters (for example, [!abc])

;

Command terminator used to string commands on a single line

&

Command preceding & is run in background mode

#

Comment character

\

Removes special meaning (no interpretation) of the following character

'

Removes special meaning (no interpretation) of characters in quotes

"

Interprets only $, backquote, and \ characters between the quotes

`

Used to set variable to results of a command, for example, now=`date` sets the value of now to current results of the date command

$

Preceding variable name indicates the value of the variable
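The quoting characters in this table differ only in how much the shell still interprets. This is a minimal sketch; the variable names are illustrative.

```shell
#!/bin/sh
name="world"

single='hello $name'        # single quotes: nothing inside is interpreted
double="hello $name"        # double quotes: $ is still interpreted
escaped="hello \$name"      # backslash removes the special meaning of $
today=`date +%Y`            # backquotes: replaced by the output of the command

echo "$single"
echo "$double"
echo "$escaped"
echo "$today"
```

Only the double-quoted form substitutes the value of name; the single-quoted and escaped forms keep the literal text $name.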

Physical and Logical Storage

chfs

Changes file system attributes such as mount point, permissions, and size

compress

Reduces the size of the specified file using the adaptive LZ algorithm

crfs

Creates a file system within a previously created logical volume

extendlv

Extends the size of a logical volume

extendvg

Extends a volume group by adding a physical volume

fsck

Checks for file system consistency, and allows interactive repair of file systems

fuser

Lists the process numbers of local processes that use the files specified

lsattr

Lists the attributes of the devices known to the system


lscfg

Gives detailed information about the AIX system hardware configuration

lsdev

Lists the devices known to the system

lsfs

Displays characteristics of the specified file system such as mount points, permissions, and file system size

lslv

Shows you information about a logical volume

lspv

Shows you information about a physical volume in a volume group

lsvg

Shows you information about the volume groups in your system

lvmstat

Controls LVM statistic gathering

migratepv

Used to move physical partitions from one physical volume to another

migratelp

Used to move logical partitions to other physical disks

mkdev

Configures a device

mkfs

Makes a new file system on the specified device

mklv

Creates a logical volume

mkvg

Creates a volume group

mount

Instructs the operating system to make the specified file system available for use from the specified point

quotaon

Starts the disk quota monitor

rmdev

Removes a device

rmlv

Removes logical volumes from a volume group

rmlvcopy

Removes copies from a logical volume

umount

Unmounts a file system from its mount point

uncompress

Restores files compressed by the compress command to their original size

unmount

Exactly the same function as the umount command

varyoffvg

Deactivates a volume group so that it cannot be accessed

varyonvg

Activates a volume group so that it can be accessed


Variables

=

Set a variable (for example, d="day" sets the value of d to "day"), can also set the variable to the results of a command by the ` character, for example, now=`date` sets the value of now to the current result of the date command.

HOME

Home directory

PATH

Path to be checked

SHELL

Shell to be used

TERM

Terminal being used

PS1

Primary prompt characters, usually $ or #

PS2

Secondary prompt characters, usually >

$?

Return code of the last command executed

set

Displays current local variable settings

export

Exports variables so that they are inherited by child processes

env

Displays inherited variables

echo

Echo a message (for example, echo HI or echo $d), can turn off carriage returns with \c at the end of the message, can print a blank line with \n at the end of the message.
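The difference between a local variable and an exported one shows up as soon as a child process is started. This is a minimal sketch; the variable names d and unexported are illustrative.

```shell
#!/bin/sh
d="day"                       # set a local variable (no spaces around =)
unexported="local only"       # never exported, so children will not see it
export d                      # d is now inherited by child processes

# A child shell sees exported variables only.
child_d=$(sh -c 'echo "$d"')
child_u=$(sh -c 'echo "[$unexported]"')

# $? holds the return code of the last command; false always fails.
rc=0
false || rc=$?

echo "child sees d=$child_d, unexported=$child_u, rc=$rc"
```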

Tapes and Diskettes

dd

Reads a file in, converts the data (if required), and copies the file out

fdformat

Formats diskettes or read/write optical media disks

flcopy

Copies information to and from diskettes

format

AIX command to format a diskette

backup

Backs up individual files
• -i reads file names from standard input
• -v list files as backed up; for example, backup -iv -f/dev/rmt0 file1, file2
• -u backup file system at specified level; for example, backup -level -u filesystem

Can pipe list of files to be backed up into command, for example, find . -print | backup -ivf/dev/rmt0 where you are in directory to be backed up.

mksysb

Creates an installable image of the root volume group

restore

Restores files from backup
• -x restores files created with backup -i
• -v list files as restored
• -T list files stored on tape or diskette
• -r restores file system created with backup -level -u; for example, restore -xv -f/dev/rmt0

cpio

Copies to and from an I/O device. Destroys all data previously on tape or diskette. For input, must be able to place files in the same relative (or absolute) path name as when copied out (can determine path names with -it option). For input, if file exists, compares last modification date and keeps most recent (can override with -u option).
• -o (output)
• -i (input)
• -t (table of contents)
• -v (verbose)
• -d (create needed directory for relative path names)
• -u (unconditional to override last modification date)
For example, cpio -o > /dev/fd0 or cpio -iv file1 < /dev/fd0

tapechk

Performs simple consistency checking for streaming tape drives

tcopy

Copies information from one tape device to another

tctl

Sends commands to a streaming tape device

tar

Alternative utility to back up and restore files

pax

Alternative utility to cpio and tar commands

Transmitting

mail

Send and receive mail. With userID sends mail to userID. Without userID, displays your mail. When processing your mail, at the ? prompt for each mail item, you can:
• d - delete
• s - append
• q - quit
• enter - skip
• m - forward

mailx

Upgrade of mail

uucp

Copy file to other UNIX systems (UNIX to UNIX copy)


uuto/uupick

Send and retrieve files to public directory

uux

Execute on remote system (UNIX to UNIX execute)

System administration

df

Display file system usage

installp

Install program

kill (pid)

Kill the process with the given process ID (PID) (find it using ps); kill -9 PID will unconditionally kill the process

mount

Associate logical volume to a directory; for example, mount device directory

ps -ef

Shows process status (ps -ef)

umount

Disassociate file system from directory

smit

System management interface tool

Miscellaneous

banner

Displays banner

date

Displays current date and time

newgrp

Change active groups

nice

Assigns lower priority to following command (for example, nice ps -f)

passwd

Modifies current password

sleep n

Sleep for n seconds

stty

Show or set terminal settings

touch

Creates a zero-length file

xinit

Initiate X-Windows

wall

Sends message to all logged in users

who

List users currently logged in (who am i identifies this user)

man,info

Displays manual pages


System files

/etc/group

List of groups

/etc/motd

Message of the day, displayed at login

/etc/passwd

List of users and signon information. Password shown as !, can prevent password checking by editing to remove !

/etc/profile

System wide user profile executed at login, can override variables by resetting in the user's .profile file

/etc/security

Directory not accessible to normal users

/etc/security/environ

User environment settings

/etc/security/group

Group attributes

/etc/security/limits

User limits

/etc/security/login.cfg

Login settings

/etc/security/passwd

User passwords

/etc/security/user

User attributes, password restrictions

Shell programming summary

Variables

var=string

Set variable to equal string (no spaces around the = sign). A string containing spaces must be enclosed in double quotes. Special characters in string must be enclosed in single quotes to prevent substitution. Piping (|), redirection (<, >, >>), and & symbols are not interpreted.

$var

Gives value of var in a command

echo

Displays value of var, for example, echo $var

HOME

= Home directory of user

MAIL

= Mail file name

PS1

= Primary prompt characters, usually "$" or "#"

PS2

= Secondary prompt characters, usually ">"

PATH

= Search path

TERM

= Terminal type being used

export

Exports variables to the environment

env

Displays environment variables settings

${var:-string}

Gives value of var in a command, if var is null, uses string instead

$1 $2 $3...

Positional parameters for the arguments passed into the shell script

$*

Used for all arguments passed into shell script

$#

Number of arguments passed into shell script

$0

Name of shell script

$$

Process ID (PID)

$?

Last return code from a command
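The variable entries above can be tried together in a short script. This is a minimal sketch, assuming a POSIX-style shell such as sh or ksh; the function name describe_args and the sample strings are illustrative and do not come from the course material.

```shell
#!/bin/sh
# Double quotes keep the space; single quotes also block substitution.
greeting="hello world"
literal='$greeting stays as typed'

# describe_args is a hypothetical helper, not part of the course material.
describe_args() {
    # $# is the argument count; ${1:-none} yields "none" when $1 is unset or null.
    echo "count=$# first=${1:-none}"
}

echo "$greeting"            # prints: hello world
echo "$literal"             # prints: $greeting stays as typed
describe_args alpha beta    # prints: count=2 first=alpha
describe_args               # prints: count=0 first=none
```

Inside a function, $# and $1 refer to the function's own arguments, which is why the helper can be called with different argument lists in the same script.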

Commands

#

Comment designator

&&

Logical-and. Run command following && only if command preceding && succeeds (return code = 0)

||

Logical-or. Run command following || only if command preceding || fails (return code <> 0)

exit n

Used to pass return code n from shell script, passed as variable $? to parent shell
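A short sketch of how &&, ||, and return codes interact. The helper check_dir and the path /no/such/dir are assumptions made for illustration (the path is presumed not to exist), and a subshell stands in for a child script that calls exit.

```shell
#!/bin/sh
# check_dir is a hypothetical helper: return code 0 if $1 is a directory, 2 otherwise.
check_dir() {
    [ -d "$1" ] && return 0
    return 2
}

# && runs echo only because check_dir succeeded (return code 0).
check_dir / && echo "/ is a directory"

# || runs echo only because check_dir failed; $? still holds its return code.
check_dir /no/such/dir || echo "check failed with return code $?"

# A subshell stands in for a child script that ends with exit 3.
( exit 3 ) || echo "child returned $?"
```

Because $? is overwritten by every command, it is read here immediately after the command of interest.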

expr

Arithmetic expressions. Syntax: expr expression1 operator expression2. Operators: + (add) - (subtract) \* (multiply) / (divide) % (remainder)
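As an illustration of the expr syntax, the following sketch evaluates several operators on two sample values. The variable names and values are arbitrary, and the $(...) form of command substitution is assumed to be available (it is in ksh and POSIX sh).

```shell
#!/bin/sh
a=7
b=3
# Each operand and operator is a separate argument, so the spaces are required.
sum=$(expr "$a" + "$b")
# The asterisk is escaped so the shell does not expand it as a file name pattern.
product=$(expr "$a" \* "$b")
# expr works on integers, so division truncates.
quotient=$(expr "$a" / "$b")
remainder=$(expr "$a" % "$b")
echo "$sum $product $quotient $remainder"   # prints: 10 21 2 1
```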

for loop

for n (or: for variable in $*)
do
command
done
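Both forms of the for loop can be sketched as follows. The function names show_args and count_args are hypothetical; the second function relies on the shell behavior that a for loop without "in" walks the positional parameters.

```shell
#!/bin/sh
# show_args is a hypothetical helper that prints each argument on its own line.
show_args() {
    for arg in $*
    do
        echo "arg: $arg"
    done
}

# count_args uses the short form: without "in", for iterates over $1, $2, ...
count_args() {
    n=0
    for arg
    do
        n=$(expr "$n" + 1)
    done
    echo "$n"
}

show_args a b c
count_args a b c    # prints: 3
```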

if-then-else

if test expression
then command
elif test expression
then command
else command
fi
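A minimal sketch of the if, elif, else structure, using a hypothetical function classify that reports the sign of an integer argument. Note that else is not followed by then.

```shell
#!/bin/sh
# classify is a hypothetical helper, not part of the course material.
classify() {
    if test "$1" -lt 0
    then
        echo negative
    elif test "$1" -eq 0
    then
        echo zero
    else
        echo positive
    fi
}

classify -5    # prints: negative
classify 0     # prints: zero
classify 12    # prints: positive
```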

read

Read from standard input

shift

Shifts arguments 1-9 one position to the left and decrements number of arguments
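The effect of shift can be seen in a small loop that discards arguments until one is left. The function name last_arg is an assumption for this sketch.

```shell
#!/bin/sh
# last_arg is a hypothetical helper that prints its final argument.
last_arg() {
    # Each shift discards $1, moves the rest one position left, and decrements $#.
    while [ $# -gt 1 ]
    do
        shift
    done
    echo "$1"
}

last_arg one two three    # prints: three
```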

test

Used for conditional tests. Has two formats:

if test expression (for example, if test $# -eq 2)

if [ expression ] (for example, if [ $# -eq 2 ]) (spaces required)

Integer operators: -eq (=) -ne (<>) -lt (<) -le (<=) -gt (>) -ge (>=)

String operators: = != (not eq.) -z (zero length)

File status (for example, -opt file1):
• -f (ordinary file)
• -r (readable by this process)
• -w (writable by this process)
• -x (executable by this process)
• -s (non-zero length)

while loop

while test expression
do
command
done
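The bracket form of test, the string and file operators, and the while loop can be combined as in the sketch below. The function names file_state and count_to are hypothetical, and /etc/passwd is used only as a file that is expected to exist.

```shell
#!/bin/sh
# file_state is a hypothetical helper that classifies a path with test operators.
file_state() {
    if [ -z "$1" ]          # string test: zero-length argument
    then
        echo "no name"
    elif [ ! -f "$1" ]      # file test: not an ordinary file
    then
        echo "not a file"
    elif [ -s "$1" ]        # file test: size greater than zero
    then
        echo "non-empty"
    else
        echo "empty"
    fi
}

# count_to is a hypothetical helper that prints 1 through $1, one number per line.
count_to() {
    i=1
    while test "$i" -le "$1"
    do
        echo "$i"
        i=$(expr "$i" + 1)
    done
}

file_state /etc/passwd
count_to 3
```

The spaces around [ and ] are required because [ is itself a command and ] is its last argument.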

Miscellaneous

sh

Execute shell script in the sh shell. The -x option prints each command as it is executed and is used for debugging shell scripts.

vi Editor

Entering vi

vi file

Edits the file named file

vi file file2

Edit files consecutively (through :n)

.exrc

File that contains the vi profile

wm=nn

Sets wrap margin to nn. vi can open a file at a position other than the first line by adding + (last line), +n (line n), or +/pattern (first occurrence of pattern) to the vi command.

vi -r

Lists saved files

vi -r file

Recover file named file from crash

:n

Next file in stack

:set all

Show all options

:set nu

Display line numbers (off when set nonu)

:set list

Display control characters in file

:set wm=n

Set wrap margin to n

:set showmode

Sets display of "INPUT" when in input mode

Read, write, exit

:w

Write buffer contents

:w file2

Write buffer contents to file2

:w >> file2

Write buffer contents to end of file2

:q

Quit editing session

:q!

Quit editing session and discard any changes

:r file2

Read file2 contents into buffer following current cursor

:r! com

Read results of shell command com following current cursor

:!

Execute shell command (filter through command)

:wq or ZZ

Write and quit edit session

Units of measure

h, l

Character left, character right

k or Ctrl-p

Move cursor to character above cursor

j or Ctrl-n

Move cursor to character below cursor

w, b

Word right, word left

^, $

Beginning, end of current line

Enter or +

Beginning of next line

-

Beginning of previous line

G

Last line of buffer

Cursor movements

Cursor movement commands (including the cursor arrow keys) can be preceded by the number of times to repeat; for example, 9--> moves right nine characters.

0

Move to first character in line

$

Move to last character in line

^

Move to first nonblank character in line

fx

Move right to character x

Fx

Move left to character x

tx

Move right to character preceding character x

Tx

Move left to character preceding character x

;

Find next occurrence of x in same direction

,

Find next occurrence of x in opposite direction

w

Tab word (nw = n tab word) (punctuation is a word)

W

Tab word (nW = n tab word) (ignore punctuation)

b

Backtab word (punctuation is a word)

B

Backtab word (ignore punctuation)

e

Tab to ending char. of next word (punctuation is a word)

E

Tab to ending char. of next word (ignore punctuation)

(

Move to beginning of current sentence

)

Move to beginning of next sentence

{

Move to beginning of current paragraph

}

Move to beginning of next paragraph

H

Move to first line on screen

M

Move to middle line on screen

L

Move to last line on screen

Ctrl-f

Scroll forward 1 screen (3 lines overlap)

Ctrl-d

Scroll forward 1/2 screen

Ctrl-b

Scroll backward 1 screen (0 line overlap)

Ctrl-u

Scroll backward 1/2 screen

G

Go to last line in file

nG

Go to line n

Ctrl-g

Display current line number

Search and replace

/pattern

Search forward for pattern

?pattern

Search backward for pattern

n

Repeat find in the same direction

N

Repeat find in the opposite direction

Adding text

a

Add text after the cursor (end with Esc)

A

Add text at end of current line (end with Esc)

i

Add text before the cursor (end with Esc)

I

Add text before first nonblank character in current line

o

Add line following current line

O

Add line before current line

Esc

Return to command mode

Deleting text

Ctrl-w

Undo entry of current word

@

Kill the insert on this line

x

Delete current character

dw

Delete to end of current word (observe punctuation)

dW

Delete to end of current word (ignore punctuation)

dd

Delete current line

D

Erase to end of line (same as d$)

d)

Delete current sentence

d}

Delete current paragraph

dG

Delete current line through end of buffer

d^

Delete to the beginning of line

u

Undo last change command

U

Restore current line to original state before modification

Replacing text

ra

Replace current character with a

R

Replace all characters overtyped until Esc is entered

s

Delete current character and append text until Esc

:s/s1/s2

Replace s1 with s2 (in the same line only)

S

Delete all characters in the line and append text

cc

Replace all characters in the line (same as S)

ncx

Delete n text objects of type x (w, b = words, ) = sentences, } = paragraphs, $ = end-of-line, ^ = beginning of line) and enter append mode

C

Replace all characters from cursor to end-of-line

Moving text

p

Paste last text deleted after cursor (xp will transpose 2 characters)

P

Paste last text deleted before cursor

nYx

Yank n text objects of type x (w, b = words, ) = sentences, } = paragraphs, $ = end-of-line; no x indicates lines). Can then paste them with the p command. Yank does not delete the original.

"ayy"

Can use named registers for moving, copying, and cut/paste: "ayy yanks the current line into register a (use registers a-z); the text can then be pasted with the "ap command.

Miscellaneous

.

Repeat last command

J

Join current line with next line

Appendix C. AIX dump code and progress codes

This appendix is an extract from the AIX 4.3 Messages Guide and Reference.

0c0 - 0cc

0c0

A user-requested dump completed successfully.

0c1

An I/O error occurred during the dump.

0c2

A user-requested dump is in progress. Wait at least one minute for the dump to complete.

0c4

The dump ran out of space. Partial dump is available.

0c5

The dump failed due to an internal failure. A partial dump may exist.

0c7

Progress indicator. Remote dump is in progress.

0c8

The dump device is disabled. No dump device configured.

0c9

A system-initiated dump has started. Wait at least one minute for the dump to complete.

0cc

(AIX 4.2.1 and later) An error occurred writing to the primary dump device. It switched over to the secondary.
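For scripts or notes that need to translate a dump code into text, the 0c0 - 0cc list above can be expressed as a lookup. This is only a sketch: the function name dump_status is hypothetical, and the messages are condensed from the entries above rather than taken from any AIX command output.

```shell
#!/bin/sh
# dump_status is a hypothetical helper; the wording is condensed from the list above.
dump_status() {
    case "$1" in
        0c0) echo "user-requested dump completed successfully" ;;
        0c1) echo "I/O error occurred during the dump" ;;
        0c2) echo "user-requested dump in progress" ;;
        0c4) echo "dump ran out of space; partial dump available" ;;
        0c5) echo "dump failed due to an internal failure" ;;
        0c7) echo "remote dump in progress" ;;
        0c8) echo "dump device disabled; no dump device configured" ;;
        0c9) echo "system-initiated dump has started" ;;
        0cc) echo "switched to the secondary dump device" ;;
        *)   echo "not a dump code" ;;
    esac
}

dump_status 0c4
```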

100 - 195

100

Progress indicator. BIST completed successfully.

101

Progress indicator. Initial BIST started following system reset.

102

Progress indicator. BIST started following power on reset.

103

BIST could not determine the system model number.

104

BIST could not find the common on-chip processor bus address.

105

BIST could not read from the on-chip sequencer EPROM.

106

BIST detected a module failure.

111

On-chip sequencer stopped. BIST detected a module error.

112

Checkstop occurred during BIST and checkstop results could not be logged out.

113

The BIST checkstop count equals 3, meaning that three system restarts were unsuccessful. System halts.

120

Progress indicator. BIST started CRC check on the EPROM.

121

BIST detected a bad CRC on the on-chip sequencer EPROM.

122

Progress indicator. BIST started a CRC check on the EPROM.

123

BIST detected a bad CRC on the on-chip sequencer NVRAM.

124

Progress indicator. BIST started a CRC check on the on-chip sequencer NVRAM.

125

BIST detected a bad CRC on the time-of-day NVRAM.

126

Progress indicator. BIST started a CRC check on the time-of-day NVRAM.

127

BIST detected a bad CRC on the EPROM.

130

Progress indicator. BIST presence test has started.

140

BIST was unsuccessful. The system halts.

142

BIST was unsuccessful. The system halts.

143

Invalid memory configuration

144

BIST was unsuccessful. The system halts.

151

Progress indicator. BIST has started.

152

Progress indicator. BIST has started direct-current logic self-test (DCLST) code.

153

Progress indicator. BIST has started.

154

Progress indicator. BIST has started array self-test (AST) test code.

160

BIST detected a missing early power-off warning (EPOW) connector.

161

The Bump quick I/O tests failed.

162

The JTAG tests failed.

164

BIST encountered an error while reading low NVRAM.

165

BIST encountered an error while writing low NVRAM.

166

BIST encountered an error while reading high NVRAM.

167

BIST encountered an error while writing high NVRAM.

168

BIST encountered an error while reading the serial input/output register.

169

BIST encountered an error while writing the serial input/output register.

180

Progress indicator. The BIST checkstop logout is in progress.

182

BIST COP bus is not responding.

185

Checkstop occurred during BIST.

186

System logic-generated checkstop (Model 250 only).

187

BIST was unable to identify the chip release level in the checkstop logout data.

195

Progress indicator. The BIST checkstop logout completed.

200 - 299, 2e6 - 2e7

200

Key mode switch is in the secure position.

201

Checkstop occurred during system restart. If a 299 LED was shown before, recreate the boot logical volume (bosboot).

202

Unexpected machine check interrupt, system halts

203

Unexpected data storage interrupt, system halts

204

Unexpected instruction storage interrupt, system halts

205

Unexpected external interrupt, system halts

206

Unexpected alignment interrupt, system halts

207

Unexpected program interrupt, system halts

208

Machine check due to an L2 uncorrectable ECC, system halts

209

Reserved, system halts

210

Unexpected switched virtual circuit (SVC) 1000 interrupt, system halts

211

IPL ROM CRC miscompare occurred during system restart, system halts

212

POST found processor to be bad, system halts

213

POST failed. No good memory could be detected. The system halts.

214

An I/O planar failure has been detected. The power status register, the time-of-day clock, or NVRAM on the I/O planar failed. The system halts.

215

Progress indicator. The level of voltage supplied to the system is too low to continue a system restart.

216

Progress indicator. The IPL ROM code is being uncompressed into memory for execution.

217

Progress indicator. The system has encountered the end of the boot devices list. The system continues to loop through the boot devices list.

218

Progress indicator. POST is testing for 1MB of good memory.

219

Progress indicator. POST bit map is being generated.

21c

L2 cache not detected as part of systems configuration (when LED persists for 2 seconds).

220

Progress indicator. IPL control block is being initialized.

221

An NVRAM CRC miscompare occurred while loading the operating system with the key mode switch in Normal position. System halts.

222

Progress indicator. Attempting a Normal-mode system restart from the standard I/O planar-attached devices. System retries.

223

Progress indicator. Attempting a Normal-mode system restart from the SCSI-attached devices specified in the NVRAM list.

224

Progress indicator. Attempting a Normal-mode system restart from the 9333 High-Performance Disk Drive Subsystem.

225

Progress indicator. Attempting a Normal-mode system restart from the bus-attached internal disk.

226

Progress indicator. Attempting a Normal-mode system restart from Ethernet.

227

Progress indicator. Attempting a Normal-mode system restart from token ring.

228

Progress indicator. Attempting a Normal-mode system restart using the expansion code devices list, but cannot restart from any of the devices in the list.

229

Progress indicator. Attempting a Normal-mode system restart from devices in NVRAM boot devices list, but cannot restart from any of the devices in the list. System retries.

22c

Progress indicator. Attempting a Normal-mode IPL from FDDI specified in the NVRAM device list.

230

Progress indicator. Attempting a Normal-mode system restart from Family 2 Feature ROM specified in the IPL ROM default devices list.

231

Progress indicator. Attempting a Normal-mode system restart from Ethernet specified by selection from ROM menus.

232

Progress indicator. Attempting a Normal-mode system restart from the standard I/O planar-attached devices specified in the IPL ROM default device list.

233

Progress indicator. Attempting a Normal-mode system restart from the SCSI-attached devices specified in the IPL ROM default device list.

234

Progress indicator. Attempting a Normal-mode system restart from the 9333 High-Performance Disk Drive Subsystem specified in the IPL ROM default device list.

235

Progress indicator. Attempting a Normal-mode system restart from the bus-attached internal disk specified in the IPL ROM default device list.

236

Progress indicator. Attempting a Normal-mode system restart from the Ethernet specified in the IPL ROM default device list.

237

Progress indicator. Attempting a Normal-mode system restart from the token ring specified in the IPL ROM default device list.

238

Progress indicator. Attempting a Normal-mode system restart from the token-ring specified by selection from ROM menus.

239

Progress indicator. A Normal-mode menu selection failed to boot.

23c

Progress indicator. Attempting a Normal-mode IPL from FDDI in IPL ROM device list.

240

Progress indicator. Attempting a Service-mode system restart from the Family 2 Feature ROM specified in the NVRAM boot devices list.

241

Attempting a Normal-mode system restart from devices specified in NVRAM bootlist.

242

Progress indicator. Attempting a Service-mode system restart from the standard I/O planar-attached devices specified in the NVRAM boot devices list.

243

Progress indicator. Attempting a Service-mode system restart from the SCSI-attached devices specified in the NVRAM boot devices list.

244

Progress indicator. Attempting a Service-mode system restart from the 9333 High-Performance Disk Drive Subsystem specified in the NVRAM boot devices list.

245

Progress indicator. Attempting a Service-mode system restart from the bus-attached internal disk specified in the NVRAM boot devices list.

246

Progress indicator. Attempting a Service-mode system restart from the Ethernet specified in the NVRAM boot devices list.

247

Progress indicator. Attempting a Service-mode system restart from the Token-Ring specified in the NVRAM boot devices list.

248

Progress indicator. Attempting a Service-mode system restart using the expansion code specified in the NVRAM boot devices list.

249

Progress indicator. Attempting a Service-mode system restart from devices in NVRAM boot devices list, but cannot restart from any of the devices in the list.

250

Progress indicator. Attempting a Service-mode system restart from the Family 2 Feature ROM specified in the IPL ROM default devices list.

251

Progress indicator. Attempting a Service-mode system restart from Ethernet by selection from ROM menus.

252

Progress indicator. Attempting a Service-mode system restart from the standard I/O planar-attached devices specified in the IPL ROM default devices list.

253

Progress indicator. Attempting a Service-mode system restart from the SCSI-attached devices specified in the IPL ROM default devices list.

254

Progress indicator. Attempting a Service-mode system restart from the 9333 High-Performance Subsystem devices specified in the IPL ROM default devices list.

255

Progress indicator. Attempting a Service-mode system restart from the bus-attached internal disk specified in the IPL ROM default devices list.

256

Progress indicator. Attempting a Service-mode system restart from the Ethernet specified in the IPL ROM default devices list.

257

Progress indicator. Attempting a Service-mode system restart from the token ring specified in the IPL ROM default devices list.

258

Progress indicator. Attempting a Service-mode system restart from the token ring specified by selection from ROM menus.

259

Progress indicator. Attempting a Service-mode system restart from FDDI specified by the operator.

260

Progress indicator. Menus are being displayed on the local display or terminal connected to your system. The system waits for input from the terminal.

261

No supported local system display adapter was found. The system waits for a response from an asynchronous terminal on serial port 1.

262

No local system keyboard was found.

263

Progress indicator. Attempting a Normal-mode system restart from the Family 2 Feature ROM specified in the NVRAM boot devices list.

269

Progress indicator. Cannot boot system, end of bootlist reached.

270

Progress indicator. Ethernet/FDX 10 Mbps MC adapter POST is running.

271

Progress indicator. Mouse and mouse port POST are running.

272

Progress indicator. Tablet port POST is running.

276

Progress indicator. A 10/100 Mbps Ethernet MC adapter POST is running.

277

Progress indicator. Auto Token Ring LAN streamer MC 32 adapter POST is running.

278

Progress indicator. Video ROM scan POST is running.

279

Progress indicator. FDDI POST is running.

280

Progress indicator. 3Com Ethernet POST is running.

281

Progress indicator. Keyboard POST is running.

282

Progress indicator. Parallel port POST is running.

283

Progress indicator. Serial port POST is running.

284

Progress indicator. POWER Gt1 graphics adapter POST is running.

285

Progress indicator. POWER Gt3 graphics adapter POST is running.

286

Progress indicator. Token Ring adapter POST is running.

287

Progress indicator. Ethernet adapter POST is running.

288

Progress indicator. Adapter slot cards are being queried.

289

Progress indicator. POWER Gt0 graphics adapter POST is running.

290

Progress indicator. I/O planar test started.

291

Progress indicator. Standard I/O planar POST is running.

292

Progress indicator. SCSI POST is running.

293

Progress indicator. Bus-attached internal disk POST is running.

294

Progress indicator. TCW SIMM in slot J is bad.

295

Progress indicator. Color Graphics Display POST is running.

296

Progress indicator. Family 2 Feature ROM POST is running.

297

Progress indicator. System model number could not be determined. System halts.

298

Progress indicator. Attempting a warm system restart.

299

Progress indicator. IPL ROM passed control to loaded code.

2e6

Progress indicator. A PCI Ultra/Wide differential SCSI adapter is being configured.

2e7

An undetermined PCI SCSI adapter is being configured.

500 - 599, 5c0 - 5c6

500

Progress indicator. Querying standard I/O slot.

501

Progress indicator. Querying card in slot 1.

502

Progress indicator. Querying card in slot 2.

503

Progress indicator. Querying card in slot 3.

504

Progress indicator. Querying card in slot 4.

505

Progress indicator. Querying card in slot 5.

506

Progress indicator. Querying card in slot 6.

507

Progress indicator. Querying card in slot 7.

508

Progress indicator. Querying card in slot 8.

510

Progress indicator. Starting device configuration.

511

Progress indicator. Device configuration completed.

512

Progress indicator. Restoring device configuration from media.

513

Progress indicator. Restoring BOS installation files from media.

516

Progress indicator. Contacting server during network boot.

517

Progress indicator. The / (root) and /usr file systems are being mounted.

518

Mount of the /usr file system was not successful. System halts.

520

Progress indicator. BOS configuration is running.

521

The /etc/inittab file has been incorrectly modified or is damaged. The configuration manager was started from the /etc/inittab file with invalid options. System halts.

522

The /etc/inittab file has been incorrectly modified or is damaged. The configuration manager was started from the /etc/inittab file with conflicting options. System halts.

523

The /etc/objrepos file is missing or inaccessible.

524

The /etc/objrepos/Config_Rules file is missing or inaccessible.

525

The /etc/objrepos/CuDv file is missing or inaccessible.

526

The /etc/objrepos/CuDvDr file is missing or inaccessible.

527

You cannot run Phase 1 at this point. The /sbin/rc.boot file has probably been incorrectly modified or is damaged.

528

The /etc/objrepos/Config_Rules file has been incorrectly modified or is damaged, or a program specified in the file is missing.

529

There is a problem with the device containing the ODM database or the root file system is full.

530

The savebase command was unable to save information about the base customized devices onto the boot device during Phase 1 of system boot. System halts.

531

The /usr/lib/objrepos/PdAt file is missing or inaccessible. System halts.

532

There is not enough memory for the configuration manager to continue. System halts.

533

The /usr/lib/objrepos/PdDv file has been incorrectly modified or is damaged, or a program specified in the file is missing.

534

The configuration manager is unable to acquire a database lock. System halts.

535

A HIPPI diagnostics interface driver is being configured.

536

The /etc/objrepos/Config_Rules file has been incorrectly modified or is damaged. System halts.

537

The /etc/objrepos/Config_Rules file has been incorrectly modified or is damaged. System halts.

538

Progress indicator. The configuration manager is passing control to a configuration method.

539

Progress indicator. The configuration method has ended and control has returned to the configuration manager.

540

Progress indicator. Configuring child of IEEE-1284 parallel port.

544

Progress indicator. An ECP peripheral configure method is executing.

545

Progress indicator. A parallel port ECP device driver is being configured.

546

IPL cannot continue due to an error in the customized database.

547

Rebooting after error recovery (LED 546 precedes this LED).

548

Restbase failure.

549

Console could not be configured for the “Copy a System Dump” menu.

550

Progress indicator. ATM LAN emulation device driver is being configured.

551

Progress indicator. A varyon operation of the rootvg is in progress.

552

The ipl_varyon command failed with a return code not equal to 4, 7, 8 or 9 (ODM or malloc failure). System is unable to vary on the rootvg.

553

The /etc/inittab file has been incorrectly modified or is damaged. Phase 1 boot is completed and the init command started.

554

The IPL device could not be opened or a read failed (hardware not configured or missing).

555

The fsck -fp /dev/hd4 command on the root file system failed with a non-zero return code.

556

LVM subroutine error from ipl_varyon.

557

The root file system could not be mounted. The problem is usually due to bad information on the log logical volume (/dev/hd8) or the boot logical volume (hd5) has been damaged.

558

Not enough memory is available to continue system restart.

559

Less than 2 MB of good memory are left for loading the AIX kernel. System halts.

560

Unsupported monitor is attached to the display adapter.

561

Progress indicator. The TMSSA device is being identified or configured.

565

Configuring the MWAVE subsystem.

566

Progress indicator. Configuring Namkan twinax common card.

567

Progress indicator. Configuring High-Performance Parallel Interface (HIPPI) device driver (fpdev).

568

Progress indicator. Configuring High-Performance Parallel Interface (HIPPI) device driver (fphip).

569

Progress indicator. FCS SCSI protocol device is being configured.

570

Progress indicator. A SCSI protocol device is being configured.

571

HIPPI common functions driver is being configured.

572

HIPPI IPI-3 master mode driver is being configured.

573

HIPPI IPI-3 slave mode driver is being configured.

574

HIPPI IPI-3 user-level interface is being configured.

575

A 9570 disk-array driver is being configured.

576

Generic async device driver is being configured.

577

Generic SCSI device driver is being configured.

578

Generic common device driver is being configured.

579

Device driver is being configured for a generic device.

580

Progress indicator. A HIPPI-LE interface (IP) layer is being configured.

581

Progress indicator. TCP/IP is being configured. The configuration method for TCP/IP is being run.

582

Progress indicator. Token ring data link control (DLC) is being configured.

583

Progress indicator. Ethernet data link control (DLC) is being configured.

584

Progress indicator. IEEE Ethernet (802.3) data link control (DLC) is being configured.

585

Progress indicator. SDLC data link control (DLC) is being configured.

586

Progress indicator. X.25 data link control (DLC) is being configured.

587

Progress indicator. Netbios is being configured.

588

Progress indicator. Bisync read-write (BSCRW) is being configured.

589

Progress indicator. SCSI target mode device is being configured.

590

Progress indicator. Diskless remote paging device is being configured.

591

Progress indicator. Logical Volume Manager device driver is being configured.

592

Progress indicator. An HFT device is being configured.

593

Progress indicator. SNA device driver is being configured.

594

Progress indicator. Asynchronous I/O is being defined or configured.

595

Progress indicator. X.31 pseudo device is being configured.

596

Progress indicator. SNA DLC/LAPE pseudo device is being configured.

597

Progress indicator. Outboard communication server (OCS) is being configured.

598

Progress indicator. OCS hosts are being configured during system reboot.

599

Progress indicator. FDDI data link control (DLC) is being configured.

5c0

Progress indicator. Streams-based hardware driver being configured.

5c1

Progress indicator. Streams-based X.25 protocol stack being configured.

5c2

Progress indicator. Streams-based X.25 COMIO emulator driver being configured.

5c3

Progress indicator. Streams-based X.25 TCP/IP interface driver being configured.

5c4

Progress indicator. FCS adapter device driver being configured.

5c5

Progress indicator. SCB network device driver for FCS is being configured.

5c6

Progress indicator. AIX SNA channel being configured.

c00 - c99

c00

AIX Install/Maintenance loaded successfully.

c01

Insert the AIX Install/Maintenance diskette.

c02

Diskettes inserted out of sequence.

c03

Wrong diskette inserted.

c04

Irrecoverable error occurred.

c05

Diskette error occurred.

c06

The rc.boot script is unable to determine the type of boot.

c07

Insert next diskette.

c08

RAM file system started incorrectly.

c09

Progress indicator. Writing to or reading from diskette.

c10

Platform-specific bootinfo is not in boot image.

c20

Unexpected system halt occurred. System is configured to enter the kernel debug program instead of performing a system dump. Enter bosboot -D for information about kernel debugger enablement.

c21

The ifconfig command was unable to configure the network for the client network host.

c25

Client did not mount remote mini root during network install.

c26

Client did not mount the /usr file system during the network boot.

c29

System was unable to configure the network device.

c31

If a console has not been configured, the system pauses with this value and then displays instructions for choosing a console.

c32

Progress indicator. Console is a high-function terminal.

c33

Progress indicator. Console is a tty.

c34

Progress indicator. Console is a file.

c40

Extracting data files from media.

c41

Could not determine the boot type or device.

c42

Extracting data files from diskette.

c43

Could not access the boot or installation tape.

c44

Initializing installation database with target disk information.

c45

Cannot configure the console. The cfgcon command failed.

c46

Normal installation processing.

c47

Could not create a PVID on a disk. The chgdisk command failed.

c48

Prompting you for input. BosMenus is being run.

c49

Could not create or form the JFS log.

c50

Creating rootvg on target disk.

c51

No paging devices were found.

c52

Changing from RAM environment to disk environment.

c53

Not enough space in /tmp to do a preservation installation. Make /tmp larger.

c54

Installing either BOS or additional packages.

c55

Could not remove the specified logical volume in a preservation installation.

c56

Running user-defined customization.

c57

Failure to restore BOS.

c58

Displaying message to turn the key.

c59

Could not copy either device special files, device ODM, or volume group information from RAM to disk.

c61

Failed to create the boot image.

c70

Problem mounting diagnostic CD-ROM disk in stand-alone mode.

c99

Progress indicator. The diagnostic programs have completed.

Appendix D. Auditing security related events

Appendix objectives

After completing this appendix, you should be able to:
• Configure the auditing subsystem

Figure D-1. Appendix objectives

Instructor notes: Purpose — Present the objectives for this appendix. Details — Additional information — Transition statement — Let’s start with an overview of how the auditing subsystem works.

How the auditing subsystem works

[Figure: the kernel and applications report audit events through /dev/audit. The audit logger reads them and writes audit records in BIN mode or STREAM mode.]

Figure D-2. How the auditing subsystem works

Notes:

Function of auditing subsystem
The AIX auditing subsystem provides a way to trace security-relevant events, such as access to an important system file or the execution of applications that might influence the security of your system.

Operation of auditing subsystem
The auditing subsystem works in the following way. The AIX kernel, or another security-related application, uses a system call to report the security-related event to the auditing subsystem. This system call writes the auditing information to the special file /dev/audit. An audit logger reads the audit information from this device, formats it, and writes the audit record either to files (in BIN mode) or to a specified device, for example a display or a printer (in STREAM mode).

Instructor notes: Purpose — Describe how auditing works. Details — Base your explanation on the information in the student materials. Additional information — The auditing subsystem enables you to capture any event on the system that changes the state of the security of your system. This visual should be used to set the scene for the entire session. An auditable event is any security-relevant occurrence in the system. The programs and kernel modules that detect auditable events are responsible for reporting these events to the system audit logger. The fileset which is required to enable auditing is the bos.rte.security fileset. Transition statement — Let’s discuss the configuration files for auditing.

Auditing configuration files

/etc/security/audit/objects   Contains the audit events triggered by file access
/etc/security/audit/events    Contains information about system audit events and responses to those events
/etc/security/audit/config    Contains audit configuration information:
                              - Start mode
                              - Audit classes
                              - Audited users

Figure D-3. Auditing configuration files

Notes:

Introduction
All audit configuration files reside in the directory /etc/security/audit. The individual configuration files used by the auditing subsystem are described in the material that follows.

The objects file
This file describes all files and programs that are audited. For each file, a unique audit event name is specified. These files are monitored by the AIX kernel.

The events file
This file contains one stanza called auditpr. Each audit event is named, and the format of the output produced by each event is defined in this stanza. The auditpr command writes all audit output based on the information in this file.

The config file
This file contains audit configuration information:
- The start mode for the audit logger (BIN or STREAM mode)
- Audit classes: groups of audit events. Each audit class name must be less than 16 characters and must be unique to the system. AIX supports up to 32 audit classes.
- Audited users: the users whose activities you wish to monitor are defined in the users stanza. A users stanza determines which combination of user and event class to monitor.

Instructor notes: Purpose — Provide information about the most important audit configuration files. Details — Base your explanation on the information in the student materials. Additional information — None Transition statement — Let’s identify how to configure the auditing subsystem.

Audit configuration: Objects

# vi /etc/security/audit/objects

/etc/security/user:
        w = "S_USER_WRITE"
...
/etc/filesystems:
        w = "MY_EVENT"

/usr/sbin/no:
        x = "MY_X_EVENT"

Figure D-4. Audit configuration: Objects

Notes:

Specifying objects
To configure the auditing subsystem, you first specify the objects (files or applications) that you want to audit in /etc/security/audit/objects. In this file, you find predefined files, for example, /etc/security/user. To audit your own files, you have to add a stanza for each file, in the following format:

file:
        access_mode = "event_name"

An audit event name can be up to 15 bytes long. Valid access modes are read (r), write (w), and execute (x).

Discussion of example on visual
In the example shown on the visual, we add two files. An event MY_EVENT will be generated by the AIX kernel when somebody writes the file /etc/filesystems. Another event MY_X_EVENT will be generated when somebody executes the program /usr/sbin/no. After adding objects, you have to specify formatting information in the events file. That is shown on the next visual.

Note regarding symbolic links
Symbolic links cannot be monitored by the auditing subsystem.
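
The stanza format described in these notes lends itself to a small helper. The following is a minimal sketch and is not part of the course material: the function name add_audit_object and its optional fourth argument (an alternative objects file, useful for trying the function on a scratch copy) are assumptions made for illustration. It checks the access mode and the 15-byte limit on the event name before appending the stanza.

```shell
# Append an object stanza to an audit objects file (hypothetical helper).
# Usage: add_audit_object FILE MODE EVENT [OBJECTS_FILE]
add_audit_object() {
    obj=$1
    mode=$2
    event=$3
    objects=${4:-/etc/security/audit/objects}

    # Valid access modes are read (r), write (w), and execute (x).
    case $mode in
        r|w|x) ;;
        *) echo "invalid access mode: $mode" >&2; return 1 ;;
    esac

    # Event names can be up to 15 bytes long. ${#event} counts characters,
    # which equals bytes for plain ASCII names such as MY_EVENT.
    if [ "${#event}" -gt 15 ]; then
        echo "event name too long: $event" >&2
        return 1
    fi

    # A blank line separates the new stanza from the previous one.
    printf '\n%s:\n\t%s = "%s"\n' "$obj" "$mode" "$event" >> "$objects"
}
```

For example, add_audit_object /etc/filesystems w MY_EVENT /tmp/objects.test appends the first stanza from the visual to a scratch file, which can be reviewed before the real objects file is changed.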

Instructor notes: Purpose — Describe the function of the objects file. Details — Base your explanation on the information in the student materials. Additional information — When running a shell script, only a read event is generated. For an execute event to be triggered, the program must be compiled. Transition statement — Let’s introduce the events configuration file.

Audit configuration: Events

# vi /etc/security/audit/events

auditpr:
        USER_Login = printf "user: %s tty: %s"
        USER_Logout = printf "%s"
        ...
        MY_EVENT = printf "%s"
        MY_X_EVENT = printf "%s"

Figure D-5. Audit configuration: Events

Notes:

Function of /etc/security/audit/events file
All audit system events have a format specification that is used by the auditpr command, which prints the audit record. This format specification is defined in the /etc/security/audit/events file and specifies how the information will be printed when the audit data is analyzed.

Entries in /etc/security/audit/events file
The /etc/security/audit/events file contains just one stanza, auditpr, which lists all the audit events in the system. Each attribute in the stanza is the name of an audit event, where the following formats are possible:

AuditEvent = printf "format-string"
AuditEvent = event_program arguments

To print out the audit record with all event arguments, printf is used. Different format specifiers are used, depending on the audit event that occurs. If you want to trigger other applications that are called whenever an event occurs, you can specify an event_program. If you do this, always use the full pathname of the event_program.

Adding format specifications
If you specify your own events in the objects file, you need to add a corresponding format specification to the events file. For our self-defined events, MY_EVENT and MY_X_EVENT, we use the printf format command. Remember that the AIX kernel monitors these objects and triggers the audit events.
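
Because every event named in the objects file needs a matching entry in the events file, a quick consistency check can be useful. The following is a rough sketch only: the function name missing_event_formats is hypothetical, and the two optional arguments (alternative paths for the objects and events files) are assumptions added so that the check can be tried on copies. It lists each object event that has no format specification.

```shell
# List object events that have no format entry in the events file
# (hypothetical helper).
# Usage: missing_event_formats [OBJECTS_FILE] [EVENTS_FILE]
missing_event_formats() {
    objects=${1:-/etc/security/audit/objects}
    events=${2:-/etc/security/audit/events}

    # Pull the quoted event name out of lines such as:  w = "MY_EVENT"
    sed -n 's/^[[:space:]]*[rwx][[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$objects" |
    sort -u |
    while read -r ev; do
        # An event is covered if the events file has a line "EVENT = ...".
        grep -q "^[[:space:]]*${ev}[[:space:]]*=" "$events" || echo "$ev"
    done
}
```

An empty result means that every object event found has a format specification. Any name that is printed still needs a line such as MY_X_EVENT = printf "%s" in the auditpr stanza.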

Instructor notes: Purpose — Describe the events configuration file. Details — Describe it using the information in the student materials. Additional information — The command printf “%s” will format the output that will be printed to the appropriate location. The %s indicates to accept a string of characters. There are a number of format specifiers available with the printf command. Review man pages for more detail. Transition statement — Let’s introduce the config configuration file.

Audit configuration: config

# vi /etc/security/audit/config

start:
        binmode = off
        streammode = on
...
classes:
        general = USER_SU, PASSWORD_Change, ...
        tcpip = TCPIP_connect, TCPIP_data_in, ...
        ...
        init = USER_Login, USER_Logout

users:
        root = general
        michael = init

Figure D-6. Audit configuration: config

Notes:

Introduction
The /etc/security/audit/config file contains audit configuration information. The information that follows describes three of the stanzas in this file: start, classes, and users.

The start stanza
The start stanza specifies the start mode for the audit logger. If you work in bin mode, the audit records are stored in files, and the auditbin daemon is started. Stream mode allows real-time processing of an audit event, for example, to display the audit record on the system console or to print it on a printer.

The classes stanza
The classes stanza groups audit events into classes. These classes can then be assigned to users, who are then audited for all events belonging to a class. Note that this is necessary for all events that are triggered by applications. Object events triggered by the kernel need not be part of a class. Note that the class name (for example, init) must be less than 16 characters and must be unique on the system.

The users stanza
The users stanza assigns audit classes to a user. The username (for example, michael) must be the login name of a system user, or the string default, which stands for all system users. In the example, the self-defined class init is assigned to the user michael. Whenever michael logs in to or out of the system, an audit record will be written.

Use of the chuser command
Note that you can also use the chuser command to establish an audit activity for a particular user:

# chuser "auditclasses=init" michael
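
The chuser form shown above can be wrapped so that the class name rule is checked first. This is a minimal sketch under stated assumptions: the function name assign_audit_class and the CHUSER override variable are hypothetical and exist only so the command line can be previewed (for example with CHUSER=echo) before it is run as root on AIX.

```shell
# Command used to change user attributes. Override it (for example with
# CHUSER=echo) to preview the command line instead of running it.
CHUSER=${CHUSER:-chuser}

# Assign one audit class to one user after checking the class name length
# (hypothetical helper).
# Usage: assign_audit_class CLASS USER
assign_audit_class() {
    class=$1
    user=$2

    # Class names must be less than 16 characters.
    if [ "${#class}" -ge 16 ]; then
        echo "class name too long: $class" >&2
        return 1
    fi

    "$CHUSER" "auditclasses=$class" "$user"
}
```

With CHUSER=echo set, assign_audit_class init michael only prints the arguments that would be passed to chuser.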

Instructor notes: Purpose — Describe the config configuration file. Details — Describe using the information in the student materials. Additional information — None Transition statement — Let’s describe how bin mode works.

Audit configuration: bin mode

# vi /etc/security/audit/config

start:
        binmode = on
        streammode = off

bin:
        trail = /audit/trail
        bin1 = /audit/bin1
        bin2 = /audit/bin2
        binsize = 10240
        cmds = /etc/security/audit/bincmds
...

• Use the auditpr command to display the audit records:
  # auditpr -v < /audit/trail

Figure D-7. Audit configuration: bin mode

Notes:

Use of start stanza
To work in bin mode, specify binmode = on in the start stanza in /etc/security/audit/config. In this case, the auditbin daemon will be started.

Use of bin stanza
The bin stanza specifies how the bin mode works. The audit records are stored in alternating files that have a fixed size (specified by binsize). The records are first written into the file specified by bin1. When this file fills, future records are written to /audit/bin2 automatically and the content of /audit/bin1 is written to /audit/trail to create the permanent record.

Use of the auditpr command
To display the audit records, you must use the auditpr command:

# auditpr -v < /audit/trail

In this example, you display the audit records that are stored in /audit/trail.

Recommendation regarding root file system
If you use bin-mode auditing, it is recommended that you do not specify bins that are in the hd4 (root) file system.

Instructor notes: Purpose — Describe how the bin mode works. Details — Describe using the information in the student materials. Additional information — If binsize is set to 0, no bin switching will occur and all bin collection will go to bin1. Transition statement — Let’s describe stream mode.

Audit configuration: stream mode

# vi /etc/security/audit/config

start:
        binmode = off
        streammode = on

stream:
        cmds = /etc/security/audit/streamcmds
...

# vi /etc/security/audit/streamcmds

/usr/sbin/auditstream | auditpr -v > /dev/console &

All audit records are displayed on the console

Figure D-8. Audit configuration: stream mode

Notes:

Configuring stream mode
Stream mode allows real-time processing of the audit events. To configure stream mode auditing, you have to do two things:
1. Specify streammode = on in the start stanza in /etc/security/audit/config.
2. Specify the audit record destination in the stream mode backend file /etc/security/audit/streamcmds.
In our example, all records are displayed on the console, using the auditpr command. Note that you must specify the & sign after the command.

The auditstream command
The auditstream command starts up an auditstream daemon. In streamcmds, you can start up multiple daemons that monitor different classes, for example:

/usr/sbin/auditstream -c init | auditpr -v > /var/init.txt &
/usr/sbin/auditstream -c general | auditpr -v > /var/general.txt &

If you want to monitor selected events in these classes, use the auditselect command. See the man pages for more information regarding these commands.
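
To illustrate how auditselect fits between auditstream and auditpr, the following sketch builds a streamcmds line that keeps only named events from one class. It is an illustration under stated assumptions, not course material: the function name streamcmds_line is hypothetical, and the selection expression uses the auditselect -e "event == NAME" form with || between alternatives. Check the auditselect man page on your system before relying on the exact expression syntax.

```shell
# Print a streamcmds line that keeps only the named events from one class
# (hypothetical helper; it prints the line and does not run it).
# Usage: streamcmds_line CLASS OUTPUT_FILE EVENT [EVENT ...]
streamcmds_line() {
    class=$1
    outfile=$2
    shift 2

    # Join the event names into: event == A || event == B
    expr=""
    for ev in "$@"; do
        expr="${expr:+$expr || }event == $ev"
    done

    printf '/usr/sbin/auditstream -c %s | /usr/sbin/auditselect -e "%s" | auditpr -v > %s &\n' \
        "$class" "$expr" "$outfile"
}
```

For example, streamcmds_line init /var/init.txt USER_Login prints a line that records only login events from the init class. The printed line can be reviewed and then added to /etc/security/audit/streamcmds by hand.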

Instructor notes: Purpose — Explain stream mode auditing. Details — Describe using the information in the student materials Additional information — None Transition statement — Let’s show how to start and stop the auditing subsystem.

The audit command

# audit start        Start / stop auditing
# audit shutdown

# audit query        Display audit status

# audit off          Suspend / restart auditing
# audit on

Figure D-9. The audit command

Notes:

Starting and stopping auditing
The audit command controls system auditing. To start the auditing system, use audit start; to stop auditing, use audit shutdown. Note that you have to stop and restart auditing whenever you change a configuration file.

Displaying audit status
To query the current audit configuration, use audit query.

Suspending and restarting auditing
If you want to suspend auditing, use audit off; to restart it, use audit on.
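
The stop-and-restart step that follows a configuration change can be sketched as a short function. This is only an illustration: the function name reload_audit and the AUDIT override variable are assumptions, added so that the sequence can be previewed (for example with AUDIT=echo) without touching a running system. The subcommands themselves are the ones described above.

```shell
# Command that controls auditing. Override it (for example with AUDIT=echo)
# to preview the sequence without changing the system.
AUDIT=${AUDIT:-audit}

# Stop auditing, start it again so that changed configuration files take
# effect, and then show the resulting status (hypothetical helper).
reload_audit() {
    # Ignore a failure here: auditing may simply not be running yet.
    "$AUDIT" shutdown || true
    "$AUDIT" start && "$AUDIT" query
}
```

Run reload_audit as root after editing the objects, events, or config file. Using audit off and audit on instead would only suspend and resume auditing.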

Instructor notes: Purpose — Describe the audit command. Details — Base your explanation on the information in the student materials. Additional information — The difference between audit shutdown/start vs. audit off/on is shutdown and start force the configuration files to be reread, whereas off and on do not reread the configuration files. Also, a shutdown forces the information from the bin files to be written to the trail file so when the start option is used, the bin files are empty. The off option leaves the information in the bin files and resumes where it left off when the on option is specified. Transition statement — Let’s show some audit records.

Example audit records

Event        Login    Status  Time        Command
MY_X_EVENT   root     OK      Tue Aug 09  no
        audit object exec event detected /usr/bin/no
MY_EVENT     root     OK      Thu Aug 09  vi
        audit object write event detected /etc/filesystems
USER_Logout  michael  OK      Thu Aug 09  logout
        /dev/pts/0

In each record, the first line is the audit header and the indented line is the audit tail.

Figure D-10. Example audit records

Notes:

Parts of an audit record
Each audit record consists of two parts, an audit header and an audit tail. The tail is printed according to the format specification in /etc/security/audit/events and is only shown if you use the -v option in the auditpr command.

Content of audit header
The audit header specifies the event name, the user, the status, the time, and the command that triggers the audit event.

Content of audit tail
The audit tail shows additional information, such as the terminal where the user logged out, as shown in the final example on the visual.

Instructor notes: Purpose — Describe examples of audit records. Details — Describe them using the information in the student materials. Additional information — None Transition statement — Let’s provide a flowchart that helps to set up auditing.

Set up auditing in your environment

[Flowchart:
 1. What objects do I want to audit? (objects file)
 2. What applications do I want to audit? Do they trigger events? (events file)
 3. What users do I want to audit? Are you allowed to do this? Create classes and assign them to a user. (config file)]

Figure D-11. Set up auditing in your environment

Notes:

Need to plan use of auditing subsystem
If used correctly, the auditing subsystem is a very good tool for auditing events. However, problems can arise if the auditing subsystem gathers too much data to be analyzed. To prevent this problem from occurring, careful planning is required when configuring auditing. The flowchart on the visual provides an aid in configuring auditing in your environment so that the auditing data can be managed.

Deciding which objects to monitor
Decide what objects you want to monitor. Objects are files that you can audit for read, write, or execute actions. For example, files that make good candidates for monitoring are those in the /etc directory. Unfortunately, the audit subsystem can only monitor existing files. If you want to monitor files like .rhosts, you first need to create the files.

Deciding whether to monitor applications
Decide if you want to monitor special applications. This could be done by adding an execute event into the objects file. If you are interested in application events, you must determine if the application triggers audit events. For example, you might want to audit all TCP/IP-related events on a system where the transfer of data needs to be monitored. These events can be found in the events file.

Deciding whether to trace users
Decide if you want to trace users. Before doing this, confirm that there are no legal issues within your organization that would prohibit tracing users. To trace users, create audit classes and assign these classes to the users you want to audit.

Instructor notes: Purpose — Present a flowchart that helps in setting up auditing in a customer’s environment. Details — Describe it using the information in the student materials. Additional information — None Transition statement — There’s no checkpoint for this appendix. Let’s move on to the exercise.

Exercise appendix A: Auditing

• Bin mode auditing
• Stream mode auditing

Figure D-12. Exercise appendix D: Auditing

Notes:

Location of this exercise
This exercise is located in “Appendix A” of your Student Exercises guide.

Objectives of this exercise
After the lab exercise, you should be able to:
- Audit objects and application events
- Create audit classes
- Audit users
- Set up auditing in bin and stream mode

Instructor notes: Purpose — Introduce the lab exercise. Details — Be sure to tell the students where this exercise can be found. Additional information — None Transition statement — Let’s take a quick look back at what we discussed in this appendix.

Appendix summary

Having completed this appendix, you should be able to:
• Configure the auditing subsystem

Figure D-13. Appendix summary

Instructor notes: Purpose — Review the purpose of this appendix. Details — Additional information — None. Transition statement — This concludes our discussion of auditing.

Appendix E. Diagnostics

What this unit is about
This appendix is an overview of diagnostics available in AIX.

What you should be able to do
After completing this appendix, you should be able to:
• Use the diag command to diagnose hardware
• List the different diagnostic program modes

How you will check your progress
Accountability:
• Activity
• Checkpoint questions

References
Online    AIX Version 6.1 Understanding the Diagnostic Subsystem for AIX

Note: References listed as “online” above are available at the following address: http://publib.boulder.ibm.com/infocenter/systems

Appendix objectives

After completing this appendix, you should be able to:
• Use the diag command to diagnose hardware
• List the different diagnostic program modes

Figure E-1. Appendix objectives

Instructor notes: Purpose — Introduce the objectives of this appendix. Details — Additional information — This section covers very important information for support staff aiming for AIX Certification. Transition statement — Tell the students when they need diagnostic programs.

When do I need diagnostics?

[Figure: diagnostics are available from three sources (bos.diag, a diagnostics CD-ROM, or a NIM master). They are needed when there is a hardware error in the error log, when the machine does not boot, or when the system behaves strangely.]

Figure E-2. When do I need diagnostics?

Notes:

Introduction
The lifetime of hardware is limited. Broken hardware leads to hardware errors in the error log, to systems that will not boot, or to very strange system behavior. The diagnostic package helps you to analyze your system and discover hardware that is broken. Additionally, the diagnostic package provides information to service representatives that allows fast error analysis.

Sources for diagnostic programs
Diagnostics are available from different sources:
- A diagnostic package is shipped and installed with your AIX operating system. Diagnostics is packaged into separate software packages and filesets. The base diagnostics support is contained in the package bos.diag. The individual device support is packaged in separate devices.[type].[deviceid] packages. The bos.diag package is split into three distinct filesets:
  - bos.diag.rte contains the Controller and other base diagnostic code
  - bos.diag.util contains the Service Aids and Tasks
  - bos.diag.com contains the diagnostic libraries, kernel extensions, and development header files
- Diagnostic CD-ROMs are available that allow you to diagnose a system that does not have AIX installed. Normally, the diagnostic CD-ROM is not shipped with the system.
- Diagnostic programs can be loaded from a NIM master (NIM = Network Installation Manager). This master holds and maintains different resources, for example, a diagnostic package. This package can be loaded through the network to a NIM client, where it is used to diagnose the client machine.

Instructor notes: Purpose — Give reasons when diagnostics are used. Describe the different sources for diagnostics. Details — Additional information — Transition statement — Let’s discuss how to use diagnostics.

The diag command

[Diagram labels: AIX error log, Auto diagnose, Report test result, diag]

• diag allows testing of a device, if it is not busy
• diag allows analyzing the error log

Figure E-3. The diag command

Notes:

Overview of the diag command

Whenever you detect a hardware problem, for example, a communication adapter error in the error log, use the diag command to diagnose the hardware.

The diag command can test a device if the device is not busy. If any AIX process is using a device, the diagnostic programs cannot test it; they must have exclusive use of the device to be tested. Methods used to test devices that are busy are introduced later in this unit.

The diag command analyzes the error log to fully diagnose a problem if run in the correct mode. It provides information that is very useful for the service representative, for example, Service Request Numbers (SRNs) or probable causes.

In AIX 5L and AIX 6.1, there is a cross link between the AIX error log and diagnostics. When the errpt command is used to display an error log entry, diagnostic results related to that entry are also displayed.
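Before starting diag, it is often useful to see which hardware errors the error log holds for a device. The following is a minimal sketch, assuming the errpt flags -d H (hardware error class), -N (resource name), and -a (detailed report). The function name show_hw_errors and the adapter name ent0 are illustrative assumptions only.

```shell
#!/bin/sh
# List hardware-class error log entries for one resource, then the details.
# -d H selects the hardware error class, -N filters by resource name,
# and -a produces the detailed report.
show_hw_errors() {
    resource=$1
    if ! command -v errpt >/dev/null 2>&1; then
        echo "errpt not found; this sketch assumes an AIX system" >&2
        return 1
    fi
    errpt -d H -N "$resource"       # one summary line per entry
    errpt -a -d H -N "$resource"    # detailed report for the same entries
}

# ent0 is a hypothetical adapter name used only for illustration.
show_hw_errors ent0 || true
```

If the summary shows entries for the device, the detailed report is the place to look for any diagnostic results that were linked to the entry.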

Instructor notes: Purpose — Introduce the diag command. Details — Additional information — When the diagnostic tool runs, it automatically tries to diagnose hardware errors it finds in the error log. The information generated by the diag command is put back into the error log entry, so that it is easy to make the connection between the error event and, for example the FRU number required to repair failing hardware. Transition statement — Let’s show how to work with the diag menus.

Working with diag (1 of 2)

# diag

FUNCTION SELECTION                                          801002

Move cursor to selection, then press Enter.

  Diagnostic Routines
    This selection will test the machine hardware. Wrap plugs and
    other advanced functions will not be used.
  ...

DIAGNOSTIC MODE SELECTION                                   801003

Move cursor to selection, then press Enter.

  System Verification
    This selection will test the system, but will not analyze the error
    log. Use this option to verify that the machine is functioning
    correctly after completing a repair or an upgrade.

  Problem Determination
    This selection tests the system and analyzes the error log
    if one is available. Use this option when a problem is
    suspected on the machine.

Figure E-4. Working with diag (1 of 2)

Notes:

Introduction to diag menus

The diag command is menu driven, and offers different ways to test hardware devices or the complete system. One method to test hardware devices with diag is:
- Start the diag command. A welcome screen appears, which is not shown on the visual. After pressing Enter, the FUNCTION SELECTION menu is shown.
- Select Diagnostic Routines, which allows you to test hardware devices.
- The next menu is DIAGNOSTIC MODE SELECTION. Here you have two selections:
  • System Verification tests the hardware without analyzing the error log. This option is used after a repair to test the new component. If a part is replaced due to an error log analysis, the service provider must log a repair action to reset error counters and prevent the problem from being reported again. Running Advanced Diagnostics Routines (in the FUNCTION SELECTION menu) in System Verification mode will log a repair action.
  • Problem Determination tests hardware components and analyzes the error log. Use this selection when you suspect a problem on a machine. Do not use this selection after you have repaired a device, unless you remove the error log entries of the broken device.
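The two modes can also be requested from the command line instead of through the menus. The following is a minimal sketch, assuming the diag flags -d (device to test), -c (no console: no questions asked, results written to standard output), and -v (System Verification mode) behave as described in the AIX commands reference; without -v, diag is assumed to default to Problem Determination. The helper diag_cmd is hypothetical and only prints the command line, so nothing is tested until the printed command is run as root.

```shell
#!/bin/sh
# Build a non-interactive diag command line for one device.
# mode "verify" adds -v (System Verification, no error log analysis);
# any other value keeps the default, Problem Determination.
diag_cmd() {
    device=$1
    mode=$2
    if [ "$mode" = "verify" ]; then
        echo "diag -c -d $device -v"
    else
        echo "diag -c -d $device"
    fi
}

diag_cmd hdisk2 verify     # after a repair or an upgrade
diag_cmd hdisk2 problem    # when a fault is suspected
```

Check the flags against the diag documentation for your AIX level before relying on them in scripts.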

Instructor notes: Purpose — Explain how to work with the diag command. Details — Additional information — The diagnostics version number appears on the first diagnostics screen. The version of diagnostics may become an issue in rare cases. Normally, diagnostic versions are backwards-compatible. However, diagnostic support for older hardware may have been dropped from the CD for a particular version of diagnostics. In this situation, students should contact support for more information. Transition statement — Let’s show how to select hardware devices to test.

Working with diag (2 of 2)

DIAGNOSTIC SELECTION                                        801006

From the list below, select any number of resources by moving
the cursor to the resource and pressing 'Enter'.
To cancel the selection, press 'Enter' again.
To list the supported tasks for the resource highlighted, press 'List'.

Once all selections have been made, press 'Commit'.
To avoid selecting a resource, press 'Previous Menu'.

  All Resources
       This selection will select all the resources currently displayed.
  sysplanar0                         System Planar
  U7311.D20.107F67B-
  sisscsia0       P1-C04             PCI-XDDR Dual Channel Ultra320 SCSI Adapter
+ hdisk2          P1-C04-T2-L8-L0    16 Bit LVD SCSI Disk Drive (73400 MB)
  hdisk3          P1-C04-T2-L9-L0    16 Bit LVD SCSI Disk Drive (73400 MB)
  ses0            P1-C04-T2-L15-L0   SCSI Enclosure Services Device
  L2cache0                           L2 Cache
  ...

Figure E-5. Working with diag (2 of 2)

Notes:

Selecting a device to test

In the next diag menu, select the hardware devices that you want to test. If you want to test the complete system, select All Resources. If you want to test selected devices, press Enter to select any device, then press F7 to commit your actions. In our example, we select one of the disk drives.

If you press F4 (List), diag presents tasks the selected devices support, for example:
- Run diagnostics
- Run error log analysis
- Change hardware vital product data
- Display hardware vital product data
- Display resource attributes

To start diagnostics, press F7 (Commit).

Instructor notes: Purpose — Explain how to select hardware devices. Details — Additional information — Transition statement — What happens if a device is busy?

What happens if a device is busy?

ADDITIONAL RESOURCES ARE REQUIRED FOR TESTING               801011

No trouble was found. However, the resource was not tested because
the device driver indicated that the resource was in use.

The resource needed is
- hdisk2          16 Bit LVD SCSI Disk Drive (73400 MB)
  U7311.D20.107F67B-P1-C04-T2-L8-L0

To test this resource, you can do one of the following:
  Free this resource and continue testing.
  Shut down the system and reboot in Service mode.

Move cursor to selection, then press Enter.
  Testing should stop.
  The resource is now free and testing can continue.

Figure E-6. What happens if a device is busy?

Notes:

If the device is busy

If a device is busy, which means the device is in use, the diagnostic programs do not permit testing the device or analyzing the error log. The example in the visual shows that the disk drive was selected for testing, but the resource was not tested because the device was in use. To test the device, either free the resource or use another diagnostic mode.
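One common reason a disk is reported as busy is that it belongs to a volume group that is currently varied on. The following is a minimal sketch that reads lspv-style lines (disk name, PVID, volume group, state) and reports the state of one disk. The function name disk_state is hypothetical, and the sample lines, including the PVIDs and the volume group datavg, are made up for illustration; on AIX you would pipe the real lspv output into the function.

```shell
#!/bin/sh
# Read lspv-style lines (name, PVID, volume group, state) on stdin and
# report whether the named disk belongs to a varied-on volume group.
disk_state() {
    awk -v d="$1" '
        $1 == d {
            found = 1
            if ($4 == "active") {
                print d " is in active volume group " $3
            } else {
                print d " is not in an active volume group"
            }
        }
        END { if (!found) print d " not listed" }
    '
}

# Hypothetical sample data; on AIX use:  lspv | disk_state hdisk2
printf '%s\n' \
    'hdisk0 00c1234500000001 rootvg active' \
    'hdisk2 00c1234500000002 datavg active' \
    'hdisk3 00c1234500000003 None' | disk_state hdisk2
```

A disk that is part of an active volume group stays busy until its volume group is made inactive, which is one reason the other diagnostic modes can test more devices.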

Instructor notes: Purpose — Explain what happens if a device is busy. Details — Additional information — Transition statement — Let’s describe the different diagnostic modes.

Diagnostic modes (1 of 2)

Concurrent mode:
• Execute diag during normal system operation
• Limited testing of components

  # diag

Maintenance mode:
• Execute diag during single-user mode
• Extended testing of components

  # shutdown -m
  Password:
  # diag

Figure E-7. Diagnostic modes (1 of 2)

Notes: Diagnostic modes Three different diagnostic modes are available: - Concurrent mode - Maintenance (single-user) mode - Service (standalone) mode (covered on the next visual).

Concurrent mode Concurrent mode provides a way to run online diagnostics on some of the system resources while the system is running normal system activity. Certain devices can be tested, for example, a tape device that is currently not in use, but the number of resources that can be tested is very limited. Devices that are in use cannot be tested.

Maintenance (single-user) mode To expand the list of devices that can be tested, one method is to take the system down to maintenance mode by using the command shutdown -m. Enter the root password when prompted, and execute the diag command in the shell. All programs, except the operating system itself, are stopped. All user volume groups are inactive, which extends the number of devices that can be tested in this mode.

Instructor notes: Purpose — Describe diagnostic modes. Details — In concurrent mode, because the system is running in normal operation, devices such as the following may require additional actions by the user or diagnostic application before testing can be done:
• SCSI adapters connected to paging devices
• Disk drives that are used for paging or are part of the rootvg
• LFT devices and graphics adapters if a windowing system is active
• Memory
• Processor
Additional information — Transition statement — Let’s describe the standalone mode.

Diagnostic modes (2 of 2)

Service (standalone) mode:
• Insert diagnostics CD-ROM, if available
• Shut down your system: # shutdown
• Turn off the power
• Boot system in service mode: press F5 (or 5) when logo appears
• diag will be started automatically

Figure E-8. Diagnostic modes (2 of 2)

Notes: Standalone mode But what do you do if your system does not boot or if you have to test a system without AIX installed on the system? In this case, you must use the standalone mode. Standalone mode offers the greatest flexibility. You can test systems that do not boot or that have no operating system installed (the latter requires a diagnostic CD-ROM).

Starting standalone diagnostics

Follow these steps to start up diagnostics in standalone mode:
1. If you have a diagnostic CD-ROM, insert it into the system.
2. Shut down the system. When AIX is down, turn off the power.
3. Turn on power.
4. Press F5 when an acoustic beep is heard and icons are shown on the display. This simulates booting in service mode (logical key switch).
5. The diag command will be started automatically, from the diagnostic CD-ROM.
6. At this point, you can start your diagnostic routines.

Using keys to control boot mode After the system discovers the keyboard (you will hear a beep) and before the system begins to use a particular bootlist, you may press a key to control the mode and bootlist. Both F5 and F6 will cause the system to execute a service mode boot. On newer systems, the equivalent keys would be a numeric 5 or numeric 6, but we will refer to F5 and F6 here. F5 uses the system default (non-customizable) bootlist. It lists the diskette drive, CD drive, hard drive, and network adapter (in that order). F6 uses the customizable service bootlist, which can be set with the bootlist command, SMS, or the diag utility. If the first successfully bootable device in the selected bootlist (normal, F5 or F6) is a CD drive with a diagnostic CD loaded, the system will boot into diagnostic mode. If you are doing a service mode boot and the first successfully bootable device in the selected bootlist (F5 or F6) is a hard drive, then the system will boot into diagnostic mode from that hard drive. If the first successfully bootable device in the selected bootlist is installation media (AIX installation CD or mksysb tape/CD), then the system will boot into Installation and Maintenance mode.
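Because the F6 key uses the customizable service bootlist, it helps to check that list before you need it. The following is a minimal sketch, assuming the AIX bootlist flags -m service (select the service-mode list) and -o (display the list without changing it). The function name show_service_bootlist and the device names in the comment are illustrative assumptions.

```shell
#!/bin/sh
# Display the customizable service bootlist (the one an F6 boot uses).
# -m service selects the service-mode list; -o prints it without changing it.
show_service_bootlist() {
    if ! command -v bootlist >/dev/null 2>&1; then
        echo "bootlist not found; this sketch assumes an AIX system" >&2
        return 1
    fi
    bootlist -m service -o
}

show_service_bootlist || true

# To set the list so that a diagnostic CD is tried first (run as root;
# cd0, hdisk0, and ent0 are hypothetical device names):
#   bootlist -m service cd0 hdisk0 ent0
```

Displaying the list is read-only; the commented command changes the list and should be adapted to the devices present on your system.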

Using NIM to boot to standalone diagnostic mode

Assuming that the network adapter itself is not the problem, you can also boot to standalone diagnostic mode by doing a network boot from a NIM server. The NIM server must first be set up with a SPOT resource assigned to your machine object. You then prepare your machine object to serve out diagnostics rather than a mksysb or BOS filesets for installation. Next, you boot the machine to SMS, use SMS to set up the IP parameters, and then select the network adapter as the boot device.
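On the NIM master, preparing a client for a diagnostic boot is typically a single operation. The following is a minimal sketch, assuming the NIM diag operation takes the form nim -o diag -a spot=SPOTName ClientName. The helper nim_diag_cmd is hypothetical, as are the names spot_aix61 and lpar01; it only prints the command so that you can review it before running it as root on the master.

```shell
#!/bin/sh
# Print the NIM master command that enables a diagnostic network boot.
# Only prints; run the printed command as root on the NIM master.
nim_diag_cmd() {
    spot=$1      # name of an existing SPOT resource
    client=$2    # name of the NIM machine object for the client
    echo "nim -o diag -a spot=$spot $client"
}

# spot_aix61 and lpar01 are hypothetical names.
nim_diag_cmd spot_aix61 lpar01
```

After the operation succeeds on the master, the client still has to be network booted through SMS as described above.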

Instructor notes: Purpose — Describe how to start up the standalone mode. Details — Additional information — Standalone mode allows the greatest number of devices to be tested. However, it does not have the ability to examine entries in the system error log. Transition statement — Let’s look at additional tasks diag provides.

diag: Using task selection

# diag

FUNCTION SELECTION                                          801002

Move cursor to selection, then press Enter.
  ...
  Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
    This selection will list the tasks supported by these procedures.
    Once a task is selected, a resource menu may be presented showing
    all resources supported by the task.
  ...

• Run diagnostics
• Run error log analysis
• Run exercisers
• Display or change diagnostic run time options
• Add resource to resource list
• Automatic error log analysis and notification
• Back up and restore media
• Certify media
• Change hardware VPD
• Configure platform processor diagnostics
• Create customized configuration diskette
• Disk maintenance
• Display configuration and resource list
… and more

Figure E-9. diag: Using task selection

Notes:

Additional tasks

The diag command offers a wide range of additional tasks. All these tasks can be found after starting the diag main menu and selecting Task Selection. The tasks that are offered are hardware (or resource) related. For example, if your system has a service processor, you will find service processor maintenance tasks, which you do not find on machines without a service processor. On some systems, you find tasks to maintain RAID and SSA storage systems.

Example list of tasks

Following is a list of tasks available on a POWER6 p570 running AIX 6.1:
- Run Diagnostics
- Run Error Log Analysis
- Run Exercisers
- Display or Change Diagnostic Run Time Options
- Add Resource to Resource List
- Automatic Error Log Analysis and Notification
- Back Up and Restore Media
- Change Hardware Vital Product Data
- Configure Platform Processor Diagnostics
- Create Customized Configuration Diskette
- Delete Resource from Resource List
- Disk Maintenance
- Display Configuration and Resource List
- Display Firmware Device Node Information
- Display Hardware Error Report
- Display Hardware Vital Product Data
- Display Multipath I/O (MPIO) Device Configuration
- Display Previous Diagnostic Results
- Display Resource Attributes
- Display Service Hints
- Display Software Product Data
- Display or Change Bootlist
- Gather System Information
- Hot Plug Task
- Log Repair Action
- Microcode Tasks
- RAID Array Manager
- Update Disk Based Diagnostics

Instructor notes: Purpose — Describe the additional tasks that diag offers. Details — Explain some typical tasks that are offered. Additional information — All newer PCI models support the diag command. Transition statement — The diagnostic output is saved to a binary file so it can be referenced later. Let’s take a look at that.

Diagnostic log

# /usr/lpp/diagnostics/bin/diagrpt -r
ID    DATE/TIME            T RESOURCE_NAME  DESCRIPTION
DC00  Mon Oct 08 16:13:06  I diag           Diagnostic Session was started
DAE0  Mon Oct 08 16:10:38  N hdisk2         The device could not be tested
DC00  Mon Oct 08 16:10:13  I diag           Diagnostic Session was started
DA00  Mon Oct 08 16:05:11  N sysplanar0     No Trouble Found
DA00  Mon Oct 08 16:05:05  N sisscsia0      No Trouble Found
DC00  Mon Oct 08 16:04:46  I diag           Diagnostic Session was started

# /usr/lpp/diagnostics/bin/diagrpt -a
IDENTIFIER:            DC00
Date/Time:             Mon Oct 08 16:13:06
Sequence Number:       15
Event type:            Informational Message
Resource Name:         diag
Diag Session:          327726
Description:           Diagnostic Session was started.
----------------------------------------------------------------------------
IDENTIFIER:            DAE0
Date/Time:             Mon Oct 08 16:10:38
Sequence Number:       14
Event type:            Error Condition
Resource Name:         hdisk2
Resource Description:  16 Bit LVD SCSI Disk Drive
Location:              U7311.D20.107F67B-P1-C04-T2-L8-L0

Figure E-10. Diagnostic log

Notes: Diagnostic log When diagnostics are run in online or single user mode, the information is stored into a diagnostic log. The binary file is called /var/adm/ras/diag_log. The command, /usr/lpp/diagnostics/bin/diagrpt, is used to read the content of this file.

Report fields

The ID column identifies the event that was logged. In the example in the visual, DC00 indicates that a diagnostics session was started and DA00 indicates No Trouble Found (NTF).

The T column indicates the type of entry in the log:
- I is for informational messages.
- N is for No Trouble Found.
- S shows the Service Request Number (SRN) for the error that was found.
- E is for an Error Condition.
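When the log grows, a quick count of entries per type shows whether any SRN (S) or error (E) entries need attention. The following is a minimal sketch that assumes the diagrpt -r layout shown in the visual: an ID, a four-token date and time, the T column, the resource name, and the description. The function name count_by_type is hypothetical, and the sample rows are copied from the visual.

```shell
#!/bin/sh
# Count diagnostic log entries per type (T column) from diagrpt -r style input.
# Assumes the layout in the visual: ID, a four-token date/time, T, resource,
# description. The header line (first field "ID") is skipped.
count_by_type() {
    awk '$1 != "ID" && NF >= 7 { n[$6]++ }
         END { for (t in n) print t, n[t] }' | sort
}

# Sample rows copied from the visual; on AIX, pipe the real report instead:
#   /usr/lpp/diagnostics/bin/diagrpt -r | count_by_type
cat <<'EOF' | count_by_type
ID    DATE/TIME            T RESOURCE_NAME  DESCRIPTION
DC00  Mon Oct 08 16:13:06  I diag           Diagnostic Session was started
DAE0  Mon Oct 08 16:10:38  N hdisk2         The device could not be tested
DC00  Mon Oct 08 16:10:13  I diag           Diagnostic Session was started
DA00  Mon Oct 08 16:05:11  N sysplanar0     No Trouble Found
DA00  Mon Oct 08 16:05:05  N sisscsia0      No Trouble Found
DC00  Mon Oct 08 16:04:46  I diag           Diagnostic Session was started
EOF
```

With the sample rows, the sketch prints one line per type with its count. The field positions depend on the date format, so adjust the field number if your report layout differs.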

Instructor notes: Purpose — Show the contents of a diagnostics log. Details — Review the visual content for the diagnostics log. The student notes explain the ID and Types that are displayed. Additional information — The IDs that currently exist are:
DC00 - Diagnostic controller session started
DCF0 - Diagnostic controller reported an SRN from missing options
DCF1 - Diagnostic controller reported an SRN from new resource
DCE1 - Diagnostic controller reported ERROR_OTHER
DA00 - Diagnostic application reported NTF (No Trouble Found)
DAF0 - Diagnostic application reported an SRN
DAFE - Diagnostic application reported an ELA (Error Log Analysis) SRN
DAE0 - Diagnostic application reported ERROR_OPEN
DAE1 - Diagnostic application reported ERROR_OTHER
Transition statement — Let’s answer some checkpoint questions.

Checkpoint

1. What diagnostic modes are available?
   ____________________________________________
   ____________________________________________
   ____________________________________________

2. How can you diagnose a communication adapter that is used during normal system operation?
   ____________________________________________

Figure E-11. Checkpoint

Notes:

Instructor notes: Purpose — Review and test the students’ understanding of this unit. Details — A suggested approach is to give the students about five minutes to answer the questions on this page. Then, go over the questions and answers with the class.

Checkpoint solutions

1. What diagnostic modes are available?
   - Concurrent
   - Maintenance
   - Service (standalone)

2. How can you diagnose a communication adapter that is used during normal system operation?
   Use either maintenance or service mode.

Additional information — Transition statement — Now, let’s do an exercise.

Exercise appendix E: Diagnostics

Execute hardware diagnostics in the following modes:
- Concurrent
- Maintenance
- Service (standalone)

Figure E-12. Exercise appendix E: Diagnostics

Notes: Introduction This exercise can be found in your Student Exercise Guide.

Instructor notes: Purpose — Explain the goals of the lab. Details — Clearly explain what students have to do. Additional information — This exercise should be performed only by one person per system. Transition statement — Let’s summarize.

Appendix summary

Having completed this appendix, you should be able to:
• Use the diag command to diagnose hardware
• List the different diagnostic program modes

Figure E-13. Appendix summary

Notes:

Instructor notes: Purpose — Summarize the unit. Details — Present the highlights from the unit. Additional information — Transition statement —
