
RH436 Red Hat Enterprise Clustering and Storage Management RH436-RHEL5u4-en-17-20110428


Table of Contents

RH436: Red Hat Enterprise Clustering and Storage Management
    Copyright
    Welcome
    Red Hat Enterprise Linux
    Red Hat Enterprise Linux Variants
    Red Hat Subscription Model
    Contacting Technical Support
    Red Hat Network
    Red Hat Services and Products
    Fedora and EPEL
    Classroom Setup
    Networks
    Notes on Internationalization

Lecture 1 - Storage Technologies
    Objectives
    The Data
    Data Storage Considerations
    Data Availability
    Planning for the Future
    The RHEL Storage Model
    Volume Management
    SAN versus NAS
    SAN Technologies
    Fibre Channel
    Host Bus Adapter (HBA)
    Fibre Channel Switch
    Internet SCSI (iSCSI)
    End of Lecture 1
    Lab 1: Data Management and Storage
        Lab 1.1: Evaluating Your Storage Requirements
        Lab 1.2: Configuring the Virtual Cluster Environment

Lecture 2 - iSCSI Configuration
    Objectives
    Red Hat iSCSI Driver
    iSCSI Data Access
    iSCSI Driver Features
    iSCSI Device Names and Mounting
    iSCSI Target Naming
    Configuring iSCSI Targets
    Manual iSCSI configuration
    Configuring the iSCSI Initiator Driver
    iSCSI Authentication Settings
    Configuring the open-iscsi Initiator
    First-time Connection to an iSCSI Target
    Managing an iSCSI Target Connection
    Disabling an iSCSI Target
    End of Lecture 2
    Lab 2: iSCSI Configuration
        Lab 2.1: iSCSI Software Target Configuration
        Lab 2.2: iSCSI Initiator Configuration

Lecture 3 - Kernel Device Management
    Objectives
    udev Features
    Event Chain of a Newly Plugged-in Device
    /sys Filesystem
    udev
    Configuring udev
    udev Rules
    udev Rule Match Keys
    Finding udev Match Key Values
    udev Rule Assignment Keys
    udev Rule Substitutions
    udev Rule Examples
    udevmonitor
    Dynamic storage management
    Tuning the disk queue
    Tuning the deadline scheduler
    Tuning the anticipatory scheduler
    Tuning the noop scheduler
    Tuning the (default) cfq scheduler
    Fine-tuning the cfq scheduler
    End of Lecture 3
    Lab 3: udev and device tuning
        Lab 3.1: Persistent Device Naming

Lecture 4 - Device Mapper and Multipathing
    Objectives
    Device Mapper
    Device Mapping Table
    dmsetup
    Mapping Targets
    Mapping Target - linear
    Mapping Target - striped
    Mapping Target - error
    Mapping Target - snapshot-origin
    Mapping Target - snapshot
    LVM2 Snapshots
    LVM2 Snapshot Example
    Mapping Target - zero
    Device Mapper Multipath Overview
    Device Mapper Components
    Multipath Priority Groups
    Mapping Target - multipath
    Setup Steps for Multipathing FC Storage
    Multipathing and iSCSI
    Multipath Configuration
    Multipath Information Queries
    End of Lecture 4
    Lab 4: Device Mapper Multipathing
        Lab 4.1: Device Mapper Multipathing

Lecture 5 - Red Hat Cluster Suite Overview
    Objectives
    What is a Cluster?
    Red Hat Cluster Suite
    Cluster Topology
    Clustering Advantages
    Advanced Configuration and Power Interface (ACPI)
    Cluster Network Requirements
    Broadcast versus Multicast
    Ethernet Channel Bonding
    Channel Bonding Configuration
    Red Hat Cluster Suite Components
    Security
    Cluster Configuration System (CCS)
    CMAN - Cluster Manager
    Cluster Quorum
    OpenAIS
    rgmanager - Resource Group Manager
    The Conga Project
    luci
    ricci
    Deploying Conga
    luci Deployment Interface
    Clustered Logical Volume Manager (CLVM)
    Distributed Lock Manager (DLM)
    Fencing
    End of Lecture 5
    Lab 5: Cluster Deployment using Conga
        Lab 5.1: Building a Cluster with Conga

Lecture 6 - Logical Volume Management
    Objectives
    An LVM2 Review
    LVM2 - Physical Volumes and Volume Groups
    LVM2 - Creating a Logical Volume
    Files and Directories Used by LVM2
    Changing LVM options
    Moving a volume group to another host
    Clustered Logical Volume Manager (CLVM)
    CLVM Configuration
    End of Lecture 6
    Lab 6: Clustered Logical Volume Manager
        Lab 6.1: Configure the Clustered Logical Volume Manager

Lecture 7 - Global File System 2
    Objectives
    Global File System 2
    GFS2 Limits
    GFS2 Enhancements
    Creating a GFS2 File System
    Lock Managers
    Distributed Lock Manager (DLM)
    Mounting a GFS2 File System
    Journaling
    Quotas
    Growing a GFS2 File System
    GFS2 Super Block Changes
    GFS2 Extended Attributes (ACL)
    Repairing a GFS2 File System
    End of Lecture 7
    Lab 7: Global File System 2
        Lab 7.1: Creating a GFS2 file system with Conga
        Lab 7.2: Create a GFS2 filesystem on the commandline
        Lab 7.3: GFS1: Conversion
        Lab 7.4: GFS2: Working with images
        Lab 7.5: GFS2: Growing the filesystem

Lecture 8 - Quorum and the Cluster Manager
    Objectives
    Cluster Quorum
    Cluster Quorum Example
    Modifying and Displaying Quorum Votes
    CMAN - two node cluster
    CCS Tools - ccs_tool
    cluster.conf Schema
    Updating an Existing RHEL4 cluster.conf for RHEL5
    cman_tool
    cman_tool Examples
    CMAN - API
    CMAN - libcman
    End of Lecture 8
    Lab 8: Adding Cluster Nodes and Manually Editing cluster.conf
        Lab 8.1: Extending Cluster Nodes
        Lab 8.2: Manually Editing the Cluster Configuration
        Lab 8.3: GFS2: Adding Journals

Lecture 9 - Fencing and Failover
    Objectives
    No-fencing Scenario
    Fencing Components
    Fencing Agents
    Power Fencing versus Fabric Fencing
    SCSI Fencing
    Fencing From the Command Line
    The Fence Daemon - fenced
    Manual Fencing
    Fencing Methods
    Fencing Example - Dual Power Supply
    Handling Software Failures
    Handling Hardware Failures
    Failover Domains and Service Restrictions
    Failover Domains and Prioritization
    NFS Failover Considerations
    clusvcadm
    End of Lecture 9
    Lab 9: Fencing and Failover
        Lab 9.1: Node Priorities and Service Relocation

Lecture 10 - Quorum Disk
    Objectives
    Quorum Disk
    Quorum Disk Communications
    Quorum Disk Heartbeating and Status
    Quorum Disk Heuristics
    Quorum Disk Configuration
    Working with Quorum Disks
    Example: Two Cluster Nodes and a Quorum Disk Tiebreaker
    Example: Keeping Quorum When All Nodes but One Have Failed
    End of Lecture 10
    Lab 10: Quorum Disk
        Lab 10.1: Quorum Disk

Lecture 11 - rgmanager
    Objectives
    Resource Group Manager
    Cluster Configuration - Resources
    Resource Groups
    Start/Stop Ordering of Resources
    Resource Hierarchical Ordering
    NFS Resource Group Example
    Resource Recovery
    Service Status Checking
    Custom Service Scripts
    Displaying Cluster and Service Status
    Cluster Status (luci)
    Cluster Status Utility (clustat)
    Cluster Service States
    Cluster SNMP Agent
    Starting/Stopping the Cluster Software on a Member Node
    Cluster Shutdown Tips
    Troubleshooting
    Logging
    End of Lecture 11
    Lab 11: Cluster Manager
        Lab 11.1: Adding an NFS Service to the Cluster
        Lab 11.2: Configuring SNMP for Red Hat Cluster Suite

Lecture 12 - Comprehensive Review
    Objectives
    Start from scratch
    End of Lecture 12
    Lab 12: Comprehensive Review
        Lab 12.1: Rebuild your environment
        Lab 12.2: Setup iscsi and multipath
        Lab 12.3: Build a three node cluster
        Lab 12.4: Add a quorum-disk
        Lab 12.5: Add a GFS2 filesystem
        Lab 12.6: Add a NFS-service to your cluster

Appendix A - Advanced RAID
    Objectives
    Redundant Array of Inexpensive Disks
    RAID0
    RAID1
    RAID5
    RAID5 Parity and Data Distribution
    RAID5 Layout Algorithms
    RAID5 Data Updates Overhead
    RAID6
    RAID6 Parity and Data Distribution
    RAID10
    Stripe Parameters
    /proc/mdstat
    Verbose RAID Information
    SYSFS Interface
    /etc/mdadm.conf
    Event Notification
    Restriping/Reshaping RAID Devices
    Growing the Number of Disks in a RAID5 Array
    Improving the Process with a Critical Section Backup
    Growing the Size of Disks in a RAID5 Array
    Sharing a Hot Spare Device in RAID
    Renaming a RAID Array
    Write-intent Bitmap
    Enabling Write-Intent on a RAID1 Array
    Write-behind on RAID1
    RAID Error Handling and Data Consistency Checking
    Appendix A: Lab: Advanced RAID
        Lab A.1: Improve RAID1 Recovery Times with Write-intent Bitmaps
        Lab A.2: Improve Data Reliability Using RAID 6
        Lab A.3: Improving RAID reliability with a Shared Hot Spare Device
        Lab A.4: Online Data Migration
        Lab A.5: Growing a RAID5 Array While Online
        Lab A.6: Clean Up
        Lab A.7: Rebuild Virtual Cluster Nodes

Introduction

RH436: Red Hat Enterprise Clustering and Storage Management



Copyright






The contents of this course and all its modules and related materials, including handouts to audience members, are Copyright © 2011 Red Hat, Inc. No part of this publication may be stored in a retrieval system, transmitted or reproduced in any way, including, but not limited to, photocopy, photograph, magnetic, electronic or other record, without the prior written permission of Red Hat, Inc. This instructional program, including all material provided herein, is supplied without any guarantees from Red Hat, Inc. Red Hat, Inc. assumes no liability for damages or legal action arising from the use or misuse of contents or details contained herein. If you believe Red Hat training materials are being used, copied, or otherwise improperly distributed please email training@redhat.com or phone toll-free (USA) +1 866 626 2994 or +1 919 754 3700.

Notes on Internationalization

A system's default language can be changed with system-config-language (System->Administration->Language), which affects the /etc/sysconfig/i18n file. Users may prefer to use a different language for their own desktop environment or interactive shells than is set as the system default. This is indicated to the system through the LANG environment variable. This may be set automatically for the GNOME desktop environment by selecting a language from the graphical login screen by clicking on the Language item at the bottom left corner of the graphical login screen immediately prior to login. The user will be prompted about whether the language selected should be used just for this one login session or as a default for the user from now on. The setting is saved in the user's ~/.dmrc file by GDM. If a user wants to make their shell environment use the same LANG setting as their graphical environment even when they log in through a text console or over ssh, they can add code similar to the following to their ~/.bashrc file. This will set their preferred language if one is saved in ~/.dmrc and use the system default if not:

i=$(grep 'Language=' ${HOME}/.dmrc | sed 's/Language=//')
if [ "$i" != "" ]; then
    export LANG=$i
fi

Languages with non-ASCII characters may have problems displaying in some environments. Kanji characters, for example, may not display as expected on a virtual console. Individual commands can be made to use another language by setting LANG on the command line:

[user@host ~]$ LANG=fr_FR.UTF-8 date


The Data

1-1

• User versus System data
    • Availability requirements
    • Frequency and type of access
    • Directory location
        • /home versus /var/spool/mail
• Application data
    • Shared?
• Host or hardware-specific data

User data often has more demanding requirements and challenges than system data. System data is often easily re-created from installation CDs and a relatively small amount of backed-up configuration files. System data can often be reused on machines of a similar architecture, whereas user data is highly specific to each user. Some user data lies outside of typical user boundaries, like user mailboxes. Would the data ideally be shared among many machines? Is the data specific to a particular type of architecture?


Data Storage Considerations

1-2

• Is it represented elsewhere?
• Is it private or public?
• Is it nostalgic or pertinent?
• Is it expensive or inexpensive?
• Is it specific or generic?

Is the data unique, or are there readily-accessible copies of it elsewhere? Does the data need to be secured, or is it available to anyone who requests it? Is the data stored for historical purposes, or are old and new data being accessed just as frequently? Was the data difficult or expensive to obtain? Could it just be calculated from other already-available data, or is it one of a kind? Is the data specific to a particular architecture or OS type? Is it specific to one application, or one version of one application?


Data Availability

1-3

• How available must it be?
• Data lifetime
    • Archived or stored?
• Frequency and method of access
    • Read-only or modifiable
    • Application-specific or direct access
    • Network configuration and security
• Is performance a concern?
    • Applications "data starved"?
• Where are my single points of failure (SPOF)?

What happens if the data become unavailable? What must be done in the event of data downtime? How long is the data going to be kept around? Is it needed to establish a historical profile, or is it no longer valid after a certain time period? Is this data read-only, or is it frequently modified? What exactly is modified? Is modification a privilege of only certain users or applications? Are applications or users limited in any way by the performance of the data storage? What happens when an application is put into a wait-state for the data it needs? With regard to the configuration environment and resources used, where are my single points of failure?


Planning for the Future

1-4

• Few data requirements ever diminish
• Reduce complexity
• Increase flexibility
• Storage integrity

Few data requirements ever diminish: the number of users, the size of stored data, the frequency of access, and so on all tend to grow. What mechanisms are in place to accommodate this growth? A reduction in complexity often means a simpler mechanism for its management, which often leads to less error-prone tools and methods.


SAN versus NAS

1-7

• Two shared storage technologies trying to accomplish the same thing -- data delivery
• Network Attached Storage (NAS)
    • The members are defined by the network
        • Scope of domain defined by IP domain
    • NFS/CIFS/HTTP over TCP/IP
    • Delivers file data blocks
• Storage Area Network (SAN)
    • The network is defined by its members
        • Scope of domain defined by members
    • Encapsulated SCSI over fibre channel
    • Delivers volume data blocks

Although the terms are often used interchangeably, Storage Area Network (SAN) and Network Attached Storage (NAS) differ. NAS is best described as IP network access to file/record data. A SAN represents a collection of hardware components which, when combined, present the disk blocks comprising a volume over a fibre channel network. The iSCSI-SCSI layer communication over IP also satisfies this definition: the delivery of low-level device blocks to one or more systems equally. NAS servers generally run some form of a highly optimized embedded OS designed for file sharing. The NAS box has direct attached storage, and clients connect to the NAS server just like a regular file server, over a TCP/IP network connection. NAS deals with files/records. Contrast this with most SAN implementations, in which Fibre Channel (FC) adapters provide the physical connectivity between servers and disk. Fibre Channel uses the SCSI command set to handle communications between the computer and the disks; done properly, every computer connected to the disk views it as if it were direct attached storage. SANs deal with disk blocks. A SAN essentially becomes a secondary LAN, dedicated to interconnecting computers and storage devices. The advantages are that SCSI is optimized for transferring large chunks of data across a reliable connection, and having a second network can off-load much of the traffic from the LAN, freeing up capacity for other uses.


SAN Technologies

1-8

• Different mechanisms of connecting storage devices to machines over a network
• Used to emulate a SCSI device by providing transparent delivery of the SCSI protocol to a storage device
• Provide the illusion of locally-attached storage
• Fibre Channel
    • Networking protocol and hardware for transporting the SCSI protocol across fiber optic equipment
• Internet SCSI (iSCSI)
    • Network protocol that allows the use of the SCSI protocol over TCP/IP networks
    • "SAN via IP"
• Global Network Block Device (GNBD)
    • Client/Server kernel modules that provide block-level storage access over an Ethernet LAN
    • Deprecated by iSCSI, included for compatibility only

Most storage devices use the SCSI (Small Computer System Interface) command set to communicate. This is the same command set that was developed to control storage devices attached to a SCSI parallel bus. The SCSI command set is not tied to the originally-used bus and is now commonly used for all storage devices with all types of connections, including fibre channel. The command set is still referred to as the SCSI command set. The LUN on a SCSI parallel bus is actually used to electrically address the various devices. The concept of a LUN has been adapted to fibre channel devices to allow multiple SCSI devices to appear on a single fibre channel connection. It is important to distinguish between a SCSI device and a fibre channel (or iSCSI, or GNBD) device. A fibre channel device is an abstract device that emulates one or more SCSI devices at the lowest level of storage virtualization. There is no actual SCSI device; one is emulated by responding appropriately to the SCSI protocol. SCSI over fibre channel is similar to speaking a language over a telephone connection: the low-level connection (fibre channel) is used to transport the conversation's language (the SCSI command set).


Fibre Channel

1-9

• Common enterprise-class network connection to storage technology
• Major components:
    • Fiber optic cable
    • Interface card (Host Bus Adaptor)
    • Fibre Channel switching technology

Fibre Channel is a storage networking technology that provides flexible connectivity options to storage using specialized network switches, fiber optic cabling, and optic connectors. While the most common cabling for fibre channel is fiber-optic, it can also be run over twisted-pair copper wire, despite the limitation implied by the technology's name. Transmitting the data via light signals, however, allows the cabling lengths to far exceed those of normal copper wiring and to be far more resistant to electrical interference. The Host Bus Adaptor (HBA), in its many forms, is used to convert the light signals transmitted over the fiber-optic cables to electrical signals (and vice-versa) for interpretation by the endpoint host and storage technologies. The fibre channel switch is the foundation of a fibre channel network, defining the topology of how the network ports are arranged and the data path's resistance to failure.


Host Bus Adapter (HBA)

1-10

• Used to connect hosts to the fibre channel network
• Appears as a SCSI adapter
• Relieves the host microprocessor of data I/O tasks
• Multipathing capable

An HBA is simply the hardware on the host machine that connects it to, for example, a fibre channel networked device. The hardware can be a PCI, Sbus, or motherboard-embedded IC that translates signals on the local computer to frames on the fibre channel network. An operating system treats an HBA exactly like it does a SCSI adapter. The HBA takes the SCSI commands it was sent and translates them into the fibre channel protocol, adding network headers and error handling. The HBA then makes sure the host operating system gets return information and status back from the storage device across the network, just like a SCSI adapter would. Some HBAs offer more than one physical pathway to the fibre channel network. This is referred to as multipathing. While an analogy can be drawn to NICs and their purpose, HBAs tend to be far more intelligent: switch negotiation, tracking devices on the network, I/O processing offloading, network configuration monitoring, load balancing, and failover management. Critical to the HBA is the driver that controls it and communicates with the host operating system. In the case of iSCSI-like technologies, TCP Offload Engine (TOE) cards can be used instead of ordinary NICs for performance enhancement.



Fibre Channel Switch

• Foundation of a Fibre channel SAN, providing:
    • High-speed non-blocking interconnect between devices
    • Fabric services
    • Additional ports for scalability
    • Linking capability of the SAN over a wide distance
• Switch topologies
    • Point-to-Point - A simple two-device connection
    • Arbitrated loop - All devices are arranged in a loop connection
    • Switched fabric - All devices are connected to one or more interconnected Fibre Channel switches, and the switches manage the resulting "fabric" of communication channels

The fibre channel fabric refers to one or more interconnected switches that can communicate with each other independently instead of having to share the bandwidth, as in a looped network connection. Additional fibre channel switches can be combined into a variety of increasingly complex wired connection patterns to provide total redundancy, so that failure of any one switch will not harm the fabric connection, while still providing maximum scalability. Fibre channel switches can provide fabric services. The services provided are conceptually distributed (independent of direct switch attachment) and include a login server (fabric device authentication), name server (a distributed database that registers all devices on a fabric and responds to requests for address information), time server (so devices can maintain system time with each other), alias server (like a name server for multicast groups), and others. Fibre channel is capable of communicating up to 100km.


Internet SCSI (iSCSI)

1-12

• A protocol that enables clients (initiators) to send SCSI commands to remote storage devices (targets)
• Uses TCP/IP (tcp:3260, by default)
• Often seen as a low-cost alternative to Fibre Channel because it can run over existing switches and network infrastructure

iSCSI sends storage traffic over TCP/IP, so that inexpensive Ethernet equipment may be used instead of Fibre Channel equipment. FC currently has a performance advantage, but 10 Gigabit Ethernet will eventually allow TCP/IP to surpass FC in overall transfer speed despite the additional overhead of TCP/IP to transmit data. TCP offload engines (TOE) can be used to remove the burden of doing TCP/IP from the machines using iSCSI. iSCSI is routable, so it can be accessed across the Internet.

Lab 1.2: Configuring the Virtual Cluster Environment

Scenario:

The root password is redhat for your classroom workstation and for all virtual machines.

Deliverable:

Create, install, and test the virtual cluster machines hosted by your workstation.

Instructions:

1.  Configure your physical machine to recognize the hostnames of your virtual machines:

    stationX# cat RH436/HelpfulFiles/hosts-table >> /etc/hosts

2.  The virtual machines used for your labs still need to be created. Execute the script rebuild-cluster -m. This script will build a master Xen virtual machine (cXn0.example.com, 172.16.50.X0, hereafter referred to as 'node0') within a logical volume. The node0 Xen virtual machine will be used as a template to create three snapshot images. These snapshot images will, in turn, become our cluster nodes.

    stationX# rebuild-cluster -m
    This will create or rebuild the template node (node0). Continue? (y/N): y

    If you are logged in graphically, a virt-viewer window will automatically be opened; otherwise your terminal will automatically become the console window for the install.

The installation process for this virtual machine template will take approximately 10-15 minutes.

3.  Once your node0 installation is complete and the node has shut down, your three cluster nodes:

    cXn1.example.com    172.16.50.X1
    cXn2.example.com    172.16.50.X2
    cXn3.example.com    172.16.50.X3

    can now be created. Each cluster node is created as a logical volume snapshot of node0. The pre-created rebuild-cluster script simplifies the process of creating and/or rebuilding your three cluster nodes. Feel free to inspect the script's contents to see what it is doing. Passing any combination of numbers in the range 1-3 as an option to rebuild-cluster creates or rebuilds the corresponding cluster nodes in a process that takes only a few minutes.


At this point, create three new nodes:

    stationX# rebuild-cluster -123
    This will create or rebuild node(s): 1 2 3
    Continue? (y/N): y


Monitor the boot process of one or all three nodes using the command:

    stationX# xm console nodeN

where N is a node number in the range 1-3. Console mode can be exited at any time with the keystroke combination: Ctrl-].


To rebuild only node3, execute the following command (do not worry if it has not finished booting yet):

    stationX# rebuild-cluster -3


Because the cluster nodes are snapshots of an already-created virtual machine, rebuilding them takes dramatically less time than building a virtual machine from scratch, as we did with node0.



You should be able to log into all three machines once they have completed the boot process.



For your convenience, an /etc/hosts table has already been preconfigured on your cluster nodes with name-to-IP mappings of your assigned nodes.


If needed, ask your instructor for assistance.




Lecture 2

iSCSI Configuration

Upon completion of this unit, you should be able to:
• Describe the iSCSI Mechanism
• Define iSCSI Initiators and Targets
• Explain iSCSI Configuration and Tools


Red Hat iSCSI Driver

2-1

• Provides a host with the ability to access storage via IP • iSCSI versus SCSI/FC access to storage:

[Figure: iSCSI versus SCSI/FC access to storage -- host applications reach storage either through a SCSI or FC adapter driver, or through the iSCSI driver over TCP/IP and network drivers to a storage router or gateway.]

The iSCSI driver provides a host with the ability to access storage through an IP network. The driver uses the iSCSI protocol (IETF-defined) to transport SCSI requests and responses over an IP network between the host and an iSCSI target device. For more information about the iSCSI protocol, refer to RFC 3720 (http://www.ietf.org/rfc/rfc3720.txt). Architecturally, the iSCSI driver combines with the host's TCP/IP stack, network drivers, and Network Interface Card (NIC) to provide the same functions as a SCSI or a Fibre Channel (FC) adapter driver with a Host Bus Adapter (HBA).


iSCSI Data Access

2-2

• Clients (initiators) send SCSI commands to remote storage devices (targets)
• Uses TCP/IP (tcp:3260, by default)
• Initiator
    • Requests remote block device(s) via discovery process
    • iSCSI device driver required
    • iscsi service enables target device persistence
    • Package: iscsi-initiator-utils-*.rpm
• Target
    • Exports one or more block devices for initiator access
    • Supported starting RHEL 5.3
    • Package: scsi-target-utils-*.rpm

An initiating device is one that actively seeks out and interacts with target devices, while a target is a passive device. The host ID is unique for every target. The LUN ID is assigned by the iSCSI target. The iSCSI driver provides a transport for SCSI requests and responses to storage devices via an IP network instead of using a direct attached SCSI bus channel or an FC connection. The Storage Router, in turn, transports these SCSI requests and responses received via the IP network between it and the storage devices attached to it. Once the iSCSI driver is installed, the host will proceed with a discovery process for storage devices as follows:

• The iSCSI driver requests available targets through a discovery mechanism as configured in the /etc/iscsi/iscsid.conf configuration file.
• Each iSCSI target sends available iSCSI target names to the iSCSI driver.
• The iSCSI target accepts the login and sends target identifiers.
• The iSCSI driver queries the targets for device information.
• The targets respond with the device information.
• The iSCSI driver creates a table of available target devices.

Once the table is completed, the iSCSI targets are available for use by the host using the same commands and utilities as a direct attached (e.g., via a SCSI bus) storage device.
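For illustration only (not part of the course text), the end result of this discovery process can be inspected from the initiator with standard tools; the portal address below is an assumed placeholder:

    # iscsiadm -m discovery -t sendtargets -p 192.168.0.254:3260   # populate the record table
    # iscsiadm -m node                                             # list the recorded target entries
    # service iscsi restart                                        # log in to all recorded targets
    # fdisk -l                                                     # the target's LUNs now appear as SCSI disks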


Configuring iSCSI Targets

2-6

• Install scsi-target-utils package
• Modify /etc/tgt/targets.conf
• Start the tgtd service
• Verify configuration with tgt-admin -s
• Reprocess the configuration with tgt-admin --update
    • Changing parameters of a 'busy' target is not possible this way
    • Use tgtadm instead

Configuring a Linux server as an iSCSI target is supported in RHEL 5.3 onwards, based on the scsi-target-utils package (developed at http://stgt.berlios.de/). After installing the package, the userspace tgtd service must be started and configured to start at boot. Then new targets and LUNs can be defined using /etc/tgt/targets.conf. Targets have an iSCSI name associated with them that is universally unique and which serves the same purpose as the SCSI ID number on a traditional SCSI bus. These names are set by the organization creating the target, with the iqn method defined in RFC 3721 being the most commonly used.

/etc/tgt/targets.conf parameters:

    Parameter                              Description
    backing-store <device>                 Defines a virtual device on the target.
    direct-store <device>                  Creates a device with the same VENDOR_ID and SERIAL_NUM as the underlying storage.
    initiator-address <address>            Limits access to only the specified IP address. Defaults to all.
    incominguser <username> <password>     Only the specified user can connect.
    outgoinguser <username> <password>     Target will use this user to authenticate against the initiator.

Example:

    # List of files to export as LUNs
    backing-store /dev/vol0/iscsi
    initiator-address 172.17.120.1
    initiator-address 172.17.120.2
    initiator-address 172.17.120.3
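In the actual configuration file these directives are wrapped in a <target> block keyed by the target's IQN. A minimal sketch, using an assumed IQN (the volume path matches the example above):

    <target iqn.2008-09.com.example:server.target1>
        # Block device to export as a LUN
        backing-store /dev/vol0/iscsi
        # Initiators allowed to connect
        initiator-address 172.17.120.1
        initiator-address 172.17.120.2
        initiator-address 172.17.120.3
    </target>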


Manual iSCSI configuration

2-7

• Create a new target
    • # tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2008-02.com.example:disk1
• Export local block devices as LUNs and configure target access
    • # tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vol0/iscsi1
    • # tgtadm --lld iscsi --op bind --mode target --tid 1 -I 192.0.2.15

To create a new target manually and not persistently, with target ID 1 and the name iqn.2008-02.com.example:disk1, use:

    [root@station5]# tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2008-02.com.example:disk1

Then that target needs to provide one or more disks, each assigned to a logical unit number, or LUN. These disks are arbitrary block devices which will only be accessed by iSCSI initiators and are not mounted as local file systems on the target. To set up LUN 1 on target ID 1 using the existing logical volume /dev/vol0/iscsi1 as the block device to export:

    [root@station5]# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vol0/iscsi1

Finally, the target needs to allow access to one or more remote initiators. Access can be allowed by IP address:

    [root@station5]# tgtadm --lld iscsi --op bind --mode target --tid 1 -I 192.168.0.6
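Targets created this way with tgtadm do not survive a restart of tgtd. As an illustrative sketch (not from the course text), the running configuration can be dumped back into the configuration file so that it is recreated at the next service start:

    [root@station5]# tgt-admin --dump > /etc/tgt/targets.conf
    [root@station5]# tgt-admin -s        # verify the live configuration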


Configuring the iSCSI Initiator Driver

2-8

• /etc/iscsi/iscsid.conf
• Default configuration works unmodified (no authentication)
• Settings:
    • Startup - automatic or manual
    • CHAP - usernames and passwords
    • Timeouts - connections, login/logout
    • iSCSI - flow control, payload size, digest checking

The following settings can be configured in /etc/iscsi/iscsid.conf.

Startup settings:

    node.startup                               automatic or manual

CHAP settings:

    node.session.auth.authmethod               Enable CHAP authentication (CHAP). Default is NONE.
    node.session.auth.username                 CHAP username for initiator authentication by the target
    node.session.auth.password                 CHAP password for initiator authentication by the target
    node.session.auth.username_in              CHAP username for target authentication by the initiator
    node.session.auth.password_in              CHAP password for target authentication by the initiator
    discovery.sendtargets.auth.authmethod      Enable CHAP authentication (CHAP) for a discovery session to the target. Default is NONE.
    discovery.sendtargets.auth.username        Discovery session CHAP username for initiator authentication by the target
    discovery.sendtargets.auth.password        Discovery session CHAP password for initiator authentication by the target
    discovery.sendtargets.auth.username_in     Discovery session CHAP username for target authentication by the initiator
    discovery.sendtargets.auth.password_in     Discovery session CHAP password for target authentication by the initiator

For more information about iscsid.conf settings, refer to the comments in the file.
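As a minimal sketch of what such a configuration might look like on the initiator (the credentials are placeholders, not values from the course):

    # /etc/iscsi/iscsid.conf (excerpt)
    node.startup = automatic
    node.session.auth.authmethod = CHAP
    node.session.auth.username = iscsiuser
    node.session.auth.password = iscsipass
    discovery.sendtargets.auth.authmethod = CHAP
    discovery.sendtargets.auth.username = iscsiuser
    discovery.sendtargets.auth.password = iscsipass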


iSCSI Authentication Settings

2-9

• Two-way authentication can be configured using CHAP
    • Target must also be capable/configured
• No encryption of iSCSI communications
• Authentication based on CHAP implies that:
    • Username and challenge are sent cleartext
    • Authenticator is a hash (based on challenge and password)
    • If username, challenge and authenticator are sniffed, an offline brute force attack is possible
    • Standard (RFC 3720) recommends use of IPSec
• Consider running on an isolated storage-only network

CHAP (Challenge Handshake Authentication Protocol) is defined as a one-way authentication method (RFC 1334), but CHAP can be used in both directions to create two-way authentication. The following sequence of events describes, for example, how the initiator authenticates with the target using CHAP: After the initiator establishes a link to the target, the target sends a challenge message back to the initiator. The initiator responds with a value obtained by using its authentication credentials in a one-way hash function. The target then checks the response by comparing it to its own calculation of the expected hash value. If the values match, the authentication is acknowledged; otherwise the connection is terminated. The maximum length for the username and password is 256 characters each. For two-way authentication, the target will need to be configured also.
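To pair this with the target side when using scsi-target-utils, a hedged sketch follows (usernames, passwords, and the IQN are placeholders, not from the course): incominguser sets the credentials the initiator must present, and outgoinguser sets the credentials the target presents back to the initiator (matched on the initiator by node.session.auth.username_in/password_in):

    <target iqn.2008-09.com.example:server.target1>
        backing-store /dev/vol0/iscsi
        incominguser iscsiuser iscsipass
        outgoinguser targetuser targetpass
    </target>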


First-time Connection to an iSCSI Target

2-11

• Start the initiator service:
    • # service iscsi start
• Discover available targets:
    • # iscsiadm -m discovery -t sendtargets -p 172.16.36.1:3260
      172.16.36.71:3260,1 iqn.2007-01.com.example:storage.disk1
• Login to the target session:
    • # iscsiadm -m node -T iqn.2007-01.com.example:storage.disk1 -p 172.16.36.1:3260 -l
• View information about the targets:
    • # iscsiadm -m node -P N (N=0-1)
    • # iscsiadm -m session -P N (N=0-3)
    • # iscsiadm -m discovery -P N (N=0-1)

The iSCSI driver has a SysV initialization script that will report information on each detected device to the console or in dmesg(8) output. Anything that has an iSCSI device open must close the iSCSI device before shutting down iscsi. This includes filesystems, volume managers, and user applications. If iSCSI devices are open and an attempt is made to stop the driver, the script will error out and stop iscsid instead of removing those devices, in an attempt to protect the data on the iSCSI devices from corruption. If you want to continue using the iSCSI devices, it is recommended that the iscsi service be started again. Once logged into the iSCSI target volume, it can then be partitioned for use as a mounted filesystem. When mounting iSCSI volumes, use of the _netdev mount option is recommended. The _netdev mount option is used to indicate a filesystem that requires network access, and is usually used as a preventative measure to keep the OS from mounting these file systems until the network has been enabled. It is recommended that all filesystems mounted on iSCSI devices, either directly or on virtual devices (LVM, MD) that are made up of iSCSI devices, use the '_netdev' mount option. With this option, they will automatically be unmounted by the netfs initscript (before iscsi is stopped) during normal shutdown, and you can more easily see which filesystems are in network storage.
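A sketch of such an /etc/fstab entry, assuming the iSCSI LUN was partitioned and formatted as /dev/sdb1 and is mounted on /mnt/class (both names are assumptions, not from the course):

    /dev/sdb1   /mnt/class   ext3   _netdev   0 0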


Managing an iSCSI Target Connection

2-12

To disconnect from an iSCSI target:
• Discontinue usage
• Log out of the target session:
    • # iscsiadm -m node -T iqn.2007-01.com.example:storage.disk1 -p 172.16.36.1:3260 -u

To later reconnect to an iSCSI target:
• Log in to the target session
    • # iscsiadm -m node -T iqn.2007-01.com.example:storage.disk1 -p 172.16.36.1:3260 -l
• or restart the iscsi service
    • # service iscsi restart

The iSCSI initiator "remembers" previously-discovered targets. Because of this, the iSCSI initiator will automatically log into the aforementioned target(s) at boot time or when the iscsi service is restarted.


Disabling an iSCSI Target

2-13

To disable automatic iSCSI Target connections at boot time or iscsi service restarts:
• Discontinue usage
• Log out of the target session
    • # iscsiadm -m node -T iqn.2007-01.com.example:storage.disk1 -p 172.16.36.1:3260 -u
• Delete the target's record ID
    • # iscsiadm -m node -o delete -T iqn.2007-01.com.example:storage.disk1 -p 172.16.36.1:3260

Deleting the target's record ID will clean up the entries for the target in the /var/lib/iscsi directory structure. Alternatively, the entries can be deleted by hand when the iscsi service is stopped.


End of Lecture 2

• Questions and Answers
• Summary
    • Describe the iSCSI Mechanism
    • Define iSCSI Initiators and Targets
    • Explain iSCSI Configuration and Tools


Lab 2.1: iSCSI Software Target Configuration

Scenario:

For a test cluster you have been assigned to configure a software iSCSI target as backend storage.

Deliverable:

A working iSCSI software target we can use to practice configuration of an iSCSI initiator.

Instructions:

1.  Install the scsi-target-utils package on your physical machine.

2.  Create a 5GiB logical volume named iscsi inside the vol0 volume group to be exported as the target volume.

3.  Modify /etc/tgt/targets.conf so that it exports the volume to the cluster nodes:

    IQN                     iqn.2009-10.com.example.clusterX:iscsi
    Backing Store           /dev/vol0/iscsi
    Initiator Addresses     172.17.(100+X).1, 172.17.(100+X).2, 172.17.(100+X).3

4.  Start the tgtd service and make sure that it will start automatically on reboot.

5.  Check to see that the iSCSI target volume is being exported to the correct host(s).





Lab 2.2: iSCSI Initiator Configuration

Deliverable:

A working iSCSI initiator on the virtual machine that can connect to the iSCSI target.

System Setup:

It is assumed that you have a working iSCSI target from the previous exercise. All tasks are done on nodel.

Instructions:

1.  The iscsi-initiator-utils RPM should already be installed on your virtual machines. Verify.

2.  Set the initiator alias to node1 in /etc/iscsi/initiatorname.iscsi.

3.  Start the iSCSI service and make sure it survives a reboot. Check the command output and /var/log/messages for any errors and correct them before continuing on with the lab.

4.  Discover any targets being offered to your initiator by the target. The output of the iscsiadm discovery command should show the target volume that is available to the initiator.

5.  View information about the newly discovered target. Note: The discovery process also loads information about the target in the directories: /var/lib/iscsi/{nodes,send-targets}

6.  Log in to the iSCSI target.

7.  Use fdisk to view the newly available device. It should appear as an unpartitioned 5GiB volume.

8.  Log out of the iSCSI target. Is the volume still there?

9.  Restart the iscsi service. Is the volume visible now?

10. Log out of the iSCSI service one more time, but this time also delete the record ID for the target.

11. Restart the iscsi service. Is the volume visible now?

12. Re-discover and log into the target volume, again.

13. Use the volume to create a 100MB partition (of type Linux). Format the newly-created partition with an ext3 filesystem. Create a directory named /mnt/class and mount the partition to it. Test that you are able to write to it. Create a new entry in /etc/fstab for the filesystem and test that the mount is able to persist a reboot of the machine.

14. Remove the fstab entry when you are finished testing and umount the volume.






Lab 2.1 Solutions

1.  Install the scsi-target-utils package on your physical machine:

    stationX# yum install -y scsi-target-utils

2.  Create a 5GiB logical volume named iscsi inside the vol0 volume group to be exported as the target volume:

    stationX# lvcreate -n iscsi -L 5G vol0

3.  Modify /etc/tgt/targets.conf so that it exports the volume to the cluster nodes:

    IQN                     iqn.2009-10.com.example.clusterX:iscsi
    Backing Store           /dev/vol0/iscsi
    Initiator Addresses     172.17.(100+X).1, 172.17.(100+X).2, 172.17.(100+X).3

    Edit /etc/tgt/targets.conf so that it reads:

    backing-store /dev/vol0/iscsi
    initiator-address 172.17.(100+X).1
    initiator-address 172.17.(100+X).2
    initiator-address 172.17.(100+X).3

4.  Start the tgtd service and make sure that it will start automatically on reboot:

    # service tgtd start; chkconfig tgtd on

5.  Check to see that the iSCSI target volume is being exported to the correct host(s):

    # tgt-admin -s
    Target 1: iqn.2009-10.com.example.clusterX:iscsi
        System information:
            Driver: iscsi
            State: ready
        I_T nexus information:
        LUN information:
            LUN: 0
                Type: controller
                SCSI ID: deadbeaf1:0
                SCSI SN: beaf10
                Size: 0 MB
                Online: Yes
                Removable media: No
                Backing store: No backing store
            LUN: 1
                Type: disk
                SCSI ID: deadbeaf1:1
                SCSI SN: beaf11
                Size: 1074 MB
                Online: Yes
                Removable media: No
                Backing store: /dev/vol0/iscsi
        Account information:
        ACL information:
            172.17.(100+X).1
            172.17.(100+X).2
            172.17.(100+X).3




Lab 2.2 Solutions

1.  The iscsi-initiator-utils RPM should already be installed on your virtual machines. Verify:

    cXn1# rpm -q iscsi-initiator-utils

2.  Set the initiator alias to node1 in /etc/iscsi/initiatorname.iscsi:

    # echo "InitiatorAlias=node1" >> /etc/iscsi/initiatorname.iscsi

3.  Start the iSCSI service and make sure it survives a reboot:

    # service iscsi start
    # chkconfig iscsi on

    Check the command output and /var/log/messages for any errors and correct them before continuing on with the lab.

4.  Discover any targets being offered to your initiator by the target:

    # iscsiadm -m discovery -t sendtargets -p 172.17.(100+X).254
    172.17.(100+X).254:3260,1 iqn.2009-10.com.example.clusterX:iscsi

    The output of the iscsiadm discovery command should show the target volume that is available to the initiator.

5.  View information about the newly discovered target:

    # iscsiadm -m node -T iqn.2009-10.com.example.clusterX:iscsi -p 172.17.(100+X).254

    Note: The discovery process also loads information about the target in the directories: /var/lib/iscsi/{nodes,send-targets}

6.  Log in to the iSCSI target:

    # iscsiadm -m node -T iqn.2009-10.com.example.clusterX:iscsi -p 172.17.(100+X).254 -l

7.  Use fdisk to view the newly available device. It should appear as an unpartitioned 5GiB volume:

    # fdisk -l

8.  Log out of the iSCSI target. Is the volume still there?

    # iscsiadm -m node -T iqn.2009-10.com.example.clusterX:iscsi -p 172.17.(100+X).254 -u
    # fdisk -l

    It should no longer be visible in the output of fdisk -l.

9.  Restart the iscsi service. Is the volume visible now?

    # service iscsi restart

    Because the record ID information about the previously-discovered target is still stored in the /var/lib/iscsi directory structure, it should have automatically made the volume available again.

10. Log out of the iSCSI service one more time, but this time also delete the record ID for the target:

    # iscsiadm -m node -T iqn.2009-10.com.example.clusterX:iscsi -p 172.17.(100+X).254 -u
    # iscsiadm -m node -T iqn.2009-10.com.example.clusterX:iscsi -p 172.17.(100+X).254 -o delete

11. Restart the iscsi service. Is the volume visible now?

    # service iscsi restart

    It should not be available. We must re-discover and log in to make the volume available again.

12. Re-discover and log into the target volume, again:

    # iscsiadm -m discovery -t sendtargets -p 172.17.(100+X).254
    # iscsiadm -m node -T iqn.2009-10.com.example.clusterX:iscsi -p 172.17.(100+X).254 -l

13. Use the volume to create a 100MB partition (of type Linux). Format the newly-created partition with an ext3 filesystem. Create a directory named /mnt/class and mount the partition to it. Test that you are able to write to it. Create a new entry in /etc/fstab for the filesystem and test that the mount is able to persist a reboot of the machine.

    # mkdir /mnt/class
    # fdisk <the new iSCSI device>
    # mkfs -t ext3 <the new partition>
    # echo "<the new partition> /mnt/class ext3 _netdev 0 0" >> /etc/fstab
    # mount /mnt/class
    # cd /mnt/class
    # dd if=/dev/zero of=myfile bs=1M count=10

14. Remove the fstab entry when you are finished testing and umount the volume:

    # umount /mnt/class
    # rmdir /mnt/class
    # vi /etc/fstab


Lecture 3

Kernel Device Management

Upon completion of this unit, you should be able to:
• Understand how udev manages device names.
• Describe the role of the /sys filesystem.
• Learn how to write udev rules for custom device names.
• Dynamically add storage to the system.


udev Features

3-1

• Only populates /dev with devices currently present in the system
• Device major/minor numbers are irrelevant
• Provides the ability to name devices persistently
• Userspace programs can query for device existence and name
• Moves all naming policies out of kernel and into userspace
• Follows LSB device naming standard but allows customization
• Very small

The /dev directory was unwieldy and big, holding a large number of static entries for devices that might be attached to the system (18,000 at one point). udev, in comparison, only populates /dev with devices that are currently present in the system. udev also solves the problem of dynamic allocation of entries as new devices are plugged into (or unplugged from) the system.

Developers were running out of major/minor numbers for devices. Not only does udev not care about major/minor numbers, but in fact the kernel could randomly assign them and udev would be fine.

Users wanted a way to persistently name their devices, no matter how many other similar devices were attached, where they were attached to the system, and the order in which the device was attached. For example, a particular disk might always be named /dev/bootdisk no matter where it might be plugged into a SCSI chain. Userspace programs needed a way to detect when a device was plugged in or unplugged, and what /dev entry is associated with that device.

udev follows the Linux Standard Base (LSB) for naming conventions, but allows userspace customization of assigned device names. udev is small enough that embedded devices can use it, as well.


Event Chain of a Newly Plugged-in Device

3-2

1. Kernel discovers device and exports the device's state to sysfs
2. udev is notified of the event via a netlink socket
3. udev creates the device node and/or runs programs (rule files)
4. udev notifies hald of the event via a socket
5. HAL probes the device for information
6. HAL populates device object structures with the probed information and that from several other sources
7. HAL broadcasts the event over D-Bus
8. A user-space application watching for such events processes the information

When a device is plugged into the system, the kernel detects the plug-in and populates sysfs (/sys) with state information about the device. sysfs is a virtual file system that keeps track of all devices supported by the kernel. Via a netlink socket (a connectionless socket which is a convenient method of transferring information between the kernel and userspace), the kernel then notifies udev of the event.

udev, using the information passed to it by the kernel and a set of user-configurable rule files in /etc/udev/rules.d, creates the device file and/or runs one or more programs configured for that device (e.g. modprobe), before then notifying HAL of the event via a regular socket (see /etc/udev/rules.d/90-hal.rules for the RUN+="socket:/org/freedesktop/hal/udev_event" event). udev events can be monitored with udevmonitor --env.

When HAL is notified of the event, it probes the device for information and populates a structured object with device properties using a merge of information from several different sources (kernel, configuration files, hardware databases, and the device itself). hald then broadcasts the event on D-Bus (a system message bus) for receipt by user-space applications.

Those same applications also have the ability to send messages back to hald via the D-Bus to, for example, invoke a method on the HAL device object, potentially involving the kernel. For example, the mounting of a filesystem might be requested by gnome-volume-manager. The actual mounting is done by HAL, but the request and configuration came from a user-space application.



/sys Filesystem

3-3

• Virtual filesystem like procfs
• Used to manage individual devices
• Used by udev to identify devices
• Changes to /sys are not persistent
  • Use a udev rule to persist
  • Alternative: ktune (RHEL5.4 or newer)
• Important directories
  • /sys/class/scsi_host/: contains all detected SCSI adapters
  • /sys/block/: contains all block devices


udev

3-4

• Upon receipt of device add/remove events from the kernel, udev will parse:
  • user-customizable rules in /etc/udev/rules.d
  • output from commands within those rules (optional)
  • information about the device in /sys
• Based upon the information udev has gathered:
  • Handles device naming (based on rules)
  • Determines what device files or symlinks to create
  • Determines device file attributes to set
  • Determines what, if any, actions to take
• udevmonitor [--env]

When a device is added to or removed from the system, the kernel sends a message to udevd and advertises information about the device through /sys. udev then looks up the device information in /sys and determines, based on user-customizable rules and the information found in /sys, what device node files or symlinks to create, what their attributes are, and/or what actions to perform. sysfs is used by udev for querying attributes about all devices in the system (location, name, serial number, major/minor number, vendor/product IDs, etc.).

udev has a sophisticated userspace rule-based mechanism for determining device naming and actions to perform upon device loading/unloading.

udev accesses device information from sysfs using libsysfs library calls. libsysfs has a standard, consistent interface for all applications that need to query sysfs for device information.

The udevmonitor command is useful for monitoring kernel and udev events, such as the plugging and unplugging of a device. The --env option to udevmonitor increases the command's verbosity.


Configuring udev

3-5

• /etc/udev/udev.conf
  • udev_root - location of created device files (default is /dev)
  • udev_rules - location of udev rules (default is /etc/udev/rules.d)
  • udev_log - syslog(3) priority (default is err)
    • Run-time: udevcontrol log_priority=<level>

All udev configuration files are placed in /etc/udev and every file consists of a set of lines of text. All empty lines or lines beginning with '#' will be ignored. The main configuration file for udev is /etc/udev/udev.conf, which allows udev's default configuration variables to be modified. The following variables can be defined:

udev_root - Specifies where to place the created device nodes in the filesystem. The default value is /dev.

udev_rules - The name of the udev rules file or directory to look for files with the suffix ".rules". Multiple rule files are read in lexical order. The default value is /etc/udev/rules.d.

udev_log - The priority level to use when logging to syslog(3). To debug udev at run-time, the logging level can be changed with the command "udevcontrol log_priority=<level>". The default value is err. Possible values are: err, info and debug.
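By way of illustration, a minimal /etc/udev/udev.conf might look like the following. The values shown are simply the documented defaults, with the log level raised to info for debugging; the exact file shipped with your udev version may differ slightly.

# location of created device files
udev_root="/dev/"

# directory scanned for *.rules files (read in lexical order)
udev_rules="/etc/udev/rules.d/"

# syslog(3) priority: err, info or debug
udev_log="info"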




udev Rules

3-6

• Filename location/format:
  • /etc/udev/rules.d/<rulename>.rules
• Examples:
  • 50-udev.rules
  • 75-custom.rules
• Rule format:
  • match-key==value [, ...] assignment-key=value [, ...]
• Example:
  • BUS=="usb", SYSFS{serial}=="20043512321411d34721", SYMLINK+="usb_backup"
• Rule files are parsed on first read and cached
  • Touch the file to force an update

By default, the udev mechanism reads files with a ".rules" suffix located in the directory /etc/udev/rules.d. If there is more than one rule file, they are read one at a time by udev in lexical order. By convention, the name of a rule file consists of a 2-digit integer, followed by a dash, followed by a descriptive name for the rules within it, and ends with a ".rules" suffix. For example, a udev config file named 50-udev.rules would be read by udev before a file named 75-usb_custom.rules because 50 comes before 75.

The format of a udev rule is logically broken into two separate pieces on the same line: one or more match key-value pairs used to match a device's attributes and/or characteristics to some value, and one or more assignment key-value pairs that assign a value to the device, such as a name. If no matching rule is found, the default device node name is used. In the example above, a USB device with serial number 20043512321411d34721 will be assigned an additional symlink /dev/usb_backup (presuming no other rules override it later).


udev Rule Match Keys

3-7

• Operators:
  • == Compare for equality
  • != Compare for non-equality
• Match key examples:
  • ACTION=="add"
  • KERNEL=="sd[a-z]1"
  • BUS=="scsi"
  • DRIVER!="ide-cdrom"
  • SYSFS{serial}=="20043512321411d34721"
  • PROGRAM=="customapp.pl", RESULT=="some return string"

udev(7)

The following keys can be used to match a device:

ACTION           Match the name of the event action (add or remove). Typically used to run a program upon adding or removing of a device on the system.
KERNEL           Match the name of the device.
DEVPATH          Match the devpath of the device.
SUBSYSTEM        Match the subsystem of the device.
BUS              Search the devpath upwards for a matching device subsystem name.
DRIVER           Search the devpath upwards for a matching device driver name.
ID               Search the devpath upwards for a matching device name.
SYSFS{filename}  Search the devpath upwards for a device with matching sysfs attribute values. Up to five SYSFS keys can be specified per rule. All attributes must match on the same device.
ENV{key}         Match against the value of an environment variable (up to five ENV keys can be specified per rule). This key can also be used to export a variable to the environment.
PROGRAM          Execute an external program and return true if the program returns with exit code 0. The whole event environment is available to the executed program. The program's output, printed to stdout, is available for the RESULT key.
RESULT           Match the returned string of the last PROGRAM call. This key can be used in the same or in any later rule after a PROGRAM call.

Most of the fields support a form of pattern matching:

*        Matches zero or more characters
?        Matches any single character
[...]    Matches any single character specified within the brackets
[a-z]    Matches any single character in the range a to z
[!a]     Matches any single character except for the letter a
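Combining several match keys with the patterns above, a hypothetical rule might give the first partition of a virtual SCSI disk (such as the IET VIRTUAL-DISK seen elsewhere in this course) an extra symlink. The symlink name virtdisk1 is purely illustrative:

# extra symlink for the first partition of any SCSI disk reporting model VIRTUAL-DISK
ACTION=="add", KERNEL=="sd[a-z]1", BUS=="scsi", SYSFS{model}=="VIRTUAL-DISK", SYMLINK+="virtdisk1"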


Finding udev Match Key Values





3-8

• udevinfo -a -p $(udevinfo -q path -n /dev/<device>)
  • The sysfs device path of the device in question is all that is needed
  • Produces a list of attributes that can be used in match rules
  • Choose attributes which identify the device in a persistent and easily-recognizable way
  • Can combine attributes of the device and a single parent device
  • Padding spaces at the end of attribute values can be omitted
• Also useful:
  • scsi_id -g -x -s /block/sdX
  • /lib/udev/ata_id /dev/hdX
  • /lib/udev/usb_id /block/sdX

Finding key values to match a particular device to a custom rule is made easier with the udevinfo command, which outputs attributes and unique identifiers for the queried device. The "inner" udevinfo command above first determines the sysfs (/sys) path of the device, so the "outer" udevinfo command can query it for all the attributes of the device and its parent devices. Examples:

# udevinfo -a -p $(udevinfo -q path -n /dev/sda1)
# udevinfo -a -p /sys/class/net/eth0

Other examples of commands that might provide useful information for udev rules:

# scsi_id -g -s /block/sda
# scsi_id -g -x -s /block/sda/sda3
# /lib/udev/ata_id /dev/hda
# /lib/udev/usb_id /block/sda


udev Rule Assignment Keys

3-9

• Operators:
  • =  Assign a value to a key
  • += Add the value to a key
  • := Assign a value to a key, disallowing changes by any later rules
• Assignment key examples:
  • NAME="usbcrypto"
  • SYMLINK+="data1"
  • OWNER="student"
  • MODE="0600"
  • LABEL="testrulesend"
  • GOTO="testrulesend"
  • RUN="myapp"

The following keys can be used to assign a value/attribute to a device:

NAME                The name of the node to be created, or the name the network interface should be renamed to. Only one rule can set the node name; all later rules with a NAME key will be ignored. It is not recommended to rename devices this way because tools like fdisk expect certain naming conventions. Use symlinks instead.
SYMLINK             The name of a symlink targeting the node. Every matching rule can add this value to the list of symlinks to be created along with the device node. Multiple symlinks may be specified by separating the names by the space character.
OWNER, GROUP, MODE  The permissions for the device node. Every specified value overwrites the compiled-in default value.
ENV{key}            Export a variable to the environment. This key can also be used to match against an environment variable.
RUN                 Add a program to the list of programs to be executed for a specific device. This can only be used for very short running tasks. Running an event process for a long period of time may block all further events for this or a dependent device. Long running tasks need to be immediately detached from the event process itself.
LABEL               Named label where a GOTO can jump to.
GOTO                Jumps to the next LABEL with a matching name.
IMPORT{type}        Import the printed result or the value of a file in environment key format into the event environment. program will execute an external program and read its output. file will import a text file. If no option is given, udev will determine it from the executable bit of the file permissions.
WAIT_FOR_SYSFS      Wait for the specified sysfs file of the device to be created. Can be used to work around kernel sysfs timing issues.
OPTIONS             last_rule - no later rules will have any effect; ignore_device - ignore this event completely; ignore_remove - ignore any later remove event for this device; all_partitions - create device nodes for all available partitions of a block device.


udev Rule Substitutions

3-10

• printf-like string substitutions
• Can simplify and abbreviate rules
• Supported by NAME, SYMLINK, PROGRAM, OWNER, GROUP and RUN keys
• Example: KERNEL=="sda*", SYMLINK+="iscsi%n"

Substitutions are applied while the individual rule is being processed (except for RUN; see udev(7)). The available substitutions are:

$kernel, %k             The kernel name for this device (e.g. sdb1)
$number, %n             The kernel number for this device (e.g. %n is 3 for 'sda3')
$devpath, %p            The devpath of the device (e.g. /block/sdb/sdb1, not /sys/block/sdb/sdb1)
$id, %b                 Device name matched while searching the devpath upwards for BUS, ID, DRIVER and SYSFS
$sysfs{file}, %s{file}  The value of a sysfs attribute found at the current or parent device
$env{key}, %E{key}      The value of an environment variable
$major, %M              The kernel major number for the device
$minor, %m              The kernel minor number for the device
$result, %c             The string returned by the external program requested with PROGRAM. A single part of the string, separated by a space character, may be selected by specifying the part number as an attribute: %c{N}. If the number is followed by the '+' character, this part plus all remaining parts of the result string are substituted: %c{N+}
$parent, %P             The node name of the parent device
$root, %r               The udev_root value
$tempnode, %N           The name of a created temporary device node to provide access to the device from an external program before the real node is created
%%                      The '%' character itself
$$                      The '$' character itself

The count of characters to be substituted may be limited by specifying the format length value. For example, '%3s{file}' will only insert the first three characters of the sysfs attribute.

For example, using the rule:

KERNEL=="sda*", SYMLINK+="iscsi%n"

any newly created partition on the /dev/sda device (e.g. /dev/sdaN) would trigger udev to also create a symbolic link named iscsi with the same kernel-assigned partition number appended to it (/dev/iscsiN in this case).


udev Rule Examples

3-11

• Examples: BUS=="scsi", SYSFS{serial}=="123456789", NAME="byLocation/rackl-shelf2disk3" • KERNEL=="sd*", BUS=="scsi", PROGRAM=="/lib/udev/scsi_id -g -s %p", RESULT=="SATA ST340014AS 3JX8LVCA", NAME="backup%n"





KERNEL=="sd*", SYSFS{idVendor}=="0781", SYSFS{idProduct}=="5150", SYMLINK +="keycard", OWNER="student", GROUP="student", MODE="0600"



KERNEL=="sd?1", BUS=="scsi", SYSFS{model}=="DSCT10", SYMLINK+="camera"

• ACTION=="add", KERNEL=="ppp0", RUN+="/usr/bin/wall PPP Interface Added" • KERNEL=="ttyUSB*", BUS=="usb", SYSFS{product}=="Palm Handheld", SYMLINK +="pda"

The first example demonstrates how to assign a SCSI drive with serial number "123456789" a meaningful device name of /dev/byLocation/rack1-shelf2-disk3. Subdirectories are created automatically.

The second example runs the program "/lib/udev/scsi_id -g -s %p" for any device whose name begins with the letters "sd" (assigned by the kernel), substituting "%p" with the device's devpath (e.g. /block/sda3 for /sys/block/sda3). If the command is successful (zero exit code) and its output is equivalent to "SATA ST340014AS 3JX8LVCA", then the device name "backup%n" will be assigned to it, where %n is the number portion of the kernel-assigned name (e.g. 3 if sda3).

In the third example, any SCSI device that matches the listed vendor and product IDs will have a symbolic link named /dev/keycard point to the device. The device node will have owner/group associations with student and permissions mode 0600.

The fourth example shows how to create a unique device name for a USB camera, which otherwise would appear like a normal USB memory stick.

The fifth example executes the wall command line shown whenever the ppp0 interface is added to the machine.

The sixth example shows how to make a PDA always available at /dev/pda.




udevmonitor

3-12

• Continually monitors kernel and udev rule events
• Presents device paths and event timing for analysis and debugging

udevmonitor continuously monitors kernel and udev rule events and prints them to the console whenever hardware is added to or removed from the machine.


Dynamic storage management

3-13

• SCSI or FC devices require active scanning to be detected by the OS
• Scan can be triggered manually:
  # echo "- - -" > /sys/class/scsi_host/hostN/scan
• Disks can be removed by executing:
  # echo 1 > /sys/block/<device>/device/delete
• sg3_utils in RHEL 5.5 or newer provide the rescan-scsi-bus.sh script
• For iSCSI use iscsiadm

RHEL 5.5 and newer offer the rescan-scsi-bus.sh script in the sg3_utils package. This tool makes scanning of the SCSI bus easy. It can automatically update the logical unit configuration of the host as needed after a device has been added to the system, and issues a Loop Initialization Primitive (LIP) on supported devices. In order for rescan-scsi-bus.sh to work properly, LUN0 must be the first mapped logical unit, otherwise nothing is detected even with the --nooptscan option. During the first scan, rescan-scsi-bus.sh only adds LUN0; all other logical units are added in the second scan (race condition). A bug in the rescan-scsi-bus.sh script incorrectly executes the functionality for recognizing a change in logical unit size when the --remove option is used.
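As a quick sketch (the host number and device name are placeholders), the following rescans one SCSI host for new LUNs, removes a disk cleanly, and asks the iSCSI initiator to rescan its logged-in sessions; the --rescan option assumes the open-iscsi version shipped with recent RHEL 5 updates:

# echo "- - -" > /sys/class/scsi_host/host0/scan   # scan all channels, targets and LUNs on host0
# echo 1 > /sys/block/sdb/device/delete            # remove /dev/sdb from the system
# iscsiadm -m session --rescan                     # re-read LUNs on all logged-in iSCSI sessions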


Tuning the anticipatory scheduler

3-16

• Goal: optimize completion rate (C) for dependent reads
  echo anticipatory > /sys/block/sda/queue/scheduler
• Primary tunables in /sys/block/sda/queue/iosched/
  • antic_expire - how long to wait for another, nearby read
  • read_expire, write_expire - max queue time

In many cases, an application that has issued a read may, after a short amount of think time, issue a request for the next disk block after the one that was just read. Of course, by this time, the I/O scheduler will most likely have moved the disk read/write head to a different spot on the disk drive, resulting in another seek back to the same spot on disk to read the next block of data. This results in additional latency for the application. The anticipatory I/O scheduler will wait for a short amount of time after servicing an I/O request for another request near the block of data that was just read. This can result in greatly improved throughput for certain types of loads.

Reads and writes are processed in batches. Each batch of requests is allocated a specific amount of time in which to complete. Read batches should be allowed longer times than write batches. The tunables at /sys/block/<device>/queue/iosched/ for the anticipatory scheduler are documented at: /usr/share/doc/kernel-doc-*/Documentation/block/as-iosched.txt
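For instance, the current values can be inspected and adjusted through sysfs. The value written below is purely illustrative; see the as-iosched.txt document mentioned above for the meaning and units of each tunable:

# echo anticipatory > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/iosched/antic_expire
# echo 10 > /sys/block/sda/queue/iosched/antic_expire
# cat /sys/block/sda/queue/iosched/read_expire /sys/block/sda/queue/iosched/write_expire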


Tuning the noop scheduler

3-17

• Goal: conserve CPU clock cycles
  echo noop > /sys/block/sda/queue/scheduler
• No tunable settings required
• Use when CPU clock cycles are "too expensive"
  • Host CPU cycles are usually cheaper than SAN CPU cycles
• Some controllers perform elevator functions
  • Tagged command queuing
  • Available on SCSI and some SATA drives
• Sorting is still useful for iSCSI and GNBD

• Sorting is still useful for iSCSI and GNBD The no-op scheduler is just what it sounds like: with this 10 scheduler requests are queued as they are sent to the 10 sub-system and it is left up to the disk hardware to optimize things. This scheduler may be appropriate for certain types of workloads and hardware (RAM disks, TCQ disks, etc.) A feature of many modern drives that can boost performance is tagged command queuing. Tagged command queuing (TCQ) can improve disk performance by allowing the the disk controller to re-order 10 requests in such a way that head seek movement is minimized. The requests are tagged with an identifier by the disk controller so that the block(s) requested for a particular 10 operation can be returned in the proper sequence in which they were received. For certain applications, you may wish to limit the queue depth (number of commands that can be queued up). Many device driver modules for devices that support TCQ accept a parameter that will control the queue depth. This parameter is typically passed as an argument to the kernel at system boot time. For example, to limit the queue depth for the SCSI disk at LUN 2 on an Adaptec controller to 64 requests, you would append the following to the 'kernel' line in /boot/grub/grub.conf : aic7xxx=tag_info:{{0,0,64,0,0,0,0})


Tuning the (default) cfq scheduler

3-18

• Goal: differentiated I/O service per application
  echo cfq > /sys/block/sda/queue/scheduler
• Class- and priority-based I/O queuing
  • Uses 64 internal queues
  • Fills internal queues using round-robin
  • Requests are dispatched from non-empty queues
  • Sort occurs at dispatch queue
• Primary tunables in /sys/block/sda/queue/iosched/
  • queued - max requests per internal queue
  • quantum - number of requests dispatched to device per cycle

The goal of the completely fair queuing (CFQ) I/O scheduler is to divide available I/O bandwidth equally among all processes that are doing I/O. Internally, the CFQ I/O scheduler maintains 64 request queues. I/O requests are assigned in round-robin fashion to one of these internal request queues. Requests are pulled from non-empty internal queues and assigned to a dispatch queue where they are serviced. I/O requests are ordered to minimize seek head movement when they are placed on the dispatch queue. The tunable settings for the CFQ I/O scheduler are: quantum - the total number of requests placed on the dispatch queue per cycle, and queued - the maximum number of requests allowed per internal request queue.
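As a sketch, the scheduler can be switched and its tunables inspected and changed through the same sysfs directory. The value written is purely illustrative, and the exact set of files exposed depends on the kernel version:

# echo cfq > /sys/block/sda/queue/scheduler
# ls /sys/block/sda/queue/iosched/                  # list the tunables this kernel actually exposes
# cat /sys/block/sda/queue/iosched/quantum
# echo 8 > /sys/block/sda/queue/iosched/quantum     # illustrative value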


Fine-tuning the cfq scheduler

3-19

• Class-based, prioritized queuing
  • Class 1 (real-time): first access to disk, can starve other classes
    • Priorities 0 (most important) through 7 (least important)
  • Class 2 (best-effort): round-robin access, the default
    • Priorities 0 (most important) through 7 (least important)
  • Class 3 (idle): receives disk I/O only if no other requests are in the queue
    • No priorities
• Example:
  # ionice -p1
  # ionice -p1 -n7 -c2
  # ionice -p1
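As a usage sketch (the command and PID below are illustrative), a long-running maintenance job can be started directly in the idle class, or an already-running process can be demoted:

# ionice -c3 tar czf /tmp/backup.tar.gz /home    # run the backup in the idle class
# ionice -c2 -n7 -p 1234                         # demote existing PID 1234 to best-effort, priority 7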


End of Lecture 3

• Questions and Answers
• Summary
  • Understand how udev manages device names.
  • Learn how to write udev rules for custom device names.


Lab 3.1: Persistent Device Naming

Scenario:

The order in which devices are attached or recognized by the system may dictate the device name attached to it, which can be problematic. We will learn how to map a specific device or set of devices to a persistent device name that will always be the same.

Deliverable:

Statically defined device names for storage devices.

Instructions:

1. Create and implement a udev rule on node1 that, upon reboot, will create a symbolic link named /dev/iscsiN that points to any partition device matching /dev/sdaN, where N is the partition number (any value between 1-9). Test your udev rule on an existing partition by rebooting the machine and verifying that the symbolic link is made correctly. If you don't have any partitions on /dev/sda, create one before rebooting. The reboot can be avoided if, after verifying the correct operation of your udev rule, you create a new partition on /dev/sda and update the in-memory copy of the partition table (partprobe).


Lab 3.1 Solutions

1. Create and implement a udev rule on node1 that, upon reboot, will create a symbolic link named /dev/iscsiN that points to any partition device matching /dev/sdaN, where N is the partition number (any value between 1-9). Test your udev rule on an existing partition by rebooting the machine and verifying that the symbolic link is made correctly. If you don't have any partitions on /dev/sda, create one before rebooting. The reboot can be avoided if, after verifying the correct operation of your udev rule, you create a new partition on /dev/sda and update the in-memory copy of the partition table (partprobe).

   a. There are several variations the student could come up with, but one is to create a file with priority 75, possibly named /etc/udev/rules.d/75-classlab-remote.rules, with the contents:

      KERNEL=="sda[1-9]", \
      PROGRAM=="scsi_id -g -u -s /block/sda/sda%n", RESULT=="S_beaf11", \
      SYMLINK+="iscsi%n"

      (Replace the RESULT field with the output you get from running the command: scsi_id -g -u -s /block/sda)

      If you are having problems making this work, double-check your file for typos and make sure that you have the correct number of equals signs for each directive as shown above.
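One way to sanity-check the rule without rebooting (assuming the udevtest utility that ships with udev is available) is to replay the event for an existing partition and then confirm the symlink after a real partition-table re-read:

cXn1# udevtest /block/sda/sda1    # simulates the udev event for sda1; does not create device nodes
cXn1# partprobe                   # after adding a real partition, re-read the partition table
cXn1# ls -l /dev/iscsi1           # the symlink created by the rule should now be present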


Lecture 4



Device Mapper and Multipathing



Upon completion of this unit, you should be able to:
• Understand how Device Mapper works and how to configure it
• Understand Multipathing and its configuration


Device Mapper

4-1

• Generic device mapping platform
• Used by applications requiring block device mapping:
  • LVM2 (e.g. logical volumes, snapshots)
  • Multipathing
• Manages the mapped devices (create, remove, ...)
• Configured using plain text mapping tables (load, reload, ...)
• Online remapping
• Maps arbitrary block devices
• Mapping devices can be stacked (e.g. RAID10)
• Kernel mapping-targets are dynamically loadable

The goal of this driver is to support volume management. The driver enables the creation of new logical block devices composed of ranges of sectors from existing, arbitrary physical block devices (e.g. (i)SCSI). This can be used to define disk partitions, or logical volumes. This kernel component supports user-space tools for logical volume management. Mapped devices can be more than 2TiB in 2.6 and newer versions of the kernel (CONFIG_LBD).

Device mapper has a user space library (libdm) that is interfaced by Device/Volume Management applications (e.g. dmraid, LVM2) and a configuration and testing tool: dmsetup. The library creates nodes to the mapped devices in /dev/mapper.

The /dev/mapper/mydevice logical device would appear as a single new device with 80000 contiguous (linearly mapped) sectors.

As another example, the following script concatenates two devices in their entirety (both provided as the first two arguments to the command, e.g. scriptname /dev/sda /dev/sdb), to create a single new logical device named /dev/mapper/combined:

#!/bin/bash
size1=$(blockdev --getsize $1)
size2=$(blockdev --getsize $2)
echo -e "0 $size1 linear $1 0\n$size1 $size2 linear $2 0" | dmsetup create combined
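Assuming the script above is saved as, say, concat.sh and made executable, a usage sketch (device names are placeholders) might be:

# ./concat.sh /dev/sdb /dev/sdc     # concatenate the two disks into /dev/mapper/combined
# dmsetup table combined            # show the two linear segments making up the mapping
# dmsetup remove combined           # tear the mapping down when finished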


Setup Steps for Multipathing FC Storage

4-17

• Install the device-mapper-multipath RPM
• Configure /etc/multipath.conf
• modprobe dm_multipath
• modprobe dm-round-robin
• chkconfig multipathd on
• service multipathd start
• multipath -l

Note: while the actual device drivers are named dm-multipath.ko and dm-round-robin.ko (see the files in /lib/modules/<kernel-version>/kernel/drivers/md), underscores are used in place of the dash characters in the output of the lsmod command and either naming form can be used with modprobe.

Available SCSI devices are viewable via /proc/scsi/scsi:

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST318305LC       Rev: 2203
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST340014AS       Rev: 8.05
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 08
  Vendor: IET      Model: VIRTUAL-DISK     Rev: 0
  Type:   Direct-Access                    ANSI SCSI revision: 04

If you need to re-do a SCSI scan, you can run the command:

# echo "- - -" > /sys/class/scsi_host/host0/scan

where host0 is replaced by the HBA you wish to use. You also can do a fabric rediscover with the commands:

# echo "1" > /sys/class/fc_host/host0/issue_lip
# echo "- - -" > /sys/class/scsi_host/host0/scan

This sends a LIP (loop initialization primitive) to the fabric. During the initialization, HBA access may be slow and/or experience timeouts.


Multipathing and iSCSI

4-18

• Similar to FC multipathing setup
• Can use either:
  • dm-multipath
  • interface bonding

iSCSI can be multipathed. The iSCSI target is presented to the initiator via a completely independent pathway. For example, two different interfaces, eth0 and eth1, configured on different subnets, can provide the same exact device to the initiator via different pathways. In Linux, when there are multiple paths to a storage device, each path appears as a separate block device. The separate block devices, with the same WWID, are used by multipath to create a new multipath block device. Device mapper multipath then creates a single block device that re-routes I/O through the underlying block devices. In the event of a failure on one interface, multipath transparently changes the route for the device to be the other network interface.

Ethernet interface bonding provides a partial alternative to dm-multipath with iSCSI, where one of the Ethernet links can fail between the node and the switch, and the network traffic to the target's IP address can switch to the remaining Ethernet link without involving the iSCSI block device at all. This does not necessarily address the issue of a failure of the switch or of the target's connection to the switch.
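As a rough sketch of the dm-multipath approach (the portal addresses are placeholders modeled on this course's lab network), the same target is discovered and logged into once per portal, and multipath then folds the resulting block devices into a single mpath device:

# iscsiadm -m discovery -t sendtargets -p 172.17.101.254
# iscsiadm -m discovery -t sendtargets -p 172.17.201.254
# iscsiadm -m node -l            # log in through every discovered portal
# multipath -ll                  # both paths now appear under one mpath device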


Multipath Configuration



4-19

• /etc/multipath.conf sections:
  • defaults - multipath tools default settings
  • blacklist - list of specific device names to not consider for multipathing
  • blacklist_exceptions - list of multipathing candidates that would otherwise be blacklisted
  • multipaths - list of multipath characteristic settings
  • devices - list of per storage controller settings
• Allows regular expression description syntax
• Only specify sections that are needed

defaults              A section that lists default settings for the multipath tools. See the file /usr/share/doc/device-mapper-multipath-<version>/multipath.conf.annotated for more details.
blacklist             By default, all devices are blacklisted (devnode "*"). Usually, the default blacklist section is commented out and/or modified by more specific rules in the blacklist_exceptions and secondary blacklist sections.
blacklist_exceptions  Allows devices to be multipathing candidates that would otherwise be blacklisted.
multipaths            Specifies multipath-specific characteristics.
Secondary blacklist   To blacklist entire types of devices (e.g. SCSI devices), use a devnode line in the secondary blacklist section. To blacklist specific devices, use a World Wide IDentification (WWID) line. Unless it is statically mapped by udev rules, there may be no guarantee that a specific device will have the same name on reboot (e.g. it could change from /dev/sda to /dev/sdb). Therefore it is generally recommended to not use devnode lines for blacklisting specific devices.

Example:

blacklist {
        wwid 26353900f02796769
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*"
}

Multipath attributes that can be set:

wwid            The container index
alias           Symbolic name for the multipath
path_checker    Path checking algorithm used to check path state
path_selector   The path selector algorithm used for this multipath
failback        Whether the group daemon should manage path group failback or not
no_path_retry   Should retries queue (never stop queuing until the path is fixed), fail (no queuing), or try N times before disabling queuing (N>0)
rr_min_io       The number of I/Os to route to a particular path before switching to the next in the same path group
rr_weight       Used to assign weights to the path
prio_callout    Executable used to obtain a path weight for a block device. Weights are summed for each path group to determine the next path group to use in case of path failure
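A minimal sketch of a multipaths section using some of these attributes follows; the WWID and alias are placeholders, not values from this classroom setup:

multipaths {
        multipath {
                # placeholder WWID - substitute the WWID reported by scsi_id for your device
                wwid            3600a0b80001327d80000006d43621677
                alias           clusterdisk
                failback        immediate
                no_path_retry   5
                rr_min_io       100
        }
}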


host:channel:id:lun devnode major:minor [path_status] [dm_status_if_known]

• host:channel:id:lun : The SCSI host, channel, ID, and LUN variables that identify the LUN.
• devnode : The name of the device.
• major:minor : The major and minor numbers of the block device.
• path_status : One of the following: ready (path is able to handle I/O requests), shaky (path is up, but temporarily not available for normal operations), faulty (path is unable to handle I/O requests), and ghost (path is a passive path, on an active/passive controller).
• dm_status_if_known : Similar to the path status, but from the kernel's point of view. The dm status has two states: failed (analogous to faulty), and active, which covers all other path states.

If the path is up and ready for I/O, the state of the path is [ready][active]. If the path is down, the state will be [faulty][failed]. The path state is updated periodically by the multipathd daemon based on the polling interval defined in /etc/multipath.conf. The dm status is similar to the path status, but from the kernel's point of view.

NOTE: When a multipath device is being created or modified, the path group status and the dm status are not known. Also, the features are not always correct. When a multipath device is being listed, the path group priority is not known.

To find out which device mapper entries match the system's multipathed devices, perform the following:

• multipath -ll

  Determine which long numbers are needed for the device mapper entries.

• dmsetup ls --target multipath

  This will return the long number. Examine the part that reads "(255, #)". The '#' is the device mapper number. The numbers can then be compared to find out which dm device corresponds to the multipathed device, for example /dev/dm-3.


End of Lecture 4

• Questions and Answers
• Summary
  • We learned how the system maps ordinary physical devices into very useful logical devices.


Lab 4.1: Device Mapper Multipathing

Scenario:

The iSCSI target volume on your workstation has been previously created. The node1 machine accesses this iSCSI volume via its eth2 (172.17.(100+X).1) interface. We want to make this iSCSI volume more fault tolerant by providing a second, independent network pathway to the iSCSI storage. We also want our operating system to ensure uninterrupted access to the storage device by automatically detecting any failure in the iSCSI pathway and redirecting all traffic to the alternate path. The /dev/sda device on your node1 is an iSCSI-provided device, discovered on your iSCSI target's 172.17.(100+X).254 interface.

Deliverable:


Instructions:

1. If you did not rebuild node1 at the end of the last lab, do so now using the rebuild-cluster script.

2. Create a second, independent network pathway on node1 to our iSCSI storage device and configure device-mapper multipathing so that access to the iSCSI target device continues in the event of a network failure.

Note: In this lab we will be performing multipath failovers using our iSCSI SAN. In node1's iSCSI initiator configuration file, /etc/iscsi/iscsid.conf, the default iSCSI timeout parameters (node.session.timeo.replacement_timeout and node.session.err_timeo.lu_reset_timeout) are set to 120 and 20 seconds, respectively. Left unchanged, failovers would take a while to complete. Edit these parameters to something smaller (e.g. 10, for both) and restart the iscsid service to put them into effect.
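For reference, a sketch of what the two edited lines in /etc/iscsi/iscsid.conf might look like with the suggested 10-second values, followed by the service restart:

node.session.timeo.replacement_timeout = 10
node.session.err_timeo.lu_reset_timeout = 10

cXn1# service iscsid restart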

3. Before we can use the second interface on the initiator side, we need to modify the target configuration. Add 172.17.(200+X).1, 172.17.(200+X).2, and 172.17.(200+X).3 as valid initiator addresses to /etc/tgt/targets.conf.

4. Restart tgtd to activate the changes. Note that this will not change targets that have active connections. In this case either stop these connections first, or use tgtadm --lld iscsi --op bind --mode target --tid 1 -I <initiator-ip>

5. Let's start by discovering the target on the first interface. Also set the initiator alias again to node1.


6. Log into node1 via ssh (do not use the console). Currently, node1's network interfaces are configured as:

   eth0 -> 172.16.50.X1/16
   eth1 -> 172.17.X.1/24       (will be used for cluster messaging later)
   eth2 -> 172.17.(100+X).1/24 (first path to the iSCSI target)
   eth3 -> 172.17.(200+X).1/24 (second path to the iSCSI target)

   Note that eth3 is on a different subnet than eth2.

7. On node1, make sure there are exactly two 1GiB partitions on /dev/sda (/dev/sda1 and /dev/sda2). Delete any extras or create new ones if necessary.

8. Discover and log in to the target on the second interface (172.17.(200+X).254).

9. Re-examine the output of the command 'fdisk -l'. Notice the addition of the new /dev/sdb device, which is really the same underlying device as /dev/sda (notice their partitions have the same characteristics), but provided to the machine a second time via a second pathway. We can prove it is the same device by, for example, comparing the output of the following commands:

   cXn1# scsi_id -g -u -s /block/sda
   cXn1# scsi_id -g -u -s /block/sdb

   or

   cXn1# scsi_id -g -p 0x83 -s /block/sda
   cXn1# scsi_id -g -p 0x83 -s /block/sdb

   See scsi_id(8) for explanation of the output and options used.

10. If not already installed, install the device-mapper-multipath RPM on node1.

11. Make the following changes to /etc/multipath.conf:

Comment out the first blacklist section:

# blacklist { devnode "*" # } Uncomment the device mapper default behavior section that looks like the following: -

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}

Change the path_grouping_policy to failover, instead of multibus, to enable simple failover.

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}

Uncomment the blacklist section just below it. This filters out all the devices that are not normally multipathed, such as IDE hard drives and floppy drives. Save the configuration file and exit the editor.

12. Before we start the multipathd service, make sure the proper modules are loaded: dm_multipath and dm_round_robin. List all available dm target types currently available in the kernel.

13. Open a console window to node1 from your workstation and, in a separate terminal window, log in to node1 and monitor /var/log/messages.

14. Now start the multipathd service and make it persistent across reboots.

15. View the result of starting multipathd by running the commands:

cXn1# fdisk -l
cXn1# ll /dev/mpath

The device mappings, in this case, are as follows:

       /--sda--\                      /-- dm-3 (sd[ab]1)
LUN --+         +-- dm-2 (mpath0) ---+
       \--sdb--/                      \-- dm-4 (sd[ab]2)

These device mappings follow the pattern of:


SAN (iSCSI storage) --> NIC (eth2/eth3, or HBA) --> device (/dev/sda) --> dm device (/dev/dm-2) --> dm-mp device (/dev/mpath/mpath0).



Notice how device mapper combines multiple paths into a single device node. For example, /dev/dm-2 represents both paths to our iSCSI target LUN. The device node /dev/dm-3 singularly represents both paths to the first partition on the device, and the device node /dev/dm-4 singularly represents both paths to the second partition on the device.



You will notice that /dev/dm-3 is also referred to as /dev/mpath/mpath0p1 and /dev/mapper/mpath0p1. Only the /dev/mapper/mpath* device names are persistent and are created early enough in the boot process to be used for creating logical volumes or filesystems. Therefore these are the device names that should be used to access the multipathed devices. Keep in mind that fdisk cannot be used with /dev/dm-# devices. If the multipathed device needs to be repartitioned, use fdisk on the underlying disks instead. Afterward, execute the command 'kpartx -a /dev/dm-#' to recognize any newly created partitions. The device-mapper multipath maps will get updated and create /dev/dm-# devices for them.
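For example (a sketch only; dm-2 is the whole-LUN multipath map created in this lab, and the repartitioning itself is hypothetical):

cXn1# fdisk /dev/sda       (repartition one of the underlying path devices)
cXn1# kpartx -a /dev/dm-2  (create device-mapper maps for any new partitions)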




16. View the multipath device assignments using the command:

cXn1# multipath -ll
mpath0 (16465616462656166313a3100000000000000000000000000) dm-2 IET,VIRTUAL-DISK
[size=10G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

The first line shows the name of the multipath (mpath0), its SCSI ID, and device-mapper device node. The second line helps to identify the device vendor and model. The third line specifies device attributes. The remaining lines show the participating paths of the multipath device, and their state. The "0:0:0:1" portion represents the host, bus (channel), target (SCSI id) and LUN, respectively, of the device (compare to the output of the command cat /proc/scsi/scsi).

17. Test our multipathed device to make sure it really will survive a failure of one of its pathways. Create a filesystem on /dev/mapper/mpath0p1 (which is really the first partition of our multipathed device), create a mount point named /mnt/data, and then mount it. Create a file in the /mnt/data directory that we can use to verify we still have access to the disk device.

18. To test that our filesystem can survive a failure of either path (eth2 or eth3) to the device, we will systematically bring down the two interfaces, one at a time, and test that we still have access to the remote device's contents. To do this, we will need to work from the console window of node1, which you opened earlier, otherwise open a new console connection now.

19. Test the first path. From the console, verify that device access survives if we bring down eth3, and that we still have read/write access to /mnt/data/passwd.

Note: if the iSCSI parameters were not trimmed to smaller values properly, the following multipath command and log output could take up to 120 seconds to complete.




If you monitor the tail end of /var/log/messages, you will see messages similar to (trimmed for brevity):

avahi-daemon[1768]: Interface eth3.IPv6 no longer relevant for mDNS.
kernel: sd 1:0:0:1: SCSI error: return code = 0x00020000
kernel: end_request: I/O error, dev sdb, sector 4544
kernel: device-mapper: multipath: Failing path 8:16.
multipathd: sdb: readsector0 checker reports path is down
multipathd: checker failed path 8:16 in map mpath0
multipathd: mpath0: remaining active paths: 1
iscsid: Nop-out timedout after 15 seconds on connection 2:0 state (3). Dropping session.

The output of multipath also provides information:

node1-console# multipath -ll
sdb: checker msg is "readsector0 checker reports path is down"
mpath0 (16465616462656166313a3100000000000000000000000000) dm-2 IET,VIRTUAL-DISK
[size=10G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdb 8:16 [failed][faulty]

Notice that the eth3 path (/dev/sdb) has failed, but the other path is still ready and active for all access requests. Bring the eth3 interface back up when you are finished verifying. Ensure that both paths are active and ready before continuing.

20. Now test the other path. Repeat the process by bringing down the eth2 interface, and again verifying that you still have read/write access to the device's contents. Bring the eth2 interface back up when you are finished verifying.

21. Rebuild node1 when done (execute rebuild-cluster -1 on your workstation).


Lab 4.1 Solutions

1.

If you did not rebuild node1 at the end of the last lab, do so now using the rebuild-cluster script.

stationX# rebuild-cluster -1

2.

Note: In this lab we will be performing multipath failovers using our iSCSI SAN. In node1's iSCSI initiator configuration file, /etc/iscsi/iscsid.conf, the default iSCSI timeout parameters (node.session.timeo.replacement_timeout and node.session.err_timeo.lu_reset_timeout) are set to 120 and 20 seconds, respectively. Left unchanged, failovers would take a while to complete. Edit these parameters to something smaller (e.g. 10, for both) and restart the iscsid service to put them into effect.

node1# vi /etc/iscsi/iscsid.conf
node1# service iscsid restart

3. Before we can use the second interface on the initiator side, we need to modify the target configuration. Add 172.17.(200+X).1, 172.17.(200+X).2, and 172.17.(200+X).3 as valid initiator addresses to /etc/tgt/targets.conf.

4. Restart tgtd on the workstation to activate the changes or, because the target already has active connections, bind the new initiator addresses to the running target:

stationX# tgtadm --lld iscsi --op bind --mode target --tid 1 -I initiator_ip

5. Discover the target on the first interface again and set the initiator alias back to node1 (the alias lives in /etc/iscsi/initiatorname.iscsi):

cXn1# iscsiadm -m discovery -t sendtargets -p 172.17.(100+X).254

6. Log into node1 via ssh (do not use the console). Currently, node1's network interfaces are configured as:

eth0 -> 172.16.50.X1/16
eth1 -> 172.17.X.1/24       (will be used for cluster messaging later)
eth2 -> 172.17.(100+X).1/24 (first path to the iSCSI target)
eth3 -> 172.17.(200+X).1/24 (second path to the iSCSI target)

Note that eth3 is on a different subnet than eth2.

7. On node1, make sure there are exactly two 1GiB partitions on /dev/sda (/dev/sda1 and /dev/sda2). Delete any extras or create new ones if necessary.

cXn1# fdisk -l

8. Discover and log in to the target on the second interface (172.17.(200+X).254).

cXn1# iscsiadm -m discovery -t sendtargets -p 172.17.(200+X).254
cXn1# iscsiadm -m node -T target_iqn -p 172.17.(200+X).254 -l

9.

Re-examine the output of the command 'fdisk -l'. Notice the addition of the new /dev/sdb device, which is really the same underlying device as /dev/sda (notice their partitions have the same characteristics), but provided to the machine a second time via a second pathway. We can prove it is the same device by, for example, comparing the output of the following commands:

cXn1# /sbin/scsi_id -g -u -s /block/sda
cXn1# /sbin/scsi_id -g -u -s /block/sdb

See scsi_id(8) for an explanation of the output and options used.


10. If not already installed, install the device-mapper-multipath RPM on node1.

cXn1# yum -y install device-mapper-multipath

11. Make the following changes to /etc/multipath.conf:

Comment out the first blacklist section:

# blacklist {
#     devnode "*"
# }

Uncomment the device mapper default behavior section that looks like the following:


defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}

Change the path_grouping_policy to failover, instead of multibus, to enable simple failover.

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}

Uncomment the blacklist section just below it. This filters out all the devices that are not normally multipathed, such as IDE hard drives and floppy drives. Save the configuration file and exit the editor.

12. Before we start the multipathd service, make sure the proper modules are loaded: dm_multipath and dm_round_robin. List all available dm target types currently available in the kernel.

cXn1# modprobe dm_multipath
cXn1# modprobe dm_round_robin
cXn1# lsmod | grep dm_
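The lab text does not spell out a command for listing the available device-mapper target types; the usual way to do this is with dmsetup:

cXn1# dmsetup targets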


13. Open a console window to node1 from your workstation and, in a separate terminal window, log in to node1 and monitor /var/log/messages.

stationX# xm console node1
stationX# ssh node1
cXn1# tail -f /var/log/messages

14. Now start the multipathd service and make it persistent across reboots.

cXn1# chkconfig multipathd on
cXn1# service multipathd start

15. View the result of starting multipathd by running the commands:

cXn1# fdisk -l
cXn1# ll /dev/mpath

The device mappings, in this case, are as follows:

       /--sda--\                      /-- dm-3 (sd[ab]1)
LUN --+         +-- dm-2 (mpath0) ---+
       \--sdb--/                      \-- dm-4 (sd[ab]2)

These device mappings follow the pattern of:

SAN (iSCSI storage) --> NIC (eth2/eth3, or HBA) --> device (/dev/sda) --> dm device (/dev/dm-2) --> dm-mp device (/dev/mpath/mpath0).

Notice how device mapper combines multiple paths into a single device node. For example, /dev/dm-2 represents both paths to our iSCSI target LUN. The device node /dev/dm-3 singularly represents both paths to the first partition on the device, and the device node /dev/dm-4 singularly represents both paths to the second partition on the device.

You will notice that /dev/dm-3 is also referred to as /dev/mpath/mpath0p1 and /dev/mapper/mpath0p1. Only the /dev/mapper/mpath* device names are persistent and are created early enough in the boot process to be used for creating logical volumes or filesystems. Therefore these are the device names that should be used to access the multipathed devices. Keep in mind that fdisk cannot be used with /dev/dm-# devices. If the multipathed device needs to be repartitioned, use fdisk on the underlying disks instead. Afterward, execute the command 'kpartx -a /dev/dm-#' to recognize any newly created partitions. The device-mapper multipath maps will get updated and create /dev/dm-# devices for them.

16. View the multipath device assignments using the command:

cXn1# multipath -ll
mpath0 (16465616462656166313a3100000000000000000000000000) dm-2 IET,VIRTUAL-DISK
[size=10G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

The first line shows the name of the multipath (mpath0), its SCSI ID, and device-mapper device node. The second line helps to identify the device vendor and model. The third line specifies device attributes. The remaining lines show the participating paths of the multipath device, and their state. The "0:0:0:1" portion represents the host, bus (channel), target (SCSI id) and LUN, respectively, of the device (compare to the output of the command cat /proc/scsi/scsi).

17. Test our multipathed device to make sure it really will survive a failure of one of its pathways. Create a filesystem on /dev/mapper/mpath0p1 (which is really the first partition of our multipathed device), create a mount point named /mnt/data, and then mount it.

cXn1# mke2fs -j /dev/mapper/mpath0p1
cXn1# mkdir /mnt/data
cXn1# mount /dev/mapper/mpath0p1 /mnt/data

Create a file in the /mnt/data directory that we can use to verify we still have access to the disk device.

cXn1# cp /etc/passwd /mnt/data

18. To test that our filesystem can survive a failure of either path (eth2 or eth3) to the device, we will systematically bring down the two interfaces, one at a time, and test that we still have access to the remote device's contents. To do this, we will need to work from the console window of node1, which you opened earlier, otherwise open a new console connection now.

19. Test the first path. From the console, verify that device access survives if we bring down eth3, and that we still have read/write access to /mnt/data/passwd.

cXn1# ifdown eth3
cXn1# cat /mnt/data/passwd
cXn1# echo "HELLO" >> /mnt/data/passwd

Note: if the iSCSI parameters were not trimmed to smaller values properly, the following multipath command and log output could take up to 120 seconds to complete.

If you monitor the tail end of /var/log/messages, you will see messages similar to (trimmed for brevity):

avahi-daemon[1768]: Interface eth3.IPv6 no longer relevant for mDNS.
kernel: sd 1:0:0:1: SCSI error: return code = 0x00020000
kernel: end_request: I/O error, dev sdb, sector 4544
kernel: device-mapper: multipath: Failing path 8:16.
multipathd: sdb: readsector0 checker reports path is down
multipathd: checker failed path 8:16 in map mpath0
multipathd: mpath0: remaining active paths: 1
iscsid: Nop-out timedout after 15 seconds on connection 2:0 state (3). Dropping session.


The output of multipath also provides information:

node1-console# multipath -ll
sdb: checker msg is "readsector0 checker reports path is down"
mpath0 (16465616462656166313a3100000000000000000000000000) dm-2 IET,VIRTUAL-DISK
[size=10G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdb 8:16 [failed][faulty]

Notice that the eth3 path (/dev/sdb) has failed, but the other path is still ready and active for all access requests.

Bring the eth3 interface back up when you are finished verifying. Ensure that both paths are active and ready before continuing.

node1-console# ifup eth3
cXn1# multipath -ll
mpath0 (16465616462656166313a3100000000000000000000000000) dm-2 IET,VIRTUAL-DISK
[size=10G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

20. Now test the other path. Repeat the process by bringing down the eth2 interface, and again verifying that you still have read/write access to the device's contents.

cXn1# ifdown eth2
cXn1# cat /mnt/data/passwd
cXn1# echo "LINUX" >> /mnt/data/passwd
cXn1# multipath -ll

Bring the eth2 interface back up when you are finished verifying.

21. Rebuild node1 when done (execute rebuild-cluster -1 on your workstation).

stationX# rebuild-cluster -1
This will create or rebuild node(s): 1
Continue? (y/N): y

The rebuild process can be monitored from node1's console window:

stationX# xm console node1


Lecture 5

Red Hat Cluster Suite Overview

Upon completion of this unit, you should be able to:
• Provide an overview of Red Hat Cluster Suite and its major components




What is a Cluster?

5-1

• A group of machines that work together to perform a task.
• The goal of a cluster is to provide one or more of the following:
  • High Performance
  • High Availability
  • Load Balancing
• Red Hat's cluster products are enablers of these goals:
  • Red Hat Cluster Suite
  • Global File System (GFS)
  • Clustered Logical Volume Manager (CLVM)
  • Piranha

High performance, or computational, clusters (sometimes referred to as GRID computing) use the CPUs of several systems to perform concurrent calculations. Working in parallel, many applications, such as animation rendering or a wide variety of simulation and modeling problems, can improve their performance considerably.

High-availability application clusters are also sometimes referred to as fail-over clusters. Their intended purpose is to provide continuous availability of some service by eliminating single points of failure. Through redundancy in both hardware and software, a highly available system can provide virtually continuous availability for one or more services. Fail-over clusters are usually associated with services that involve both reading and writing data. Fail-over of read-write mounted file systems is a complex process, and a fail-over system must contain provisions for maintaining data integrity as a system takes over control of a service from a failed system.

Load-balancing clusters dispatch network service requests to multiple systems in order to spread the request load over multiple systems. Load-balancing provides cost-effective scalability, as more systems can be added as requirements change over time. Rather than investing in a single, very expensive system, it is possible to invest in multiple commodity x86 systems. If a member server in the cluster fails, the clustering software detects this and sends any new requests to other operational servers in the cluster. An outside client should not notice the failure at all, since the cluster looks like a single large server from the outside. Therefore, this form of clustering also makes the service highly available, able to survive system failures.

What distinguishes a high availability system from a load-balancing system is the relationship of fail-over systems to data storage. For example, web service might be provided through a load-balancing router that dispatches requests to a number of real web servers. These web servers might read content from a failover cluster providing an NFS export or running a database server.


Red Hat Cluster Suite

5-2

• Open Source Clustering Solution
• Provides infrastructure for clustered applications
• rgmanager: makes off-the-shelf applications highly available
• GFS2: clustered filesystem

High availability clusters, like Red Hat Cluster Suite, provide the necessary infrastructure for monitoring and failure resolution of a service and its resources. Red Hat Cluster Suite allows services to be relocated to another node in the event of an unresolvable failure on the original node. The service itself does not need to be aware of the other nodes, the status of its own resources, or the relocation process. Shared storage among the cluster nodes may be useful so that the services' data remains available after being relocated to another node, but shared storage is not required for the cluster to keep a service available.

The ability to prevent access to a resource (hard disk, etc.) for a cluster node that loses contact with the rest of the nodes in the cluster is called fencing, and is a requirement for multi-machine (as opposed to single machine, or virtual machine instance) support. Fencing can be accomplished at the network level (e.g. SCSI reservations or a fibre channel switch) or at the power level (e.g. networked power switch).


Cluster Configuration System (CCS)

5-12

• Daemon runs on each node in the cluster (ccsd)
• Provides cluster configuration info to all cluster components
• Configuration file: /etc/cluster/cluster.conf
  • Stored in XML format
  • cluster.conf(5)
• Finds most recent version among cluster nodes at startup
• Facilitates online (active cluster) reconfigurations
  • Propagates updated file to other nodes
  • Updates cluster manager's information

CCS consists of a daemon and a library. The daemon stores the XML file in memory and responds to requests from the library (or other CCS daemons) to get cluster information. There are two operating modes: quorate and non-quorate. Quorate operation ensures consistency of information among nodes. Non-quorate mode connections are only allowed if forced. Updates to the CCS can only happen in quorate mode. If no cluster.conf exists at startup, a cluster node may grab the first one it hears about by a multicast announcement.

The OpenAIS parser is a "plugin" that can be replaced at run time. The cman service that plugs into OpenAIS provides its own configuration parser, ccsd. This means /etc/ais/openais.conf is not used if cman is loaded into OpenAIS; ccsd is used for configuration, instead.
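For orientation only, a minimal cluster.conf skeleton looks roughly like the following; the element names are the standard ones, but the cluster name, node names, and version number here are illustrative placeholders rather than this course's configuration:

<?xml version="1.0"?>
<cluster name="clusterX" config_version="1">
    <cman expected_votes="1" two_node="1"/>
    <clusternodes>
        <clusternode name="node1.clusterX.example.com" nodeid="1">
            <fence/>
        </clusternode>
        <clusternode name="node2.clusterX.example.com" nodeid="2">
            <fence/>
        </clusternode>
    </clusternodes>
    <fencedevices/>
    <rm/>
</cluster>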


CMAN - Cluster Manager

5-13

• Main component of cluster suite
• Calculates quorum - an indication of the cluster's health
• Started by the cman SysV script
  • also starts other components (ccsd, fenced, ...)
  • must be started in parallel on cluster members
• Uses /etc/cluster/cluster.conf
• All cluster applications (CLVM, GFS2, rgmanager, ...) require this service
• Uses the OpenAIS framework for communication

The cluster manager, an OpenAIS service, is the mechanism for configuring, controlling, querying, and calculating quorum for the cluster. The cluster manager is configured via /etc/cluster/cluster.conf (ccsd), and is responsible for the quorum disk API and functions for managing cluster quorum.
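A quick sketch of interacting with the service (these commands assume a RHEL 5 node with Cluster Suite installed; they are not an explicit step in this lecture):

node1# service cman start   (also starts ccsd, fenced, and the other components)
node1# cman_tool status     (cluster name, quorum state, and vote counts)
node1# cman_tool nodes      (cluster membership as seen by this node)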


Cluster Quorum

5-14

• Most cluster operations require the cluster to be "quorate"
  • generally more than half the nodes must be available
  • exact quorum requirements can be configured
• Prevents accidental resource usage by errant nodes (split-brain situation)
• If quorum is lost, no cluster resources may be started
• Two-node clusters do not use this feature

CMAN keeps track of cluster quorum by monitoring the count of cluster nodes. This feature is only used for clusters with more than two nodes. If more than half the nodes are active, the cluster has quorum. If half the nodes (or fewer) are active, the cluster does not have quorum, and all cluster activity is stopped.

Cluster quorum prevents the occurrence of a "split-brain" condition: a condition where two instances of the same cluster are running. A split-brain condition would allow each cluster instance to access cluster resources without knowledge of the other cluster instance, resulting in corrupted cluster integrity.


OpenAIS

5-15

• A cluster manager
• Underlying cluster communication framework
• Provides cluster membership and messaging foundation
• All components that can be in user space are in user space
• Allows closed process groups (libcpg)
• Advantages:
  • Failures do not cause kernel crashes and are easier to debug
  • Faster node failure detection
  • Other OpenAIS services now possible
  • Larger development community
  • Advanced, well-researched membership/messaging protocols
  • Encrypted communication

OpenAIS has several subsystems that already provide membership, locking, events, communications services and other features. In this sense, OpenAIS is a cluster manager in its own right. OpenAIS's core messaging system is called "totem", and it provides reliable messaging with predictable delivery ordering. While standard OpenAIS callbacks are relative to the entire cluster for tasks such as message delivery and configuration/membership changes, OpenAIS also allows for Closed Process Groups (libcpg) so processes can join a closed group for callbacks that are relative to the group. For example, communication can be limited to just host nodes that have a specific GFS filesystem mounted, currently using a DLM lockspace, or a group of nodes that will fence each other.

The core of OpenAIS is the modular aisexec daemon, into which various services load. Because cman is a service module that loads into aisexec, it can now take advantage of OpenAIS's totem messaging system. Another module that loads into aisexec is the CPG (Closed Process Groups) service, used to manage trusted service partners. cman, to some extent, still exists largely as a compatibility layer for existing cluster applications. A configuration interface into CCS, quorum disk API, mechanism for conditional shutdown, and functions for managing quorum are among its still-remaining tasks.


luci

5-18





• Web interface for cluster management
• Create new clusters or import old configuration
• Can create users and determine what privileges they have
• Can grow an online cluster by adding new systems
• Only have to authenticate a remote system once
• Node fencing
• View system logs for each node



Conga is an agent/server architecture for remote administration of systems. The agent component is called ricci, and the server is called luci. One luci server can communicate with many ricci agents installed on systems.




ricci

5-19

• An agent that runs on any cluster node to be administered by luci
• One-time certificate authentication with luci
• All communication between luci and ricci is via XML

When a system is added to a luci server to be administered, authentication is done once. No authentication is necessary from then on (unless the certificate used is revoked by a CA). Through the UI provided by luci, users can configure and administer storage and cluster behavior on remote systems. Communication between luci and ricci is done via XML.


Fencing

5-24

• Fencing separates a cluster node from its storage
  • Power fencing
  • Fabric fencing
• Fencing is necessary to prevent corruption of resources
• Fencing is required for a supportable configuration
  • Watchdog timers and manual fencing are NOT supported

Fencing is the act of immediately and physically separating a cluster node from its storage to prevent the node from continuing any form of I/O whatsoever. A cluster must be able to guarantee a fencing action against a cluster node that loses contact with the other nodes in the cluster, and is therefore no longer working cooperatively with them. Without fencing, an errant node could continue I/O to the storage device, totally unaware of the I/O from other nodes, resulting in corruption of a shared filesystem.
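As an illustrative aside (not a step in this course's labs), once fencing is configured, a member's fence method from cluster.conf can also be triggered by hand with the fence_node utility; the hostname below is just an example using the classroom naming scheme:

node1# fence_node node2.clusterX.example.com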


End of Lecture 5

• Questions and Answers
• Summary
  • Red Hat Cluster Suite


Lab 5.1: Building a Cluster with Conga

Scenario:

We will use the workstation as the luci deployment node to create a cluster from nodes 1 and 2. We will configure an Apache web server resource group on the cluster nodes that accesses a shared ext3 filesystem for our DocumentRoot.

Instructions:

1. Recreate node1 and node2 if necessary with the rebuild-cluster tool.

2. It is best practice to put the cluster traffic on a private network. For this purpose eth1 of your virtual machines is connected to a private bridge named cluster on your workstation. Cluster Suite picks the network that is associated with the hostname as its cluster communication network. It is considered best practice to use a separate private network for that. Configure the hostname of both virtual machines so that it points to nodeN.clusterX.example.com (replace N with the node number and X with your cluster number). Make sure that the setting is persistent.

3. Make sure that the iSCSI target is available on both nodes. You can use /root/RH436/HelpfulFiles/setup-initiator -bl.

4. From any node in the cluster, delete any pre-existing partitions on our shared storage (the /root/RH436/HelpfulFiles/wipe_sda script makes this easy), then make sure the OS on each node has its partition table updated using the partprobe command.

5. Install the luci RPM on your workstation and the ricci and httpd RPMs on node1 and node2 of your assigned cluster.

6. Start the ricci service on node1 and node2, and configure it to start on boot.

7. Initialize the luci service on your workstation and create an administrative user named admin with a password of redhat.

8. Restart luci (and configure it to persist across a reboot) and open the web page the command output suggests. Use the web browser on your local classroom machine to access the web page.

9. Log in to luci using admin as the Login Name and redhat as the Password.

10. From the "Luci Homebase" page, select the cluster tab near the top and then select "Create a New Cluster" from the left sidebar. Enter a cluster name of clusterX, where X is your assigned cluster number. Enter the fully-qualified name for your two cluster nodes (nodeN.clusterX.example.com) and the password for the root user on each. Make sure that "Download packages" is pre-selected, then select the "Check if node passwords are identical" option. All other options can be left as-is. Do not click the Submit button yet!

11. Before submitting the node information to luci and beginning the Install, Reboot, Configure, and Join phases, open a console window to node1 and node2, so you can monitor each node's progress. Once you have completed the previous step and have prepared your consoles, click the Submit button to send your configuration to the cluster nodes.

12. Once luci has completed (once all four circles have been filled in in the luci interface), you will be automatically redirected to a General Properties page for your cluster. Select the Fence tab. In the XVM fence daemon key distribution section, enter dom0.clusterX.example.com in the first box (node hostname from the host cluster) and node1.clusterX.example.com in the second box (node hostname from the hosted (virtual) cluster). Click on the Retrieve cluster nodes button. At the next screen, in the same section, make sure both cluster nodes are selected and click on the Create and distribute keys button.

13. From the left-hand menu select Failover Domains, then select Add a Failover Domain. In the "Add a Failover Domain" window, enter prefer_node1 as the "Failover Domain Name". Select the Prioritized and Restrict failover to this domain's members boxes. In the "Failover domain membership" section, make sure both nodes are selected as members, and that node1 has a priority of 1 and node2 has a priority of 2 (lower priority). Click the Submit button when finished.

14. We must now configure fencing (the ability of the cluster to quickly and absolutely remove a node from the cluster). Fencing will be performed by your workstation (dom0.clusterX.example.com), as this is the only node that can execute the xm destroy command necessary to perform the fencing action. First, create a shared fence device that will be used by all cluster nodes. From the left-hand menu select Shared Fence Devices, then select Add a Fence Device. In the Fencing Type dropdown menu, select Virtual Machine Fencing. Choose the name xenfenceX (where X is your cluster number) and click the Add this shared fence device button.

15. Second, we associate each node with our shared fence device. From the left-hand menu select Nodes. From the lower left area of the first node in luci's main window (node1) select Manage Fencing for this Node. Scroll to the bottom, and in the Main Fencing Method section, click the Add fence device to this level link. In the drop-down menu, select xenfenceX (Virtual Machine Fencing). In the Domain box, type node1 (the name that would be used in the command xm destroy to fence the node), then click the Update main fence properties button at the bottom. Repeat the process for each node in the cluster (using the appropriate node name for each in the Domain box).


16. To complete the fencing setup, we need to run fence_xvmd on your workstation. First, install the cman packages on your workstation, but do not start the cman service.

stationX# yum -y install cman

Second, copy /etc/cluster/fence_xvm.key from one of the cluster nodes to /etc/cluster on stationX.

Note:

If the fence key was not created automatically by the GUI, it is possible to create one manually:

# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1

Third, add the command /sbin/fence_xvmd -L -I cluster to /etc/rc.local and execute rc.local. This starts the fence daemon without a running cluster (-L) and lets it listen on the cluster bridge (-I cluster).

17. Before we add our resources to luci, we need to make sure one of them is in place: a partition we will use for an Apache Web Server DocumentRoot filesystem. From a terminal window connected to node1, create an ext3-formatted 100MiB partition on the /dev/sda shared storage volume. Make sure it is recognized by both node1 and node2, and run the partprobe command if not. Temporarily mount it and place a file named index.html in it with permissions mode 0644 and contents "Hello". Unmount the partition when finished, and do not place any entries for it in /etc/fstab.

18. Next we build our clustered service by first creating the resources that make it up. Back in the luci interface window, select Add a Resource, then from the Select a Resource Type menu, select IP Address. Choose 172.16.50.X6 for the IP address and make sure the Monitor link box is selected. Click the Submit button when finished.

19. Select Add a Resource from the left-hand-side menu, and from the drop-down menu select File system.

Enter the following parameters:

Name: docroot
File system type: ext3
Mount point: /var/www/html
Device: /dev/sda1

All other parameters can be left at their default. Click the Submit button when finished.


20. Once more, select Add a Resource from the left-hand-side menu, and from the drop-down menu select Apache. Choose httpd for the Name. Set Shutdown Wait to 5 seconds. This parameter defines how long stopping the service may take before Cluster Suite declares it failed. Click the Submit button when finished.

21. Now we collect together our three resources to create a functional web server service. From the left-hand-side menu, select Services, then Add a Service. Choose webby for the Service Name, prefer_node1 as the Failover Domain, and a Recovery Policy of Relocate. Leave all other options at their defaults. Click the Add a resource to this service button when finished. Under the Use an existing global resource drop-down menu, choose the previously-created IP Address resource, then click the Add a resource to this service button again. Under the Use an existing global resource drop-down menu, choose the previously-created File System resource, then click the Add a resource to this service button again. Finally, under the Use an existing global resource drop-down menu, choose the previously-created Apache Server resource. When ready, click the Submit button at the bottom of the window. If you want webby to start automatically, set the autostart option.

22. From the left-hand menu, select Cluster List. Notice the brief description of the cluster just created, including services, nodes, and status of the cluster service, indicated by the color of the cluster name. A green-colored name indicates the cluster service is functioning properly. If your cluster name is colored red, wait a minute and refresh the information by selecting Cluster List from the left-hand side menu again. The service should autostart (an option in the service configuration window). If it remains red, that may indicate a problem with your cluster configuration.

23. Verify the web server is working properly by pointing a web browser on your local workstation to the URL http://172.16.50.X6/index.html or running the command:

local# elinks -dump http://172.16.50.X6/index.html

Verify the virtual IP address and cluster status with the following commands:

node1# ip addr list
node1,2# clustat

24. If the previous step was successful, try to relocate the service using the luci interface onto the other node in the cluster, and verify it worked.

25. While continuously monitoring the cluster service status from node1, reboot node2 and watch the state of webby.


Lab 5.1 Solutions

1.

Recreate node1 and node2 if necessary with the rebuild-cluster tool.

2. It is best practice to put the cluster traffic on a private network. For this purpose eth1 of your virtual machines is connected to a private bridge named cluster on your workstation. Cluster Suite picks the network that is associated with the hostname as its cluster communication network. It is considered best practice to use a separate private network for that. Configure the hostname of both virtual machines so that it points to nodeN.clusterX.example.com (replace N with the node number and X with your cluster number). Make sure that the setting is persistent.

Either edit /etc/sysconfig/network manually or use the following perl statement:

cXn1# perl -pi -e "s/HOSTNAME=.*/HOSTNAME=nodeN.clusterX.example.com/" /etc/sysconfig/network
cXn1# hostname nodeN.clusterX.example.com

Repeat for node2.

3. Make sure that the iSCSI target is available on both nodes. You can use /root/RH436/HelpfulFiles/setup-initiator -bl.

cXn1# /root/RH436/HelpfulFiles/setup-initiator -bl
cXn2# /root/RH436/HelpfulFiles/setup-initiator -bl

4.

From any node in the cluster, delete any pre-existing partitions on our shared storage (the /root/RH436/HelpfulFiles/wipe_sda script makes this easy), then make sure the OS on each node has its partition table updated using the partprobe command.

node1# /root/RH436/HelpfulFiles/wipe_sda
node1,2# partprobe /dev/sda

5.

Install the luci RPM on your workstation and the ricci and httpd RPMs on node1 and node2 of your assigned cluster.

stationX# yum -y install luci
node1,2# yum -y install ricci httpd

6. Start the ricci service on node1 and node2, and configure it to start on boot.

node1,2# service ricci start
node1,2# chkconfig ricci on

7. Initialize the luci service on your workstation and create an administrative user named admin with a password of redhat.


stationX# luci_admin init

8. Restart luci (and configure it to persist across a reboot) and open the web page the command output suggests. Use the web browser on your local classroom machine to access the web page.

stationX# chkconfig luci on; service luci restart

Open https://stationX.example.com:8084/ in a web browser, where X is your cluster number. (If presented with a window asking if you wish to accept the certificate, click the 'OK' button.)

9. Log in to luci using admin as the Login Name and redhat as the Password.

10. From the "Luci Homebase" page, select the cluster tab near the top and then select "Create a New Cluster" from the left sidebar. Enter a cluster name of clusterX, where X is your assigned cluster number. Enter the fully-qualified name for your two cluster nodes (nodeN. clusterX. example . com ) and the password for the root user on each. Make sure that "Download packages" is pre-selected, then select the "Check if node passwords are identical" option. All other options can be left as-is. Do not click the Submit button yet! nodel.clusterX.example.com node2.clusterX.example.com

redhat redhat

11. Before submitting the node information to luci and beginning the Install, Reboot, Configure, and Join phases, open a console window to node1 and node2, so you can monitor each node's progress. Once you have completed the previous step and have prepared your consoles, click the Submit button to send your configuration to the cluster nodes.

stationX# xm console node1
stationX# xm console node2

12. Once luci has completed (once all four circles have been filled in in the luci interface), you will be automatically redirected to a General Properties page for your cluster. Select the Fence tab. In the XVM fence daemon key distribution section, enter dom0.clusterX.example.com in the first box (node hostname from the host cluster) and node1.clusterX.example.com in the second box (node hostname from the hosted (virtual) cluster). Click on the Retrieve cluster nodes button. At the next screen, in the same section, make sure both cluster nodes are selected and click on the Create and distribute keys button.

13. From the left-hand menu select Failover Domains, then select Add a Failover Domain. In the "Add a Failover Domain" window, enter prefer_node1 as the "Failover Domain Name". Select the Prioritized and Restrict failover to this domain's members boxes. In the "Failover domain membership" section, make sure both nodes are selected as members, and that node1 has a priority of 1 and node2 has a priority of 2 (lower priority).

Click the Submit button when finished.

14. We must now configure fencing (the ability of the cluster to quickly and absolutely remove a node from the cluster). Fencing will be performed by your workstation (dom0.clusterX.example.com), as this is the only node that can execute the xm destroy command necessary to perform the fencing action. First, create a shared fence device that will be used by all cluster nodes. From the left-hand menu select Shared Fence Devices, then select Add a Fence Device. In the Fencing Type dropdown menu, select Virtual Machine Fencing. Choose the name xenfenceX (where X is your cluster number) and click the Add this shared fence device button.

15. Second, we associate each node with our shared fence device. From the left-hand menu select Nodes. From the lower left area of the first node in luci's main window (node1) select Manage Fencing for this Node. Scroll to the bottom, and in the Main Fencing Method section, click the Add fence device to this level link. In the drop-down menu, select xenfenceX (Virtual Machine Fencing). In the Domain box, type node1 (the name that would be used in the command xm destroy to fence the node), then click the Update main fence properties button at the bottom. Repeat the process for each node in the cluster (using the appropriate node name for each in the Domain box).

16. To complete the fencing setup, we need to configure your workstation as a simple single-node cluster with the same fence_xvm.key as the cluster nodes. Complete the following three steps:

First, install the cman packages on your workstation, but do not start the cman service yet.

stationX# yum -y install cman

Second, copy /etc/cluster/fence_xvm.key from one of the cluster nodes to /etc/cluster on stationX.

stationX# scp node1:/etc/cluster/fence_xvm.key /etc/cluster

Note:

If the fence key was not created automatically by the GUI, it is possible to create one manually:

# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1

Third, add the command /sbin/fence_xvmd -L -I cluster to /etc/rc.local and execute rc.local. This starts the fence daemon without a running cluster (-L) and lets it listen on the cluster bridge (-I cluster).

stationX# echo '/sbin/fence_xvmd -L -I cluster' >> /etc/rc.local
stationX# /etc/rc.local


17. Before we add our resources to luci, we need to make sure one of them is in place: a partition we will use for an Apache Web Server DocumentRoot filesystem. From a terminal window connected to node1, create an ext3-formatted 100MiB partition on the /dev/sda shared storage volume. Make sure it is recognized by both node1 and node2, and run the partprobe command if not. Temporarily mount it and place a file named index.html in it with permissions mode 0644 and contents "Hello". Unmount the partition when finished, and do not place any entries for it in /etc/fstab.

node1# fdisk /dev/sda   (size=+100M, /dev/sda1; this partition may differ on your machine)
node1,2# partprobe /dev/sda
node1# mkfs -t ext3 /dev/sda1
node1# mount /dev/sda1 /mnt
node1# echo "Hello" > /mnt/index.html
node1# chmod 644 /mnt/index.html
node1# umount /mnt

18. Next we build our clustered service by first creating the resources that make it up. Back in the luci interface window, select Add a Resource, then from the Select a Resource Type menu, select IP Address. Choose 172.16.50.X6 for the IP address and make sure the Monitor link box is selected. Click the Submit button when finished.

19. Select Add a Resource from the left-hand-side menu, and from the drop-down menu select File system. Enter the following parameters:

Name: docroot
File system type: ext3
Mount point: /var/www/html
Device: /dev/sda1

All other parameters can be left at their default. Click the Submit button when finished.

20. Once more, select Add a Resource from the left-hand-side menu, and from the drop-down menu select Apache. Choose httpd for the Name. Set Shutdown Wait to 5 seconds. This parameter defines how long stopping the service may take before Cluster Suite declares it failed. Click the Submit button when finished.

21. Now we collect together our three resources to create a functional web server service. From the left-hand-side menu, select Services, then Add a Service. Choose webby for the Service Name, prefer_node1 as the Failover Domain, and a Recovery Policy of Relocate. Leave all other options at their defaults. Click the Add a resource to this service button when finished. Under the Use an existing global resource drop-down menu, choose the previously-created IP Address resource, then click the Add a resource to this service button again.

Under the Use an existing global resource drop-down menu, choose the previously-created File System resource, then click the Add a resource to this service button again. Finally, under the Use an existing global resource drop-down menu, choose the previously-created Apache Server resource. When ready, click the Submit button at the bottom of the window. If you want webby to start automatically, set the autostart option.

22. From the left-hand menu, select Cluster List. Notice the brief description of the cluster just created, including services, nodes, and status of the cluster service, indicated by the color of the cluster name. A green-colored name indicates the cluster service is functioning properly. If your cluster name is colored red, wait a minute and refresh the information by selecting Cluster List from the left-hand side menu again. The service should autostart (an option in the service configuration window). If it remains red, that may indicate a problem with your cluster configuration.

23. Verify the web server is working properly by pointing a web browser on your local workstation to the URL http://172.16.50.X6/index.html or running the command:

local# elinks -dump http://172.16.50.X6/index.html

Verify the virtual IP address and cluster status with the following commands:

node1# ip addr list
node1,2# clustat

24. If the previous step was successful, try to relocate the service using the luci interface onto the other node in the cluster, and verify it worked (you may need to refresh the luci status screen to see the service name change from red to green; otherwise you can continuously monitor the service status with the clustat -i 1 command from one of the node terminal windows).

Cluster List --> clusterX --> Services --> Choose a Task... --> Relocate this service to node2.clusterX.example.com --> Go

Note: the service can also be manually relocated using the command:

node1# clusvcadm -r webby -m node2.clusterX.example.com

from any active node in the cluster.

25. While continuously monitoring the cluster service status from node1, reboot node2 and watch the state of webby.

From one terminal window on node1:

node1# clustat -i 1

From another terminal window on node1:

node1# tail -f /var/log/messages

Lecture 6

Logical Volume Management

Upon completion of this unit, you should be able to:
• Understand advanced LVM topics
• Move and rename volume groups
• Set up clustered logical volumes


An LVM2 Review

6-1

• Review of LVM2 layers:

[Diagram: the LVM2 stack, from top to bottom: Logical Volumes (lvcreate), Volume Group (vgcreate), Physical Volumes (pvcreate), Block Devices]


LVM2 - Physical Volumes and Volume Groups

6-2

• Creating a physical volume (PV) initializes a whole disk or a partition for use in a logical volume
  • pvcreate /dev/sda5 /dev/sdb
• Using the space of one or more PVs, create a volume group (VG) named vg0
  • vgcreate vg0 /dev/sda5 /dev/sdb
• Display information
  • pvdisplay, pvs, pvscan
  • vgdisplay, vgs, vgscan

Whole disk devices or just a partition can be turned into a physical volume (PV), which is really just a way of initializing the space for later use in a logícal volume. If converting a partition into a physical volume, first set its partition type to LVM (se) within a partitioning tool like fdisk. Whole disk devices must have their partition table wiped by zeroing out the first sector of the device (dd if=/dev/zero of= bs=512 count=1). Up to 21\32 PVs can be created in LVM2. One or more PVs can be used to create a volume group (VG). When PVs are used to create a VG, its disk space is "quantized" into 4MB extents, by default. This extent is the minimum amount by which the logical volume (LV) may be increased or decreased in size. In LVM2, there is no restriction on the number of allowable extents and large numbers of them will have no impact on I/O performance of the LV. The only downside (if it can be considered one) to a large number of extents is it will slow down the tools. The following commands display useful PVNG information in a brief format: # pvscan PV /dev/sdb2 VG vg0 lvm2 [964.00 MB / 0 free] PV /dev/sdcl VG vg0 lvm2 [964.00 MB / 428.00 MB free] PV /dev/sdc2 lvm2 [964.84 MB] Total: 3 [2.83 GB] / in use: 2 [1.88 GB] / in no VG: 1 [964.84 MB] # pvs -o pv_name,pv_size -O pv_free PV PSize /dev/sdb2 964.00M /dev/sdcl 964.00M /dev/sdc2 964.84M # vgs -o vg_name,vg_uuid -O vg_size VG VG UUID vg0 18IoBt-hAFn-lUsj-dai2-UGry-Ymgz-w6AfD7
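As a concrete illustration, the following is a minimal sketch of initializing an unused whole disk and a partition and grouping them into a VG; the device names /dev/sdb and /dev/sda5 are assumptions chosen for this example only:

# dd if=/dev/zero of=/dev/sdb bs=512 count=1    # wipe any old partition table on the whole disk
# pvcreate /dev/sda5 /dev/sdb                   # initialize both devices as physical volumes
# vgcreate -s 4M vg0 /dev/sda5 /dev/sdb         # create vg0 with the (default) 4MB extent size
# vgdisplay vg0                                 # confirm the VG size and free extents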


To grow a GFS2 file system, the underlying logical volume on which it was built must be grown first. This is also a good time to consider whether additional nodes will be added to the cluster, because each new node will require room for its journal (journals consume 128MB each, by default) in addition to the data space.
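For illustration, a minimal sketch of the typical sequence, assuming the file system lives on /dev/ClusterVG/gfslv and is mounted at /mnt/gfs (substitute your own device and mount point):

node1# lvextend -L +2G /dev/ClusterVG/gfslv     # grow the underlying logical volume first
node1# gfs2_grow /mnt/gfs                       # then grow the mounted GFS2 file system
node1# gfs2_jadd -j 1 /mnt/gfs                  # optionally add a journal for a new cluster node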


GFS2 Super Block Changes

7-11

• It is sometimes necessary to make changes directly to GFS2 super block settings
• The GFS2 file system should be unmounted from all nodes before changes are applied
• Lock manager
  • gfs2_tool sb <device> proto [lock_dlm,lock_nolock]
• Lock table name
  • gfs2_tool sb <device> table cluster1:gfslv
• List superblock information
  • gfs2_tool sb <device> all

GFS2 file systems are told at creation time (gfs2_mkfs) what type of locking manager (protocol) will be used. If this should ever change, the locking manager type can easily be changed with gfs2_tool. For example, suppose a single-node GFS2 filesystem created with the lock_nolock locking manager is now going to be made highly available by adding additional nodes and clustering the service between them. We can change its locking manager using:

gfs2_tool sb <device> proto lock_dlm
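Putting the pieces together, here is a minimal sketch of converting a single-node file system for clustered use; the device /dev/ClusterVG/gfslv, mount point /mnt/gfs, and cluster name clusterX are assumptions for this example:

node1,2# umount /mnt/gfs                                         # unmount from every node first
node1# gfs2_tool sb /dev/ClusterVG/gfslv proto lock_dlm          # switch to the clustered lock manager
node1# gfs2_tool sb /dev/ClusterVG/gfslv table clusterX:gfslv    # set the cluster:fsname lock table
node1# gfs2_tool sb /dev/ClusterVG/gfslv all                     # review the superblock before remounting
node1,2# mount /dev/ClusterVG/gfslv /mnt/gfs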


GFS2 Extended Attributes (ACL)


7-12

• Access Control Lists (ACLs) are supported under GFS2 file systems
• ACLs allow additional "owners/groups" to be assigned to a file or directory
• Each additional owner or group can have customized permissions
• File system must be mounted with the acl option
  • Add 'acl' to the /etc/fstab entry
  • mount -o remount <mountpoint>
• getfacl - view ACL settings
• setfacl - set ACL permissions

The file system on which ACLs are to be used must be mounted with the acl option. Place 'acl' in the options field of the file system's line entry in /etc/fstab and run the command mount -o remount <mountpoint>. Run the mount command to verify the acl option is in effect.

ACLs add additional owners and groups to a file or directory. For example, suppose the following file must have read-write permissions for user jane, and read-only permissions for the group 'users':

-rw-r----- 1 jane users 0 Dec 17 18:33 data.0

Now suppose the 'boss' user also wants read-write permissions, and one particular user who is a member of the users group, 'joe', shouldn't have any access to the file at all. This is easy to do with ACLs. The following command assigns user 'boss' as an additional owner (user) with read-write permissions, and 'joe' as an additional owner with no privileges:

setfacl -m u:boss:rw,u:joe:- data.0

Because owner permission masks are checked before group permission masks, user joe's group membership has no effect: the check never gets that far, stopping once it identifies joe as an owner with no permissions.
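A minimal end-to-end sketch, assuming a GFS2 file system mounted at /mnt/gfs (the mount point and file name are assumptions for this example):

node1# mount -o remount,acl /mnt/gfs         # enable ACLs on the already-mounted file system
node1# touch /mnt/gfs/data.0
node1# setfacl -m u:boss:rw,u:joe:- /mnt/gfs/data.0
node1# getfacl /mnt/gfs/data.0               # verify the additional owners and their permissions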


Repairing a GFS2 File System

7-13

• In the event of file system corruption, brings it back into a consistent state
• File system must be unmounted from all nodes
  • gfs2_fsck

While the command is running, verbosity of output can be increased (-v, -vv) or decreased (-q, -qq). The -y option specifies a 'yes' answer to any question that may be asked by the command, and is usually used to run the command in "automatic" mode (discover and fix). The -n option does just the opposite, and is usually used to run the command and open the file system in read-only mode to discover what errors, if any, there are without actually trying to fix them. For example, the following command would search for file system inconsistencies and automatically perform necessary changes (i.e. attempt to repair) to the file system without querying the user's permission to do so first.
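A minimal sketch of such an invocation, assuming the file system lives on /dev/ClusterVG/gfslv (substitute your own device):

node1# gfs2_fsck -y /dev/ClusterVG/gfslv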


End of Lecture 7

• Questions and Answers
• Summary
  • Describe GFS2
  • Set up and maintain GFS2


Lab 7.1: Creating a GFS2 file system with Conga

Instructions:

1.

Create a 2 GB logical volume named webfs containing a gfs2 filesystem named webfs. Set the mount point to /var/www/html. Leave the other values at their defaults.

2.

Before adding the filesystem to the webby service, let's mount the filesystem manually on a node and add some content for the web service. On node1 mount the filesystem to /var/www/html and create the file index.html with some content.

3.

On node2 verify the GFS2 functionality by mounting the filesystem manually and checking the content of the index.html file.

4.

Umount the GFS2 filesystem on both nodes.

5.

Create a GFS resource using the newly added filesystem. Use the following parameters and leave the others at their default value.

Name:            webfs
Mount Point:     /var/www/html
Device:          /dev/ClusterVG/webfs
Filesystem Type: GFS2

6.

Remove the doc root resource from the webby service. Notice that this also removes the httpd child resource.

7.

Add the GFS resource webfs as a child to the IP address.

8.

Re-add the httpd resource as a child to the webfs resource.

9.

Save the changes.

• O

• e • O

10. Return to the Services list and enable the webby service. Confirm its operation by pointing your web browser to http://172.16.50.100+X.






Lab 7.2: Create a GFS2 filesystem on the command line

Scenario:

We've seen how easy it is to configure a GFS2 filesystem from within luci, but what if we want to configure a GFS2 filesystem for a non-clustered application? In this lab we explore how to create and manage a GFS2 filesystem from the command line.

Instructions:

1. Because we've already configured a GFS2 filesystem from within luci, the required RPMs have already been installed for us. GFS2 requires only gfs2-utils. The kernel module is already provided by the installed kernel RPM. If GFS2 is installed on top of CLVM, lvm2-cluster is also required.

Note that luci has also installed the GFS1-specific RPMs, which we will use in the next exercise. Verify which of the above RPMs are already installed on your cluster nodes.

2. Verify that the GFS2 kernel module is loaded.

node1# lsmod | head -1; lsmod | grep -E "(gfs|dlm|kmod)"

3. Verify that Conga converted the default LVM locking type from 1 (local file-based locking) to 3 (clustered locking), and that clvmd is running.

node1,2# grep locking_type /etc/lvm/lvm.conf
node1,2# service clvmd status

Note: to convert the locking type without Conga's help, use the following command before starting clvmd:

node1,2# lvmconf --enable-cluster

4.

In the next step we will create a clustered LVM2 logical volume as the GFS2 "container". Before doing so, we briefly review LVM2 and offer some troubleshooting tips. First, so long as we are running the clvmd service on all participating GFS cluster nodes, we only need to create the logical volume on one node and the others will automatically be updated. Second, the following are helpful commands to know and use for displaying information about the different logical volume elements:

pvdisplay, pvs
vgdisplay [-v], vgs


lvdisplay, lvs
service clvmd status

Possible errors you may encounter:

If, when viewing the LVM configuration, the tools show or complain about missing physical volumes, volume groups, or logical volumes which no longer exist on your system, you may need to flush and re-scan LVM's cached information:

# rm -f /etc/lvm/cache/.cache
# pvscan
# vgscan
# lvscan

If, when creating your logical volume, it complains about a locking error ("Error locking on node..."), stop clvmd on every cluster node, then start it on all cluster nodes again. You may even have to clear the cache and re-scan the logical volume elements before starting clvmd again. The output of:

# lvdisplay | grep "LV Status"

should change from:

LV Status              NOT available

to:

LV Status              available

and the LV should be ready to use. If you need to dismantle your LVM to start from scratch for any reason, the following sequence of commands will be helpful:

1. Remove any /etc/fstab entries referencing the LVM    vi /etc/fstab
2. Make sure it is unmounted                            umount /dev/ClusterVG/gfslv
3. Deactivate the logical volume                        lvchange -an /dev/ClusterVG/gfslv
4. Remove the logical volume                            lvremove /dev/ClusterVG/gfslv
5. Deactivate the volume group                          vgchange -an ClusterVG
6. Remove the volume group                              vgremove ClusterVG
7. Remove the physical volumes                          pvremove /dev/sd??
8. Stop clvmd                                           service clvmd stop

5.

Create a 1GB logical volume named gfslv from volume group ClusterVG that will be used for the GFS.

6.

The GFS locktable name is created from the cluster name and a uniquely defined name of your choice. Verify your cluster's name.


7.

Create a GFS2 file system on the gfslv logical volume with journal support for two (do not create any extras at this time) nodes. The GFS2 file system should use the default DLM to manage its locks across the cluster and should use the unique name "gfslv". Note: GFS2 journals consume 128MB each, by default.

8.

Create a new mount point named /mnt/gfs on both nodes and mount the newly created file system to it, on both nodes. Look at the tail end of /var/log/messages to see that it has properly acquired a journal lock.

9.

Add an entry to both nodes' /etc/fstab file so that the shared file system persists across reboots.

10. Copy into or create some data in /mnt/gfs from either node and verify that the other node can see and access it.


Lab 7.3: GFS1: Conversion

Scenario:

While GFS1 is still supported, many customers choose to upgrade their existing GFS1 filesystems to GFS2 for better performance. In this exercise we create a GFS1 filesystem and then convert it to GFS2. Again we use the command line. Notice the strong similarity between the GFS1 and GFS2 commands.

Instructions:

1.

Luci has already installed the necessary gfs-utils and kmod-gfs-xen for you.

2.

Create a 1GB logical volume named gfs1 from volume group ClusterVG that will be used for the GFS.

3.

Create a GFS1 file system on the gfs1 logical volume with journal support for two nodes using the mkfs.gfs command. The GFS1 file system should use the default DLM to manage its locks across the cluster and should use the unique name "gfs1". Note: journals consume 128MB each, by default.

4.

Create a new mount point named /mnt/gfs1 on both nodes and mount the newly created file system to it, on both nodes. Look at the tail end of /var/log/messages to see that it has properly acquired a journal lock.

5.

Add an entry to both nodes' /etc/fstab file so that the shared file system persists across reboots.

6.

Copy into or create some data in /mnt/gfs1 from either node and verify that the other node can see and access it.

7.

It is now time to convert this filesystem to GFS2. This conversion has to be done offline.

Umount the filesystem on both nodes. 8.

Before the filesystem is converted it is strongly recommended to back up the filesystem and perform a filesystem check. Use the tool gfs_fsck to test the integrity of the data.

9.

Convert the filesystem to GFS2.

10. Update the filesystem type in /etc/fstab on both cluster nodes. 11. Now mount the filesystem again. Use mount -a to verify the consistency of /etc/fstab.

12. Cleanup: Umount the filesystem, delete the /etc/fstab reference on both nodes, and remove the logical volume /dev/ClusterVG/gfs1.


Lab 7.4: GFS2: Working with images

Scenario:

In this exercise we explore how to access an image of a GFS2 filesystem on a node outside of the cluster.

Instructions:

1.

Begin by creating a 512MB LVM volume with a GFS2 filesystem. Use imagetest as the filesystem and volume name.

2. Mount the filesystem on a single node and put some data in it. You don't need to mount the filesystem persistently across boots.

3. Before taking the image, umount the filesystem. Images taken from a live filesystem will not be consistent.

4.

Now use dd to create an image of the logical volume. The GFS2 filesystem you have created in the first part of the lab has enough space to store the file.

5.

We won't need the filesystem anymore. Remove it.

6.

Now copy this image to your physical system.

7.

You have two options to access this image: either by creating a logical volume or partition and "dd'ing" the image into that container, or by loop-mounting the image directly. In both cases you have to manually disable the DLM locking mechanism, since the station is not a member of your cluster.

Let's use the loop-mounting way: use losetup to point the loop1 device to the image file, perform a filesystem check, and then mount the filesystem.

8.

How would you mount such an image persistently without creating the loop device manually?


Lab 7.5: GFS2: Growing the filesystem

Instructions:

1.

We still have room left in volume group ClusterVG, so let's expand our logical volume and GFS2 filesystem to use the rest of the space. First, expand the logical volume into the remaining volume group space.

2.


Now grow the GFS2 filesystem into the newly-available logical volume space, and verify the additional space is available. Note: GFS2 must be mounted, and we only need to do this on one node in the cluster.


Lab 7.1 Solutions

1.

Create a 2 GB logical volume named webfs containing a gfs2 filesystem named webfs. Set the mount point to /var/www/html. Leave the other values at their defaults.

Go to the storage tab and select one of your cluster nodes. Select Volume Groups/ClusterVG. Click on New Logical Volume.

Set the Logical Volume Name to webfs, the size to 2 GB and the Content Type to GFS2 Global FS v.2. Choose webfs as the Unique GFS Name and /var/www/html as the mount point. Leave the other values at their default values. Click on the Create button and confirm.

2.

1

Before adding the filesystem to the webby service, let's mount the filesystem manually on a node and add some content for the web service. On node1 mount the filesystem to /var/www/html and create the file index.html with some content.

node1# mount /dev/ClusterVG/webfs /var/www/html
node1# echo 'Hello GFS2!' > /var/www/html/index.html

3.

On node2 verify the GFS2 functionality by mounting the filesystem manually and checking the content of the index.html file.

node2# mount /dev/ClusterVG/webfs /var/www/html
node2# cat /var/www/html/index.html
Hello GFS2!

4.

Umount the GFS2 filesystem on both nodes.

node1# umount /var/www/html
node2# umount /var/www/html

5.

Create a GFS resource using the newly added filesystem. Use the following parameters and leave the others at their default value.

Name:            webfs
Mount Point:     /var/www/html
Device:          /dev/ClusterVG/webfs
Filesystem Type: GFS2

In Luci, go to the cluster tab and click on the clusterX link. Select Resources, then Add a Resource.

Choose the Resource Type GFS file system. Set the Name, Mount Point, Device, and Filesystem Type as defined above and leave the other values unchanged. Click on Submit and confirm.


6.

Remove the doc root resource from the webby service. Notice that this also removes the httpd child resource. Go to Services and click on the webby link. Scroll down to the File System Resource Configuration and click the button Delete this Resource. Confirm. As you see, the changed Service Composition only lists the IP Address resource.

7.

Add the GFS resource webfs as a child to the IP address. At IP Address Resource Configuration click on Add a child. From Use an existing global resource select webfs (GFS).

8.

Re-add the httpd resource as a child to the webfs resource. At GFS Resource Configuration click on Add a child. From Use an existing global resource select httpd (Apache Server).

9.

Save the changes. Scroll down and click on Save changes. Confirm.

10. Return to the Services list and enable the webby service. Confirm its operation by pointing your web browser to http://172.16.50.100+X.

Select Services, then choose Enable this service from webby's task list. Click on Go and confirm. Open a new web browser window and enter http://172.16.50.100+X as the URL. Do you see the content of your index.html file?




Lab 7.2 Solutions

1.

Because we've already configured a GFS2 filesystem from within luci, the required RPMs have already been installed for us. GFS2 requires only gfs2-utils. The kernel module is already provided by the installed kernel RPM. If GFS2 is installed on top of CLVM, lvm2-cluster is also required.

Note: Luci has also installed the GFS1-specific RPMs which we will use in the next exercise. Verify which of the above RPMs are already installed on your cluster nodes.

node1# rpm -qa | grep -E "(gfs|lvm2)"

2. Verify that the GFS2 kernel module is loaded.

node1# lsmod | head -1; lsmod | grep -E "(gfs|dlm|kmod)"

3.

Verify that Conga converted the default LVM locking type from 1 (local file-based locking) to 3 (clustered locking), and that clvmd is running.

node1,2# grep locking_type /etc/lvm/lvm.conf
node1,2# service clvmd status

Note: to convert the locking type without Conga's help, use the following command before starting clvmd:

node1,2# lvmconf --enable-cluster

4.

In the next step we will create a clustered LVM2 logical volume as the GFS2 "container". Before doing so, we briefly review LVM2 and offer some troubleshooting tips. First, so long as we are running the clvmd service on all participating GFS cluster nodes, we only need to create the logical volume on one node and the others will automatically be updated. Second, the following are helpful commands to know and use for displaying information about the different logical volume elements:

pvdisplay, pvs
vgdisplay [-v], vgs
lvdisplay, lvs
service clvmd status


Possible errors you may encounter:


If, when viewing the LVM configuration, the tools show or complain about missing physical volumes, volume groups, or logical volumes which no longer exist on your system, you may need to flush and re-scan LVM's cached information:

# rm -f /etc/lvm/cache/.cache
# pvscan
# vgscan
# lvscan

If, when creating your logical volume, it complains about a locking error ("Error locking on node..."), stop clvmd on every cluster node, then start it on all cluster nodes again. You may even have to clear the cache and re-scan the logical volume elements before starting clvmd again. The output of:

# lvdisplay | grep "LV Status"

should change from:

LV Status              NOT available

to:

LV Status              available

and the LV should be ready to use. If you need to dismantle your LVM to start from scratch for any reason, the following sequence of commands will be helpful:

1. Remove any /etc/fstab entries referencing the LVM    vi /etc/fstab
2. Make sure it is unmounted                            umount /dev/ClusterVG/gfslv
3. Deactivate the logical volume                        lvchange -an /dev/ClusterVG/gfslv
4. Remove the logical volume                            lvremove /dev/ClusterVG/gfslv
5. Deactivate the volume group                          vgchange -an ClusterVG
6. Remove the volume group                              vgremove ClusterVG
7. Remove the physical volumes                          pvremove /dev/sd??
8. Stop clvmd                                           service clvmd stop

5.

Create a 1GB logical volume named gfslv from volume group ClusterVG that will be used for the GFS.

node1# lvcreate -L 1G -n gfslv ClusterVG

This command will create the /dev/ClusterVG/gfslv device file and it should be visible on all nodes of the cluster. 6.

The GFS locktable name is created from the cluster name and a uniquely defined name of your choice. Verify your cluster's name.

node1# cman_tool status | grep "Cluster Name"


7.

Create a GFS2 file system on the gfslv logical volume with journal support for two (do not create any extras at this time) nodes. The GFS2 file system should use the default DLM to manage its locks across the cluster and should use the unique name "gfslv". Note: journals consume 128MB each, by default.

Substitute your cluster's number for the character X in the following command:

node1# mkfs.gfs2 -t clusterX:gfslv -j 2 /dev/ClusterVG/gfslv

8.

Create a new mount point named /mnt/gfs on both nodes and mount the newly created file system to it, on both nodes. Look at the tail end of /var/log/messages to see that it has properly acquired a journal lock.

node1,2# mkdir /mnt/gfs
node1,2# mount /dev/ClusterVG/gfslv /mnt/gfs
node1,2# tail /var/log/messages

9.

Add an entry to both nodes' /etc/fstab file so that the shared file system persists across reboots.

/dev/ClusterVG/gfslv   /mnt/gfs   gfs2   defaults   0 0

10. Copy into or create some data in /mnt/gfs from either node and verify that the other node can see and access it.

node1# cp /etc/group /mnt/gfs
node2# cat /mnt/gfs/group




Lab 7.3 Solutions

1.

Luci has already installed the necessary gfs-utils and kmod-gfs-xen for you.

2.

Create a 1GB logical volume named gfs1 from volume group ClusterVG that will be used for the GFS.

node1# lvcreate -L 1G -n gfs1 ClusterVG

This command will create the /dev/ClusterVG/gfs1 device file and it should be visible on all nodes of the cluster.

3.



Create a GFS1 file system on the gfs1 logical volume with journal support for two nodes using the mkfs.gfs command. The GFS1 file system should use the default DLM to manage its locks across the cluster and should use the unique name "gfs1". Note: journals consume 128MB each, by default.

Substitute your cluster's number for the character X in the following command:

node1# mkfs.gfs -t clusterX:gfs1 -j 2 /dev/ClusterVG/gfs1

4.

Create a new mount point named /mnt/gfs1 on both nodes and mount the newly created file system to it, on both nodes. Look at the tail end of /var/log/messages to see that it has properly acquired a journal lock.

node1,2# mkdir /mnt/gfs1
node1,2# mount /dev/ClusterVG/gfs1 /mnt/gfs1
node1,2# tail /var/log/messages

5.

Add an entry to both nodes' /etc/fstab file so that the shared file system persists across reboots.

/dev/ClusterVG/gfs1   /mnt/gfs1   gfs   defaults   0 0

6.

Copy into or create some data in /mnt/gfs1 from either node and verify that the other node can see and access it.

node1# cp /etc/group /mnt/gfs1
node2# cat /mnt/gfs1/group

7.

It is now time to convert this filesystem to GFS2. This conversion has to be done offline. Umount the filesystem on both nodes.

node1,2# umount /mnt/gfs1

8.

Before the filesystem is converted it is strongly recommended to back up the filesystem and perform a filesystem check. Use the tool gfs_fsck to test the integrity of the data.


node1# gfs_fsck /dev/ClusterVG/gfs1

9.

Convert the filesystem to GFS2.

node1# gfs2_convert /dev/ClusterVG/gfs1

10. Update the filesystem type in /etc/fstab on both cluster nodes. Edit both nodes' /etc/fstab to read:

/dev/ClusterVG/gfs1   /mnt/gfs1   gfs2   defaults   0 0

11. Now mount the filesystem again. Use mount -a to verify the consistency of /etc/fstab.

node1,2# mount -a

12. Cleanup: Umount the filesystem, delete the /etc/fstab reference on both nodes, and remove the logical volume /dev/ClusterVG/gfs1.

node1,2# umount /dev/ClusterVG/gfs1
node1,2# vim /etc/fstab
node1# lvremove /dev/ClusterVG/gfs1


Lab 7.4 Solutions

1. Begin by creating a 512MB LVM volume with a GFS2 filesystem. Use imagetest as the filesystem and volume name.

node1# lvcreate -n imagetest -L 512M ClusterVG
node1# mkfs.gfs2 -t clusterX:imagetest /dev/ClusterVG/imagetest

2. Mount the filesystem on a single node and put some data in it. You don't need to mount the filesystem persistently across boots.

node1# mkdir /mnt/imagetest
node1# mount /dev/ClusterVG/imagetest /mnt/imagetest
node1# cp /etc/services /mnt/imagetest

3. Before taking the image, umount the filesystem. Images taken from a live filesystem will not be consistent.

node1# umount /dev/ClusterVG/imagetest

4. Now use dd to create an image of the logical volume. The GFS2 filesystem you have created in the first part of the lab has enough space to store the file.

node1# dd if=/dev/ClusterVG/imagetest of=/mnt/gfs/imagetest.img bs=4M

5. We won't need the filesystem anymore. Remove it.

node1# lvremove /dev/ClusterVG/imagetest

6. Now copy this image to your physical system.

node1# scp /mnt/gfs/imagetest.img stationX:/tmp

7. You have two options to access this image: either by creating a logical volume or partition and "dd'ing" the image into that container, or by loop-mounting the image directly. In both cases you have to manually disable the DLM locking mechanism, since the station is not a member of your cluster.

Let's use the loop-mounting way: use losetup to point the loop1 device to the image file, perform a filesystem check, and then mount the filesystem to /mnt/imagetest.

stationX# losetup /dev/loop1 /tmp/imagetest.img
stationX# gfs2_fsck /dev/loop1
stationX# mkdir /mnt/imagetest
stationX# mount -o lockproto=lock_nolock /dev/loop1 /mnt/imagetest

8. How would you mount such an image persistently without creating the loop device manually?

/etc/fstab:
/tmp/imagetest.img   /mnt/image   gfs2   loop,lockproto=lock_nolock   0 0


Lab 7.5 Solutions

1. We still have room left in volume group ClusterVG, so let's expand our logical volume and GFS2 filesystem to use the rest of the space. First, expand the logical volume into the remaining volume group space.

Determine the number of free physical extents (PE) in ClusterVG:

node1# vgdisplay ClusterVG | grep Free
  Free PE / Size       516 / 2.98 GB

then grow the logical volume by that amount (alternatively, you can use the option "-l +100%FREE" to lvextend to do the same thing in fewer steps):

node1# lvextend -l +516 /dev/ClusterVG/gfslv

and verify the additional space in the logical volume:

node1# lvdisplay /dev/ClusterVG/gfslv

2. Now grow the GFS2 filesystem into the newly-available logical volume space, and verify the additional space is available. Note: GFS2 must be mounted, and we only need to do this on one node in the cluster.

node1# gfs2_grow -v /mnt/gfs
node1# df


Lecture 8

Quorum and the Cluster Manager

Upon completion of this unit, you should be able to:
• Define Quorum
• Understand how Quorum is Calculated
• Understand why the Cluster Manager Depends Upon Quorum




Cluster Quorum

8-1

• Majority voting scheme to deal with split-brain situations
• Each node has a configurable number of votes (default=1)
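For reference, a node's vote count is set per clusternode in cluster.conf; a minimal sketch, in which the node name and nodeid are assumptions for this example:

<clusternode name="node1.clusterX.example.com" nodeid="1" votes="1">
    ...
</clusternode>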


cman_tool Examples



8-9

• cman_tool join
  • Join the cluster
• cman_tool leave
  • Leave the cluster
  • Fails if systems are still using the cluster
• cman_tool status
  • Local view of cluster status
• cman_tool nodes
  • Local view of cluster membership

In a CMAN cluster, there is a join protocol that all nodes have to go through to become a member, and nodes will only talk to known members. By default, cman will use UDP port 6809 for internode communication. This can be changed by setting a port number in cluster.conf as follows:
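The snippet below is a minimal sketch of the typical form of that cluster.conf entry (treat the exact attribute placement as an assumption and verify against the cman(5) documentation for your release):

<cman port="6809">
    ...
</cman>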

or at cluster join time using the command: cman_tool join -p 6809


CMAN - API

8-10

• Provides interface to cman libraries
• Cluster Membership API
• Backwards-compatible with RHEL4

The libcman library provides a cluster membership API. It can be used to get a count of nodes in the cluster, a list of nodes (name, address), whether it is quorate, the cluster name, and join times.


CMAN - libcman

8-11

• For developers
• Backwards-compatible with RHEL4
• Cluster Membership API
  • cman_get_node_count()
  • cman_get_nodes()
  • cman_get_node()
  • cman_is_quorate()
  • cman_get_cluster()
  • cman_send_data()

The libcman library provides a cluster membership API. It can be used to get a count of nodes in the cluster, a list of nodes (name, address), whether it is quorate, the cluster name, and join times.


End of Lecture 8

• Questions and Answers
• Summary
  • Define Quorum
  • Understand how Quorum is Calculated
  • Understand why the Cluster Manager Depends Upon Quorum




Lab 8.1: Extending Cluster Nodes

Scenario:

In this exercise we will extend our two-node cluster by adding a third node.

System Setup:

Students should already have a working two-node cluster from the previous lab.

Instructions:

1. Recreate node3 if you have not already done so, by executing the command:

stationX# rebuild-cluster -3

2. Make sure the node's hostname is set persistently to node3.clusterX.example.com. Configure your cluster's node3 for being added to the cluster by installing the ricci and httpd RPMs, starting the ricci service, and making sure the ricci service survives a reboot.

Make sure that node3's iSCSI initiator is configured and the partition table is consistent with node1 and node2.

3.

If not already, log into luci's administrative interface. From the cluster tab, select Cluster List from the clusters menu on the left side of the window.

From the "Choose a cluster to administer" section of the page, click on the cluster name. From the clusterX menu on the left side, select Nodes, then select Add a Node.

4.

Enter the fully-qualified name of your node3 (node3.clusterX.example.com) and the root password. Click the Submit button when finished. Monitor node3's progress via its console and the luci interface.


5.

Provide node3 with a copy of /etc/cluster/fence_xvm.key from one of the other nodes, and then associate node3 with the xenfenceX shared fence device we created earlier.

6.

Make sure that cman and rgmanager start automatically on node3 by setting the Enabled at start up flag.

7.

Once finished, select Failover Domains from the menu on the left-hand side of the window, then click on the Failover Domain Name (prefer_node1).

In the "Failover Domain Membership" section, node3 should be listed. Make it a member and set its priority to 2. Click the Submit button when finished.

8.

Relocate the webby service to node3 to test the new configuration, while monitoring the status of the service. Verify the web page is accessible and that node3 is the node with the 172.16.50.X6 IP address.


9.

Troubleshooting: in rare cases luci fails to propagate /etc/cluster/cluster.conf to a newly added node. Without the config file, cman cannot start properly. If the third node cannot join the cluster, check if the file exists on node3. If it doesn't, copy the file manually from another node and restart the cman service manually.
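A minimal sketch of that manual workaround, using the node names from this lab:

node1# scp /etc/cluster/cluster.conf node3:/etc/cluster/
node3# service cman restart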

10. View the current voting and quorum values for the cluster, either from luci's Cluster List view or from the output of the command cman_tool status on any cluster node.

11. Currently, the cluster needs a minimum of 2 nodes to remain quorate. Let's test this by shutting down our nodes one by one. On node1, continuously monitor the status of the cluster with the clustat command, then poweroff -f node3. Which node did the service fail over to, and why? Verify the web page is still accessible.

12. Check the values for cluster quorum and votes again. Go ahead and poweroff -f node2.

13. Does the service stop or fail? Why or why not? Check the values for cluster quorum and votes again.

14. Re-start nodes 2 and 3, and once again query the cluster quorum and voting values. Have they returned to their original settings?


Lab 8.2: Manually Editing the Cluster Configuration

Scenario:

Building a cluster from scratch or making changes to the cluster's configuration within the luci GUI is convenient. Propagating changes to the other cluster nodes can be as simple as pressing a button within the interface. There are times, however, when you will want to tweak the cluster.conf file by hand: avoiding the overhead of a GUI, modifying a parameter that can be specified in the XML but isn't handled by the GUI, or maybe changes best implemented by a script that edits the cluster.conf file directly. Command line interface (CLI) changes are straightforward, as you will see, but there is a process that must be followed.

Deliverable:

In this lab section, we will make a very simple change to the cluster.conf file, propagate the new configuration, and update the in-memory CCS information.

Instructions:

1.

First, inspect the current post_join_delay and config_version parameters on both node1 and node2.

2. On node1, edit the cluster configuration file, /etc/cluster/cluster.conf, and increment the post_join_delay parameter from its default setting to a value that is one integer greater (e.g. change post_join_delay="3" to post_join_delay="4"). Do not exit the editor yet, as there is one more change we will need to make.

3. Whenever the cluster.conf file is modified, it must be updated with a new integer version number. Increment your cluster.conf config_version value (keep the double quotes around the value) and save the file.

1)

4.

On node2, verify (but do not edit) that its cluster.conf still has the old values for the post_join_delay and config_version parameters.

5.

On node1, update the CCS with the changes, then use ccsd to propagate them to the other nodes in the cluster. Re-verify the information on node2. Were the post_join_delay and config_version values updated on node2? Is cman on node2 aware of the update?
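For orientation, the relevant lines of /etc/cluster/cluster.conf end up looking roughly like the sketch below; the cluster name and version number shown are assumptions for this example:

<cluster name="clusterX" config_version="2">
    <fence_daemon post_fail_delay="0" post_join_delay="4"/>
    ...
</cluster>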


Lab 8.3: GFS2: Adding Journals

Scenario:

Every node in the cluster that wants access to the GFS needs its own journal. Each journal is 128MB in size, by default. We specified that 2 journals were to be created (-j 2 option to mkfs.gfs2) when we first created our GFS filesystem, and so only node1 and node2 were able to mount it. We now want to extend GFS2's reach to our third node, node3. In order to do that, we need to add an additional journal. We will actually add two additional journals: it is always helpful to have spares for future growth.

System Setup:

GFS2 must be mounted, and we only have to do this on one node.

Instructions:

1.

First, verify our current number of journals.

2.

Confirm that the third node can currently not mount the GFS2 filesystem.

3.

Verify that there is enough space on the filesystem to add another 128MB journal.

4.

Add two more journals with the same size.

5.

Verify that the available space has been reduced by 2*128MB

6.

Now mount the GFS2 filesystem on the third node. This time the command should succeed.


Lab 8.1 Solutions

1. Recreate node3 if you have not already done so, by executing the command:

stationX# rebuild-cluster -3

2. Make sure the node's hostname is set persistently to node3.clusterX.example.com.

cXn3# perl -pi -e 's/HOSTNAME=.*/HOSTNAME=node3.clusterX.example.com/' /etc/sysconfig/network
cXn3# hostname node3.clusterX.example.com

Configure your cluster's node3 for being added to the cluster by installing the ricci and httpd RPMs, starting the ricci service, and making sure the ricci service survives a reboot.

node3# yum -y install ricci httpd
node3# service ricci start; chkconfig ricci on

Make sure that node3's iSCSI initiator is configured and the partition table is consistent with node1 and node2.

node3# /root/RH436/HelpfulFiles/setup-initiator -bl
node3# partprobe /dev/sda


3.

If not already, log into luci's administrative interface. From the cluster tab, select Cluster List from the clusters menu on the left-side of the window. From the "Choose a cluster to administer" section of the page, click on the cluster name.

From the clusterX menu on the left side, select Nodes, then select Add a Node.

4.

Enter the fully-qualified name of your node3 (node3.clusterX.example.com) and the root password. Click the Submit button when finished. Monitor node3's progress via its console and the luci interface.

5.

Provide node3 with a copy of /etc/cluster/fence_xvm.key from one of the other nodes, and then associate node3 with the xenfenceX shared fence device we created earlier.

node1# scp /etc/cluster/fence_xvm.key node3:/etc/cluster

To associate node3 with our shared fence device, follow these steps: from the left-hand menu select Nodes, then select node3.clusterX.example.com just below it. In luci's main window, scroll to the bottom, and in the "Main Fencing Method" section, click the "Add fence device to this level" link. In the drop-down menu, select "xenfenceX (Virtual Machine Fencing)". In the "Domain" box, type node3, then click the Update main fence properties button at the bottom.


6.

Make sure that cman and rgmanager start automatically on node3 by setting the Enabled at start up flag.


7.

Once finished, select Failover Domains from the menu on the left-hand side of the window, then click on the Failover Domain Name (prefer_node1). In the "Failover Domain Membership" section, node3 should be listed. Make it a member and set its priority to 2. Click the Submit button when finished.

8.

Relocate the webby service to node3 to test the new configuration, while monitoring the status of the service.

Monitor the service from luci's interface, or from any node in the cluster run the clustat -i 1 command. To relocate the service in luci, traverse the menus to the webby service (Cluster List --> webby), then choose "Relocate this service to node3.clusterX.example.com" from the Choose a Task... drop-down menu near the top. Click the Go button when finished. Alternatively, from any cluster node run the command:

node1# clusvcadm -r webby -m node3.clusterX.example.com

Verify the web page is accessible and that node3 is the node with the 172.16.50.X6 IP address (Note: the ifconfig command won't show the address, you must use the ip command).

stationX# elinks -dump http://172.16.50.X6/index.html
node3# ip addr list

9.

Troubleshooting: in rare cases luci fails to propagate /etc/cluster/cluster.conf to a newly added node. Without the config file, cman cannot start properly. If the third node cannot join the cluster, check if the file exists on node3. If it doesn't, copy the file manually from another node and restart the cman service manually.

10. View the current voting and quorum values for the cluster, either from luci's Cluster List view or from the output of the command cman_tool status on any cluster node.

node1# cman_tool status
Nodes: 3
Expected votes: 3
Total votes: 3
Quorum: 2
(output truncated for brevity)

11. Currently, the cluster needs a minimum of 2 nodes to remain quorate. Let's test this by shutting down our nodes one by one. On node1, continuously monitor the status of the cluster with the clustat command, then poweroff -f node3.

node1# clustat -i 1
node3# poweroff -f

Which node did the service fail over to, and why? The service should have failed over to node1 because it has a higher priority in the prefer_node1 failover domain (the name is a clue!).

Verify the web page is still accessible.

stationX# elinks -dump http://172.16.50.X6/index.html

12. Check the values for cluster quorum and votes again.

node1# cman_tool status
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2

(There can be a delay in the information update. If your output does not agree with this, wait a minute and run the command again.)

Go ahead and poweroff -f node2.

node2# poweroff -f

13. Does the service stop or fail? Why or why not?

Now that only a single node is online, the cluster has lost quorum and the service is no longer active. Check the values for cluster quorum and votes again.

node1# cman_tool status
Nodes: 1
Expected votes: 3
Total votes: 1
Quorum: 2 Activity blocked

14. Re-start nodes 2 and 3, and once again query the cluster quorum and voting values. Have they returned to their original settings?

stationX# xm create node2
stationX# xm create -c node3

Verify all three nodes have rejoined the cluster by running the "cman_tool status" command and ensuring that all three nodes have "Online, rgmanager" listed in their status field. As soon as the two nodes are online again, the cluster adjusts the values back to their original state automatically.

node3# cman_tool status
Nodes: 3
Expected votes: 3
Total votes: 3


Quorum: 2


Lab 8.2 Solutions

1. First, inspect the current post_join_delay and config_version parameters on both node1 and node2.

node1,2# cd /etc/cluster
node1,2# grep config_version cluster.conf
node1,2# grep post_join_delay cluster.conf
node1,2# cman_tool version
node1,2# cman_tool status | grep Version

2.

On node1, edit the cluster configuration file, /etc/cluster/cluster.conf, and increment the post_join_delay parameter from its default setting to a value that is one integer greater (e.g. change post_join_delay="3" to post_join_delay="4"). Do not exit the editor yet, as there is one more change we will need to make.

3.

Whenever the cluster.conf file is modified, it must be updated with a new integer version number. Increment your cluster.conf's config_version value (keep the double quotes around the value) and save the file.

4.

On node2, verify (but do not edit) that its cluster.conf still has the old values for the post_join_delay and config_version parameters.

node2# cd /etc/cluster
node2# grep config_version cluster.conf
node2# grep post_join_delay cluster.conf
node2# cman_tool version
node2# cman_tool status | grep Version

5.

On node1, update the CCS with the changes, then use ccsd to propagate them to the other nodes in the cluster. Re-verify the information on node2. Were the post_join_delay and config_version values updated on node2? Is cman on node2 aware of the update?

node1# ccs_tool update /etc/cluster/cluster.conf

node2# grep config_version cluster.conf
node2# grep post_join_delay cluster.conf
node2# cman_tool version
node2# cman_tool status | grep "Config Version"


The changes should have been propagated to node2 (and node3), and cman updated, by the ccs_tool command.


Lab 8.3 Solutions

1. First, verify our current number of journals.

node1# gfs2_tool journals /mnt/gfs

2.

Confirm that the third node currently cannot mount the GFS2 filesystem.

node3# mkdir /mnt/gfs
node3# mount /dev/ClusterVG/gfslv /mnt/gfs
/sbin/mount.gfs2: error mounting /dev/mapper/ClusterVG-gfslv on /mnt/gfs: Invalid argument

3. Verify that there is enough space on the filesystem to add another 128MB journal.

node1# df -h | grep gfs

4. Add two more journals with the same size.

node1# gfs2_jadd -j2 /mnt/gfs

5. Verify that the available space has been reduced by 2*128MB.

node1# df -h | grep gfs

6. Now mount the GFS2 filesystem on the third node. This time the command should succeed.

node3# mount /dev/ClusterVG/gfslv /mnt/gfs




Lecture 9

Fencing and Failover

Upon completion of this unit, you should be able to:
• Define Fencing
• Describe Fencing Mechanisms
• Explain CCS Fencing Configuration






No-fencing Scenario

9-1

• What could happen if we didn't use fencing?
• The live-hang scenario:
  • Three-node cluster: nodes A, B, C
  • Node A hangs with I/Os pending to a shared file system
  • Node B and node C decide that node A is dead, so they recover resources allocated by node A, including the shared file system
  • Node A "wakes up" and resumes normal operation
  • Node A completes I/Os to the shared file system
  • Data corruption ensues...

If a node has a lock on GFS metadata and live-hangs long enough for the rest of the cluster to think it is dead, the other nodes in the cluster will take over its I/O for it. A problem occurs if the (wrongly considered dead) node wakes up and still thinks it has that lock. If it proceeds to alter the metadata, thinking it is safe to do so, it will corrupt the shared file system. If you're lucky, gfs_fsck will fix it; if you're not, you'll need to restore from backup. I/O fencing prevents the "dead" node from ever trying to resume its I/O to the storage device.


Fencing Components

9-2

• The I/O fencing system has two components:
  • Fence daemon: receives fencing requests as service events from cman
  • Fence agent: a program to interface with a specific type of fencing hardware
• The fencing daemon determines how to fence the failed node by looking up the information in CCS
• Starting and stopping fenced
  • Automatically by the cman service script
  • Manually using fence_tool

The fenced daemon is started automatically by the cman service:

# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                          [ OK ]

fence_tool is used to join or leave the default fence domain, by either starting fenced on the node to join, or killing fenced to leave. Before joining or leaving the fence domain, fence_tool waits for the cluster to be in a quorate state.

The fence_tool join -w command waits until the join has actually completed before returning. It is the same as fence_tool join; fence_tool wait.
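As a brief illustration, a minimal sketch of manually cycling a node's fence domain membership (run on the node in question, assuming the cluster is already quorate):

node1# fence_tool leave        # stop participating in the default fence domain
node1# fence_tool join -w      # rejoin and wait for the join to complete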


Fencing Agents

9-3

• Customized script/program for popular hardware fence devices
• Included in the cman package
  • /sbin/fence_*
  • Usually Perl or Python scripts
  • Named "fence_<device>"

The fence_node program accumulates all the necessary CCS information for I/O fencing a particular node and then performs the fencing action by issuing a call to the proper fencing agent. The following fencing agents are provided by Cluster Suite at the time of this writing:

fence_ack_manual - Acknowledges a manual fence
fence_apc - APC power switch
fence_bladecenter - IBM Blade Center
fence_brocade - Brocade Fibre Channel fabric switch
fence_bullpap - Bull PAP
fence_drac - DRAC
fence_egenera - Egenera SAN controller
fence_ilo - HP iLO device
fence_ipmilan - IPMI LAN
fence_manual - Requires human interaction
fence_mcdata - McData SAN switch
fence_rps10 - RPS10 Serial Switch
fence_rsa - IBM RSA II Device
fence_rsb - Fujitsu-Siemens RSB management interface
fence_sanbox2 - QLogic SANBox2
fence_scsi - SCSI persistent reservations
fence_scsi_test - Tests SCSI persistent reservations capabilities
fence_vixel - Vixel SAN switch
fence_wti - WTI network power switch
fence_xvm - Xen virtual machines
fence_xvmd - Xen virtual machines

Because manufacturers come out with new models and new microcode all the time, forcing us to change our fence agents, we recommend that the source code in CVS be consulted for the very latest devices to see if yours is mentioned: http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/agents/?cvsroot=cluster
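For example, a minimal sketch of fencing a node by hand through whatever agent is configured for it in CCS (the node name shown is the one used elsewhere in this course):

node1# fence_node node2.clusterX.example.com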

Lab 10.1: Quorum Disk

Scenario:

In a two-node cluster where both nodes have a single vote, a split-brain problem (neither node can communicate with the other, but each still sees itself as perfectly functional) can result in a fencing war, as both nodes continuously try to "correct" the other. In this lab we demonstrate how configuring a quorum disk heuristic can help the split-brain cluster nodes decide (though not always absolutely) which node is OK and which is errant. The heuristic we will use is a query of which remaining node can still ping the IP address 172.16.255.254.

Instructions:

1.

Create a two-node cluster by gracefully withdrawing node3 from the cluster and deleting it from luci's cluster configuration. Once completed, rebuild node3 using the rebuild-cluster script.

2.

View the cluster's current voting/quorum values so we can compare changes later.

3.

Create a new 10MB quorum partition named /dev/sdaN and assign it the label myqdisk.

4.

Configure the cluster with the quorum partition using luci's interface and the following characteristics:

• Quorum should be communicated through a shared partition named /dev/sdaN with label myqdisk.
• The frequency of reading/writing the quorum disk is once every 2 seconds.
• A node must have a minimum score of 1 to consider itself "alive".
• If the node misses 10 cycles of quorum disk testing it should be declared "dead".
• The node should advertise an additional vote (for a total of 2) to the cluster manager when its heuristic is successful.
• Add a heuristic that pings the IP address 172.17.X.254 once every 2 seconds. The heuristic should have a weight/score of 1.

5.

Using a file editor, manually modify the following values in cluster.conf:

expected_votes="3"
two_node="0"

Observe the quorumd-tagged section in cluster.conf. Increment cluster.conf's version number (config_version), save the file, and then update the cluster configuration with the changes.

6.

Start qdiskd on both nodes and make sure the service starts across reboots.

7.

Monitor the output of clustat. When the quorum partition finally becomes active, what does the cluster manager view it as?

8.

Now that the quorum partition is functioning, whichever node is able to satisfy its heuristic becomes the "master" cluster node in the event of a split-brain scenario. Note: this does not cure split-brain, but it may help prevent it in specific circumstances. View the cluster's new voting/quorum values and compare to before.

9.

What happens if one of the nodes is unable to complete the heuristic command (ping)? Open a terminal window on whichever node is running the service and monitor messages in /var/log/messages. On the other node, firewall any traffic to 172.17.X.254.

10. Clean up. Stop and disable the qdiskd service on both nodes.

11. Disable the quorum partition in luci's interface.

12. Add node3 back into the cluster as you have done before. You will need to set the hostname, enable the initiator, re-install the ricci and httpd RPMs, and start the ricci service before adding it back in with luci. Don't forget to copy /etc/cluster/fence_xvm.key to it and reconfigure its fencing mechanism!

Lab 10.1 Solutions

1.

Create a two-node cluster by gracefully withdrawing node3 from the cluster and deleting it from luci's cluster configuration.

To gracefully withdraw from the cluster, navigate luci's interface to the cluster and choose the Nodes link from the left sidebar menu. In the section of the window describing node3.clusterX.example.com, select "Have node leave cluster" from the "Choose a Task..." drop-down menu, then press the Go button.

To delete node3 from the cluster configuration, wait for the previous action to complete, choose "Delete this node" from the same drop-down menu, and then press the Go button.

Once completed, rebuild node3 using the rebuild-cluster script:

stationX# rebuild-cluster

2.

View the cluster's current voting/quorum values so we can compare changes later.

node1# cman_tool status

3.

Create a new 10MB quorum partition named /dev/sdaN and assign it the label myqdisk.

node1# fdisk /dev/sda
node1,2# partprobe /dev/sda
node1# mkqdisk -c /dev/sdaN -l myqdisk

Verify the quorum partition was made correctly:

node1# mkqdisk -L

4.

Configure the cluster with the quorum partition using luci's interface and the following characteristics:

• Quorum should be communicated through a shared partition named /dev/sdaN with label myqdisk.
• The frequency of reading/writing the quorum disk is once every 2 seconds.
• A node must have a minimum score of 1 to consider itself "alive".
• If the node misses 10 cycles of quorum disk testing it should be declared "dead".
• The node should advertise an additional vote (for a total of 2) to the cluster manager when its heuristic is successful.
• Add a heuristic that pings the IP address 172.17.X.254 once every 2 seconds. The heuristic should have a weight/score of 1.

In luci, navigate to the cluster tab near the top, and then select the clusterX link. Select the Quorum Partition tab. In the "Quorum Partition Configuration" menu, select "Use a Quorum Partition", then fill in the fields with the following values:

Interval: 2
Votes: 1
TKO: 10
Minimum Score: 1
Device: /dev/sdaN

Label: myqdisk

Heuristics:
Path to Program: ping -c1 -t1 172.17.X.254
Interval: 2
Score: 1

5.

Using a file editor, manually modify the following values in cluster.conf:

expected_votes="3"
two_node="0"
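
Both attributes live on the cman element of cluster.conf; after the edit, that line should look roughly like the following (a sketch; any other attributes already present are left untouched):

<cman expected_votes="3" two_node="0"/>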

Observe the quorumd-tagged section in cluster.conf. Increment cluster.conf's version number (config_version), save the file, and then update the cluster configuration with the changes.

node1# vi /etc/cluster/cluster.conf
node1# ccs_tool update /etc/cluster/cluster.conf
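
For reference, the quorumd section written for the values used in this lab should look roughly like the following (a sketch only; attribute ordering and exact formatting may differ, and X is your station number):

<quorumd interval="2" votes="1" tko="10" min_score="1" device="/dev/sdaN" label="myqdisk">
    <heuristic program="ping -c1 -t1 172.17.X.254" interval="2" score="1"/>
</quorumd>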

6.

Start qdiskd on both nodes and make sure the service starts across reboots.

node1,2# service qdiskd start; chkconfig qdiskd on

7.

Monitor the output of clustat. When the quorum partition finally becomes active, what does the cluster manager view it as?

node1# clustat -i 1

The cluster manager treats it as if it were another node in the cluster, which is why we incremented the expected_votes value to 3 and disabled two_node mode, above.

8.

Now that the quorum partition is functioning, whichever node is able to satisfy its heuristic becomes the "master" cluster node in the event of a split-brain scenario. Note: this does not cure split-brain, but it may help prevent it in specific circumstances. View the cluster's new voting/quorum values and compare to before.

# cman_tool status
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2

(truncated for brevity)

9.

What happens if one of the nodes is unable to complete the heuristic command (ping)? Open a terminal window on whichever node is running the service and monitor messages in /var/log/messages. On the other node, firewall any traffic to 172.17.X.254.

If node1 is the node running the service, then:

node1# tail -f /var/log/messages
node2# iptables -A OUTPUT -d 172.17.X.254 -j REJECT

Because the heuristic will not be able to complete the ping successfully, it will declare the node dead to the cluster manager. The messages in /var/log/messages should indicate that node2 is being removed from the cluster and that it was successfully fenced.

10. Clean up. Stop and disable the qdiskd service on both nodes.

node1,2# service qdiskd stop; chkconfig qdiskd off

11. Disable the quorum partition in luci's interface.

Navigate to the Cluster List and click on the clusterX link. Select the Quorum Partition tab, then select "Do not use a Quorum Partition", and press the Apply button near the bottom.

12. Add node3 back into the cluster as you have done before. You will need to set the hostname, enable the initiator, re-install the ricci and httpd RPMs, and start the ricci service before adding it back in with luci. Don't forget to copy /etc/cluster/fence_xvm.key to it and reconfigure its fencing mechanism!

cXn3# perl -pi -e "s/HOSTNAME=.*/HOSTNAME=node3.clusterX.example.com/" /etc/sysconfig/network
cXn3# hostname node3.clusterX.example.com
node3# /root/RH436/HelpfulFiles/setup-initiator -bl
node3# yum -y install ricci httpd
node3# service ricci start; chkconfig ricci on
node3# scp node1:/etc/cluster/fence_xvm.key /etc/cluster

Lecture 11

rgmanager

Upon completion of this unit, you should be able to:

• Understand the function of the Service Manager
• Understand resources and services

Resource Group Manager

11-1

• Provides failover of user-defined resources collected into groups (services)
• rgmanager improves the mechanism for keeping a service highly available
• Designed primarily for "cold" failover (application restarts entirely)
  • Warm/hot failovers often require application modification
• Most off-the-shelf applications work with minimal configuration changes
• Uses SysV-style init script (rgmanager) or API
• No dependency on shared storage
  • Distributed resource group/service state
  • Uses CCS for all configuration data
  • Uses OpenAIS for cluster infrastructure communication
• Failover Domains provide preferred node ordering and restrictions
• Hierarchical service dependencies

rgmanager provides "cold failover" (usually meaning "full application restart") for off-the-shelf applications and does the "heavy lifting" involved in resource group/service failover. Services can take advantage of the cluster's extensible resource script framework API, or simply use a SysV-style init script that accepts start, stop, restart, and status arguments.
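
As a quick sketch (not part of the course materials; myapp is a made-up application name), a minimal SysV-style init script suitable for use as a cluster Script resource only needs to honor these four arguments and return 0 on success:

#!/bin/bash
# /etc/init.d/myapp -- hypothetical control script for a clustered application
case "$1" in
  start)   /usr/local/bin/myapp --daemon ;;                   # bring the application up
  stop)    killall myapp ;;                                   # shut it down
  restart) "$0" stop; "$0" start ;;                           # "cold" failover = full restart
  status)  pgrep -x myapp >/dev/null ;;                       # exit 0 if running, non-zero if not
  *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac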

Without rgmanager, when a node running a service fails and is subsequently fenced, the service it was running will be unavailable until that node comes back online.

rgmanager uses OpenAIS for talking to the cluster infrastructure, and uses a distributed model for its knowledge of resource group/service states.

It is not always desirable for a service (a resource group) to fail over to a particular node. Perhaps the service should only run on certain nodes in the cluster, or certain nodes in the cluster never run services but mount GFS volumes used by the cluster.

rgmanager registers as a "service" with CMAN:

# cman_tool services
type             level name       id       state
fence            0     default    00010003 none
[1 2 3]
dlm              1     rgmanager  00030003 none
[1 2 3]

Cluster Configuration - Resources

11-2

• A cluster service is comprised of resources
• Many describe additional settings that are application-specific
• Resource types:
  • GFS file system
  • Non-GFS file system (ext2, ext3)
  • IP Address
  • NFS Mount
  • NFS Client
  • NFS Export
  • Script
  • Samba
  • Apache
  • LVM
  • MySQL
  • OpenLDAP
  • PostgreSQL 8
  • Tomcat 5

The luci GUI currently has more resource types to choose from than system-config-cluster.

GFS file system - requires name, mount point, device, and mount options.

Non-GFS file system - requires name, file system type (ext2 or ext3), mount point, device, and mount options. This resource is used to provide non-GFS file systems to a service.

IP Address - requires a valid IP address. This resource is used for floating service IPs that follow relocated services to the destination cluster node. Monitor Link can be specified to continuously check on the interface's link status so it can fail over in the event of, for example, a downed network interface. The IP won't be associated with a named interface, so the command:

ip addr list

must be used to view its configuration.

The NFS resource options can sometimes be confusing. The following two lines explain, via command-line examples, some of the most important options that can be specified for NFS resources:

showmount -e <host>
mount -t nfs <host>:<export> <mountpoint>

NFS Mount - requires name, mount point, host, export path, NFS version (NFS, NFSv4), and mount options. This resource details an NFS share to be imported from another host.

NFS Client - requires name, target (who has access to this share), permissions (ro, rw), and export options. This resource essentially details the information normally listed in /etc/exports.

NFS Export - requires a name for the export. This resource is used to identify the NFS export with a unique name.

Script - requires a name for the script, and a fully qualified pathname to the script. This resource is often used for the service script in /etc/init.d used to control the application and check on its status.

The GFS, non-GFS, and NFS mount file system resources have force umount options. The several different application resource types (Apache, Samba, MySQL, etc.) describe additional configuration parameters that are specific to that particular application. For example, the Apache resource allows the specification of ServerRoot, location of httpd.conf, additional httpd options, and the number of seconds to wait before shutdown.
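
To make the relationship between resources and services concrete, here is a rough sketch of the rm section of cluster.conf for an Apache service built from an IP address, an ext3 file system, and a script resource. The element and attribute names follow the RHEL 5 cluster schema, but the values (address, device, and service name) are invented for illustration:

<rm>
    <resources>
        <ip address="172.17.X.100" monitor_link="1"/>
        <fs name="docroot" device="/dev/sdb1" fstype="ext3"
            mountpoint="/var/www/html" force_unmount="1"/>
        <script name="httpd" file="/etc/init.d/httpd"/>
    </resources>
    <service autostart="1" name="webserver">
        <ip ref="172.17.X.100"/>
        <fs ref="docroot"/>
        <script ref="httpd"/>
    </service>
</rm>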

Start/Stop Ordering of Resources

11-4

• Within a resource group, the start/stop order of resources when enabling a service is important
• Examples:
  • Should the Apache service be started before its DocumentRoot is mounted?
  • Should the NFS server's IP address be up before the allowed-clients have been defined?
• Several "special" resources have default start/stop ordering values built-in
  • /usr/share/cluster/service.sh
• Order dependencies can be resolved in the service properties configuration (GUI)

From /usr/share/cluster/service.sh (XML file), we can see the built-in resource ordering defaults:
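
The excerpt itself was not captured in this copy. As a sketch of the format only (child elements of this kind are how rgmanager's service.sh expresses ordering, but the resource types shown and the start/stop level numbers below are illustrative, not the actual shipped defaults), the section looks something like:

<special tag="rgmanager">
    <child type="fs" start="2" stop="8"/>        <!-- file systems mount early, unmount late -->
    <child type="nfsexport" start="4" stop="6"/>
    <child type="ip" start="6" stop="4"/>
    <child type="script" start="8" stop="2"/>    <!-- the application starts last, stops first -->
</special>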