EMC 257758
Short Description
EMC VNX Backend Cleanup...
Description
"Backend cleanup process for factory re-installation of VNX OE for File (NAS software for VNX File)" ID:
emc257758
Usage:
14
Date Created:
12/10/2010
Last Modified:
02/03/2012
STATUS:
Approved
Audience:
Support
Question:
Backend cleanup process for factory re-installation of VNX OE for File (NAS software for VNX File)
Environment: EMC SW: VNX Operating Environment (OE) for File
Environment: Product: VNX File/Unified
Environment: Backend cleanup using nas_raid -s cleanup
Environment: Factory re-installation using Express Installation DVD image on Control Station
Problem:
Requirements to perform a factory re-installation of the Operating Environment for File (that is, NAS code).
The nas_raid script fails if the system is part of a multi-domain configuration:
Problem:
Cannot cleanup domain master. Please move master to another array.
Backend cleanup for factory re-installation of the File O/S.
Cautions:
During the cleanup process, the Control LUNs are zeroed out so as to make a fresh reinstallation possible. Note that the Control LUNs and the default Storage Group (~filestorage) are now part of the FLARE private LUN space and are no longer directly accessible from the GUI or NaviCLI.
After the cleanup process, verify that all Control LUNs are owned by SP A on Chain 0, or the installation process will fail.
The cleanup script may not remove other Storage Groups, Storage Pools, and similar objects.
The backend cleanup script does not remove the default ~filestorage HBA UID records; these must be removed manually.
VNX FILE/UNIFIED BACKEND CLEANUP PROCEDURE
Fix:
1.
Deconfigure Proxy ARP. The main task here is to get the SPs back on the 128.221.252 and 128.221.253 networks:
# /nasmcd/sbin/clariion_mgmt -stop
Note: If you cannot stop Proxy ARP services or clean up the backend, see emc287103 for possible workarounds. LUNs 0 & 1 must be zeroed out in order to perform a fresh reinstall of File OE.
2.
Verify that the storage processors (SPs) are up and running with the default internal network IP addresses:
# ping 128.221.252.200
PING 128.221.252.200 (128.221.252.200) 56(84) bytes of data.
64 bytes from 128.221.252.200: icmp_seq=1 ttl=128 time=0.535 ms
# ping 128.221.253.201
PING 128.221.253.201 (128.221.253.201) 56(84) bytes of data.
64 bytes from 128.221.253.201: icmp_seq=1 ttl=128 time=0.353 ms
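Note: The two ping checks above can also be scripted. The following is a minimal sketch (not part of the official procedure) that confirms both SP internal addresses respond before you continue; the 5-second timeout is an arbitrary choice:
for sp in 128.221.252.200 128.221.253.201; do
    # one echo request per SP, give up after 5 seconds
    ping -c 1 -w 5 "$sp" > /dev/null 2>&1 || { echo "SP $sp is unreachable; do not proceed"; exit 1; }
done
echo "Both SPs are reachable on the internal network"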
3.
Make sure the /tftpboot directory is available at the root of the system; untar it from /nas/tools if required:
# cd /
# tar zxvf /nas/tools/tftpboot.tar.gz
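As a convenience, the check and the untar can be combined in one line. This sketch assumes, per step 5 below, that the nas_raid script is expected at /tftpboot/setup_backend/nas_raid:
# Untar only if the setup_backend tools are not already in place
[ -x /tftpboot/setup_backend/nas_raid ] || ( cd / && tar zxvf /nas/tools/tftpboot.tar.gz )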
4.
Unset the NAS_DB environment variable and stop NAS services:
Note: If running dual Control Stations, shut down CS1. If onsite, unplug the power cable from CS1 and leave it offline.
# unset NAS_DB
# /sbin/service nas stop
5.
Run the cleanup script (which may take 15-20 minutes to complete):
# cd /tftpboot/setup_backend
# ./nas_raid -n ../bin/navicli -a 128.221.252.200 -b 128.221.253.201 -s cleanup
Do you want to clean up the system [yes or no]?: yes
Cleaning Storage Group "~filestorage"
Removing LUN PXE boot slot 2...
Starting NBS on all control LUN
Zero LUN 1 with dd. Finished with LUN 1.
Zero LUN 0 with dd. Finished with LUN 0.
Removing diskgroup
The following storage groups still exist: ~filestorage
Removing spares
Security domain removed
Done
Note: If the nas_raid script fails with 'Cannot cleanup domain master', you will need to remove any other systems from the domain before the script will complete:
# /tftpboot/bin/navicli -h 128.221.252.200 domain -messner -remove 10.241.216.233
(where 10.241.216.233 is the SP IP of the other domain member to remove from the current domain)
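If several systems must be removed from the domain, the single -remove command above can be wrapped in a loop. This is only a sketch: the member addresses shown (10.241.216.233 from the example above, plus a hypothetical 10.241.216.234) must be replaced with the actual SP IPs of the other domain members:
for member in 10.241.216.233 10.241.216.234; do
    # remove each non-local member so nas_raid can clean up the domain master
    /tftpboot/bin/navicli -h 128.221.252.200 domain -messner -remove "$member"
done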
6.
Verify that the Control LUNs have been properly zeroed out:
# /sbin/fdisk -l | grep partition
Disk /dev/nda doesn't contain a valid partition table
Disk /dev/ndb doesn't contain a valid partition table
Disk /dev/ndc doesn't contain a valid partition table
Disk /dev/ndd doesn't contain a valid partition table
Disk /dev/ndf doesn't contain a valid partition table
Note: It is possible that you may not have NBS access to the backend LUNs from your Blades. If so, you must first PXE boot a blade in order to restore backend LUN access. The /dev/nde partition is not zeroed out.
Optional method for zeroing LUNs 0 & 1:
# /nas/sbin/t2pxe -force_pxe ALL    --> Force PXE boot of all servers, then if it reports success, try to zero out LUNs 0 & 1
# dd if=/dev/zero of=/dev/nda bs=1MB count=134
# dd if=/dev/zero of=/dev/nde bs=1MB count=134
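In addition to the fdisk check, the zeroing can be confirmed directly. The sketch below is an unofficial check, not part of the documented procedure: it reads back the same 134 MB region that the dd commands wrote and counts any non-zero bytes, so an output of 0 means the region is fully zeroed:
# Count non-zero bytes in the first 134 MB of each device; expect 0 on both
dd if=/dev/nda bs=1MB count=134 2>/dev/null | tr -d '\000' | wc -c
dd if=/dev/nde bs=1MB count=134 2>/dev/null | tr -d '\000' | wc -c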
7.
Manually remove other Storage Groups, if necessary:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list
or
# /tftpboot/bin/navicli -h 128.221.252.200 storagegroup -list
Storage Group Name: SG_Celerra_c125
Storage Group UID: E2:12:0B:D6:F5:FC:DF:11:8F:CA:00:60:16:41:67:7D
HLU/ALU Pairs:
  HLU Number   ALU Number
  ----------   ----------
      0            3
      1            1
      2            0
      3            2
# /tftpboot/bin/navicli -h 128.221.252.200 storagegroup -destroy -gname SG_Celerra_c125
Destroy Storage Group SG_Celerra_c125 (y/n)? y
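Where several leftover Storage Groups exist, the list/destroy sequence above can be scripted. The sketch below is an unofficial convenience: it assumes group names contain no spaces (as with SG_Celerra_c125) and that the -o flag suppresses the confirmation prompt, so review the -list output before running it:
/tftpboot/bin/navicli -h 128.221.252.200 storagegroup -list |
awk -F': ' '/^Storage Group Name:/ {print $2}' |
while read -r sg; do
    # never destroy the reserved ~filestorage group here
    [ "$sg" = "~filestorage" ] && continue
    /tftpboot/bin/navicli -h 128.221.252.200 storagegroup -destroy -gname "$sg" -o
done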
8.
Manually remove Pool LUNs from the ~filestorage Storage Group, if required:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 lun -destroy -l 13
Are you sure you want to perform this operation?(y/n): y
Cannot unbind LUN because its contained in a Storage Group
Get the list of HLU numbers for the ~filestorage SG:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list -gname ~filestorage
Remove the HLU LUNs from ~filestorage:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -removehlu -gname ~filestorage -hlu 18
Remove HLU 18 from ~filestorage
The specified operation will potentially affect a File System Storage configuration. Do you want to continue (y/n)? y
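If many HLUs must be removed, a loop saves typing. In this sketch the HLU numbers (18 and 25) are placeholders taken from the examples in this article, and the -o flag is assumed to suppress the confirmation prompt; substitute the numbers actually reported by the -list command above:
for hlu in 18 25; do
    /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin \
        -scope 0 storagegroup -removehlu -gname ~filestorage -hlu "$hlu" -o
done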
9.
Manually destroy Pool LUNs first, if necessary:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -removehlu -gname ~filestorage -hlu 25    --> Remove the Pool LUN from the SG first
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 lun -list
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 lun -destroy -l 0    --> Syntax for removing Pool LUNs once the Storage Group has been destroyed
Are you sure you want to perform this operation?(y/n): y
10.
Destroy the Storage Pool once the Pool LUNs are removed:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagepool -list
Pool Name: Pool 0
Pool ID: 0
Raid Type: r_10
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagepool -destroy -id 0
Are you sure you want to perform this operation?(y/n): y
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagepool -list
11.
Manually destroy RAID Group LUNs and RAID Groups, if necessary:
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 getrg -lunlist    --> In this example, there are RAID Group LUNs and a RAID Group to destroy
RaidGroup ID: 1
List of luns: 78
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 unbind 7
Unbinding a LUN will cause all data stored on that LUN to be lost. Unbind LUN 7 (y/n)? y
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 unbind 8
Unbinding a LUN will cause all data stored on that LUN to be lost. Unbind LUN 8 (y/n)? y
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 removerg 1
MetaLUNs:
# /tftpboot/bin/navicli -h 128.221.252.200 metalun -list    --> There may be metaLUNs (e.g., 8184, 8185, etc.) if layered apps were in use
# /tftpboot/bin/navicli -h 128.221.252.200 metalun -destroy -metalun 12    --> Select the metaLUN from the list, within the 8184 LUN
12.
Verify whether any Control LUNs are trespassed from SP A to SP B:
# /nasmcd/sbin/t2tty -c 2 "camshowconfig"
CAM Devices on scsi-0:
TID 00: 0:d0+ 1:d1+ 2:d2+ 3:d3+ 4:d4- 5:d5-    --> d4 & d5 are trespassed to SP B
CAM Devices on scsi-16:
TID 00: 0:d6- 1:d7- 2:d8- 3:d9- 4:d10+ 5:d11+
1291584475: ADMIN: 6: Command succeeded: camshowconfig
Note: Through the use of - and +, the above output shows that Control LUNs d4 and d5 are NOT on Chain 0 (SP A). These LUNs must be trespassed back to Chain 0 on all servers before a new install can succeed.
13.
Trespass all Control LUNs back to SP A Chain 0 as required using the following commands:
# /tftpboot/bin/t2tty -c 2 "camtrespass c0t0l4"
# /tftpboot/bin/t2tty -c 2 "camtrespass c0t0l5"
# /tftpboot/bin/t2tty -c 3 "camtrespass c0t0l4"
# /tftpboot/bin/t2tty -c 3 "camtrespass c0t0l5"
Note: In the above example, LUNs 4 & 5 were trespassed back to Chain 0 SP A on each of the two blades present on the system.
14.
Verify the existing Data Mover WWN HBA UID records, remove the HBA UID records, and verify again:
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list -gname ~filestorage | head -15
Storage Group Name: ~filestorage
Storage Group UID: 60:06:01:60:00:00:00:00:00:00:00:00:00:00:00:04
HBA/SP Pairs:
  HBA UID                                            SP Name   SPPort
  -------                                            -------   ------
  50:06:01:60:C6:E0:14:97:50:06:01:69:46:E0:14:97    SP B      2
  50:06:01:60:C6:E0:14:97:50:06:01:61:46:E0:14:97    SP B      3
  50:06:01:60:C6:E0:14:97:50:06:01:68:46:E0:14:97    SP A      2
  50:06:01:60:C6:E0:14:97:50:06:01:60:46:E0:14:97    SP A      3
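The four records shown above are removed individually in the commands that follow. Alternatively, a loop can parse the listing and remove every record in one pass; this sketch is an unofficial shortcut that assumes each table row places the colon-separated HBA UID in the first field followed by "SP A" or "SP B", as in the output above:
/tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 \
    storagegroup -list -gname ~filestorage |
awk '/SP (A|B)/ && $1 ~ /:/ {print $1}' |
sort -u |
while read -r uid; do
    # -o suppresses the confirmation prompt, as in the manual commands below
    /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin \
        -scope 0 port -removehba -o -hbauid "$uid"
done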
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:69:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:61:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:68:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:60:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list -gname ~filestorage
Storage Group Name: ~filestorage
Storage Group UID: 60:06:01:60:00:00:00:00:00:00:00:00:00:00:00:04
15.
Verify whether the array security domain was destroyed. If the system is not a shared system, manually destroy security:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 domain -list
Security is not initialized. Security must be initialized before any domain operations can be performed in this system. Create a global administrator to initialize security.
Note: The above return indicates that no security domain remains (it has been destroyed); no further action is required.
# /tftpboot/bin/navicli -h 128.221.252.200 domain -list
Node: c250
IP Address: 128.221.253.201
Name: spb
Port: 80
Secure Port: 443
IP Address: 128.221.252.200 (Master)
Name: spa
Port: 80
Secure Port: 443
IP Address: 10.241.216.235
Name: c250
Port: 80
Secure Port: 443
Note: The above return indicates that a security domain does exist and must be destroyed:
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 domain -messner -destroy
WARNING: You are about to destroy the local directories on the following systems: 128.221.252.200
Please note that this operation will not update the master directory database. Proceed? (y/n) y
# /tftpboot/bin/navicli -h 128.221.253.201 -user sysadmin -password sysadmin -scope 0 domain -messner -destroy
WARNING: You are about to destroy the local directories on the following systems: 128.221.253.201
Please note that this operation will not update the master directory database. Proceed? (y/n) y
16.
Using the proper bootable Express Install media, reboot the Linux system and perform a "boot:install". For dual Control Station environments, make sure that CS1 is powered off during the factory install of CS0. See the "Notes" section below for a representative example of the questions and answers given during a typical Express Installation. Make sure to toggle the option from Yes to No when the screen for setting up the Control Station LAN IP address appears, since you DO NOT want to set the external IP address yet (you will use the VNX Installation Assistant after the installation is completed to set the Control Station name and IP address and to initialize the File/Unified system). Reboot CS0 after the File OE installation completes so as to generate the "Waiting for VIA..." initialization message.
17.
Once CS0 has completed software installation and rebooted, perform the factory installation of CS1 using either the DVD media or CD2 media, keeping CS0 powered up during the CS1 installation. At the end of a successful File OE installation on CS1, reboot it and, via the serial console, ensure that it displays the "Waiting for VIA..." initialization message. At this point, the dual CS environment is ready to be initialized using the VNX Installation Assistant.
18.
Before running the VIA, however, perform the following actions, depending on whether the system is a File-only or Unified configuration:
For File-only VNX systems:
a) A File-only installation should not have the -UnisphereBlock enabler installed; use navicli ndu -list to check.
b) A File-only installation should have the -UnisphereFile enabler installed; use navicli ndu -list to check.
c) Run the VNX Installation Assistant to complete the system initialization.
For Unified VNX systems:
a) A Unified installation should have both the -UnisphereBlock and -UnisphereFile enablers installed; use navicli ndu -list to check.
b) Set the Unified flag on the Control Station:
# /nas/sbin/nas_hw_upgrade -option -enable -clariionfc
c) Run the VNX Installation Assistant to complete the system initialization.
Typical Express Installation questions, inputs, and/or answers:
1. Express Installation using DVD or 2-disc CD set:
Notes:
boot:install
----------------
Is this a Secondary Control Station (y/n/a)? n
----------------
Is this a Control Station Fresh Install? yes
----------------
Is this a Secondary Control Station? [yes or no]: no
----------------
Accept the defaults for the "Primary Internal Network Setup", "IPMI Network Setup", and "Backup Internal Network Setup" screens.
----------------
DO NOT SET UP THE EXTERNAL LAN NETWORKING AT THIS TIME (we will set up the Control Station external LAN using the VIA initialization wizard after the File OE reinstallation is completed). For the Network Configuration screen, "Do you want to configure LAN (not dialup) networking for your installed system?", tab to "No" and press Enter.
----------------
Detecting movers in cabinet: 2
Is this the expected number of movers in the cabinet? [yes or no]: yes
----------------
Pick a NAS Administrator username
Username [default: nasadmin]: nasadmin
New UNIX password: nasadmin
Retype new UNIX password: nasadmin
----------------
Do you wish to enable UNICODE? [yes or no]: yes
2. At the end of the File OE installation, log into the Control Station as nasadmin, su to root, then reboot the Control Station. When the following message is displayed on the Control Station serial console, initialize the Unified system using the VIA:
# reboot
---------------------------
Waiting for VNX Installation Assistant to continue.......