Designing HPE Backup Solutions eBook (Exam HPE0-J77) First Edition Aleksandar Miljković
HPE Press 660 4th Street, #802 San Francisco, CA 94107
Designing HPE Backup Solutions eBook (Exam HPE0-J77) Aleksandar Miljkovic © 2016 Hewlett Packard Enterprise Development LP. Published by: Hewlett Packard Enterprise Press 660 4th Street, #802 San Francisco, CA 94107 All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the publisher, except for the inclusion of brief quotations in a review. ISBN: 978-1-942741-32-9
WARNING AND DISCLAIMER This book provides information about the topics covered in the Designing HPE Backup Solutions (HPE0-J77) certification exam. Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information is provided on an “as is” basis. The author, and Hewlett Packard Enterprise Press, shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of the discs or programs that may accompany it. The opinions expressed in this book belong to the author and are not necessarily those of Hewlett Packard Enterprise Press.
TRADEMARK ACKNOWLEDGEMENTS All third-party trademarks contained herein are the property of their respective owner(s).
GOVERNMENT AND EDUCATION SALES This publisher offers discounts on this book when ordered in quantity for bulk purchases, which may include electronic versions. For more information, please contact U.S. Government and Education Sales at 1-855-447-2665 or email
[email protected].
Feedback Information At HPE Press, our goal is to create in-depth reference books of the best quality and value. Each book is crafted with care and precision, undergoing rigorous development that involves the expertise of members
from the professional technical community. Readers’ feedback is a continuation of the process. If you have any comments regarding how we could improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us through email at
[email protected]. Please make sure to include the book title and ISBN in your message. We appreciate your feedback. Publisher: Hewlett Packard Enterprise Press HPE Contributors: Ralph Luchs, Ian Selway, Chuck Roman HPE Press Program Manager: Michael Bishop
About the Author Aleksandar Miljković has over sixteen years of IT experience with a strong background in UNIX systems and network security. His specialties include IT security, OpenStack based cloud computing, and HPE Helion CloudSystem, a fully integrated, IaaS and PaaS cloud solution for the enterprise. Aleksandar is currently involved in developing and delivering sales and technical training in areas such as virtualization, hybrid cloud, and backup and recovery systems and solutions.
Introduction This study guide helps you prepare for the Designing HPE Backup Solutions exam (HPE0-J77) that is a requirement for the HPE ASE - Storage Solutions Architect V2 certification. The exam tests your ability to define and recommend an effective Enterprise backup and recovery solution based on customer needs. Designed to help you identify the best components, this guide describes HPE's Backup, Recovery and Archive (BURA) solutions for data protection, and outlines recommended configurations for HPE’s StoreOnce Backup Systems.
Certification and Learning Hewlett Packard Enterprise Partner Ready Certification and Learning provides end-to-end continuous learning programs and professional certifications that can help you open doors and succeed in the New Style of Business. We provide continuous learning activities and job-role based learning plans to help you keep pace with the demands of the dynamic, fast paced IT industry; professional sales and technical training and certifications to give you the critical skills needed to design, manage and implement the most sought-after IT disciplines; and training to help you navigate and seize opportunities within the top IT transformation areas that enable business advantage today. As a Partner Ready Certification and Learning certified member, your skills, knowledge, and real-world experience are recognized and valued in the marketplace. To continue your professional and career growth, you have access to our large HPE community of world-class IT professionals, trend-makers and decision-makers. Share ideas, best practices, business insights, and challenges as you gain professional connections globally. To learn more about HPE Partner Ready Certification and Learning certifications and continuous learning programs, please visit http://certification-learning.hpe.com
Audience This book is designed for presales and solutions architects involved in supporting the sale of enterprise backup solutions. It is assumed that you have a minimum of one year of experience in storage technologies and have a need to identify and define the right components for a backup and recovery solution based on customer needs.
Assumed Knowledge To understand the concepts and strategies covered in this guide, storage professionals should have at least six months to one year of on-the-job experience. The associated training course, which includes numerous practical examples, provides a good foundation for the exam, but learners are also expected to have at least one year’s experience in storage technologies.
Relevant Certifications After you pass these exams, your achievement may be applicable toward more than one certification. To determine which certifications can be credited with this achievement, log in to The Learning Center and view the certifications listed on the exam’s More Details tab. You might be on your way to achieving additional certifications.
Preparing for Exam HPE0-J77 This self-study guide does not guarantee that you will have all the knowledge you need to pass the exam. It is expected that you will also draw on real-world experience designing and implementing storage solutions. We also recommend taking the Designing HPE Backup Solutions instructor-led, one-day training course (Course ID 01059534).
Recommended HPE Training Recommended training to prepare for each exam is accessible from the exam’s page in The Learning Center. See the exam attachment, “Supporting courses,” to view and register for the courses.
Obtain Hands-on Experience You are not required to take the recommended, supported courses, and completion of training does not guarantee that you will pass the exams. Hewlett Packard Enterprise strongly recommends a combination of training, thorough review of courseware and additional study references, and sufficient on-the-job experience prior to taking an exam.
Exam Registration To register for an exam, go to http://certification-learning.hpe.com/tr/certification/learn_more_about_exams.html
1 Downtime and Availability
CHAPTER OBJECTIVES
In this chapter, you will learn to:
• Describe and differentiate the terms downtime and availability
• Explain the business impact of system downtime
• Describe and calculate the Mean Time Between Failures (MTBF) and estimate its impact on the IT environment
• Describe the Mean Time to Repair (MTTR) and explain its importance in the overall recovery
• List and describe technologies that address availability, such as Redundant Array of Inexpensive (or Independent) Disks (RAID), snapshots, and replication, and contrast them with backup
• Explain and use Recovery Point Objective (RPO) and Recovery Time Objective (RTO) as parameters for backup planning
Introduction When it comes to downtime, availability, fault tolerance, backup, recovery, and data archiving, several topics come to mind. Almost everyone knows something about them, usually firsthand. Who has not faced a system failure, a loss of data, or some type of a disaster that affected normal operations of a network, the Internet, a point-of-sale system, or even your own computer? All these terms deal with protecting business or personal assets—either systems, applications, processes, data, or the ability to conduct business. Each varies, however, in its approach and methodology. This chapter lays the foundation for this course and the subsequent chapters by defining and describing the key concepts behind the quest for protecting IT resources. It begins by describing and differentiating between downtime and availability. It then explains how a component or a system failure, which leads to downtime, is measured and prevented. Lastly, it takes the opposite view, one of system availability, and how it can be achieved. Within the context of both downtime and availability, it positions backup and recovery.
Downtime Let us begin by focusing on what prevents a system from operating normally and what the consequences might be.
Defining downtime Downtime is a period of time during which a system, an application, a process, a service, or data is unavailable. During this time, a business entity cannot conduct sales operations, support a customer, provide a service, or conduct transactions, that is, if this business entity solely relies upon what just became unavailable. There are many kinds of downtime, but all cause some type of disruption to normal business operations.
These events cause loss of productivity, inability to communicate with clients, or even loss of sensitive or important information. Downtime of an IT system can cause loss of productivity of a single user, a workgroup, or even the entire company. Whenever downtime impairs business, the outage carries serious consequences. To prevent or minimize the impact of downtime, you must understand what causes downtime in the first place.
Describing causes of downtime
Figure 1-1 Causes of downtime and data loss or unavailability
Figure 1.1 lists the most typical causes of downtime and data loss or unavailability. Note that hardware failures are the largest cause of downtime, followed by human error. Other causes, not shown in this chart, include the following:
• Power outages
• Data center cooling issues
• Software bugs
• Cyber attacks
• Database inconsistencies or design flaws
• _____________________________________ (add your own)
• _____________________________________ (add your own)
• _____________________________________ (add your own)
Disasters, without a doubt, disrupt computer operations and cause data unavailability. A disaster in the IT world is any event that disrupts a company’s computer operations and causes downtime. Disasters are not controllable, but precautions against them make the recovery much faster and easier. Disasters can be categorized according to the affected area:
• Building-level incidents
• Metropolitan area disasters
• Regional events
Furthermore, downtime can be planned as well as unplanned.
Building-level incidents
Figure 1-2 Disasters affecting a building
Disasters affecting a building (Figure 1.2) usually impact computer operations in that building as well. There may not be direct damage to the IT systems, but these incidents may prevent access to them or to the data they host, or they may interrupt operations.
Metropolitan area disasters
Figure 1-3 Metropolitan area disasters
Usually floods, fires, large chemical incidents, moderate earthquakes, severe winter storms, or blackouts affect entire cities, impacting their infrastructure and disrupting IT systems (Figure 1.3).
Regional events
Figure 1-4 Natural disasters
Computer operations may be interrupted by natural disasters that affect an entire region within a radius of hundreds to thousands of miles/kilometers. These disasters include large floods, hurricanes, earthquakes, political instability, and wars.
Planned vs. unplanned downtime Planned downtime occurs when system administrators intentionally restrict or stop the system operations to implement upgrades, updates, repairs, or other changes. During planned downtime, a particular time
period is set aside for these operations, which are carefully planned, prepared, executed, and validated. In contrast, unplanned downtime occurs when an unintended event restricts or stops system availability. Keep in mind that planned downtime is still downtime. Return, for a moment, to Figure 1.1, which shows the major causes of downtime and data loss/unavailability. Are these causes intentional/planned or unintentional/unplanned? If you answered unintentional/unplanned, you answered correctly. Overall, however, the majority of system and data unavailability results from planned downtime for required maintenance and upgrades. In fact, unplanned downtime accounts for only about 10% of all downtime, but its unexpected nature means that a single incident may be more damaging to the enterprise, both physically and financially, than many occurrences of planned downtime. Thus, understanding the cost of downtime is critical in either case.
Estimating the cost and impact of downtime Quantifying downtime is not an easy task because the impact of downtime varies from one case to another. Losing a second of time in an air traffic control system or in a hospital life-support environment can have dire consequences, while losing hours in a billing system may not have a significant impact at all if these billing transactions were queued and committed only when the system became available again. Before you can calculate downtime, you must know its root cause. And not all root causes are strictly IT issues. To begin with, it is important to identify and understand both internal and external downtime threats—you need to know what and who has the potential to take your business down. You also need to know how. IT-related outages, planned or unplanned, can unleash a procession of costs and consequences that are direct and indirect, tangible and intangible, short term and long term, and immediate and far-reaching.
Tangible and direct costs
Tangible and direct costs refer to expenses that can easily be measured and documented, are incurred up front, and are tracked in the business general ledger. These costs can be “touched,” “felt,” or “easily determined.” Tangible and direct costs related to downtime include:
• Loss of transaction revenue
• Loss of wages due to employees’ idle time
• Loss of inventory
• Remedial labor costs
• Marketing costs
• Bank fees
• Legal penalties from inability to deliver on service-level agreements (SLAs)
Intangible and indirect costs
Intangible and indirect costs refer to business impact that is more difficult to measure, is often felt or incurred at a later date, and is not related to a physical substance or intrinsic productive value. These costs are nonetheless real and important to a business’s success or failure. They can be more important and greater than tangible costs. Intangible and indirect costs related to downtime include the following:
• Loss of business opportunities
• Loss of employees and/or employee morale
• Loss of goodwill in the community
• Decrease in stock value
• Loss of customers and/or departure of business partners
• Brand damage
• Shift of market share to competitors
• Bad publicity and press
Outage impact to business
The cost that can be assigned to a measurable period of downtime varies widely depending upon the nature of the business, the size of the company, and the criticality of the IT system related to the primary revenue-generating processes. For example, a global financial services firm may lose millions of dollars for every hour of downtime, whereas a small manufacturer using IT as an administrative tool would lose only a margin of productivity.
According to a Gartner document titled How Much Does an Hour of Downtime Cost?, estimating the cost of an outage for a conventional brick-and-mortar business is relatively simple compared to, say, a global financial services firm. In either case, such estimations are never trivial.
Consider this example: Assume that a firm conducts business in Western Europe and North America during regular business hours. This firm needs its systems and services to be available 40 hours per week, or 2000 hours per year (accounting for holidays, vacations, and weekends). Therefore, the first-order approximation of the cost of an outage would be to distribute the firm’s revenue uniformly across those 2000 hours. Thus, if the firm’s annual revenue is $100 million, an average hour would represent $50,000 of its revenue. Consequently, a one-hour outage would cost this firm $50,000.
Two immediate objections arise to this assessment, and both lead to important refinements. First, revenue is almost never distributed uniformly across all working hours. Second, most businesses experience seasonal fluctuations. Many retail organizations make 40% of their revenue and 100% of their profits in the last eight weeks of the year. A one-hour outage on December 23 will have a much greater impact on the firm’s financial performance than the same outage in late June, for instance. Therefore, the cost of an outage must reflect its potential impact at a particular time in the business cycle.
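To make the arithmetic in this example concrete, here is a minimal Python sketch of the same first-order estimate, with an optional seasonal weighting factor; the factor value and the function itself are illustrative assumptions, not a Gartner formula.

```python
def estimate_outage_cost(annual_revenue, operating_hours_per_year, outage_hours, seasonal_factor=1.0):
    """First-order estimate: spread revenue uniformly across operating hours,
    then scale by a seasonal factor (> 1 during peak periods such as late December)."""
    revenue_per_hour = annual_revenue / operating_hours_per_year
    return revenue_per_hour * outage_hours * seasonal_factor

# The example from the text: $100M annual revenue, 2000 operating hours, a one-hour outage.
print(estimate_outage_cost(100_000_000, 2000, 1))                       # 50000.0 -> $50,000
# The same outage during a hypothetical peak week, weighted three times higher.
print(estimate_outage_cost(100_000_000, 2000, 1, seasonal_factor=3.0))  # 150000.0
```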
Table 1.1 shows the cost of downtime, estimated in 1998 by the Gartner Group. You can see that the cost of downtime is company-specific and related intangible costs are very difficult to estimate.
Table 1-1 The cost of downtime, estimated in 1998 by the Gartner Group

Industry       | Application          | Average cost (per hour of downtime)
---------------|----------------------|------------------------------------
Financial      | Brokerage operations | $6,500,000
Financial      | Credit card sales    | $2,600,000
Media          | Pay-per-view         | $1,150,000
Retail         | Home shopping (TV)   | $113,000
Retail         | Catalog sales        | $90,000
Transportation | Airline reservations | $89,500
Consequences of downtime
What happens to an organization when its system goes down? Before you can predict downtime and plan its prevention, you must have a clear understanding of what happens when a system goes down: specifically, what and who are impacted, and how. You should consider the following:
• Processes: Vital business processes such as order management, inventories, financial reporting, transactions, manufacturing, and human resources may be interrupted, corrupted, or even lost.
• Programs: Revenue can be affected, and key employee or customer activities might be missed or lost.
• Business: If customers cannot access a website, they might purchase from someone else. You might lose a customer now and forever.
• People: Salaries might not be paid, and even lives could be lost due to downtime of life-sustaining medical systems.
• Projects: Thousands of person-hours of work can be lost and deadlines can be missed, resulting in failure-to-perform fees and noncompliance penalties.
• Operations: Those who manage daily activities of an organization may find themselves without the data they need to make informed decisions.
The Gartner Group recommends estimating the cost of an outage to your firm by calculating lost revenue, lost profit, and staff cost for an average hour—and for a worst-case hour—of downtime for each critical business process. Not-for-profit organizations cannot calculate revenues or profits, so they should focus on staff productivity and a qualitative assessment of the outage’s impact on their user community. An outage can also weaken customer perception of the firm, harm the wider community, and derail a firm’s strategic initiatives; these impacts may be difficult to quantify and, in most cases, are left unquantified. Any outage assessment based on raw, generic industry averages alone is misleading.
Predicting downtime Can you predict downtime and events that lead to it? If you could, would it mean that you could then better prepare for such events, or even prevent them from happening? There are causes of downtime which cannot be predicted, such as natural disasters or fire. You just have to determine the best prevention or recovery mechanism if they occur. On the other hand, failure rates of
computer system components can be predicted with a level of certainty. To determine and express failure rates of computer components, you can calculate the MTBF.
Defining MTBF
MTBF is the measure of expected failure rates. To understand MTBF, it is best to start with something else—something for which it is easier to develop an intuitive feel. Let us take a look at a generalized MTBF measure for a computer component, such as a hard drive, a memory DIMM, or a cooling fan. This component has an MTBF of 200,000 hours. Since there are approximately 9000 hours in a year, 200,000 hours is roughly 22 years. In other words, if the MTBF of this component is 200,000 hours, the component is expected to fail about once every 22 years.
Now, take a sample of 10,000 units of this particular component and determine how many units fail every day over a test period of 12 months. You may determine that:
• In the first 24 hours, two components fail.
• In the second 24 hours, zero components fail.
• In the third 24 hours, one component fails, and so on.
You then ask:
• If five units fail in 12 months, how long would it take for all 10,000 units to fail at this rate?
• If all units fail at regular intervals over this period, how long is this interval?
If a failure is the termination of the component’s ability to perform its intended function, what then is MTBF? MTBF is an interval of time used to express the expected failure rate of a given component. It does not indicate the expected lifetime of that component and says nothing about the failure likelihood of a single unit.
Figure 1-5 Reliability bell curve: number of failures vs. time distribution
This reliability bell curve (Figure 1.5) is also known as a normal distribution, where the highest point of the curve (or the top of the bell) represents the most probable event (at time µ). All possible events are then distributed around the most probable event, which creates a downward slope on each side of the
peak. Given the reliability bell curve and a sample of components, most of them will fail around time µ. Some components will fail early in the cycle, whereas others will last much longer. But all of them will fail at some point on this curve, with a high probability that the majority of failures will be distributed according to the bell curve.
Note
In August 2014, Trent Hamm wrote a good explanation of the reliability bell curve, using the analogy of low-end and more reliable washing machines. You can find the article at: http://www.thesimpledollar.com/the-reliability-bell-curve-what-does-more-reliability-actually-mean/.
Calculating MTBF
How does one calculate an MTBF for an environment which consists of a number of such components? What if all these components were identical? What if they all were different? Once again, let us use an example: If you have 2000 identical units in your environment and each unit has an MTBF of 200,000 hours, what is the associated total MTBF? The formula for the total MTBF is as follows:

MTBF_total = MTBF_unit / number of units

Therefore,

MTBF_total = 200,000 hours / 2000 units = 100 hours = ~4 days
Within your particular environment of 2,000 identical units, you can expect a failure approximately every 4 days.
It may be easy to understand the MTBF of a single unit, but in reality, systems are complex and consist of a number of different components. Within a complex environment, you can expect a failure of any particular component, a failure between components, or a failure of the entire system. The measure of the failure rate for a system consisting of 1 to n components, which may not necessarily be identical, is expressed as the MTBF for a complex system. Its formula is as follows:

MTBF_complex system = 1 / (1/MTBF_1 + 1/MTBF_2 + ... + 1/MTBF_n)

Both formulas are illustrated in the short sketch after the examples below. Even though individual MTBF rates for a single component have been improving, the entire environment is still vulnerable to component failures due to the interdependent nature of complex and multivendor solutions. Consider these examples:
• Hardware: Storage systems may fail to provide adequate service due to MTBF problems or due to corrupted data (resulting from viruses or data integrity issues). Computing systems may lose power
or can have various other problems, such as memory or processor failures. One cornerstone of a highly available environment is a good network: high-speed connections for transaction processing and robust connections for client/server backup are critical.
• Software: The basis of any system is a stable operating system, because a failure at the OS level may lead to vulnerability and data loss in everything at higher levels of the stack, such as applications, databases, and processes.
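The two MTBF formulas above translate directly into a few lines of Python. This is a minimal sketch assuming independent components whose failure rates simply add; the component MTBF values in the second example are made up for illustration.

```python
def mtbf_identical_units(unit_mtbf_hours, unit_count):
    # Total MTBF for a population of identical units: unit MTBF divided by the number of units.
    return unit_mtbf_hours / unit_count

def mtbf_complex_system(component_mtbfs):
    # MTBF of a system of different components: reciprocal of the sum of the failure rates.
    return 1 / sum(1 / m for m in component_mtbfs)

# 2,000 identical units, each with a 200,000-hour MTBF -> a failure roughly every 4 days.
print(mtbf_identical_units(200_000, 2_000))                              # 100.0 hours

# A hypothetical server built from a drive, a DIMM, a fan, and a power supply.
print(round(mtbf_complex_system([200_000, 500_000, 150_000, 300_000])))  # ~58,824 hours
```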
Defining and calculating MTTR You are looking at a component or a system failure rate in order to predict, prevent, and minimize downtime. There is another variable which has not yet been discussed, one of recovery. As soon as a unit fails, how long does it take to repair or replace it? To answer this question, another term is used, called MTTR (Mean Time to Repair or Mean Time to Recovery). MTTR represents the average time it takes to repair or replace the defective component and return the system to its full operation. It includes the time needed to identify the failure, to diagnose it and determine its root cause, and to rectify it. MTTR can be a major component of downtime, especially if you are unprepared.
Planning for and preventing downtime
While planning for downtime, you have to be aware of the level of problem you are planning for. You cannot influence metropolitan or regional incidents, because they are usually naturally occurring disasters, but you can create an appropriate disaster recovery plan and implement a disaster recovery solution once you know what it is you are planning for (also called the local problem area). Many times, you can also put in place a solution that reduces the impact of a disaster or even prevents it altogether. Lastly, regardless of the cause, location, or level of downtime, some type of administrative intervention is usually required to quickly recover from the problem. This intervention always depends on the exact cause of the problem and differs from one case to another. To increase your chances of success, make sure that you have:
• Adequate problem detection in place
• A recovery plan that is validated and tested
• Protection technologies that ensure desired levels of availability
• Monitoring and diagnostic tools and processes
• Skilled staff educated in the adopted technologies and recovery processes
• _____________________________________ (add your own)
• _____________________________________ (add your own)
• _____________________________________ (add your own)
Learning check questions Reinforce your knowledge and understanding of the topics just covered by completing this learning check:
1. What measures the time needed to repair a component after it fails?
a) MTBF
b) MTTR
c) RTO
d) RPO
2. What measures the MTBFs of a complex system?
a) MTBF
b) MTTR
c) RTO
d) RPO
3. Floods, wildfires, tornados, and severe snowstorms are examples of what type of disasters?
a) Building-level incidents
b) Metropolitan area disasters
c) Regional events
d) Localized incidents
4. What is the major cause of system and data unavailability?
a) User errors
b) Metropolitan area disasters
c) Planned downtime
d) Inadequate backup strategy
Learning check answers
This section contains answers to the learning check questions.
1. What measures the time needed to repair a component after it fails?
a) MTBF
b) MTTR
c) RTO
d) RPO
2. What measures the MTBFs of a complex system?
a) MTBF
b) MTTR
c) RTO
d) RPO
3. Floods, wildfires, tornados, and severe snowstorms are examples of what type of disasters?
a) Building-level incidents
b) Metropolitan area disasters
c) Regional events
d) Localized incidents
4. What is the major cause of system and data unavailability?
a) User errors
b) Metropolitan area disasters
c) Planned downtime
d) Inadequate backup strategy
Availability
Up to now, this chapter has covered the effects of downtime of a computer system or unavailability of data due to a failure or a disaster. The opposite of downtime is availability (or uptime). This section discusses the terminology and methods for expressing and calculating availability times and requirements.
Learner activity Before continuing with this section, see if you can define the terms below and answer the questions. Search the Internet if you need to. • In your own words, define these terms: o High availability: __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ o Fault tolerance: __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ o RPO: __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ o RTO: __________________________________________________
__________________________________________________ __________________________________________________ __________________________________________________ • Given the following type of failure or downtime, what would be the best prevention technology in your opinion? o Power outage: ____ o Hardware component failure: ____ o System-wide failure: ____ o Administrative or maintenance downtime: ____ o Regional or metropolitan disaster: ____
Defining availability
Availability is a measure of the uptime of a computer system. It is the degree to which a system, a subsystem, or a component of a system continues running and remains fully operational. High availability is generally acceptable and desirable. Higher levels of availability, all the way to continuous or mission-critical availability, are usually expensive and complex to achieve.
Categorizing availability
When a computer system plays a crucial role, availability requirements are dictated and enforced by law or technical regulations. Examples of such systems include hospital life-support equipment, air traffic systems, and nuclear power plant control systems. The Harvard Research Group defines five Availability Environments (AEs) in terms of the impact on the business and the end user/customer. Each successive level inherits the availability and functionality of the previous (lower) level. The minimum requirement for a system to be considered highly available is a backup copy of data on a redundant disk and a log-based or journal file system for identification and recovery of incomplete (“in-flight”) transactions that have not been committed to a permanent medium. This environment corresponds to AE-1.
Figure 1-6 AEs as defined by the Harvard Research Group
Figure 1.6 summarizes these five AEs. They are defined as follows:
• AE-4 (fault tolerance)—AE-4 covers business functions that demand continuous computing and environments where any failure is transparent to the user. This means no interruption of work, no lost transactions, no degradation in performance, and continuous 24×7 operation.
• AE-3 (fault resilience)—AE-3 covers business functions that require uninterrupted computing services, either during essential periods or during most hours of the day and most days of the week throughout the year. The users stay online, but their transactions may need restarting, and they may experience performance degradation.
• AE-2 (high availability)—AE-2 covers business functions that allow minimally interrupted computing services, either during essential periods or during most hours of the day and most days of the week throughout the year. The users may be interrupted but can quickly log back in. They may have to rerun some transactions from journal files, and they may experience performance degradation.
• AE-1 (high reliability)—AE-1 covers business functions that can be interrupted as long as the integrity of the data is ensured. The user work stops and an uncontrolled shutdown occurs, but data integrity is ensured.
• AE-0 (conventional)—AE-0 covers business functions that can be interrupted and environments where data integrity is not essential. The user work stops, an uncontrolled shutdown occurs, and data may be lost or corrupted.
Achieving availability High levels of availability are not only expected but also required. Designers, engineers, and manufacturers build resilience, availability, and fault tolerance into components, subsystems, and entire solutions depending on what’s required and what the customer is willing to pay for. Such solutions are often built around these three goals:
1. Reduce or eliminate single points of failure by introducing and employing redundancy. This means that a failure of a single component does not cause the entire system to fail.
2. Include a failover mechanism from one component to its redundant counterpart. Depending on the solution design, the failover may range from near instantaneous to taking a few seconds or minutes.
3. Provide adequate monitoring and failure detection, which then leads to quick recovery.
As you might suspect, protection against MTBF-/MTTR-related problems plays a key role in achieving high levels of availability. To determine which highly available solution is right for you, you should:
1. Determine the desired or required level of availability.
2. Determine the cost of downtime.
3. Understand the recovery process, time, and procedures.
4. Focus on events that have negative aspects (events that can bring your system down as well as events that may prevent or delay its recovery).
5. Develop a cost-consideration model, which weighs the cost of downtime, as defined earlier in this chapter, versus the cost of the high-availability solution and alternatives.
6. Obtain the necessary approvals, funding, and support, and then design and implement the solution.
For a distributed system, if a particular part of the system is unavailable due to a communication or other failure, the system as a whole may still be deemed to be in the up state even though certain applications may be down. When running multiple applications, the system may be fully functional and available even if certain parts of the system are down.
Note
The type of availability at the high end of the availability scale is called continuous availability or nonstop computing and is around 99.999% (“five nines”) of uptime or higher. This level of availability yields 5.26 minutes of downtime per year (25.9 seconds per month, 6.05 seconds per week, or 864.3 milliseconds per day), which is often not necessary. Many businesses are satisfied with the requirement that the system does not go down during normal business hours.
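The downtime figures quoted in the note follow from simple arithmetic. The short Python sketch below converts an availability percentage into allowed downtime per year, month, week, and day; it is purely illustrative.

```python
def allowed_downtime_seconds(availability_percent):
    unavailable_fraction = 1 - availability_percent / 100
    periods = {"year": 365 * 24 * 3600, "month": 30 * 24 * 3600,
               "week": 7 * 24 * 3600, "day": 24 * 3600}
    # Allowed downtime in seconds for each period.
    return {name: seconds * unavailable_fraction for name, seconds in periods.items()}

for period, seconds in allowed_downtime_seconds(99.999).items():
    print(f"{period}: {seconds:.2f} s")
# year: 315.36 s (~5.26 minutes), month: 25.92 s, week: 6.05 s, day: 0.86 s
```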
Note
For additional details regarding high availability definition, requirements, and parameters, see: https://en.wikipedia.org/wiki/High_availability.
RPO and RTO Disaster recovery specialists examine the impact possibilities and the needs for availability in terms of recovery point (also referred to as RPO) and recovery time (also referred to as RTO; Figure 1.7).
Figure 1-7 RPO vs. RTO
Recovery Point Objective
RPO is the acceptable amount of data loss, specified as the maximum percentage of data or the maximum length of time, due to an event or an incident. This is the point or state of data to which the recovery process of a given solution must be capable of returning. In terms of backups, the recovery point provides the information necessary to determine the frequency of performing backup operations. RPO provides answers to these questions:
• How much data is acceptable to be unprotected?
• To what point in time do you need to recover (e.g., 24 hours, 1 hour, or several seconds)?
Recovery Time Objective
RTO defines the length of time necessary to restore system operations after a disaster or a disruption of service. RTO includes, at minimum, the time to correct the situation (“break fix”) and to restore any data. It can, however, also include factors such as detection, troubleshooting, testing, and communication to the users.
Recovery = environment break fix + restore from an alternate source (backup disk or tape)
RTO provides answers to these questions:
• How long is the customer or user willing to wait for the data or application?
• How long can you tolerate downtime?
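As a purely illustrative sketch (the times, thresholds, and function names are hypothetical), the following Python fragment checks whether a backup schedule and a recovery procedure satisfy given RPO and RTO targets.

```python
from datetime import datetime, timedelta

def meets_rpo(last_good_backup, incident_time, rpo):
    # Worst-case data loss equals the time elapsed since the last successful backup.
    return (incident_time - last_good_backup) <= rpo

def meets_rto(recovery_steps, rto):
    # Total recovery time: break fix + restore + verification, and so on.
    return sum(recovery_steps, timedelta()) <= rto

incident = datetime(2016, 6, 1, 14, 30)
last_backup = datetime(2016, 6, 1, 2, 0)                           # nightly backup at 02:00
print(meets_rpo(last_backup, incident, rpo=timedelta(hours=24)))   # True
print(meets_rpo(last_backup, incident, rpo=timedelta(hours=1)))    # False -> back up more often

steps = [timedelta(minutes=30), timedelta(hours=2), timedelta(minutes=45)]  # fix, restore, verify
print(meets_rto(steps, rto=timedelta(hours=4)))                    # True
```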
Note
Both RPO and RTO represent recovery objectives, or goals. These are not mandates and are usually not treated as absolute requirements. Most disaster recovery specialists and businesses select availability and recovery tactics that do not necessarily (or always) meet RPO and RTO, but get close.

The availability requirements of a given solution or environment dictate the RTO and provide the basis for determining these backup parameters:
• The type of backup
• The number of restores that are necessary
• The location of the backup data
• The speed and type of access to the backup data
Here are a few questions that you should ask your customers (and collect answers for) when defining the appropriate protection and recovery strategy:
• In terms of the operational impact of downtime:
o What is more important, a fast recovery or a recovery to an exact state prior to the failure? Or are both important?
o What is the impact on operations relative to the recovery point? If you do not resume where you left off, will the loss be inconvenient, damaging, or catastrophic?
o _________________________________________________________________ (add your own)
o _________________________________________________________________ (add your own)
o _________________________________________________________________ (add your own)
• In terms of the business impact of downtime:
o What is the business impact of the recovery time? If you do not resume processing within seconds/minutes/hours/days, will it be inconvenient, damaging, or catastrophic to the business?
o What is the most effective and efficient method to recover your information?
o _________________________________________________________________ (add your own)
o _________________________________________________________________ (add your own)
o _________________________________________________________________ (add your own)
Careful and complete assessment of the RPO and the RTO helps you define the protection and recovery strategy and which availability technologies, products, and solutions you recommend and use.
Availability technologies
As you can imagine, the list of availability approaches, technologies, products, and solutions is long. There are hardware solutions, software solutions, and combinations of both. There is replication, clustering, snapshots, clones, and RAID, to name just a few. Some protect against single points of failure; others protect against incidents and disasters of greater magnitude. Table 1.2 provides a quick overview of some options that may be used to achieve the required levels of availability.
Table 1-2 Options that may be used to achieve availability

To protect against:                                               | Technology which can be used:
------------------------------------------------------------------|------------------------------
Power outages                                                     | UPS, generators
Component failures                                                | Fault tolerance, RAID
Failure of the entire system, operating system, or interconnects  | Fault tolerance, clusters
Administrative, planned downtime                                  | Clusters
Regional or metropolitan downtimes                                | Remote copy
Redundant Array of Inexpensive (or Independent) Disks
The first priority when working with data is to protect it from disk failures. Considering the high price of hard drives a few decades ago, companies struggled with two basic problems—disk reliability and disk size (or the available storage space). As a result, a technology called Redundant Array of Inexpensive (or Independent) Disks (RAID) was created. The purpose of RAID is to combine smaller disk drives into larger logical ones, to introduce redundancy into a system to protect it against disk drive failures, or both. Several types of RAID configurations were created, called RAID levels. These are the most common:
• RAID 0 (striping)—is used to create larger virtual disks from smaller physical ones. Data blocks are spread (striped) across the physical member disks with no redundancy. RAID 0 requires at least two physical disk drives. An advantage of RAID 0 (Figure 1.8) is its performance—it increases the number of spindles used to service I/O requests. However, because it provides no data protection, you should never use this RAID level in a business-critical system.
Figure 1-8 RAID 0
• RAID 1 (mirroring)—is used to protect the data by writing to two identical disk drives at the same time, thus keeping an exact copy in case of a disk failure. RAID 1 (Figure 1.9) requires an even number of physical disk drives (at least two). It provides good performance and redundancy (it can withstand failure of multiple disk drives, as long as they are not mirrored to each
other). Its drawback is that the usable disk space is reduced by 50% due to information redundancy.
Figure 1-9 RAID 1
• RAID 5 (striping with parity)—was created as a cheaper alternative to mirroring, requiring fewer disk drives in the array while still being able to sustain the loss of a single disk drive without losing data. RAID 5 requires at least three disk drives and provides good performance (thanks to its data blocks being striped) and good redundancy (using distributed parity). It also provides 67%–93% of usable disk storage. RAID 5 (Figure 1.10) is a great, cost-effective choice for predominantly read-intensive environments, but suffers from slow write operations.
Figure 1-10 RAID 5
• RAID 6 (striping with double parity)—is similar to RAID 5, but it can sustain the loss of two disk drives while keeping the data safe. It uses block-level striping for good performance and creates two parity blocks for each data stripe. It requires at least four disk drives. RAID 6 (Figure 1.11) is also more complicated to implement in the RAID controller or in software.
Figure 1-11 RAID 6
• RAID 10 (combining mirroring and striping)—is a hybrid of RAID 1 and 0. It protects data by maintaining a mirror on the secondary set of disk drives while using striping across each set to improve data transfers. RAID 10 (Figure 1.12) requires at least four disk drives and an even number of them. It provides excellent performance and redundancy, which is ideal for mission-critical applications, at the cost of losing 50% of usable disk drive space.
Figure 1-12 RAID 10
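The capacity and redundancy trade-offs of these RAID levels can be summarized numerically. The sketch below is a simplification (equal-size drives, no hot spares, minimum drive counts met); the eight-drive example is arbitrary.

```python
def raid_summary(level, drive_count):
    """Return (usable fraction of raw capacity, guaranteed drive failures tolerated)
    for common RAID levels, using a simplified model."""
    n = drive_count
    table = {
        0:  (1.0,         0),  # striping only, no redundancy
        1:  (0.5,         1),  # mirroring: one failure per mirrored pair is guaranteed
        5:  ((n - 1) / n, 1),  # single distributed parity
        6:  ((n - 2) / n, 2),  # double distributed parity
        10: (0.5,         1),  # mirrored pairs, striped; one failure guaranteed, possibly more
    }
    return table[level]

for level in (0, 1, 5, 6, 10):
    usable, tolerated = raid_summary(level, drive_count=8)
    print(f"RAID {level:>2}: {usable:.0%} usable capacity, "
          f"survives {tolerated} guaranteed drive failure(s)")
```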
Note
For more information about standard RAID levels, go to: https://en.wikipedia.org/wiki/Standard_RAID_levels.
Important
RAID technology is necessary for system uptime and data protection against disk drive failure, but it does not replace backups. It does not provide a point-in-time copy of the data, so the information is not protected in case of a major hardware failure, user error, logical corruption, viruses, or cyberattacks. It is also not designed to keep offline or offsite copies of your data, which is the primary purpose of backups.
Snapshots
A snapshot is a point-in-time copy of data created from a set of markers pointing to stored data. Snapshots became possible with advances in Storage Area Networking (SAN) technologies, and they complement traditional RAID technology by adding backup functionality. Most snapshot implementations use a technique called copy-on-write: an initial snapshot is created, and as data subsequently changes, the original blocks are preserved so that the snapshot continues to represent the chosen point in time. Restoring to a specific point in time is possible as long as all iterations of the data are kept. For that reason, snapshots can protect data against corruption (unlike replication, which cannot).
Another common snapshot variant is the clone, or split-mirror, where reference pointers are made to the entire content of a mirrored set of drives, a file system, or a LUN every time the snapshot is created. Clones take longer to create than copy-on-write snapshots because all data is physically copied when the clone is created. There is also an impact on production performance when the clone is created, since the copy process has to access the primary data at the same time as the host.
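To illustrate the copy-on-write idea, here is a toy Python model (not how any particular storage array implements snapshots): taking a snapshot stores no data, and a block is preserved in the snapshot only the first time it is overwritten afterward.

```python
class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data
        self.snapshots = []          # each snapshot: dict of block index -> preserved original

    def take_snapshot(self):
        self.snapshots.append({})    # just a marker; nothing is copied yet
        return len(self.snapshots) - 1

    def write(self, index, data):
        for snapshot in self.snapshots:
            # Copy-on-write: preserve the original block once per snapshot, on first overwrite.
            snapshot.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snapshot_id):
        snapshot = self.snapshots[snapshot_id]
        # Each block is either the preserved original or the unchanged live block.
        return [snapshot.get(i, block) for i, block in enumerate(self.blocks)]

vol = Volume(["A", "B", "C"])
snap = vol.take_snapshot()
vol.write(1, "B2")
print(vol.blocks)                 # ['A', 'B2', 'C'] -- current data
print(vol.read_snapshot(snap))    # ['A', 'B', 'C']  -- point-in-time view
```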
Note
For more information about snapshots, go to: https://en.wikipedia.org/wiki/Snapshot_(computer_storage).
Replication
Replication is a technique that involves maintaining a copy of data on a storage device, which can be physically located in a remote location. Replication is sensitive to RTO and RPO and includes synchronous and asynchronous variants, where the data transfer to the remote copy is achieved either immediately or with a short time delay. Both methods create a secondary copy of the data that is identical to the primary copy, with synchronous replication solutions achieving it in real time. With replication, any data corruption or user file deletion is immediately (or very quickly) replicated to the secondary copy, therefore making it ineffective as a backup method. Another point to remember with replication is that only one copy of the data is kept at the secondary location. The replicated copy does not include historical versions of the data from preceding days, weeks, or months, as is the case with backup.
Note
For more information about replication, go to: https://en.wikipedia.org/wiki/Replication_(computing).
Calculating system’s availability
To calculate the overall availability of a system, you must know the system’s overall MTBF figure (based on the MTBFs of its individual components) and the MTTR. Then, you can use this formula:

Availability = MTBF / (MTBF + MTTR)
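As a quick illustration of the formula (the MTBF and MTTR values are made up), the following Python lines compute availability and the corresponding expected downtime per year.

```python
def availability(mtbf_hours, mttr_hours):
    # Availability = MTBF / (MTBF + MTTR)
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(mtbf_hours=10_000, mttr_hours=4)               # hypothetical system
print(f"{a:.5%}")                                                # 99.96002%
print(f"~{(1 - a) * 365 * 24:.1f} hours of downtime per year")   # ~3.5 hours
```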
Planning for availability
It is often challenging to determine which level of availability is best for a given business environment, for these two reasons:
1. It is difficult to estimate the actual levels of availability that are required to meet given service levels while also meeting budgetary and implementation scheduling requirements.
2. Even if you are successful with the initial estimates, the system usage patterns often change and shift from one area to another, distorting the original assumptions and parameters.
When planning for availability, consider seeking answers to these questions:
• Who are the customers or the users of the system, and what are their expectations and requirements?
• How much downtime are they willing to accept?
• How much uptime are they willing to pay for?
• What and who depends on the service your system provides?
• What is the impact of downtime?
• How often do you expect the system to be down?
• How quickly can you recover from failures?
• What are the alternatives and their cost/benefit analysis?
• What is the budget and implementation timeframe?
• What is the required skillset for its implementation, operation, and maintenance?
• _____________________________________ (add your own)
• _____________________________________ (add your own)
• _____________________________________ (add your own)
Planning and designing for a specific availability level pose challenges and require tradeoffs. There is a wide selection of technologies, products, and solutions on the market today, each with a different methodology, architecture, requirements, benefits, and cost. No single availability solution fits every situation. Therefore, a good solution always depends on a combination of business requirements, application-specific parameters, implementation timetable, and available budget.
Learning check questions Reinforce your knowledge and understanding of the topics just covered by completing this learning check: 1. Match the availability level with its correct description:
2. What is a typical expected downtime of a fault-resilient system?
a) None
b) 1 hour per year
c) 8.5 hours per year
d) 3.5 days per year
3. Which RTO goal should be expected from a highly available system?
a) Less than one second
b) Seconds to minutes
c) Minutes to hours
d) Hours to days
4. What does the following formula calculate?

a) RTO
b) RPO
c) System availability
d) Expected downtime
5. Which RAID levels can withstand a failure of two disk drives? (Select all that apply.)
a) RAID 0
b) RAID 1
c) RAID 5
d) RAID 6
e) RAID 10
6. Which data protection technique creates reference pointers to the entire content of duplicate drives, file systems, or LUNs every time a snapshot is made?
a) Archiving
b) Copy-on-write
c) Replication
d) Split-mirror
e) Deduplication
Learning check answers This section contains answers to the learning check questions. 1. Match the availability level with its correct description:
2. What is a typical expected downtime of a fault-resilient system?
a) None
b) 1 hour per year
c) 8.5 hours per year
d) 3.5 days per year
3. Which RTO goal should be expected from a highly available system?
a) Less than one second
b) Seconds to minutes
c) Minutes to hours
d) Hours to days
4. What does the following formula calculate?

a) RTO
b) RPO
c) System availability
d) Expected downtime
5. Which RAID levels can withstand a failure of two disk drives? (Select all that apply.)
a) RAID 0
b) RAID 1
c) RAID 5
d) RAID 6
e) RAID 10
6. Which data protection technique creates reference pointers to the entire content of duplicate drives, file systems, or LUNs every time a snapshot is made?
a) Archiving
b) Copy-on-write
c) Replication
d) Split-mirror
e) Deduplication
Summary
Human errors, software glitches, hardware failures, and human-induced or natural disasters all affect IT operations. Downtime due to these disruptions can result in lost productivity, revenue, market share, customers, and employee satisfaction, and in some extreme situations, even in loss of life. MTBF and MTTR are two metrics that deal with downtime prediction and recovery. Availability is the opposite of downtime. It measures the time during which a system runs and delivers data, applications, or services. System availability varies from conventional systems to fully fault-tolerant systems, and so does their cost. Backup and RAID technologies are not the same, and each has a different purpose. While both protect
your data, backups provide secondary copies of your data in case it is lost or corrupted, whereas RAID (except for RAID 0) provides some level of redundancy in case of a disk drive failure. Various RAID technologies exist, ranging from RAID 0 to RAID 10, each with different levels of protection and performance characteristics. RPO and RTO define the boundaries under which a system and its data are brought back into service after downtime. Redundant and fault-tolerant components address high-availability demands. Backup protects your data against multiple hardware-component failures, human errors, viruses, cyber-attacks, and data corruption.
2 Backup Strategies
CHAPTER OBJECTIVES
In this chapter, you will learn to:
• Define a backup, explain its purpose, and describe the basic backup terminology
• List and describe backup types and explain which backup type is appropriate for a given situation
• List and describe the most common backup tape rotation schemes and related topics such as the overwrite protection and append period
• Explain the restore operations
• Define archiving and contrast it with backing up
• Describe data tiering and its benefits
• Explain the purpose of a backup and recovery plan and define the recommended steps in a sample backup and recovery plan
• List and describe common tape media drives and technologies
Introduction
In the previous chapter, you learned about downtime and availability. You learned what causes downtime, what impact it has on IT operations and on the business, and how it translates to cost. You also learned about availability—how to achieve it, how to calculate it, and how to plan for it. The previous chapter also made a clear distinction between availability technologies, such as RAID and replication, and backups, and explained why backups are critical in protecting business applications and data. This chapter focuses on backup strategies and related topics such as restore, archiving, and data tiering. As you may already know or suspect, backup is a broad topic that must be understood to properly plan your backup strategy and fit it into your overall protection implementation. Why? Because your backup is only as good as the data you can restore from it.
Learner Assessment Before continuing with this chapter, take a few minutes to assess what you might already know by defining these topics:
Note The purpose of this exercise is to get you thinking about these topics. Do not worry if you do not know the answers; just put in your best effort. Each topic will be explained later in this chapter. • In your own terms, define backup and its purpose: ___________________________________________________________ ___________________________________________________________
___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ • In your own terms, define these concepts: Backup consistency ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ Backup application ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ Backup device ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ Deduplication ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________
What is a backup? Now, consider these questions regarding backup: • • • • •
Is backup a protection against hardware failures? Does backup address disk-related problems? Is backup the best protection against loss of data due to hardware failures? Does backup always have the latest version of the data? Is backup still vulnerable to some data loss?
Everyone would agree that a backup is an insurance policy against data loss. A company depends on its information to stay in business, whether the information takes the form of email messages, a customer database, research and development projects, or even unstructured data. An individual depends on his/her
information for work-related tasks. Even in our daily personal lives, we rely on information which we cannot (or do not want to) lose.
Backup definition
Figure 2-1 Backup
A backup (Figure 2.1) is the process of creating a copy of data, applications, or operating systems on a secondary storage medium. This copy is created, stored, and kept for future use in case the original is destroyed or corrupted. According to the Dictionary of Computer Terms: “Backup is the activity of copying files or databases so that they will be preserved in case of equipment failure or other catastrophes. Backup is usually a routine part of the operation of large businesses with servers as well as the administrators of smaller business companies. For personal computer users, backup is also necessary but often neglected.”
In most cases, the source is data on a disk, such as files, directories, databases, and applications. The target is another disk or a tape, hosted by some type of a tape drive or a tape library. Since the backup is expected to be used for disaster recovery, it must be consistent. Consistency means that the data copy is the same as the original, and when restored, it is identical. Consistency is achieved by checks such as byte-by-byte comparison, cyclic redundancy checks, or reduced verification. The software that copies the data to its destination is called the backup application. The destination is called the backup device, such as a tape drive, with a medium onto which the copy of the data is written. The backup device can be physical or virtual. There are many benefits of virtualizing the target backup device, one of the most significant being the potential to take advantage of deduplication functionality, and another to overcome the limitations of sequential physical devices, such as speed and size.
A customer’s or your own company’s recovery requirements help you decide on the best way to back up the data, and there is rarely a case where all data should be treated the same way. This means that retention requirements (how long to keep the backup) vary for different types of data.
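Consistency verification can be as simple as comparing checksums of the source and the backup copy. The sketch below shows one illustrative approach using SHA-256 digests (a cyclic redundancy check or a byte-by-byte comparison works similarly); the file paths are placeholders.

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    # Hash the file in chunks so that large files need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(source_path, backup_path):
    # The backup copy is consistent with the source if their digests match.
    return file_digest(source_path) == file_digest(backup_path)

# Hypothetical paths, for illustration only:
# print(verify_backup("/data/orders.db", "/backup/orders.db"))
```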
Backup purpose
As already mentioned, backup is an insurance policy against data loss. It plays a critical role in
• Disaster recovery (enables data to be restored after a disaster)
• Data archival (consists of files or records that have been selected for permanent or long-term preservation)
• Operational backup (enables a restore of a small or a selected number of objects, such as files or emails, after they have been accidentally deleted or corrupted)
• Recovery from data corruption (such as databases)
• Compliance with corporate guidelines and/or local, state, and federal requirements for data retention
Backup sets
What should a backup consist of? There is no rule on what must be backed up and what can be left unprotected. Some of the good candidates to be included in the backup sets are the following:
• Operating environments
o Laptop operating systems
o Desktop operating systems
o Server operating systems (physical and virtual)
• Applications
o Enterprise Resource Planning (ERP) applications such as SAP, Oracle Applications, and PeopleSoft
o Customer Relationship Management (CRM) applications such as SAP Digital, Siebel, Salesforce, and Microsoft Dynamics CRM
o Databases, such as Oracle, UDP, and Microsoft SQL Server
o Messaging applications, such as Microsoft Exchange
• User and application data (for all above operating environments and applications)
• Logs and journals
o Application transaction logs
o Database journals
o File system journals
Always consider including in your backup sets everything that you cannot afford to lose.
Backup types This section defines the archive bit, which is an important property of operating system files and plays a key role in performing backups, and lists and describes various types of backups.
Archive bit When a file is created or changed, the operating system maintains a flag, which is called the archive bit. The backup software uses this bit to determine whether the file has been backed up before. As soon as the file is backed up using either the full or incremental backup, this bit is turned off, indicating to the system that the file was saved to a secondary medium. If the file is changed again, the bit is turned back on, and
the file is flagged to be backed up again by the next full or incremental backup. Differential backups include only files that were created or modified since the last full backup. When a differential backup is performed, no changes are made to the archive bit.
Archive bit status      Meaning
ON                      File must be backed up
OFF                     File has been backed up
A copy is an option in many backup applications. Its function is similar to a full backup; however, the archive bit is not modified.
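To make the archive bit behavior concrete, here is a minimal Python sketch (illustrative only; the File class and file names are invented and are not part of any HPE or vendor product) showing how full, incremental, differential, and copy backups select files and whether they clear the bit:

```python
# Minimal sketch: how the archive bit drives file selection for the backup types
# described above. The File class and file names are illustrative only.

class File:
    def __init__(self, name, archive_bit=True):
        self.name = name
        self.archive_bit = archive_bit   # ON (True) = file must be backed up

def run_backup(files, backup_type):
    if backup_type in ("full", "copy"):
        selected = list(files)                          # everything on the source
    else:  # "incremental" or "differential"
        selected = [f for f in files if f.archive_bit]  # only flagged files
    # Full and incremental backups clear the bit; differential and copy leave it untouched.
    if backup_type in ("full", "incremental"):
        for f in selected:
            f.archive_bit = False
    return [f.name for f in selected]

files = [File("report.doc"), File("data.db"), File("notes.txt")]
print(run_backup(files, "full"))          # all files, bits cleared
files[1].archive_bit = True               # data.db changes again
print(run_backup(files, "differential"))  # only data.db, bit stays ON
print(run_backup(files, "incremental"))   # data.db again, bit cleared this time
```

In this simplified model, differential and incremental backups select the same flagged files; the difference is that only full and incremental backups clear the bit afterward.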
Learner assessment You may already be familiar with many types of backups. For fun and to test your preexisting knowledge, see if you can answer these questions:
Note
The purpose of this exercise is for you to assess what you already know. Do not worry if you do not know the answers; just put in your best effort. Each topic will be explained later in this chapter.
• Which backup type is described by these statements? It is also called a normal backup. It saves the entire content of the source disk on a file-by-file basis. After backing up individual files, it turns the archive bit off.
a) Online backup
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
• Which backup type is described by these statements? It enables restoring individual files to their original locations. It provides random access to individual files for a quick restore. It requires a full path to be saved in the backup set. It can create significant operating system overhead with a large number of small files.
a) Online backup
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
• Which backup type is described by these statements? It supports 24x7 operations because it does not require a system shutdown. It can create a degradation in server and network performance. It may create data integrity and inconsistency issues.
a) Online backup
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
• Which backup type is described by these statements? It saves the entire disk at the block level. It is often called physical backup because it dumps the complete file system to a single file. It provides the fastest method to save or to recover a complete file system. It requires a restore to an identical disk.
a) User-defined backup
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
• Which backup type is described by these statements? It is performed upon a special request, which defines a set of files to back up. It is performed in addition to and outside of the standard tape rotation scheme. It is usually performed before a system or software upgrade or when necessary to save certain files.
a) User-defined backup
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
Learner Assessment Answers
This section contains answers to this chapter’s learner assessment questions.
• Which backup type is described by these statements? It is also called a normal backup. It saves the entire content of the source disk on a file-by-file basis. After backing up individual files, it turns the archive bit off.
a) Online backup
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup (correct answer)
• Which backup type is described by these statements? It enables restoring individual files to their original locations. It provides random access to individual files for a quick restore. It requires a full path to be saved in the backup set. It can create significant operating system overhead with a large number of small files.
a) Online backup
b) Offline backup
c) File-based backup (correct answer)
d) Image-based backup
e) Full backup
• Which backup type is described by these statements? It supports 24x7 operations because it does not require a system shutdown. It can create a degradation in server and network performance. It may create data integrity and inconsistency issues.
a) Online backup (correct answer)
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
• Which backup type is described by these statements? It saves the entire disk at the block level. It is often called physical backup because it dumps the complete file system to a single file. It provides the fastest method to save or recover a complete file system. It requires a restore to an identical disk.
a) User-defined backup
b) Offline backup
c) File-based backup
d) Image-based backup (correct answer)
e) Full backup
• Which backup type is described by these statements? It is performed upon a special request, which defines a set of files to back up. It is performed in addition to and outside of the standard tape rotation scheme. It is usually performed before a system or a software upgrade or when necessary to save certain files.
a) User-defined backup (correct answer)
b) Offline backup
c) File-based backup
d) Image-based backup
e) Full backup
Offline backup
For an offline backup, the administrator takes the applications and their data offline for the time required to do the backup. During the backup window, the host (such as a server or a storage array) is not available to its users, which is why this type of backup is usually done at times when the user demand is at its lowest (such as at night or on weekends). To avoid extended offline periods for the host, you must calculate the overall system performance and the volume of backup data to determine whether the time needed for the operation fits into the allotted backup window. From the backup perspective, offline backup is the easiest and most secure form of performing backups, because the backup software usually does not have to deal with open files or performance issues. An offline backup can be complete or partial.
Online backup As more companies move toward 24-hour and 7-days-per-week operations, no clear backup window for offline backups exists. Online backup is the alternative and is performed during normal operating hours and with the host fully available to its users. Usually both the users and the backup environment see a degradation in the overall network and host performance during the backup window, thus influencing productivity. Because user and system files can be open during online backups, there is a danger to data integrity, which has to be solved appropriately by the backup software. Inconsistent files could mean a serious threat to the data integrity when restored from the backup medium. Incompletely backed up files (open files with ongoing write operations while the backup software is copying them to tape) may contain inconsistent data which could be spread across various databases in the enterprise, depending on how the databases are linked to each other. An example of inconsistency is database indices that no longer reflect the actual content of the database. Your backup software should at least offer the option of selecting what to do with open files—either copy them anyway, mark the files in a log file as suspect, or do not copy these files but inform the administrator via a log file or an alert. Some backup programs automatically retry open files after a certain time, when there may be a chance that the file has been closed in the meantime. An online backup can also be complete or partial.
File-by-file backup (file-based backup)
With file-by-file backups, the information needed to retrieve a single file from the backup medium is retained, and it is possible to restore files to their correct locations on the disk. The full path information for every file must be saved on the backup medium, which, in an environment with a large number of small files, could cause a significant overhead for the operating system. The operating system must access the file allocation table many times, which can reduce the backup throughput by as much as 90%. The performance can also be influenced by other factors such as disk fragmentation. However, file-by-file
backup provides random access to individual files for a quick restore. Furthermore, this type of backup is used by all tape rotation schemes to back up selected files.
Image backup (image-based backup) An image backup: • Saves the entire disk at the block level. • Provides the highest backup performance due to large sequential I/Os (the tape drive is typically kept streaming). • Creates a snapshot of the disk at the time of the backup. • Requires a restore to an identical disk. • Is used for archiving because it does not restore individual files. Image backups may also be referred to as physical backups and are used to describe a dump of the complete file system to a single backup image file. Consequently, only the complete image can be restored. When the image backup is performed locally, the tape drive typically keeps streaming. If the image backup is done over the network to a distant tape drive, other activities on the network may influence the tape drive’s performance. An image backup is usually the fastest method to save or to recover a complete file system. In environments with an exceptionally high number of small files, this could be your best choice of backup methods.
User-defined backup
A user-defined backup is
• Performed upon a special request from a user who defines which set of files to back up
• Performed in addition to and outside of the standard tape rotation scheme
• Defined by the backup set which includes user files
• Performed before a system or a software upgrade
A user-defined backup usually means a special backup request where the user defines which files need to be copied (e.g., for a critical project). This can be the complete user account with all applications, configurations, and customizations, which can be especially useful when the user plans to do a major system change like operating system or software upgrades. Another request could consist of just the information base of the user, which includes certain applications and their data, or all information created only within a certain timeframe.
Full backup A full backup is also called a complete or normal backup. It saves the entire content of the source disk to the backup medium on a file-by-file basis. In networked environments, this source disk could be the disk
of a client machine. After backing up individual files, their archive bit is turned off. A full backup is time-consuming but critical to a full disaster recovery. Therefore, you should always start with a full backup.
Differential backup
Figure 2-2 Differential backup
A differential backup (Figure 2.2) copies all files which have been modified or created since the last full backup, but does not modify the archive bit. A differential backup is useful when you want to have the latest version of files on a tape in the latest backup set. When the same tape is used for the differential backups between full backups, usually the newer file versions are allowed to overwrite the older versions of the same file. Special caution must be taken when database log files are backed up—circular logging must be disabled and the log files must not be deleted after the backup. Since the archive bit of the files does not change during the differential backup, the files appear to the system as not being backed up and are saved again with the next differential backup.
Differential backup—data to be saved calculation and associated example
Data to be saved:
• B = base amount of data
• x = daily changes in %
• o = data overlap in %
• t = number of days
• Dt = data to be saved on tape after t days of DIF backup
Example 1
Here is one example:
• B = 200 MB data on disk (first full backup)
• x = 10% daily changes
• o = 90% daily overlap
• t = 5 days
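Under one plausible reading of the variables above, in which the overlap o is the fraction of each day's changed data that was already changed on an earlier day, the amount on the differential tape after t days would be:

```latex
D_t = B \cdot x \cdot \bigl(1 + (t-1)(1-o)\bigr)
\qquad
D_5 = 200\,\text{MB} \times 0.10 \times \bigl(1 + 4 \times 0.10\bigr) = 28\,\text{MB}
```

Treat this as an illustration of how slowly the differential set grows when the daily changes overlap heavily, rather than as an authoritative formula.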
Example 2
As a different example from the one shown in the graphic above, imagine that you have
• 1 GB of data on the entire disk, which is backed up with a full backup on Monday onto its own tape.
• 50 MB of data that changes daily (there is a separate tape available for each day).
Then with a differential backup:
• Tuesday tape contains 50 MB.
• Wednesday tape contains 100 MB.
• Thursday tape contains 150 MB.
If a complete data loss occurred on Friday, the restore operation would only need:
• The full backup tape, which was created on Monday
• The last good differential backup tape (the one created on Thursday)
The advantage of a differential backup is that only two tapes are necessary to completely restore the system (the last full backup tape and the last differential backup tape). The disadvantages are that the volume of data on these tapes grows every day, the time for the backup also increases every day, and only the latest versions of files can be restored.
Incremental backup
Figure 2-3 Incremental backup
An incremental backup (Figure 2.3) copies all files which have been modified or created since the last full or incremental backup and turns the archive bit off. With incremental backups, the complete history of the files is maintained. All files that have been created or modified since the last full or incremental backup are saved. Each revision or version of the files is available on the tape because the backup software does not overwrite the earlier versions of the same files but instead appends the data to the previous backup sets (or uses different tapes for each incremental backup). When backing up database log files, circular logging must be disabled. Like the normal backups, the incremental backup purges the log files after the backup process. The formulas to calculate the size of data to be saved on a certain day and the total size of data after a number of days to be restored are
• B = the initial data size (the first full backup)
• x = daily changes in %
• g = daily growth in %
• t = number of days
• At = data to be saved on day t with incremental backups
• Tt = total size of data which must be restored after t days of incremental backups

Example
Imagine that you have
• 1 GB of data on the entire disk, which is backed up with a full backup on Monday onto its own tape.
• 50 MB of data that changes daily (there is a separate tape available for each day).
Then with an incremental backup:
• Tuesday tape contains 50 MB.
• Wednesday tape contains 50 MB.
• Thursday tape contains 50 MB.
If a complete data loss occurred on Friday, the restore operation would need
• The full backup tape, which was created on Monday
• Every incremental backup tape (Tuesday, Wednesday, and Thursday)
The advantages of the incremental backup are that the size of the backed up data on each tape remains almost the same, and the backup window is small each day. The disadvantage is that each day must be restored separately, thus consuming time.
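Using the variable names above and the simplifying assumptions of this example (a constant 50 MB of changes per day and no growth, so g = 0), the restore size can be written as:

```latex
T_t = B + \sum_{i=1}^{t} A_i
\qquad
T_{3} = 1\,\text{GB} + 3 \times 50\,\text{MB} = 1.15\,\text{GB}
```

The total restored on Friday is therefore only slightly larger than a full backup, but it must be applied in four separate passes (the Monday full plus three incrementals).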
When to choose incremental vs. differential backup Choose incremental backup when • You have a high percentage of daily changes in your data set. • You only have a limited backup window per day. • You can accept a relatively slow restore (which translates to a long downtime). Choose differential backup when • You have a low percentage of daily changes in your data set or the same files changing every day.
• The backup window is not an issue (you have enough time available for the backup).
• You need a relatively fast restore (which translates to a short downtime).
Selecting the appropriate type of backup (incremental or differential) is a trade-off between the time needed for the backup and the time needed for the restore, which depends on the size of data and the daily change rate. The type of backup must be considered when selecting the right tape rotation scheme.
The formula for calculating the data growth rate uses these variables:
• B = initial data size on the disk (on the first day of calculation)
• g = growth rate in % of B per time unit
• t = time unit (days, weeks, months, ...)
• Tt = total data size after t time units

Example
Imagine that you have
• B = 200 MB of data on the disk
• g = 2% daily growth
How much data do you have after one year (t = 365)?
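If g is taken literally as a fixed percentage of the initial size B per time unit (linear growth, an assumption made here for illustration), the example works out as:

```latex
T_t = B \cdot (1 + g \cdot t)
\qquad
T_{365} = 200\,\text{MB} \times (1 + 0.02 \times 365) = 1660\,\text{MB} \approx 1.6\,\text{GB}
```

If the growth compounds daily instead, the result is far larger; either way, even a modest growth rate quickly dominates capacity planning.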
Backup to disk
Backups are traditionally connected with tape drives. Tape drives, despite their advances in speed and capacity, do have disadvantages. One of the biggest disadvantages is their serial nature: to reach a bit of information located at the end of the tape, the drive must first wind through all of the preceding tape medium. This does not pose a problem for large chunks of information, but when you must restore small amounts of data, such as a single mailbox or a file, the need for a different approach becomes obvious. Backup to disk (B2D) is a method of backing up large volumes of data to a disk storage unit. This method presents benefits such as
• Backup to disk is very fast. In fact, the speed of B2D would likely be limited by the host’s performance capabilities rather than the infrastructure or the backup destination speed, since B2D systems can use a dedicated iSCSI or Fibre Channel infrastructure and SSD drives to store the information.
• HPE offers B2D capabilities either through a direct backup to disk devices implemented by Data Protector or by using dedicated B2D storage devices such as HPE StoreOnce, which provide much more functionality than a simple backup to a disk target.
• B2D allows for data deduplication at the source or at the target (a minimal illustration of deduplication follows this list).
• B2D allows for staged backup scenarios, where data is first backed up to a disk and then, after the peak hours, moved to tape. This type of backup is called backup-to-disk-to-tape. Such an environment allows for better speed, since the backup to disk can be almost instantaneous, thus improving the RTO and RPO times.
• B2D allows for data replication to a remote site. This is particularly efficient when combined with deduplication so that only deduplicated data flows through the network.
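The following minimal Python sketch illustrates the idea behind content-based deduplication (fixed-size chunks hashed with SHA-256; the chunk size and the in-memory store are illustrative assumptions, not how StoreOnce is implemented):

```python
# Minimal illustration of content-based deduplication: identical chunks are stored once
# and referenced by their hash. Chunk size and the in-memory store are illustrative only.
import hashlib

CHUNK = 4096
store = {}                      # hash -> chunk data (the deduplicated chunk store)

def dedupe(data: bytes):
    """Split data into chunks, store each unique chunk once, return the chunk references."""
    refs = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)    # stored only if not seen before
        refs.append(key)
    return refs

backup1 = dedupe(b"A" * 8192 + b"B" * 4096)
backup2 = dedupe(b"A" * 8192 + b"C" * 4096)   # shares the "A" chunks with backup1
print(len(store))                              # 3 unique chunks stored instead of 6
```

Only three unique chunks are stored for the six chunks written, which is the effect that makes deduplicated replication to a remote site so efficient.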
Synthetic backup
During an over-the-network backup, data could be moved from many clients to the central backup server and then moved to permanent storage such as a tape library. Moving data around the data center is always expensive, and any operation such as creating a full backup of many clients can overload the network infrastructure. Considering that all backup sets (full, incremental, and differential) end up on a single backup server, it is possible to take the last full backup and append the incremental or differential backups to the full backup set. Once the appending procedure is done, the result is stored to the same backup target as the source data sets and is promoted to the full backup. This methodology is called synthetic backup. The main benefit of synthetic backups over classic full backups is that they do not involve the clients or moving the information over the network. If planned properly and executed at the right backup server at the right time, a synthetic backup can be created almost instantly. Synthetic backups do not provide any advantages over regular full backups when performing the restore procedures.
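A minimal sketch of the idea, assuming the backup sets can be represented as simple file-to-version mappings (illustrative only, not the Data Protector implementation):

```python
# Illustrative only: a synthetic full backup built on the backup server by merging
# the last full backup set with subsequent incremental sets (newest file version wins).
def synthesize_full(last_full, incrementals):
    synthetic = dict(last_full)          # start from the last full backup set
    for inc in incrementals:             # apply incrementals oldest to newest
        synthetic.update(inc)            # changed files replace older versions
    return synthetic                     # promoted to the new "full" backup

full_mon = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
inc_tue  = {"b.txt": "v2"}
inc_wed  = {"c.txt": "v2", "d.txt": "v1"}
print(synthesize_full(full_mon, [inc_tue, inc_wed]))
# {'a.txt': 'v1', 'b.txt': 'v2', 'c.txt': 'v2', 'd.txt': 'v1'}
```

Because the merge happens entirely on the backup server, no client data crosses the network.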
Data Protector virtual full backup HPE Data Protector provides the virtual full backup option as an efficient type of synthetic backup where data is consolidated using pointers instead of being copied. It is performed if all backups (the full backup, incremental backups, and the resulting virtual full backup) are written to a single file library that uses the distributed file medium format. The distributed file medium format is a media format available with the file library, which is a prerequisite for synthetic or virtual full backup functionality.
Working set backup
A working set backup is similar to a differential backup; however, the user specifies a range of time, and only data that has been created or modified since the last full or incremental backup is backed up, plus the data that has been accessed (“touched”) during the specified time frame. Working set backups can speed up the recovery of a crashed server because you only need to restore the working set backup to get the server up and running again and then restore the latest full backup at a later time (if necessary). The advantages of using the working set backup are the following:
• Restoring a system backed up with a working set strategy requires only the medium containing the latest working set backup and the most recent full backup.
• You can perform a working set backup, restore the data to a new system, and have the system up and running faster as compared to restoring a full backup and all incremental or differential backups. • Working set backup takes less time to run than full backups. The disadvantage is that along with all files accessed during the specified time, all files created or modified since the last normal or incremental backup are included on each medium, thus creating redundant working set backups.
Block-level incremental backup
A typical database, such as Oracle or Microsoft SQL Server, consists of a small number of relatively large container files. Inside those files, usually only a small fraction of real user data changes between backups. Thus, a standard incremental backup copies the entire container file even though only a small percentage of the data changed (effectively becoming a full backup, with a resulting high usage of tape space and generated I/Os). With Block Level Incremental Backup (BLIB), only those database blocks that have been changed within the container file are recorded in a changed block map. Backup applications with the BLIB functionality, such as Veritas NetBackup, copy only the changed database blocks based on the information in the changed block map and typically take very little time and use only a small percentage of backup medium storage and I/O bandwidth. A full database restore requires the last full backup of the database and all block-level incremental backup sets since then. Usually, the BLIB functionality is part of the backup application’s database agent. BLIB increases data availability because
• It can reduce or eliminate the need for backup windows.
• It allows for more frequent backup schedules.
• The backup sets contain up-to-date data.
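The following Python sketch illustrates the principle (the block size, the changed block map format, and the function names are assumptions for illustration; real database agents work through the database, not against raw files):

```python
# Illustrative sketch of block-level incremental backup (BLIB): only blocks flagged
# in a changed block map are copied; the full restore then replays every BLIB set.
BLOCK_SIZE = 4096

def blib_backup(container_path, changed_block_map):
    """Copy only the blocks marked as changed since the last backup."""
    backup_set = {}
    with open(container_path, "rb") as f:
        for block_no, changed in enumerate(changed_block_map):
            if changed:
                f.seek(block_no * BLOCK_SIZE)
                backup_set[block_no] = f.read(BLOCK_SIZE)
    return backup_set  # typically a small fraction of the container file

def blib_restore(container_path, full_image, incremental_sets):
    """Full restore = last full image + every block-level incremental since then."""
    with open(container_path, "wb") as f:
        f.write(full_image)
        for backup_set in incremental_sets:          # oldest to newest
            for block_no, data in backup_set.items():
                f.seek(block_no * BLOCK_SIZE)
                f.write(data)
```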
Snapshots (frozen images, off-host backups, or nondisruptive backups)
The need to run backups without stopping the applications led to the development of the frozen image technology, or snapshots. With snapshots, an image of the online data is frozen. A snapshot is a record of blocks of data at a particular instant. The image is backed up while the application continues to update the main data. There are two popular snapshot techniques:
• Breakaway mirrors (clones)
o At least two identical sets of data are maintained on separate disks. Before a backup begins, all database activity is quiesced (paused) to ensure that buffers and cached data are flushed to disk. Then one of the disks (the clone) is split off, remounted in read-only mode, and then backed up. During the backup, the application can update the other disk.
o Breakaway mirrors are reliable, but they require at least two sets of data. They require additional storage and generate the I/O traffic needed to synchronize the disks after backup. To reduce these disadvantages, copy-on-write was developed.
• Copy-on-write (point-in-time)
o To create the snapshot, all access to the data is blocked (the application is quiesced), and the cache is flushed to disk to make the data consistent. All changes to the data are recorded in a changed block map (these actions take only a few seconds at a certain point in time (X), after which the application can resume its operations).
o The snapshot itself is then mounted as a read-only file system. When the application writes the data, the original data being overwritten is first copied to the snapshot and the changed block map is updated. When the backup (e.g., at the time of X + 10 minutes) reads the data, it is delivered from the original location, if it has not been modified, or from the snapshot (at the time X), if the original has been modified. The snapshot ensures point-in-time data integrity.
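A minimal Python sketch of the copy-on-write idea described above (the block lists and names are illustrative; production snapshots operate at the volume or file system level):

```python
# Illustrative copy-on-write snapshot: before the live volume overwrites a block,
# the original block is copied aside; snapshot reads prefer the preserved blocks.
class CowSnapshot:
    def __init__(self, volume):
        self.volume = volume        # the live data (list of blocks), keeps changing
        self.preserved = {}         # original content of blocks overwritten since time X

    def write(self, block_no, data):
        # Copy the original block to the snapshot area before overwriting it.
        if block_no not in self.preserved:
            self.preserved[block_no] = self.volume[block_no]
        self.volume[block_no] = data

    def read_snapshot(self, block_no):
        # Unmodified blocks come from the original location, modified ones from the snapshot.
        return self.preserved.get(block_no, self.volume[block_no])

volume = ["A0", "B0", "C0"]
snap = CowSnapshot(volume)
snap.write(1, "B1")                               # application keeps updating live data
print([snap.read_snapshot(i) for i in range(3)])  # ['A0', 'B0', 'C0'] -> point-in-time view
print(volume)                                     # ['A0', 'B1', 'C0'] -> current data
```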
Backup tape rotation schemes A proper backup strategy should attempt to • Maintain a file history by saving several versions over a period of time • Minimize the backup window by only saving what has changed The solution for these tasks is a tape rotation scheme. The three most popular tape rotation schemes, which are configurable with many backup applications, are discussed in this section. In addition to the standard features, the tape rotation schemes provide older copies of data on tapes, which can be archived offsite in a secure location to provide disaster tolerance.
What are backup tape rotation schemes? A good plan is necessary to ensure that your backups are performed at the appropriate intervals and not only when needed—because data is valuable and difficult to replace. Most companies back up their data on a daily basis when the network is least busy. These backups may need to be performed more or less frequently, based on the criticality of the data. A regular and scheduled backup should address these issues: • Speed of recovery after a disaster occurs • Redundancy and history of the data • Efficient use of tape media (although tapes are relatively inexpensive) Automation helps to increase the overall speed of both backup and restore. The appropriate backup schedule and the data version history can be automated with a tape rotation scheme. The longer a company needs to keep its data, the more portable media (tape cartridges) are needed. The three biggest advantages of tape rotation schemes are the following: • Automation • Archiving • File history Working with tape drives requires planning. Considering a relatively limited capacity and the price of the
tape media compared to the volume of data being backed up, you want to reuse the tape cartridges by rotating them. While using a single external tape drive, it is the administrator’s task to replace the tapes and enforce the rotation order. Modern high-capacity tape libraries implement the tape rotation themselves and the administrator selects the rotation scheme as an option within the backup server settings. The most popular backup tape rotation schemes are the following: • First-In, First-Out (FIFO) • Grandfather–father–son • Tower of Hanoi
First-In, First-Out (FIFO)
Figure 2-4 FIFO
The FIFO (Figure 2.4) backup scheme saves new or modified files on the oldest medium in the backup set (e.g., the medium which contains the oldest and thus least useful previously backed up data). Performing a daily backup onto a set of 14 media, the backup depth would be 14 days. Each day, the oldest medium would be inserted when performing the backup. This is the simplest rotation scheme and is usually the first to come to mind. This scheme has the advantage of retaining the longest possible tail of daily backups. It can be used when the archived data is unimportant (or is retained separately from the short-term backup data) and any data before the rotation period is irrelevant. This scheme suffers from the possibility of data loss: suppose an error is introduced into the data but the problem is not identified until several generations of backups and revisions have taken place. When the error is detected, all backup files contain this error.
Grandfather–father–son
Figure 2-5 The grandfather–father–son
The grandfather–father–son backup (Figure 2.5) refers to a common rotation scheme for the backup media. In this scheme, there are three or more backup sets, such as daily, weekly, and monthly. The daily backups are rotated on a daily basis using the FIFO scheme described earlier. The weekly backups are similarly rotated, but on a weekly basis, and the monthly backups are rotated on a monthly basis. In addition, quarterly, half-yearly, and/or annual backups could also be separately retained. Often, some of these backups are removed from the site for safekeeping and disaster recovery purposes.
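A simple Python sketch of a grandfather–father–son schedule; the choice of the first day of the month for the monthly set and Sunday for the weekly set is an assumption for illustration, since real rotation calendars vary:

```python
# Illustrative grandfather-father-son selector: monthly ("grandfather") on the first
# day of the month, weekly ("father") on Sundays, daily ("son") otherwise.
# The specific days chosen for each set are assumptions; real rotation calendars vary.
import datetime

def gfs_set(day: datetime.date) -> str:
    if day.day == 1:
        return "monthly (grandfather)"
    if day.weekday() == 6:          # Sunday
        return "weekly (father)"
    return "daily (son)"

for offset in range(8):
    d = datetime.date(2016, 5, 1) + datetime.timedelta(days=offset)
    print(d, "->", gfs_set(d))      # 2016-05-01 monthly, 2016-05-08 weekly, rest daily
```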
Tower of Hanoi
Figure 2-6 The Tower of Hanoi rotation method
The Tower of Hanoi rotation method (Figure 2.6) is the most complex rotation scheme to explain and use. It is based on the mathematics of the Tower of Hanoi puzzle, using a recursive method to optimize the backup cycle.
Note There are several videos available on the Internet that visually explain the Tower of Hanoi algorithm. One such video is the 4-minute 11-second video titled The Towers of Hanoi Algorithm at https://www.youtube.com/watch?v=o8iTGJcEO-w.
Figure 2.7 represents the five-tape rotation schedule, using tapes labeled A, B, C, D, and E.
Figure 2-7 Five tape rotation schedule
In this schedule, one tape set A is used every other backup session (daily sessions 1–16 are shown in this example across the top). Start day 1 with tape set A and repeat every other backup session (every other day). The next tape set B starts on the first non-A backup day (day 2) and repeats every fourth backup session (days 6, 10, and 14). Media set C starts on the first non-A or non-B backup day (day 4) and repeats every eighth session (day 12). Media set D starts on the first non-A, non-B, or non-C backup day (day 8) and repeats every sixteenth session (day 24, not shown). Media set E alternates with media set D. The Tower of Hanoi rotation method implies that you keep only one backup per level (row in the above table) and delete all outdated backups. The method allows for efficient data storage by having more backups accumulating toward the present time. If you have four backups, you can recover data as of today, yesterday, half a week ago, or a week ago. With five levels, you can recover data backed up two weeks ago. Every additional backup level doubles the maximum rollback period for your data. This schedule can be used in either a daily or a weekly rotation scheme. The decision regarding the frequency of rotation should be based on the volume of data traffic. To maintain a reasonable history of file versions, a minimum of five tape sets should be used in the weekly rotation schedule, or eight for a daily rotation scheme. As with the grandfather–father–son rotation scheme, tapes should be periodically removed from the rotation for archival purposes.
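The pattern in Figure 2-7 can also be generated programmatically: the tape set used for backup session s corresponds to the number of trailing zero bits in s, capped at the last set. A short Python sketch (illustrative only):

```python
# Illustrative Tower of Hanoi tape selection: the tape set for backup session s is given
# by the number of trailing zero bits in s (capped at the last set), reproducing the
# A, B, A, C, A, B, A, D, ... pattern shown in Figure 2-7 for five tape sets.
def hanoi_tape(session: int, tape_sets: str = "ABCDE") -> str:
    level = (session & -session).bit_length() - 1   # trailing zeros of the session number
    return tape_sets[min(level, len(tape_sets) - 1)]

print("".join(hanoi_tape(s) for s in range(1, 17)))
# ABACABADABACABAE -> A every 2nd session, B every 4th, C every 8th, D on 8, E on 16
```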
Overwrite protection and append periods
Figure 2-8 Overwrite protection period
The overwrite protection period (OPP; Figure 2.8) defines the length of time to retain the data on the medium before it can be overwritten. OPPs provide a foundation for all tape rotation schemes and are defined per medium or per media set. OPP is measured from the last time the data was written to the medium. The append period (APP) defines how long you are prepared to allow appending more jobs to a medium once you have begun writing to it.
Example In a weekly full backup, you use four media or media sets over a period of four weeks (one set for each week). Thus, OPP is set to four weeks for each set. This protects the data from being accidentally overwritten before the period is over. The same principle applies to daily incremental or differential backup media (the protection period is set to one week, for instance). A daily (incremental or differential) backup should be performed each day at the same time. The backup starts at 1:00 and takes around 45 minutes to complete. You want to use the same tape every day and set the backup application to overwrite.
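A minimal Python sketch of the overwrite protection check implied by this example (the specific dates are invented for illustration):

```python
# Illustrative check of the overwrite protection period (OPP): a medium may be
# overwritten only after the OPP, measured from the last write, has expired.
import datetime

def can_overwrite(last_write: datetime.datetime, opp: datetime.timedelta,
                  now: datetime.datetime) -> bool:
    return now >= last_write + opp

last_full = datetime.datetime(2016, 5, 2, 1, 0)    # weekly full backup, written at 1:00
opp = datetime.timedelta(weeks=4)                  # four-week protection, as in the example
print(can_overwrite(last_full, opp, datetime.datetime(2016, 5, 23)))  # False, still protected
print(can_overwrite(last_full, opp, datetime.datetime(2016, 5, 31)))  # True, OPP expired
```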
Media rotation policies
Traditional backup strategies used with older backup applications required a thoroughly planned and well-defined media rotation policy controlled by the administrator rather than by the backup application. Modern backup tools, such as HPE Data Protector, allow you to implement a rotation policy by specifying usage options that govern media selection for subsequent backups. A media rotation policy defines how media are used during the backup. The definition includes answers to these questions:
• How many backup generations are needed?
• Where are the media stored?
• How often are the media used?
• When can the media be overwritten and reused for new backups?
• When do the media become old enough to be replaced?
Media rotation is implemented with these characteristics:
• Because the media are grouped into media pools, you no longer need to manage a single medium. The software automatically tracks and manages every medium in the media pools.
• You do not need to decide to which medium the backed up data is written. You back up to a media pool.
• The software automatically selects the medium from the media pool according to the medium allocation policy and usage options you specified. You can also disable the automatic selection and perform a manual medium selection.
• The medium location is tracked and displayed in the user interface once the medium is configured with the backup software.
• The software tracks the number of overwrites on the medium and the medium age, and thus its condition.
Backup in a cloud
Cloud computing has become a widely accepted form of handling information and services. In the early days of cloud computing, companies had to deal with issues such as network speed, security, cost, and the volume of data hosted in the cloud. Little by little, these early issues have been addressed by infrastructure improvements.
Figure 2-9 Cloud types
Based on the location of data, three models of cloud computing (Figure 2.9) exist:
• Public cloud—runs beyond the company firewall and is usually hosted at a remote location operated by a company specializing in cloud hosting (called the cloud hosting provider). The cloud hosting provider makes the cloud resources, such as applications, compute, or storage, available to general public subscribers over the Internet. These cloud resources may be free or offered on a pay-per-use basis. Security, performance, and availability are the biggest concerns with public clouds.
• Private cloud—contains data and services that are hosted behind a company’s firewall. A private cloud is dedicated to a single organization and although it often offers the power, efficiency, and functionality of a public cloud, it features the security, control, and performance of a dedicated computing environment.
• Hybrid cloud—includes a mix of private and public hosting to get the best features from both the private and public clouds.
The type of the cloud, or the cloud design, affects planning of data backups. With the private cloud, the data is within the data center. With the public cloud, the data is at a third-party site. Many cloud service providers offer some type of remote backup, but the questions of backup availability, location, security, and compliance remain. In hybrid clouds, backups can experience issues and challenges related to both the public and private clouds.
Learning check questions
Reinforce your knowledge and understanding of the topics just covered by completing this learning check:
1. Your customer has these backup requirements:
• Low percentage of daily changes in the data set; some files change every day
• Enough time available for backups
• Fast restore time and short downtime requirement
Which type of backup should you recommend?
a) Full backup
b) Incremental backup
c) Image-based backup
d) Differential backup
2. What are the requirements to fully restore a database which was backed up with BLIB? (Select two.)
a) Last full backup
b) Last block-level incremental backup
c) All block-level incremental backups since the full backup
d) Last block-level differential backup
e) All block-level differential backups since the full backup
3. Which backup types reset the archive bit associated with the operating system files? (Select two.)
a) Differential
b) Full
c) Incremental
d) Copy
e) Image
f) Block-level incremental
4. What does the block-level incremental backup use to determine what to back up?
a) The archive bit
b) The changed block map
c) The file allocation table
d) The database log file
Learning check answers
This section contains answers to this chapter’s learning check questions.
1. Your customer has these backup requirements:
• Low percentage of daily changes in the data set; some files change every day
• Enough time available for backups
• Fast restore time and short downtime requirement
Which type of backup should you recommend?
a) Full backup
b) Incremental backup
c) Image-based backup
d) Differential backup (correct answer)
2. What are the requirements to fully restore a database which was backed up with BLIB? (Select two.)
a) Last full backup (correct answer)
b) Last block-level incremental backup
c) All block-level incremental backups since the full backup (correct answer)
d) Last block-level differential backup
e) All block-level differential backups since the full backup
3. Which backup types reset the archive bit associated with the operating system files? (Select two.)
a) Differential
b) Full (correct answer)
c) Incremental (correct answer)
d) Copy
e) Image
f) Block-level incremental
4. What does the block-level incremental backup use to determine what to back up?
a) The archive bit
b) The changed block map (correct answer)
c) The file allocation table
d) The database log file
What is a restore? This short section explains what a restore is and what possible challenges could be with restore operations.
Restore definition
Figure 2-10 Restore process
Earlier in this chapter, the Dictionary of Computer Terms was used to define a backup. How is a restore defined according to the Dictionary of Computer Terms? “The retrieval of files you backed up is called restoring them.” A restore is a process (Figure 2.10) that recreates the original data from a backup copy. This process consists of the preparation and the actual restoration of the data, as well as any applicable post-restore actions that make this data ready for use. The source of the restore process is the backup copy. A restore application is the software that writes the data to its destination. The destination is usually a disk to which the original data is written. It was already mentioned that without a backup, you cannot restore your data. Both the backup and restore operations must be part of your backup and restore strategy (your recovery plan). Many businesses place little emphasis on their backup operations beyond requiring that they complete successfully and do not impact normal business operations. Although you may be able to get away with achieving these two primary goals during a backup, you must pay enough attention to your restore operations, especially how quickly you can restore your data and return to production.
Possible problems with restore operations When files are deleted after the last backup, they are no longer on the disk, but they are still part of the last backup set on the backup medium. Eventually, the freed disk space could be used for new files, especially if the disk usage was close to its capacity. Unfortunately, with standard backups, there is no mechanism to track deleted files and keep their logs in a
journal. When you want to restore your backup sets (e.g., the last full backup and the five incremental backups created thereafter), you may end up restoring more data than the available storage can hold. In this case, you can get an error message stating that your disk is full. Some backup applications, such as Tivoli, Legato NetWorker, and Veritas NetBackup, have an option to enable logging of deleted files. With this option, which must be enabled during the backup, it is possible to skip files that have been intentionally deleted from the disk while restoring the backup sets.
What is archiving? You may be wondering whether backing up and archiving are the same, or, if they are different, what their differences are. This short section explains just that.
Differences between backing up and archiving
Depending on the goals of your data protection, you can decide to use archiving or backing up. Although sometimes used together in the same context, backup and archiving have different purposes. A backup is used to keep copies of the data for data protection purposes, while archiving is done as a means of data management, keeping the data organized for the long term. In other words, a backup is used for short-term data protection and it might contain multiple instances of the data, while archiving includes arranging and indexing the information in order to preserve the data for a long time. Archives usually contain a single instance of the data. You can delete the original data once it is archived because accessing this information immediately is usually not required any longer. However, in reality, backed up data is usually not deleted and often continues occupying the primary storage.
Data retention and archiving
Minimum record retention requirements are defined by laws, which vary by state and by country. For example, the state of Massachusetts in the USA has these legal requirements.

Table 2-1 Minimum record retention requirements in the state of Massachusetts in the USA

Archiving purpose              Data retention
Business records               7 years to permanent
Contracts, leases              7 years to permanent
Employee and HR records        3 years
Payroll and benefits records   3 to 7 years
Offsite archiving should be an integral part of any backup and restore process and is a critical component of any disaster recovery. A full backup of all important data, such as applications, operating systems, and databases, must be stored offsite in a secure location (ideally, a geographically remote location from the data center). Archiving also fulfills the company’s legal obligations for records retention, as listed above. The tape media must be selected to also meet the archival life requirements:

Table 2-2 The archival life requirements

Tape technology    Archival life
9-track            1–2 years
8 mm               5–10 years
4 mm               8–10 years
3480 cartridge     12–15 years
DLT tape           20–30 years
LTO Ultrium        20–30 years
What is data tiering? There are many types of information companies have to deal with. It is important to understand the difference between these various types of information to properly plan for a backup. Data tiering deals with the concept of information types and arranging them into categories. To illustrate data tiering, consider a single virtual machine running a database service. A typical database configuration running within a virtual machine is shown in Figure 2.11.
Figure 2-11 A typical database configuration running within a virtual machine
This configuration consists of several backup candidates: • Virtual machine OS boot disk, which o Contains the operating system image o Changes only during the system update o Is usually provided by the cloud service provider in the form of an ephemeral (lasting a short time) disk image o Does not need to be backed up if it is provided as an ephemeral image (you may want to keep a copy which does not change when you upgrade the system image) o Is a candidate for data tier 1 • Database configuration, which o Consists of configuration files created during the database installation and configuration o Does need to be backed up only if the configuration changes
o Becomes assigned to data tier 2 • Database installation, which o Consists of binaries for the database installation o Does need to be backed up only if you upgrade the database installation o Becomes assigned to data tier 2 • Database files, which o Are located on the external storage LUN and are presented to the running virtual machine o Contain the database records o Must be backed up at all times o Are assigned to data tier 3 While designing your backup solution, you can keep tier 1 data on a slow backup medium such as a nearline or offline tape, because it is unlikely that you will need to change the ephemeral image. Tier 2 data should be placed on a faster backup medium, such as a virtual tape drive. Tier 3 data should be placed on the fastest backup medium, such as a disk drive.
Backup Basics—Your Backup And Recovery Plan
Figure 2-12 Benjamin Franklin
Benjamin Franklin (Figure 2.12), an American writer, philosopher, scientist, politician, inventor, and publisher, once said: “If you fail to plan, you are planning to fail!” Take this quote and see if you can apply it to your goal of protecting your company’s data assets from a disaster. It is clear that you must anticipate, define, create, implement, and validate your backup and recovery procedures to minimize the most common consequences of data loss-related problems. The physical process of preserving files on different media and in different locations is only a part of the backup and recovery plan. What is most important for enterprises is to minimize or even avoid expensive downtime. To do so, a strategy (a plan) must be created and implemented to use backup and restore as a way to get the systems up and running again in the shortest possible time. It is documented that 93% of companies without a backup and recovery plan go out of business within
five years of a data loss. Fifty percent of those businesses that do not recover their data within ten business days never fully recover. A good backup strategy must consider • The causes and cost of downtime • Answers to these questions: o How much data can you afford to lose at any point in time? o How much time is available for the backup? (This time is called the backup window.) o How much downtime can you tolerate? (This information helps you define your recovery.) A proper recovery plan must also define processes that include the following: • Partial recovery o Single file (e.g., if a user needs to recover an older version of a file and your restore operation should not delete the newer version) o Folder/directory (e.g., if a project moves from one computer to another) o All data associated with a single user (e.g., if a user moves to another department) o All data associated with a user group (e.g., if all project files associated with a specific project team are needed) • Full recovery o By tape (e.g., everything stored on the backup medium/backup set) o By a target system (e.g., everything stored on a server prior to its crash)
Figure 2-13 A sample backup and recovery plan
The results of the effort to preserve the information in your backup depend on your backup and recovery planning (Figure 2.13). At minimum, your backup and recovery plan should include the following:
1. Identifying which data must be saved.
2. Grouping this data into “jobs.”
3. Assigning levels of importance to this data.
4. Calculating the backup frequency.
5. Protecting this data.
6. Testing your backup operations.
7. Maintaining your backup history.
8. Archiving your tapes.
9. Defining and testing your restore procedures.
Note Depending on the situation, the order of these steps can be changed.
Step 1. Identifying which data must be saved
Figure 2-14 Steps 1–3
“A typical company’s data doubles, even triples, every year.” If you don’t believe this statement, which may actually underestimate the data explosion, read a few articles about global data growth and the birth of Big Data (Figure 2.14). One such article is referenced in the note below.
Note BIG DATA UNIVERSE BEGINNING TO EXPLODE (http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode).
With such a data explosion, what must be backed up and preserved? All data that a business relies on should be backed up, including the operating systems, applications, and, most importantly, user data. The entire system must be restorable. Many people consider the operating system not worth backing up because operating system distribution media are available to restore crashed systems. However, the enormous work that was put into installing, configuring, and updating the system is then lost.
Steps 2–3. Grouping this data into “jobs” and assigning levels of importance to these job groups
In steps 2–3, you should decide how to best group the data by considering the level of importance, the volume of the data, the sensitivity of the information, and the recovery requirements. This is where data tiering, covered earlier, plays an important role. Consider grouping the backup jobs according to
• Devices
• Folders/directories
• Media sets
• Partitions
• Workgroups
Step 4. Calculating the backup frequency
Figure 2-15 Steps 4–5
Certain data must be backed up more frequently, compared to other data. Some factors that can help you decide the backup frequency include the following (Figure 2.15):
• Rate at which the data is created or changed in the company
• Acceptable data loss (recovery point)
• Value of the data
• Cost and time to save the data
• Allowed and acceptable downtime to restore the data
It is also necessary to evaluate the type of backup most suitable for your environment and goals. The backup type is determined by the same factors that define the backup frequency. Besides regular full backups of the complete system, certain user groups (e.g., developers or sales consultants) or certain types of data (e.g., sales records, customer databases, software development files, or financial transactions) may need more frequent backups beyond the scheduled full system backups. All these needs must be identified throughout the company to determine the right backup type and frequency. The total size of the backed up data as well as the required backup frequency define the backup performance needed to fit the operation into the backup window. The backup window is the time available for the backup process while the host is taken offline or the user demand upon it is the lowest. This calculation yields a GB/hour value that, together with various other factors influencing performance, such as the tape drive’s typical backup rate, the network speed and bandwidth, and the backup software capabilities, influences the decision about which backup solution to implement.
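For example, with assumed (illustrative) numbers, the required throughput is simply the data volume divided by the backup window:

```latex
\text{required throughput} = \frac{\text{data to back up}}{\text{backup window}}
\qquad
\frac{2\,\text{TB}}{4\,\text{h}} = 500\,\text{GB/h} \approx 139\,\text{MB/s}
```

This value is then compared against the tape drive's sustained rate, the network bandwidth, and the backup software's capabilities.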
Step 5. Protecting this data
There is a distinction between data protection and data backup. In the case of confidential data, exposing the backup to unauthorized access is equal to an online data breach. Therefore, it is necessary to protect the data with
• Password protection (the simplest method)
• Data encryption
• Inventory of volumes
• Physical protection, such as secure and locked storage, multiple copies of the most valuable tapes, and offsite archiving
Since gigabytes of valuable and business-critical company data often reside on just a single tape cartridge, it is necessary to protect this data against hacking. There are several means of doing so, one of which is encryption, which secures the tapes and makes them, and the corporate intelligence they contain, useless to possible thieves. Another is keeping track of all tapes in the company with an inventory, by numbering the media and creating tape lists with their individual locations. Last but not least, secure and locked storage for the cartridges with controlled access provides physical security for the data on the tapes. All of this requires that a good working backup strategy and management system are in place. Offsite archiving can also be used to protect against theft and other breaches.
Step 6. Testing your backup operations
Figure 2-16 Steps 6–7
Before you close a backup job (Figure 2.16) and make the backup procedure a daily or a weekly routine, you must make sure that the data is backed up properly. Depending on the technology used to store the information, many things can go wrong:
• Incomplete data, such as files that are open and in use by users, applications, or the operating system
• Bad or worn-out media
• Environmental influences, such as magnetic fields or heat
• Lost passwords
• Virus attacks
To prevent these problems from occurring at the restore time:
• Investigate, select, and use a verification feature of the backup application
• Test the backup by restoring sensitive or sample data
The way in which verification is performed differs from one backup application to another. These are some possible ways to do it:
• Reduced verification—compares only the first n (number of) files on the tape with the original data on the disk.
• Byte-by-byte verification—causes the backup application to read back the data from the tape and compare it with the original data on disk for each file.
• Cyclic redundancy check (CRC)—causes the backup application to read back the data from the tape, calculate the CRC value of the data, and compare that value with the CRC number stored on the tape while writing the data.
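As an illustration of the CRC approach, the following Python sketch compares a CRC-32 checksum of the original data with one recomputed from the restored (or read-back) copy; it is a generic example, not a feature of any particular backup application:

```python
# Illustrative cyclic redundancy check (CRC) verification: the CRC computed from the
# original data is compared with a CRC recomputed from the data read back after backup.
import zlib

def crc32_of(path: str) -> int:
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(block, crc)
    return crc

def verify(original_path: str, restored_path: str) -> bool:
    """Return True if the read-back copy matches the original's CRC."""
    return crc32_of(original_path) == crc32_of(restored_path)
```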
Step 7. Maintaining your backup history
Enterprise backup solutions are designed to hold the information for the entire enterprise. This translates to a huge volume of information in a single place. After being designed, created, and tested, the backup jobs are then delegated to the automated systems (software or hardware tape libraries, for example). Without knowing what data is backed up and where it is, the backed up information is almost useless. To create and maintain a history of files or tapes, you can implement these procedures: • Suitable tape rotation schemes • Old backup retention • Media overwrite protection and tape expiration dates A frequent and offsite backup alone is not enough to securely protect the data of an enterprise. Keeping old backups and thus not overwriting older tapes to maintain a backup history is another key to getting lost data back. Reasons for doing this are manifold: accidentally deleted files may be discovered weeks later, a virus might have been found in the system after several days, or a decision was made to downgrade to a previous version of the operating system or application after the new one turns out to be insufficient or problematic. It saves time to be able to restore the original environment from a tape. For these and other reasons, it makes sense to save weekly backups for periods up to several months or even years. Backup horizon is the length of time for which you need to save your data. If you need to keep copies of your data for three years, then your backup horizon is three years. The backup horizon is used to define the number of tape media in rotation schemes.
Step 8. Archiving your tapes
Figure 2-17 Steps 8–9
Backup has a limited time validity. This backup characteristic is even embedded in the backup software through the mechanism of data expiration and retention. Backed up information that is prepared to be kept in a safe place for a longer time is called the archive (Figure 2.17).
Your goal is to achieve redundancy—to protect against environmental disasters, theft, human errors, and other factors. Thus, you should implement offsite backup and take your backup tapes to a different location. Once a regular backup procedure is in place in addition to the appropriate backup hardware and software, it is time to consider a strategy to back up the backups to achieve some kind of redundancy. Many events such as natural disasters (fire, water, and earthquakes) or human errors (operator puts in a wrong tape) threaten one set of tapes. To avoid this single point of failure, a second copy of the backup set should always be taken offsite (offsite backup). At the same time, a second backup copy, located in a safe place in a separate location, should always be available if needed. If the offsite backup contains a full or an image backup, it is referred to as the archive.
Step 9. Defining and testing your restore procedures Last but not least, define and test your restore procedures. If you can, simulate a system crash and see if your restore procedures work. If they do, time them and compare them against your initial goals and downtime requirements. Bring into your recovery scenarios other factors that might occur (multiple failure scenarios) and see if you can recover. Keep in mind that your entire backup and recovery plan rests upon your ability to recover your data from a failure.
Tape Media Drives And Technologies
Figure 2-18 Tape media drives
In the past, magnetic tape was the best choice for backing up a computer system. Today, tapes can be replaced by other technologies, but they continue to play an important role in large enterprise systems through tape library technologies (Figure 2.18). Tape drives are best suited for full system backups because they are sequential-access devices. If a hard drive crashes, restoring the data from a tape drive is simple and painless. For recovery of individual documents or files, tape drives are much less suitable, because of their sequential-access nature and because tape backups are often done incrementally or differentially. Recovering an individual file, therefore, involves first determining which backup and/or which tape contains the desired file, and then winding the tape forward to the point where that file resides.
Advantages and disadvantages of tape drives
Tape drives have some advantages over other types of media, such as:
• Reliability—Because tape drives are in use only during a backup or a recovery operation, they tend to be more reliable than hard drives (which spin even when they are not in use).
• Power savings—For the same reason, tape drives also use less power.
• Versatility—Tape cartridges are usually small and can be easily stored offsite, allowing data to survive even if the computer itself is destroyed or stolen.
• Ease of use—There is wide support for tape drives and a good selection of backup software, which makes restoring a computer from tape a reasonably painless procedure.
Tape drives also have disadvantages, such as:
• Expense—Although once considered the most economical backup method per GB of data, tape drives and tape media are now considerably more expensive than hard drives or a network backup.
• Tape degradation—Magnetic media are subject to degradation due to heat, humidity, dust, mishandling, electromagnetic forces, and ordinary wear.
• Uncertainty of data integrity—Unless a full verification of each backup is performed (which takes as much time as the backup itself), you cannot be certain whether your backup is reliable.
• Less suitability for nonfull restores—Tapes are sequential-access devices and are best suited for full system restores. Finding and restoring individual documents can be a long, slow, and cumbersome process.
• Security considerations—Most tape backups are done at night while the machine is unattended, and the previous night’s tape can easily end up with an unauthorized person. As with any removable media, security is a concern both while the machine is unattended and while the tape is being transported.
WORM technologies
Figure 2-19 WORM technologies
The Write Once, Read Many (WORM; Figure 2.19) technology makes a nonerasable copy of data onto a nonrewritable medium. Once written to the medium, the data cannot be modified. WORM drives preceded the invention of CD-R and DVD-R disks. The CD-R and DVD-R optical disks for computers are common WORM devices. Besides the optical media, HPE offers a range of Ultrium-based WORM tape cartridges.
Removable disk backup (RDX)
Figure 2-20 Removable disk backup (RDX)
HPE RDX+ Removable Disk Backup System (Figure 2.20) is a rugged, removable, disk-based backup solution for servers and workstations. It is designed as a backup solution for remote locations or small-to-medium businesses with few or no IT resources, or located in less-than-ideal environments such as busy offices, field base sites, or businesses that require mobility. RDX requires only a USB connection and no separate power source or cord. The entire system can be protected with the hands-free HPE RDX Continuous Data Protection Software. The HPE RDX+ Removable Disk Backup System offers fast disk-based performance with the ability to store 500 GB, 1 TB, or 2 TB of data on a single removable disk cartridge at speeds of up to 360 GB/hr.
Tape libraries and virtual tape libraries
Figure 2-21 Tape libraries and virtual tape libraries
Enterprises deal with large volumes of information that they must back up and recover (Figure 2.21). Traditionally, the problem of handling a large backup capacity is solved through tape libraries. Tape libraries aggregate many tape drives and cartridges while providing the necessary mechanism to physically move, label, and rotate tapes. HPE offers a family of products named StoreEver (Figure 2.22), representing a wide range of tape libraries that integrate in a backup, recovery, and archiving portfolio.
Figure 2-22 HPE StoreEver ESL G3 Tape Library
In addition to the tape-based backup devices and libraries, HPE offers a disk-to-disk backup solution called HPE StoreOnce. These devices are configured in the backup application as Network-Attached Storage (NAS) devices, StoreOnce Catalyst, or Virtual Tape Library (VTL) targets.
Figure 2-23 HPE StoreOnce System portfolio
The total number of backup target devices provided by a StoreOnce System (Figure 2.23) varies according to the model. These devices may be all StoreOnce Catalyst, all VTL, all NAS, or any combination. All StoreOnce devices automatically utilize the StoreOnce deduplication, ensuring efficient and cost-effective use of disk space. Another benefit of StoreOnce Catalyst devices is that deduplication can be configured to occur on the Media Server (low bandwidth) or on the StoreOnce System (high bandwidth), allowing the user to decide which makes the most efficient use of the available bandwidth.
Learning check questions Reinforce your knowledge and understanding of the topics just covered by completing this learning check: 1. Which term describes the length of time you need to save your data after backup? a) Backup horizon b) Backup window c) Data retention interval d) Recovery time objective e) Recovery point objective 2. Which tape technology has the shortest archival life? a) 4 mm b) 8 mm c) 9-Track d) 3480 cartridge e) DLT tape 3. Which technology consists of a rugged, removable, disk-based backup solution that is best suited for regional or branch offices with no IT resources? a) WORM b) VTL c) RDX d) LTO 4. You are creating a backup and recovery plan for your customer. According to HPE recommendations, which step should you complete after you identify which data must be saved? a) Test your backup operations. b) Calculate the backup frequency. c) Define and test your restore procedures. d) Group your data into jobs.
Learning check answers This section contains answers to the learning check questions.
1. Which term describes the length of time you need to save your data after backup? a) Backup horizon b) Backup window c) Data retention interval d) Recovery time objective e) Recovery point objective 2. Which tape technology has the shortest archival life? a) 4 mm b) 8 mm c) 9-Track d) 3480 cartridge e) DLT tape 3. Which technology consists of a rugged, removable, disk-based backup solution that is best suited for regional or branch offices with no IT resources? a) WORM b) VTL c) RDX d) LTO 4. You are creating a backup and recovery plan for your customer. According to HPE recommendations, which step should you complete after you identify which data must be saved? a) Test your backup operations. b) Calculate the backup frequency. c) Define and test your restore procedures. d) Group your data into jobs.
Summary
This chapter focuses on backup strategies and on related topics such as restore, archiving, and data tiering. It also covers a basic backup and recovery plan, which is something that every business ought to develop, test, and implement. Lastly, it takes a look at tape media drives and technologies. The key takeaways from this chapter are the following:
• A backup is the process of creating a copy of your data onto a secondary medium, such as a tape or a disk. You create a backup to prevent data loss due to possible destruction or corruption.
• You back up any and all data you deem critical to your company operation—operating systems and environments, applications, application and user data, and logs and journals.
• Different backup strategies and technologies exist, each providing you with choices for defining and implementing the right backup and recovery plan for your company.
• Restoring your data means recovering your data from a backup after destruction, deletion, or corruption. Archiving means arranging and indexing your data for long-term retention and is different from backing up your data. Data tiering organizes your data into categories, where each category might have a different backup and recovery plan.
Consequently, your backup solution should:
• Have a thoroughly tested and documented backup plan
• Include a backup of your operating system, applications, and user data
• Back up every hard disk in your environment
• Have a thoroughly tested and refined recovery process
• Back up on a daily basis, using either incremental or differential backup
• Use the most appropriate tape rotation scheme
• Duplicate backup tapes and store them offsite on a weekly basis
• Use the data verification feature of your backup software
• Employ tape libraries and/or auto-loaders to ease the administrative overhead
• Plan for growth
3 Topology and Performance
CHAPTER OBJECTIVES
In this chapter, you will learn to:
• List and describe various backup methods and topologies
• Describe Storage Area Networks (SANs) and explain their advantages
• List and explain backup and disaster recovery tiers
• List and describe various HPE backup technologies and solutions
• Explain and position Network-Attached Storage (NAS) backup technologies
• Explain and position tape-as-NAS for archiving
• Perform backup information collection needed for capacity planning
Introduction
Selecting the correct backup topology and solution is critical to a successful backup and recovery strategy. As you would suspect, there are many options to choose from, ranging from how you perform the backup and recovery, to which infrastructure you use, to which technologies, products, and solutions you bring into your environment. Every selection that you make impacts the effectiveness of your backup, the speed of your recovery, and cost. This chapter provides you with the background needed to make the right decisions. It covers different backup methods and topologies and explains their advantages and disadvantages. It covers topics such as SAN, NAS, deduplication, SAN zoning, multipathing, site replication, tape offload, and encryption. It also discusses backup and disaster recovery tiers, which you can use in your planning. HPE provides a number of technologies, products, and solutions designed specifically for protecting enterprise data and recovering from disasters. These products are also overviewed in this chapter. Lastly, the chapter ends with a short exercise of collecting backup information from customers in order to perform capacity planning and recommend the right backup and recovery solution to the customer.
Backup Methods And Topologies
No matter which form of backup and which tape rotation scheme is used in your backup strategy, there are two basic methods of saving data to tape media in an enterprise networked environment:
• Local backup
• Remote backup (also referred to as the client/server backup)
A local backup usually corresponds to a local type of storage connectivity, usually referred to as Directly Attached Storage (DAS). Storage (usually disk or tape) is directly attached by a cable to the computer (server). A disk drive inside a PC or a tape drive attached to a single server are examples of simple types of DAS. I/O requests (also called protocols or commands) access devices directly. Local backup is the fastest way of performing backup. Speed is limited by the speed of the local bus or the backup target. On the other hand, local backup is the least flexible solution, because it works best for the server to which the backup media is attached. As a bridge between local and remote backup, a technology called NAS is used. A NAS device (also called an appliance), usually with an integrated processor and disk storage, is attached to a TCP/IP-based network (Local Area Network [LAN] or Wide Area Network [WAN]) and accessed using specialized file access/file sharing protocols. File requests received by a NAS are translated by the internal processor into device requests. To perform a remote backup, it is necessary to establish a communication channel between the client and the server. Storage resides on a dedicated network—the Storage Area Network, or SAN. Like DAS, I/O requests access devices directly, but the communication passes over the SAN infrastructure. Today, most SANs use Fibre Channel (FC) media, providing a connection for any processor and any storage on that network. Ethernet media using an I/O protocol called Internet Small Computer Systems Interface (iSCSI) emerged in 2001.
Local backup
Figure 3-1 Local backup
Local backup (Figure 3.1) is the traditional method of backing up your data. A backup device, usually a tape drive, is locally (directly) attached to the computer. The backup software is also installed on that computer and performs the backup operations by copying the data through the system from one device to the other. With local (basic) server backup, each server in a networked environment connects to its own backup device using the SCSI protocol. Each server runs its own instance of a backup application, which may be remotely controlled by a management station. However, the storage medium for each server still must be managed locally and manually, which increases complexity and overhead in large network installations. This approach is inefficient because each computer must have its own backup device and possibly its own backup administrator. The speed of the backup, however, may be very high compared to other methods because the backup data travels on dedicated local paths and does not consume LAN bandwidth.
Figure 3-2 Local backup in a networked environment
The main advantage of local backup is speed, which is only limited by local factors such as speed of the devices and the bus/controller. Also, local backup does not consume any LAN bandwidth (Figure 3.2). The main disadvantage is cost. Local backup is relatively expensive because each server needs its own backup application and a backup device as well as a person who manages the media.
Local backup with tape libraries To address user errors and the overall management of local backups, introduction of tape libraries into the environment enables multiple computers to be directly (locally) connected to a tape library instead of individual standalone tape devices.
Figure 3-3 Local backup with tape libraries
The environment consists of a centralized tape library with multiple tape drives, supporting a number of
servers that need to be backed up (Figure 3.3). When using automated tape libraries, there is something to consider—only one system can control the loader robot in the tape library. What happens if the robot or the computer controlling the robot fails? The advantages of local backup with tape libraries are speed (limited only by local factors), no consumption of LAN bandwidth, and no need for individual tape media management. The disadvantages are that the computer controlling the robot, or the robot itself, is a single point of failure, and the price is high compared to local backup without tape libraries. Some of the disadvantages of physical tape libraries are addressed by virtual tape libraries (VTLs). A VTL is a network-based device that runs software simulating the characteristics of a traditional tape library. Instead of saving data on tapes, a VTL uses fast-rotating hard drives. A software middle layer presents the tape library interface over the network to the backup software, so that the backup software can treat the VTL as a physical tape library.
Client/server backup The client/server backup model is also called the centralized server backup.
Figure 3-4 Client/server backup
With this method, all computers and their local disk volumes have a LAN connection to one or more central servers with attached backup devices. The data is pulled over the network from the clients by the server-based backup application and then stored on tape. When a tape library is used as the backup device, the solution is referred to as automated client/server backup (Figure 3.4). This model adds extra capacity and advantages of automation. However, both cases present a single point of failure—if the server or the attached tape device fails, no client can be backed up. The advantages of this solution are consolidation onto one backup device, fewer media management challenges, and centralized management and automation. The disadvantages are a single point of failure (either the backup server or the tape device) and LAN performance degradation.
Push agents Another approach is to have a backup client application (called push agent) on each machine in the network that pushes the data to the server with the attached backup device and the server backup
application. Push agents improve the overall performance of a centralized server backup and optimize the throughput by compressing and prepackaging the data before it is sent to the main backup application on the backup server.
Figure 3-5 Push agent
The push model has some advantages over the pull model (Figure 3.5). With push agents, for instance, only pure data is sent to the server, while in the pull model, the server additionally has to process all metadata to determine which files to save (e.g., reading the metadata from the client and then checking the archive bit).
Note
Metadata (or “data about data”) describes the content, quality, condition, and other characteristics of data. Metadata is used to organize and maintain investments in data, to provide information to data catalogues and clearinghouses, and to aid data transfers.
An additional advantage of the push model is that the client application does not need to know which backup hardware is used at the server and may even have the capability to choose among several backup servers in the network (as part of failover policies, for example), if available. As for data security, each client may use its own encryption algorithm before sending the data to the backup device (in the pull model, one common algorithm is used instead for all clients). The clients may even have more CPU time available to compress the data to save tape capacity and reduce network utilization. A common example of the push agent advantages is the ability to perform deduplication (covered in the next section) at the client side and send only the deduplicated bits over the network. This approach requires powerful clients but greatly reduces the traffic between the client and the backup destination (such as a tape library or VTL).
Deduplication
Figure 3-6 Deduplication
Depending on the client being backed up, the push service could be enabled for deduplication. Deduplication is the process of storing data in an “intelligent” way (Figure 3.6). To illustrate, assume that your push agent performs a full backup once a week. Because a large part of the data has not changed in that time (such as the operating system and general software installation), exactly the same information appears on the backup target repeatedly. Another example is a backup of email servers. A typical email system could contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored, and each subsequent instance is just a reference to the saved copy. In this example, a 100 MB storage demand could be reduced to only 1 MB. Data deduplication offers other benefits. Lower storage requirements reduce disk or tape expenses. Deduplication optimizes disk space and allows for longer retention periods. Longer retention periods allow for better recovery time objectives, greatly reducing the need for tape backups. Finally, data deduplication reduces the data that is sent over a slow network (such as a WAN) and greatly improves remote backup, replication, and disaster recovery capabilities. Data deduplication can generally operate at the file, block, and even bit level. File deduplication eliminates duplicate files but is not an efficient method of deduplication. Block and bit deduplication looks within a file and saves unique iterations of each block or bit. Each chunk of data is processed using a hash algorithm such as MD5 or SHA-1. This process generates a unique number for each piece of data, which is then stored in an index. If a file is updated, only the changed data is saved. That is, if only a few bytes of a document or a presentation are changed, only the changed blocks or bytes are saved (the changes do not constitute an entirely new file). This behavior makes block and bit deduplication far more efficient. However, block and bit deduplication takes more processing power and uses a much larger index to track the individual pieces. Hash collisions are a potential problem with deduplication. When a piece of data receives a hash number, that number is then compared with the index of other existing hash numbers. If that hash number is
already in the index, the piece of data is considered a duplicate and does not need to be stored again. Otherwise, the new hash number is added to the index, and the new data is stored. In rare cases, the hash algorithm might produce the same hash number for two different chunks of data. When a hash collision occurs, the system will not store the new data because it sees that its hash number already exists in the index. This is called a false positive and can result in data loss. Some vendors combine hash algorithms to reduce the possibility of hash collisions. They might also examine the metadata to prevent such collisions. For additional information on data deduplication, see these resources: • For data deduplication explained on Wikipedia, go to: http://en.wikipedia.org/wiki/Data_deduplication. • For a YouTube video on StoreOnce 2nd Generation Deduplication for Enterprise Data Protection, go to: http://www.youtube.com/watch?v=L1RJKRnqquk.
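The following minimal sketch illustrates the hash-index idea described above. It uses fixed-size 4 KB chunks and SHA-1 purely for illustration; it is not how any particular product (including HPE StoreOnce) actually chunks data, and it ignores the hash-collision handling discussed above.

```python
import hashlib

CHUNK_SIZE = 4096          # fixed-size chunks for simplicity; real products often use variable chunking
store = {}                 # hash -> chunk data (the "deduplication store")
ref_count = {}             # hash -> how many times the chunk has been seen

def backup(data: bytes):
    """Return the list of chunk hashes (the "recipe") needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha1(chunk).hexdigest()
        if h not in store:                   # unique data: store it and index its hash
            store[h] = chunk
        ref_count[h] = ref_count.get(h, 0) + 1
        recipe.append(h)
    return recipe

def restore(recipe):
    """Rehydrate the original stream from the stored chunks."""
    return b"".join(store[h] for h in recipe)

recipe = backup(b"A" * 8192 + b"B" * 4096)   # two identical 4 KB chunks plus one unique chunk
print(len(store), "unique chunks stored for", len(recipe), "chunks backed up")
assert restore(recipe) == b"A" * 8192 + b"B" * 4096
```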
Backup encryption
Figure 3-7 Data encryption
Backup encryption is an important aspect of data protection as a method of maintaining data security and integrity (Figure 3.7). Depending on the complexity of the backup solution and the number of components, different types of encryption might be necessary to protect your data. In the case of a simple point-to-point configuration, where the data drive is directly connected to the backup server, encryption can be performed by the backup software or by the tape drive at the hardware level. Often, encryption is combined with data compression and/or deduplication. In the case of more complex topologies, which include backup to disk and remote agents, encryption must be performed in transit (at the SAN or LAN level) as well as on each backup target, such as temporary disks, virtual libraries, or tape drives. The goal of encryption is to protect data integrity by preventing third parties from modifying or accessing the information without authorization.
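As a minimal sketch of software-side encryption of a backup image, the example below assumes the third-party Python "cryptography" package and a hypothetical archive file. It only illustrates the idea; drive-level (LTO hardware) and in-transit encryption are configured in the tape drive or SAN/LAN layer instead, and real backup software keeps keys in a key management system.

```python
# Assumes: pip install cryptography; "backup.tar" is a hypothetical backup image.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, store this key in a key management system
cipher = Fernet(key)

with open("backup.tar", "rb") as f:
    ciphertext = cipher.encrypt(f.read())      # encrypt before the data leaves the server
with open("backup.tar.enc", "wb") as f:
    f.write(ciphertext)

# On restore, the same key rehydrates the original archive.
with open("backup.tar.enc", "rb") as f:
    plaintext = cipher.decrypt(f.read())
```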
Serverless backup
Serverless backup refers to backup operations that do not require data to be routed through a server. Serverless backup requires a network configuration, such as a SAN, in which the storage and backup devices are not tied to specific servers. Intelligence built into the routers or other connecting devices in the SAN queries the servers for backup information. These devices then initiate the movement of that data directly from the storage devices to the backup devices through the SAN. (Paraphrased from: A Dictionary of Storage Networking Terminology; Storage Networking Industry Association) Serverless backup provides a method of transferring data directly from the disk to the tape device with only minimal impact on the application server’s CPU and memory (typically around 1%) (Figure 3.8). This works by having an intelligent FC device use the Extended SCSI COPY command standard, which is included in the current SCSI-3 specification. The Extended SCSI COPY command creates a block-by-block disk image on the tape, allowing the serverless backup to perform image backups and do so with high data throughputs of many terabytes per hour.
Figure 3-8 Serverless backup
The biggest benefit of serverless backup is the reduced volume of data the application server has to process for backup purposes, leaving additional bandwidth to handle normal application traffic. Normally, data would be read from the disk to the server and then from the server to the tape device. With serverless backup, the application server is decoupled from the backup function, and the host only needs to send each SCSI COPY initiator command. Although this reduces the amount of data flowing through the host, the host is still repeatedly sending the initiating COPY commands. Serverless backup requires special SAN equipment, such as the HPE Network Storage Router M2402, which supports up to eight simultaneous streams of serverless backup jobs. To implement serverless backup, HPE Data Protector or equivalent software is necessary.
Storage Area Networks To cope with general network-related performance issues, networks dedicated to storage devices were introduced. A SAN is one such dedicated, high-speed network, similar to a LAN. A SAN connects storage devices directly to servers and clients with similar interconnect technologies used in LANs and WANs,
such as routers, switches, hubs, and gateways. There are local and remote, shared and dedicated SANs, and they include central or external storage elements. SAN interfaces can be ESCON, SCSI, HIPPI, SSA, or FC. SANs can also be used to increase backup performance.
Two basic FC topologies There are two basic FC topologies: • Arbitrated loop (FC-AL) • Switched fabric
Arbitrated loop (AL)
Figure 3-9 Arbitrated loop
AL (Figure 3.9) is a FC topology in which all devices are connected in a one-way loop. Historically, FC-AL was a lower-cost alternative to a switched fabric because it allows servers and storage devices to be connected without using expensive FC switches. In the arbitrated loop topology:
• All components share the same total bandwidth.
• The primary hardware connectivity components are FC hubs.
• Disks and tapes cannot be connected to the same AL.
Note For more information about the FC-AL, go to: http://en.wikipedia.org/wiki/Arbitrated_loop.
Switched fabric
Figure 3-10 Switched fabric
A switched fabric is a FC topology where devices (also called network nodes) connect to each other using FC switches (Figure 3.10). It has the best scalability of all FC topologies but is also more expensive due to the FC switch requirements. A switched fabric:
• Provides dedicated full bandwidth from one FC port to another.
• Requires FC switches.
• Allows disks and tapes to be in the same FC domain.
Note For more information about switched fabric, go to: http://en.wikipedia.org/wiki/Switched_fabric.
Advantages of switched fabrics and SANs
Mixed primary (disk) and secondary (tape) storage devices in the same FC-AL are not supported because disks use random access, whereas tapes use sequential access. The two are very difficult to synchronize, and access to the disk drives would disturb running backup jobs. Another reason for the FC-AL device type limitation is that Microsoft Windows does not provide any native lock mechanism for disks shared between multiple servers. Meeting this goal would require cluster software, which the backup applications provide for the tape devices via the Shared Storage option. Therefore, the backup application is where the lock management function for the backup devices is provided. Advantages of SANs include native support for requirements such as centralized management, scalable capacity and performance, clustering, data sharing, multihosting, and storage consolidation. Using the latest technologies, a FC-based SAN guarantees the highest performance and provides longer connectivity distances than traditional networks, while the storage management is handled in a similar fashion as
before, through dedicated SAN-enabled storage management tools. FC is a high-speed serial link between devices, able to carry a number of protocols such as SCSI and IP. FC is capable of carrying this traffic over long distances between devices (10 km, up to 100 km) and over smaller cable diameters. Thus, the SAN advantages include the following:
• Dedicated storage network, separate from the communication network
• Physical security (with optical FC)
• Improved centralized management
• Superior performance
• Simpler cabling
• Long distances
SAN terminology Even though SANs can be configured in many different ways, such as small local clusters or large geographically dispersed topologies, the same basic components are used: • SAN interfaces and protocols—allow connections to storage devices and enable storage to be shared (clustered). To increase redundancy and performance, multiple channels can be installed (loops with FC). • SAN interconnects—include traditional network components, such as hubs, routers, gateways, and switches, which are used in a SAN. • SAN media—refers to the physical medium for a SAN, which can be optical (fiber) or metallic (copper).
Note Fiber is the term used for optical medium, while Fibre is the term used for the FC protocol. • SAN fabric—includes, in most cases, switched FC or switched SCSI topology, which may be extended across WANs using gateways.
Learner activity Use the Storage Networking Industry Association (SNIA) website, accessible at the following link, to match the terms with their definitions (Figure 3.11). The correct answers are revealed at the end of this chapter.
Note For a good dictionary of SAN terms, go to: http://www.snia.org/education/dictionary.
Figure 3-11 SAN terminology matching activity
Learner activity answers This section contains answers to the learner activities (Figure 3.22).
Figure 3-22 SAN terminology matching activity answers
Learning check questions Reinforce your knowledge and understanding of the topics just covered by completing this learning check: 1. What are the biggest disadvantages of the FC-AL topology? (Select two.) a) Tape devices are not supported. b) Tape and disk devices cannot coexist on the same loop. c) Intelligent SAN switches are required. d) Only optical cables must be used. e) High device chatter and slow device discovery hinder performance. 2. Which statements are true about data deduplication? (Select three.) a) Deduplication can only operate at a block level. b) Hash collisions are a potential problem with deduplication. c) Deduplication requires push agents. d) Deduplication increases retention periods and improves RTO. e) File deduplication is the most efficient method of deduplication. f) Hash collision occurs when the same hash number is generated for different chunks of data.
Learning check answers This section contains answers to the learning check. 1. What are the biggest disadvantages of the FC-AL topology? (Select two.) a) Tape devices are not supported. b) Tape and disk devices cannot coexist on the same loop. c) Intelligent SAN switches are required. d) Only optical cables must be used. e) High device chatter and slow device discovery hinder performance. 2. Which statements are true about data deduplication? (Select three.) a) Deduplication can only operate at a block level. b) Hash collisions are a potential problem with deduplication. c) Deduplication requires push agents. d) Deduplication increases retention periods and improves RTO. e) File deduplication is the most efficient method of deduplication. f) Hash collision occurs when the same hash number is generated for different chunks of data.
Backup And Disaster Recovery Tiers Backup and disaster recovery (DR) are not interchangeable, but disaster recovery is not possible without performing a backup in the first place. DR deals with the ability to get the backed-up systems restored and running as quickly as possible.
Levels/tiers of IT disaster recovery
Figure 3-12 Disaster recovery levels/tiers
In 1992, IBM together with SHARE (an IBM user group) defined seven levels (tiers) of DR solutions (Figure 3.12) which, since then, have been widely accepted in the Business Continuance industry. These levels describe and quantify different options to successfully recover mission-critical computer systems from disasters. The levels/tiers are the following:
• Tier 0: Do nothing, no offsite data
• Tier 1: Offsite vaulting
• Tier 2: Offsite vaulting with hot site
• Tier 3: Electronic vaulting
• Tier 4: Electronic vaulting to hot site
• Tier 5: Two-site, two-phase commit
• Tier 6: Zero data loss
• Tier 7: Highly automated, business integrated solution
While tier 0 is the cheapest solution, it does not incorporate any protection for your data. The higher the level, the more expensive the solution becomes, while at the same time the recovery time is significantly reduced. Tier 6 involves dual identical data centers with all hardware, high-speed and long-distance connections, and copies of software, licenses, and personnel to reduce the recovery time to a few minutes. Backup and restore are important parts of at least the first five levels of the IT DR planning.
Learner activity Use the Internet to populate Table 3.1:
Table 3.1 DR tiers, their attributes, and expected recovery times
DR tier 0 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 1 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 2 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 3 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 4 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 5 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 6 — Attributes (2–3): ____________________ Expected recovery time: ____________
DR tier 7 — Attributes (2–3): ____________________ Expected recovery time: ____________
Write down which source you used to populate Table 3.1: ______________________________________________
Tier 0: Do nothing, no offsite data Tier 0 data center has no requirements for any form of DR such as backup. The site has no backup hardware, no contingency plan, no documentation, and no backed up data. Even if a company does a backup, but leaves the tapes in the backup device in the same computer room, it still resides in this tier. Usually, if a disaster strikes, a company in tier 0 never recovers from the loss of their business data.
Tier 1: Offsite vaulting A company residing in tier 1 does regular backups and moves the backup media to an offsite facility (vault). This already is a form of DR planning, and some recovery requirements have been defined. However, only the data (on tape) is in a safe and offsite location; there is no hardware, such as a compatible backup device, and no site at which to restore the data. The retrieval and vaulting of the data are usually done by couriers; this method is therefore referred to as PTAM (Pickup Truck Access Method) and is relatively simple and inexpensive. Tier 1 = Offsite vaulting + recovery planning A typical recovery time for tier 1 installations is more than a week and depends on when a second site with the necessary hardware can be set up. One week of business data outage can have a permanent impact on the company’s future business.
Tier 2: Offsite vaulting with hot site Companies having a tier 2 DR plan do have a second site in standby, a so-called hot site. This secondary site has enough compatible hardware and infrastructure to support the company’s business requirements, but is shut off. In the event of a disaster, the backups can be taken from the offsite vault to the hot site by couriers and then be restored. A hot site generates higher cost but reduces the recovery time. Tier 2 = Tier 1 + hot site A typical recovery time for tier 2 installations is more than a day and is dependent on the transport time for the data media as well as the time to start up the hot site.
Tier 3: Electronic vaulting Companies having a tier 3 DR plan do have a second and fully operational site which is permanently up and running (but not processing). In tier 3, electronic vaulting is supported for a subset of critical data. This means transmitting business-critical data electronically to the secondary site where backups are automatically created. The permanently running hot site increases the cost even more, but the time for the recovery of business-critical data is significantly reduced. Tier 3 = Tier 2 + electronic vaulting A typical recovery time for tier 3 installations is around one day.
Tier 4: Electronic vaulting with active 2nd site Tier 4 DR planning introduces two fully operational and actively processing sites which can share the workload. Recovery of the data can be bidirectional, which means that either site can be restored from the other. While critical data is continuously transmitted between the two sites and backups of the critical data are taken at both sites, the noncritical data still needs to be taken from the offsite vault in case of a disaster. In this scenario, an active management of the data stored in the hot site is required. Tier 4 = Tier 3 + second site permanently active A typical recovery time for tier 4 installations is up to one day.
Tier 5: Two-site, two-phase commit In tier 5, selected data is additionally maintained in an image status. Updates to both copies of the database in data center A (primary site) and in data center B (secondary site/hot site) are applied within a single-commit scope. A database update request is only considered successful when the data in both sites is successfully updated. This type of database synchronization (called remote two-phase commit) is only possible with high-bandwidth connections between the two sites. Another requirement for tier 5 is dedicated hardware in data center B that is able to automatically take over the workload from data center A. All applications and critical data are present at both sites, thus only data, which is just transferred, is lost during a disaster. Again, the recovery time is significantly reduced. Tier 5 = Tier 4 + maintenance of selected data in an image status A typical recovery time for tier 5 installations is less than 12 hours and could be as low as one to four hours.
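To illustrate the single-commit scope described above, here is a minimal two-phase commit sketch in Python. The Site objects are hypothetical stand-ins for the two data centers; real implementations live inside the database engine, not in application code like this.

```python
class Site:
    """Hypothetical data-center replica participating in a two-phase commit."""
    def __init__(self, name):
        self.name, self.data, self.pending = name, {}, None

    def prepare(self, key, value):
        self.pending = (key, value)      # stage the update; vote "yes" if staging succeeded
        return True

    def commit(self):
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def rollback(self):
        self.pending = None

def two_phase_update(sites, key, value):
    # Phase 1: every site must successfully prepare the update.
    if all(site.prepare(key, value) for site in sites):
        for site in sites:               # Phase 2: commit everywhere
            site.commit()
        return True
    for site in sites:                   # any "no" vote aborts the whole update
        site.rollback()
    return False

primary, secondary = Site("data center A"), Site("data center B")
print(two_phase_update([primary, secondary], "account-42", 1000))  # True only if both sites committed
```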
Tier 6: Zero data loss In this one of the most expensive DR solutions, the coupling or clustering of hardware and applications is required. Long-distance, high-bandwidth connections as well as additional hardware for data replication adds to the cost. On the other hand, this is the fastest method of data recovery in case of a disaster. All data is present at both sites on dual storage and is continuously updated in both directions using full network switching capabilities. In this scenario, the applications consider data lost if a database transaction has commenced, but the request has not been satisfied. Tier 6 = Tier 5 + zero loss of data with automatic and immediate transfer to second site A typical recovery time for tier 6 installations is a few minutes.
Tier 7: Highly automated, business integrated solution Tier 7 solutions include all major components being used for a tier 6 solution with the additional integration of automation. This allows a tier 7 solution to ensure consistency of data above that which is granted by tier 6 solutions. Additionally, recovery of the applications is automated, allowing for restoration of systems and applications much faster and more reliably than would be possible through manual business continuity procedures.
Cost vs. time-to-recover relationship
Figure 3-13 Cost vs. time-to-recover relationship
The described levels (tiers) define the ability to recover data. Key factors are recovery time (how fast you need to recover your data) and recovery point (how much data you are willing to lose). As already explained, a recovery solution is a compromise that you have to accept in terms of data safety and involved cost. A good recovery solution lies at the intersection point, which represents a reasonable balance between the two. For planning purposes, it is vital to know the cost of downtime, because you do not want to spend more money on the solution than the financial loss involved in a disaster; usually, the longer a company cannot process data, the more expensive the outage is (Figure 3.13). To find out about the possible financial loss due to a disaster, a Business Impact Analysis (BIA) needs to be undertaken, and a risk assessment should also take place to identify possible problem areas.
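As a rough illustration of that trade-off, the sketch below compares a few tiers by adding an assumed annualized solution cost to the downtime loss implied by each tier's expected recovery time. Every figure here is an assumption for the example only; a real decision rests on the BIA and risk assessment, not on numbers like these.

```python
# Illustrative only: pick the DR tier that minimizes (solution cost + expected downtime loss)
# for a single expected disaster. All figures are assumptions, not HPE guidance.
cost_per_downtime_hour = 50_000          # taken from a (hypothetical) Business Impact Analysis

tiers = {                                 # tier: (annualized solution cost, expected recovery time in hours)
    "Tier 1: Offsite vaulting":            (20_000, 7 * 24),
    "Tier 3: Electronic vaulting":         (120_000, 24),
    "Tier 5: Two-site, two-phase commit":  (400_000, 4),
    "Tier 6: Zero data loss":              (900_000, 0.2),
}

best = min(tiers.items(),
           key=lambda kv: kv[1][0] + kv[1][1] * cost_per_downtime_hour)
print("Most economical choice for one expected disaster:", best[0])
```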
HPE Backup Technologies And Solutions This section covers the primary HPE backup technologies and solutions such as StoreOnce and StoreEver. Key concepts associated with these technologies and solutions, such as SAN zoning, heterogeneous SAN, and multipath to tape are also covered.
SAN zoning
Zoning can be helpful in larger SANs to simplify device discovery and to reduce chatter between devices. HPE recommends the following guidelines for determining how and when to use zoning (Figure 3.14):
• Use zoning by HBA port. Zoning by HBA port is implemented by creating a specific zone for each server or host by the worldwide port name and adding only those storage elements to be utilized by that host. Zoning by HBA port prevents a server from detecting any other devices or servers on the SAN and simplifies the device discovery process.
• Disk and tape devices on the same HBAs are supported. For larger SAN environments, it is recommended to also add storage-centric zones for disk and backup targets. This type of zoning is done by adding overlapping zones with disk and backup targets separated.
• FC zoning can be implemented using physical switch port numbers, worldwide name IDs, or user-defined switch aliases.
Figure 3-14 SAN zoning with HPE StoreEver/StoreOnce and HPE StoreServ
Heterogeneous SAN with HPE StoreEver, StoreOnce, and StoreServ A heterogeneous network connects different types of operating systems and different types of devices, which all share the same SAN infrastructure. A typical usage of this network is to share tape and storage devices among multiple servers. HPE StoreOnce Backup System is a disk-based storage appliance for backing up networked servers or PCs to target devices on the appliance. These devices are configured as NAS, StoreOnce Catalyst, or VTL targets for the backup applications.
Figure 3-15 Heterogeneous SAN with HPE StoreEver, StoreOnce, and StoreServ
The SAN configuration in Figure 3.15 consists of multiple servers with different operating systems, all connected to a shared storage. Each server can access the storage through a single host bus adapter (HBA). HPE recommends zoning by HBA to reduce the traffic in the SAN and to improve performance. Table 3.2 shows how the data flows within this environment. Sharing a StoreEver Tape Library or a StoreOnce Backup eliminates the need for multiple systems connected to each server, thereby simplifying the data protection management and reducing the total cost of ownership.
Table 3-2 Sample heterogeneous SAN solution and its data flow
Step 1: Data flows from the shared SAN-attached disk array (StoreServ storage) to each SAN-attached backup server via FC.
Step 2: Data flows from each SAN-attached backup server to the SAN-attached HPE StoreEver Tape Library or HPE StoreOnce Backup via FC.
Multipath to tape Advanced Path Failover (APF) uses capabilities in the HPE StoreEver LTO-6 and newer Ultrium tape drives and the HPE StoreEver tape libraries in which they are installed, combined with software drivers running on a host system to provide path failover when multiple paths are available to a tape drive or to a library controller. APF is a licensed feature.
Figure 3-16 APF with HPE StoreEver tape libraries
In Figure 3.16, two servers, designated as “backup server A” and “backup server B,” have two different host interface ports that are connected to two different SANs. Each SAN is connected to the StoreEver tape library. Each server has separate host bus adapter (HBA) ports dedicated to SAN 1 and SAN 2. All drives in the library have two ports, with one port connected to SAN 1 and the other connected to SAN 2. The library in this example has two different drives that are both configured to provide a library control path. Each drive that is configured to provide a library control path connects to the SAN as two devices, a tape drive and a library controller, at two different SCSI logical units.
HPE StoreOnce backup to tape offload
HPE StoreOnce backup to tape offload is a special case of a SAN configuration that leverages high-speed, disk-based backup to achieve smaller backup windows while using tape backup for economical, long-term storage.
Figure 3-17 HPE StoreOnce backup to tape offload
Figure 3.17 illustrates a configuration that consists of one or more servers, a primary storage (HPE StoreServ), a disk-based backup storage (HPE StoreOnce), and a shared tape library (HPE StoreEver). This configuration allows high-speed, deduplicated backups to StoreOnce. A tape-to-tape copy can be performed using the backup application to send rehydrated data from the StoreOnce appliance to the SAN-attached StoreEver tape library for longer retention.
Table 3-3 HPE StoreOnce backup to tape offload and its data flow
Step 1: Data flows from the primary storage (HPE StoreServ) to the SAN-attached server via FC.
Step 2: Data flows from the SAN-attached server to the SAN-attached HPE StoreOnce backup via FC.
Step 3: Data flows from the HPE StoreOnce backup to the SAN-attached server via FC.
Step 4: Data flows from the SAN-attached server to the SAN-attached HPE StoreEver tape storage via FC.
HPE StoreOnce Catalyst remote site replication HPE StoreOnce Catalyst enables source- and target-side deduplication. With facilitation from the data protection software, data can also be moved between sites without rehydration (undeduplication). In the data center, another copy of the data can be made to tape after rehydration, utilizing tape-to-tape copy options in your data protection software (Figure 3.18). Typical use cases for this solution are transferring data from a branch office to the main data center or copying data between sites.
HPE StoreOnce technology is an inline data deduplication process. It uses a hash-based chunking technology, which analyzes incoming backup data in chunks that average 4K in size. The hashing algorithm generates a unique hash value that identifies each chunk and points to its location in the deduplication store. Hash values are stored in an index that is referenced when subsequent backups are performed. When the data generates a hash value that already exists in the index, the data is not stored a second time, but rather a count is increased, showing how many times that hash code has been seen. Unique data generates a new hash code and that is stored on the appliance. Typically, about 2% of every new backup is new data that generates new hash codes. With Virtual Tape Library and NAS shares, deduplication always occurs on the StoreOnce backup system. With Catalyst stores, deduplication can be configured to occur on the media server (recommended) or on the StoreOnce backup system.
Figure 3-18 HPE StoreOnce Catalyst remote site replication
Table 3-4 HPE StoreOnce Catalyst remote site replication and its data flow
Step 1: Data flows from the server to server-side deduplication.
Step 2: Data flows from the server to the HPE StoreOnce backup storage at the branch site via vLAN.
Step 3: Data flows from the HPE StoreOnce backup storage at the branch site to the HPE StoreOnce backup storage at the data center site via WAN (movement across the WAN can be managed by a data protection application).
Step 4: Data flows from the HPE StoreOnce backup storage at the data center site to the SAN-attached HPE StoreEver tape storage via FC (rehydrated data, utilizing the tape-to-tape copy operation in the data protection software).
HPE StoreOnce deduplication impact on backup performance
With VTL and NAS shares, deduplication always occurs at the StoreOnce backup system. With Catalyst stores, deduplication may be configured to occur at the media server (recommended) or at the StoreOnce backup system. The inline nature of the deduplication process means that it is a very processor- and memory-intensive task. HPE StoreOnce appliances have been designed with the appropriate processing power and memory to minimize the backup performance impact of deduplication. The best performance is achieved by configuring a larger number of libraries, shares, and Catalyst stores with multiple backup streams to each device. However, there is always a tradeoff with the overall deduplication ratio:
• If servers with a lot of similar data must be backed up, a higher deduplication ratio can be achieved by backing them up to the same library, NAS share, or Catalyst store, even if this means directing different media servers to the same data type device configured on the StoreOnce appliance.
• If servers contain dissimilar data types, the best deduplication ratio/performance compromise is achieved by grouping the servers with similar data types together into their own dedicated libraries, NAS shares, or Catalyst stores. For example, a backup of a set of Microsoft Exchange servers, Microsoft SQL Server database servers, file servers, and application servers would be best served by creating four virtual libraries, NAS shares, or Catalyst stores, one for each server data type.
The best backup performance to a device configured on a StoreOnce appliance is achieved using somewhat fewer than the maximum number of streams per device (the maximum number of streams varies between models). When restoring data from a deduplicating device, the appliance must reconstruct the original undeduplicated data stream from all the data chunks contained in the deduplication stores. This can result in lower performance compared to the backup process (typically 80% of the backup performance). Restores also typically use only a single stream. Full backup jobs result in higher deduplication ratios and better restore performance. Incremental and differential backups do not deduplicate well.
Nas Backup Technologies A disk array or another storage device, which directly connects to the communication or messaging
network, using protocols such as TCP/IP, is called a NAS. A NAS functions as a server in a client/server relationship and usually has a processor and an operating system.
Hierarchical Storage Management (HSM) With HSM, data is stored on different media types based on the frequency with which that data is accessed. Frequently accessed data is placed on fast devices or RAID arrays, while data with lower demand is stored on slower devices such as optical disks or tapes. All data is always accessible to the users, but the retrieval speed differs, depending on where the data is. HSM is often complex to integrate and manage, but if done properly, it can reduce the cost of storage and give users virtually unlimited storage space. HSM is not a replacement for backup; it is a storage management system. HSM systems must be regularly backed up as well.
Near-online storage With near online storage, high-speed and high-capacity tape drives or tape libraries are used as disk replacements. Data that is not important or too large for disks is stored on tapes in an autoloader system or in a library. The application software must be able to read and write data from or to the tape as if it were a normal disk drive (the software is not a backup application). The tapes are not removed for archiving, but left in the devices for user access. In this scenario, which usually is used with desktop systems and infrequently in client/server environments, the tape drive acts as a slow disk.
Learning check questions Reinforce your knowledge and understanding of the topics just covered by completing this learning check: 1. What benefit does APF provide? a) Deduplication at the data source b) Deduplication at the target c) Multipath failover for tape devices d) Direct application access to tapes 2. Which technology uses high-speed and high-capacity tape drives or tape libraries as replacement for disk drives? a) Near online storage b) SAN c) Deduplication d) Backup to tape offload 3. Which DR tier requires confirmation of data updates at two different sites as part of a single transaction and yields a recovery time of 12 hours or less? a) Tier 3: Electronic vaulting b) Tier 4: Electronic vaulting to hot site
c) Tier 5: Two-site, two-phase commit d) Tier 6: Zero data loss 4. Which DR tiers bring the recovery time down to one hour or less? (Select all that apply.) a) Tier 1: Offsite vaulting b) Tier 2: Offsite vaulting with hot site c) Tier 3: Electronic vaulting d) Tier 4: Electronic vaulting to hot site e) Tier 5: Two-site, two-phase commit f) Tier 6: Zero data loss g) Tier 7: Highly automated, business integrated solution
Learning check answers This section contains answers to the learning check questions. 1. What benefit does APF provide? a) Deduplication at the data source b) Deduplication at the target c) Multipath failover for tape devices d) Direct application access to tapes 2. Which technology uses high-speed and high-capacity tape drives or tape libraries as replacement for disk drives? a) Near online storage b) SAN c) Deduplication d) Backup to tape offload 3. Which DR tier requires confirmation of data updates at two different sites as part of a single transaction and yields a recovery time of 12 hours or less? a) Tier 3: Electronic vaulting b) Tier 4: Electronic vaulting to hot site c) Tier 5: Two-site, two-phase commit d) Tier 6: Zero data loss 4. Which DR tiers bring the recovery time down to one hour or less? (Select all that apply.) a) Tier 1: Offsite vaulting b) Tier 2: Offsite vaulting with hot site c) Tier 3: Electronic vaulting d) Tier 4: Electronic vaulting to hot site e) Tier 5: Two-site, two-phase commit
f) Tier 6: Zero data loss g) Tier 7: Highly automated, business integrated solution
Tape-As-Nas For Archiving One of the biggest challenges with archiving that customers experience is the management of technology updates. This technology update management involves rolling archives from older technologies and media forward to newer technologies and media and doing so before the older technologies become obsolete. Tape-as-NAS (tNAS) simplifies these technology updates and ensures long-term data accessibility.
Challenges with using backup applications for archiving
If you already have a tape-based archive or are using your backup application to archive your data, you may already know that moving from older generation tape drives to newer ones requires application intervention:
• You need to identify the data that must be migrated.
• You must read the data from an old tape and write it to a new tape.
If keeping up with technology updates is not addressed, long-term data management becomes a problem over time as the data continues to grow. If you are using the backup application for archiving, you must keep that application available in case you need to recall the old data. While tape is a great choice for archiving, using a backup application for archiving is not always the best answer. The data is stored in a proprietary format and requires both an administrator and a compatible technology to access it. Using a tNAS solution for archiving simplifies access to the archived data and the future technology updates that are needed for data preservation.
Using tNAS for archiving
Figure 3-19 LTFS volume shown in Windows Explorer
A tNAS solution is accessed via Common Internet File System (CIFS) or Network File System (NFS) mount points, making the data easily accessible using normal file system methods. Such access allows future users to find their information based on user-defined file names and directory structures. A backup administrator or a backup application is no longer needed to retrieve the data. The data is stored on tape in the industry-standard Linear Tape File System (LTFS) format, thereby eliminating the need for any proprietary software for future access (Figure 3.19).
Moving to a tNAS solution
Figure 3-20 tNAS solution based on QStar ASM
Moving to a tNAS solution is simple. All you need to do is make your tNAS mount point the destination for the data as it is read or recovered from the older tape. As the data is read from the older tapes, it is sent to the tNAS cache behind the mount point (QStar ASM cache in Figure 3.20) and then moved automatically to the new tape drive located behind the cache in the LTFS format (HPE StoreEver tape library with new technology drives in Figure 3.20). The tNAS solution from HPE uses standard hardware with the Archive Storage Manager (ASM) software from QStar. It supports LTFS and all HPE tape libraries.
Technology updates with tNAS
When it is time to move your data to a new technology, that too is a simple process, and it is managed within the tNAS solution. For example, if you have data on LTO-5 drives and want to install LTO-7 drives when they become available, you can use a feature called TMT to automate the data migration. TMT is part of the tNAS solution stack, and it optimizes the data transfer between the drives as a background task.
Tiered storage and data lifecycle management with tNAS
Figure 3-21 Tiered storage and data lifecycle management with tNAS
QStar Network Migrator (QNM) is a policy-based tiered storage and data lifecycle manager. QNM software uses advanced policy management to monitor and automatically migrate, copy, and move infrequently used files from primary storage to a tiered storage or a central archive (Figure 3.21). By migrating static or less frequently used files to lower-cost storage such as tape, businesses can optimize the use of primary storage, reducing the need to purchase more and the costs associated with doing so. In addition, when data is managed properly and only the most recent or frequently changing files are included in the backup process, the backup window can be reduced.
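To make the idea of a migration policy concrete, the sketch below moves files that have not been accessed for a configurable number of days from primary storage to an archive mount point. It is an illustration of the general pattern only, not QNM's implementation or interface; real products typically also leave a stub or link behind so users can still find the file.

import shutil
import time
from pathlib import Path

PRIMARY = Path("/data/projects")    # primary (expensive) storage -- assumed path
ARCHIVE = Path("/mnt/archive")      # lower-cost tier, e.g., a tNAS mount -- assumed path
AGE_DAYS = 180                      # illustrative threshold for "infrequently used"

def migrate_cold_files(age_days: int = AGE_DAYS) -> None:
    cutoff = time.time() - age_days * 86400
    for f in PRIMARY.rglob("*"):
        if f.is_file() and f.stat().st_atime < cutoff:
            dst = ARCHIVE / f.relative_to(PRIMARY)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dst))   # frees capacity on the primary tier
            print(f"archived {f} -> {dst}")

if __name__ == "__main__":
    migrate_cold_files()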
tNAS with LTO tape for archiving
HPE StoreEver LTO tape is a great choice for long-term data storage. It is reliable, durable, and cost-effective. Its ability to read back two generations of media makes LTO an attractive option for archiving. With the release of LTO-7, you can store up to 15 TB* of data on a single tape cartridge and access it at speeds of up to 750 MB/s*, over 18 times faster than LTO-1. tNAS is a great solution built on the value of LTO, and it makes ongoing data management and access easier. You still need to plan the migrations and deletions of your old data.
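As a quick sanity check of those figures (both are the usual compressed ratings at 2.5:1; native LTO-7 is 6 TB at 300 MB/s), filling one cartridge at the full compressed rate takes roughly five and a half hours:

capacity_tb = 15          # LTO-7 compressed capacity (2.5:1)
speed_mb_s = 750          # LTO-7 compressed transfer rate

seconds = capacity_tb * 1_000_000 / speed_mb_s    # 1 TB = 1,000,000 MB
print(f"{seconds / 3600:.1f} hours to fill one cartridge")   # ~5.6 hours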
Key benefits of tNAS
The key benefits of tNAS include the following:
• Traditional NAS mount point accessibility
• Easy integration into existing workflows
• No administration required for data reads from and writes to tape
• No manual media management
• Self-allocation of media as needed
• User-created directory structure for custom data management
• Integration with HPE Data Verification for ongoing media health analysis
• Automatic creation of secondary copies for offsite protection
• Monitoring of data life and recycling of tape cartridges
Sample Backup Information Collection for Capacity Planning
This section guides you through a sample exercise of collecting backup information for capacity planning purposes. Write your thoughts, answers, and notes in the provided spaces.
Scenario Your customer asked you to prepare a proposal for a backup solution. You are analyzing the backup performance requirements and planning for the right capacity to meet this customer’s needs. You wrote the following questions in preparation for your meeting with the customer: • How long does it take to back up the customer’s environment? In other words, what is the backup window? • How much data must be backed up? • How many tapes are needed? • How many tape drives are necessary? What other questions must you ask? _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ What information must you collect to answer these questions? _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________
_______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________
Information needed per node
To do your capacity planning, you need to collect the following type of information:
• Bus/adapter(s):
o Maximum bandwidth (MB/s)
o Maximum throughput (IO/s)
o Available bandwidth
o Available throughput
• Storage (per logical drive):
o Total capacity (in GBs)
o Used capacity (for full backups)
o % daily change (for incremental backups)
o % overlap (for differential backups)
o Type of data (for compression calculations)
o Maximum data transfer rate
o Maximum throughput
o Available bandwidth
o Available throughput
• Network connection:
o Maximum bandwidth
o Available bandwidth
o % protocol overhead
Note Maximum is used to indicate the upper limit of a given device, not the theoretical limits of the technology used (e.g., FC-AL has a theoretical limit of approximately 50,000 IO/s; however, a given FC-AL adapter may be capable of less than 50% of this rate). Available is used to indicate the resource capability of a given device not currently in use. An adapter is a physical device that allows one hardware or electronic interface to be adapted (accommodated without a loss of function) to another hardware or electronic interface. The card adapts information that is exchanged between the computer’s microprocessor and the devices that the card supports.
A controller is a device that controls the transfer of data from a computer to a peripheral device and vice versa. For example, disk drives, display screens, keyboards, and printers all require controllers. Is there any other type of information that you believe you need to collect? If so, indicate it below: _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________
Information needed per network device
Information you need to collect for network hubs and switches includes the following:
• Maximum transfer rate
• Maximum throughput
• Available bandwidth
• Available throughput
• Number of ports; then, for each port:
o Maximum transfer rate
o Maximum throughput
o Available bandwidth
o Available throughput
Example
10 ports at 100 Mb/s each yield an aggregate of 125 MB/s. What if the switch throughput is only 50 MB/s?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
Is there any other type of information that you believe you need to collect? If so, indicate it below:
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________
Information needed per backup server
For each backup job, you need to collect answers to the following questions:
• Which clients are being backed up?
• Which files are being backed up?
• What is the data compression rate?
• Which rotation scheme should be used? What are the frequency, protection period, and append rules?
• Which backup type (full, differential, or incremental) should be used and when?
Miscellaneous questions may include the following: • What is the expected data growth rate? • What is the data change rate? Is there any other type of information that you believe you need to collect? If so, indicate it below: _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________ _______________________________________________________________________________
Capacity planning calculations
As part of the capacity planning calculations, you determine
• The data arrival rate at the backup server(s)
• The number of tape drives needed
• The number of tape media needed
• The duration of the backup window
At the same time, you may be able to identify possible bottlenecks in the device chain. If necessary, try different variables and recalculate for different approaches or solutions.
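The arithmetic behind those four results is simple enough to rough out in a few lines. The sketch below uses made-up input values (an 8 TB full backup, a 400 MB/s effective path, LTO-6 media) that you would replace with the figures collected earlier; it is a back-of-the-envelope aid, not a substitute for a proper sizing tool.

import math

def plan_backup(data_gb: float, effective_mb_s: float,
                tape_capacity_gb: float, drive_mb_s: float) -> None:
    # effective_mb_s should be the slowest link in the device chain.
    window_h = data_gb * 1000 / effective_mb_s / 3600
    tapes = math.ceil(data_gb / tape_capacity_gb)
    drives = max(1, math.ceil(effective_mb_s / drive_mb_s))
    print(f"Backup window : {window_h:.1f} h")
    print(f"Tape media    : {tapes} cartridges")
    print(f"Tape drives   : {drives} (to keep up with the data arrival rate)")

# Example: 8 TB full backup, 400 MB/s effective throughput,
# LTO-6 media (2500 GB native, 160 MB/s native per drive).
plan_backup(data_gb=8000, effective_mb_s=400,
            tape_capacity_gb=2500, drive_mb_s=160)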
Summary
The basic backup methods are local backup and remote backup. Local backup uses backup devices directly attached to the host you want to back up. It is typically the fastest method of backing up data, but it is also the least flexible option and often an expensive one. Local backup can use advanced technologies such as tape libraries to automate management and share the cost among more devices. Remote backup uses shared devices, such as tape libraries, autoloaders, and VTLs, and copies the data over a network. Deduplication plays an important role with remote backups, as it lowers storage requirements, optimizes disk space, and allows for longer disk retention periods and better recovery time objectives.
SAN is a high-speed network that connects storage devices directly to servers and clients. SAN is used for storage access and for remote backup purposes. SAN topologies include FC-AL and switched fabric. Switched fabric is the better topology because it supports multiple device types, provides dedicated full bandwidth, and supports requirements such as centralized management, scalable capacity and performance, clustering, data sharing, multihosting, and storage consolidation.
Advantages of FC and SANs are the following:
• Dedicated storage network, separate from the communication network
• Physical security (with optical FC)
• Improved centralized management
• Superior performance
• Simpler cabling
• Long distances
Different technologies also exist to further optimize or complement backup and restore. These technologies include serverless backup, multipath to tape, backup to tape offload, remote site replication, SAN zoning, HSM, and near online storage.
DR deals with the ability of data centers to become fully operational again after a disaster. DR levels, or tiers, range from 0 (the lowest) to 7 (the highest). As you move up each level, your ability to recover from a disaster improves, but at a higher expense. Therefore, before deciding on the right DR tier, a customer should perform a cost versus time-to-recover analysis and a business impact analysis.
Backup encryption maintains data security and integrity by protecting the data against unauthorized access and alterations.
tNAS is a solution that provides an easy way to access tape as if it were NAS storage. You mount the tNAS volume as a traditional NAS mount point, which gives you simple drag-and-drop access to your files. Thus, tNAS simplifies access to archived data and helps companies move to future tape technologies.
4 Backup and Archive Configurations
CHAPTER OBJECTIVES
In this chapter, you will learn to:
• List, describe, and position the following configurations and solutions:
o Simple backup, such as point-to-point, HPE StoreEver, and HPE StoreOpen
o SAN-based backup, such as connected independent fabrics, HPE StoreEver tape library partitioning, and extended SAN
o Remote backup, such as HPE StoreServ Remote Copy software, HPE StoreOnce Catalyst remote site replication, and HPE StoreServ File Persona
o Virtualization-related backup, such as HPE StoreOnce Recovery Manager Central for VMware, Veeam backup and restore, and VMware vStorage API for Data Protection (VADP)
• Describe the HPE Enterprise Secure Key Manager
• Explain how tiered data retention works with HPE StoreServ
• Demonstrate how to effectively use the HPE Data Agile BURA Compatibility Matrix
Introduction
Familiarity with basic concepts is essential, but it is not sufficient for effectively designing BURA solutions. This chapter provides sample configurations of HPE backup and archiving solutions so that you understand which components are used in each configuration, what their function is, and how they are connected. Some of the more complex designs are assembled from many components; this chapter provides an enumerated list of steps to illustrate the data flow within these designs. The examples covered in this chapter include direct-attached, basic SAN, and extended SAN designs. All hardware items shown in these examples are listed in the HPE Data Agile BURA Compatibility Matrix. This compatibility matrix provides information for designing data protection solutions, including backup, restore, and archiving using HPE StoreOnce (disk), HPE StoreEver (tape), and HPE StoreAll (NAS) storage products.
Note
For the HPE Data Agile BURA Compatibility Matrix, go to:
http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=412183&docId=emr_na-c04616269&docLocale=en_US
Simple Backup Configurations Simple backup configurations include point-to-point solutions and direct copy of files to the tape media using the Linear Tape File System (LTFS) software.
Point-to-point configuration
Typical usage: Local and network client backup.
The point-to-point configuration is the simplest method of connecting the backup server with the backup device. Although the configuration is simple, it can provide enterprise-grade performance and capacity. The examples shown in Figure 4.1 use these HPE technologies:
• HPE StoreEver—an enterprise-grade tape library system, built for capacity, performance, and reliability. HPE StoreEver can scale up to over 12,000 tape cartridges in increments of 100 slots for capacity on demand. Incorporating between 1 and 192 drives, you can consolidate and store up to 75 PB (with data compression of 2.5:1) of enterprise data.
• HPE StoreOpen—HPE StoreOpen with LTFS is a tape-based file system for HPE LTO tapes. HPE StoreOpen Standalone is a free application that helps LTFS users use and manage individual HPE tape drives. Available for Mac and Windows environments, it completely removes the need for low-level terminal commands by guiding you through the full process of selecting, preparing, and mounting LTFS cartridges. It essentially makes the tape appear and work like a regular disk volume, making it easy to search and work with backed-up files.
In Figure 4.1, the backup servers transfer data to the HPE StoreEver tape drive or the HPE StoreEver tape library. The servers use backup software or HPE StoreOpen with LTFS to manage LTFS functionality on the HPE StoreEver LTO-5 and newer Ultrium tape drives. External tape drives use a SAS interface. Drives contained in the HPE StoreEver tape library use a Fibre Channel (FC) interface. Clients on the LAN transfer data first to the backup server, which then sends the data to the tape storage. This solution provides data protection for a large number of clients through a dedicated backup server.
Figure 4-1 Point-to-point connectivity and data flow with HPE StoreEver
Table 4.1 explains how the data flows within this point-to-point solution.
Table 4-1 Point-to-point solution with HPE StoreEver data flow
Step number | Data flows from… | Data flows to…
1 | Backup server | HPE StoreEver tape storage device via FC or SAS
HPE StoreOpen with LTFS software
Typical usage: Data transfer between a workstation/host and a standalone HPE StoreEver tape drive or HPE StoreEver tape library for backup, or data transport between LTFS solutions.
There are two ways of accessing information on the backup media:
• Sequential
• Random
With sequential access, each read operation requires that you start reading from the beginning, regardless of where your data is. For a tape drive, that means you have to rewind the tape or keep a record of the current position while handling the tape. Sequential access is suitable for legacy backup technologies such as the UNIX tar command, but it is not user-friendly. Random access allows you to access the information directly, without reading sequentially from the beginning of the media. This is the common access method for optical (DVD) and magnetic (HDD) devices.
To bridge the gap between sequential and random access to tapes, HPE offers LTFS and StoreOpen:
• HPE StoreOpen is a set of free software utilities that provides disk-like access to HPE StoreEver tape products using LTFS.
• HPE StoreOpen Standalone mounts a tape drive to the host like a disk or a thumb drive, giving you access to the data on the tape cartridge just as if it were on a disk drive or a thumb drive.
• HPE StoreOpen Automation mounts a tape library to a host and presents each tape cartridge in the library as a file folder behind a mount letter. Each file folder is given the name of the barcode label of the tape media and can be accessed like any other folder in the operating system. The user can navigate the directory of an LTO cartridge even if it is not loaded in an LTO tape drive. A cartridge is loaded automatically if a data read or write operation is needed.
HPE StoreOpen reads and writes in the LTFS format, making it a great complementary product to other tape solutions that use LTFS, such as the HPE Tape-as-NAS solution, and allowing for data exchange between solutions. Figure 4-2 displays the StoreOpen Standalone operating system view and Figure 4.3 displays the StoreOpen Automation operating system view.
Figure 4-2 StoreOpen Standalone operating system view of an LTFS volume
Figure 4-3 StoreOpen Automation operating system view of an LTFS volume
Figure 4-4 StoreOpen with LTFS connectivity and data flow
Table 4.2 explains the StoreOpen Standalone data flow.
Table 4-2 StoreOpen Standalone data flow
Step number | Data flows from… | Data flows to…
1 | Host/workstation | HPE StoreEver Standalone LTO tape drive via directly connected SAS
Table 4.3 explains the StoreOpen Automation data flow.
Table 4-3 StoreOpen Automation data flow
Step number | Data flows from… | Data flows to…
1 | Host/workstation | HPE StoreEver tape library via directly connected SAS or via FC
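Because StoreOpen presents an LTFS cartridge as an ordinary mounted volume, any file-level tool or script can read and write it. A minimal sketch, assuming the cartridge is mounted at /mnt/ltfs (on Windows it would simply appear behind a drive letter such as T:):

import shutil
from pathlib import Path

LTFS_VOLUME = Path("/mnt/ltfs")     # assumed StoreOpen/LTFS mount point

# Write: copy a project folder onto the cartridge like any other disk.
shutil.copytree("/data/project_x", LTFS_VOLUME / "project_x", dirs_exist_ok=True)

# Read: browse the cartridge with normal directory operations.
for f in sorted((LTFS_VOLUME / "project_x").rglob("*")):
    print(f)

Keep in mind that the medium is still tape underneath, so large sequential reads and writes perform well, while many small random updates do not.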
SAN-Based Backup Configurations
SAN-based backup configurations typically utilize connected independent fabrics, tape library partitioning, and/or extended SAN. These are explained in this section.
Connected independent fabrics
Typical usage: Connecting multiple SANs. Figure 4.5 illustrates a configuration that consists of independent fabrics, or SAN islands, utilizing either a multiprotocol router (MPR) or Fibre Channel routing (FCR) between switches. Multiple backup servers connect to their local switch within their independent fabric. Through MPR or via licensing for FC routing, each server can connect to the shared tape library as if it were in the same independent fabric. This solution enables the interconnection of devices between SAN fabrics without merging those fabrics, thereby providing a more secure, flexible storage networking foundation.
Figure 4-5 Connected independent fabrics
HPE StoreEver tape library partitioning
Typical usage: Segregation of different generations of LTO media, or when multiple departments or data protection applications need to share a single tape library.
Reliance on simple tape devices is not an option for enterprises. Instead, they use tape libraries. StoreEver is the common name for the different tape library devices built by HPE. Tape libraries have a large capacity and can contain multiple tape drives to allow parallel access. To provide parallel access to different clients at the same time, tape libraries can be partitioned.
Figure 4.6 illustrates a configuration consisting of two independent backup applications or departments accessing a single tape library. Utilizing the partitioning feature in the tape library management console and HPE Command View for Tape Libraries, each backup application or department is presented with a logical library comprised of a subset of drives and slots from the physical library. HPE Command View for Tape Libraries is single-pane-of-glass management software that manages, monitors, and configures all HPE tape libraries through a single console. Tape library partitioning increases flexibility in the data center and lowers the total cost of ownership (TCO).
Figure 4-6 HPE StoreEver tape library partitioning
Extended SAN
Typical usage: Offsite backup and disaster recovery.
Similar to LANs and WANs, SANs can also span long distances. An extended SAN leverages transport technologies and design solutions to achieve long-distance, high-performance communication over FC. Figure 4.7 illustrates a shared HPE StoreEver tape library and HPE StoreOnce Virtual Tape Library (VTL) that use SAN extension technologies to facilitate connectivity between remote sites. Data is read from the disk storage array located at site A and written to the StoreEver tape library or the StoreOnce VTL located at site B over the extended network link. While Fibre Channel over IP (FCIP) and Wave Division Multiplexing (WDM) allow connectivity over very long distances, long-wave small form-factor pluggable (SFP) transceivers can accommodate distances of 10–35 km. This solution illustrates a variety of SAN extension technologies, providing offsite connectivity for remote backup and disaster recovery (DR).
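Distance matters because every FC exchange has to cross the link and come back. Light travels through fiber at roughly 5 µs per km, so a rough estimate of the added round-trip latency (ignoring switch and protocol overhead) is:

def round_trip_latency_ms(distance_km: float, us_per_km: float = 5.0) -> float:
    # Two traversals of the link per exchange, converted to milliseconds.
    return 2 * distance_km * us_per_km / 1000.0

for km in (10, 35, 100):
    print(f"{km:>4} km : ~{round_trip_latency_ms(km):.2f} ms added per round trip")

In practice, long FC links also need enough buffer-to-buffer credits on the switches to keep the link fully utilized at these latencies.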
Figure 4-7 Extended SAN connectivity and data flow
Table 4.4 explains the data flow within the extended SAN.
Table 4-4 Extended SAN data flow
Step number | Data flows from… | Data flows to…
1 | Each disk array | Each SAN-attached server via FC
2 | Each SAN-attached server | SAN-attached HPE StoreEver tape library or HPE StoreOnce VTL via FC
Learning check questions
Reinforce your knowledge and understanding of the topics just covered by completing this learning check:
1. In which tape cartridge slot increments must you scale up the HPE StoreEver tape library system?
a) 50
b) 100
c) 150
d) 200
2. Which HPE software enables partitioning of StoreEver tape libraries?
a) StoreOpen Enterprise
b) StoreOpen Automation
c) Command View for Tape Libraries
d) Recovery Manager Central for Virtual Tape Libraries
Learning check answers
This section contains answers to the learning check questions.
1. In which tape cartridge slot increments must you scale up the HPE StoreEver tape library system?
a) 50
b) 100 (correct)
c) 150
d) 200
2. Which HPE software enables partitioning of StoreEver tape libraries?
a) StoreOpen Enterprise
b) StoreOpen Automation
c) Command View for Tape Libraries (correct)
d) Recovery Manager Central for Virtual Tape Libraries
Remote Backup Configurations
Remote backup configurations discussed in this section include the following:
• HPE StoreServ Remote Copy software
• HPE StoreOnce Catalyst remote site replication
• HPE StoreAll backup to HPE StoreEver tape storage with Symantec NetBackup
• HPE StoreServ File Persona Share Backup
• HPE StoreServ File Persona software NDMP backup
• HPE StoreOnce VSA and enterprise remote office/branch office (ROBO)
HPE StoreServ Remote Copy software
Typical usage: Disaster tolerance, business continuity, and long-distance replication. The HPE StoreServ Remote Copy software enables multisite and multimode replication with both midrange and high-end arrays. Remote-copy configurations are based on the relationship between a pair
of storage systems, known as the remote-copy pair. Within a remote-copy pair, the primary storage system A is the system holding the volumes that are copied onto the secondary storage system B. Figure 4.8 illustrates an asynchronous remote copy operation between arrays at different sites with a tape backup at site B. By creating a snapshot of data at site B and presenting this data to a server at site B, this data can be backed up to a library at site B, thereby creating a remote backup of data from site A.
Figure 4-8 HPE StoreServ Remote Copy software connectivity and data flow
Table 4.5 explains the HPE StoreServ Remote Copy data flow.
Table 4-5 HPE StoreServ Remote Copy data flow
Step number | Data flows from… | Data flows to…
1 | HPE StoreServ storage at site A | HPE StoreServ storage at site B via a WAN
2 | Snapshot of replicated LUNs | Takes place within the array at site B
3 | HPE StoreServ storage at site B | SAN-attached DR server(s) via FC
4 | DR server(s) | SAN-attached HPE StoreEver tape library via FC
HPE StoreOnce Catalyst remote site replication
Typical usage: Transferring data from a branch office to the main data center or copying data between sites.
While planning backup, archiving, and DR strategies, companies have to take care of backup media handling, physical storage, and security. Traditionally, copies of the tapes have been physically moved between the primary and secondary sites. However, this is slow, expensive, and insecure. To address the speed, security, and cost issues, companies started to leverage LAN and WAN infrastructures. It is easy and relatively cheap to build a LAN infrastructure that can move your backup data over the network; however, as the volume of data grows and the number of backup locations multiplies, it becomes expensive to maintain the links between the remote sites.
HPE offers the StoreOnce backup system to address the performance, security, and complexity of the backup and archival process. The HPE StoreOnce backup system is a disk-based storage appliance for backing up network media servers or PCs to target devices on the appliance. These devices are configured as NAS shares, Catalyst stores, or VTL targets for the backup applications. HPE StoreOnce Catalyst allows source- and target-side deduplication. With facilitation from the data protection software, data can also be moved between sites without rehydration. In the data center, another copy of the data can be made to tape after rehydration, utilizing the tape-to-tape copy options in your data protection software.
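Conceptually, source-side deduplication hashes chunks of the backup stream and sends only the chunks the target store has not seen before, which is what keeps the network traffic small. The sketch below illustrates the idea only; it is not the StoreOnce Catalyst protocol or API, and real implementations use variable-length chunking and persistent indexes.

import hashlib

CHUNK_SIZE = 4 * 1024 * 1024        # illustrative fixed-size chunking

def backup(path: str, target_store: dict) -> None:
    # Send only chunks whose fingerprints the target does not already hold.
    sent = skipped = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            fp = hashlib.sha256(chunk).hexdigest()
            if fp in target_store:
                skipped += 1                 # deduplicated: only a reference is needed
            else:
                target_store[fp] = chunk     # unique data crosses the LAN/WAN link
                sent += 1
    print(f"sent {sent} unique chunks, deduplicated {skipped}")

if __name__ == "__main__":
    store = {}                               # stands in for the target Catalyst store
    backup("/var/backups/fileserver.img", store)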
Note HPE StoreOnce Catalyst remote site replication was covered in more detail in Chapter 3.
HPE StoreAll backup to HPE StoreEver tape storage with Symantec NetBackup
Typical usage: HPE StoreAll Gateway NAS data protection. Figure 4.9 illustrates a Network Data Management Protocol (NDMP) configuration for a three-way NDMP backup of the StoreAll Gateway to a StoreEver tape library. NDMP is an open protocol used to control data backup and recovery communications between the primary and secondary storage in a heterogeneous network environment. The configuration consists of two StoreAll Gateway clusters with HPE StoreServ storage attached, a media or master server, and a StoreEver tape library. Cluster 1 acts as the NDMP Host Data Mover or Data Service Provider (DSP). The NetBackup for NDMP master or media server acts as the data management application (DMA) server and manages the robotics. During the backup session, StoreAll Gateway cluster 1 acts as the Tape Data Server and StoreAll Gateway cluster 2 acts as the NDMP Data Server.
Figure 4-9 HPE StoreAll backup to HPE StoreEver Tape Library with Symantec NetBackup connectivity and data flow
Table 4.6 explains the data flow within this environment.
Table 4-6 StoreAll backup to StoreEver tape storage with Symantec NetBackup data flow
Step number | Data flows from… | Data flows to…
1 | HPE StoreAll Gateway cluster 2 (via NDMP) | HPE StoreAll Gateway cluster 1 (via LAN)
2 | SAN-attached HPE StoreAll Gateway cluster 1 | SAN-attached HPE StoreEver tape library via FC
Figure 4.10 illustrates the data flow for a standard tape backup scenario for the HPE StoreAll Gateway, where the backup software is installed directly on the Gateway servers. The StoreAll cluster and StoreEver tape library are both SAN-attached.
Figure 4-10 Standard HPE StoreAll backup to HPE StoreEver Tape Library with Symantec NetBackup connectivity and data flow
Table 4.7 explains the data flow for a standard tape backup for StoreAll Gateway when the backup software is installed on the Gateway servers.
Table 4-7 A standard tape backup data flow for StoreAll Gateway with backup software on Gateway servers
Step number | Data flows from… | Data flows to…
1 | SAN-attached Gateway cluster | SAN-attached StoreEver tape library via FC
HPE StoreServ File Persona Share Backup
Typical usage: NAS file storage. Figure 4.11 shows a Share Backup with the HPE StoreServ File Persona software. The HPE StoreServ File Persona software suite is a licensed feature of the HPE StoreServ operating system and enables a
rich set of file protocols and core file data services for client workloads. File Persona bridges the gap between the native block storage StoreServ capacity and common file system-based storage. Figure 4.11 illustrates an Ethernet LAN for reading and transferring data from HPE StoreServ 7400c, running the HPE StoreServ File Persona services, to the media server. The media server then uses the appropriate protocol to read the data from the File Persona and writes it to the target. The target can be an HPE StoreEver tape library using FC, an HPE StoreOnce appliance using FC/SMB or NFS (depending on the StoreOnce model), or an HPE StoreAll appliance using the SMB or NFS protocol.
Figure 4-11 HPE StoreServ File Persona Share Backup connectivity and data flow
Note
You can use any model from the HPE StoreServ converged controller family in place of the HPE StoreServ 7400c.
Table 4.8 explains the data flow within the HPE StoreServ File Persona Share Backup solution.
Table 4-8 HPE StoreServ File Persona Share Backup data flow
Step number | Data flows from… | Data flows to…
1 | HPE StoreServ 7400c running File Persona (Ethernet) | Media server via a LAN
2 | Media server | SAN-attached HPE StoreEver tape library via FC, HPE StoreOnce storage via FC/LAN, or HPE StoreAll storage via LAN
HPE StoreServ File Persona software NDMP backup
Typical usage: NAS file storage.
Figure 4.12 illustrates an NDMP backup between the File Persona and the HPE StoreOnce/iSCSI library via the Gigabit Ethernet NIC. The NDMP protocol transports the data between the NAS devices and the backup devices. This removes the need to transport the data through the backup server itself, thus enhancing speed and removing load from the backup server.
Figure 4.12 shows the HPE StoreServ 7400c solution, running the HPE StoreServ File Persona services, acting as the NDMP data mover, and the ISV NDMP agent server acting as the data management application (DMA). The DMA controls and maintains the overall backup schedule for the File Persona systems and arranges the activities between them and the iSCSI library when the backup session starts. The DMA is then released from the session, and once the session is completed, the metadata is sent to the DMA for review. The independent software vendor (ISV) NDMP agent controls the metadata and the NDMP flow control to the NDMP server, from where the data can then be sent to the iSCSI library for backup.
Figure 4-12 HPE StoreServ File Persona NDMP backup connectivity and data flow
The ISV backup software used for the NDMP backup must support NDMP. You can use any model from the HPE StoreServ converged controller family in place of the HPE StoreServ 7400c solution.
Table 4.9 explains the data flow for the HPE StoreServ File Persona software NDMP backup.
Table 4-9 Data flow for the HPE StoreServ File Persona software NDMP backup
Step number | Data flows from… | Data flows to…
1 | HPE StoreServ 7400c running HPE StoreServ File Persona services (Gigabit Ethernet) | iSCSI library via LAN
HPE StoreOnce VSA and enterprise remote office/branch office (ROBO)
Typical usage: Managing local backup and offsite backup copies. HPE StoreOnce VSA is a virtual appliance running the HPE StoreOnce software. HPE StoreOnce VSA is optimal for centrally managed enterprise remote office or branch office (ROBO) locations with local backup and offsite backup copies. Branch offices can send backup data to a local HPE StoreOnce VSA target with data deduplication-optimized copy over a WAN or to a remote HPE StoreOnce backup appliance located at the data center and DR site. Figure 4.13 illustrates the HPE StoreOnce VSA and enterprise ROBO configuration and its data flow.
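A quick back-of-the-envelope comparison shows why deduplication-optimized copy matters over a WAN; the daily change, deduplication ratio, and link speed below are illustrative assumptions only:

daily_change_gb = 200      # data changed at the branch office per day (assumed)
dedup_ratio = 20           # assumed reduction for repetitive backup data
wan_mbit_s = 100           # branch-to-data-center link (assumed)

def copy_hours(gb: float) -> float:
    return gb * 8 * 1000 / wan_mbit_s / 3600    # GB -> megabits, then hours

print(f"raw copy    : {copy_hours(daily_change_gb):.1f} h")
print(f"deduplicated: {copy_hours(daily_change_gb / dedup_ratio):.1f} h")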
Figure 4-13 HPE StoreOnce VSA and centrally managed ROBO connectivity and data flow
Virtualization-Related Backup Configurations
These days, virtualization is widely accepted in enterprise environments. Server-based virtualization enables consolidation of multiple physical servers onto fewer ones running virtual machines (VMs) to reduce cost and increase utilization. Virtualization-related backup configurations discussed in this section include the following:
• HPE StoreOnce Recovery Manager Central for VMware
• VMware vStorage API for Data Protection (VADP)
• Veeam backup and restore to HPE StoreEver tape storage with HPE StoreServ
• Veeam backup to HPE StoreOnce with tape offload
HPE StoreOnce Recovery Manager Central for VMware
Typical usage: Protect VMware VM disks and datastores. HPE StoreOnce Recovery Manager Central (RMC) provides native integration of HPE StoreServ and HPE StoreOnce in a converged data protection solution that removes the dependency on traditional backup software. Figure 4.14 illustrates how RMC for VMware (RMC-V) enables protection of VMware VM disks and datastores using consistent snapshots (which can then be used for rapid online recovery) and StoreOnce Express Protect Backup by facilitating backups of snapshots from StoreServ to StoreOnce. Backups on StoreOnce are self-contained volumes that are deduplicated to save space. These backups can be used to recover data back to the original StoreServ storage system or to a different system. Administrators can access all functionality from within the VMware vCenter web client.
Figure 4-14 HPE StoreOnce Recovery Manager Central for VMware connectivity and data flow
Table 4.10 explains the data flow within the HPE StoreOnce RMC-V solution.
Table 4-10 HPE StoreOnce RMC-V data flow
Step number | Data flows from… | Data flows to…
1 | The vCenter administrator initiates StoreOnce Express Protect Backup of a VM or a datastore.
2 | The VM I/O is quiesced, and the VM snapshot is taken.
3 | The RMC virtual appliance instructs the StoreServ array to take a snapshot of the virtual volume. | The snapshot is exported to the RMC virtual appliance.
4 | VM snapshots are deleted and the VM I/O resumes.
5 | The StoreServ array copies the virtual volume snapshot (VM data)… | …to the RMC virtual appliance via FC.
6 | RMC virtual appliance moves data via the Catalyst client low-bandwidth transfer… | …to the StoreOnce appliance via FC.
VMware vStorage API for Data Protection (VADP)
Typical usage: SAN backup of VMware.
VMware provides a layer of communication between the VM and the underlying storage infrastructure. It takes the form of a file system, called VMFS, and provides advanced features such as clustering capabilities and snapshot functionality. To leverage the underlying hardware capabilities, VMware provides application programming interfaces (APIs) to third-party vendors. The VMware API for data protection is called VADP. VADP enables the backup software to perform centralized VM backups without the overhead of running individual backups from inside each VM.
Figure 4.15 illustrates a simple VADP backup environment with a vCenter Server, an ESX server running multiple VMs, a backup server with data protection software that is integrated with VADP, a StoreServ array where the datastore LUNs reside, and a StoreOnce storage system or a StoreEver tape library for backup.
Figure 4-15 VMware VADP data protection connectivity and data flow
Table 4.11 explains the data flow within a VMware VADP solution.
Table 4-11 VMware VADP data flow
Step number | Data flows from… | Data flows to…
1 | VADP integrated backup server requests a VM. | vCenter Server via Ethernet
2 | vCenter Server identifies the correct ESX server and requests a snapshot of the target VM. | ESX server via Ethernet
3 | A software snapshot is created within the ESX server. | Within the ESX server
4 | HPE StoreServ (snapshot image backup data) | Backup server via FC
5 | Backup server | HPE StoreOnce storage or HPE StoreEver tape library via FC
6 | The backup server sends a command to destroy the backup snapshot. | vCenter Server via Ethernet
7 | The vCenter Server directs the correct ESX host to destroy the backup snapshot. | ESX server via Ethernet
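Read as code, the sequence in Table 4-11 is a small orchestration loop. The sketch below mirrors those steps using hypothetical helper objects (vcenter, backup_server, target); it is not the VMware VADP/VDDK API, only an illustration of the control and data paths.

def vadp_backup(vm_name: str, vcenter, backup_server, target) -> None:
    # Steps 1-3: the backup server asks vCenter for the VM; the owning ESX
    # host quiesces the VM and creates a software snapshot.
    snapshot = vcenter.request_snapshot(vm_name)
    try:
        # Step 4: read the frozen snapshot image directly from the StoreServ
        # datastore LUNs over FC (a LAN-free data path).
        for block in backup_server.read_snapshot_over_san(snapshot):
            # Step 5: write to StoreOnce or to a StoreEver tape library.
            target.write(block)
    finally:
        # Steps 6-7: always remove the backup snapshot afterwards, even if the
        # data movement fails, so it does not keep growing on the datastore.
        vcenter.destroy_snapshot(snapshot)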
Veeam backup and restore to HPE StoreEver tape storage with HPE StoreServ integration
Typical usage: SAN backup of VMware and Microsoft Hyper-V to a physical tape library.
HPE products offer integration with many third-party backup software solutions, such as Veeam, a solution that is popular with customers running a VMware virtual infrastructure. Figure 4.16 illustrates a virtual environment utilizing a Veeam backup solution with the HPE StoreServ integration. Veeam works directly with the HPE StoreServ storage to create storage snapshots and mount them to the Veeam Proxy. It is similar to direct SAN access but has less impact on the infrastructure and on the VMs being backed up.
Figure 4-16 Veeam backup to tape connectivity and data flow
Table 4.12 explains the data flow within the Veeam backup to tape solution.
Table 4-12 Veeam backup to tape data flow
Step number | Data flows from… | Data flows to…
1 | Veeam backup server | ESX server to create a snapshot via LAN/WAN
2 | The ESX server creates the snapshot and sends acknowledgement. | Veeam Proxy server via LAN (occurs simultaneously with step 3)
3 | Veeam backup server | HPE StoreServ storage creates the snapshot.
4 | Veeam backup server | The ESX server deletes the snapshot via LAN.
5 | The StoreServ storage snapshot is mounted. | Veeam Proxy server via FC
6 | The Veeam Proxy server stores the data. | Veeam repository server via FC
7 | Veeam repository server | HPE StoreEver tape storage via FC
Figure 4.17 illustrates a VM restore from the HPE StoreEver tape storage with Veeam.
Figure 4-17 VM restore from tape with Veeam connectivity and data flow
Table 4.13 explains the data flow when a VM is restored from tape with Veeam.
Table 4-13 VM restore from tape with Veeam data flow
Step number | Data flows from… | Data flows to…
1 | The system administrator initiates the restore operation from the Veeam GUI. | Veeam backup server via LAN
2 | The Veeam backup server sends a command to load the correct tapes. | HPE StoreEver tape library via FC
3 | The HPE StoreEver tape library sends the restore image. | Veeam backup server via FC
4 | The Veeam backup server extracts the VM from the restore image. | Veeam repository server via LAN
5 | The Veeam backup server restores the VM image. | ESX server and datastore LUNs on the HPE StoreServ storage via LAN
6 | The Veeam backup server sends a command to delete the restore image. | Repository server via LAN
Figure 4.18 illustrates a user-directed restore from a tape with Veeam.
Figure 4-18 User-directed restore from tape with Veeam connectivity and data flow
Table 4.14 explains the data flow when a user initiates a restore from tape using Veeam.
Table 4-14 User-directed restore from tape with Veeam
Step number | Data flows from… | Data flows to…
1 | The system administrator initiates a restore from the Veeam GUI. | Veeam backup server via LAN
2 | The Veeam backup server sends a command to load the correct tapes. | HPE StoreEver tape library via FC
3 | The HPE StoreEver tape library sends the restore image. | Veeam backup server via FC
4 | The Veeam backup server sends the restore image. | Veeam repository server via FC
5 | The system administrator manually directs the VM restore or other restore options such as Microsoft Exchange or Microsoft SharePoint using the Veeam GUI. | Veeam backup server via LAN
Veeam backup to HPE StoreOnce with tape offload
Typical usage: Offload to tape using Veeam backup and replication software.
Figure 4.19 illustrates how HPE StoreEver is used with Veeam backup to tape (archival of data from HPE StoreOnce). Veeam backup and replication enables archiving of backup files stored in backup repositories to HPE StoreEver tapes.
Figure 4-19 Veeam backup to HPE StoreOnce connectivity and process flow
Within this configuration, the process flow is as follows:
1. The Veeam backup server enumerates backup files in the database backup catalog and tape catalog to detect if any data was modified since the last backup and queues the changes for archiving.
2. The Veeam backup server connects to the transport service to start the data transport process.
3. The modified data flows from the backup repository on the HPE StoreOnce VSA to the HPE StoreEver tape. For this process to occur, the source transport service on the Veeam backup server reads the data from the HPE StoreOnce VSA and the target transport service on the Veeam backup server writes the data to the HPE StoreEver tape.
4. The Veeam backup server updates the backup and tape catalogs.
HPE Enterprise Secure Key Manager (ESKM)
Encryption ensures information security and consistency. Data is encrypted at the source, in transit, and at the destination. Modern encryption algorithms are based on digital certificates, also known as security keys. Losing or exposing private keys means losing the confidentiality of the information; therefore, key management becomes very important. As the number of services within a company increases, the number of security keys that must be managed multiplies. The keys must be kept secure and backed up properly. HPE offers the Enterprise Secure Key Manager (ESKM) appliance to handle the backup and restoration of security keys.
Figure 4-20 HPE Enterprise Secure Key Manager
ESKM is a complete key management solution for generating, storing, serving, controlling, and auditing access to data encryption keys in a secure appliance. Figure 4.21 shows the integration of encryption tape appliances into a SAN environment consisting of multiple servers, primary disk storage (HPE StoreServ), and an HPE StoreEver tape library. The backup data goes directly to the HPE StoreEver LTO tape drive. The ESKM sends an encryption key to the tape library across the Ethernet and the library passes the key to the drive via its management port.
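The pattern behind a key manager is easiest to see if you separate the data-encryption key the drive uses from the key-encryption key that protects it at rest. The sketch below uses the Python cryptography package (Fernet) to show that generate/wrap/unwrap pattern in miniature; it is a conceptual illustration only, not the ESKM interface and not the AES-GCM encryption the LTO drive itself performs.

from cryptography.fernet import Fernet      # pip install cryptography

# Key-encryption key (KEK): in a real deployment this never leaves the key
# manager appliance; it is generated locally here purely for illustration.
kek = Fernet.generate_key()

def issue_data_key():
    # Generate a fresh data-encryption key and return (plain, wrapped) forms.
    dek = Fernet.generate_key()              # the key the tape drive would use
    wrapped = Fernet(kek).encrypt(dek)       # only the wrapped form is stored
    return dek, wrapped

def recover_data_key(wrapped: bytes) -> bytes:
    # What serving a key amounts to: unwrap it and hand it back to the drive.
    return Fernet(kek).decrypt(wrapped)

dek, wrapped = issue_data_key()
assert recover_data_key(wrapped) == dek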
Figure 4-21 HPE ESKM connectivity and data flow
Table 4.15 explains the data flow within the HPE ESKM solution.
Table 4-15 HPE ESKM data flow
Step number | Data flows from… | Data flows to…
1 | SAN-attached HPE StoreServ storage | SAN-attached server via FC
2 | SAN-attached server | SAN-attached HPE StoreEver tape library via FC
3 | HPE StoreEver tape library requests and receives the key. | ESKM via Ethernet
4 | Library passes the key to the LTO tape drive over the management port. | ESKM via Ethernet
5 | ESKM (1) | ESKM (2); keys are automatically replicated between the nodes via Ethernet.
Tiered Data Retention for HPE StoreServ
Figure 4.22 illustrates a SAN that combines HPE StoreServ storage, HPE StoreOnce backup, and HPE StoreEver tape to offer a complete business protection solution. Users can implement a disk-based solution with the HPE StoreOnce backup for daily, short-term backups and fast recovery. HPE StoreEver tape is used in conjunction with the HPE StoreOnce backup to address long-term archival and compliance objectives. As the data on the HPE StoreOnce backup appliance becomes infrequently accessed or is never accessed, tiered data retention policies can be used to expire the data on the HPE StoreOnce while moving that data to a secondary copy on the HPE StoreEver tape. You can also transport a secondary LTO cartridge offsite for DR purposes. This reduces risk, increases efficiency, and lowers storage costs, enabling you to maximize the value you get from your data over its entire lifecycle while minimizing TCO.
Figure 4-22 Tiered data retention for HPE StoreServ connectivity and data flow
Table 4.16 explains how the data flows within the tiered data retention for HPE StoreServ solution.
Table 4-16 Tiered data retention for HPE StoreServ data flow
Step number | Data flows from… | Data flows to…
1 | Primary storage (HPE StoreServ 7440c) | SAN-attached server via FC
2 | SAN-attached server | SAN-attached HPE StoreOnce 5100 and SAN-attached HPE StoreEver MSL6480 via FC
3 | SAN-attached HPE StoreEver MSL6480 | Offsite tape vaulting
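A tiered retention policy of this kind boils down to a simple rule: once a backup on the StoreOnce appliance is older than its short-term retention period and a tape copy exists, the disk copy can be expired. The sketch below illustrates the rule with made-up catalog entries; in practice the policy is driven by the backup application, not by a standalone script.

from datetime import datetime, timedelta

SHORT_TERM = timedelta(days=90)     # illustrative StoreOnce retention period

backups = [                          # made-up catalog entries
    {"name": "finance-2015-Q4", "created": datetime(2015, 12, 31), "on_tape": True},
    {"name": "finance-2016-W07", "created": datetime(2016, 2, 19), "on_tape": False},
]

def apply_retention(catalog, now):
    for b in catalog:
        expired = now - b["created"] > SHORT_TERM
        if expired and b["on_tape"]:
            print(f"expire {b['name']} from StoreOnce (long-term copy is on StoreEver)")
        elif expired:
            print(f"copy {b['name']} to StoreEver first, then expire the disk copy")
        else:
            print(f"keep {b['name']} on StoreOnce for fast restores")

apply_retention(backups, now=datetime(2016, 6, 1))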
Learning check questions
Reinforce your knowledge and understanding of the topics just covered by completing this learning check:
1. Which HPE solution consists of a disk-based storage appliance for backing up network media servers or PCs to target devices on the appliance and supports source and target deduplication?
a) StoreServ Remote Copy
b) StoreOnce Catalyst remote site replication
c) StoreAll backup to StoreEver tape storage with Symantec NetBackup
d) StoreServ File Persona Share Backup
2. Your customer is looking for a backup solution for dispersed remote and branch offices. These offices run virtualized environments and must send their deduplicated data to a central data center and its DR site over a WAN connection. Which HPE solution should you recommend to this customer for these offices?
a) StoreServ File Persona software NDMP backup
b) StoreAll backup to StoreEver tape storage with Symantec NetBackup
c) StoreOnce Catalyst remote site replication
d) StoreOnce VSA
3. What does VADP accomplish?
a) It enables centralized backup software to back up VMs without running individual backups inside these VMs.
b) It enables automatic data archiving from StoreServ arrays to StoreEver without using specialized archival software.
c) It generates, stores, serves, controls, and audits access to data encryption keys in a secure appliance.
d) It transports data between the NAS devices and the backup devices.
Learning check answers
This section contains answers to the learning check questions.
1. Which HPE solution consists of a disk-based storage appliance for backing up network media servers or PCs to target devices on the appliance and supports source and target deduplication?
a) StoreServ Remote Copy
b) StoreOnce Catalyst remote site replication (correct)
c) StoreAll backup to StoreEver tape storage with Symantec NetBackup
d) StoreServ File Persona Share Backup
2. Your customer is looking for a backup solution for dispersed remote and branch offices. These offices run virtualized environments and must send their deduplicated data to a central data center and its DR site over a WAN connection. Which HPE solution should you recommend to this customer for these offices?
a) StoreServ File Persona software NDMP backup
b) StoreAll backup to StoreEver tape storage with Symantec NetBackup
c) StoreOnce Catalyst remote site replication
d) StoreOnce VSA (correct)
3. What does VADP accomplish?
a) It enables centralized backup software to back up VMs without running individual backups inside these VMs. (correct)
b) It enables automatic data archiving from StoreServ arrays to StoreEver without using specialized archival software.
c) It generates, stores, serves, controls, and audits access to data encryption keys in a secure appliance.
d) It transports data between the NAS devices and the backup devices.
Learner Activity
Using your favorite search engine, search the Internet for the HPE Data Agile BURA Compatibility Matrix. Save it on your desktop and open it. Using the compatibility matrix, answer these questions:
1. What are the published date and version of this document?
_______________________________________________________________________________
2. What are the key updates in this version?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
3. To back up data to an HPE StoreAll target running OS version 6.5.x, what minimum version of HPE Data Protector must you use?
_______________________________________________________________________________
4. You want to connect the StoreEver Ultrium 920 (LTO-3 HH SCSI) tape drives to your existing ProLiant DL servers. Which ProLiant DL servers are not supported with these tape drives?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
5. Are VMware ESX hosts supported with direct-attached SAS tape drives? (Hint: Use the Virtual Machine Support section.)
Yes No
6. Are VMware ESX hosts supported with direct-attached SCSI tape drives? (Hint: Use the Virtual Machine Support section.)
Yes No
Learner activity answers
This section contains answers to the learner activity.
1. What are the published date and version of this document?
April 2015, version 2 (However, this answer will change over time as new documents are released.)
2. What are the key updates in this version? (The answer will also vary, depending on the release of this document.)
• New StoreOnce software version 3.12.1
• New support for Symantec Backup Exec 15
• New LTO-6 firmware
• New MSL6480 Tape Library firmware 4.70
3. To back up data to an HPE StoreAll target running OS version 6.5.x, what minimum version of HPE Data Protector must you use (Figure 4.23)?
Figure 4-23 Applications certified to back up data using StoreAll as a target
4. You want to connect the StoreEver Ultrium 920 (LTO-3 HH SCSI) tape drives to your existing ProLiant DL servers. Which ProLiant DL servers are not supported with these tape drives (Figure 4.24)?
Figure 4-24 HPE StoreEver standalone tape support
5. Are VMware ESX hosts supported with direct-attached SAS tape drives? (Hint: Use the Virtual Machine Support section.) Yes No 6. Are VMware ESX hosts supported with direct-attached SCSI tape drives? (Hint: Use the Virtual Machine Support section.) Yes No
Additional References
An IDC press release: Companies Looking to Cut Cost & Complexity out of Storage Infrastructure Drive Growth for New Storage Technologies in the Second Quarter, http://www.idc.com/getdoc.jsp?containerId=prUS25883215
Summary
BURA solutions can be direct-attached or SAN-based. The HPE Data Agile BURA Compatibility Matrix provides information for designing data protection solutions.
Simple backup configurations use the point-to-point topology, where the backup device is connected directly to the backup server. Data is then copied directly to the tape media. HPE StoreEver and StoreOpen solutions can be used in this topology.
SAN-based backup configurations are more advanced and often employ sophisticated methodologies, such as connecting independent SAN fabrics, partitioning tape libraries, and extending the SAN over long distances.
Remote backup configurations vary in complexity and in the technologies they employ. They include multisite replication, deduplication, tiered backups and archival, backups of clusters and virtual environments, and snapshots.
Encryption ensures data security and consistency. Modern encryption algorithms use security keys, which must be protected and managed. Enterprise Secure Key Manager is a complete key management solution for generating, storing, serving, controlling, and auditing access to data encryption keys in a secure appliance.
5 Practice Test Introduction The Designing HPE Backup Solutions exam (HPE0-J77) tests your ability to understand technologies, concepts, and strategies of effective Enterprise Backup Solutions, including identifying the correct components for a backup and recovery solution. It also tests your understanding of HPE's BURA solutions to provide customers with complete data protection. This exam also tests your competency level and understanding of HPE StoreOnce Backup Systems and BURA best practices, configuration guidelines, and recommended configurations. The intent of this study guide is to set expectations about the context of the exam and to help candidates prepare for it. Recommended training to prepare for this exam can be found at the HPE Certification and Learning website (http://certification-learning.hpe.com) and in books like this one. It is important to note that although training is recommended for exam preparation, successful completion of the training alone does not guarantee you will pass the exam. In addition to the training, exam items are based on knowledge gained from on-the-job experience and other supplemental reference material that may be specified in this guide.
Minimum Qualifications Candidates should have a minimum of one year of experience in storage technologies with a strong understanding of how to design enterprise storage solutions based on customer needs.
HPE0-J77 EXAM DETAILS
The following are details about the HPE0-J77 exam:
• Number of items: 50
• Exam time: 1 hour 15 minutes
• Passing score: 70%
• Item types: Multiple choice (single-response), multiple choice (multiple response), matching, and drag-and-drop
• Reference material: No online or hard copy reference material is allowed at the testing site.
HPE0-J77 TESTING OBJECTIVES
This exam validates that you can successfully perform the following objectives. Each main objective is given a weighting, indicating the emphasis this item has in the exam.
24% Differentiate and apply foundational storage architectures and technologies.
• Describe backup systems technology.
• Describe data availability methods and technologies.
• Describe and differentiate between replication, backup, and archiving.
40% Plan and design HPE storage solutions.
• Discover backup solution opportunities.
• Plan and design the backup solution.
• Size the backup solution.
• Review and validate the backup solution proposal.
• Present the backup solution to the customer.
17% Performance-tune, optimize, and upgrade HPE storage backup solutions.
• Identify and compare the existing backup solution design to the best practices documentation.
• Optimize and performance-tune the storage backup solution.
19% Manage, monitor, administer, and operate HPE storage backup solutions.
• Install, configure, set up, and demo proof-of-concept for HPE storage backup solutions.
• Demonstrate data replication tasks for proof-of-concept.
• Demonstrate external array storage for proof-of-concept.
• Demonstrate backup and archival storage in a single-site environment for proof-of-concept.
Test Preparation Questions and Answers The following questions will help you measure your understanding of the material presented in this eBook. Read all of the choices carefully; there might be more than one correct answer. Choose all correct answers for each question.
Questions 1. What is the biggest cause of unplanned downtime? a. Human errors b. Software bugs and crashes c. Cyber-attacks and computer viruses d. Hardware failures e. Theft 2. Which of the following are examples of intangible costs of downtime? (Select two.) a. Lost transaction revenue b. Brand damage c. Marketing costs
d. Lost inventory e. Decrease in stock value 3. In your environment, you have 2500 identical disk drives with an MTBF of 500,000 hours per drive. How frequently can you expect a disk drive failure? a. Approximately every 2 days b. Approximately every 4 days c. Approximately every 8 days d. Approximately every 16 days 4. What defines the length of time needed to restore the failed system operations back to normal after a disaster or a disruption of service? a. RTO b. MTTR c. RPO d. MTBF 5. Which availability technology is ideal for protection against regional and metropolitan disasters? a. Generators b. RAID c. Redundant server components d. Remote copy 6. During creation of your backup and recovery strategy, which step is recommended by HPE to be performed after grouping your data into jobs? a. Calculate the backup frequency b. Determine the data protection technology c. Decide on the archival methodology d. Assign levels of importance to your data e. Choose your backup tape rotation scheme 7. Your customer has the following backup characteristics and requirements: o The data set changes frequently each day. o The customer has only a very limited backup window each day. o If data is lost and must be restored, the customer can afford a relatively slow restore. Which type of backup should you recommend? a. Incremental b. Differential c. Image-based d. Offline e. File-based 8. What is the biggest disadvantage of the FIFO backup tape rotation scheme? a. It is too complex. b. It has the shortest tail of daily backups. c. It can suffer from possible data loss.
d. It requires the largest number of tapes. 9. Which type of backup, performed by a central backup server, takes the last full backup of multiple networked clients, appends incremental or differential backups to the full backup set, and then promotes it to the full backup to save time and network bandwidth? a. Synthetic backup b. Virtual full backup c. Working set backup d. Block-level incremental backup
10. A customer needs to back up her environment but cannot afford to shut down running business-critical applications. She is also cost-sensitive to the storage required to protect her environment. Which technique would best meet this customer’s requirements? a. Breakaway mirror snapshots b. Copy-on-write snapshots c. Block-level incremental backup d. Synthetic backup
11. Which technology detects redundant data and stores or transfers only one instance of such data? a. Deduplication b. Snapshots c. Copy-on-write d. Data tiering e. Split-mirrors
12. Which term is used to describe a situation in which the deduplication algorithm computes the same hash number for a new block of data as the data already stored and does not store this new data? a. True negative b. False negative c. True positive d. False positive
13. Which technology transmits a single ray of light as its carrier and is used for long distances? a. FCoE b. iSCSI c. Multimode fiber d. Shortwave fiber e. Single mode fiber
14. Refer to Figure 5-1.
Figure 5-1 Sample tNAS solution
Which component is referred to by label A in this tiered storage and data lifecycle management solution based on tNAS? a. QNM agent b. QStar ASM server c. QStar ASM cache d. StoreAll Gateway
15. What is required to implement a tier 3 (electronic vaulting) disaster recovery plan? a. A second, fully operational site (permanently running but not processing) b. A second, fully operational site (permanently running and actively processing) c. An automated, business-integrated solution with high-speed replication and clustering d. Two active sites with high-bandwidth connection between them and two-phase transaction commit
16. What must you use to connect multiple independent SAN fabrics that consist of HPE StoreServ storage, HPE StoreEver tape libraries, and HPE StoreOnce VTLs? (Select two.) a. Multiprotocol router b. StoreAll Gateway server c. StoreServ File Persona server d. Fibre Channel Data-Plane Forwarder e. FCR-licensed Fibre Channel switch
17. Refer to Figure 5-2.
Figure 5-2 Tape library partitioning
Figure 5-2 shows HPE StoreEver tape library partitioning for concurrent access from applications A and B. Which software from HPE must be used to achieve this configuration? a. StoreOpen Enterprise b. StoreServ Remote Copy c. Command View for Tape Libraries d. StoreAll backup to StoreEver tape storage with Symantec NetBackup
18. Refer to Figure 5-3.
Figure 5-3 HPE StoreServ File Persona Share Backup configuration
Which highlighted component within the HPE StoreServ File Persona Share Backup configuration is responsible for using the appropriate protocol to read the data from the HPE StoreServ File Persona and writing it to the target backup device? a. Media server b. vCenter server c. ISV NDMP agent d. Gateway server
19. What is the target use case for HPE StoreOnce Catalyst remote site replication? a. Data protection of StoreAll Gateways b. Transfer of data from a branch office to the main data center c. Automated management of local backup and offsite backup copies d. Disaster tolerance, business continuity, and long-distance replication
20. What does the NDMP protocol achieve in backup and disaster recovery environments? (Select two.) a. It removes the need to transport data through the backup server. b. It enables long-distance replication through incompatible gateways. c. It securely transports data encryption keys over an unprotected LAN. d. It enables automatic data archiving from StoreServ arrays to StoreEver without using specialized archival software. e. It controls data backup and recovery communications between the primary and secondary storage in a heterogeneous environment.
Answers 1. What is the biggest cause of unplanned downtime? a. Human errors
b. Software bugs and crashes c. Cyber-attacks and computer viruses d. Hardware failures e. Theft
Figure 1-1, Causes of downtime and data loss or unavailability, indicates that hardware failures account for 39% of unplanned downtime, followed by human errors (29%), software corruption (14%), theft (9%), and computer viruses (6%). For more information, see Chapter 1, section Describing causes of downtime.
2. Which of the following are examples of intangible costs of downtime? (Select two.) a. Lost transaction revenue b. Brand damage c. Marketing costs d. Lost inventory e. Decrease in stock value
Examples of intangible and indirect costs related to downtime include the following:
• Loss of business opportunities
• Loss of employees and/or employee morale
• Loss of goodwill in the community
• Decrease in stock value
• Loss of customers and/or departure of business partners
• Brand damage
• Shift of market share to competitors
• Bad publicity and press
For more information, see Chapter 1, section Estimating the cost and impact of downtime.
3. In your environment, you have 2500 identical disk drives with an MTBF of 500,000 hours per drive. How frequently can you expect a disk drive failure? a. Approximately every 2 days b. Approximately every 4 days c. Approximately every 8 days d. Approximately every 16 days
The formula for the total MTBF is as follows:
MTBF (total) = MTBF (single drive) / number of drives
Therefore,
MTBF (total) = 500,000 hours / 2500 drives = 200 hours ≈ 8.3 days
In other words, with 2500 drives in the environment you can expect a disk drive failure approximately every 8 days.
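The same arithmetic can be checked with a few lines of code. This is a minimal, illustrative sketch only; it assumes the MTBF is expressed in hours and that drive failures are independent:

```python
# Expected time between failures across a population of identical drives.
# Assumption: per-drive MTBF is given in hours and failures are independent.

def expected_failure_interval_hours(mtbf_per_drive_hours: float, drive_count: int) -> float:
    """System-level MTBF: the per-drive MTBF divided by the number of drives."""
    return mtbf_per_drive_hours / drive_count

interval_hours = expected_failure_interval_hours(500_000, 2_500)
print(f"{interval_hours:.0f} hours = {interval_hours / 24:.1f} days")
# Output: 200 hours = 8.3 days, i.e. roughly one drive failure every 8 days.
```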
For more information, see Chapter 1, section Calculating MTBF. 4. What defines the length of time needed to restore the failed system operations back to normal after a disaster or a disruption of service?
a. RTO b. MTTR c. RPO d. MTBF
RTO defines the length of time necessary to restore the system operations after a disaster or a disruption of service. RTO includes at minimum the time to correct the situation (“break fix”) and to restore any data. It can, however, include factors such as detection, troubleshooting, testing, and communication to the users. For more information, see Chapter 1, section Recovery Time Objective.
5. Which availability technology is ideal for protection against regional and metropolitan disasters? a. Generators b. RAID c. Redundant server components d. Remote copy
Table 1-2, Options that may be used to achieve availability, indicates that remote copy is a technology that can protect against regional and metropolitan downtimes. The other options can aid in such protection but are implemented within the servers or data centers and are, therefore, impacted by regional and metropolitan disasters. For more information, see Chapter 1, section Availability technologies.
6. During creation of your backup and recovery strategy, which step is recommended by HPE to be performed after grouping your data into jobs? a. Calculate the backup frequency b. Determine the data protection technology c. Decide on the archival methodology d. Assign levels of importance to your data e. Choose your backup tape rotation scheme
Per HPE, your backup and recovery plan should include these ordered steps:
a. Identifying which data must be saved.
b. Grouping this data into “jobs.”
c. Assigning levels of importance to this data.
d. Calculating the backup frequency.
e. Protecting this data.
f. Testing your backup operations.
g. Maintaining your backup history.
h. Archiving your tapes.
i. Defining and testing your restore procedures.
The step that follows grouping your data into jobs is therefore assigning levels of importance to that data. For more information, see Chapter 2, section Backup basics—your backup and recovery plan.
7. Your customer has the following backup characteristics and requirements:
o The data set changes frequently each day.
o The customer has only a very limited backup window each day.
o If data is lost and must be restored, the customer can afford a relatively slow restore.
Which type of backup should you recommend? a. Incremental b. Differential c. Image-based d. Offline e. File-based According to recommendations, you should choose an incremental backup when: o You have a high percentage of daily changes in your data set. o You only have a limited backup window per day. o You can accept a relatively slow restore (which translates to a long downtime). For more information, see Chapter 2, section When to choose incremental vs. differential backup. 8. What is the biggest disadvantage of the FIFO backup tape rotation scheme? a. It is too complex. b. It has the shortest tail of daily backups. c. It can suffer from possible data loss. d. It requires the largest number of tapes. The FIFO scheme suffers from the possibility of data loss. Suppose an error is introduced into the data but the problem is not identified until several generations of backups and revisions have taken place. When the error is detected, all backup files contain this error. For more information, see Chapter 2, section First In, First Out (FIFO). 9. Which type of backup, performed by a central backup server, takes the last full backup of multiple networked clients, appends incremental or differential backups to the full backup set and then promotes it to the full backup to save time and network bandwidth? a. Synthetic backup b. Virtual full backup c. Working set backup d. Block-level incremental backup The question describes a synthetic backup, which takes the last full backup from clients and appends their incremental or differential backups to the full backup set. Once the appending procedure is done, the result is stored to the same backup target as the source data sets and is promoted to the full backup. For more information, see Chapter 2, section Synthetic backup.
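To make the mechanics of a synthetic full backup more concrete, here is a minimal, hypothetical sketch. The dictionary-based catalog and the file names are illustrative assumptions, not an HPE or ISV implementation:

```python
# Conceptual sketch of a synthetic full backup: the backup server merges the
# last full backup with the incrementals that followed it, without touching
# the clients again. Each backup is modeled as {file_path: file_contents}.

def synthesize_full(last_full, incrementals):
    """Apply incremental backups (oldest first) on top of the last full backup."""
    synthetic = dict(last_full)          # start from the existing full backup
    for incremental in incrementals:     # newer changes overwrite older versions
        synthetic.update(incremental)
    return synthetic                     # promoted to become the new full backup

full_monday = {"/data/a.txt": "v1", "/data/b.txt": "v1"}
inc_tuesday = {"/data/b.txt": "v2"}
inc_wednesday = {"/data/c.txt": "v1"}

new_full = synthesize_full(full_monday, [inc_tuesday, inc_wednesday])
print(new_full)
# {'/data/a.txt': 'v1', '/data/b.txt': 'v2', '/data/c.txt': 'v1'}
```

Because the merge happens on the backup server against data it already holds, no new full backup has to cross the network from the clients, which is the time and bandwidth saving the question refers to.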
10. A customer needs to back up her environment but cannot afford to shut down running business-critical applications. She is also cost-sensitive to the storage required to protect her environment. Which technique would best meet this customer’s requirements? a. Breakaway mirror snapshots b. Copy-on-write snapshots c. Block-level incremental backup d. Synthetic backup
The customer’s need to perform backups without stopping the running applications leads to snapshots (neither the synthetic backup nor the block-level incremental backup meets this requirement). Then, the choice is between the breakaway mirror (clone) snapshot and the copy-on-write (point-in-time) snapshot. The breakaway mirror has a larger storage overhead because it requires at least two sets of data. Therefore, copy-on-write is the best solution for this customer. For more information, see Chapter 2, section Snapshots (frozen images, off-host backups, or nondisruptive backups).
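The storage-overhead difference can be illustrated with a toy copy-on-write model. The block-level classes below are simplified assumptions for illustration, not how an HPE array actually implements snapshots:

```python
# Toy copy-on-write snapshot: only blocks that change after the snapshot is
# taken are copied, so the overhead grows with the change rate rather than
# with the full size of the volume (unlike a breakaway mirror/clone).

class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)        # live data
        self.snapshot_blocks = {}         # index -> preserved original block

    def take_snapshot(self):
        self.snapshot_blocks = {}         # no extra storage consumed yet

    def write(self, index, data):
        if index not in self.snapshot_blocks:
            self.snapshot_blocks[index] = self.blocks[index]  # preserve old block once
        self.blocks[index] = data

    def read_snapshot(self, index):
        return self.snapshot_blocks.get(index, self.blocks[index])

vol = Volume(["A", "B", "C", "D"])
vol.take_snapshot()
vol.write(1, "B2")
print(vol.blocks)                 # ['A', 'B2', 'C', 'D']  (live data keeps changing)
print(vol.read_snapshot(1))       # 'B'  (frozen image the backup application reads)
print(len(vol.snapshot_blocks))   # 1 -> overhead is only the changed blocks
```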
11. Which technology detects redundant data and stores or transfers only one instance of such data? a. Deduplication b. Snapshots c. Copy-on-write d. Data tiering e. Split-mirrors Per Wikipedia, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. This is also described in the Deduplication section of Chapter 3. For more information, see Chapter 3, section Deduplication.
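A minimal sketch of hash-based, block-level deduplication follows. The fixed chunk size and the use of SHA-256 are illustrative assumptions, not the HPE StoreOnce algorithm:

```python
import hashlib

# Chunk a data stream into fixed-size blocks and store each unique block once.
# The index maps a block's hash to its single stored copy; repeated blocks
# only add another reference, not another copy.

CHUNK_SIZE = 4096          # illustrative fixed chunk size

def deduplicate(data, store):
    """Return the list of chunk hashes (the 'recipe') needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # new, unique data: store it
            store[digest] = chunk
        recipe.append(digest)            # duplicate data: reference only
    return recipe

store = {}
recipe = deduplicate(b"A" * 8192 + b"B" * 4096, store)
print(len(recipe), "chunks referenced,", len(store), "chunks actually stored")
# 3 chunks referenced, 2 chunks actually stored
```

Production deduplication engines typically use variable-length chunking and far larger indexes, but the store-once, reference-many principle is the same.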
12. Which term is used to describe a situation in which the deduplication algorithm computes the same hash number for a new block of data as the data already stored and does not store this new data? a. True negative b. False negative c. True positive d. False positive When a hash collision occurs, the system will not store the new data because it sees that its hash number already exists in the index. This is called a false positive and can result in data loss. For more information, see Chapter 3, section Deduplication.
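A common safeguard against such false positives is to verify a hash match with a byte-for-byte comparison before discarding the new block. The sketch below extends the deduplication example above and is purely illustrative; it does not describe how any particular product handles collisions:

```python
import hashlib

def store_block(chunk, store):
    """Store a block under its hash, verifying matches to avoid false positives."""
    digest = hashlib.sha256(chunk).hexdigest()
    if digest in store:
        if store[digest] == chunk:
            return digest                # true positive: genuinely duplicate data
        # False positive (hash collision): same hash, different data.
        # Without this check the new data would silently be dropped (data loss);
        # a real system would fall back to a secondary key or a re-hash here.
        raise RuntimeError("hash collision detected for " + digest)
    store[digest] = chunk                # new, unique block
    return digest

store = {}
store_block(b"payroll records", store)   # stored
store_block(b"payroll records", store)   # recognized as a true duplicate
```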
13. Which technology transmits a single ray of light as its carrier and is used for long distances? a. FCoE b. iSCSI c. Multimode fiber d. Shortwave fiber e. Single mode fiber Per SNIA dictionary (http://www.snia.org/education/dictionary/s), single mode fiber is optical fiber designed for the transmission of a single ray or mode of light as a carrier. Single mode fiber transmission is typically used for long-distance signal transmission. For more information, see the Learner activity section in Chapter 3.
14. Refer to Figure 5-1.
Figure 5-1 Sample tNAS solution
Which component is referred to by label A in this tiered storage and data lifecycle management solution based on tNAS? a. QNM agent b. QStar ASM server c. QStar ASM cache d. StoreAll Gateway
The referenced component in Figure 5-1 is the QStar ASM cache, which is the tNAS cache behind the mount point of the tNAS solution. It is used as temporary storage until the data is written to the tape drive behind the cache. For more information, see Chapter 3, section Tiered storage and data lifecycle management with tNAS.
15. What is required to implement a tier 3 (electronic vaulting) disaster recovery plan? a. A second, fully operational site (permanently running but not processing) b. A second, fully operational site (permanently running and actively processing) c. An automated, business-integrated solution with high-speed replication and clustering d. Two active sites with high-bandwidth connection between them and two-phase transaction commit
Companies with a tier 3 disaster recovery plan have a second, fully operational site that is permanently up and running (but not processing). In tier 3, electronic vaulting is supported for a subset of critical data. This means transmitting business-critical data electronically to the secondary site, where backups are automatically created. The permanently running hot site increases the cost even more, but the time for the recovery of business-critical data is significantly reduced.
The option “A second, fully operational site (permanently running and actively processing)” describes tier 4 (electronic vaulting with active 2nd site). The option “An automated, business-integrated solution with high-speed replication and clustering” describes tier 7 (highly automated, business-integrated solution). The option “Two active sites with high-bandwidth connection between them and two-phase transaction commit” describes tier 5 (two-site, two-phase commit). For more information, see Chapter 3, section Backup and disaster recovery tiers.
16. What must you use to connect multiple independent SAN fabrics that consist of HPE StoreServ storage, HPE StoreEver tape libraries, and HPE StoreOnce VTLs? (Select two.) a. Multiprotocol router b. StoreAll Gateway server c. StoreServ File Persona server d. Fibre Channel Data-Plane Forwarder e. FCR-licensed Fibre Channel switch Figure 4-5 in Chapter 4 illustrates a configuration consisting of independent fabrics, with HPE StoreServ storage, HPE StoreEver tape libraries, and HPE StoreOnce VTLs. To enable interconnection of devices between these SAN fabrics, without merging them, a multiprotocol router (MPR) or a Fibre Channel routing-enabled switch is used. For more information, see Chapter 4, section Connected independent fabrics.
17. Refer to Figure 5-2.
Figure 5-2 Tape library partitioning
Figure 5-2 shows HPE StoreEver tape library partitioning for concurrent access from applications A and B. Which software from HPE must be used to achieve this configuration? a. StoreOpen Enterprise b. StoreServ Remote Copy c. Command View for Tape Libraries d. StoreAll backup to StoreEver tape storage with Symantec NetBackup
Using the partitioning feature in the tape library management console and HPE Command View for Tape Libraries, each backup application or department is presented with a logical library made up of a subset of drives and slots from the physical library. For more information, see Chapter 4, section HPE StoreEver tape library partitioning.
18. Refer to Figure 5-3.
Figure 5-3 HPE StoreServ File Persona Share Backup configuration
Which highlighted component within the HPE StoreServ File Persona Share Backup configuration is responsible for using the appropriate protocol to read the data from the HPE StoreServ File Persona and writing it to the target backup device? a. Media server b. vCenter server c. ISV NDMP agent d. Gateway server The media server uses the appropriate protocol to read the data from the File Persona and writes it to the target. For more information, see Chapter 4, section HPE StoreServ File Persona Share Backup.
19. What is the target use case for HPE StoreOnce Catalyst remote site replication? a. Data protection of StoreAll Gateways b. Transfer of data from a branch office to the main data center c. Automated management of local backup and offsite backup copies
d. Disaster tolerance, business continuity, and long-distance replication The typical usage of HPE StoreOnce Catalyst remote site replication is transferring data from a branch office to the main data center or copying data between sites. For more information, see Chapter 4, section HPE StoreOnce Catalyst remote site replication.
20. What does the NDMP protocol achieve in backup and disaster recovery environments? (Select two.) a. It removes the need to transport data through the backup server. b. It enables long-distance replication through incompatible gateways. c. It securely transports data encryption keys over an unprotected LAN. d. It enables automatic data archiving from StoreServ arrays to StoreEver without using specialized archival software. e. It controls data backup and recovery communications between the primary and secondary storage in a heterogeneous environment. NDMP is an open protocol used to control data backup and recovery communications between the primary and secondary storage in a heterogeneous network environment. The NDMP protocol transports the data between the NAS devices and the backup devices. This removes the need for transporting the data through the backup server itself, thus enhancing speed and removing load from the backup server. For more information, see Chapter 4, sections HPE StoreAll backup to HPE StoreEver tape storage with Symantec NetBackup and HPE StoreServ File Persona software NDMP backup.
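The key architectural point, that the backup server carries only control traffic while the data flows directly between the NAS device and the backup device, can be sketched conceptually. The classes and method names below are hypothetical illustrations and not the actual NDMP interfaces:

```python
# Conceptual model of an NDMP-style backup: the backup server only orchestrates
# the job; the bulk data moves directly from the NAS filer (primary storage) to
# the tape/VTL device (secondary storage), bypassing the backup server.

class NasDataServer:
    """Primary storage (NAS filer) that can push its data straight to a backup device."""
    def backup_share(self, path, device):
        for i in range(3):                                  # stands in for reading file data
            device.write(f"block-of:{path}-{i}")            # direct NAS -> backup-device data path

class TapeService:
    """Secondary storage (tape library or VTL)."""
    def __init__(self):
        self.written = []
    def write(self, block):
        self.written.append(block)

class BackupServer:
    """Carries only control traffic: it starts the job but never handles the data itself."""
    def run_job(self, nas, device, path):
        nas.backup_share(path, device)                      # control message only
        return len(device.written)

tape = TapeService()
count = BackupServer().run_job(NasDataServer(), tape, "/shares/finance")
print(count, "blocks moved directly from the NAS to the backup device")
```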