Information Storage and Management (ISM): Questions and Answers
1. Explain data proliferation. What are the effective solutions to it?

"Data proliferation" is an umbrella term for the large number of files and the amount of data stored by entities such as governments and businesses. The massive amount of data coming in daily means these entities need more space and hardware, but as of 2011 data proliferation is outpacing advances in computer hardware. It does not matter what type of information is stored, structured or unstructured; what matters is that storage capacity is being consumed. Storing all this data is difficult and leads to extra costs. A further problem is that the network on which the data is stored, and the programs associated with it, tend to slow down.

Data proliferation is not a problem that readily concerns consumers and average computer users. While average users have required more storage over time, hardware has advanced quickly enough to satisfy their needs. For businesses, governments, and other entities collecting massive amounts of data daily, however, the problem does manifest. If an average computer user needs more storage, he typically just buys a larger hard drive; when a large entity needs more storage, it typically must buy more servers. At a normal growth rate this presents no problem, but many large entities in 2011 are accumulating data at rates that outpace technology, so a massive number of servers may be needed to hold everything the entity needs to store. Computer technology cannot yet produce a single device capable of holding all the information, which means a large entity must keep buying and operating more and more hardware.

Some data problems concern only one type of information; data proliferation does not. As long as storage is consumed at a rapid rate, data proliferation is a problem regardless of the type of data involved. One of its main problems is cost: besides the cost of extra storage hardware, there are physical floor-space and staffing costs. The servers must be housed somewhere and people must be employed to run them, costs that could in theory become unsustainable for an entity and severely reduce profits. Another problem concerns network speed, because congested data paths make programs run much slower, meaning employees can do less work during a workday.

Effective solutions for data proliferation:
- Applications that better utilize modern technology.
- Reductions in duplicate data, especially duplication caused by data movement (a deduplication sketch follows this list).
- Improvement of metadata structures.
- Improvement of file and storage transfer structures.
- User education and discipline.
- Highly scalable data warehousing.
- A disaster recovery system.
- Information Lifecycle Management (ILM) solutions that eliminate low-value information as early as possible and put the rest into actively managed long-term storage, where it can be accessed quickly and cheaply.
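One of the listed solutions, reducing duplicate data, can be illustrated with a minimal content-hash deduplication sketch in Python. The file contents and the in-memory "store" are hypothetical; real deduplication products typically work at block or chunk granularity rather than whole files.

```python
import hashlib

def dedup_store(files):
    """Store each unique content blob once, keyed by its SHA-256 digest.

    Returns (index, store): index maps file name -> digest,
    store maps digest -> the single stored copy of the content.
    """
    index = {}
    store = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        index[name] = digest
        store.setdefault(digest, content)  # duplicates are stored only once
    return index, store

# Three files, two of them identical: only two blobs end up in the store.
files = {"a.txt": b"quarterly report", "b.txt": b"quarterly report", "c.txt": b"invoice"}
index, store = dedup_store(files)
print(len(files), "files ->", len(store), "stored blobs")  # 3 files -> 2 stored blobs
```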
2. Discuss the activities involved in developing an ILM strategy, and explain the benefits of an ILM strategy.

The information lifecycle is the change in the value of information over time. When data is first created, it often has the highest value and is used frequently. As data ages, it is accessed less frequently and is of less value to the organization. Understanding the information lifecycle helps in deploying the appropriate storage infrastructure for the changing value of information. For example, in a sales-order application, the value of the information changes from the time the order is placed until the time the warranty becomes void. The value is highest when a company receives a new sales order and processes it to deliver the product. After order fulfillment, the customer or order data need not be available for real-time access, so the company can move it to less expensive secondary storage with lower accessibility and availability requirements, unless or until a warranty claim or another event triggers its need. After the warranty becomes void, the company can archive or dispose of the data to create space for high-value information.

Today's businesses require data to be protected and available 24 × 7. Data centers can accomplish this with the optimal and appropriate use of storage infrastructure, and an effective information-management policy is required to support that infrastructure and leverage its benefits. Information lifecycle management (ILM) is a proactive strategy that enables an IT organization to manage data effectively throughout its lifecycle, based on predefined business policies. This allows the organization to optimize the storage infrastructure for maximum return on investment. An ILM strategy should include the following characteristics:
- Business-centric: Integrated with the key processes, applications, and initiatives of the business, to meet both current and future growth in information.
- Centrally managed: All the information assets of the business should be under the purview of the ILM strategy.
- Policy-based: The implementation of ILM should not be restricted to a few departments; it should be implemented as a policy encompassing all business applications, processes, and resources.
- Heterogeneous: The strategy should take into account all types of storage platforms and operating systems.
- Optimized: Because the value of information varies, the strategy should consider the different storage requirements and allocate storage resources based on the information's value to the business.
- Tiered storage: An approach that defines different storage levels in order to reduce total storage cost. Each tier has different levels of protection, performance, data-access frequency, and other considerations, and information is stored in and moved between tiers based on its value over time (a small policy sketch follows this list).
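To make the tiered-storage characteristic concrete, here is a minimal policy sketch in Python. The tier names and age thresholds are illustrative assumptions, not taken from any ILM product; a real policy would also weigh business value, compliance, and access patterns, not just age.

```python
from datetime import date, timedelta

# Hypothetical policy: tier boundaries in days since last access.
TIER_POLICY = [
    (30,   "tier-1: fast primary storage"),
    (365,  "tier-2: cheaper secondary storage"),
    (2555, "tier-3: archive"),  # roughly 7 years, e.g. for compliance
]

def assign_tier(last_access, today=None):
    """Return the storage tier for a record based on its last-access age."""
    today = today or date.today()
    age_days = (today - last_access).days
    for max_age, tier in TIER_POLICY:
        if age_days <= max_age:
            return tier
    return "dispose or retain per retention policy"

today = date(2016, 5, 27)
print(assign_tier(today - timedelta(days=10), today))   # tier-1: fast primary storage
print(assign_tier(today - timedelta(days=200), today))  # tier-2: cheaper secondary storage
```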
ILM benefits:
Implementing an ILM strategy has the following key benefits that directly address the challenges of information management:
- Improved utilization, by using tiered storage platforms and increasing the visibility of all enterprise information.
- Simplified management, by integrating process steps and interfaces with individual tools and by increasing automation.
- A wider range of options for backup and recovery, balancing the need for business continuity.
- Maintained compliance, by knowing what data needs to be protected and for what length of time.
- Lower total cost of ownership (TCO), by aligning infrastructure and management costs with information value. As a result, resources are not wasted, and complexity is not introduced, by managing low-value data at the expense of high-value data.
3. Describe the various components of disk physical storage.

Several types of data storage exist in most computer systems. These storage media are classified by the speed with which data can be accessed, by the cost per unit of data, and by the medium's reliability. The media typically available are cache, main memory, flash memory, magnetic disk, optical disk, and tape. The fastest media, such as cache and main memory, are referred to as primary storage. The media at the next level of the hierarchy, such as magnetic disks, are referred to as secondary storage, or online storage. The media at the lowest level of the hierarchy, such as magnetic tape and optical-disk jukeboxes, are referred to as tertiary storage, or offline storage. Disk physical storage is a non-volatile type of memory for safekeeping: it does not lose its contents when power to the device is removed.

Disk components:
- Platter
- Spindle
- Read/write (R/W) head
- Actuator arm assembly
- Controller
Platter: A typical HDD consists of one or more flat circular disks called platters, on which data is recorded in binary codes. The set of rotating platters is sealed in a case called a head disk assembly (HDA). A platter is a rigid, round disk coated with magnetic material on both surfaces (top and bottom).

Spindle: A spindle connects all the platters and is driven by a motor that rotates at a constant speed. The platters spin at several thousand revolutions per minute (rpm); common spindle speeds are 7,200, 10,000, and 15,000 rpm (the sketch after the figure converts these speeds to average rotational latency).

Read/write head: The R/W head reads and writes data from and to a platter. Drives have two R/W heads per platter, one for each surface. The head changes the magnetic polarization on the surface of the platter when writing data, and detects the magnetic polarization when reading. During reads and writes, the R/W head senses the magnetic polarization but never touches the surface of the platter.

Actuator arm assembly: The R/W heads are mounted on the actuator arm assembly, which positions them at the location on the platter where data needs to be written or read. The heads for all platters in a drive are attached to one actuator arm assembly and move across the platters simultaneously.

Controller: The controller is a printed circuit board mounted at the bottom of the disk drive. It consists of a microprocessor, internal memory, circuitry, and firmware.
Fig. A typical Disk storage system
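The spindle speeds quoted above map directly to average rotational latency: on average the head waits half a revolution for the target sector. A short sketch of that arithmetic in Python:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm:>6} rpm -> {avg_rotational_latency_ms(rpm):.2f} ms")
# 7200 rpm -> 4.17 ms, 10000 rpm -> 3.00 ms, 15000 rpm -> 2.00 ms
```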
4. What is RAID? Describe its levels.

RAID (redundant array of independent disks; originally redundant array of inexpensive disks) is a way of storing the same data in different places (thus, redundantly) on multiple hard disks. By placing data on multiple disks, I/O (input/output) operations can overlap in a balanced way, improving performance. Although an array of many disks has a lower aggregate mean time between failures (MTBF) than a single disk, storing data redundantly increases fault tolerance. A RAID set appears to the operating system as a single logical hard disk.

RAID employs the technique of disk striping, which partitions each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order. In a single-user system storing large records, such as medical or other scientific images, the stripes are typically small (perhaps 512 bytes) so that a single record spans all disks and can be accessed quickly by reading all disks at the same time. In a multi-user system, better performance requires a stripe wide enough to hold a typical or maximum-size record, which allows overlapped disk I/O across drives.

RAID levels:
- RAID-0: Striping but no redundancy of data. It offers the best performance but no fault tolerance.
- RAID-1: Also known as disk mirroring; it consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved, since either disk can be read at the same time; write performance is the same as for single-disk storage. RAID-1 provides the best performance and the best fault tolerance in a multi-user system.
- RAID-2: Striping across disks, with some disks storing error-checking and correcting information. It has no advantage over RAID-3.
- RAID-3: Striping, with one drive dedicated to storing parity information. The embedded error-checking (ECC) information is used to detect errors; data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives (see the parity sketch after the figures). Because an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O, so it is best for single-user systems with long-record applications.
- RAID-4: Large stripes, which means records can be read from any single drive, taking advantage of overlapped I/O for read operations. Since all write operations have to update the parity drive, no write overlapping is possible. RAID-4 offers no advantage over RAID-5.
- RAID-5: A rotating parity array, addressing the write limitation of RAID-4; all read and write operations can be overlapped. RAID-5 stores parity information but no redundant data (the parity information can be used to reconstruct data). It requires at least three, and usually five, disks in the array, and is best for multi-user systems in which performance is not critical or which do few write operations.
- RAID-6: Similar to RAID-5, but with a second parity scheme distributed across different drives, offering extremely high fault and drive-failure tolerance.
Fig. RAID 0
Fig. RAID 1
Fig. RAID 2
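The XOR-based recovery described for RAID-3 (and used by RAID-5) can be demonstrated in a few lines of Python. This is an illustrative stripe-level sketch, not a real RAID implementation: it assumes equally sized blocks and exactly one failed drive.

```python
def parity(blocks):
    """Byte-wise XOR parity over equally sized data blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def reconstruct(surviving, parity_block):
    """Rebuild the one missing block: XOR of the parity and all surviving blocks."""
    return parity(surviving + [parity_block])

stripe = [b"AAAA", b"BBBB", b"CCCC"]  # data blocks across three drives
p = parity(stripe)                    # stored on the parity drive
lost = stripe[1]                      # pretend drive 1 fails
assert reconstruct([stripe[0], stripe[2]], p) == lost
print("recovered:", reconstruct([stripe[0], stripe[2]], p))
```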
5. What is JBOD? Describe its architecture, features, and disadvantages.

JBOD stands for "just a bunch of disks" (or drives). The term describes a storage array in which all devices within the array are independently addressable units. JBOD is usually implemented in software, as part of the functionality of an operating system's file system. JBODs provide additional capacity but do not offer any fault tolerance in the event of a failed drive. Partitioning data across the disks and providing a layer of virtualization services is generally part of the operating system's I/O management. The controller represents the control centre of a disk subsystem; disk subsystems without controllers are called JBODs, and they provide only an enclosure and a common power supply for several hard disks.

Architecture:
Fig. JBOD architecture

The backplane provides an interconnect between I/O cards that interface with the disk drives. Each I/O card contains drive-bypass circuitry, which interfaces through separate loops with an array of disk drives.

Features of JBOD:
- More storage capacity.
- Less expensive.
- Not shared between servers.
Disadvantages of JBOD:
The disadvantage of any JBOD implementation is its lack of fault resiliency, although there are software RAID products that allow the striping of data, with recovery mechanisms included, across the drives in a JBOD enclosure. Because a JBOD simply concatenates its member drives, addressing is a straightforward mapping, as sketched below.
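Since a JBOD concatenates independently addressable drives with no striping or parity, mapping a logical block address to a (drive, local block) pair is a simple running-sum lookup. A hypothetical sketch in Python, with made-up drive sizes:

```python
def locate_block(lba, drive_sizes):
    """Map a logical block address onto (drive index, block within drive)
    for a JBOD that concatenates drives of possibly different sizes."""
    for drive, size in enumerate(drive_sizes):
        if lba < size:
            return drive, lba
        lba -= size
    raise ValueError("LBA beyond total JBOD capacity")

sizes = [1000, 500, 2000]         # blocks per drive (illustrative)
print(locate_block(0, sizes))     # (0, 0)
print(locate_block(1200, sizes))  # (1, 200)
print(locate_block(1600, sizes))  # (2, 100)
```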
6. Give the comparison between SAN, NAS, DAS, and CAS.

|                              | SAN                                       | NAS                                        | DAS                             | CAS                               |
|------------------------------|-------------------------------------------|--------------------------------------------|---------------------------------|-----------------------------------|
| Utility / application        | Provides storage for application servers  | Provides storage for heterogeneous clients | Storage is unique to one server | Provides associative storage      |
| Backup                       | Centralized                               | Centralized                                | By server over LAN              | Centralized                       |
| Nature of information        | Block                                     | File                                       | File                            | Fixed content                     |
| Data lifecycle               | Contents are created actively             | Contents are created actively              | Contents are created actively   | Contents are constraint preserving |
| Typical storage interconnect | Fibre Channel                             | Ethernet                                   | SCSI                            | Internet Protocol (IP)            |
| Snapshots possible           | Yes                                       | Yes                                        | No                              | Yes                               |
7. What are the different forms of virtualization? Explain each in brief.

The different forms of virtualization are:

Storage virtualization: The resources of many different network storage devices, such as hard drives, are pooled so that they appear as one large store of capacity, managed by a central system that makes everything look much simpler to the network administrators. This is also a good way to keep an eye on resources in a business, since the administrator can see exactly how much capacity is left at any given time, and it greatly reduces the hassle of tasks such as backups (a toy pooling sketch appears at the end of this answer).

Network virtualization: All of the separate resources of a network are combined, allowing the network administrator to share them out amongst the users of the network. This is done by splitting the resources' bandwidth into channels and allowing the administrator to assign them as and when required. Each user can then access all of the network resources from their computer: files and folders, printers, hard drives, and so on. This streamlined approach makes the network administrator's life much easier and makes the system seem far less complicated than it really is.

Server virtualization: This is the main area of virtualization. A number of "virtual machines" are created on one server, so that multiple tasks can be assigned to that one server, saving processing power, cost, and space. Any task running on the server still appears to be in its own separate space, so errors can be diagnosed and fixed quickly.
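As promised above, a toy Python illustration of the storage-pooling idea: several backing devices are presented as one logical pool, and a simple allocator carves volumes out of whichever devices have space. The device names, sizes, and allocation strategy are all invented for illustration.

```python
class StoragePool:
    """Toy storage virtualization: pool capacity from many devices and hand
    out logical volumes without the caller knowing which device backs them."""

    def __init__(self, devices):
        self.free = dict(devices)  # device name -> free GB

    @property
    def total_free(self):
        return sum(self.free.values())

    def allocate(self, volume_gb):
        """Carve a logical volume out of whichever devices have space."""
        if volume_gb > self.total_free:
            raise ValueError("pool exhausted")
        extents = []
        for dev in self.free:
            take = min(self.free[dev], volume_gb)
            if take:
                self.free[dev] -= take
                extents.append((dev, take))
                volume_gb -= take
            if volume_gb == 0:
                break
        return extents

pool = StoragePool({"array-a": 100, "array-b": 40})
print(pool.allocate(120))  # e.g. [('array-a', 100), ('array-b', 20)]
print(pool.total_free)     # 20
```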
8. Explain the Common Information Model (CIM).

The Common Information Model (CIM) is a computer-industry standard for defining device and application characteristics so that system administrators and management programs can control devices and applications from different manufacturers or sources in the same way. For example, a company that purchased different kinds of storage devices from different companies would be able to view the same kind of information about each of them (such as device name and model, serial number, capacity, network location, and relationship to other devices or applications), or to access that information from a program.

CIM takes advantage of the Extensible Markup Language (XML): hardware and software makers choose one of several defined XML schemas (information structures) to supply CIM information about their product. CIM was developed by an industry group, the Distributed (formerly Desktop) Management Task Force (DMTF), as part of an initiative called Web-Based Enterprise Management (WBEM). CIM is intended to be more comprehensive than the earlier models now in use, the Simple Network Management Protocol (SNMP) and the Desktop Management Interface (DMI). With CIM, relationship information (what is connected to what) can be used to help trace the source and status of problems.
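Because CIM information is exchanged in XML, a small Python sketch can show the idea of reading uniform device properties regardless of vendor. The element and attribute names below are simplified stand-ins invented for this example, not the actual DMTF CIM-XML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical CIM-style description of two storage devices
# from different vendors; real CIM-XML uses the DMTF-defined schema.
doc = """
<devices>
  <device name="disk0" model="VendorA 500" serial="A123" capacityGB="500"/>
  <device name="disk1" model="VendorB X2"  serial="B456" capacityGB="2000"/>
</devices>
"""

for dev in ET.fromstring(doc).iter("device"):
    # The same code reads the same properties for every vendor's device.
    print(dev.get("name"), dev.get("model"), dev.get("serial"),
          dev.get("capacityGB"), "GB")
```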
9. Explain Cloud Computing architecture.
Fig. Cloud Computing Reference Architecture Overview
A reference architecture (RA) provides a blueprint of a to-be model with a well-defined scope, the requirements it satisfies, and the architectural decisions it realizes. The aim is the definition of a single Cloud Computing Reference Architecture that enables cloud-scale economics in delivering cloud services, optimizes resource and labor utilization, and delivers a design blueprint for:

- Cloud services offered to customers
- Private, public, or hybrid cloud projects
- Workload-optimized systems
- Managing multiple cloud services (across IaaS, PaaS, SaaS, and BPaaS) on the same common management platform, enabling economies of scale

Roles: The cloud computing reference architecture defines three main roles: cloud service consumer, cloud service provider, and cloud service creator.

Cloud service consumer: An organization, a human being, or an IT system that consumes service instances delivered by a particular cloud service.

Cloud service provider: Responsible for providing cloud services to cloud service consumers.

Cloud service creator: Responsible for creating a cloud service, which can be run by a cloud service provider and thereby exposed to cloud service consumers (a minimal sketch of the interplay follows).
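A minimal Python sketch of how the three roles interact; the class and function names are hypothetical, invented purely to illustrate the division of responsibility.

```python
class CloudServiceProvider:
    """Runs services built by creators and hands out instances to consumers."""

    def __init__(self):
        self.catalog = {}

    def publish(self, name, service):
        self.catalog[name] = service  # creator hands a service to the provider

    def provision(self, name):
        return self.catalog[name]()   # consumer requests a service instance

# Cloud service creator: builds the service the provider will run.
def build_vm_service():
    return "vm-instance-001"

provider = CloudServiceProvider()
provider.publish("compute", build_vm_service)

# Cloud service consumer: requests an instance from the provider.
print(provider.provision("compute"))  # vm-instance-001
```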
10. Explain the following terms: 1. cloud bursting; 2. cloud-oriented architecture; 3. cloud-service architecture; 4. cloud spanning; 5. cloudware.

Cloud bursting: An application deployment model in which an application runs in a private cloud or data center and "bursts" into a public cloud when the demand for computing capacity spikes. The advantage of such a hybrid cloud deployment is that an organization pays for extra compute resources only when they are needed. Experts recommend cloud bursting for high-performance, non-critical applications that handle non-sensitive information. An application can be deployed locally and then burst to the cloud to meet peak demands, or the application can be moved to the public cloud to free up local resources for business-critical applications. Cloud bursting works best for applications that do not depend on a complex application-delivery infrastructure or on integration with other applications, components, and systems internal to the data center. When considering cloud bursting, an organization must consider security and regulatory-compliance requirements. For example, cloud bursting is often cited as a viable option for retailers that experience peaks in demand during the holiday shopping season; however, cloud computing service providers do not necessarily offer a PCI DSS-compliant environment, and retailers could be putting sensitive data at risk by bursting it to the public cloud.

Cloud-oriented architecture: A cloud-oriented architecture (COA) is a conceptual model encompassing all elements in a cloud environment. In information technology, architecture refers to the overall structure of an information system and the interrelationships of the entities that make up that system. A COA is related to both service-oriented architectures (SOA) and event-driven architectures (EDA), and is a combination of two other architectural models: the resource-oriented architecture (ROA) and the hypermedia-oriented architecture (HOA). A ROA is based on the idea that any entity that can be assigned a uniform resource identifier (URI) is a resource; resources therefore include not only infrastructure elements such as servers, computers, and other devices, but also web pages, scripts, JSP/ASP pages, and other entities such as (to take one example among many) traffic lights. Hypermedia extends the notion of the hypertext link to links among any set of multimedia objects, including sound, video, and virtual reality.

Cloud-service architecture: A term coined by Jeff Barr, chief evangelist at Amazon Web Services, describing an architecture in which applications and application components act as services on the cloud, serving other applications within the same cloud environment.

Cloud spanning: Running an application in such a way that its components straddle multiple cloud environments, which could be any combination of internal/private and external/public clouds.
This differs from cloud bursting, which refers strictly to expanding the application into an external cloud to handle spikes in demand (a small decision sketch follows).

Cloudware: A general term referring to a variety of software, typically at the infrastructure level, that enables building, deploying, running, or managing applications in a cloud computing environment.
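As noted above, the cloud-bursting decision itself is easy to express. The following Python sketch assumes a hypothetical provision_public_capacity() call and an arbitrary private-cloud capacity, since the real API and thresholds depend entirely on the provider and the application.

```python
PRIVATE_CAPACITY = 100  # requests/sec the private cloud can absorb (assumed)

def provision_public_capacity(extra_load):
    # Placeholder: a real deployment would call the public-cloud provider's API here.
    print(f"bursting {extra_load} req/s to the public cloud")

def route(load):
    """Serve from the private cloud; burst any overflow to the public cloud."""
    if load <= PRIVATE_CAPACITY:
        print(f"serving {load} req/s privately")
    else:
        print(f"serving {PRIVATE_CAPACITY} req/s privately")
        provision_public_capacity(load - PRIVATE_CAPACITY)

route(60)   # normal day: stays private
route(250)  # holiday-season spike: overflow bursts to the public cloud
```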