NOORUL ISLAM COLLEGE OF ENGINEERING, KUMARACOIL
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS64 - Advanced Computer Architecture
2 & 16 Marks Question Answers
S6 BE (CSE) - AU

Prepared by
R. Suji Pramila, Lecturer/CSE, NIU

UNIT I

1. What is Instruction Level Parallelism?
The technique of overlapping the execution of instructions to improve performance is called Instruction Level Parallelism (ILP).

2. What are the approaches to exploit ILP?
The two separable approaches to exploit ILP are:
• Dynamic, or hardware-intensive, approach
• Static, or compiler-intensive, approach

3. What is pipelining?
Pipelining is an implementation technique whereby multiple instructions are overlapped in execution when they are independent of one another.

4. Write down the formula to calculate the pipeline CPI.
The CPI (cycles per instruction) of a pipelined processor is the ideal pipeline CPI plus all contributions from stalls:
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls.
(A worked example appears after question 6.)

5. What is loop-level parallelism?
Loop-level parallelism is a way to increase the amount of parallelism available among instructions by exploiting parallelism among the iterations of a loop.

6. Give the methods used to enhance the performance of ILP.
To obtain substantial performance enhancements, ILP across multiple basic blocks is exploited using:
• Loop-level parallelism
• Vector instructions
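
The pipeline CPI formula from question 4 can be applied directly. Below is a minimal C sketch; the stall contributions per instruction are assumed, illustrative values, not figures from the syllabus.

#include <stdio.h>

int main(void) {
    double ideal_cpi        = 1.0;   /* ideal pipeline CPI                   */
    double structural_stall = 0.10;  /* assumed stall cycles per instruction */
    double data_stall       = 0.25;
    double control_stall    = 0.15;

    double pipeline_cpi = ideal_cpi + structural_stall + data_stall + control_stall;
    printf("Pipeline CPI = %.2f\n", pipeline_cpi);   /* prints 1.50 */
    return 0;
}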

7. List out the types of dependences.
There are three different types of dependences:
• Data dependences
• Name dependences
• Control dependences

8. What is a data hazard?
A hazard is created whenever there is a dependence between instructions and they are close enough that the overlap caused by pipelining, or other reordering of instructions, would change the order of access to the operand involved in the dependence.

9. Give the classification of data hazards.
Data hazards are classified into three types, depending on the order of read and write accesses in the instructions (see the illustration below):
• RAW (Read After Write)
• WAW (Write After Write)
• WAR (Write After Read)
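
The three hazard classes from question 9 can be illustrated even at the source level. The fragment below is a hypothetical sketch; the variable names are arbitrary.

void hazards(void) {
    int a, b = 2, c = 3;

    a = b + c;   /* (1) writes a                                                  */
    b = a * 2;   /* (2) RAW: reads a written by (1) - true dependence             */
    a = c - 1;   /* (3) WAW: writes a, also written by (1) - output dependence    */
    c = b + 4;   /* (4) WAR: writes c, which (3) reads - antidependence           */
}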

10. List out the constraints imposed by control dependences.
The two constraints imposed by control dependences are:
• An instruction that is control dependent on a branch cannot be moved before the branch, so that its execution is no longer controlled by the branch.
• An instruction that is not control dependent on a branch cannot be moved after the branch, so that its execution would become controlled by the branch.

11. What are the properties used for preserving control dependence?
Control dependence is preserved by two properties in a simple pipeline:
• Instructions execute in program order.
• Detection of control or branch hazards.

12. Define dynamic scheduling.
Dynamic scheduling is a technique in which the hardware rearranges the instruction execution to reduce stalls while maintaining data flow and exception behavior.

13. List the advantages of dynamic scheduling.
• It handles dependences that are unknown at compile time.
• It simplifies the compiler.
• It allows speculation techniques to be used to improve performance.

14. What is scoreboarding?
Scoreboarding is a technique that allows instructions to execute out of order when there are sufficient resources and no data dependences. It cannot eliminate WAW and WAR hazards; instructions must stall until those hazards are cleared.

15. What are the advantages of Tomasulo's approach?
• Distribution of the hazard detection logic
• Elimination of WAR and WAW hazards

16. What are the types of branch prediction?
There are two types of branch prediction:
• Dynamic branch prediction
• Static branch prediction

17. Define Amdahl's Law.
Amdahl's Law states that the performance improvement gained by enhancing some portion of a computer is limited by the fraction of time the enhancement can be used:
Overall speedup = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced).

18. What are the structures used in dynamic branch prediction?
Dynamic branch prediction uses two structures (see the sketch below):
• Branch prediction buffer
• Branch history table
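
A minimal C sketch of one 2-bit saturating counter, the building block of the branch-prediction buffer mentioned in question 18. The state encoding (0-3) and update rule follow the usual textbook scheme; this is an illustration, not a description of any particular processor.

#include <stdbool.h>

typedef unsigned char counter2_t;        /* holds the values 0..3 */

bool predict_taken(counter2_t c) {
    return c >= 2;                       /* 2,3 -> predict taken; 0,1 -> predict not taken */
}

counter2_t update(counter2_t c, bool actually_taken) {
    if (actually_taken)
        return (c < 3) ? c + 1 : 3;      /* move toward "taken", saturate at 3     */
    else
        return (c > 0) ? c - 1 : 0;      /* move toward "not taken", saturate at 0 */
}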

19. Define correlating branch prediction.
Branch prediction that uses the behavior of other branches to make a prediction is called correlating branch prediction.

20. What are the basic ideas of pipeline scheduling?
The basic ideas of pipeline scheduling are:
• To keep the pipeline full: find sequences of unrelated instructions that can be overlapped in the pipeline.
• To avoid pipeline stalls: separate a dependent instruction from its source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction.

21. What are the four fields of the ROB?
The ROB contains four fields:
• Instruction type
• Destination field
• Value field
• Ready field

22. What is a reservation station?
In Tomasulo's scheme, register renaming is provided by the reservation stations. The basic idea is that a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register.

23. What is the ROB?
ROB stands for reorder buffer. It supplies operands in the interval between the completion of instruction execution and instruction commit. The ROB is similar to the store buffer in Tomasulo's algorithm.

24. What is an imprecise exception?
An exception is imprecise if the processor state when the exception is raised does not look exactly as it would if the instructions were executed sequentially in strict program order.

25. What are the two possibilities for imprecise exceptions?
• The pipeline may have already completed instructions that are later in program order than the instruction causing the exception.
• The pipeline may not yet have completed some instructions that are earlier in program order than the instruction causing the exception.

26. What are the two main features preserved by maintaining both data and control dependences?
• Exception behavior
• Data flow

27. What are the types of name dependence?
• Antidependence
• Output dependence

28. What is an antidependence?
An antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved to ensure that i reads the correct value.

29. What is an output dependence?
An output dependence occurs when instruction i and instruction j write the same register or memory location. The ordering between the instructions must be preserved to ensure that the value finally written corresponds to instruction j.

30. What is register renaming?
Renaming of register operands is called register renaming. It can be done either statically by the compiler or dynamically by the hardware. (See the sketch below.)
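
A small illustrative sketch of register renaming (question 30): the statement that reuses the name r1 creates WAW and WAR hazards, and renaming its result to a fresh name t1 removes them. The variable names are hypothetical and stand in for registers.

void renaming(void) {
    int r1, r2 = 2, r3 = 3, r4, r5 = 5, r6 = 1, r7;

    /* Before renaming: (3) reuses the name r1, creating a WAW hazard with (1)
     * and a WAR hazard with (2).                                              */
    r1 = r2 + r3;   /* (1)                              */
    r4 = r1 * 2;    /* (2) RAW on r1                    */
    r1 = r5 - r6;   /* (3) WAW with (1), WAR with (2)   */
    r7 = r1 + 1;    /* (4) RAW on the new value of r1   */

    /* After renaming: (3) writes a fresh name t1, so only true (RAW)
     * dependences remain and (3')/(4') can move ahead of (1)/(2).             */
    int t1;
    t1 = r5 - r6;   /* (3') */
    r7 = t1 + 1;    /* (4') */
    (void)r4; (void)r7;   /* silence unused-value warnings */
}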

UNIT 2

1. Define VLIW.
VLIW is a technique for exploiting ILP by executing instructions without dependences in parallel. The compiler analyses the program and detects operations to be executed in parallel; such operations are packed into one "large" instruction.

2. List out the advantages of VLIW processors.
• Simple hardware: the number of functional units can be increased without needing additional sophisticated hardware to detect parallelism, as in superscalars.
• Good compilers can detect parallelism based on global analysis of the whole program.

3. Define EPIC.
• EPIC stands for Explicitly Parallel Instruction Computing.
• It is an architecture framework proposed by HP.
• It is based on VLIW and was designed to overcome the key limitations of VLIW while giving more flexibility to compiler writers.

4. What is loop-level analysis?
Loop-level analysis involves determining what dependences exist among the operands in a loop across its iterations, i.e., whether the values used in one iteration are data dependent on values produced in earlier iterations.

5. What are the types of data dependences in loops?
• Loop-carried dependences
• Not-loop-carried dependences

6. What is a loop-carried dependence?
A data dependence between different loop iterations (data produced in an earlier iteration is used in a later one) is called a loop-carried dependence. (See the example after question 10.)

7. What are the tasks in finding the dependences in a program?
There are 3 tasks:
• Obtain good scheduling of code
• Determine which loops might contain parallelism
• Eliminate name dependences

8. Define dependence analysis algorithm.
A dependence analysis algorithm is used by the compiler to detect dependences, based on the assumptions that:
• The array indices are affine
• The GCD test can be applied to the two affine indices: a dependence can exist only if the GCD of the index coefficients divides the difference of the constant terms

9. What is copy propagation?
Copy propagation is an optimization that eliminates operations that merely copy values; together with algebraic simplification of expressions, it is used to eliminate dependent computations.

10. What is the tree-height reduction technique?
Tree-height reduction is an optimization which reduces the height of the tree structure representing a computation, making it wider but shorter.
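
Question 6 can be illustrated with the following C sketch, which contrasts a loop with a loop-carried dependence against one whose iterations are independent. The array names and sizes are assumed.

#define N 100
double a[N + 1], b[N], c[N];

void carried(void) {
    /* a[i+1] uses a[i] produced in the previous iteration: a loop-carried
     * dependence, so the iterations cannot run in parallel.               */
    for (int i = 0; i < N; i++)
        a[i + 1] = a[i] + c[i];
}

void not_carried(void) {
    /* each iteration touches only its own elements, so all iterations are
     * independent and could execute in parallel.                          */
    for (int i = 0; i < N; i++)
        b[i] = a[i] + c[i];
}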

11. What are the components of a software-pipelined loop?
• A software-pipelined loop consists of a loop body, start-up code and clean-up code.
• Start-up code executes the instructions left out from the first original loop iterations.
• Clean-up (finish) code executes the instructions from the last original iterations.

12. What is trace scheduling?
Trace scheduling is a way to organize the process of global code motion; it simplifies instruction scheduling by incurring the cost of possible code motion on the less critical paths.

13. List out the steps used in trace scheduling.
• Trace selection
• Trace compaction

14. Define interprocedural analysis.
Analysis of a procedure (for example, one with pointer parameters) across the boundaries of that procedure is called interprocedural analysis.

15. What is software pipelining?
It is a technique for reorganizing a loop such that each iteration of the resulting code is made from instructions chosen from different iterations of the original loop.

16. Define critical path.
The critical path is the longest sequence of dependent instructions in a program.

17. Define the IA-64 processor.
The IA-64 is a RISC-style, register-register instruction set with features designed to support compiler-based exploitation of ILP.

18. What is the CFM and what is its use?
• CFM stands for Current Frame Marker.
• The CFM points to the set of registers to be used by a given procedure.

19. What are the parts of the register frame described by the CFM?
There are two parts:
• Local area - used for local storage
• Output area - used to pass values to any called procedure

20. What is the Itanium processor?
The Itanium processor is an implementation of the Intel IA-64 architecture. It is capable of up to 6 issues per clock cycle; these 6 issues include up to 3 branches and 2 memory references.

21. What are the parts of the 10-stage pipeline in the Itanium processor?
• Front end
• Instruction delivery (EXP, REN)
• Operand delivery (WLD, REG)
• Execution (EXE, DET, WRB)

22. What are the limitations of ILP?
• Limitations of the hardware model
• Limitations on window size and maximum issue count
• Effects of finite registers
• Effects of imperfect alias analysis

23. List the two techniques for eliminating dependent computations.
• Software pipelining
• Trace scheduling

24. Define trace selection and trace compaction.
Trace selection: tries to find a likely sequence of basic blocks whose operations will be put into a small number of instructions; this sequence is called a trace.
Trace compaction: tries to squeeze the trace into a small number of wide instructions. Trace compaction is code scheduling; it attempts to move operations as early as it can in the sequence, packing the operations into as few wide instructions as possible.

25. Define superblocks.
Superblocks are formed by a process similar to that used for traces, but are a form of extended basic block, restricted to a single entry point but allowing multiple exits.

26. Use of conditional or predicated instructions.
Conditional or predicated instructions are used to eliminate branches, converting control dependences into data dependences and potentially improving performance.

27. Define instruction group.
An instruction group is a sequence of consecutive instructions with no register data dependences among them. All the instructions in a group could be executed in parallel if sufficient hardware resources existed and if any dependences through memory were preserved.

28. Use of the template field in a bundle.
The 5-bit template field within each bundle describes both the presence of any stops associated with the bundle and the execution unit type required by each instruction within the bundle.

29. List the two types of speculation supported by the IA-64 processor.
• Control speculation
• Memory reference speculation

30. Define advanced loads.
Memory reference speculation in the IA-64 uses a concept called advanced loads. An advanced load is a load that has been speculatively moved above store instructions on which it is potentially dependent. To speculatively perform a load, the ld.a instruction is used.

31. Define the ALAT.
Executing an advanced load creates an entry in a special table called the ALAT (Advanced Load Address Table). It stores both the register destination of the load and the address of the accessed memory location. When a store is executed, an associative look-up against the active ALAT entries is performed; if there is an ALAT entry with the same memory address as the store, that ALAT entry is marked invalid.

32. What are the functional units in the Itanium processor?
There are nine functional units in the Itanium processor:
• Two I units
• Two M units
• Three B units
• Two F units
All the functional units are pipelined.

33. Define the scoreboard in the Itanium pipeline.
The Itanium's 10-stage pipeline is divided into 4 parts. In the operand delivery part, a scoreboard is used to detect when individual instructions can proceed, so that a stall of one instruction in a bundle need not cause the entire bundle to stall.

34. Define bookkeeping code.
When global code motion (as in trace scheduling) moves an instruction across a point where a basic block is entered or exited, compensation code must be inserted on the other paths to preserve correct behavior; this code is known as bookkeeping code.

Unit-3

1. Define the cache coherence problem.
The cache coherence problem describes how two different processors can have two different values for the same memory location.

2. What are the two aspects of the cache coherence problem?
i. Coherence - determines what value can be returned by a particular read operation.
ii. Consistency - determines when a written value may be returned by a read operation.

3. What are the two types of cache coherence protocol?
i. Directory-based protocol
ii. Snooping protocol

4. Define directory-based protocol.
The sharing status of each block of main memory is kept in one common place called the directory. From this directory the status and data of a block can be retrieved.

5. Name the different types of snooping protocol.
i. Write-invalidate protocol
ii. Write-update (write-broadcast) protocol

6. Difference between write-update and write-invalidate protocols.
Write update:
i. Multiple write broadcasts are required, one per write.
ii. Updates are applied to the individual words of a cache block.
iii. Access time is lower.
Invalidate:
i. Only one invalidation is required.
ii. Invalidation is performed on the entire cache block.
iii. Access time is higher.

7. What are the different types of access in a distributed shared-memory architecture?
i. Local: if the processor references its local memory, it is called a local access.
ii. Remote: if the processor references the memory of another processor, it is called a remote access.

8. What are the disadvantages of remote access?
• Compiler mechanisms for cache coherence are very limited.
• Without the cache coherence property, the multiprocessor loses the advantage of fetching and using the multiple words of a cache block.
• Prefetching is useful only when the multiprocessor can fetch multiple words.

9. What are the states available in the directory-based protocol?
i. Shared: one or more processors have copies of the data block.
ii. Uncached: no processor has a copy of the data block.
iii. Exclusive: exactly one processor has a copy of the data block.

10. What are the nodes available in a distributed system?
i. Local node
ii. Home node
iii. Remote node

11. Define synchronization.
Synchronization mechanisms are typically built with user-level software routines that rely on hardware-supplied synchronization instructions.

12. Name the basic hardware primitives.
i. Atomic exchange
ii. Test-and-set
iii. Fetch-and-increment

13. Define spin lock.
A spin lock is a lock that a processor continuously tries to acquire, spinning around a loop until it succeeds. It is mainly used when the programmer wants to hold the lock for only a short period of time. (A sketch appears after question 20.)

14. What are the mechanisms used to implement locks?
There are two methods to implement locks:
i. Implementing locks without using cache coherence
ii. Implementing locks using cache coherence

15. What are the advantages of using spin locks?
There are two advantages of using spin locks:
i. They have low overhead.
ii. Performance is high.

16. Name the synchronization mechanisms for large-scale multiprocessors.
i. Exponential back-off
ii. Queuing locks
iii. Combining tree

17. What are the two primitives used for implementing synchronization?
• Lock-based implementation
• Barrier-based implementation

18. Define sequential consistency.
Sequential consistency requires that the result of any execution be the same as if the memory accesses executed by each processor were kept in order and the accesses among different processors were interleaved. It reduces the possibility of incorrect execution.

19. Define multithreading.
Multithreading is the process of executing multiple threads on a common processor sharing a common memory, with their execution overlapped.

20. What are the types of multithreading?
i. Fine-grained multithreading: switches between threads on each instruction.
ii. Coarse-grained multithreading: switches threads only on costly stalls.
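
A minimal sketch of the test-and-set spin lock described in questions 12-13, written with C11 atomics. It is illustrative only; real implementations usually add back-off and other refinements.

#include <stdatomic.h>

typedef atomic_flag spinlock_t;          /* clear = free, set = held              */
/* usage: spinlock_t my_lock = ATOMIC_FLAG_INIT; lock(&my_lock); ... unlock(&my_lock); */

void lock(spinlock_t *l) {
    /* atomic test-and-set returns the previous value; spin until it was clear */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;                                /* busy-wait (spin) */
}

void unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(l, memory_order_release);
}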

Unit-4

1. Define cache.
Cache is the name given to the first level of the memory hierarchy encountered once the address leaves the CPU. The term is also applied more generally, e.g. file caches and name caches.

2. On what factors does the cache miss penalty depend?
The time required to handle a cache miss depends on both:
• Latency
• Bandwidth

3. What is the principle of locality?
A program accesses a relatively small portion of its address space at any instant of time; this is called the principle of locality.

4. What are pages?
The address space is usually broken into fixed-size blocks, called pages. Each page resides either in main memory or on disk.

5. What are memory stall cycles?
The number of cycles during which the CPU is stalled waiting for a memory access is called the memory stall cycles.

6. Write down the formula for calculating average memory access time.
Average memory access time = Hit time + Miss rate x Miss penalty,
where hit time is the time to hit in the cache. The formula can help us decide between split caches and a unified cache. (A worked example is given after question 9.)

7. What are the techniques to reduce the miss rate?
• Larger block size
• Larger caches
• Higher associativity
• Way prediction and pseudo-associative caches
• Compiler optimizations

8. What are the techniques to reduce hit time?
• Small and simple (direct-mapped) caches
• Avoiding address translation during indexing of the cache
• Pipelined cache access
• Trace caches

9. List out the types of storage devices.
• Magnetic storage: disk, floppy, tape
• Optical storage: compact discs (CD), digital versatile discs (DVD)
• Electrical storage: flash memory
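
A worked example of the average memory access time formula from question 6. The hit time, miss rate and miss penalty below are assumed figures chosen only for illustration.

#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* clock cycles                     */
    double miss_rate    = 0.05;   /* 5% of accesses miss              */
    double miss_penalty = 100.0;  /* clock cycles to service a miss   */

    double amat = hit_time + miss_rate * miss_penalty;   /* 1 + 0.05 * 100 */
    printf("Average memory access time = %.1f cycles\n", amat);  /* 6.0 */
    return 0;
}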

10. What is the sequence recorded on a magnetic disk?
The sequence recorded on the magnetic media is a sector number, a gap, the information for that sector including an error correction code, a gap, the sector number of the next sector, and so on.

11. What is termed a cylinder?
The term cylinder is used to refer to all the tracks under the arms at a given point on all surfaces.

12. List the components of a disk access.
There are three mechanical components to a disk access:
• Rotational latency
• Transfer time
• Seek time
(A worked example is given after question 19.)

13. What is average seek time?
Average seek time is the sum of the times for all possible seeks divided by the number of possible seeks. Average seek times are advertised to be 5 ms to 12 ms.

14. What is transfer time?
Transfer time is the time it takes to transfer a block of bits, typically a sector, under the read-write head. This time is a function of the block size, disk size, rotation speed, recording density of the track, and the speed of the electronics connecting the disk to the computer.

15. Write the formula to calculate the CPU execution time.
CPU execution time = (CPU clock cycles + Memory stall cycles) x Clock cycle time.

16. Write the formula to calculate the CPU time.
CPU time = (CPU execution clock cycles + Memory stall clock cycles) x Clock cycle time.

17. Define miss penalty for an out-of-order execution processor.
For an out-of-order execution processor, the memory stall cycles are defined as follows:
Memory stall cycles / Instruction = (Misses / Instruction) x (Total miss latency - Overlapped miss latency).

18. What are the techniques available to reduce cache miss penalty or miss rate via parallelism?
The three techniques that overlap memory accesses with the execution of instructions are:
1. Non-blocking caches to reduce stalls on cache misses (to match out-of-order processors)
2. Hardware prefetching of instructions and data
3. Compiler-controlled prefetching

19. How are the conflict misses divided?
The four divisions of conflict misses (by associativity) are:
• Eight-way
• Four-way
• Two-way
• One-way
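
Questions 12-14 list the components of a disk access; the sketch below adds them up for one access. All figures (seek time, RPM, transfer rate, sector size, controller overhead) are assumed, illustrative values.

#include <stdio.h>

int main(void) {
    double avg_seek_ms   = 9.0;                     /* assumed average seek time     */
    double rpm           = 7200.0;
    double rot_latency   = 0.5 * 60000.0 / rpm;     /* half a rotation, in ms        */
    double transfer_MBps = 40.0;
    double sector_KB     = 0.5;                     /* 512-byte sector               */
    double transfer_ms   = sector_KB / 1024.0 / transfer_MBps * 1000.0;
    double controller_ms = 0.1;                     /* assumed controller overhead   */

    double access_ms = avg_seek_ms + rot_latency + transfer_ms + controller_ms;
    printf("Average disk access time = %.2f ms\n", access_ms);
    return 0;
}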

 Two way  One way 20. List the advantage of memory hierarchy? Memory hierarchy takes advantageof a.locality b.cost/performance of memory technologies 22. What is the goal of memory hierarchy? The goal is to provide a memory system with *cost almost as low as the cheapest level of memory *speed almost as fast as the faster level 23. Define cache hit ? When the cpu finds a requests data item in the cache, it is called a cache hit. *Hit Rate: the fraction of cache access found in the cache *Hit Time: time to access the upperlevel which consists of RAM access time+Time to determine hit\miss 24.Define cache miss? When the cpu doesnot find a data item it needs in the cache, a cache miss occurs *Miss Rate-1-(Hit Rate) *Miss penalty-Time to replace a block in cache +time to deliver the block to the processor 25. What does Latency and Bandwidth determine? -Latency determine the time to retrieve the first word of the block -Bandwidth determine the time to retrieve the rest of this block 26. What are the types of locality? *Temporal locality(Locality in time) *Spatial locality(Locality in space)

27. How does a page fault occur?
When the CPU references an item within a page that is not present in the cache or main memory, a page fault occurs, and the entire page is moved from the disk to main memory.

28. What is called the miss penalty?
The number of memory stall cycles depends on both the number of misses and the cost per miss, which is called the miss penalty.

29. What is average memory access time?
For processors with in-order execution, the average memory access time is the better measure of memory hierarchy performance.

30. What are the categories of cache misses (the 3 Cs)?
* Compulsory
* Capacity
* Conflict

31. What are the techniques to reduce miss penalty?
* Multi-level caches
* Critical word first and early restart
* Giving priority to read misses over writes
* Merging write buffers
* Victim caches

UNIT-5

1) What is the function of the Power Processor Unit?
The Power Processor Unit (PPU) contains:
*A full set of 64-bit PowerPC registers.
*32 128-bit vector multimedia registers.
*A 32 KB L1 data cache.
*A 32 KB L1 instruction cache.

2) List out the disadvantages of heterogeneous multi-core processors.
*Developer productivity.
*Portability.
*Manageability.

3) Define software multithreading.
Software multithreading is software that is aware of more than one core/processor and can use them to complete multiple tasks simultaneously. (See the sketch after question 8.)

4) Define hardware multithreading.
Hardware multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.

5) Difference between software and hardware multithreading.
*Multithreading (computer architecture) refers to multithreading in hardware.
*Thread (computer science) refers to multithreading in software.

6) List some advantages of software multithreading.
*Increased responsiveness and worker productivity - application responsiveness increases when different tasks run in parallel.
*Improved performance in parallel environments - when running computations on multiple processors.
*More computations per cubic foot of data center - web-based applications are often multithreaded in nature.

7) List out the two approaches to hardware multithreading.
The two main approaches to hardware multithreading are:
*Fine-grained multithreading.
*Coarse-grained multithreading.

8) Define Simultaneous Multithreading (SMT).
SMT is a variation on multithreading that uses the resources of a multiple-issue, dynamically scheduled processor to exploit thread-level parallelism at the same time it exploits ILP, i.e., it converts thread-level parallelism into more ILP.
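
A minimal sketch of software multithreading (questions 3 and 6) using POSIX threads: two worker threads run concurrently and the main thread waits for both. The work itself is a trivial placeholder; compile with -pthread.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld doing its share of the work\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);   /* run two tasks            */
    pthread_create(&t2, NULL, worker, (void *)2L);   /* simultaneously           */
    pthread_join(t1, NULL);                          /* wait for both to finish  */
    pthread_join(t2, NULL);
    return 0;
}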

9) Give the features exploited by SMT.
SMT exploits the following features of modern processors:
*Multiple functional units - modern processors typically have more functional units available than a single thread can utilize.
*Register renaming and dynamic scheduling - multiple instructions from independent threads can co-exist and co-execute.

10) What are the design challenges of SMT?
The design challenges of an SMT processor include the following:
*Larger register files are needed to hold multiple contexts.
*Not affecting the clock cycle time.
*Instruction issue - more candidate instructions need to be considered.
*Instruction completion - choosing which instructions to commit may be challenging.
*Ensuring that cache and TLB conflicts generated by SMT do not degrade performance.

11) Compare the SMT processor with the base superscalar processor.
The SMT processor is compared to the base superscalar processor in several key measures:
*Utilization of functional units.
*Utilization of fetch units.
*Accuracy of branch predictors.
*Hit rates of primary caches.
*Hit rates of secondary caches.

12) List the factors that limit issue slot usage.
The issue slot usage is limited by the following factors:
*Imbalances in resource needs.
*Resource availability over multiple threads.
*Number of active threads considered.
*Finite limitations of buffers.
*Ability to fetch enough instructions from multiple threads.

13) Define multi-core microprocessor.
A multi-core microprocessor is one that combines two or more separate processor cores in one package.

14) What is a heterogeneous multi-core processor?
A heterogeneous multi-core processor is a processor in which multiple cores of different types are implemented in one CPU.

15) List out the advantages of heterogeneous multi-core processors.
*Massive parallelism.
*Specialization of hardware for tasks.

16) List out the disadvantages of heterogeneous multi-core processors.
*Developer productivity.
*Portability.
*Manageability.

17) What is the IBM Cell processor?
The IBM Cell processor is a heterogeneous multi-core processor composed of a control-intensive processor core and compute-intensive SIMD processor cores, each with its own distinguishing features.

18) List the components of the IBM Cell architecture.
*Power Processor Element (PPE).
*Synergistic Processor Elements (SPE).
*I/O controller.
*Element Interconnect Bus (EIB).

19) What are the components of the PPE?
The PPE is made up of two main units:
1. Power Processor Unit (PPU)
2. Power Processor Storage Subsystem (PPSS)

20) What is the Memory Flow Controller (MFC)?
The Memory Flow Controller is the interface between the Synergistic Processor Unit (SPU) and the rest of the Cell chip; specifically, the MFC interfaces the SPU with the EIB.

16 MARKS

1. Explain the concepts and challenges of Instruction-Level Parallelism.
• Define Instruction-Level Parallelism
• Data dependences and hazards
o Data dependences
o Name dependences
o Data hazards
• Control dependences

2. Explain dynamic scheduling using Tomasulo's approach.
• Explain the 3 steps:
o Issue
o Execute
o Write result
• Explain the 7 fields of a reservation station
• Figure: the basic structure of a MIPS floating-point unit using Tomasulo's algorithm

3. Explain the techniques for reducing branch costs with dynamic hardware prediction.
• Define basic branch prediction and branch-prediction buffers
• Figure: the states in a 2-bit prediction scheme
• Correlating branch predictors
• Tournament predictors: adaptively combining local and global predictors
• Figure: state transition diagram for a tournament predictor with 4 states

4. Explain in detail about hardware-based speculation.
• Define hardware speculation, instruction commit, reorder buffer
• Four steps involved in instruction execution:
o Issue
o Execute
o Write result
o Commit
• Figure: the basic structure of a MIPS FP unit using Tomasulo's algorithm, extended to handle speculation
• Multiple issue with speculation

5. Explain in detail about basic compiler techniques for exposing ILP.
• Basic pipeline scheduling and loop unrolling
• Example codes (a sketch of loop unrolling is given below)
• Using loop unrolling and pipeline scheduling with static multiple issue
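
An illustrative sketch of loop unrolling for question 5: the second routine is the first unrolled by a factor of 4, exposing independent operations that a scheduler can overlap. The array name and size are assumptions, and N is taken to be a multiple of 4 for brevity.

#define N 1000
double x[N];

void rolled(double s) {
    for (int i = 0; i < N; i++)
        x[i] = x[i] + s;
}

void unrolled(double s) {
    for (int i = 0; i < N; i += 4) {      /* one loop branch per 4 elements */
        x[i]     = x[i]     + s;
        x[i + 1] = x[i + 1] + s;          /* independent of the others      */
        x[i + 2] = x[i + 2] + s;
        x[i + 3] = x[i + 3] + s;
    }
}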

6. Explain in detail about static multiple issue using the VLIW approach.
• Define VLIW
• The basic VLIW approach:
o The registers used
o The functional units used
o Complex global scheduling
• Example code
• Technical and logistical problems

7. Explain in detail about advanced compiler support for exposing and exploiting ILP.
• Detecting and enhancing loop-level parallelism:
o Finding dependences
o Eliminating dependent computations
• Software pipelining: symbolic loop unrolling
o Example code fragment
• Global code scheduling:
o Trace scheduling: focusing on the critical path
o Superblocks
o Example code fragment

8. Explain in detail about hardware support for exposing more parallelism at compile time.
• Conditional or predicated instructions
o Example codes
• Compiler speculation with hardware support:
o Hardware support for preserving exception behavior
o Hardware support for memory reference speculation
o Example codes

9. Explain in detail about the Intel IA-64 instruction set architecture.
• The IA-64 register model
• Instruction format and support for explicit parallelism
• Instruction set basics
• Predication and speculation support
• The Itanium processor:
o Functional units and instruction issue
o Itanium performance

10. Explain the limitations of ILP.
• The hardware model
• Limitations of the window size and maximum issue count
• Effects of realistic branch and jump prediction
• Effects of finite registers
• Effects of imperfect alias analysis

11. Explain in detail about symmetric shared-memory architecture.
• Define multiprocessor cache coherence
• Basic schemes for enforcing coherence:
o Define directory based
o Define snooping
• Snooping protocols
• Basic implementation techniques
• An example protocol

12. Explain the performance of symmetric shared-memory multiprocessors.
• Define true sharing and false sharing (an illustrative sketch of false sharing is given below)
• Performance measurements of the commercial workload
• Performance of the multiprogramming and OS workload
• Performance of the scientific/technical workload
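
An illustrative sketch of false sharing for question 12. The two counters are logically independent, but because they are adjacent they will typically fall in the same cache block, so two threads updating them make the block bounce between caches; padding to an assumed 64-byte line keeps them apart. The structure and field names are assumptions.

struct shared_false {              /* a and b likely share one cache block          */
    long a;                        /* updated only by thread 0                      */
    long b;                        /* updated only by thread 1 -> false sharing     */
};

struct shared_padded {             /* padding pushes b into a different cache block */
    long a;
    char pad[64 - sizeof(long)];   /* assumes a 64-byte cache line                  */
    long b;
};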

13. Explain in detail about synchronization.
• Basic hardware primitives:
o Define atomic exchange
o Define test-and-set, fetch-and-increment, load-linked and store-conditional instructions
• Implementing locks using coherence
• Synchronization performance challenges:
o Barrier synchronization
o Code for the simple and sense-reversing barriers (a sketch of the sense-reversing barrier is given below)
• Synchronization mechanisms for larger-scale multiprocessors:
o Software implementations
o Hardware primitives
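
A minimal sketch of the centralized sense-reversing barrier mentioned in question 13, written with C11 atomics rather than a raw fetch-and-increment primitive. The type and field names are assumptions; the barrier is initialized with count = total and sense = false.

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int  count;      /* threads still to arrive in this episode */
    int         total;      /* number of participating threads         */
    atomic_bool sense;      /* flips each time the barrier completes   */
} barrier_t;

void barrier_wait(barrier_t *b) {
    static _Thread_local bool local_sense = false;
    local_sense = !local_sense;                       /* this episode's sense  */

    if (atomic_fetch_sub(&b->count, 1) == 1) {        /* last thread to arrive */
        atomic_store(&b->count, b->total);            /* reset for next use    */
        atomic_store(&b->sense, local_sense);         /* release the waiters   */
    } else {
        while (atomic_load(&b->sense) != local_sense)
            ;                                         /* spin on the sense     */
    }
}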

14. Explain the models of memory consistency.
• Sequential consistency
• Relaxed consistency models:
o W->R ordering
o W->W ordering
o R->W and R->R ordering

15. Explain the performance of symmetric shared-memory and distributed shared-memory multiprocessors.
Symmetric shared-memory multiprocessors:
• Define true sharing and false sharing
• Performance measurements of the commercial workload
• Performance of the multiprogramming and OS workload
• Performance of the scientific/technical workload
Distributed shared-memory multiprocessors:
• Miss rate
• Memory access cost

16. Explain in detail about reducing cache miss penalty.
• First miss penalty reduction technique: multilevel caches
• Second miss penalty reduction technique: critical word first and early restart
• Third miss penalty reduction technique: giving priority to read misses over writes
• Fourth miss penalty reduction technique: merging write buffers
• Fifth miss penalty reduction technique: victim caches

17. Explain in detail about reducing miss rate.
• First miss rate reduction technique: larger block size
• Second miss rate reduction technique: larger caches
• Third miss rate reduction technique: higher associativity
• Fourth miss rate reduction technique: way prediction and pseudo-associative caches
• Fifth miss rate reduction technique: compiler optimizations
o Loop interchange (see the sketch below)
o Blocking
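
An illustrative sketch of the loop-interchange optimization listed under question 17. With x stored in row-major order, the interchanged version walks memory sequentially and so has far better spatial locality. The array dimensions are arbitrary assumed values.

#define ROWS 100
#define COLS 5000
static int x[ROWS][COLS];

void before(void) {                       /* column-wise: large strides between accesses */
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            x[i][j] = 2 * x[i][j];
}

void after(void) {                        /* interchanged: row-wise, sequential accesses  */
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            x[i][j] = 2 * x[i][j];
}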

18. Explain in detail about memory technology.
• DRAM technology
• SRAM technology
• Embedded processor memory technology: ROM and Flash
• Improving memory performance in a standard DRAM chip
• Improving memory performance via a new DRAM interface: RAMBUS
• Comparing RAMBUS and DDR SDRAM

19. Explain the types of storage devices.
• Magnetic disks
• The future of magnetic disks
• Optical disks
• Magnetic tapes
• Automated tape libraries
• Flash memory

20. Explain in detail about buses - connecting I/O devices to CPU/memory.
• Bus design decisions
• Bus standards
• Interfacing storage devices to the CPU - Figure: a typical interface of I/O devices and an I/O bus to the CPU-memory bus
• Delegating I/O responsibility from the CPU

21. Explain in detail about SMT.
• Converting thread-level parallelism to instruction-level parallelism
• Design challenges in SMT processors
• Potential performance advantages from SMT

22. Explain about CMP architecture.
• Define CMP
• Architecture
• Explanation

23. Explain in detail about software and hardware multithreading.
o Software multithreading
o Hardware multithreading
o Explanation

24. Explain about the heterogeneous multi-core processor.
o Define multi-core processor
o Heterogeneous multi-core processor
o Diagram

25. Explain about the IBM Cell processor.
• Define cell processor
• Architecture
• Explanation
