Cache Memory


Computer Architecture Chapter 5 Memory Hierarchy Design

Chapter Overview
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples of Virtual Memory

5.1 Introduction

The Big Picture: Where Are We Now? The five classic components of a computer: the processor (control and datapath), memory, input, and output.

Topics in this chapter: SRAM memory technology, DRAM memory technology, memory organization.

Levels of the Memory Hierarchy

Level          Capacity       Access time             Cost
CPU registers  100s of bytes  ~1 ns                   --
Cache          K bytes        ~4 ns                   0.1-1 cents/bit
Main memory    M bytes        100-300 ns              0.0001-0.00001 cents/bit
Disk           G bytes        10 ms (10,000,000 ns)   10^-5 to 10^-6 cents/bit
Tape           "infinite"     sec-min                 10^-8 cents/bit

Each level stages data for the level above it; upper levels are smaller and faster, lower levels are larger and slower:

Level      Staging/transfer unit  Managed by        Typical transfer size
Registers  Instruction operands   program/compiler  1-8 bytes
Cache      Blocks                 cache controller  8-128 bytes
Memory     Pages                  operating system  512 bytes-4 KB
Disk       Files                  user/operator     Mbytes

5.2 The ABCs of Caches

In this section we will:

- Learn lots of definitions about caches: you can't talk about something until you understand it (this is true in computer science at least!)
- Answer four fundamental questions about caches:
  Q1: Where can a block be placed in the upper level? (Block placement)
  Q2: How is a block found if it is in the upper level? (Block identification)
  Q3: Which block should be replaced on a miss? (Block replacement)
  Q4: What happens on a write? (Write strategy)

Cache Memory

The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU rather than in main memory. Although a cache is much smaller than main memory, its access time is a fraction of that of main memory. Unlike main memory, which is accessed by address, a cache is typically accessed by content; hence it is often called content addressable memory. Because of this, a single large cache memory isn't always desirable: it takes longer to search.

Cache
- Small amount of fast memory
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module

Cache/Main Memory Structure

Cache operation - overview
- The CPU requests the contents of a memory location
- Check the cache for this data
- If present, get it from the cache (fast)
- If not present, read the required block from main memory into the cache
- Then deliver it from the cache to the CPU
- The cache includes tags to identify which block of main memory is in each cache slot
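The read sequence above can be sketched in a few lines. This is a minimal direct-mapped model; the block size and slot count are hypothetical, chosen only for illustration:

```python
# Minimal sketch of the cache read sequence above (hypothetical sizes).
BLOCK_SIZE = 16   # bytes per block
NUM_SLOTS  = 8    # cache slots

cache = {}  # slot index -> (tag, block data)

def read(address, memory):
    block_number = address // BLOCK_SIZE
    slot = block_number % NUM_SLOTS       # which cache slot to check
    tag  = block_number // NUM_SLOTS      # identifies which memory block is there
    if slot in cache and cache[slot][0] == tag:
        source = "cache"                  # hit: deliver from cache (fast)
    else:
        source = "memory"                 # miss: fetch the whole block into the cache
        start = block_number * BLOCK_SIZE
        cache[slot] = (tag, memory[start:start + BLOCK_SIZE])
    offset = address % BLOCK_SIZE
    return cache[slot][1][offset], source

memory = bytes(range(256))
val, src = read(100, memory)    # first access to this block: miss
val2, src2 = read(101, memory)  # same block: hit
```

The second access hits because the first one brought the entire 16-byte block in, which is exactly the spatial-locality payoff discussed later.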

Cache Read Operation - Flowchart

Comparison of Cache Sizes

Processor        Type                           Year  L1 cache       L2 cache        L3 cache
IBM 360/85       Mainframe                      1968  16 to 32 KB    --              --
PDP-11/70        Minicomputer                   1975  1 KB           --              --
VAX 11/780       Minicomputer                   1978  16 KB          --              --
IBM 3033         Mainframe                      1978  64 KB          --              --
IBM 3090         Mainframe                      1985  128 to 256 KB  --              --
Intel 80486      PC                             1989  8 KB           --              --
Pentium          PC                             1993  8 KB/8 KB      256 to 512 KB   --
PowerPC 601      PC                             1993  32 KB          --              --
PowerPC 620      PC                             1996  32 KB/32 KB    --              --
IBM S/390 G4     Mainframe                      1997  32 KB          256 KB          2 MB
PowerPC G4       PC/server                      1999  32 KB/32 KB    256 KB to 1 MB  2 MB
IBM S/390 G6     Mainframe                      1999  256 KB         8 MB            --
Pentium 4        PC/server                      2000  8 KB/8 KB      256 KB          --
IBM SP           High-end server/supercomputer  2000  64 KB/32 KB    8 MB            --
CRAY MTA         Supercomputer                  2000  8 KB           2 MB            --
Itanium          PC/server                      2001  16 KB/16 KB    96 KB           4 MB
SGI Origin 2001  High-end server                2001  32 KB/32 KB    4 MB            --
Itanium 2        PC/server                      2002  32 KB          256 KB          6 MB
IBM POWER5       High-end server                2003  64 KB          1.9 MB          36 MB
CRAY XD-1        Supercomputer                  2004  64 KB/64 KB    1 MB            --

(Two L1 values separated by a slash refer to split instruction and data caches.)

The ABCs of Caches: Definitions

The Principle of Locality: programs access a relatively small portion of the address space at any instant of time. Three different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- Sequential locality: program execution proceeds in sequential order, except at branch instructions

A few related terms: inclusion property, coherence property, access frequency, access time, cycle time, latency, bandwidth, capacity, unit of transfer.


Memory Hierarchy: Terminology
- Hit: data appears in some block in the upper level (example: block X)
  - Hit rate: the fraction of memory accesses found in the upper level
  - Hit time: time to access the upper level, which consists of upper-level access time + time to determine hit/miss
- Miss: data must be retrieved from a block in the lower level (block Y)
  - Miss rate = 1 - (hit rate)
  - Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor

Consider a memory with three levels. The average memory access time (assuming a hit at the third level) is

  h1*t1 + (1 - h1) * [t1 + h2*t2 + (1 - h2) * (t2 + t3)]

where t1, t2, t3 are the access times of the three levels and h1, h2 are the hit rates of the first two levels.

Access frequency of level Mi: fi = (1 - h1)(1 - h2)...(1 - h(i-1)) * hi

Effective access time = sum over i of (fi * ti)
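The three-level formula can be checked numerically. The hit rates and access times below are made-up values, used only to show that the formula equals the simpler nested form:

```python
# Average memory access time for a three-level hierarchy (illustrative numbers).
h1, h2 = 0.90, 0.95          # hit rates at levels 1 and 2 (hypothetical)
t1, t2, t3 = 1, 10, 100      # access times in ns (hypothetical)

# Formula as stated: h1*t1 + (1 - h1) * [t1 + h2*t2 + (1 - h2) * (t2 + t3)]
amat = h1 * t1 + (1 - h1) * (t1 + h2 * t2 + (1 - h2) * (t2 + t3))

# Algebraically equivalent nested form: t1 + (1 - h1) * (t2 + (1 - h2) * t3)
amat_alt = t1 + (1 - h1) * (t2 + (1 - h2) * t3)
```

Both forms give 2.5 ns here: every access pays t1, a level-1 miss additionally pays t2, and a level-2 miss additionally pays t3.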

The ABCs of Caches: Cache Measures
- Hit rate: the fraction of accesses found in that level. It is usually so high that we talk about the miss rate instead.
- Average memory-access time = Hit time + Miss rate x Miss penalty (ns or clocks)
- Miss penalty: time to replace a block from the lower level, including the time to deliver it to the CPU
  - Access time: time to reach the lower level = f(latency to lower level)
  - Transfer time: time to transfer the block = f(bandwidth between upper and lower levels)

Measures

CPU execution time = (CPU clock cycles + Memory stall cycles) * Clock cycle time

CPU clock cycles include cache hits; the CPU is stalled during a miss.

Memory stall cycles = Number of misses * Miss penalty
                    = IC * (Misses / Instruction) * Miss penalty
                    = IC * (Memory accesses / Instruction) * Miss rate * Miss penalty

Miss rates and miss penalties differ for reads and writes:

Memory stall cycles = IC * (Reads / Instruction) * Read miss rate * Read miss penalty
                    + IC * (Writes / Instruction) * Write miss rate * Write miss penalty

Misses / Instruction = (Miss rate * Memory accesses) / Instruction count
                     = Miss rate * (Memory accesses / Instruction)
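A quick numerical instance of these formulas, with every parameter a hypothetical value chosen for round numbers:

```python
# CPU execution time with memory stalls (all parameters hypothetical).
IC = 1_000_000            # instruction count
cpi = 1.0                 # cycles per instruction, ignoring memory stalls
accesses_per_instr = 1.5  # memory accesses per instruction
miss_rate = 0.02
miss_penalty = 100        # clock cycles per miss
cycle_time = 1e-9         # seconds per clock cycle

cpu_cycles = IC * cpi
# Memory stall cycles = IC * (accesses/instr) * miss rate * miss penalty
stall_cycles = IC * accesses_per_instr * miss_rate * miss_penalty
exec_time = (cpu_cycles + stall_cycles) * cycle_time
```

With these numbers the stalls (3,000,000 cycles) dominate the base execution (1,000,000 cycles), which is the usual motivation for the miss-reduction techniques of sections 5.3-5.5.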

Typical Cache Organization

Simplest Cache: Direct Mapped

Consider a 16-location memory (addresses 0 through F) and a 4-byte direct-mapped cache (cache indexes 0 through 3). Cache location 0 can be occupied by data from memory location 0, 4, 8, and so on; in general, any memory location whose 2 LSBs are 0s. The address determines the cache index. Two questions arise: which candidate should we place in the cache, and how can we tell which one is currently there?

Cache Memory: Where can a block be placed in the cache?

Example: block 12 (block-frame address 12 in memory) is placed in an 8-block cache.

- Fully associative: block 12 can go anywhere (block frames 0-7).
- Direct mapped: block 12 can go only into block frame 4 (12 mod 8).
- 2-way set associative: block 12 can go anywhere in set 0 (12 mod 4), i.e., either frame of set 0. The 8 frames form sets 0-3, two frames each.

Set-associative mapping: set = block number modulo number of sets.
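The three placement policies differ only in how many frames form a set, so one small function covers all of them. This is a sketch; the frame numbering (frames of set s are consecutive) is an assumption matching the figure:

```python
# Which frames may hold a given block, for each placement policy?
def candidate_frames(block, num_frames, ways):
    """Frames that may hold `block` in a cache whose sets are `ways` wide."""
    num_sets = num_frames // ways
    s = block % num_sets                       # set = block mod number of sets
    return [s * ways + w for w in range(ways)] # the frames belonging to set s

# Block 12 in an 8-frame cache, as in the example above:
fully_assoc = candidate_frames(12, 8, ways=8)  # any frame
direct      = candidate_frames(12, 8, ways=1)  # exactly one frame: 12 mod 8 = 4
two_way     = candidate_frames(12, 8, ways=2)  # either frame of set 0 (12 mod 4)
```

Direct mapped is just 1-way set associative, and fully associative is n-way with a single set; the function makes that continuum explicit.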

The ABCs of Caches: How is a block found if it is in the cache?

Each cache entry stores a block of words, with a tag on each block. The address is divided into Tag | Index | Byte Offset. Only the tag needs to be compared: the index selects the entry and the offset selects the byte, so there is no need to check the index or block-offset bits.


Cache Memory: take advantage of spatial locality by storing multiple words per block.

The diagram below is a schematic of what the cache looks like: block 0 contains multiple words from main memory, identified with the tag 00000000; block 1 contains words identified with the tag 11110101. The other two blocks are not valid.

Cache Memory: as an example, suppose a program generates the address 1AA. In 14-bit binary, this number is 00000110101010. The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block.
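The 7/4/3 split of address 1AA can be reproduced with a few shifts and masks:

```python
# Decompose the 14-bit address 0x1AA into tag / block / word fields,
# matching the 7 / 4 / 3 bit split described above.
ADDR_BITS, BLOCK_BITS, WORD_BITS = 14, 4, 3
TAG_BITS = ADDR_BITS - BLOCK_BITS - WORD_BITS  # 7 bits of tag

addr = 0x1AA
word  = addr & ((1 << WORD_BITS) - 1)                 # low 3 bits
block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1) # next 4 bits
tag   = addr >> (WORD_BITS + BLOCK_BITS)              # remaining 7 bits

fields = f"{tag:07b} | {block:04b} | {word:03b}"
```

Running this gives the split 0000011 | 0101 | 010, i.e., tag 0000011, block 0101, word 010, in agreement with the binary expansion 00000110101010 above.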

Cache Organization I: Direct-Mapped Cache

The 32-bit address is split into a Cache Tag (bits 31-10, stored with each line; example value 0x50), a Cache Index (bits 9-5, selecting one of 32 lines; example 0x01), and a Byte Select (bits 4-0, picking a byte within the 32-byte line; example 0x00). Each line holds, as part of the cache "state", a valid bit and the cache tag, alongside 32 bytes of cache data: bytes 0-31 in line 0, bytes 32-63 in line 1, and so on up to bytes 992-1023 in line 31.

Direct Mapping: Address Structure

  Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits

- 24-bit address
- 2-bit word identifier (4-byte block)
- 22-bit block identifier: 8-bit tag (= 22 - 14) plus 14-bit slot or line
- No two blocks that map to the same line have the same tag field
- Check the contents of the cache by finding the line and checking the tag

Direct Mapping Cache Line Table

Cache line   Main memory blocks held
0            0, m, 2m, 3m, ..., 2^s - m
1            1, m+1, 2m+1, ..., 2^s - m + 1
...          ...
m-1          m-1, 2m-1, 3m-1, ..., 2^s - 1

Direct Mapping Cache Organization

Direct Mapping: Pros and Cons
- Simple
- Inexpensive
- Fixed location for a given block: if a program repeatedly accesses two blocks that map to the same line, cache misses are very high

Associative Mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as tag and word
- The tag uniquely identifies the block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive

Fully Associative Cache Organization

Associative Mapping: Address Structure

  Tag: 22 bits | Word: 2 bits

A 22-bit tag is stored with each 32-bit block of data. The tag field of the address is compared with every tag entry in the cache to check for a hit. The least significant 2 bits of the address identify which byte is required from the 32-bit (4-byte) data block. E.g., address FFFFFC: tag 3FFFFF (the upper 22 bits), data 24682468, cache line 3FFF.
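With a fully associative cache the whole block address is the tag, so extracting the fields is a single shift. A sketch for the 24-bit address structure above:

```python
# Fully associative mapping on a 24-bit address: 22-bit tag + 2-bit word.
addr = 0xFFFFFC
word = addr & 0b11   # which byte of the 4-byte block
tag  = addr >> 2     # upper 22 bits, compared against every cache line's tag
```

There is no index field at all: the hardware must compare `tag` against all stored tags in parallel, which is why associative search is expensive.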

Cache Organization II: Set-Associative Cache

The 32-bit address is split into a Cache Tag (bits 31-9; example value 0x50), a Cache Index (bits 8-5, i.e., the block address mod 16, selecting one of sets 0 through 15; example 0x01), and a Byte Select (bits 4-0; example 0x00). Each line within a set holds, as part of the cache "state", a valid bit and the cache tag, alongside 32 bytes of cache data (bytes 0-31, 32-63, ..., 992-1023 across the 16 sets).

Set Associative Mapping
- The cache is divided into a number of sets; each set contains a number of lines
- A given block maps to any line in one particular set (e.g., block B can be in any line of set i)
- E.g., with 2 lines per set (2-way associative mapping), a given block can be in one of 2 lines in only one set

Set Associative Mapping Example
- 13-bit set number
- Block number in main memory is taken modulo 2^13
- Addresses 000000, 00A000, 00B000, 00C000, ... map to the same set

Two Way Set Associative Cache Organization

Set Associative Mapping: Address Structure

  Tag: 9 bits | Set: 13 bits | Word: 2 bits

Use the set field to determine which cache set to look in, then compare the tag field to see if we have a hit. E.g.:

Address    Tag  Data      Set number
1FF 7FFC   1FF  12345678  1FFF
001 7FFC   001  11223344  1FFF
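The two example addresses above share their lower 16 bits (7FFC) and therefore land in the same set even though their tags differ. A sketch of the 9/13/2 field extraction:

```python
# Set-associative split of a 24-bit address: 9-bit tag | 13-bit set | 2-bit word.
def fields(addr):
    word = addr & 0b11
    set_no = (addr >> 2) & ((1 << 13) - 1)  # 13-bit set number
    tag = addr >> 15                        # upper 9 bits
    return tag, set_no, word

# Build the two example addresses from tag 1FF / tag 001 plus lower bits 7FFC:
a1 = (0x1FF << 15) | 0x7FFC
a2 = (0x001 << 15) | 0x7FFC
```

Both decode to set number 0x1FFF with word 0, so the cache looks in the same set for both and distinguishes them by tag, exactly as in the table.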

Two Way Set Associative Mapping Example

Replacement Algorithms (1): Direct Mapping
- No choice: each block maps to only one line, so replace that line

Replacement Algorithms (2): Associative & Set Associative
- Hardware-implemented algorithm (for speed)
- Least recently used (LRU): e.g., in a 2-way set associative cache, which of the 2 blocks is LRU?
- First in first out (FIFO): replace the block that has been in the cache longest
- Least frequently used (LFU): replace the block that has had the fewest hits
- Random
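LRU within one set can be sketched with an ordered map that tracks recency. This is an illustrative software model, not how the hardware (which uses a few state bits per set) implements it:

```python
from collections import OrderedDict

# LRU replacement within a single cache set (2-way in this sketch).
class LRUSet:
    def __init__(self, ways=2):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag):
        if tag in self.lines:                # hit: mark most recently used
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:     # set full: evict the LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = None               # fill with the new block
        return False

s = LRUSet(ways=2)
hits = [s.access(t) for t in [1, 2, 1, 3, 2]]
```

In the trace, tag 1 is reused while still resident (a hit), then tag 3 evicts tag 2 (the LRU line at that point), so the final access to tag 2 misses again.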

Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly

Write through
- All writes go to main memory as well as the cache
- Multiple CPUs can monitor main memory traffic to keep their local caches up to date
- Generates lots of memory traffic and slows down writes

Write back
- Updates are initially made in the cache only
- An update (dirty) bit for the cache slot is set when an update occurs
- If a block is to be replaced, write it to main memory only if the update bit is set
- Other caches can get out of sync; I/O must access main memory through the cache
- N.B. roughly 15% of memory references are writes
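The write-back behavior described above (update the cache only, set the dirty bit, write memory only on eviction) can be modeled with a single cache slot. Everything here is an illustrative sketch:

```python
# Write-back sketch: writes set a dirty (update) bit; memory is written
# only when a dirty block is evicted. One cache slot, for illustration.
memory = {0: 10, 1: 20}                  # block number -> value
slot = {"block": None, "data": None, "dirty": False}
writebacks = 0

def load(block):
    global writebacks
    if slot["block"] == block:
        return                           # already resident
    if slot["dirty"]:                    # evicted block was modified:
        memory[slot["block"]] = slot["data"]  # write it back to memory now
        writebacks += 1
    slot.update(block=block, data=memory[block], dirty=False)

def write(block, value):
    load(block)
    slot.update(data=value, dirty=True)  # update cache only; set dirty bit

write(0, 99)   # memory[0] is still 10: the update lives only in the cache
load(1)        # evicting the dirty block triggers the single write-back
```

Until the eviction, memory and cache disagree: that window is exactly why other caches can get out of sync and why I/O must go through the cache under write back.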

Cache Memory: Let's Do An Example

The cache:
1. Is direct mapped.
2. Contains 512 bytes.
3. Has 16 sets.
4. Each set can hold 32 bytes, i.e., one cache line.

An address therefore splits into a tag (the upper bits), a 4-bit set field, and a 5-bit byte offset. Each cache entry holds a valid bit V, a tag, and data (a 32-byte cache line). Here are the memory addresses we'll be using; we'll ask for the data at these addresses and see what happens to the cache when we do so:

Address  Tag   Set   Offset
256      0000  1000  00000
512      0001  0000  00000
1024     0010  0000  00000
1090     0010  0010  00010
1099     0010  0010  01011
1440     0010  1101  00000
1470     0010  1101  11110
1600     0011  0010  00000
1620     0011  0010  10100
2048     0100  0000  00000
4096     1000  0000  00000
5000     1001  1100  01000

Initially the cache is empty: every set, 0 (0000) through 15 (1111), has V = N.

Doing some cache action:

1. READ address 1090 = 0010|0010|00010. Set 2 is invalid: miss. Load memory locations 1088-1119 into set 2 (0010) with tag 00000...10.

2. READ address 1440 = 0010|1101|00000. Set 13 is invalid: miss. Load memory locations 1440-1471 into set 13 (1101) with tag 00000...10.

3. READ address 5000 = 1001|1100|01000. Set 12 is invalid: miss. Load memory locations 4992-5023 into set 12 (1100) with tag 00000...1001.

4. READ address 1470 = 0010|1101|11110. Set 13 is valid and its tag matches: hit. The data is delivered from the cache.

5. READ address 1600 = 0011|0010|00000. Set 2 is valid, but its tag (0010, holding locations 1088-1119) does not match 0011: miss. Replace the line by loading memory locations 1600-1631 into set 2 with tag 00000...11.

6. WRITE to address 256 = 0000|1000|00000. Set 8 is invalid: miss. Load memory locations 256-287 into set 8 (1000) with tag 00000...00 and perform the write.

7. WRITE to address 1620 = 0011|0010|10100. Set 2 is valid and its tag matches: hit. The write is performed in the cache line.

8. WRITE to address 1099 = 0010|0010|01011. Set 2 is valid, but its tag (0011, holding locations 1600-1631) does not match 0010: miss. Replace the line by loading memory locations 1088-1119 into set 2 with tag 00000...10 and perform the write.

Final cache state:

Set        V  Tag           Data (32-byte cache line)
2  (0010)  Y  00000...10    Memory locations 1088-1119
8  (1000)  Y  00000...00    Memory locations 256-287
12 (1100)  Y  00000...1001  Memory locations 4992-5023
13 (1101)  Y  00000...10    Memory locations 1440-1471

All other sets remain invalid (V = N).
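The whole access sequence can be replayed with a short simulation of the 512-byte, 16-set direct-mapped cache. Write misses are assumed to allocate a line, matching the behavior shown in the example:

```python
# Replay the worked example: direct mapped, 16 sets x one 32-byte line each.
NUM_SETS, LINE_SIZE = 16, 32
cache = {}  # set index -> tag

def access(addr):
    s = (addr // LINE_SIZE) % NUM_SETS    # the 4-bit set field
    tag = addr // (LINE_SIZE * NUM_SETS)  # the remaining upper bits
    hit = cache.get(s) == tag
    cache[s] = tag                        # reads and write misses both allocate
    return "Hit" if hit else "Miss"

# READ 1090, 1440, 5000, 1470, 1600; WRITE 256, 1620, 1099:
sequence = [1090, 1440, 5000, 1470, 1600, 256, 1620, 1099]
results = [access(a) for a in sequence]
```

The simulation reproduces the step-by-step outcome above: misses for 1090, 1440, 5000, a hit for 1470, a conflict miss for 1600, a miss for 256, a hit for 1620, and a final conflict miss for 1099.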

Cache Memory: What happens on a write?
- Write through: the information is written to both the block in the cache and the block in the lower-level memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced. A dirty bit records whether the block is clean or dirty.

Write through is always combined with write buffers so that the processor doesn't wait for the lower-level memory.

Cache Memory: Write Buffer for a Write-Through Cache

  Processor -> Cache
  Processor -> Write Buffer -> DRAM

A write buffer is needed between the cache and memory:
- The processor writes data into the cache and the write buffer.
- The memory controller writes the contents of the buffer to memory.

The write buffer is just a FIFO (typical number of entries: 4) and must handle bursts of writes.

Cache Memory: Write-Miss Policy -- Write Allocate vs. No-Write Allocate

Assume a 16-bit (sub-block) write to memory location 0x0 causes a miss. Do we allocate space in the cache, and possibly read in the block?
- Yes: write allocate (typical of write-back caches)
- No: no-write allocate (typical of write-through caches)

Example: WriteMem[100]; WriteMem[100]; ReadMem[200]; WriteMem[200]; WriteMem[100]
- No-write allocate: four misses and one hit
- Write allocate: two misses and three hits
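The miss counts for that example sequence can be checked directly. The cache is modeled as a set of resident block addresses, assumed large enough that 100 and 200 never conflict:

```python
# Count misses for the example sequence under write allocate (WA) and
# no-write allocate (NWA).
ops = [("W", 100), ("W", 100), ("R", 200), ("W", 200), ("W", 100)]

def misses(allocate_on_write_miss):
    resident, n = set(), 0
    for op, addr in ops:
        if addr in resident:
            continue                       # hit
        n += 1                             # miss
        if op == "R" or allocate_on_write_miss:
            resident.add(addr)             # bring the block into the cache
    return n

nwa = misses(False)  # writes that miss bypass the cache
wa  = misses(True)   # writes that miss allocate a line
```

No-write allocate never brings location 100 in, so every write to it misses (4 misses, 1 hit); write allocate installs it on the first miss, turning the repeats into hits (2 misses, 3 hits).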

Pentium 4 Cache
- 80386: no on-chip cache
- 80486: 8 KB on-chip, using 16-byte lines and a four-way set associative organization
- Pentium (all versions): two on-chip L1 caches, one for data and one for instructions
- Pentium III: L3 cache added off-chip
- Pentium 4:
  - L1 caches: 8 KB each, 64-byte lines, four-way set associative
  - L2 cache: feeds both L1 caches; 256 KB, 128-byte lines, eight-way set associative
  - L3 cache: on-chip

Intel Cache Evolution (problem -> solution -> processor on which the feature first appears)

- External memory is slower than the system bus -> add external cache using faster memory technology -> 386
- Increased processor speed makes the external bus a bottleneck for cache access -> move the external cache on-chip, operating at the same speed as the processor -> 486
- The internal cache is rather small, due to limited space on the chip -> add an external L2 cache using faster technology than main memory -> 486
- Contention occurs when the instruction prefetcher and the execution unit simultaneously require access to the cache; the prefetcher is stalled while the execution unit's data access takes place -> create separate data and instruction caches -> Pentium
- Increased processor speed makes the external bus a bottleneck for L2 cache access -> (a) create a separate back-side bus (BSB) that runs at a higher speed than the main (front-side) external bus, dedicated to the L2 cache -> Pentium Pro; (b) move the L2 cache onto the processor chip -> Pentium II
- Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small -> (a) add an external L3 cache -> Pentium III; (b) move the L3 cache on-chip -> Pentium 4

5.3 Reducing Cache Misses

Classifying Misses: the 3 Cs
- Compulsory: the first access to a block cannot find it in the cache, so the block must be brought in. Also called cold-start misses or first-reference misses. (These are the misses that would occur even in an infinite cache.)
- Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved. (Misses in a fully associative cache of size X.)
- Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses. (Misses in an N-way associative cache of size X.)
