21. Design for Testability
J. A. Abraham
Department of Electrical and Computer Engineering
The University of Texas at Austin
EE 382M.7 – VLSI I, Fall 2011
November 9, 2011
Design for Testability (DFT)
  Reduce costs associated with testing complex circuits
  Design circuit so that it will be easier to test
  Increase accessibility of internal nodes
    Controllability: ability to establish specific signal value at each internal node by setting inputs
    Observability: ability to determine internal values by controlling inputs and observing outputs
  Ensure predictable circuit responses
  Tradeoffs
    Technical: area, I/O pins, performance
    Economic: design time, yield, time to revenue
Testable Sequential Circuits
  Sequential circuits are very difficult to test
  Design the internal memory elements to be part of a shift register chain to provide controllability, observability through serial shifts
  With scan chain, problem of testing any circuit reduces to testing the combinational logic
Level-Sensitive Scan Design (LSSD)
  Structured DFT developed at IBM
  All internal storage implemented in hazard-free polarity-hold latches (SRLs), part of a scan chain
  Latches controlled by two or more non-overlapping clocks, with rules for clocking
    All clock inputs to SRLs must be in their off states when primary inputs (PIs) are “off”
    Clock signal at any clock input to an SRL must be controlled from one or more clock PIs
    No clock can be ANDed with another clock
    Clock PIs cannot feed data inputs to latches, either directly or through combinational logic
Scan Chains
  Convert each flip-flop to a scan register
    Only costs one extra multiplexer
  Normal mode: flip-flops behave as usual
  Scan mode: flip-flops behave as shift register
  Contents of flops can be scanned out and new values scanned in
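The two modes can be illustrated with a small behavioural model. This is a minimal sketch, not taken from the slides: the names `ScanChain`, `clock` and `scan_test` are hypothetical, and the combinational logic is passed in as an ordinary Python function.

```python
class ScanChain:
    """Chain of scannable flip-flops; index 0 is nearest the scan-in pin."""

    def __init__(self, length):
        self.flops = [0] * length

    def clock(self, scan_enable, scan_in=0, d_inputs=None):
        """Apply one clock edge and return the bit leaving the last flop."""
        scan_out = self.flops[-1]
        if scan_enable:
            # Scan mode: the multiplexer selects the upstream neighbour.
            self.flops = [scan_in] + self.flops[:-1]
        else:
            # Normal mode: each flop captures its functional D input.
            self.flops = list(d_inputs)
        return scan_out


def scan_test(chain, pattern, logic):
    """Shift a pattern in, capture the logic response, shift it out."""
    for bit in reversed(pattern):  # last bit first, so pattern[0] lands in flop 0
        chain.clock(True, bit)
    chain.clock(False, d_inputs=logic(chain.flops))
    shifted_out = [chain.clock(True) for _ in range(len(pattern))]
    return shifted_out[::-1]  # reorder so index 0 is flop 0


def invert(q):
    """Stand-in combinational block (hypothetical): invert every bit."""
    return [b ^ 1 for b in q]
```

Calling `scan_test(ScanChain(3), [1, 0, 1], invert)` shifts the pattern in with three scan clocks, applies one normal-mode clock, and shifts the captured response out with three more scan clocks.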
Scannable Flip-Flops
  [Figure]
Boundary Scan
  Testing boards is also difficult
    Need to verify solder joints are good
      Drive a pin to 0, then to 1
      Check that all connected pins get the values
  Single-sided PC boards with “through-hole” construction used “bed of nails” to contact pins of chip on the back side of the board
    SMT and BGA boards cannot easily contact pins
  Build capability of observing and controlling pins into each chip to make board test easier
Boundary Scan (IEEE 1149.1)
  [Figure]
Boundary Scan Example
  [Figure]
Boundary Scan Interface
  Boundary scan is accessed through five pins
    TCK: test clock
    TMS: test mode select
    TDI: test data in
    TDO: test data out
    TRST*: test reset (optional)
  Chips with internal scan chains can access the chains through boundary scan for unified test strategy
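The slides list only the pins; the controller behind them is the 16-state TAP state machine defined by IEEE 1149.1, which advances on each TCK edge according to TMS. The sketch below assumes that standard state table (the helper names `step` and `walk` are hypothetical). It also shows one reason TRST* can be optional: holding TMS high for five clocks reaches Test-Logic-Reset from any state.

```python
# state: (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance the TAP controller by one TCK edge."""
    return TAP_NEXT[state][tms]


def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK edge."""
    for tms in tms_bits:
        state = step(state, tms)
    return state
```

For example, `walk("Test-Logic-Reset", [0, 1, 0, 0])` ends in Shift-DR, where data moves from TDI toward TDO on each clock.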
IDDQ Testing
  Measure quiescent power supply current of CMOS circuits for selected test vectors
  Example, microprocessor has nanoamps of current with no faults; many defects (shorts, for example) cause much higher currents
  Direct relationship found (Sandia Labs., HP, Philips) between IDDQ test acceptance rates and quality, reliability of ICs
  Only need to activate site of potential defect (no need to propagate errors)
  Leakage current in very large chips may overwhelm the abnormal current due to a fault in one gate
IDDQ Testing, Cont’d
  [Figure]
Built-In Self Test (BIST)
  Increasing circuit complexity, tester cost
    Interest in techniques which integrate some tester capabilities on the chip
    Reduce tester costs
    Test circuits at speed (more thoroughly)
  Approach:
    Compress test responses into “signature”
    Pseudo-random (or pseudo-exhaustive) pattern generator (PRG) on the chip
  Integrating pattern generation and response evaluation on chip – BIST
Pseudo-Random Sequence Generator (PRSG)
  Linear Feedback Shift Register
    Shift register with input taken from XOR of state
    Pseudo-Random Sequence Generator (or Pseudo-Random Pattern Generator (PRPG), or Linear Feedback Shift Register (LFSR))
  Step  Q
  0     111
  1     110
  2     101
  3     010
  4     100
  5     001
  6     011
  7     111 (repeats)
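The state table can be reproduced by a 3-bit register that shifts left by one position and feeds the XOR of its leftmost and rightmost bits into the vacated position. That tap choice and the bit ordering are inferred from the table itself, since the circuit figure is not reproduced here, so treat them as assumptions; the function names are hypothetical.

```python
def lfsr3_step(state):
    """One clock of a 3-bit LFSR: shift left, feed back (leftmost XOR rightmost)."""
    leftmost = (state >> 2) & 1
    rightmost = state & 1
    return ((state << 1) & 0b111) | (leftmost ^ rightmost)


def lfsr3_sequence(seed=0b111, steps=7):
    """Return the seed followed by `steps` successive states."""
    states = [seed]
    for _ in range(steps):
        states.append(lfsr3_step(states[-1]))
    return states


print([format(s, "03b") for s in lfsr3_sequence()])
```

The register visits all seven non-zero states before repeating, which is the maximum for three bits; the all-zero state is a lock-up state and never appears.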
Example of BIST
  [Figure]
  Technique called STUMPS (from IBM)
Testability Issues – Custom Microprocessors
  Chip complexity, large volumes, costs
    Area/performance penalties
    Fault (defect) coverage goals
    Test generation (fault simulation) costs
    Development time
    Tester time (manufacturing costs)
  ASICs generally use a “full scan” methodology for testability
  Some of the ad-hoc techniques used in custom microprocessors will be described in the following slides
Testability Features of the Motorola 68020
  Chip Statistics:
    Process: 2µ HCMOS
    Die Size: 375 × 347 mils
    Transistor count: 200K
  Test Overhead
    Test Logic Area: < 3%
    Test Microcode: < 2%
  Test routing shared with functional routing to keep chip area down
  Bus structure helped create test partitioning for embedded structures
  Tradeoff of extra time to produce ad-hoc test logic with expected higher yield
  Test techniques for ROMs, PLAs, Cache
  Test mode not available to customers (Why??)
Testing Scheme for ROMs in the 68000 Family
  Initial schemes for testing 68000 family control structures involved bussing nanoROM outputs off-chip via the address bus in test mode
    Could not test at speed since nanoROM output buffers were not sized to drive a bus
    Only nanobits were observable, and the PLAs, microROM and nanoROM had to be tested as one large sequential machine
    Test sequences were lengthy and difficult to write
    Tests were also microcode dependent
Testability Techniques for 68020 ROMs
  Used test mode to force next microcode address (NMA) from data pins
  Data pins also control a MUX for both micro and nano ROM outputs, which are moved to the BC bus, into the data section of the execution unit, and to the address bus which can be observed
  Exhaustive testing of the 2K ROM entries
  32 bits of ROM visible every 2 clocks
  Four passes of tests needed to read the 110 outputs of the two ROMs
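As a rough check of these figures, assume that "2K" means 2048 entries and that each pass reads every entry once, 32 bits per 2 clocks (both are assumptions, not stated on the slide):

\[
\text{passes} = \left\lceil \frac{110}{32} \right\rceil = 4,
\qquad
\text{clocks} \approx 4 \times 2048 \times 2 = 16384
\]

The pass count agrees with the four passes quoted above; the clock count is only an order-of-magnitude estimate that ignores any setup overhead.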
MicroROM Test in MC68881 Floating Point Co-processor
  ROM physically split into two sections
  In test mode, ROM is addressed directly through the command register
  Exhaustive addresses fed to ROM in a “ping-pong” fashion (address/address complement)
  Outputs of ROM go to two 16-bit signature registers (using CCITT-16 polynomial $x^{16} + x^{12} + x^{5} + 1$)
  Monitor both the quotient and final signature serially on a test pin (Probability of aliasing $2^{-(n+m-1)}$)
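A serial signature register divides the incoming bit stream by its feedback polynomial and keeps the remainder. The sketch below is a minimal software model using the CCITT-16 polynomial, commonly written x^16 + x^12 + x^5 + 1 (0x1021). It uses the arrangement found in common CRC-16 routines, where the input bit is XORed into the feedback. The actual MC68881 wiring, its two-register split and the quotient monitoring are not modelled, and the function names are hypothetical.

```python
def serial_signature(bits, poly=0x1021, width=16):
    """Compress a bit stream into a `width`-bit signature (zero initial state)."""
    mask = (1 << width) - 1
    sig = 0
    for bit in bits:
        # Feedback is the register's top bit XORed with the incoming bit.
        feedback = ((sig >> (width - 1)) & 1) ^ (bit & 1)
        sig = (sig << 1) & mask
        if feedback:
            sig ^= poly
    return sig


def bytes_to_bits(data):
    """Yield the bits of a byte string, most significant bit first."""
    for byte in data:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1


print(hex(serial_signature(bytes_to_bits(b"123456789"))))
```

With a zero initial state this is the same computation as the CRC-16/XMODEM checksum, so the standard check string `123456789` can be used to validate the model.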
MC68881 NanoROM Test
  NanoROM physically located between the microROM and the execution unit (ECU), and outputs fed to ECU
  No functional path for nanoROM to access signature registers
  In test mode, nanoROM columns coupled with the microROM columns (with additional columns of nanoROM multiplexed to signature register)
MC68881 Entry PLA Test
  Four entry PLAs, A0, A1, A2 and A3
  A0 PLA contains the entry point reset vectors and is completely tested functionally
  A1–A3 tested like the ROMs
    Command register generates patterns and outputs are routed through a bus to the signature register
    Test patterns generated using PLA test generator
  PLA       Inputs  Outputs  Products
  A1        6       10       23
  A2        13      10       133
  A3        5       10       28
  Response  25      25       56
Built-in Self Test in the Intel 80386
  Normal PLA inputs disabled during test and output of pseudorandom generator provides exhaustive set of tests to AND-plane input
  CROM tested with binary counter (exhaustive test)
  Responses compressed using multiple-input signature registers
  Test transistors: 2.7%
  Area overhead for BIST: 1.8%
  Transistor sites tested with BIST: 52.5%
  Area tested: 18.6%
Testing Cache Memory Arrays in MC68030
  Cache cell layout design is resistant to both bridging defects and capacitive coupling
    Most likely bridging defect is between adjacent metal bit lines
  Memory fault model:
    One or more cells stuck at 0 or 1
    Coupling between cells
  11n March Test (Marinescu, 1982)
  Refresh and data retention tests for the dynamic memory cells
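The slides do not list the elements of the 11n test, so the sketch below swaps in March C- (a widely used 10n march test) purely to illustrate how a march test walks the array and exposes a stuck-at cell. The class `StuckAtMemory` and the function name are hypothetical.

```python
class StuckAtMemory(list):
    """Bit-per-cell memory in which one cell always holds a fixed value."""

    def __init__(self, size, stuck_addr, stuck_value):
        super().__init__([0] * size)
        self.stuck_addr = stuck_addr
        self.stuck_value = stuck_value
        list.__setitem__(self, stuck_addr, stuck_value)

    def __setitem__(self, addr, value):
        # Writes to the faulty cell are ignored.
        if addr == self.stuck_addr:
            value = self.stuck_value
        super().__setitem__(addr, value)


def march_c_minus(mem):
    """Run March C-; return True if every read sees the expected value."""
    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)
    # Each element: (address order, value expected on read, value written).
    elements = [
        (up, None, 0),    # write 0 everywhere
        (up, 0, 1),       # ascending: read 0, write 1
        (up, 1, 0),       # ascending: read 1, write 0
        (down, 0, 1),     # descending: read 0, write 1
        (down, 1, 0),     # descending: read 1, write 0
        (up, 0, None),    # final read of 0
    ]
    for order, expect, write in elements:
        for addr in order:
            if expect is not None and mem[addr] != expect:
                return False
            if write is not None:
                mem[addr] = write
    return True
```

A fault-free memory (for instance a plain list of bits) passes, while `StuckAtMemory(8, 3, 1)` fails on the first read of cell 3. Each cell is accessed ten times in total, hence "10n".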
Test Architecture of MC68040
  Utilizes three different DFT methods
    Functional tests for data paths with normal mode instructions
    Ad-hoc test modes constructed for testing on-chip memory arrays and on-chip phase-locked loop
      Extension of existing functional snoop logic to allow arrays to be accessed from the bus similar to a static RAM array
    Structured custom scan approach for ROM, PLA and control areas
  Global test bus driven from two 5-bit registers coordinates test logic
  IEEE 1149.1 boundary scan test interface
    Boundary scan data register shared with scan logic
  Test overhead:
    18,560 devices, 1.55% of total number of devices
    3.15% of total die area, including global test signal routing
MC68040 DFT Method Coverage
  [Figure]
MC68040 Scan Chain and Timing
  [Figure]
BIST in IBM RISC System/6000
  LSSD with pseudo-random BIST
  Common On-chip Processor (COP), bus, on-card sequencer (OCS), engineering support processor (ESP)
  [Figure]
Common on-chip Processor (COP) in IBM RS/6000
  [Figure]
  Small processor controls operation
  Hardware for pseudo-random vector generation and result compression (31-bit LFSR)
BIST in IBM/Motorola Power-PC
  Variety of test techniques applied to the Power-PC 603
    Full LSSD test of logic
    BIST of “large” embedded RAMs
    Functional test of small RAMs
    IDDQ tests
  BIST for cache and tag RAMs
    Functional vectors (good for data cache, not instruction cache) and random BIST (size, complexity, test coverage) not applicable
    Use modified march test of Dekker (1988)
    $\log_2 n$ pattern for data
  Overhead: BIST is 2.9% of RAM array, 0.58% of total chip
  Performance impact: less than 100 ps due to extra MUX input leg
  All four RAMs tested in parallel, 2.5 ms at 80 MHz
Testing Embedded Cores in a System on Chip
  Source/Sink for test: Embedded processor
  Test access mechanisms: Busses
  Core test “wrapper” for Test Access Interface (TAI)
  Lagged Fibonacci Sequence random pattern generator
    $X_n = (X_{n-24} + X_{n-55}) \bmod m$, $n \ge 55$
    Primitive polynomial: $X^{55} + X^{24} + 1$
    Period: $2^{32-1} \times (2^{55} - 1)$
    No multiplication necessary
    ARM code for generator:
      LDR R5,[R0],#4    ; load first lagged term, post-increment pointer R0
      LDR R6,[R1],#4    ; load second lagged term, post-increment pointer R1
      ADD R5,R5,R6      ; sum of the two terms (wraps at register width)
      STR R5,[R2],#4    ; store new value, post-increment pointer R2
      AND R0,R0,R3      ; mask the three pointers with R3
      AND R1,R1,R3
      AND R2,R2,R3
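The recurrence can also be sketched in software. The version below assumes a modulus of 2^32, which matches 32-bit register wraparound but is not stated on the slide, and takes 55 caller-supplied seed words; the function name and the seed values in the usage line are hypothetical.

```python
from collections import deque


def lagged_fibonacci(seed_words, modulus=2**32):
    """Yield X[n] = (X[n-24] + X[n-55]) mod modulus for n >= 55."""
    if len(seed_words) != 55:
        raise ValueError("need exactly 55 seed words")
    # history holds X[n-55] .. X[n-1]; index 0 is the oldest value.
    history = deque((w % modulus for w in seed_words), maxlen=55)
    while True:
        # history[0] is X[n-55]; history[31] is X[n-24].
        nxt = (history[31] + history[0]) % modulus
        history.append(nxt)  # maxlen=55 drops the oldest value
        yield nxt


gen = lagged_fibonacci(list(range(1, 56)))
first_three = [next(gen) for _ in range(3)]
```

Only an addition and a reduction are needed per output word, which is the "no multiplication necessary" property noted above. At least one seed word should be odd, otherwise every output stays even.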
Issues with Built-In Self Test
  Technique can run test sequences at operating frequencies and capture results within the chip
    Pseudorandom pattern generators, signature analyzers
    Can also use weighted random patterns or deterministic patterns
    Example: Deterministic Logic BIST (Synopsys)
  Problems:
    Hardware overhead
    Test power
    Non-functional modes during test
Industry Issues with Processor Testing
  Concern with detecting real defects
    Small delay defects due to process variations, power droops and capacitive coupling
    Cause a shift in the speed of the part
  Problems with logic BIST (same issues with scan AC tests)
    Overheads on chip
    False paths tested
    Test operating conditions different from normal operating modes
Software-Based (Native-Mode) Self Test for Processors
  Why not use functional capabilities of processors to replace BIST hardware?
    No additional hardware
    Reduce test costs by using low-cost testers
    Increase coverage of delay defects and increase yield by testing native
    No issues with excessive power consumption during test
  Developed at University of Texas (Int’l Test Conference 1998)
  Application to processors at Intel (Int’l Test Conference 2002)
Native-Mode Signature Compression
  Pseudo-code for software signature analyzer
    // S: general register or memory, holds signature
    for each register Ri to be compressed
    {
      Shift Right Through Carry(Ri);
      if (Carry) {S = XOR(S, feedback polynomial)};
      S = XOR(S, Ri);
    }
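A runnable sketch of a software signature analyzer follows. It uses the conventional multiple-input signature register (MISR) update, in which the signature S itself is shifted and the bit shifted out selects the polynomial feedback; reading the pseudo-code that way is an assumption. The 32-bit width, the feedback constant and the function names are also assumptions chosen for illustration, and a plain logical shift stands in for the processor's shift-through-carry instruction.

```python
MASK32 = 0xFFFFFFFF
FEEDBACK = 0xEDB88320  # assumed feedback polynomial (reflected CRC-32), illustration only


def compress(signature, value, poly=FEEDBACK):
    """Fold one register value into the running signature."""
    carry = signature & 1          # bit shifted out of the signature
    signature >>= 1
    if carry:
        signature ^= poly          # apply the polynomial feedback
    return (signature ^ value) & MASK32


def signature_of(values, seed=0):
    """Compress a sequence of register values into one signature."""
    sig = seed
    for value in values:
        sig = compress(sig, value & MASK32)
    return sig
```

A test program would call `signature_of` on the registers written by its instruction sequence and compare the result with a precomputed value, so that a single word carries the pass/fail information for the whole sequence.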
Intel Functional BIST
  Functional Random Instruction Testing at Speed (FRITS) applied to Itanium processor family and Pentium 4 line (ITC 2002)
  Tests (kernels) are instruction sequences
  Kernels loaded into cache and executed in real time during test application
  They generate and execute pseudo-random or directed sequences of machine code
  On Pentium 4, FRITS
    added 5% unique coverage to manual tests
    screened 10% – 15% of chips which passed wafer sort/package tests, but failed system tests
    enabled low-cost testers: 40% increase in defect screening on structural tester
  Kernels execute 20 loops in ≈ 8 ms
Are Random Tests Sufficient?
  Intel implementation involved code in the cache which generated random instruction sequences
  Interest in generating instructions targeting faults
    Possible to generate instruction sequences which will test for an internal stuck-at fault in a module
    In order to deal with defects in DSM technologies, need to target small delay defects
    Recent work: automatically generate instruction sequences which will target small delay defects in an internal module
On-Chip Logic Structures for Debug
  Distributed reconfigurable infrastructure fabric
  User-guided insertion compatible with existing design flows
    Soft IP inserted at RTL
    Configured post-silicon for maximum debug flexibility
    Activated dynamically
      triggers to start/stop recording of signals
      assertion checkers
      event detectors and counters
      pattern generators
  Configuration and instrument control via JTAG TAP
  Source: DAFCA
Instrumented Chip Example
  [Figure]
  Source: DAFCA
Logic Analysis and Signal Capture
  [Figure]
  DEMON block implements triggers
  Signals captured by Tracer
  Software reads trace buffer and prepares VCD or FSDB files
  Source: DAFCA
Challenges in Detecting Small Delays
  Need to test long paths at speed
  External tester speeds do not match DUT speeds
    Tester skew, pad insertion delay
    High-speed testers very expensive
  Can attempt AC-scan
    Test clock frequency is determined by delay between launch and capture
    Generate fast launch and capture signals on-chip
    Tester used only for scan in/out data
    Tester need not be fast
Detecting Delay Defects After Manufacturing
  Necessary to detect “small-delay defects”
    Delay defects which don’t affect performance soon after manufacture could be reliability hazards
    Need Path Delay Tests to ensure that defective parts are screened out
  Difficult to apply two-pattern tests in scan mode
    Requires precise control of clocks for high-speed circuits
    If capture clocks are based on the system clock, there is no information on the slack for the path
  Solution: On-chip programmable capture mechanism
    Ability to capture faster than at-speed
Delay Lines for On-Chip Programmable Capture
  [Figure]
  Source: R. Tayade, et al.
System for On-Chip Programmable Capture
  [Figure]
  Source: R. Tayade, et al.
Power-Supply Noise Detection
  [Figure labels: Reference Voltage, constant voltage, Voltage Detector, two signals separated by a phase difference, Phase Detector, result]
  Source: T.-Y. Wu et al.
Voltage Detector
  [Figure labels: Reference Delay Line, Varied Delay Line, reset, VDDclean, VDDlocal, INVregular, INVtail, switch, RDL_out, RDL_outb, VDL_out, VDL_outb, TRDL, TVDL - TRDL]
  Source: T.-Y. Wu et al.
Phase Detector
  [Figure labels: pulse generator, switch, capture, INVregular, INVtail, RDL_out, RDL_outb, VDL_out, VDL_outb, RDL_pulse, VDL_pulse, slow, fast, r0 ... rm, v0 ... vm, out]
  Source: T.-Y. Wu et al.
Layout of Delay Lines
  Interleaving RDL and VDL reduces the impact of systematic intra-die process variation
  [Figure labels: RDL_part1, VDL_part2, RDL_part3, VDL_part4; VDL_part1, RDL_part2, VDL_part3, RDL_part4]
  Source: T.-Y. Wu et al.
Die Photo of the SCOUT Detector
  [Figure labels: Reference voltage generator; Voltage and phase detectors; Performance monitor and test circuitry; scale bar 500 um]
  Source: T.-Y. Wu et al.