Ultra High Speed (5 GHz) Block Custom Physical Design Flow with ICC
Prakash Jayasekharan, Senior PD Engineer
Suman Musunuru, Senior Design Engineer
Maxim Integrated Products
Agenda
• Challenges in High-Speed Physical Design
  - Design constraints, library and design issues
• Custom solutions with the Synopsys ICC flow
  - Matrix re-characterization, synthesis improvements, placement-sensitive flow, CTS waveform balancing, signal EM, power
• Timing/STA correlation results
  - Star-RC vs. Calibre, ICC vs. PT-SI
• Conclusion/Takeaways
• Appendix A
• Appendix B (scripts)
Design Constraints
• 65 nm SoC design
  - 2.4 million gates
  - Block A and Block B @ 5 GHz (200 ps period)
  - 5% late, 10% early derating (both clock and data); 5% jitter
  - Target skew ~15 ps
  - Transition ~20 ps; pulse width ~80 ps
  - IR drop < 3% peak (or 30 mV Weff)
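The budget arithmetic behind these constraints can be sketched in Python. The frequency and percentages come from the slide. `DATA_PATH_PS` is a hypothetical path delay. Treating jitter as a straight deduction from the period is a simplifying assumption, not the flow's actual uncertainty model.

```python
# Sketch of the Block A/B timing budget. Frequency, jitter, and derate
# percentages come from the slide; DATA_PATH_PS is hypothetical, and treating
# jitter as a straight deduction from the period is a simplifying assumption.

FREQ_GHZ = 5.0
JITTER_FRAC = 0.05     # 5% jitter
LATE_DERATE = 1.05     # 5% late derate
EARLY_DERATE = 0.90    # 10% early derate
DATA_PATH_PS = 120.0   # hypothetical nominal data-path delay

period_ps = 1000.0 / FREQ_GHZ                # 1 / 5 GHz = 200 ps
jitter_ps = JITTER_FRAC * period_ps          # 10 ps
usable_ps = period_ps - jitter_ps            # left for CK-Q, logic, and setup
late_path_ps = LATE_DERATE * DATA_PATH_PS    # derated delay seen by setup checks
early_path_ps = EARLY_DERATE * DATA_PATH_PS  # derated delay seen by hold checks

print(f"period {period_ps:.0f} ps, jitter {jitter_ps:.0f} ps, usable {usable_ps:.0f} ps")
print(f"data path: late {late_path_ps:.1f} ps, early {early_path_ps:.1f} ps")
```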
Library Issues
• Re-characterization of timing libraries
  - Traditional library tables produce pessimism in timing delay calculation (setup times/delays worse by at least 10 ps)
[Figure: .lib vs. SPICE comparison]
...Library Issues
• Extra pessimism is not tolerable because
  - 10 ps for each cell adds up along a path and becomes significant
  - Paths become too tight to fix
• The library is mostly made of weak-drive-strength buffers and complex gates
  - Realistic fanout, higher switching power => higher insertion delay
  - Weak clock tree cells cause more insertion delay
• > 70% of the logic is sequential; setup (reg2reg) timing is critical
• Decap cells for peak IR were released late in the flow
  - Could not be added in Block A
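Here is a quick sketch of why 10 ps per cell matters at 200 ps. The per-cell figure and the period come from the slides. The stage counts are hypothetical.

```python
# Sketch: a fixed per-cell library pessimism accumulates along a path.
# 10 ps/cell and the 200 ps period come from the slides; the stage counts
# in the loop are hypothetical.

PERIOD_PS = 200.0
PESSIMISM_PER_CELL_PS = 10.0

def accumulated_pessimism(num_cells: int) -> float:
    """Total extra delay (ps) if every cell on the path is pessimistic by 10 ps."""
    return num_cells * PESSIMISM_PER_CELL_PS

for cells in (2, 4, 6, 8):
    extra = accumulated_pessimism(cells)
    print(f"{cells} cells: +{extra:.0f} ps ({100 * extra / PERIOD_PS:.0f}% of the period)")
```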
...Design Issues
• Small coupling caps (1 fF) due to the size of the design
  - Small nets in the design do not get extracted, and their coupling can be dropped; use coupling_abs_threshold to reduce the threshold
• 4 corners for IR/EM, 3 corners for timing
  - The high-voltage, high-temperature corner was added late, for IR/EM

Voltage (V) | Temp (°C) | Tag    | Description
0.9         | 125       | WCCOM  | Traditional worst-case timing
1.1         | -40       | LTCOM  | Traditional best-case timing
0.9         | -40       | WCLCOM | Temperature inversion corner
1.1         | 125       | MLCOM  | Worst EM/IR/leakage
[Figure: corners plotted as temperature (-40 °C to 125 °C) vs. voltage (0.9 V to 1.1 V)]
Matrix Re-characterization
• 10x10 tables reduce the extra pessimism of the original 3x3 tables

BEFORE (3x3):
timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_3x3) {
    index_1("0.003, 0.2019, 0.9");
    index_2("0.003, 0.2019, 0.9");
    values("0.00995, 0.0199, 0.06965", \
           "0.08955, 0.1095, 0.2089", \
           "0.2189, 0.1791, 0.3184");
  }
}

AFTER (10x10):
timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_10x10) {
    index_1("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    index_2("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    values("0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965", \
           "0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965", \
           "0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955", \
           "0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194", \
           "0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691", \
           "0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288", \
           "0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686", \
           "0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885", \
           "0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085", \
           "0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184");
  }
}
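The effect of table size can be illustrated by looking up both tables above at the same point. This sketch makes two assumptions. First, the timer does plain bilinear interpolation inside the table bounds, which is the usual NLDM behaviour. Second, rows follow index_1 and columns follow index_2.

At the chosen point (index_1 = 0.2278, index_2 = 0.003), the 3x3 lookup reads about 0.0048 table units higher than the re-characterized value, which is more pessimistic. The gap varies in size and sign at other points.

```python
from bisect import bisect_right

# Tables copied from the BEFORE (3x3) and AFTER (10x10) listings above.
IDX_3 = [0.003, 0.2019, 0.9]
VAL_3 = [
    [0.00995, 0.0199, 0.06965],
    [0.08955, 0.1095, 0.2089],
    [0.2189, 0.1791, 0.3184],
]
IDX_10 = [0.003, 0.009191, 0.03092, 0.07243, 0.1371,
          0.2278, 0.3472, 0.4976, 0.6812, 0.9]
VAL_10 = [
    [0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965],
    [0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965],
    [0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955],
    [0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194],
    [0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691],
    [0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288],
    [0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686],
    [0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885],
    [0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085],
    [0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184],
]

def _bracket(axis, v):
    """Return (i, t) with axis[i] <= v <= axis[i+1] and t the fractional position."""
    i = min(max(bisect_right(axis, v) - 1, 0), len(axis) - 2)
    return i, (v - axis[i]) / (axis[i + 1] - axis[i])

def bilinear(index_1, index_2, values, x, y):
    """Bilinear lookup; rows follow index_1, columns follow index_2 (in-range points only)."""
    i, tx = _bracket(index_1, x)
    j, ty = _bracket(index_2, y)
    top = values[i][j] + ty * (values[i][j + 1] - values[i][j])
    bot = values[i + 1][j] + ty * (values[i + 1][j + 1] - values[i + 1][j])
    return top + tx * (bot - top)

# (0.2278, 0.003) is a characterized grid point of the 10x10 table.
coarse = bilinear(IDX_3, IDX_3, VAL_3, 0.2278, 0.003)
fine = bilinear(IDX_10, IDX_10, VAL_10, 0.2278, 0.003)
print(f"3x3: {coarse:.5f}  10x10: {fine:.5f}  difference: {coarse - fine:.5f}")
```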
Synthesis Improvements
• Very slow cells such as XOR, 4:1 mux, and AOI gates are prohibited
  - Some sensitive logic is hand-instantiated to prevent AOI or XOR selection
• Register cloning/fanout optimization to reduce fanout
  - 10-15% increase in sequential area, but helps reduce flop delay
  - set_register_replication (DC) can be used
[Figure: one register driving load C vs. two cloned registers each driving C/2]
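Here is a first-order sketch of the cloning benefit, using delay = intrinsic + R_drive × C_load. Every number below is hypothetical. Only the C versus C/2 load split comes from the slide.

```python
# Sketch: first-order CK-Q delay model, delay = intrinsic + R_drive * C_load.
# All values are hypothetical. Units: kOhm * fF = ps.

INTRINSIC_PS = 30.0   # hypothetical intrinsic CK-Q delay
R_DRIVE_KOHM = 2.0    # hypothetical flop output drive resistance
C_LOAD_FF = 20.0      # hypothetical total fanout load C

def ck_to_q_ps(load_ff: float) -> float:
    """CK-Q delay (ps) of one flop driving load_ff femtofarads."""
    return INTRINSIC_PS + R_DRIVE_KOHM * load_ff

single = ck_to_q_ps(C_LOAD_FF)      # one flop drives the whole load C
cloned = ck_to_q_ps(C_LOAD_FF / 2)  # each of two clones drives C/2
print(f"single: {single:.0f} ps, cloned: {cloned:.0f} ps")
```

The saving is R_drive × C/2 per clone. This is why the slide trades 10-15% more sequential area for lower flop delay.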
Placement-Sensitive Flow
• Cell placement is closely controlled in all stages
• Bad timing is caused by:
  - Placement of cells under loose constraints
  - High buffer insertion to close timing
• Clocks over-constrained by 10%; incremental psynopt runs improve timing
  - Best possible flop placement achieved
• Clock latency set to simulate post-CTS derating during placement
Placement: Default Timing Flow
• create_placement + psynopt
  - WNS -0.05, 50 paths
• Derating, clock_opt
  - WNS -0.10, 60 paths
• SI + wires, route_opt + route_opt -incr
  - WNS -0.18, 90 paths
Placement: PSFlow
• create_placement + psynopt(1) with 40 ps uncertainty: don't upsize, just move
  - WNS -0.05, 50 paths
• Allow buffer resizing
  - WNS -0.10, 80 paths
• psynopt(2): remove extra uncertainty (24 ps); don't move registers
  - WNS -0.025, 50 paths
• clock_opt -only_cts (SI + wires)
  - WNS -0.08, 20 paths
• route_opt + route_opt -incr; route_opt -incr (reg2reg only)
  - WNS +0.005, 10 paths
  - WNS -0.015, 10 paths (waived)
CTS: Waveform Balancing
• Getting around the asymmetry of the clock cells
  - Decision to use the same non-equal-duty-cycle inverter back to back to avoid pulse width issues
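A small edge-propagation model shows why back-to-back copies of the same asymmetric inverter help. The rise/fall delays are hypothetical, and the model ignores slew dependence. It only tracks how each stage shifts the leading and trailing edges of a pulse.

```python
from dataclasses import dataclass

@dataclass
class Inverter:
    rise_ps: float  # output rise delay (triggered by a falling input)
    fall_ps: float  # output fall delay (triggered by a rising input)

def propagate(start_ps, end_ps, chain):
    """Propagate a high pulse (leading edge, trailing edge) through inverters.
    Polarity flips at every stage, so the delay applied to each edge alternates."""
    lead_rising = True  # the input is a high pulse: its leading edge rises
    for inv in chain:
        start_ps += inv.fall_ps if lead_rising else inv.rise_ps
        end_ps += inv.rise_ps if lead_rising else inv.fall_ps
        lead_rising = not lead_rising
    return start_ps, end_ps

def width(edges):
    return edges[1] - edges[0]

ASYM = Inverter(rise_ps=12.0, fall_ps=8.0)  # hypothetical asymmetric clock inverter
print(width(propagate(0.0, 100.0, [ASYM])))        # 104.0: one stage distorts the pulse
print(width(propagate(0.0, 100.0, [ASYM, ASYM])))  # 100.0: same cell back to back restores it
```

Each stage stretches or shrinks the pulse by (rise - fall), with alternating sign. A pair of identical cells therefore cancels the distortion, while a pair of different cells generally does not.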
CTS: Others
• Register placement is fixed
• Fast transition times help speed up CK-Q timing
  - Also reduce setup times at the flops
• Final duty cycle tolerance: 40/60%
• Since skew is very small, hold fixing is eliminated
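The 40/60% duty-cycle tolerance translates directly into a minimum phase width at 200 ps. The sketch below does only that arithmetic. The result lines up with the ~80 ps pulse-width figure listed under Design Constraints.

```python
PERIOD_PS = 200.0
DUTY_LIMITS = (0.40, 0.60)  # final duty-cycle tolerance from the slide

def phase_widths(period_ps, duty):
    """(high, low) phase widths in ps for a given duty cycle."""
    high = duty * period_ps
    return high, period_ps - high

# The narrowest phase over both tolerance limits is the worst-case pulse width.
worst_min_pulse = min(min(phase_widths(PERIOD_PS, d)) for d in DUTY_LIMITS)
print(f"minimum phase width within 40/60% duty: {worst_min_pulse:.0f} ps")
```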
Power Analysis
• Both blocks are in a special power domain (not shared by the top level)
• Target IR drop < 3% (i.e., 33 mV)
• IR drop achieved @ MLCOM (1.1 V, 125 °C) is 14 + 17 = 31 mV
[Figure: floorplan showing pads, Block A, Block B, and the top core]
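Here is the IR budget check on this slide, written as arithmetic. VDD = 1.1 V is the MLCOM voltage. The slide does not say what the 14 mV and 17 mV terms each represent, so the code keeps them as an anonymous pair.

```python
VDD_MV = 1100.0             # MLCOM supply, 1.1 V
TARGET_FRAC = 0.03          # IR-drop target: < 3% of VDD
IR_TERMS_MV = (14.0, 17.0)  # the two terms summed on the slide

budget_mv = TARGET_FRAC * VDD_MV
total_mv = sum(IR_TERMS_MV)
margin_mv = budget_mv - total_mv
print(f"budget {budget_mv:.0f} mV, achieved {total_mv:.0f} mV, margin {margin_mv:.0f} mV")
```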
Power EM
• EM and Rj issues due to high current through buses with insufficient vias (an important run for high speed)
• ICC custom route tool used to add extra Via2 and M2
[Figure: 4x2 via array; 2x pin width]
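Here is a sketch of why a 4x2 via array helps with power EM, assuming current splits evenly across the cuts. The bus current and per-via limit are hypothetical placeholders, not values from the slides or any foundry rule deck.

```python
BUS_CURRENT_MA = 6.0   # hypothetical current through the bus connection
VIA_LIMIT_MA = 1.0     # hypothetical EM limit per Via2 cut

def current_per_via(total_ma, rows, cols):
    """Current per cut, assuming the array shares current evenly."""
    return total_ma / (rows * cols)

for rows, cols in ((1, 1), (2, 2), (4, 2)):
    per_via = current_per_via(BUS_CURRENT_MA, rows, cols)
    status = "OK" if per_via <= VIA_LIMIT_MA else "EM violation"
    print(f"{rows}x{cols}: {per_via:.2f} mA/via -> {status}")
```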
Signal EM
• Simulate/generate VCD/SAIF file
• SAIF-based EM / statistical EM
• Fix signal EM, if any (fix_signal_em or script)
• Timing clean-up (worst functional mode for power) and STA
• Fix minor DRCs/antennas
• Repeat for critical functional modes
• Fixing signal EM inside these iterations results in reduced timing iterations
...Signal EM
• Sample EM fix with repair file (clock wires widened 2x to 4x)
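Here is a first-order sketch of the signal-EM arithmetic behind the widening fix. Each transition moves C·V of charge through the driving wire, so the rectified average current is roughly C·V times the toggle rate. For data nets, the toggle rate could be derived from a SAIF toggle count divided by the simulation time. The net capacitance, base wire width, and current-density limit are hypothetical.

```python
C_NET_F = 50e-15         # hypothetical net capacitance (50 fF)
VDD_V = 1.1
CLOCK_HZ = 5e9
TOGGLES_PER_CYCLE = 2.0  # a clock net switches twice per cycle
BASE_WIDTH_UM = 0.1      # hypothetical minimum wire width
J_LIMIT_MA_PER_UM = 2.0  # hypothetical average-current EM limit

def avg_current_ma(c_f, v, toggle_rate_hz):
    """First-order rectified average current (mA): C*V of charge per transition."""
    return c_f * v * toggle_rate_hz * 1e3

i_ma = avg_current_ma(C_NET_F, VDD_V, TOGGLES_PER_CYCLE * CLOCK_HZ)

def density_ma_per_um(width_mult):
    """Current per micron of width after widening the wire by width_mult."""
    return i_ma / (BASE_WIDTH_UM * width_mult)

for mult in (1, 2, 4):
    d = density_ma_per_um(mult)
    print(f"{mult}x width: {d:.2f} mA/um -> {'OK' if d <= J_LIMIT_MA_PER_UM else 'violation'}")
```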
Correlation
• Bottom-up flow to make sure ICC settings are close enough to PrimeTime and Star-RC (SolvNet: IC Compiler Correlation Checklist Trilogy)
• Extraction settings
    OPERATING_TEMPERATURE: 25
    COUPLE_TO_GROUND: NO
    COUPLING_ABS_THRESHOLD: 1e-15
    MODE: 400
    EXTRACT_VIA_CAPS: YES
• Noise/timing settings
    set db_load_ccs_noise_data true
    set timing_crpr_threshold_ps 0
    set si_filter_accum_aggr_noise_peak_ratio 0.2
Star-RC vs. Calibre SPEF
• Block B: Star-RC within 8% (mean)
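One way to compute an extraction-correlation figure like this is shown below. It assumes "within 8% mean" means the mean absolute relative difference of per-net capacitance; the slide does not define the metric. The capacitance values are hypothetical, and nets are assumed to be matched by name beforehand.

```python
def mean_abs_rel_diff(test, ref):
    """Mean of |test - ref| / ref over matched nets."""
    if len(test) != len(ref) or not ref:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(abs(t - r) / r for t, r in zip(test, ref)) / len(ref)

# Hypothetical per-net total capacitances (fF), already matched by net name.
starrc_ff = [10.0, 20.0, 5.0]
calibre_ff = [10.5, 19.0, 5.2]

err = mean_abs_rel_diff(starrc_ff, calibre_ff)
print(f"mean |diff| = {100 * err:.1f}% -> {'within' if err <= 0.08 else 'outside'} 8%")
```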
ICC vs. PT-SI Slack
• Block B: ICC (4 ps) slightly pessimistic vs. PT (2 ps)
[Figure: histograms of # paths vs. WNS (ns) for ICC and PT-SI]
Conclusion / Takeaways
• Fix library issues
  - Good range of cells with decent drive strengths for optimization
  - Cell names must be user-friendly so that their use can be limited (for better EM/IR)
  - Larger matrices for setup/pulse-width timing to prevent timing pessimism
  - Symmetric clock cells tagged with special naming
  - Don't-use cells should be clearly marked
• Fix process corners (e.g., MLCOM, WCLCOM)
  - Special situations, such as temperature inversion for timing and high-temperature corners for leakage and peak IR drop, should be known well in advance
…Conclusion / Takeaways
• Think top level
  - Think about the next stage and the top level
• Correlate (SolvNet: IC Compiler Correlation Checklist Trilogy)
  - Star-RC/ICC extraction should be correlated to device level
  - PT-SI and ICC noise settings should be checked
• Tune ICC to meet requirements (e.g., custom placement, custom CTS, custom router)
  - Get to know all available options
  - Script for reusability
Thanks…
• Synopsys Hotline: filed and accepted requests for
  - EM GUI and temperature scaling
  - Retaining FILLs in soft blocks after flattening
  - Ability to check min grid during Zroute verify
• Others
  1. KhanKap Mounarath, Sr. Scientist, Maxim
  2. DSM group/Library, Maxim EDA
  3. Bill Sicaras, Synopsys AC
Appendix A: PT-SI and SPICE Correlation
• SPICE-level simulation performed on the worst path
    Startpoint: clk_div_0/div_by2_by4_0/sig_i4_reg (rising edge-triggered flip-flop clocked by dac_clk1)
    Endpoint: clk_div_0/div_by2_by4_0/sig_i4_reg (rising edge-triggered flip-flop clocked by dac_clk1)
    Path Group: dac_clk1
    Path Type: max
• Σ (launch clock delay + CK-Q delay + combinational delay to the endpoint register) agrees with SPICE within 5% for Block B
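The Appendix A check can be expressed as summing the PT-SI path components and comparing the total against the SPICE arrival time. All delay values below are hypothetical. Only the 5% criterion comes from the slide.

```python
# Hypothetical PT-SI path components (ns) and SPICE arrival time (ns).
pt_si_ns = {"launch_clock": 0.060, "ck_to_q": 0.045, "combinational": 0.070}
spice_total_ns = 0.180

sta_total_ns = sum(pt_si_ns.values())
rel_diff = abs(sta_total_ns - spice_total_ns) / spice_total_ns
verdict = "within" if rel_diff <= 0.05 else "outside"
print(f"PT-SI {sta_total_ns:.3f} ns vs SPICE {spice_total_ns:.3f} ns: "
      f"{100 * rel_diff:.1f}% ({verdict} 5%)")
```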
Appendix B (scripts)
Script used for placement
## Source the common settings for placement and optimization
source common_placement_settings_icc.tcl
set placer_max_cell_density_threshold 0.68
## 15% of the clock period, which is 200ps, is 30ps
## 30ps plus 10ps uncertainty is 40ps overconstraining
set_timing_derate -late 1.15
set_clock_uncertainty 0.01 [all_clocks]
set_critical_range 0.090 cd18_decoder_dac
## INITIAL PLACEMENT
create_placement -effort high -congestion -congestion_effort high
legalize_placement
## FIRST ROUND OF OPTIMIZATIONS
set_dont_touch [get_cells *]
set_dont_touch [get_nets *]
psynopt
## Tighten the output paths
set_clock_uncertainty 0.015 [all_clocks]
set_clock_latency 0.200 [get_clocks dac_clk]
set_clock_latency 0.100 [get_clocks dac_clko]
psynopt
## SECOND ROUND OF OPTIMIZATION
## Remove the dont_touch attributes and let the tool optimize
## timing more (upsize cells, etc.)
remove_attribute [get_cells -hier *] dont_touch -quiet
remove_attribute [get_nets -hier *] dont_touch -quiet
## Do not optimize some sensitive logic
set_dont_touch [get_cells U*]
psynopt
## Save cell and report timing
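Here is the over-constraint arithmetic from the script comments, written as a small function. One caveat is worth making explicit. A late derate of 1.15 scales path delays, so it costs 15% of a path's delay rather than 15% of the period. The comment's 30 ps therefore corresponds to a path whose delay fills the whole 200 ps period. The 120 ps path below is hypothetical.

```python
def overconstraint_ps(path_delay_ps, late_derate=1.15, uncertainty_ps=10.0):
    """Extra margin (ps) demanded by a late derate plus clock uncertainty.
    Defaults mirror set_timing_derate -late 1.15 and set_clock_uncertainty 0.01 (ns)."""
    return (late_derate - 1.0) * path_delay_ps + uncertainty_ps

print(f"{overconstraint_ps(200.0):.1f} ps")  # path filling the full 200 ps period
print(f"{overconstraint_ps(120.0):.1f} ps")  # hypothetical shorter path
```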
Appendix B
Script used for CTS
# DON'T MOVE CAREFULLY PLACED CELLS
set_dont_touch_placement [get_cells -hier *_reg*]
set_attribute [get_cells -hier spr*] is_fixed true
remove_clock_tree -clock_trees {dac_clk dac_clko} -honor_dont_touch
reset_clock_tree_references
define_routing_rule decoder_clk_shield_rule -default_reference_rule \
    -taper_level 0 -multiplier_width 2 -multiplier_spacing 1 -shield
## CONTROL TRANSITION FOR CLOCKS
## RELAX BUFFER LEVELS TO HELP FIX FANOUT
set_clock_tree_options -layer_list $runOption(input,clkRoutelayerList) \
    -routing_rule decoder_clk_shield_rule -use_default_routing_for_sinks 1 \
    -target_skew 0.010 -max_buffer_levels 9 -max_transition 0.024
set_clock_tree_options -clock_trees dac_clk -routing_rule decoder_clk_shield_rule \
    -use_default_routing_for_sinks 1 -target_skew 0.010 -max_buffer_levels 9
set_max_fanout 2 [get_ports dac_clk]
set_max_fanout 2 [get_ports dac_clko]
## Tighter transition on output clock; timing is OK.
set_clock_tree_options -clock_trees dac_clko -max_buffer_levels 3 -max_transition 0.022
check_clock_tree -clocks dac_clk
report_clock_tree -summary -clock_trees dac_clk -level_info
report_clock_tree -show_all_sinks
report_clock_tree -settings > clktree/settings.rpt
update_clock_latency
## Turn on removal and recovery checks
## set enable_recovery_removal_arcs true
## Perform clock tree synthesis only
clock_opt -only_cts -operating_condition min_max