From RTL to GDS using Synopsys flow in less than 10 weeks

May 9, 2018 | Author: Lohit | Category: Electrical Engineering, Computer Engineering, Digital Electronics, Electronic Engineering, Electronic Design


Multi-Million Gate Design: From RTL to GDS Using the Synopsys Flow in Less Than 10 Weeks

Yaron Lavi

Intel Corporation

ABSTRACT

Synopsys offers a set of tools that enables a smooth flow from RTL to GDS (tape-out, TO) in a relatively short time and with only two major layout iterations. Although the RTL-to-GDS (RTL2GDS) schedule depends heavily on design complexity, layout utilization, computing resources, head-count, and many other factors, we found a flow that does the job with a high confidence level and in approximately constant time for multi-million-gate projects. This paper presents a proven flow, used in several projects, that takes advantage of:
§ Design Compiler for synthesis
§ Physical Compiler + DFT Compiler for placement and scan insertion
§ Astro for clock tree, high fan-out net (HFN) synthesis, and routing
§ PrimeTime for static timing analysis
The result is working silicon in all target design corners, now being manufactured in high volume. To be fair, tools from other vendors also support the design and validation effort, but this tool set is the backbone of the back-end flow.

Table of Contents
1.0 Introduction
2.0 Design flow
2.1 RTL Verification
2.2 Professional Synthesis
2.3 Floor Planning (Update)
2.4 Physical Synthesis (G2PG)
2.5 Layout and Timing closure
2.5.1 Astro Physical stage
2.5.2 Full Timing model for Static Timing Analysis
2.5.3 Astro ECO mode
2.6 Final tuning
3.0 Conclusions and Recommendations
4.0 Acknowledgements

List of Tables
Table 1 - Typical Synthesis results
Table 2 - Physical congestion typical results
Table 3 - Astro clock tree skews

List of Figures
Figure 1 - General design flow
Figure 2 - Physical SCAN Chains
Figure 3 - Layout Typical Congestion
Figure 4 - Bonus and FIB cell scattering

SNUG Israel 2004


From RTL to GDS

1.0 Introduction

Growing market demand for new communication products, with tough competition from both powerful incumbents and new competitors, makes time to market a key element of success. Product lifetimes are getting shorter, and a fast ramp-up to high-volume manufacturing is needed. As a result, design cycles are shrinking, and a steady flow is required. This flow must provide a high level of confidence that the schedule constraints will be met. We need to decouple the schedule from gate count and design complexity, so that multi-million-gate projects are completed in approximately constant time.

In this paper, I present our flow (and its cost) for meeting these requirements. It is based on the Synopsys tool set, in a flow that has been proven in a couple of projects. Using an advanced process for communication applications may have the advantage that extra cells can be added with no impact on timing, so there is always room for additional margin to guarantee fast execution.

An important point to emphasize is that every request or constraint has a cost. The confidence level in quality and schedule that we have developed costs:
a. More computing resources
b. Many more licenses
c. Extra die area

These targets rest on two major assumptions, which must hold:
a. High-quality RTL code
b. Only two major timing-closure loops

The flow was used on 0.18um and 0.13um processes with designs of about 2M gates. The designs also include hard macros such as memories. A design of this size is beyond what the tools can handle as a single chunk, so it is divided into several clusters that run in parallel. To prevent IP disclosure, most of the examples presented here are shown for one cluster of the design.

The paper covers the following design steps:
a. RTL to gates
b. Gates to placed gates
c. Astro physical stage
   • Clock tree & HFN
   • Routing
d. Timing closure
e. ECO flow
f. Full chip verification


2.0 Design flow

The design flow includes several steps and milestones, which can be viewed in general as described in Figure 1.

Figure 1 - General design flow (blocks: Product Definition; RTL Coding, Re-use; Logic ENV, TB; Manual Design, Analog; Floor-Plan; Behavioral Model, Logic Model; Full Chip Testing; Synthesis; Verification: Regression, ATG, STA, FV, GLS, RV; Analog Testing; Layout: Manual, APR; LVS/DRC; TO)

This paper focuses on the physical design flow (colored in Figure 1) and explains how to reach tape-out within 10 weeks with one full synthesis loop and two major timing-closure loops. Of course, the physical design team must know the design and the methodology very well before starting this flow, because there is no room for rework. It is highly recommended to run the process twice on an incomplete, not-yet-ready design to make sure the flow and tools work as expected. This is the time to raise issues such as the floor plan (cluster partitioning), MAX delay paths, random logic too complex for the formal verification tools, etc. The physical design can be summarized in the following points:
• RTL verification toward back-end readiness – pre-stage, to be met before the schedule starts
• Synthesis optimization (including clock-gate insertion) – 2 weeks
• Physical synthesis (including scan insertion and DRC fixing) – 1 week
• Routing (including HFN and clock tree insertion) – 1 week
• Static timing analysis (full annotation) x2 – 2 weeks
• Logic + timing ECOs x2 – 2 weeks
• Layout ECOs – 2 weeks
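The stage durations listed above add up to the 10-week target. A minimal Python sketch of that bookkeeping follows; the stage names come from the list, and the RTL verification pre-stage is entered as zero weeks on the assumption that it is finished before the schedule clock starts.

```python
# Physical design schedule taken from the stage list above (durations in weeks).
# Assumption: RTL verification is a pre-stage finished before the schedule clock
# starts, so it is listed with 0 weeks.
STAGES = [
    ("RTL verification (pre-stage)", 0),
    ("Synthesis optimization", 2),
    ("Physical synthesis", 1),
    ("Routing", 1),
    ("Static timing analysis x2", 2),
    ("Logic + timing ECOs x2", 2),
    ("Layout ECOs", 2),
]


def total_weeks(stages: list[tuple[str, int]]) -> int:
    """Sum the per-stage durations, assuming the stages run back to back."""
    return sum(weeks for _, weeks in stages)


if __name__ == "__main__":
    for name, weeks in STAGES:
        print(f"{name:30s} {weeks} wk")
    print(f"{'Total':30s} {total_weeks(STAGES)} wk")
```

The total is 10 weeks, with no slack in any stage, which is why the text insists on a team that already knows the design and the flow.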


2.1 RTL Verification

One of the critical milestones is the First Sign-Off, at which the entire RTL database is delivered to the physical design flow. At this point it is always worth validating that the database is ready. What does "ready" mean in the eyes of the physical designer? Our definition of a ready database is very strict and includes the following major points:
1. Synthesizable code
2. Synchronous design
3. No latches
4. No MAX delay violations (see footnote 1)
5. Design for testability verified (scan, memory BIST, etc.)
6. Asynchronous paths defined and verified
7. All design exceptions approved
8. All constraint files ready
9. Correct use of pre-created special cells

All of these checkpoints can be verified with several tools on the market. Some are Synopsys tools in the flow (such as Design Compiler and PrimeTime), while others address specific aspects of design rule checking (such as SpyGlass, logic equivalence checkers, ATPG tools, etc.).

Another important check done at this point is matching the floor-plan area against the gate count. This is the last chance to change the floor plan, because doing so later in the flow has a major schedule impact. No physical design cluster should exceed 70% utilization. This utilization is low enough to absorb all the design "buffers" in the flow and to guarantee that no floor-plan changes will be needed.

The MAX delay margin is also an important parameter that should be taken into account at early design stages. Communication product frequencies are low compared with the capability of the advanced process being used. For example, typical frequencies are in the range of 40-160 MHz, with a few designs at 250 MHz. This is far from CPUs built on similar processes, which run at GHz frequencies. Therefore, in the stages before Physical Compiler/Astro (basic synthesis of the RTL), we define the clock uncertainty as 25-30% of the clock cycle.
This is a very high margin, which forces all MAX delay violations to be solved with logic techniques (such as pipelining) early in the design. As a result, the physical designer rarely runs into the time-consuming problem of MAX delay.

Summary:
• Keep verification simple and use conservative design rules.
• Start with low utilization to remove floor-plan risk (paid for with die area).
• Prevent MAX delay violations with high clock uncertainty to keep the flow streaming (paid for with area).
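To put the 25-30% uncertainty and the 70% utilization ceiling in absolute terms, the Python sketch below converts a clock frequency and an uncertainty percentage into nanoseconds and checks a cluster against the utilization limit. It is only an illustration: the function names and the example cluster areas are hypothetical, and only the percentages and frequencies come from the text.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def uncertainty_ns(freq_mhz: float, margin_pct: float) -> float:
    """Clock uncertainty expressed as a percentage of the clock period, in ns."""
    return period_ns(freq_mhz) * margin_pct / 100.0


def utilization_ok(cell_area: float, core_area: float, limit: float = 0.70) -> bool:
    """True if cell area / core area does not exceed the limit (70% at RTL sign-off)."""
    return cell_area / core_area <= limit


if __name__ == "__main__":
    for f in (40, 160, 250):
        lo, hi = uncertainty_ns(f, 25), uncertainty_ns(f, 30)
        print(f"{f} MHz: period {period_ns(f):.2f} ns, uncertainty {lo:.3f}-{hi:.3f} ns")
    # Hypothetical cluster: 6.0 mm^2 of cells in a 9.0 mm^2 core (about 67%).
    print("utilization within limit:", utilization_ok(6.0, 9.0))
```

At 160 MHz, for example, a 25% margin reserves about 1.56 ns of a 6.25 ns cycle, which is why MAX delay problems tend to be solved in RTL before the physical flow starts.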

Footnote 1: The definition of MAX delay is closely tied to the clock definition. The margin defined in the clock uncertainty may cause MAX delay violations that are not real. Our approach is explained in the next section.


2.2 Professional Synthesis

At this point the physical design flow actually starts, and the schedule clock starts counting. At this stage we use all our computing resources and all available licenses to get the best possible gate-level results from the RTL. The constraint files have been ready since the early (basic synthesis) stage, so no surprises should occur here. This includes various types of synthesis, such as:
• Top down
• Bottom up
• Bottom up using the characterized method
• The advanced flow with DC_Ultra and DW_foundation, when needed

This stage has some of the characteristics of a trial-and-error method, but the effort pays off. Each design has its own best approach to synthesis, and you cannot tell what it is just by looking at the code. Design exploration and elaboration is an effort that must be accounted for. I would expect any physical designer to know all the synthesis methods and to try them on his block before the final run, but the final RTL drop always seems to have its own secrets (especially in arithmetic datapath blocks). As a result, we see netlists with up to 10% gate-count reduction, including scan flip-flops and clock gates (Power Compiler).

Table 1 shows typical synthesis results for a block. The maximum benefit comes from professional synthesis using additional licenses such as DC Ultra and DesignWare Foundation. The gain is design dependent, but in our experience it is a 5-10% reduction in gate count. Moreover, with the 0.13um process (as in the table), the extra synthesis MAX delay margin has only a minor effect on gate count and area. The extra cells, which are the cost of the extra margin, save the schedule time that would otherwise be spent fixing MAX delay violations later in the flow. Power Compiler with the standard clock-gate cell reduces gate count by an additional ~7%.

Clock uncertainty margin [%]   10       10       25       25       25*
NAND gate count [Kgate]        171.3    186.4    175.6    190.0    211.5
Power Compiler                 ON       OFF      ON       OFF      OFF
Number of instances            42,412   46,840   44,631   48,502   50,212

* Basic synthesis

Table 1 - Typical Synthesis results
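The percentages quoted in the text can be checked against Table 1. The Python sketch below recomputes them from the table's NAND gate counts; the dictionary keys are only labels for the table columns, and it assumes that the four columns without "*" are professional-synthesis runs.

```python
# NAND gate counts [Kgate] from Table 1, keyed by
# (clock uncertainty %, Power Compiler setting, synthesis flow).
# Assumption: the four columns without "*" are professional-synthesis runs.
GATES = {
    (10, "ON", "professional"): 171.3,
    (10, "OFF", "professional"): 186.4,
    (25, "ON", "professional"): 175.6,
    (25, "OFF", "professional"): 190.0,
    (25, "OFF", "basic"): 211.5,
}


def change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for margin in (10, 25):
        off = GATES[(margin, "OFF", "professional")]
        on = GATES[(margin, "ON", "professional")]
        print(f"Power Compiler at {margin}% margin: {change_pct(off, on):+.1f}%")
    basic = GATES[(25, "OFF", "basic")]
    prof = GATES[(25, "OFF", "professional")]
    print(f"Professional vs. basic synthesis: {change_pct(basic, prof):+.1f}%")
    low = GATES[(10, "ON", "professional")]
    high = GATES[(25, "ON", "professional")]
    print(f"25% vs. 10% margin (Power Compiler ON): {change_pct(low, high):+.1f}%")
```

Run as a script, it reports about -8.1% and -7.6% for Power Compiler, -10.2% for professional versus basic synthesis, and +2.5% for the larger margin, in line with the ~7%, up-to-10%, and "minor effect" statements above.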

Summary: Keep gate count under control through all netlist changes (DFT and power).


2.3 Floor Planning (Update)

In general, the floor-plan stage comes very early in the design flow. I mention it at this point because this is the last point at which modifications can be made with only a minor schedule impact. Blocks, macros, and pads are verified once again to make sure they fit easily on the die. This check is based on the final synthesis results. Also, the pins of each cluster should be placed in logical order. The Jupiter tool, which can find the logical connections between units, places them accordingly. As a reminder, the main considerations for chip floor planning are:
• Die size
• Connections between macros/blocks and the pads
• Block interconnect
• Supply grid
• Number of pins
• Complexity of connections in the gate area, which leads to group/region definitions

The results of the above are fed back into Physical Compiler for placement-based synthesis.

2.4 Physical Synthesis (G2PG)

At this stage, the number of iterations drops dramatically. Clusters are now at the level of 850K gates (NAND-equivalent), and the run time is about 30 hours, including all reports. The design passes through several optimization iterations, and the scan chains are built according to the placement of the flip-flops. The minimum is one PhyOpt run plus two incremental runs. The design MAX delay margin is reduced to 20% of the clock cycle. An additional iteration fixes MAX transition and MAX capacitance violations, to reduce those violations in the layout stage. Usually resizing solves them, but we also add extra buffers to split long or high-fan-out nets. MIN delay violations (more precisely, their prevention) are handled too, based on statistical results, since there is no clock tree yet. Our approach to MIN delay prevention is to add an extra buffer to any path that may become a MIN delay path; "back-to-back" flip-flops, for example, fall into this category. All of this yields an "ECO" that inserts about 7,000 buffers into a block with 25K flip-flops (180K instances). Utilization is now raised to 75%, and in extreme cases up to 85%, which is our upper limit. Beyond this limit (in utilization, gate count, and instance count), the results become poor (MAX delay, congestion), run time is much higher, and Physical Compiler may even crash.
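The MIN delay prevention rule for "back-to-back" flip-flops can be illustrated on a toy netlist. The Python sketch below is a hypothetical model, not the Physical Compiler ECO mechanism: each cell has a flop flag and a list of the cells it drives, the code finds flop-to-flop connections with no logic in between, and it inserts one buffer into each such connection. The `Cell` class and the `hold_buf` naming are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    is_flop: bool
    fanout: list[str] = field(default_factory=list)  # names of the cells this cell drives


def back_to_back_pairs(cells: dict[str, Cell]) -> list[tuple[str, str]]:
    """Return unique (driver, receiver) flop pairs connected with no logic between."""
    pairs = []
    for cell in cells.values():
        if cell.is_flop:
            pairs.extend((cell.name, sink) for sink in cell.fanout if cells[sink].is_flop)
    return list(dict.fromkeys(pairs))  # drop duplicates, keep order


def insert_hold_buffers(cells: dict[str, Cell], prefix: str = "hold_buf") -> int:
    """Insert one buffer into every back-to-back flop connection; return the count."""
    pairs = back_to_back_pairs(cells)
    for i, (drv, rcv) in enumerate(pairs):
        buf = f"{prefix}_{i}"
        cells[buf] = Cell(buf, is_flop=False, fanout=[rcv])
        # Re-point the driver from the receiving flop to the new buffer.
        cells[drv].fanout = [buf if s == rcv else s for s in cells[drv].fanout]
    return len(pairs)


if __name__ == "__main__":
    netlist = {
        "ff1": Cell("ff1", True, ["ff2"]),     # direct flop-to-flop: MIN delay risk
        "ff2": Cell("ff2", True, ["and1"]),
        "and1": Cell("and1", False, ["ff3"]),  # logic in between: left alone
        "ff3": Cell("ff3", True),
    }
    print(insert_hold_buffers(netlist), netlist["ff1"].fanout)
```

A real flow would also use the placement and the expected clock skew to decide which paths are at risk; this sketch applies only the structural rule.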


Figure 2 - Physical SCAN Chains

The physical synthesis stage is also a good place to add logic ECOs. The ECO mode is very easy to use at the netlist level: once the design is ready, Physical Compiler adds and places the gates with no major effect on timing. As Table 2 shows, the congestion report for a typical block looks quite good.

                                      X             Y
Congestion threshold:                 0.7000        0.7000
Violations (usage > threshold) /
  Number of edges:                    9323/95392    17046/95392
Maximum violation:                    0.6802        0.5357
Average violation:                    0.0800        0.1797

Histogram for congestion on X
< 0.80:      **************************************** (92278)
0.80 - 1.00: ** (3058)
1.00 - 1.20: * (45)
1.20 - 1.40: * (11)

Histogram for congestion on Y
< 0.80:      **************************************** (80945)
0.80 - 1.00: ******* (13724)
1.00 - 1.20: * (717)

Table 2 - Physical congestion typical results
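The congestion report in Table 2 can be read as fractions of the routing grid. The Python sketch below copies the X-direction histogram and the violation counts from the table, checks that the histogram accounts for every edge, and expresses the violations as percentages. Reading a usage above 1.0 as demand exceeding edge capacity is an interpretation of the report, not something the report states.

```python
# Edge counts copied from the X-direction congestion histogram in Table 2.
X_HIST = {"< 0.80": 92278, "0.80 - 1.00": 3058, "1.00 - 1.20": 45, "1.20 - 1.40": 11}
TOTAL_EDGES = 95392
VIOLATIONS = {"X": 9323, "Y": 17046}  # edges with usage above the 0.70 threshold


def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return part / whole * 100.0


if __name__ == "__main__":
    print("X histogram covers all edges:", sum(X_HIST.values()) == TOTAL_EDGES)
    for axis, count in VIOLATIONS.items():
        print(f"{axis}: {count}/{TOTAL_EDGES} edges over threshold = "
              f"{pct(count, TOTAL_EDGES):.1f}%")
    over_1 = X_HIST["1.00 - 1.20"] + X_HIST["1.20 - 1.40"]
    print(f"X edges with usage >= 1.0: {over_1} ({pct(over_1, TOTAL_EDGES):.3f}%)")
```

Roughly 9.8% of X edges and 17.9% of Y edges exceed the 0.70 threshold, but only 56 X edges reach a usage of 1.0 or more, which supports the "quite good" assessment.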

The database now consists of the Verilog netlist, a PDEF file, and SDC files. There are two SDC files: one for the clock tree and a relaxed one for the routing flow. The clock-tree SDC is the same one Physical Compiler works with, with a 20% clock uncertainty. For routing, we use a relaxed uncertainty of 15%. In several cases, this gives better results and run time.
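The two constraint sets differ only in the clock uncertainty. A minimal Python sketch that emits both variants as standard SDC text (`create_clock` and `set_clock_uncertainty -setup`) is shown below. The clock name `CLK`, the port `clk`, and the 8 ns period are hypothetical; only the 20% and 15% values come from the text, and real constraint files would contain far more than these two lines.

```python
# Minimal SDC generator for the two constraint sets described above.
# Hypothetical: the clock name "CLK", the port "clk" and the 8 ns period; only the
# 20% (clock tree) and 15% (routing) uncertainty values come from the text.


def sdc_for(clock: str, port: str, period_ns: float, uncertainty_pct: float) -> str:
    """Return an SDC fragment whose setup uncertainty is a percentage of the period."""
    uncertainty = period_ns * uncertainty_pct / 100.0
    return "\n".join([
        f"create_clock -name {clock} -period {period_ns:g} [get_ports {port}]",
        f"set_clock_uncertainty -setup {uncertainty:g} [get_clocks {clock}]",
    ])


if __name__ == "__main__":
    for stage, pct in (("clock tree", 20), ("routing", 15)):
        print(f"# SDC for {stage} ({pct}% uncertainty)")
        print(sdc_for("CLK", "clk", 8.0, pct))
```

Generating both files from one source keeps the clock definitions identical, so the only intended difference between the clock-tree and routing runs is the margin.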


2.5 Layout and Timing closure

2.5.1 Astro Physical stage

This is the first time we meet the layout tool (Astro). The placement, netlist, and timing constraints are loaded into the tool, and all cells are fixed in place. Our methodology uses the Physical Compiler placement as much as possible; only minor placement changes are allowed. At no stage does the layout tool make DRC fixes (resizing, adding buffers) automatically. This is because Astro's timing engine still differs from PrimeTime (the sign-off timing tool). In addition, changing the design through the ECO flow makes it more controllable and accurate. This is possible because Physical Compiler predicts layout timing very close to the PrimeTime results. We also protect this by reducing the design margins as the design progresses. The first stage is done with tighter timing constraints, and in it the clock trees, reset trees, and high fan-out nets are built.

The clock tree stage starts with the load SDC step, so that Astro understands the design constraints and clock definitions. To verify that all SDC constraints were read correctly, we dump them back out for review. Next, we check that the design can meet the timing requirements without the net impact: we apply the "without interconnect" option and check the timing report. The design should meet timing at this stage.

An important step is clock tree optimization. At the cost of insertion delay, the clock tree skew is reduced to the point where there are almost no MIN delay violations. This also depends on the level of prevention applied earlier, but those values are taken into account. For example, see Table 3.

Clock    Stage                Short path   Long path   Skew
CLOCK1   No optimization      0.4          244.7       244.3
CLOCK1   After optimization   2.4          13.1        10.7
CLOCK2   No optimization      1.3          2.9         1.6
CLOCK2   After optimization   1.43         1.45        0.02

Table 3 - Astro clock tree skews
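If "short path" and "long path" in Table 3 are read as the earliest and latest clock arrival of each tree (an interpretation that matches the Skew column exactly), skew is simply their difference. The Python sketch below recomputes the Skew column and the improvement from optimization; values stay in the table's own units.

```python
# (short path, long path) values from Table 3, in the table's own units.
TABLE3 = {
    ("CLOCK1", "no optimization"): (0.4, 244.7),
    ("CLOCK1", "after optimization"): (2.4, 13.1),
    ("CLOCK2", "no optimization"): (1.3, 2.9),
    ("CLOCK2", "after optimization"): (1.43, 1.45),
}


def skew(short_path: float, long_path: float) -> float:
    """Skew is the spread between the latest and the earliest clock arrival."""
    return long_path - short_path


if __name__ == "__main__":
    for clk in ("CLOCK1", "CLOCK2"):
        before = skew(*TABLE3[(clk, "no optimization")])
        after = skew(*TABLE3[(clk, "after optimization")])
        print(f"{clk}: skew {before:.2f} -> {after:.2f} (about x{before / after:.0f} smaller)")
```

For CLOCK1 the short path grows from 0.4 to 2.4 while the long path collapses, which is the insertion-delay cost the text mentions in exchange for a much smaller skew.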

The clock tree stage builds tree structures for all flip-flops connected to the same clock net. The clock tree creates levels of buffers so that the clock signal reaches all of those flops with the right drive. Astro lets us intervene in the build process, define the cells to be used, and optimize the tree. Of course, a clock tree can also be built through gates that are not flops, and a generated clock can be derived from another clock. We found that the clock tree optimization has to be changed manually only for the paths of special gated clocks with controlled flip-flops. This manual optimization is easier and faster than solving the MIN delay violations it would otherwise create (see footnote 2).

Now we are ready to run the high fan-out (HFO) nets. Such a net is a wide net, i.e., many gates connected to the same net, that has no clock attribute. The result of the HFO net command is a net driven through buffers, forming a structure similar to a clock tree. It is not recommended to treat high fan-out nets (such as reset) as clocks, which is why they get their own handling.

Routing

Astro is used only as a routing tool. The constraint file now includes a clock uncertainty of 15-20% of the clock cycle and a hold-time margin of 100-350 ps. These are very fast runs, and we can complete a full cluster within 2 days.
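The buffer structure that the HFO net step produces can be pictured as a balanced tree in which no driver exceeds a fan-out limit. The Python sketch below builds such a tree bottom-up. It is purely illustrative: the fan-out limit, the `hfn_buf` naming, and the grouping of consecutive loads are assumptions, and a real implementation would presumably group loads by placement and check timing, which this sketch does not do.

```python
def buffer_tree(sinks: list[str], max_fanout: int, prefix: str = "hfn_buf") -> dict[str, list[str]]:
    """Build a balanced buffer tree in which no driver has more than max_fanout loads.

    Returns a map from each driver to the loads it drives. The key "root" stands
    for the original net driver (for example, the reset source).
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    tree: dict[str, list[str]] = {}
    level, count = list(sinks), 0
    # Group loads bottom-up until the root can drive the remaining level directly.
    while len(level) > max_fanout:
        next_level = []
        for i in range(0, len(level), max_fanout):
            buf = f"{prefix}_{count}"
            count += 1
            tree[buf] = level[i:i + max_fanout]
            next_level.append(buf)
        level = next_level
    tree["root"] = level
    return tree


def buffer_levels(n_sinks: int, max_fanout: int) -> int:
    """Number of buffer levels the tree above inserts between root and sinks."""
    levels = 0
    while n_sinks > max_fanout:
        n_sinks = -(-n_sinks // max_fanout)  # ceiling division
        levels += 1
    return levels


if __name__ == "__main__":
    tree = buffer_tree([f"ff{i}" for i in range(1000)], max_fanout=16)
    print(len(tree) - 1, "buffers,", buffer_levels(1000, 16), "levels")
```

For 1,000 hypothetical reset loads and a fan-out limit of 16, this yields 67 buffers in 2 levels, with the original driver seeing only 4 loads.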

As in the previous stage, we start the routing stage by loading the SDC with the routing constraint definitions and dumping it out to verify that all SDC constraints were read. Before routing, we check that there are blockages over the areas we don't want to route on, such as memory blocks or areas reserved for other purposes. A blockage can apply to a specific metal layer (for example, metal 1) or to all layers. We also load route guides for special nets, such as clocks, that we prefer to route on higher metal layers. Routing starts with the special and/or sensitive nets, such as clock nets, and proceeds with all other nets in three steps:
• Global route – maps the general pathway through the design for each unrouted net (with no physical layer assignment)
• Track assignment – assigns nets to wire tracks, then places wires and vias to show the initial routing configuration
• Detail route – performs detailed routing on the design and then writes the violations to a routed cell

Finally, we fix the violations with the search & repair command, which finds and fixes them automatically. The tool makes almost all the layout DRC fixes. As Figure 3 shows, our typical block passes the routing stage with only one small area of high congestion.

Footnote 2: This is done manually after PrimeTime verification in the first timing loop.

Figure 3 - Layout Typical Congestion

The database now consists of the Verilog netlist and a SPEF file produced by STAR-RC.


2.5.2 Full Timing model for Static Timing Analysis

The database, which includes the new netlist and SPEF files from Astro, is read into PrimeTime and verified in both the fast and slow corners. It is very important to verify that all un-annotated net warnings are caused by net branches that do not appear in the layout (and are therefore not a real problem). If those warnings do point to real problems, the issues must be checked and fixed. All net-name mismatches between the SPEF files and the netlist must also be fixed. All working modes have to be tested, such as system mode, production mode, debug mode, and SCAN mode.

This is the time to verify timing violations and fix them. The four main violation types handled are MAX delay, MIN delay, capacitance, and transition. Every violation should be handled, subject to a relevant percentage threshold (for example, we may decide not to fix a capacitance violation that is below 10% of the driving cell's capability). Our experience shows that only 300-500 paths need to be fixed, by adding buffers, up/down sizing, or moving a cell. Each working mode has its own constraint file, according to the risk and importance of the mode. For example, different MIN delay margins are defined for SCAN shift and capture modes. When all violations are fixed and verified in the PrimeTime environment, we load the netlist into Design Compiler and use DC commands to apply all the changes to the netlist. This is the netlist for the first timing closure. Our conclusion from this step is that Physical Compiler predicts timing very well, and that combining Physical Compiler as the placer with Astro as the router is highly efficient.

Since this stage may require many checks, it can be very time consuming. We need to reload all the constraints for every checked mode (clock and external definitions, relevant RC files, different case analysis, etc.).
Therefore, we need to update the design for each checked mode. When an update is done, we reload the database (netlist and RC files) for safety reasons. Even on high-powered machines, a full-chip update may take more than 30 minutes. Since we may check up to 12 different modes (doubled, for both corners), it would be a waste of time not to run them all in parallel. In the last stages of the project, we use about 5-6 Linux machines, with enough CPUs to run up to 3 PrimeTime licenses per machine. To handle all these parallel runs, we built an environment that automatically creates detailed reports for all relevant checked modes.
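The parallel PrimeTime environment is described only at a high level. The Python sketch below shows the scheduling idea under stated assumptions: each (mode, corner) pair is one independent session, `run_sta` is a hypothetical placeholder rather than a real PrimeTime interface, and the mode names are illustrative. It also includes a helper for the kind of percentage threshold mentioned for capacitance violations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Illustrative mode and corner names; the SCAN mode is split into shift and
# capture because they use different MIN delay margins.
MODES = ["system", "production", "debug", "scan_shift", "scan_capture"]
CORNERS = ["fast", "slow"]


def run_sta(mode: str, corner: str) -> str:
    """Hypothetical stand-in for one PrimeTime session.

    A real wrapper would start the tool on a compute host, reload the netlist
    and RC files, source the mode-specific constraints and write reports. Here
    it only returns the name of the report that session would produce.
    """
    return f"{mode}_{corner}.rpt"


def run_all(machines: int, licenses_per_machine: int) -> list[str]:
    """Run every (mode, corner) job with at most machines * licenses sessions at once."""
    jobs = list(product(MODES, CORNERS))
    with ThreadPoolExecutor(max_workers=machines * licenses_per_machine) as pool:
        return list(pool.map(lambda job: run_sta(*job), jobs))


def waves(n_jobs: int, slots: int) -> int:
    """Rounds needed when each job holds one license slot (ceiling division)."""
    return -(-n_jobs // slots)


def worth_fixing(violation_pct: float, threshold_pct: float = 10.0) -> bool:
    """Triage rule: skip violations smaller than threshold_pct (e.g. 10% for cap)."""
    return violation_pct >= threshold_pct


if __name__ == "__main__":
    print(run_all(machines=6, licenses_per_machine=3))
    print(waves(24, 18), "wave(s) for 24 runs on 18 slots")
```

With 5-6 machines at 3 licenses each, 15-18 sessions can run at once: 12 runs fit in a single wave, while 24 runs (12 modes at two corners) would take two.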


2.5.3 Astro ECO mode

The updated netlist is loaded into Astro, and a compare file is produced. New cells are placed manually to avoid new violations, such as MAX transition, that the automatic placer could cause. At this stage we also add our FIB cells (see footnote 3) and bonus cells (see footnote 4). Astro can place them homogeneously across the design. The number of such cells is driven mainly by the cluster utilization and the risk of the block. We repeat the steps of 2.5.2 and 2.5.3 for one additional loop, and the number of violations drops by a factor of 10 each time. In some cases, an additional loop is needed to fix 1-10 violations, but in general two iterations are enough to close all issues.
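Astro places FIB and bonus cells homogeneously; one simple way to picture "homogeneous" is a uniform grid of sites over the cluster. The Python sketch below is hypothetical: the region size and the number of cells are inputs that would come from the cluster utilization and block risk, and a real placer would still need to legalize each site to a free position in a placement row.

```python
import math


def scatter_grid(width: float, height: float, n_cells: int) -> list[tuple[float, float]]:
    """Return n_cells (x, y) sites spread evenly over a width x height region.

    The region is split into a near-square grid of cols x rows tiles (at least
    n_cells of them) and one site is taken at the centre of each tile, row by row.
    """
    if n_cells <= 0:
        return []
    cols = max(1, math.ceil(math.sqrt(n_cells * width / height)))
    rows = math.ceil(n_cells / cols)
    dx, dy = width / cols, height / rows
    sites = [((c + 0.5) * dx, (r + 0.5) * dy) for r in range(rows) for c in range(cols)]
    return sites[:n_cells]


if __name__ == "__main__":
    # Hypothetical 2000 x 1000 um cluster with 50 spare cells: a 10 x 5 grid.
    for x, y in scatter_grid(2000.0, 1000.0, 50)[:3]:
        print(f"({x:.0f}, {y:.0f})")
```

Choosing the column count from the region's aspect ratio keeps the tiles close to square, so spare cells end up at a similar distance from any point in the cluster.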

Figure 4 - Bonus and FIB cell scattering

Footnote 3: FIB cells – extra standard library cells whose interface connections are all brought up to the upper metal layers, for easy design changes in the lab.
Footnote 4: Bonus cells – extra standard library cells used for bug fixes through metal-only changes.


2.6 Final tuning

In many cases, there is still a need to balance clocks or to add or remove delay because of external AC timing constraints. These tasks are done in only one top-level cluster, which is kept open until TO. In other cases, interconnects between clusters cause some violations, and another small ECO is needed to solve issues such as MAX capacitance. The bottom line is that the concept presented in this paper keeps us within the time frame and enables us to reach TO.

3.0 Conclusions and Recommendations

This paper presents a concept of taking advantage of advanced processes and paying for some extra area in order to get a stable and predictable flow from RTL to GDS. It can be used for communication products, most of which have low frequency demands compared with the process capability. Using a set of tools from Synopsys seems to add another level of stability and predictability to the tape-out flow. This combination has proven itself in our product line, and we continue to work this way.

4.0 Acknowledgements

I want to thank my colleagues who work with this methodology and helped me collect the data for this paper: Sagy Eick, Oren Mamet, Oded Pilowsky, and Shai Michaeli.
