Physical Design Flow and Timing Analysis
Ryan Robucci
1. References
- https://eclipse.umbc.edu/robucci/cmpeRSD/Lectures/Lecture11__Metastability/
- [ꭝ] Illustration from The Design Warrior's Guide to FPGAs by Clive Maxfield, Elsevier
- [ꭝꭝ] Modeling, Synthesis and Rapid Prototyping with the Verilog HDL, Michael D. Ciletti
- Basic Timing Analysis Concepts: https://eclipse.umbc.edu/robucci/cmpe415/attachments/Lecture14.pdf#page=10 Page 10-31
2. Objectives
- Physical Design Flow
- Understand key terminology of timing analysis, specifically Static Timing Analysis (STA) and Dynamic Timing Analysis
- Understand Static Timing Analysis (STA)
- Understand Dynamic Timing Analysis as a Complement to STA
- Terms
- User Constraints (e.g. user constraints file, directives, project settings)
- Source Register; Destination Register
- Startpoint; Endpoint
- False Path
- Multi-cycle Path
- Path Delay
- Setup Time Slack
- Hold Time Slack
- Clock Path
- Clock Network
- Asynchronous Path
- Clock Domain
- Unrelated Clock Domain, and Related Clock Domains
- Clock Jitter and Uncertainty
- Metastability
- Clock Skew
- Signal Skew
- Pulse Width
- Contamination delay
- Async Reset and Control
- SDF
- ECO
- Event-Based vs Cycle-Based Simulation
3. Electronic Design Tools
- CAD: Computer Aided Design
- CAE: Computer Aided Engineering
- EDA: Electronic Design Automation
- Historically,
- CAE referred to front-end tools like design capture and simulation
- CAD referred to back-end tools like layout, place, and route
- EDA was accepted as the merger of all topics
4. ASIC vs FPGA Design
http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm
- FPGAs have a shorter design process and shorter design cycles
- When problems arise, design iterations are required that can be very time-consuming.
- FPGAs have a significant advantage when design updates are required based on in-system testing
- In-field updates allow in-system test and re-programming; insertion of new debug circuitry and later removal; and feature upgrades
5. Automated Physical Design
- A globally optimal solution going from RTL to a placed-and-routed physical design is difficult to find
- The process is broken into several steps towards a (hopefully) good solution
- Heuristics are used in each step that generally lead to a good outcome
- Synthesis converts a high-level behavioral description into a representation of a digital circuit
- Mapping, Packing, and Place and Route follow
6. Synthesis and Mapping
- Synthesis converts a high-level behavioral description into a representation of a digital circuit, such as a netlist
- A netlist is a structural representation of a design, with instances of simple cells from a library
Mapping is the process of associating entities such as gate-level functions in the gate-level netlist with the physical LUT-level functions available on the FPGA
ꭝMaxfield
7. Packing
Packing is the grouping of LUTs and registers into CLBs
8. Place and Route
Place and Route is the process of placing CLBs and finding a routing configuration to make the required interconnections
9. Timing Verification vs Logical Verification
- Timing Verification refers to analyzing a design with regard to timing requirements.
- This usually refers specifically to sub-cycle (within a clock cycle) timing analysis
- Cycle-timing verification refers to checking in which cycles activity occurs, and is distinct from sub-cycle timing analysis
- Timing Analysis is very distinct from functional/behavioral/logical verification
- Logical Verification:
- see if sum = {(a&b) | (a&c) | (b&c), a ^ b ^ c} is a valid implementation of sum(a,b,c) = a+b+c
- typically checked using many test input vectors from the set 000,001,…111 and comparing the output vectors produced from a high-level model and Verilog simulation
- special symbolic analysis tools exist that can check for logic equivalence efficiently for some designs
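The exhaustive check described above can be sketched in a few lines of Python. This is only an illustration: it reads the expression as a full adder with three one-bit inputs (an interpretation based on the 000 to 111 vector set), and the function names are hypothetical.

```python
from itertools import product

def full_adder_gates(a, b, c):
    """Gate-level candidate: returns (carry, sum_bit), matching the two-bit concatenation."""
    carry = (a & b) | (a & c) | (b & c)
    sum_bit = a ^ b ^ c
    return carry, sum_bit

def check_exhaustively():
    """Compare the gate-level result against the reference a + b + c for all 8 vectors."""
    mismatches = []
    for a, b, c in product((0, 1), repeat=3):
        carry, sum_bit = full_adder_gates(a, b, c)
        if 2 * carry + sum_bit != a + b + c:
            mismatches.append((a, b, c))
    return mismatches

print(check_exhaustively())  # an empty list means every vector matched
```

With only three inputs the full vector set is tiny; for wide datapaths exhaustive enumeration is impractical, which is where the symbolic equivalence tools mentioned above come in.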
10. Timing Analysis and Post-Place-and-Route simulation
- The product of Place-and-Route is a fully routed physical design that a timing analysis tool can extract timing information from and check for any timing violations (setup, hold,etc…) associated with any of the internal registers.
- These delay estimates are more accurate than the load-based estimates that would be used before place and route
- Clock networks, switch configuration for routing, and signal buffers are known at this stage
- A new netlist can be generated, with accurate delays stored in a Standard Delay Format (SDF) file associated with the post-place-and-route netlist (delays can't be pushed directly back to the original description, since much has moved around or changed)
11. FPGA vs ASIC Physical Design Tools
- FPGAs vs ASICs: FPGAs are regular structures and represent a constrained design space for analysis tools (and synthesis tools).
- Many designs will emerge from one underlying physical hardware design
- This underlying hardware can be heavily characterized in the fabricated IC. This allows tools to perform accurate general design analysis for anyone using the same underlying hardware. In ASICs, each design can be very different or have unique differences, meaning it is more difficult for tools to accurately predict timing for every possible design. In ASIC design, SPICE-level simulation is sometimes used.
12. Static Timing Analysis
- Static timing analysis refers to using delays extracted from the physical implementation to analyze timing directly rather than through simulation
- Post-place-and-route delays are extracted from the placed-and-routed design
- Static timing analysis does not involve driving inputs into the system and analyzing resulting waveforms
- Static Timing Analysis is often fast and may be part of an automation tool's optimization process to test and evaluate design option trade-offs
- Pre-place-and-route STA works from estimated delays and can drive synthesis (timing-driven synthesis, e.g. a Timing-Driven Synthesis tool option)
- Typically pessimistic delay assumptions are made to arrive at a worst-case model – a data-driven simulation may reveal what delays are actually relevant in a design
13. Clock Period Constraint
- For older Xilinx Tools, a PERIOD constraint should be supplied for every clock
14. Clock Domain Defined by Seq. Elements
- Sequential Elements Include
- Flip Flops
- Latches
- Clocked Distributed and Block RAM/ROM
- FIFOs
- I/O Hardware with Clock Input (e.g. I/O SerDes)
- Hardware blocks with Clock Input (e.g. Xilinx MULT18/18)
- The combinational paths between sequential elements in the same clock domain are constrained and must be analyzed
15. Clock Network
- A clock network is a dedicated network of clock buffers and routing designed to balance delay
- Typically an H-tree is used to balance delay
- FPGA: the clock network has many global and local clock networks/regions and support components
https://www.allaboutcircuits.com/technical-articles/clock-management-clock-resources-of-fpgas/
- Different I/O banks may be connected to different clock drivers
- For ASICs, custom trees are inserted in a process step called "Clock-Tree Insertion"
- Insertion Delay refers to new timing after Clock Tree Insertion
(Clock Buffer Depiction Reference: S. A. Tawfik and V. Kursun, "Buffer Insertion and Sizing in Clock Distribution Networks with Gradual Transition Time Relaxation for Reduced Power Consumption," 2007 14th IEEE International Conference on Electronics, Circuits and Systems, Marrakech, Morocco, 2007, pp. 845-848, doi: 10.1109/ICECS.2007.4511123.)
16. Timing requirements for Registers
- Registers tend to have voltage as well as specific timing requirements for the inputs
Setup and Hold
- If violated, may get
- Old Data
- New Data
- Metastable Output
- May not conform to valid voltage specifications in time (late settling)
- May change in middle of clock period (multiple output transitions triggered by the clk)
- Possible Output transitions when expected is 0->1:
representation in digital timing diagrams: a metastable output may be represented / depicted by a period of invalid/uncertain state
17. Clock Jitter / Uncertainty
- clock jitter is continuously randomized uncertainty in sampling time of data
- manufacturing and voltage changes can cause fixed changes, contributing to uncertainty
- data change event times also are uncertain
- voltage noise and temporal noise are related
- … -> voltage noise -> temporal noise -> voltage noise -> …
- this affects the timing of generated signals
18. Setup and Hold Times
- Setup and Hold times define a window around a clock edge during which data inputs to a register should not transition.
- Setup Time defines the time before a clock edge by which a signal must settle. A violation occurs when a path delay is too large. (It so happens that negative setup times are common)
- Hold Time defines the time after a clock edge that a signal must not begin a transition. A violation occurs when a path delay is too small.
- Setup and Hold Time Slack quantify how much "time to spare" there is before an error occurs. A negative slack indicates a violation.
19. Clock Skew
- Skew refers to the difference in arrival time of a change at two different points for a signal that is logically the same. It can be caused by transmission line delays or buffers
- Signal skew refers to logic data signals whereas clock skew refers to the clock signals
- Clocks are buffered through a clock routing network. Clock signals are delayed with respect to the original clock.
- Paths are defined starting at a source register and terminating at a destination register
- Path delay: \(T_{\rm PD}=T_{\rm clk-to-q} + T_{\rm comb.\ path\ delay}\)
- Clock Skew is the difference in arrival time of clock edges at destination registers and source registers
20. Positive Clock Skew
- Setup and hold time windows are defined with respect to the destination register clock edge
- If the destination clock is more delayed than the source clock, it represents positive clock skew with regard to that path. This gives more time for a path to settle and thus helps avoid a setup time violation. Unfortunately, it also delays the cutoff time for holding a data signal.
21. Negative Clock Skew
- If the destination clock is less delayed than the source clock, it represents negative clock skew with respect to that path. This gives less time for a path to settle and advances the cutoff time for when a signal must hold.
22. Setup Timing Slack in Critical Path
- Static Timing Analysis is a structural analysis based on previous
characterizations. A key parameter from the analysis is the setup
timing slack
- \(T_{\rm CLK\,TO\,Q}\): Delay from the clock edge to the update of the sequential element's output
- \(T_{\rm CPD}\): Delay through combinatorial gates and routing
- \(T_{\rm PD}\): Path Delay, \(T_{\rm CLK\,TO\,Q} +T_{\rm CPD}\)
Critical Path Timing requirement: \(T_{\rm clk} + T_{\rm skew} > T_{\rm PD} + T_{\rm setup}\)
- \(T_{\rm skew}\): Time from when the clk edge occurs at a source flip-flop to when the edge occurs at a destination flip-flop (e.g. clock delay 1 - clock delay 2)
- As defined here, positive clock skew with respect to a critical path increases setup slack, while negative skew reduces it.
23. Setup Time Analysis (slack)
For each path there should be some time slack. A positive setup time slack value refers to how much extra delay could be added in the data path or how much faster a clock rate could be.
\(T_{\rm suSlack}\): positive slack indicates the timing requirement is met for a defined clock period, while a negative slack means it is not
\(T_{\rm suSlack} = T_{\rm clk} - (T_{\rm PD} + T_{\rm setup}) + T_{\rm skew} - ({\rm Jitter\ or\ Uncertainty})\)
- All possible paths must be analyzed. The paths with the longest delay
are important, but the analysis should be a combination of path
delay and clock skew and clock and path delay uncertainty, not
just path delay.
- A short delay path could still be a problem because of race conditions
- Setup Time Analysis = Data Path Delay including source clock-to-q delay + Destination Synchronous Element Setup Time - Clock Path Skew
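As a numeric illustration of the setup slack expression in this section, here is a minimal Python sketch. The function name and all delay values (in ns) are hypothetical, and skew is taken as destination clock delay minus source clock delay, so positive skew helps setup.

```python
def setup_slack(t_clk, t_clk_to_q, t_cpd, t_setup, t_skew=0.0, t_uncertainty=0.0):
    """Setup slack = T_clk - (T_PD + T_setup) + T_skew - uncertainty."""
    t_pd = t_clk_to_q + t_cpd          # path delay: clock-to-Q plus combinational delay
    return t_clk - (t_pd + t_setup) + t_skew - t_uncertainty

# Hypothetical path with a 10 ns clock period
met = setup_slack(t_clk=10.0, t_clk_to_q=0.5, t_cpd=7.0, t_setup=0.25,
                  t_skew=0.25, t_uncertainty=0.5)
# Same path with a slower combinational cloud
violated = setup_slack(t_clk=10.0, t_clk_to_q=0.5, t_cpd=9.5, t_setup=0.25,
                       t_skew=0.25, t_uncertainty=0.5)
print(met, violated)  # positive slack meets timing, negative slack is a violation
```

Note that the second case fails even though the path delay alone (10.0 ns) equals the clock period: setup time and uncertainty also consume margin, which is why the analysis cannot look at path delay in isolation.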
24. Hold Time Analysis and Slack
Hold Time Analysis Avoids Race Conditions
- \(T_{\rm slack(hold)} = T_{\rm PD} - T_{\rm hold} - T_{\rm skew}\)
- Most problematic are "short" combinatorial paths and high clock skew (e.g. logically back-to-back registers that are far from each other on the clock network)
- Can fix by slowing data path (adding several slow buffers in series)
- In VLSI, tend to route clock in opposite direction of data whenever creating shift register chains.
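The hold check and the buffer-insertion fix can be illustrated with a short Python sketch. The numbers (in ns), the function names, and the per-buffer delay are hypothetical; skew is again destination clock delay minus source clock delay.

```python
import math

def hold_slack(t_clk_to_q, t_cpd, t_hold, t_skew=0.0):
    """Hold slack = T_PD - T_hold - T_skew; negative means a race (hold violation)."""
    t_pd = t_clk_to_q + t_cpd
    return t_pd - t_hold - t_skew

def buffers_needed(slack, t_buffer):
    """Number of series buffers, each adding t_buffer of delay, to reach slack >= 0."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / t_buffer)

# Back-to-back registers (no logic between them) that are far apart on the clock network
slack = hold_slack(t_clk_to_q=0.5, t_cpd=0.0, t_hold=0.25, t_skew=0.75)
print(slack, buffers_needed(slack, t_buffer=0.25))
```

The clock period does not appear in the hold expression, so slowing the clock cannot repair a hold violation; only the data path delay or the skew can.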
25. Contamination Delay for Hold Time Analysis (and async. removal time)
- Contamination Delay is more conservative (safer) than Clock to Q
For a more conservative estimate, the clock-to-Q term in \(T_{\rm PD}\) can be replaced with a smaller value called the Contamination Delay, a measure of how soon the output of a FF may become unusable (e.g. not a valid 0). The path delay \(T_{\rm PD} = T_{\rm CLK\,TO\,Q} + T_{\rm CPD}\) becomes \(T_{\rm PD} = T_{\rm CD} + T_{\rm CPD}\)
where \(T_{CD}\) is the time after the clock that the output may change as opposed to when it is likely to have settled
- can also be used for asynchronous control removal time (later in today's discussion)
26. Unconstrained Paths and CDC Clock Domain Crossing
- A clock domain is defined by sequential elements with the same control clock
- Combinational logic on paths between registers in the same clock domain is part of that domain
- Paths within the domain may be analyzed for timing and ensured to behave as predicted from cycle to cycle
- A combinational logic path from a register in one clock domain to a register in another is known as a clock-domain-crossing (CDC) path
- unless the clock domains are "related clock domains", such as one being derived from the other with an integer-multiple relationship, it may not be possible to guarantee that such paths never have transitions that violate setup/hold time constraints
What else? Input Offsets and Output requirements (Assume CLKA and CLKB are independently constrained – Unrelated Clock Domains) and look at
27. Clock Domain Crossing (CDC) Signals
- Clock Domain Crossing arises from the need to communicate between Multiple Clock Domains
- So far, you have learned synchronous design within one clock domain
- Having Multiple Clock Domains means Clock Domain Crossing must be considered
- typically a timing analysis setup should identify such signals and either provide more information on how to analyze or simply waive violations
28. Asynchronous Inputs
Not all signals arise from a clock domain at all, such as physical, real-world inputs
- A problem arises at a fanout point, where multiple interpretations are performed in parallel. If the voltage is not a valid high or valid low (e.g. invalid during sampling), inconsistent interpretations can result.
30. Multicycle Paths
- Can override the ONE-PERIOD-DELAY-CONSTRAINT on some paths, such as to give a series of multipliers two clock cycles instead of one to complete
- In this example the assertion of the enable signals (en) determines the actual constraints. An STA tool may not have an understanding/information about the design to infer this on its own, which is why the override is needed
Consult documentation for how to specify individual paths (such as specifying source and destination register) and describe adjustments to the timing constraints uniquely for those
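The effect of relaxing the one-period constraint can be sketched numerically. This Python fragment assumes, as a simplification not stated in the notes, that an N-cycle multicycle setup constraint simply replaces the clock period with N periods in the setup requirement; the function name and values (in ns) are hypothetical, and real tools may adjust other checks as well.

```python
def setup_slack_multicycle(t_clk, t_pd, t_setup, n_cycles=1, t_skew=0.0):
    """Setup slack when the path is allowed n_cycles clock periods to settle."""
    return n_cycles * t_clk - (t_pd + t_setup) + t_skew

# A slow multiplier path against a 5 ns clock
one_cycle = setup_slack_multicycle(t_clk=5.0, t_pd=8.0, t_setup=0.5, n_cycles=1)
two_cycle = setup_slack_multicycle(t_clk=5.0, t_pd=8.0, t_setup=0.5, n_cycles=2)
print(one_cycle, two_cycle)  # fails as a single-cycle path, passes as a two-cycle path
```

The relaxed constraint is only valid if the design (for example, the enable logic) really does wait the extra cycle before capturing the result.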
31. False Paths
- Can exclude paths where timing based on a single cycle is not important (false paths)
In the following example, assume endianMode is a signal set once upon processor power-up initialization.
... S3: D <= (endianMode?{AH,AL}:{AL,AH}) + (endianMode?{BH,BL}:{BL,BH});
- The transition on endianMode is therefore not required for timing analysis, but it would make sense to request two STA runs: one for the case where endianMode is 1 and one where it is 0
Where the design accounts for timing concerns, such as explicit designs to handle Clock Domain Crossing and Synchronizers, the standard analysis and its warnings are not meaningful
always @ (posedge clk) begin Q1<=in; Q2<=Q1; end
- Or if the architecture doesn't require that a path meet timing requirements.
- In the following example, based on the design constraint the highlighted path would never be exercised
- Or logically irrelevant path:
- Example:
Other Examples: https://www.edn.com/design/integrated-circuit-design/4433229/Basics-of-multi-cycle---false-paths
32. Asynchronous Signals (e.g. Async Clear)
- Note that asynchronous signals are not easily analyzed for timing and are sensitive to glitches.
Wherever sensitivity to glitches exists, using registered output logic to generate the control signal is a good idea; otherwise ensure a hazard-free logic path:
33. Dynamic Timing Analysis
- Whereas static timing analysis processes timing equations based on the circuit structure, dynamic timing analysis uses the results of a temporal simulation
- The generated/saved waveforms (i.e. transient waveforms) are analyzed for timing violations and glitches
- A complexity is that suitable input sequences (input vectors) must be created to test characteristics of the circuit at internal points.
34. Dynamic Timing Analysis using Event-Driven Simulation
- Dynamic Timing Analysis typically uses an event-driven simulation, just like functional simulation i.e. dynamic timing analysis involves a gate-level simulation with timing information
- Event driven simulation exploits the fact that most signals are quiescent at any given point in time.
- In event-driven simulation, computational effort is not expended on quiescent signals, i.e., their values are not recomputed at each time step
- Rather, the simulator waits for an event to occur, i.e., for a signal to undergo a change in value, and ONLY the values of signals affected by that change are recomputed.
- The time step size is decided dynamically ("on the fly"), based on the next event in the future-event queue
- Verilog tools include several methods for annotating timing, including the use of
- specify blocks
- timing data augmentation using sidecar files, e.g. Standard Delay Format (SDF) files
- Delay parameters (e.g. #) may also be used to model timing delays described by Inertial Delay and Transport Delay
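The event-queue idea described above can be sketched as a toy simulator in Python. This is a simplified illustration, not how any particular Verilog simulator is implemented: the two-gate netlist, its delays, and the function names are hypothetical, and delays are modeled in a transport-like way with no inertial pulse filtering.

```python
import heapq

# Hypothetical netlist: n1 = a AND b (delay 2), y = NOT n1 (delay 1)
GATES = {
    "n1": (lambda v: v["a"] & v["b"], ("a", "b"), 2),
    "y": (lambda v: 1 - v["n1"], ("n1",), 1),
}

def simulate(stimulus, initial):
    """stimulus: list of (time, signal, value). Returns the list of value changes."""
    values = dict(initial)
    queue = []
    for seq, (t, sig, val) in enumerate(stimulus):
        heapq.heappush(queue, (t, seq, sig, val))
    seq = len(stimulus)                 # unique tie-breaker for later events
    trace = []
    while queue:
        t, _, sig, val = heapq.heappop(queue)   # jump directly to the next event time
        if values[sig] == val:
            continue                            # no change, so nothing to propagate
        values[sig] = val
        trace.append((t, sig, val))
        # Re-evaluate only the gates that read the signal that just changed
        for out, (fn, inputs, delay) in GATES.items():
            if sig in inputs:
                heapq.heappush(queue, (t + delay, seq, out, fn(values)))
                seq += 1
    return trace

trace = simulate([(10, "a", 1), (20, "b", 0)],
                 {"a": 0, "b": 1, "n1": 0, "y": 1})
print(trace)
```

Between time 13 and time 20 nothing is evaluated at all: the simulator simply pops the next queued event, which is the source of the efficiency described above.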
Verilog also supports automatic timing check tasks to flag violations during simulation. These can be embedded alongside sequential logic
$hold (reference_event, data_event, limit[,notifier]) ;
Assertion violations are detected during simulation and reported
- SystemVerilog Assertions (SVA) provide a specialized syntax for describing timing relationships, which can be inlined with other SystemVerilog RTL code and maintained (updated) along with it
- Online Analysis
- checks are activated and deactivated as-needed when satisfied or violated
- Offline Analysis (post-simulation)
- saves waveforms in format such as VCD and analysis is performed afterward
- this is not the typical approach since disk IO is time-consuming (saving and reading all waveforms required for analysis)
35. Async Signal Timing Checks
- recovery time: minimum time between the de-assertion edge of an asynchronous control and the next trigger edge of the clock (the control must be stable for this long before the clock edge)
- removal time: time after a clock trigger edge that an asserted asynchronous control signal must remain stable
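The two checks can be illustrated with a small Python sketch that classifies a single de-assertion time against a list of clock edges. The function name, the window limits, and the times are all hypothetical, and a de-assertion exactly on a clock edge is treated here as a removal violation.

```python
def check_async_deassert(t_deassert, clk_edges, t_recovery, t_removal):
    """Return (kind, edge_time) tuples for each recovery/removal violation."""
    violations = []
    for t_edge in clk_edges:
        # Recovery: control de-asserted too shortly BEFORE a clock edge
        if 0 < t_edge - t_deassert < t_recovery:
            violations.append(("recovery", t_edge))
        # Removal: control de-asserted too soon AFTER (or on) a clock edge
        if 0 <= t_deassert - t_edge < t_removal:
            violations.append(("removal", t_edge))
    return violations

edges = [0, 10, 20]
print(check_async_deassert(9.5, edges, t_recovery=1.0, t_removal=0.5))
print(check_async_deassert(10.25, edges, t_recovery=1.0, t_removal=0.5))
print(check_async_deassert(15.0, edges, t_recovery=1.0, t_removal=0.5))
```

Together the two limits define a forbidden window around each clock edge for the release of the asynchronous control, analogous to the setup/hold window for data.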
36. Dynamic vs Static Timing Analysis; Functional vs Timing Verification
- Functional simulation and verification refers to verifying logical descriptions, including Boolean expressions and register transfers
- Timing verification refers to making sure that all timing requirements, external and internal (setup, hold, etc..) are satisfied
- Dynamic Timing Analysis refers to analysis of timing by simulating the system with inputs and examining the resulting waveforms.
- Static timing analysis uses no data and does not use a simulation – it just analyzes the structure
- A timing analysis tool stores delays in a separate Standard Delay Format (SDF) file alongside the post-place-and-route netlist
37. Overview Static vs Dynamic Timing Analysis
- Dynamic timing analysis and functional simulation each require well-selected input sequences. These are called test vectors. Static timing analysis uses no test vectors.
- Dynamic timing analysis is sometimes combined with functional simulation, while static timing analysis cannot be
- Though Dynamic Timing and Functional Analysis use an event-driven simulation which is much faster than SPICE-level "analog" circuit simulation, Static Timing Analysis is much faster as it only analyzes structure and can even be performed during place and route iterations.
- Dynamic Timing Analysis can easily report glitches on asynchronous set/reset
- Dynamic Timing Analysis will "correctly overlook" false paths if test vectors are valid, since the false path will not exhibit time violations in the actual circuit
- Multiple Clock Domains and Asynchronous Reset/Clear
- Static timing analysis is more straightforward with one clock domain, though it can be extended to handle multiple clock domains. Dynamic timing analysis can automatically handle multiple clock domains as well as assess timing of asynchronous reset/clear
38. FPGA vendors role in EDA
- On ASIC side, tools are expensive
- FPGA companies focused on selling FPGAs provide tools cheaply, which drives the use and sale of their FPGAs.
- Tools were produced by third parties
- FPGA vendors continue to innovate in tools to promote adoption of FPGAs
39. Additional Materials
40. SDF Timing Annotation
- SDF is a text file format for capturing timing information about cells or a design, and information about how to perform timing checks
- may accompany a synthesis cell library
- Example http://www.eda.org/sdf/sdf_3.0.pdf:
(DELAYFILE
 (SDFVERSION "3.0")
 (DESIGN "BIGCHIP")
 (DATE "March 12, 1995 09:46")
 (VENDOR "Southwestern ASIC")
 (PROGRAM "Fast program")
 (VERSION "1.2a")
 (DIVIDER /)
 (VOLTAGE 5.5:5.0:4.5)
 (PROCESS "best:nom:worst")
 (TEMPERATURE -40:25:125)
 (TIMESCALE 100 ps)
 (CELL
  (CELLTYPE "BIGCHIP")
  (INSTANCE top)
  (DELAY
   (ABSOLUTE
    (INTERCONNECT mck b/c/clk (.6:.7:.9))
    (INTERCONNECT d[0] b/c/d (.4:.5:.6))
   )
  )
 )
 (CELL
  (CELLTYPE "AND2")
  (INSTANCE top/b/d)
  (DELAY
   (ABSOLUTE
    (IOPATH a y (1.5:2.5:3.4) (2.5:3.6:4.7))
    (IOPATH b y (1.4:2.3:3.2) (2.3:3.4:4.3))
   )
  )
 )
 (CELL
  (CELLTYPE "DFF")
  (INSTANCE top/b/c)
  (DELAY
   (ABSOLUTE
    (IOPATH (posedge clk) q (2:3:4) (5:6:7))
    (PORT clr (2:3:4) (5:6:7))
   )
  )
  (TIMINGCHECK
   (SETUPHOLD d (posedge clk) (3:4:5) (-1:-1:-1))
   (WIDTH clk (4.4:7.5:11.3))
  )
 )
 (CELL
  . . .
 )
)
- Netlist: a simple structural representation of a circuit
- Gate-Level Netlist: a circuit reduced to the simplest gates/cells available
- Timing Annotator: part of a tool that matches and associates cell timing information from SDF or specify blocks to parts of the design to set the stage for timing analysis of the overall design
- Back Annotation: the result of a timing analysis tool can be more complete or accurate timing information based on additional analysis of the resulting physical circuit. It can be captured in an SDF file
- Interconnect delays: allow association of a unique delay with each port of each instance, not representing part of the original netlist or any instance but instead the extra delay associated with the interconnect path to a given input from its source, e.g.
- identify instance2.portb with signal propagation delay 20 ps
- identify instance2.porta with signal propagation delay 25 ps
- identify instance3.portq with signal propagation delay 30 ps
41. Engineering Change Order (ECO)
- In engineering business, an ECO is a formal request to change a design specification, often specifying specific part of a design to change
- In EDA tools, ECO refers to abilities to track, manage, analyze/predict impact (e.g. on timing), and perform changes on some part of a design at some step in the design flow, often without having to go "back to the beginning"
- For an EDA tool flow, this could involve logic insertion into a synthesized (post-implementation) netlist without re-synthesizing the entire design, or even after place and route
- Tools may support incremental operations like incremental
place-and-route to merge updates into existing design
- step 1: rip out (delete) old circuitry
- step 2: insert new circuitry
- Supports what-if analysis an order of magnitude faster
- check insertion of an inverter, flip-flop, or extra mux & I/O
42. Design Rule Checking DRC
Checks at various stages whether the physical or logic design
- is valid
- e.g. standard signal does not have multiple drivers
- warnings like:
- "net has only one connection"
- "output port is undriven"
- conforms to additional constraints provided
- e.g.
- out1p and out2n should be physically assigned to a pair of pins supporting low-voltage 1-V differential signaling
- clk input is routed physically through an appropriate input clock pin with connection to global clock buffers rather than through a general-purpose I/O pin
- waived DRC violations (waivers):
- the user may waive specific errors by providing additional information to the tools, avoiding distraction from important issues
43. Cycle-Based Simulators
- Event-driven simulation may involve multiple reevaluations of circuit components as inputs are propagated with delays
- Cycle-based simulators attempt to update registers once per cycle, computing the function of the combinational logic once per cycle
- Cycle-based simulators can be 5-10 times faster
- Cycle-based simulation is suitable for designs with well-defined clocks (e.g. not async circuits)
- Latches, combinatorial loops, clock-gating are more complex
- example tool https://www.veripool.org/verilator/
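The once-per-cycle evaluation can be sketched in Python using a hypothetical two-register shift chain (Q1 takes the input, Q2 takes Q1). This illustrates the idea only and is not how any specific cycle-based tool is implemented; the reset values are assumed.

```python
def step(state, d_in):
    """Advance one clock cycle: compute every next-state value once, then update together."""
    # Next-state logic reads the CURRENT register values, mirroring
    # non-blocking assignments such as Q1 <= in; Q2 <= Q1;
    return {"q1": d_in, "q2": state["q1"]}

def run(d_sequence):
    state = {"q1": 0, "q2": 0}          # assumed reset values
    history = []
    for d_in in d_sequence:
        state = step(state, d_in)
        history.append((state["q1"], state["q2"]))
    return history

print(run([1, 1, 0]))
```

There is no event queue and no notion of delay inside a cycle, which is why sub-cycle behavior such as glitches or setup/hold violations cannot be observed in this style of simulation.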
44. Two-State vs Four-State Simulation
- Two-state simulation uses one bit per value rather than the 2 bits required to represent each of 0,1,x,z
- Less memory and faster operations (simple check on bits rather than considering all combinations of signal states)
- allows better mapping of storage and operations to the native simulation platform
- code must not make use of x and z
- use of x as a literal in code may be mapped to a random choice of 1 or
0 during run-time evaluation
- so, rather than seeing an x propagated in sim, would see a randomized value that may be different in each execution
example of cycle-based 2-state simulator https://www.veripool.org/verilator/
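The difference between the two value systems can be illustrated with an AND operation in Python. This is a sketch under stated assumptions: the function names are hypothetical, z on a gate input is treated like x, and the two-state version follows the behavior described above of replacing an x with a random 0 or 1.

```python
import random

def and4(a, b):
    """Four-state AND over the characters '0', '1', 'x', 'z'."""
    if a == "0" or b == "0":
        return "0"                      # a 0 on either input dominates
    if a == "1" and b == "1":
        return "1"
    return "x"                          # any remaining x/z makes the result unknown

def and2(a, b, rng):
    """Two-state AND: an x or z input is first replaced by a random bit."""
    a_bit = rng.randint(0, 1) if a in ("x", "z") else int(a)
    b_bit = rng.randint(0, 1) if b in ("x", "z") else int(b)
    return a_bit & b_bit

rng = random.Random()
print(and4("1", "x"), and2("1", "x", rng))  # four-state shows x, two-state shows 0 or 1
```

In the four-state result the unknown stays visible and can be traced back to its source; in the two-state result it is indistinguishable from a real value and may change from run to run.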
Created: 2025-10-30 Thu 13:57
[ꭝꭝ] Figure 2.8 Michael D. Ciletti