Lecture 13 – Timing Analysis

Ryan Robucci

References

[ꭝ] Illustration from The Design Warrior's Guide to FPGAs by Clive Maxfield, Elsevier
[ꭝꭝ] Modeling, Synthesis and Rapid Prototyping with the Verilog HDL, Michael D. Ciletti
Basic Timing Analysis Concepts: https://eclipse.umbc.edu/robucci/cmpe415/attachments/Lecture14.pdf#page=10 Page 10-31

ASIC vs FPGA Design Flow

Source: Xilinx Training Material fpga-vs-asic-design-fow.ppt

Synthesis

Mapping is the process of associating entities such as gate-level functions in the gate-level netlist with the physical LUT-level functions available on the FPGA

ꭝMaxfield

Packing

Packing is the grouping of LUTs and registers into CLBs

Place and Route

Place and Route is the process of placing CLBs and finding a routing configuration that makes the required interconnections

Timing Analysis and Post-Place-and-Route simulation

After place and route, we have a fully routed physical design, and a timing analysis tool can extract timing and check for any timing violations (setup, hold, etc.) associated with any of the internal registers.
These extracted delays are more accurate than the load estimates that would be used before place and route.
A new netlist can be generated that includes accurate delays in a Standard Delay Format (SDF) file associated with the post-place-and-route netlist (delays can't be pushed directly back to the original description because much of the logic has been moved or changed).

FPGA vs ASIC Tools

FPGAs vs ASICs: FPGAs are regular structures and represent a constrained design space for analysis tools (and synthesis tools).
Many designs will emerge from one underlying physical hardware design.
This underlying hardware can be heavily characterized in the fabricated IC. This allows tools to perform accurate general design analysis for anyone using the same underlying hardware. In ASICs, each design can be very different, meaning it is more difficult for tools to accurately predict timing for every possible design. In ASIC design, SPICE-level simulation is sometimes used.

Static Timing Analysis

Static timing analysis refers to using delays extracted from the physical implementation to analyze timing directly rather than through simulation
- Post-place-and-route delays are extracted from the placed-and-routed design
Static timing analysis does not involve driving inputs into the system and analyzing the resulting waveforms
Static Timing Analysis is often fast and may be part of an automation tool’s optimization process to test and evaluate design option trade-offs
Delays estimated before place and route can also drive synthesis (timing-driven synthesis, e.g. a Timing-Driven Synthesis logic option)
Typically pessimistic delay assumptions are made to arrive at a worst-case model – a data-driven simulation may reveal what delays are actually relevant in a design

Timing Analysis Concepts

User Constraints (e.g. user constraints file, directives, project settings)
Clock Domains
Clock Jitter
Async Reset
Multi-cycle Paths
False Paths

Clock Period Constraint

For Xilinx Tools, a PERIOD constraint should be supplied for every clock

Clock Domain Defined by Seq. Elements

Sequential Elements Include
- Flip Flops
- Latches
- Clocked Distributed and Block RAM/ROM
- FIFOs
- I/O Hardware with Clock Input (e.g. I/O SerDes)
- Hardware blocks with Clock Input (e.g. Xilinx MULT18X18)
The combinational paths between sequential elements in the same clock domain are constrained and must be analyzed

Setup and Hold Times

Setup and Hold times define a window around a clock edge during which data inputs to a register should not transition.
Setup Time defines the time before a clock edge by which a signal must settle. A violation occurs when a path delay is too large. (It so happens that negative setup times are common.)
Hold Time defines the time after a clock edge that a signal must not begin a transition. A violation occurs when a path delay is too small.
Setup and Hold Time Slack quantify how much “room to spare” exists before an error occurs. A negative slack indicates a violation.

Clock Skew

Clocks are buffered through a clock routing network. Clock signals are delayed with respect to the original clock.
Paths are defined starting at a source register and terminating at a destination register
Path delay: $T_{\rm PD}=T_{\rm clk-to-q} + T_{\rm comb.\ path\ delay}$
Clock Skew is the difference in arrival time of clock edges at destination registers and source registers

Positive Clock Skew

Setup and hold time windows are defined with respect to the destination register clock edge
If the destination clock is more delayed than the source clock, it represents positive clock skew with regard to that path. This gives more time for a path to settle and thus avoid a setup time violation. It unfortunately delays the cutoff time for holding a data signal.

Negative Clock Skew

If the destination clock is less delayed than the source clock, it represents negative clock skew with respect to that path. This gives less time for a path to settle and advances the cutoff time for when a signal must hold.
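
A small numeric illustration of the sign convention. The delay values are invented for the example, and the symbols $t_{\rm clk,src}$ and $t_{\rm clk,dst}$ (clock network delay to the source and destination register) are introduced here only for illustration:

$$
\begin{aligned}
T_{\rm skew} &= t_{\rm clk,dst} - t_{\rm clk,src} \\
\text{positive skew:}\quad T_{\rm skew} &= 0.9\ {\rm ns} - 0.4\ {\rm ns} = +0.5\ {\rm ns} \\
\text{negative skew:}\quad T_{\rm skew} &= 0.4\ {\rm ns} - 0.9\ {\rm ns} = -0.5\ {\rm ns}
\end{aligned}
$$

In the first case the path gains 0.5 ns of settling time but must hold its data 0.5 ns longer; in the second case the opposite applies.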

Setup Timing Slack in Critical Path

Static Timing Analysis is a structural analysis based on previous characterizations. A key parameter from the analysis is the setup timing slack
- $T_{\rm CLK\,TO\,Q}$ : Delay from the clock edge to the update of the sequential element's output
- $T_{\rm CPD}$ : Delay through combinatorial gates and routing
- $T_{\rm PD}$ : Path Delay, $T_{\rm CLK\,TO\,Q} +T_{\rm CPD}$

Critical Path Timing requirement: $T_{\rm clk} +T_{\rm skew}> T_{\rm PD} + T_{\rm setup}$
$T_{\rm skew}$ : Time from when the clk edge occurs at a source flip-flop to when the edge occurs at the destination flip-flop (i.e. destination clock delay minus source clock delay). As defined here, positive clock skew with respect to a critical path increases setup slack, while negative skew reduces it.

Setup Time Analysis (slack)

For each path there should be some slack in the timing. A positive slack value indicates how much extra delay could be added, or how much shorter the clock period could be.

$T_{suSlack}$ : positive slack indicates the timing requirement is met for a defined clock period, while a negative slack means it is not
$T_{suSlack} =T_{clk} -(T_{\rm PD} + T_{\rm setup} )+T_{\rm skew} - ({\rm Jitter\ or\ Uncertainty})$
All possible paths must be analyzed. The paths with the longest delay are important, but the analysis should be a combination of path delay and clock skew and clock and path delay uncertainty, not just path delay.
- A short delay path could still be a problem because of race conditions
Setup time analysis compares the clock period against: Data Path Delay (including source clock-to-q delay) + Destination Synchronous Element Setup Time - Clock Path Skew
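
As a worked sketch of the slack equation above, assume the following invented values: $T_{clk}=10$ ns, $T_{\rm CLK\,TO\,Q}=0.6$ ns, $T_{\rm CPD}=7.2$ ns, $T_{\rm setup}=0.4$ ns, $T_{\rm skew}=-0.3$ ns, and 0.2 ns of uncertainty. The symbol $T_{clk,\min}$ is introduced here only for the example.

$$
\begin{aligned}
T_{\rm PD} &= T_{\rm CLK\,TO\,Q} + T_{\rm CPD} = 0.6 + 7.2 = 7.8\ {\rm ns} \\
T_{suSlack} &= T_{clk} - (T_{\rm PD} + T_{\rm setup}) + T_{\rm skew} - ({\rm Uncertainty}) \\
&= 10 - (7.8 + 0.4) + (-0.3) - 0.2 = 1.3\ {\rm ns} \\
T_{clk,\min} &= T_{clk} - T_{suSlack} = 8.7\ {\rm ns} \quad (\approx 115\ {\rm MHz})
\end{aligned}
$$

The positive slack means this path meets timing at 10 ns, and, if it were the limiting path, the clock period could be reduced by up to 1.3 ns.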

Hold Time Analysis and Slack

Hold Time Analysis Avoids Race Conditions
- $T_{\rm slack(hold)} = T_{\rm PD} - T_{\rm hold} - T_{\rm skew}$
Most problematic are “short” combinatorial paths and high clock skew (e.g. back-to-back registers far from each other on the clock network)
Can be fixed by slowing the path (e.g. adding several slow buffers in series)
In VLSI, designers tend to route the clock in the opposite direction of the data when creating shift register chains.
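
A worked sketch of the hold slack equation for two back-to-back registers, using invented values: $T_{\rm CLK\,TO\,Q}=0.6$ ns, $T_{\rm CPD}=0.1$ ns, $T_{\rm hold}=0.2$ ns, and a large positive skew $T_{\rm skew}=0.8$ ns.

$$
\begin{aligned}
T_{\rm PD} &= 0.6 + 0.1 = 0.7\ {\rm ns} \\
T_{\rm slack(hold)} &= T_{\rm PD} - T_{\rm hold} - T_{\rm skew} = 0.7 - 0.2 - 0.8 = -0.3\ {\rm ns} \quad\text{(violation)} \\
\text{with 0.5 ns of added buffer delay:}\quad T_{\rm slack(hold)} &= (0.7 + 0.5) - 0.2 - 0.8 = +0.2\ {\rm ns}
\end{aligned}
$$

The clock period does not appear in the equation, so a hold violation cannot be fixed by slowing the clock. The added buffer delay also increases $T_{\rm PD}$ in the setup equation, reducing setup slack on the same path by the same 0.5 ns.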

Unconstrained Paths

What is (not) Included? Input Offsets and Output requirements
Assume CLKA and CLKB are independently constrained – Unrelated Clock Domains

Source: Xilinx Timing Constraints User Guide

By default, the input and output paths are not analyzed, regardless of whether there is one clock domain or several
- Can add a specification for time after a clock edge allowed to reach output pad
- Can add a specification for the transition window for input to analyze setup and hold time
Cross domain paths are paths originating from the output of a sequential element in one clock domain and ending at the input of sequential elements in another clock domain
By default, if multiple clock domains are present, cross-domain paths are not constrained and not analyzed.

If the clocks of two clock domains are related in some limited ways, the paths can be analyzed, if the relationship is specified...

Slide Source: Xilinx Timing Constraints User Guide
Following is an example of the PERIOD constraint syntax. The TS_Period_2 constraint value is a multiple of the TS_Period_1 TIMESPEC.
```
TIMESPEC TS_Period_1 = PERIOD "clk1_in_grp" 20 ns   HIGH 50%;
TIMESPEC TS_Period_2 = PERIOD "clk2_in_grp"   TS_Period_1 * 2;
```
Note that if the two PERIOD constraints are not related in this way, the cross-clock-domain data paths are not covered or analyzed by any PERIOD constraint.
Other information, such as lag (phase), can also be specified
Paths that cannot be analyzed by STA (unconstrained paths) will be noted in the timing analysis report

Multicycle Paths

The default single-cycle requirement can be overridden on some paths, for example giving a series of multipliers two clock cycles instead of one to complete
In this example, the assertion of the enable signals (en) determines the actual constraints. An STA tool may not have enough information about the design to infer this
Consult documentation for adjusting timing constraints of individual paths
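
A minimal Verilog sketch of an enable-controlled multicycle path, under the assumption that the source and destination registers share an enable that is high on every other clock cycle. The module, signal, and parameter names are hypothetical and not taken from the original slide.

```verilog
// Hypothetical sketch: a multiplier path that is given two clock cycles.
// All names here are illustrative assumptions.
module mult_two_cycle #(parameter W = 8) (
  input  wire           clk,
  input  wire           rst,      // synchronous reset
  input  wire [W-1:0]   a,
  input  wire [W-1:0]   b,
  output reg  [2*W-1:0] p
);
  reg         en;                 // high on every other clock cycle
  reg [W-1:0] a_r, b_r;           // source registers of the multicycle path

  // Divide-by-two enable generator
  always @(posedge clk) begin
    if (rst) en <= 1'b0;
    else     en <= ~en;
  end

  // Source and destination registers update only when en is high, so
  // a_r and b_r are stable for two clock periods before p captures the product.
  always @(posedge clk) begin
    if (en) begin
      a_r <= a;
      b_r <= b;
      p   <= a_r * b_r;           // uses the operands loaded two cycles earlier
    end
  end
endmodule
```

Structurally, the path from `a_r`/`b_r` through the multiplier to `p` looks like an ordinary single-cycle path, so the two-cycle allowance still has to be given to the tool through a tool-specific multicycle constraint.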

False Paths

Can exclude paths where timing based on a single cycle is not important (false paths)
In the following example, assume endianMode is a signal set once upon processor power-up initialization.
```
...
S3: D <=   (endianMode?{AH,AL}:{AL,AH}) 
         + (endianMode?{BH,BL}:{BL,BH});
```
- The transition on endianMode therefore does not need to be covered by timing analysis, but it would make sense to request two STA runs: one for the case where endianMode is 1 and one for the case where it is 0
Where the design itself accounts for timing concerns, such as explicit structures to handle Clock Domain Crossing and Synchronizers, the standard analysis and its warnings are not meaningful
```
always @ (posedge clk) begin
  Q1<=in;
  Q2<=Q1;
end
```
A path may also be false if the architecture never requires it to meet timing. For instance, if selA and selB are never both high, the path through C1 and C3 (which is active only when both are high) is a false path
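
Since the original figure is not reproduced here, the following is only a hypothetical Verilog sketch of such a structure. It assumes C1 and C3 are combinational blocks selected by selA and selB respectively; the arithmetic inside them is a placeholder.

```verilog
// Hypothetical datapath: C1 and C3 are stand-ins for slow combinational blocks.
module false_path_sketch (
  input  wire       clk,
  input  wire       selA,
  input  wire       selB,
  input  wire [7:0] d,
  output reg  [7:0] q
);
  reg  [7:0] d_r;
  wire [7:0] c1_out = d_r + 8'd3;              // placeholder for block C1
  wire [7:0] stage1 = selA ? c1_out : d_r;     // C1 is used only when selA = 1
  wire [7:0] c3_out = stage1 ^ 8'h5A;          // placeholder for block C3
  wire [7:0] stage2 = selB ? c3_out : stage1;  // C3 is used only when selB = 1

  always @(posedge clk) begin
    d_r <= d;
    q   <= stage2;
  end
endmodule
```

The longest structural path runs from `d_r` through C1, the first multiplexer, C3, and the second multiplexer to `q`, but it is sensitized only when selA and selB are both 1. If the surrounding design guarantees that never happens, an STA tool would still report this path unless it is declared false.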

Other Examples:
https://www.edn.com/design/integrated-circuit-design/4433229/Basics-of-multi-cycle---false-paths

Asynchronous Signals (e.g. Async Clear)

Note that asynchronous signals are not easily analyzed for timing and are sensitive to glitches.

Wherever sensitivity to glitches exists, use registered output logic to generate the control signal.
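
A minimal sketch of this recommendation, assuming a counter whose asynchronous clear is derived from a decoded state value. The module name, signal names, and the decoded value are hypothetical.

```verilog
// Hypothetical sketch: asynchronous clear driven from a registered decode.
module glitch_free_clear (
  input  wire       clk,
  input  wire [3:0] state,
  output reg  [7:0] count
);
  // Combinational decode: may glitch while the state bits settle
  wire clr_decode = (state == 4'b1010);

  // Registering the decode filters out glitches between clock edges
  reg clr_r = 1'b0;
  always @(posedge clk)
    clr_r <= clr_decode;

  // The asynchronous clear input sees only the clean, registered signal
  always @(posedge clk or posedge clr_r) begin
    if (clr_r) count <= 8'd0;
    else       count <= count + 8'd1;
  end
endmodule
```

Driving the clear directly from `clr_decode` could reset the counter on a momentary glitch; the cost of registering it is one clock cycle of latency on the clear.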

Dynamic Timing Analysis

Whereas static timing analysis processes timing equations based on the circuit structure, dynamic timing analysis uses the results of a temporal simulation
The generated/saved waveforms (i.e. transient waveforms) are analyzed for timing violations and glitches
A complexity is that suitable input sequences (input vectors) must be created to test characteristics of the circuit at internal points.

Dynamic Timing Analysis using Event-Driven Simulation

Verilog tools include several methods for annotating timing, including the use of
- specify blocks ([ꭝꭝ] Figure 2.8, Michael D. Ciletti)
- Timing data augmentation using sidecar files, e.g. Standard Delay Format (SDF) files
- Delay parameters (e.g. #), which may also be used to model timing delays
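
A short sketch contrasting a delay parameter with a specify block on the same two-input AND function. The module names and delay values are assumptions chosen for illustration.

```verilog
`timescale 1ns/1ps

// Delay parameter (#) on a continuous assignment
module and2_hash (output wire y, input wire a, input wire b);
  assign #2 y = a & b;            // 2 ns delay (assumed value)
endmodule

// Pin-to-pin path delays in a specify block
module and2_specify (output wire y, input wire a, input wire b);
  and g1 (y, a, b);
  specify
    (a => y) = (1.2, 1.5);        // (rise, fall) delays in ns, assumed values
    (b => y) = (1.0, 1.3);
  endspecify
endmodule
```

Specify-block path delays are the values that SDF back-annotation (for example through the standard `$sdf_annotate` system task) typically overrides with post-place-and-route numbers, while `#` delays are fixed in the source.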
Assertions
Verilog also supports timing check tasks to flag violations during simulation
These can be added to code with sequential logic
```
$hold (reference_event, data_event, limit[,notifier]) ;
```
Assertion violations are detected during simulation and reported
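
A minimal sketch of timing checks attached to a flip-flop model, with a small testbench that changes the data inside the setup window. The module names and the limits (2 ns setup, 1 ns hold) are assumptions, and how a violation is reported depends on the simulator; some simulators parse but do not evaluate timing checks.

```verilog
`timescale 1ns/1ps

// Hypothetical flip-flop model with setup and hold checks (limits assumed)
module dff_checked (output reg q, input wire d, input wire clk);
  reg notifier;                   // toggled by the simulator on a violation

  always @(posedge clk)
    q <= d;

  specify
    specparam t_setup = 2, t_hold = 1;
    $setup(d, posedge clk, t_setup, notifier);   // data_event, reference_event
    $hold (posedge clk, d, t_hold,  notifier);   // reference_event, data_event
  endspecify
endmodule

// Testbench sketch: d changes 1 ns before a clock edge, inside the setup window
module tb_dff_checked;
  reg  d = 1'b0, clk = 1'b0;
  wire q;
  dff_checked dut (.q(q), .d(d), .clk(clk));

  always #5 clk = ~clk;           // rising edges at 5, 15, 25, ... ns

  initial begin
    #14 d = 1'b1;                 // 1 ns before the rising edge at 15 ns
    #20 $finish;
  end
endmodule
```

Note that the argument order differs between the two tasks: `$setup` takes the data event first, while `$hold` takes the reference (clock) event first.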

Dynamic vs Static Timing Analysis; Functional vs Timing Verification

Functional simulation and verification refers to verifying logical descriptions, including Boolean expressions and register transfers
Timing verification refers to making sure that all timing requirements, external and internal (setup, hold, etc.), are satisfied
Dynamic Timing Analysis refers to analysis of timing by simulating the system with inputs and examining the resulting waveforms.
Static timing analysis uses no data and does not use a simulation; it just analyzes the structure
A timing analysis tool stores delays in a separate Standard Delay Format (SDF) file alongside the post-place-and-route netlist

Review Static vs Dynamic

Dynamic timing analysis and functional simulation each require well-selected input sequences. These are called test vectors. Static timing analysis uses no test vectors.
Dynamic timing analysis is sometimes combined with functional simulation, while static timing analysis cannot be
Though dynamic timing and functional analysis use an event-driven simulation, which is much faster than SPICE-level “analog” circuit simulation, static timing analysis is faster still and is used during place and route iterations.
Static timing analysis is more straightforward with one clock domain, though it can be extended to handle multiple clock domains. Dynamic timing analysis can automatically handle multiple clock domains.