Suggested Coding and Design Practices

CMPE 415  Programmable Logic Devices

Prof. Ryan Robucci

Data Pipeline and RTL

- General single-clock domain register-transfer-logic structure to synthesize (right):

- In general a synthesizer approaches synthesis one clock domain at a time.
  - Sequential gates driven by a given clock, together with the combinatorial gates between them, form a clock domain
  - Logic optimizers work to simplify logic within a domain and must satisfy timing constraints such as setup and hold, with additional allowance for clock jitter and clock skew between gates
  - Implementation also requires routing clock signals through special clock networks that minimize skew

- You should first learn a coding style that supports single-clock-domain synthesis and causes PRE-SYNTHESIS functional simulation results to match post-synthesis timing simulation and, ultimately, hardware
  - You can later expand your understanding to multiple clock domains with cross-domain paths, multi-cycle paths, asynchronous inputs, and external connections
Suggested Procedural RTL Coding Practices

A) Use non-blocking statements to assign outputs of sequential gates

B) Use blocking statements to assign outputs of combinatorial gates

C) Avoid unnecessary latches when coding for FPGAs, but when necessary use non-blocking statements
   • Latches fall outside the scope of basic synchronous timing analysis assumptions and complicate timing and functional testing.

D) Do not make assignments to the same variable from more than one always block
   • It confuses synthesizers and can lead to non-deterministic simulation behavior if multiple assignments are made at the same point in time

E) Consider what each output bit is assigned for any possible evaluation under any input and sequential state condition
   • When mixing sequential and combinatorial code, it is easy to get into the mindset of coding for sequential logic outputs (which commonly defaults to retaining previous values in many cases) and overlook what happens with the intermediate combinatorial logic in all cases
   • When coding sequential logic, missed cases can lead to unintended enables and to storing and using old data

F) Do not use any delay operator for the purpose of affecting synthesis
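
Guideline E can be sketched with a small combinatorial block. The module and signal names below (`dec2`, `sel`, `en`, `y`) are hypothetical and not from the course material; the sketch only aims to show that a default assignment at the top of the block gives every output bit a value on every evaluation, so no latch is implied.

```verilog
// Hypothetical 2-to-4 decoder with enable, illustrating guideline E.
module dec2 (y, sel, en);
  output [3:0] y;
  input  [1:0] sel;
  input        en;
  reg    [3:0] y;

  always @(sel or en) begin
    y = 4'b0000;              // default: every bit is assigned on every evaluation
    if (en) begin
      case (sel)
        2'b00: y = 4'b0001;
        2'b01: y = 4'b0010;
        2'b10: y = 4'b0100;
        2'b11: y = 4'b1000;
      endcase
    end
    // Without the default, en == 0 would leave y unassigned and imply a latch.
  end
endmodule
```

Blocking assignments are used because the block is purely combinatorial (guideline B).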

Related guidelines:
Rule: No Continuous Feedback

For synthesizable code, continuous assignments and logic should not have feedback. Feedback would represent combinatorial circuit feedback (which has hysteresis / memory).

*Simulation-only / modeling code* can use combinatorial feedback only if delays are added.

<table>
<thead>
<tr>
<th>Primitives</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>wire</strong>  y, a, b, c;</td>
</tr>
<tr>
<td><strong>and</strong> (y, a, b, c);</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Continuous assignments</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>wire</strong>  y, a, b, c;</td>
</tr>
<tr>
<td><strong>assign</strong> y=a&amp;b&amp;c;</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Feedback</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>assign</strong> y=y&amp;a&amp;b&amp;c;</td>
</tr>
<tr>
<td><strong>assign</strong> y=a&amp;b;</td>
</tr>
<tr>
<td><strong>assign</strong> b=y&amp;a;</td>
</tr>
<tr>
<td><strong>assign</strong> y=(en &amp; x) | (~en &amp; y);</td>
</tr>
<tr>
<td><strong>assign</strong> y=en?x:y;</td>
</tr>
<tr>
<td><strong>nor</strong> n1(q, r, q_n);</td>
</tr>
<tr>
<td><strong>nor</strong> n2(q_n, s, q);</td>
</tr>
<tr>
<td><strong>assign</strong> q=~(r | q_n);</td>
</tr>
<tr>
<td><strong>assign</strong> q_n=~(s | q);</td>
</tr>
</tbody>
</table>

SR Latch
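
As a sketch of the simulation-only case, the cross-coupled NOR latch can be modeled with gate delays so that the feedback loop settles over simulated time rather than within a single time step. The module names `sr_latch_sim` and `sr_latch_tb`, the stimulus, and the `#1` delay values are assumptions chosen for illustration; this is modeling code, not a synthesizable design.

```verilog
// Simulation-only model: cross-coupled NOR gates with unit delays.
module sr_latch_sim (q, q_n, s, r);
  output q, q_n;
  input  s, r;
  nor #1 n1 (q,   r, q_n);
  nor #1 n2 (q_n, s, q);
endmodule

// Hypothetical testbench: reset, hold, set, hold.
module sr_latch_tb;
  reg  s, r;
  wire q, q_n;
  sr_latch_sim dut (q, q_n, s, r);
  initial begin
    $monitor("t=%0t s=%b r=%b q=%b q_n=%b", $time, s, r, q, q_n);
    s = 0; r = 1;   // reset
    #10 r = 0;      // hold: q keeps its value through the feedback
    #10 s = 1;      // set
    #10 s = 0;      // hold
    #10 $finish;
  end
endmodule
```

The delays are what let the simulator step through the loop one gate at a time; they do not make the structure suitable for synthesis.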
A basic register

```verilog
reg q;
always @(posedge clk)
begin
    q <= d;
end
```

The sensitivity list includes only a timing control signal, the clock. The output is updated only on clock edges, even if d is changing; this requires memory.
## Synchronous reset

```verilog
reg q;
always @(posedge clk)
    if (reset)
        q <= 0;
    else
        q <= d;
```

The sensitivity list includes only the clock, indicating that the reset propagates to the output only upon the rising clock edge.

## Synchronous reset (active low)

```verilog
reg q;
always @(posedge clk)
    if (reset_n==1'b0)
        q <= 0;
    else
        q <= d;
```
## Asynchronous reset

```verilog
reg q;
always @(posedge clk, posedge clear)
  if(clear)
    q <= 0;
  else
    q <= d;
```

- The sensitivity list includes the clear signal, indicating that the clear should propagate to the output immediately. The edge specifier in the sensitivity list is required by some synthesizers, though not required for simulation.

## Asynchronous reset (active low)

```verilog
reg q;
always @(posedge clk, negedge clear_n)
  if(~clear_n)
    q <= 0;
  else
    q <= d;
```

- As a general rule, review the synthesis tool's documentation regarding recommended coding templates. Some synthesizers are more flexible than others. Some may, for instance, require that the reset condition be handled first with an if-else construct as shown.

For FPGAs, asynchronous resets are often more efficient than synchronous resets, since the built-in registers often already have asynchronous reset inputs, while a synchronous reset may require additional logic, such as an AND gate, in the input data path.

Synthesizers may require conforming to code templates and may require that the asynchronous reset be handled by an if statement immediately following the trigger (`always @`) statement.

```verilog
always @ (posedge clk, negedge clr_n)
    if (clr_n==0)...
    else ...
always @ (posedge clk)
    if (reset_n==0)...
    else...
```

Note use of `negedge`
**Registered-Output Logic**

**Combinatorial only (sensitivity list includes all inputs):**

```verilog
reg y;
always @(a, b) // all comb. dependencies listed
  y = a & b;
```

We might also have used `y <= a & b;`, but we follow the practice of using blocking assignments for all combinatorial logic.

**Sequential (registered-output combinatorial logic):**

```verilog
reg q;
always @(posedge clk)
  q <= a & b;
```
Simulation of parallel blocks
(from Slideset 5)

This code attempts to model a swap of $y_1$ and $y_2$:

```verilog
always @(posedge clk, posedge rst)
  if (rst) y2 = 1; // preset
  else y2 = y1;

always @(posedge clk, posedge rst)
  if (rst) y1 = 0; // reset
  else y1 = y2;
```

Which one first? Does it even matter?

The execution order of parallel always blocks is not guaranteed in simulation, though synthesis will probably work since synthesis approaches each always block somewhat independently at first.
**Good Parallel Blocks**
*from Slideset 5*

This will not only synthesize correctly, but also simulate correctly:

```verilog
module fbosc2 (y1, y2, clk, rst);
    output y1, y2;
    input clk, rst;
    reg y1, y2;

    always @(posedge clk or posedge rst)
        if (rst) y1 <= 0; // reset
        else y1 <= y2;

    always @(posedge clk or posedge rst)
        if (rst) y2 <= 1; // preset
        else y2 <= y1;
endmodule
```

(Cummings 2000)
```verilog
module amb_parallel_swap();
  reg clk, rst;
  reg y1, y2;
  reg z1, z2;

  initial clk = 0;
  always #50 clk = ~clk;

  initial begin
    rst = 1;
    #10;
    rst = 0;
  end

  initial begin
    #1000 $finish;
  end

  // Blocking assignments: result depends on block execution order
  always @(posedge clk, posedge rst)
    if (rst) y1 = 0; // reset
    else y1 = y2;

  always @(posedge clk, posedge rst)
    if (rst) y2 = 1; // preset
    else y2 = y1;

  // Non-blocking assignments: result is independent of block order
  always @(posedge clk, posedge rst)
    if (rst) z1 <= 0; // reset
    else z1 <= z2;

  always @(posedge clk, posedge rst)
    if (rst) z2 <= 1; // preset
    else z2 <= z1;
endmodule
```
What if timing requirement is not satisfied?

Intentional Pipelining for Timing

```verilog
module calc(q, a, b, c, d, clk);
output reg [31:0] q;
input [31:0] a, b, c, d;
input clk;
always @(posedge clk) begin:bn
    reg [31:0] tmp1;
    reg [31:0] tmp2;
    tmp1 = a * b;
    tmp2 = c * d;
    q <= tmp1 * tmp2;
end
endmodule
```

Critical Path Timing requirement:
$T_{CLK\_TO\_Q} + PD + T_{\text{setup}} < T_{\text{clk}}$
• Can reduce the clock speed
  – But this slows the entire system

• Can introduce pipelining
  – Overall propagation of computation is longer (two clock cycles incurring multiple setup and hold times)
  – Maintains fast system clock

• Even when timing is satisfied, introducing pipelining in the **critical path** of a system can allow an increase in the clock rate and therefore in overall system throughput
```verilog
module calc(q, a, b, c, d, clk);
output reg [31:0] q;
input [31:0] a, b, c, d;
input clk;
always @(posedge clk) begin:bn
    reg [31:0] tmp1;
    reg [31:0] tmp2;
    tmp1 <= a * b;      // first pipeline stage
    tmp2 <= c * d;
    q <= tmp1 * tmp2;   // second pipeline stage
end
endmodule
```

Critical Path Timing requirement:
$T_{CLK\_TO\_Q} + PD + T_{\text{setup}} < T_{\text{clk}}$
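
As a worked example of this constraint, assume some made-up delays (illustrative values only, not taken from any datasheet): $T_{CLK\_TO\_Q} = 1\,\text{ns}$, $T_{\text{setup}} = 1\,\text{ns}$, and a propagation delay of $4\,\text{ns}$ per multiplier. Without pipelining, the critical path of `calc` passes through two multipliers in series; with a register after the first stage, it passes through only one:

$$
\begin{aligned}
\text{unpipelined:}\quad & T_{\text{clk}} > 1 + (4 + 4) + 1 = 10\,\text{ns} && \Rightarrow\ f_{\text{clk}} < 100\,\text{MHz} \\
\text{pipelined:}\quad & T_{\text{clk}} > 1 + 4 + 1 = 6\,\text{ns} && \Rightarrow\ f_{\text{clk}} < \tfrac{1}{6\,\text{ns}} \approx 166.7\,\text{MHz}
\end{aligned}
$$

Under these assumptions a pipelined result takes two cycles, about $2 \times 6 = 12\,\text{ns}$ instead of $10\,\text{ns}$, but a new set of inputs can be accepted every $6\,\text{ns}$.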
```verilog
module pipeb2 (q3, d, clk);
    output [7:0] q3;
    input [7:0] d; input clk;
    reg [7:0] q3, q2, q1;
    always @(posedge clk) begin
        q3 = q2;
        q2 = q1;
        q1 = d;
    end
endmodule

module pipeb1 (q3, d, clk);
    output [7:0] q3;
    input [7:0] d; input clk;
    reg [7:0] q3, q2, q1;
    always @(posedge clk) begin
        q1 = d;
        q2 = q1;
        q3 = q2;
    end
endmodule
```

Figure 2 - Sequential pipeline register

Figure 3 - Actual synthesized result!
Bad Parallel Block Pipeline Implementations

```verilog
module pipeb3 (q3, d, clk);
  output [7:0] q3;
  input [7:0] d;
  input clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) q1=d;
  always @(posedge clk) q2=q1;
  always @(posedge clk) q3=q2;
endmodule
```

These may synthesize correctly, but simulation may not match

```verilog
module pipeb4 (q3, d, clk);
  output [7:0] q3;
  input [7:0] d;
  input clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) q2=q1;
  always @(posedge clk) q3=q2;
  always @(posedge clk) q1=d;
endmodule
```

(Cummings 2000)
**Good Pipeline Implementations**

Use non-blocking statements for registers

---

```verilog
module pipen1 (q3, d, clk);
  output [7:0] q3;
  input [7:0] d;
  input clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) begin
    q1 <= d;
    q2 <= q1;
    q3 <= q2;
  end
endmodule
```

---

```verilog
module pipen2 (q3, d, clk);
  output [7:0] q3;
  input [7:0] d;
  input clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) begin
    q3 <= q2;
    q2 <= q1;
    q1 <= d;
  end
endmodule
```

---

```verilog
module pipen3 (q3, d, clk);
  output [7:0] q3;
  input [7:0] d;
  input clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) q1<=d;
  always @(posedge clk) q2<=q1;
  always @(posedge clk) q3<=q2;
endmodule
```

---

```verilog
module pipen4 (q3, d, clk);
  output [7:0] q3;
  input [7:0] d;
  input clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) q2<=q1;
  always @(posedge clk) q3<=q2;
  always @(posedge clk) q1<=d;
endmodule
```

---

(Cummings 2000)
Cascading Combinatorial Logic

**Ex: AND-OR**

```verilog
module ao4 (y, a, b, c, d);
output y;
input a, b, c, d;
reg y, tmp1, tmp2;
always @(a or b or c or d) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y <= tmp1 | tmp2;
end
endmodule
```

- **Works, but requires multiple passes in simulation**
- **y reflects old values**
- **y may not be updated correctly until next change triggers another evaluation**

Guideline: When modeling combinatorial logic with an always block, use blocking assignments.

---

```verilog
module ao2 (y, a, b, c, d);
output y;
input a, b, c, d;
reg y, tmp1, tmp2;
always @(a or b or c or d) begin
    tmp1 = a & b;
    tmp2 = c & d;
    y = tmp1 | tmp2;
end
endmodule
```

- **efficient sim**

---

```verilog
module ao5 (y, a, b, c, d);
output y;
input a, b, c, d;
reg y, tmp1, tmp2;
always @(a,b,c,d,tmp1,tmp2) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y <= tmp1 | tmp2;
end
endmodule
```

(Cummings 2000)
```verilog
module nbex1 (q, a, b, clk, rst_n);
  output q;
  input clk, rst_n;
  input a, b;
  reg q, y;
  always @(a or b)
    y = a ^ b;
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else q <= y;
endmodule

module nbex2 (q, a, b, clk, rst_n);
  output q;
  input clk, rst_n;
  input a, b;
  reg q;
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else q <= a ^ b;
endmodule
```

Separation of sequential and combinatorial

(Cummings 2000)
```verilog
module ba_nba2 (q, a, b, clk, rst_n);
  output q;
  input a, b, rst_n;
  input clk;
  reg q;
  always @(posedge clk or negedge rst_n) begin: ff
    reg tmp;
    if (!rst_n)
      q <= 1'b0;
    else begin
      tmp = a & b;
      q <= tmp;
    end
  end
endmodule
```

(Cummings 2000)

A block name is required for defining local variables

Recommended coding:
A local variable declared in a named block is allowed in Xilinx ISE*
Prevents accidental use outside the block

Mix of blocking for intermediate/combinatorial logic and non-blocking for sequential

*WARNING:Xst:646 - Signal <ff/tmp> is assigned but never used. This unconnected signal will be trimmed during the optimization process.
Mix of blocking and non-blocking assignments to the same variable

```verilog
module ba_nba6 (q, a, b, clk, rst_n);
  output q;
  input a, b, rst_n;
  input clk;
  reg q, tmp;
  always @(posedge clk or negedge rst_n)
    if (!rst_n)
      q = 1'b0; // blocking assignment to "q"
    else begin
      tmp = a & b;
      q <= tmp; // nonblocking assignment to "q"
    end
endmodule
```

(Cummings 2000)
```verilog
module badcode1 (q, d0, d1, sel, clk, rst_n);
    output q;
    input d0, d1, sel, clk, rst_n;
    reg q;

    always @(posedge clk or negedge rst_n)
    if (sel==1'b0) q <= d0;

    always @(posedge clk or negedge rst_n)
    if (sel==1'b1) q <= d1;
endmodule
```

These blocks make mutually exclusive assignments.
This may make sense and may simulate, but synthesis usually complains
of multiple drivers.
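
One way to keep the same selection behavior while following guideline D is to drive `q` from a single always block. The sketch below is an assumption about the intent of the two-block example, not code from the source; the module name `goodcode1` is hypothetical and the unused reset is omitted.

```verilog
// Hypothetical single-driver rewrite: only one always block assigns q.
module goodcode1 (q, d0, d1, sel, clk);
  output q;
  input  d0, d1, sel, clk;
  reg    q;

  always @(posedge clk)
    if (sel == 1'b0) q <= d0;   // sel chooses the data source
    else             q <= d1;
endmodule
```

Because every branch assigns `q`, the synthesizer sees one register fed by a 2-to-1 multiplexer and no competing drivers.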
Non-blocking

```verilog
always @(posedge clk, posedge rst) begin
    if (rst) begin
        z1 <= 0; // reset
        z2 <= 1; // preset
    end else begin
        z1 <= z2;
        z2 <= z1;
    end
end
```

Order doesn't matter

Blocking

```verilog
always @(posedge clk, posedge rst) begin: swap
    reg temp;
    if (rst) begin
        y1 <= 0; // reset
        y2 <= 1; // preset
    end else begin
        temp = y1;
        y1 <= y2;
        y2 <= temp;
    end
end
```

temp won’t exist in the synthesized design
Guideline: Use non-blocking for EVERY register

```verilog
module dffx (q, d, clk, rst);
  output q;
  input d, clk, rst;
  reg q;
  always @(posedge clk)
    if (rst) q <= 1'b0;
    else q <= d;
endmodule

module dffb (q, d, clk, rst);
  output q;
  input d, clk, rst;
  reg q;
  always @(posedge clk)
    if (rst) q = 1'b0;
    else q = d;
endmodule
```

- It is better to develop the habit of coding all sequential always blocks, even simple single-block modules, using nonblocking assignments as shown in Example 14.
General warning for Sloppy combinations

**Intention**

```
module dff2 (qA, qB, dA, dB, clk, rst);
output reg qA, qB;
input dA, dB, clk, rst;
always @ (posedge clk)
    if (rst) begin
        qA <= 1'b0;
    end else begin
        qA <= dA;
        qB <= dB;
    end
endmodule
```

**Code**

**Synthesis**

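Because qB is not assigned in the reset branch, it holds its value while rst is asserted, so synthesis infers an enable on qB. This may be unintentional.
Consider every output for every traversal of the decision tree.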
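
If the intention is for both registers to reset, a minimal sketch assigns every output in every branch of the decision tree. The module name `dff2_fixed` and the data input names `dA` and `dB` are assumptions made for this illustration.

```verilog
// Hypothetical variant: both outputs are assigned in both branches.
module dff2_fixed (qA, qB, dA, dB, clk, rst);
  output reg qA, qB;
  input dA, dB, clk, rst;
  always @(posedge clk)
    if (rst) begin
      qA <= 1'b0;
      qB <= 1'b0;   // qB is now assigned during reset as well
    end else begin
      qA <= dA;
      qB <= dB;
    end
endmodule
```

With this form, neither register needs an enable derived from `rst`; both simply see a synchronous reset.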
Another Sloppy combination

```verilog
module arith (output reg [15:0] q, input [15:0] a, input [15:0] b, input clk);
    always @(posedge clk) begin: blk
        reg [15:0] asq, bsq;
        if (a>b) begin
            asq=a*a;
            bsq=b*b;
            q <= asq + bsq;
        end
        else begin
            // asq and bsq are not assigned here: old values are used
            q <= asq - bsq;
        end
    end
endmodule
```
```verilog
module arith (output reg [15:0] q, input [15:0] a, input [15:0] b, input clk);
    always @(posedge clk) begin: blk
        reg [15:0] asq, bsq;
        asq = 16'bx; bsq = 16'bx;
        if (a>b) begin
            asq = a*a;
            bsq = b*b;
            q <= asq + bsq;
        end else begin
            q <= asq - bsq;
        end
    end
endmodule
```

Assigning x first ensures the combinatorial logic does not imply memory
```verilog
module arith (output reg [15:0] q, input [15:0] a, input [15:0] b, input clk);
    always @(posedge clk) begin: blk
        reg [15:0] asq, bsq;
        if (a>b) begin
            q <= asq + bsq;
        end else begin
            q <= asq - bsq;
        end
        asq = a*a;
        bsq = b*b;
    end
endmodule
```
Combinatorial

```verilog
module arith (output reg q, input a, input b);
  always @(a,b) begin: blk
    reg asq,bsq;
    q = asq + bsq;
    asq = a*a;
    bsq = b*b;
  end
endmodule
```

Registered Logic – an attempt to only add a registered output

```verilog
module arith (output reg q, input a, input b, input clk);
  always @(posedge clk)
  begin: blk
    reg asq,bsq, q_pre;
    q_pre = asq + bsq;
    asq = a*a;
    bsq = b*b;
    q <= q_pre;
  end
endmodule
```
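
For comparison, a sketch in which only the output is registered: each intermediate is assigned with a blocking statement before it is read, so it describes combinatorial logic rather than stored state. The module name `arith_reg` and the port widths are assumptions chosen so the sum cannot overflow; they are not from the source.

```verilog
// Hypothetical variant: intermediates are assigned before use.
module arith_reg (output reg [16:0] q, input [7:0] a, input [7:0] b, input clk);
  always @(posedge clk) begin: blk
    reg [15:0] asq, bsq;
    asq = a * a;       // blocking: computed first on every clock edge
    bsq = b * b;
    q  <= asq + bsq;   // non-blocking: the only register is q
  end
endmodule
```

Reading `asq` or `bsq` before assigning them, as in the attempt above, would instead make the block use values stored from the previous clock edge.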
```verilog
module andReg(q, a, b, clk, rst_n);
  output q;
  input a, b, rst_n;
  input clk;
  reg q, tmp;
  always @(posedge clk, negedge rst_n)
    if (!rst_n)
      q <= 1'b0;
    else begin
      tmp <= a & b;
      q <= tmp;
    end
endmodule
```

The non-blocking assignment to tmp breaks our rules: tmp becomes an additional register, so q lags a & b by two clock cycles
Multiple Clock Domains

As a beginner, avoid the creation of additional clock domains caused by using various signals as a clock.

Avoid use of a logic signal as a clock

```verilog
assign gt = a>b;
reg q;
always @(posedge gt)
begin
    q <= a;
end
```

Synchronous logic

```verilog
reg q;
always @(posedge clk)
begin
    if (a > b)
        q <= a;
end
```

A synthesizer may partition its task by organizing logic into clock domains, optimizing the logic within them, performing timing analysis (e.g. critical path propagation delay, setup time and hold time checks) within each domain, and then attempting to handle signals that cross clock domains. Furthermore, FPGAs use a special hardware routing network for clocks that is distinct from the routing for general logic signals. Creating additional clock domains requires care and should be avoided at this time.
No Double-Edge Clocking in this Course

always @(posedge clk)
always @(negedge clk)
always @(posedge clk)
Gated Clocks

- Gated clocks avoid unnecessary switching in a system and reduce power.
- Gating introduces clock skew between parts, complicating timing by introducing additional clock domains.
- It is best to create additional clocks using special clocking hardware, which has not yet been taught.
- For now, consider it safer to implement an enable.
- For those that want to read ahead: http://www.xilinx.com/support/answers/38099.html
Gated Clock Issues

- clk_gated is delayed from clk
  - Increased opportunity for a race condition from register A to register B
  - Increased delay from the clk edge to the updated Q on register B reduces the allowed propagation time in logic on path2

- Timing analysis tools must be able to account for this, but timing on combinatorial paths is not as predictable
Gated Clock

```verilog
module gated_clock (clk, reset_, clk_gate, data, Q);
  input clk, reset_, clk_gate, data;
  output Q;
  reg Q;
  wire clk_gated = clk && clk_gate;
  always @(posedge clk_gated, negedge reset_)
    if (reset_ == 0) Q <= 0; else Q <= data;
endmodule
```
Gated Clock Functionality using Enable

Safer, simpler for timing analysis and optimization tools:

```verilog
module not_gated_clock (clk, reset_, data_en, data,Q);
  input clk, reset_,data_en,data;
  output Q;
  reg Q;
  always @(posedge clk or negedge reset_)
    if (reset_==0) Q<=0; else if (data_en) Q<= data;
    //else assignment to previous value is inferred
endmodule
```