Lecture 11 – Synthesis and Loops

Ryan Robucci

1. Lecture 11 – Synthesis and Loops

Prof. Ryan Robucci

2. References

  • (Schaumont 2010) Schaumont, Patrick

A Practical Introduction to Hardware/Software Codesign, Springer ISBN 978-1-4614-3737-6 https://www.springer.com/gp/book/9781441960009

3. Single Assignment Code

Consider the following code:

a=b+1;
a=a*3;

This is the same as

a = (b+1)*3;

This only assigning a, and there is a single assignment to it.

It can be implemented in hardware with

dot-1.svg

or perhaps

dot-2.svg

4. Conversion to Single Assignment

There is a technique involving introduction of new variables and renaming to ensure single assignments in a section of code.

a1=b+1;
a2=a1*3;

Application of this technique simplifies identification of the source of each result, making it easier to understand the underlying data-flow structure. Data-dependency graphs can be generated more easily from single-assignment code.

5. MERGE

  • MERGE is a conceptual tool that can be introduced facilitate code analysis.

We’ll first motivate MERGE with an example.

a=b;
for(i=1;i<6;i++){
   a=a+i;
}
//(Schaumont 2010)
a1=b;
for(i=1;i<6;i++){
  a2= a? + i; //what to use? 
             // for  first iteration need a1, 
             //      thereafter       need a2
}
//(Schaumont 2010)

Solution is to use introduce the concept of a MERGE. A MERGE maps to a mux in hardware (for dynamic decisions, while for static it is optimized away), typically supporting a loop construct or feedback.

Still must ensure that no combinatorial loops are formed, adding registers and multiple clock cycles as needed (below a2 may be implemented a register to ensure this).

a1=b;
for(i=1;i<6;i++){
   a3=MERGE(i==1?a1:a2);    
                        //now a1 and a2 will be registers 
   a2=a3+1;
}
// (Schaumont 2010)

WARNING MERGE IS ONLY A CONCEPT

MERGE is a concept, NOT an ACTUAL VERILOG LANGUAGE CONSTRUCT. Do not attempt to use merge in your code.

6. Unrolling and Simplification

Unrolling can be used to create single assignment code:

a1=b;
a2=a1+1;
a3=a2+2;
a4=a3+3;
a5=a4+4;
a6=a5+5;

This might be implemented in one clock cycle with HW and 5 adders as follows.

dot-03.svg

Algorithmic, in this case arithmetic, simplification can trim unnecessary complexity from run time

a = b+15;

dot-04.svg

The preceding implies a single addition per clock cycle.

7. Single Assignment Code Example

This example code describes an iterative algorithm that will be implemented with registers

Original Code

int gcd(int a , int b) {
  while ( a != b) {
    if ( a > b )
      a = a-b;
    else
      b=b-a;
  }
  return a;
}
// Shaumont 2010

Single Assignment Code:

int gcd(int a1 , int b1) {
    while (MERGE(_?a1:a2)!=
                  MERGE(_?b1:b2){
      a3 = MERGE(_?a1:a2);
      b3 = MERGE(_?b1:b2);
      if (a3 > b3)
        a2 = a3-b3;
      else
        b2 = b3-a3;
    }
    return a2;
}
// Schaumont 2010

Hardware Implementation:

<div class="borrowed" style="width:200px">

<div class="bottom-right">&dagger;Schaumont</div> </div>

  • Single Assignment Code allows examination of data dependencies and hardware resources such as what can be done in a single clock cycle (combinatorial) and where a register is required.
  • These concepts are also important when writing behavioral HDL code in Verilog or VHDL.

8. Synthesizeable Combinatorial Code with attention to Loops

  • Today we'll discuss a few constructs which involve a concern of Control Loops and Data (dependency) Loops in procedural HDL
  • We'll first concern ourselves with combinatorial behavior and then allow additional flexibility with sequential
  • Should not attempt to synthesize any code that implements unresolvable dependency loops as combinatorial HW (executes in one clock cycle)

9. Single Assignment Code

  • Think of each statement as a node on a graph with the edges denoting dependencies. Nodes can be producers and consumers of values. A graph with loops cannot be directly resolved as a combinatorial circuit.

The inputs not generated from within the code are also nodes (they represent an assignment from elsewhere).

always_comb begin //always @ (a,b,c)
y1 = a+b;
y2 = y1+c;
y3 = y1+y2*y2;
end

dot-05.svg

  • Note how in this code segment alone the output is not well-defined. It is not clear of y1,y2 are used elsewhere or if they are just working variables for partial results.
  • Branches can be thought of as multiplexers that depend on the evaluation of conditional expression. A new flag variable based on the condition evaluation may be introduced to make this clear.

    y1 = a+b;
    if (y1 < 0)
      y2 = 0;
    else
      y2 = y1;
    
always_comb begin //always @ (a,b)
y1 = a+b;
flag = (y1 < 0);
y2 = flag?0:y1;
end

dot-06.svg

  • To achieve the status of single assignment code, every variable may only be assigned once
  • We may need to convert code to an equivalent single-assignment code to understand its underlying structure. To do this introduce additional variables when variables are assignment more than once

Original Code 1:

y = a+b;
s = y+c;
y = y+s*s;

Single Assignment Code 1:

y1 = a+b;
s = y1+c;
y2 = y1+s*s;

Original Code 2:

y = a+b;
if (y < 0)
y = 0;

Single Assignment Code 2:

y1 = a+b;
flag = (y1 < 0);
y2 = flag?0:y1;

10. Externally formed feedback loops

If a system produces its own input, an external loop is formed.

<div class=grid> <div class=left>

always_comb begin
  u = u & a;  
  v=  v & b;
end

dot-07.svg

  • Some synthesis tools might produce an error and halt synthesis, but the SystemVerilog standard only states that
    • Software tools should perform additional checks to warn if the behavior within an alwayscomb procedure does not represent combinational logic, such as if latched behavior can be inferred.

(At the time of this note, Vivado exhibited no difference in behavior for always_comb vs. always @(*) in this code)

An advanced code linter can find the issue:

11. Combinational Loop detection with Single Assignment Code Analysis

  • In this exercise,
    • let the inputs to the block be indexed with 0
    • start assignments with index 1
always @(*) begin //a,b, and use-before-assignment y
  u =  v & a;  
  v =  u & b;
end

becomes

always @(*) begin //a0,b0, and use-before-assignment v0
  u1 =  v0 & a0;
  v1 =  u1 & b0;
end
  • v0 is created nowhere, indicating use-before-assignment of v
    • In the context of the use, it would become obvious that there is no external source for y0 and that its source in the original code is what is now called y1.
      • Providing an external register to exist in the loop would be acceptable, though that would be fundamentally different code.

dot-08.svg

</div> </div>

Combinational loops can be formed among multiple blocks, and this is to be avoided as well.

always @(*) begin: process_u
  u = y & a;  
end

always @(*) begin: process_y
  y=  u & b;
end

dot-09.svg

12. Verilog Synthesis: Feedback (data dependency loops)

It is important to be able to identify data dependency loops. Unresolvable loops cannot be implemented with combinational hardware.

Example 1

always @ (a,b) begin
 y = 1;
 y = y&a;
 y = y&b;
end
  • No feedback after substitutions

Example 2

always @ (a,y) begin
 y = ~(y&a);
end
  • Unresolveable Feedback

Example 3

always @ (a,b,yA,yB) begin
 yA = ~(yB&a);
 yB = ~(yA&b);
end
  • Feedback

Example 4

always @ (posedge clk) 
begin
 y = ~(y&a);
end
  • This clearly does not attempt to describe combinatorial hardware it uses an edge-selective trigger to describe sequential hardware. A register `y` is inserted by a synthesizer, but alongside other blocks, the danger is non-deterministic pre-synthesis simulation. This code should use non-blocking assignment.

dot-09.svg

13. A test for data dependency loops

  • A conceptual test to understand if procedural code is implementing strictly combinatorial logic is to set every combinatorial result assigned to x at the entry point in the code, representing an erasure of any previous result lingering in the simulation
  • If the behavior would change in even one case in even one way, then the code was not strictly implementing combinatorial logic

    always @ (a,y) begin
      y=1’bx; //**
      y = ~(y&a);
    end
    
always @(a,b,yA,yB) begin
 yA=1’bx; yB=1’bx; //**
 yA = ~(yB&a);
 yB = ~(yA&b);
end
always @(a,b) begin
 y=1’bx; //**
 y = 0;
 y = y&a;
 y = y&b;
end
always @(a,b,yA,yB) begin
 yA=1’bx; yB=1’bx; //**
 yA = ~(yB&a);
 yB = ~(yA&b);
end
always @(a,b) begin
 p=1’bx; y=1’bx; //**
 if (a) p=1;
 y = ~(p&b);
end
always@(a,b,c,s,y) begin
 p=1’bx; y=1’bx; //**
 if (s) begin
   p = (a&b);
   y = ~(yA|c);
 end else begin
   y = a|b;  
 end
end
  • Remember, in pre-synthesis simulation you will see 1'bx as a result, but through synthesis the possibility of 1'bx is instead interpreted as a "don't care".
  • If your code already achieved complete assignment without the initialization to 1'bx, then the combinatorial erasure should have no affect.

Note 'x in Simulation vs. Synthesis

  • In Simulation, 'x indicates either undetermined values or explicit assignment to 'x
  • When a synthesizer analyzes code and determines that the code explicitly assigns 'x, it interprets it as a don't care and may optimizes according to an implementation cost function.

14. A test for data dependency loops in mixed comb. and seq. blocks

  • The combinational erasure test works for internal combinatorial values computed in edge-selective-trigger blocks.
  • Setting combinatorial values to 'x at code entry may help identify issues in pre-synthesis simulation but it should not be kept in released code where a don't care for synthesis is not intended.
  • In the following code, can you describe a scenario in which the erasure has an effect?

Examples Without combinatorial erasure

always @ (posedge clk)
begin

 y <= _y;        //**
 _y = ~(a&b);    //**
end
always @ (posedge clk)
begin

 _y = ~(a&b);   //**
 y <= _y;       //**
end

Examples With combinational erasure

always @ (posedge clk)
begin
 _y = 1’bx;
 y <= _y;
 _y = ~(a&b);
end
always @ (posedge clk)
begin
 _y = 1’bx;
 _y = ~(a&b);
 y <= _y;
end

15. Synthesized Cycle Iteration Loops

module littleloop(input clk,input a,input b,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;        
        y <= _y;
        _y = ~(a&b);
     end
endmodule

dot-10.svg

little loop

module littlecorrection(input clk,input a,input b,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;        
        _y = ~(a&b);
        y <= _y;
     end
endmodule

dot-11.svg

little loop

Module with Enable:

  • The following modules produce the same result, with feedback from the register. Some give an explicit name to the net attached to the register data input .
module enableV1(input clk,input en,input d,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;
        if (en) begin
           _y=d;           
        end else begin
           _y=y;           
        end
        y <= _y;
     end
endmodule
module enableV2(input clk,input en,input d,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;
        if (en) begin
           _y=d;           
        end


        y <= _y;
     end
endmodule
module enableV3(input clk,input en,input d,output reg y);
   always @ (posedge clk)
     begin: label

        if (en) begin
           y<=d;           
        end



     end
endmodule

dot-12.svg

enable

16. Combinatorial Synthesis: Loops

We will classify traditionally coded loops in procedural code by how they are expected to be interpreted by synthesizers.

  • Static Loops: Number if iterations defined at compile time.
    • Could directly perform finite loop unrolling
  • Often synthesizers cannot convert non-static loops to combinatorial circuit.
  • Dynamic-loops are those for which determination of the number of loop iterations required is determined at run-time, usually because the number of loops is determined by a run-time variable.
  • In the example below, the condition that is checked before every iteration is dependent on assignments within the body of the loop.
  • Furthermore, the multiple data movements are problematic

    While-like Loop:

    temp=datain;
    count =0;
    for (index=0;|temp;index=index+1) begin
      if temp[0]==1 (count=count +1) begin
        temp = temp>>1;
      end
    end
    
  • The previous code can be rewritten to have well-defined static loop count and no implied data movement. The constant `8` sets the statically defined loop count.

    count =0;
    for(index=0;index<8;index=index+1) begin
      if temp[index]==1 (count=count +1);
    end
    

What can and cannot be synthesized depends on the synthesizer

The logic function is not fundamentally excluded from mapping to cominational hardware, as seen in the next example. So, while this code might not be synthesizable by every synthesizer today, others might be able to analyze the code and interpret the combinational hardware function successfully. Also what a given synthesizer can't do today, it may be able to do tomorrow. Your synthesizer should provide documentation on what constructs are allowed and it is worth checking yearly.

17. Synthesis: Feedback (data dependency loops)

  • Registered output logic (mix comb and seq.) should be separated to understand the dependencies. New variables may be introduced to denote the difference in signals before and after a register.

    module counter(input clk, output reg [3:0] count);
       parameter CNT_MAX=11;
       always @ (posedge clk) begin
          if (count == CNT_MAX)
            count <= 0;
          else
            count <= count + 1;
       end
    endmodule
    
  • Feedback is perhaps unclear here. See rewrite below.

    module counter(input clk, output reg [3:0] count);
       parameter CNT_MAX=11;
    
       reg [3:0] count_prereg;
       always @ (posedge clk) begin
          count <= count_prereg;
       end
       always @ * begin: comb1
          reg flag;
          flag = (count == CNT_MAX);
          count_prereg = flag?0:(count+1); //could also have written if else using flag condition
       end
    endmodule
    

counter

wavedrome-00.svg

  • Feedback across clock cycles is acceptable.
  • No feedback in combinational partition.

18. Synthesis: keyword disable

  • The keyword disable may be used to implement a “break” from a loop. Consider this not yet covered and avoid for now if without further study.

19. Sequential Synthesis: Multi-Cycle Loops

  • Note that you may be able describe a sequential circuit with non-static loops, but it is common to be NOT SUPPORTED by synthesizers. The following suggests a multi-cycle (multiple clock periods) operation.

    "while" loop with iteration synced to a clock:

    count=0;
    for(index=0;|temp;index=index+1) begin
      @(posedge clk);
      if temp[0]==1 (count=count +1)
      temp>>1;
    end
    

20. Synthesis: Multi-cycle Operations

  • This is a forward note, we'll learn about this more later in the course.
  • It is typical to employ multi-cycle operations to reduce hardware through resource sharing (reuse of hardware in difference clock cycles) and reduce the critical path lengths.

    dot-13.svg

  • In the land of simulation one can model such behavior as follows, though it is usually not supported for synthesis

    always @ (posedge clk) 
    begin
      temp=a*b;
      @(posege clk)
      y=temp*c;
    end
    
  • We’ll want to understand how to implement multi-cycle operations using state machines to create variations based on the number of resources (e.g. multipliers, registers) affordable or desired, as factors like clock-cycle latency and throughput, power, clock freq., etc…

Warning Multi-cycle computations

Multi-cycle computations and rarely supported constructs above are shown here for completeness. Later lectures will formally introduce methods for such descriptions. Do not attempt to use these in synthesizable code without further study.

21. Generate Construct

A structural “for loop”: For….Generate

  • Uses a special indexing variable defined using genvar. Within a generate block indicated using keyword generate
  • Use for repetitive structural instantiations

    genvar index;
    generate
      for (index=0; index < 8; index=index+1) begin: gen_code_label
        BUFR BUFR_inst (
          .O(clk_o[index]), // Clock buffer output
          .CE(ce), // Clock enable input
          .CLR(clear), // Clock buffer reset input
          .I(clk_i[index]) // Clock buffer input
        );
      end
    endgenerate
    

For…Generate example: Adder

An adder is a good example since it involves a repetitive structure with interconnection between instantiations (*use for param)

  • Study the details of the carefully crafted index in the carry out of each instance

    assign cin[0]=1'b0;
    genvar index;
    generate
      for (index=0; index < 8; index=index+1) begin: some_hierarchical_label
        adder adder_inst (
          .cin(cin[index]),
          .a(a[index]), 
          .b(b[index]),
          .cout(cin[index+1]),
          .y(u[index])
        );
      end
    endgenerate
    

dot-14.svg

Updated Syntax in SystemVerilog

In SystemVerilog the generate and endgenerate keywords are optional. Under many use cases, the keywords can be easily inferred by the interpreter, such as when the for loop utilizes a genvar variable, and thus the additional keywords are superfluous.

Note Hierarchial names of generate blocks

if statements, case-items, and for-loops,can be provided explicit hierarchical labels by providing a named the begin...end block , such as some_hierarchical_label in the example. Variables and instances defined within can be addresses with the hierarchy <parent>.<some_hierarchical_label>.<children> or in the case of for loops <parent>.some_some_hierarchical_label[<generation_index>].<children> At least in SystemVerilog, names for conditional statements are implied for unnamed if blocks if involving simple named parameter checking

module top;
parameter test_bench_only = 0;
genvar i;
if (test_bench_only) logic a; // top.test_bench_only.a
else logic b; // top.test_bench_only.b

22. Concluding Points

  • Combinatorial dependency loops cannot be synthesized
  • Static for loops can be synthesized by being unrolled
  • Dynamic for Loops may not be synthesizable
  • Dynamic for Loops with timing control may be synthesized as a “multicycle operation” or a state machine.
  • Repetitive/Patterned Structural Instantiations may be done with for…generate loops
  • We'll want to formalize multi-cycle operations as state machines.

23. Interesting Synthesis Example of Control Signals and Data

A conditional addition such as

if temp[0]==1'b1 (count=count +1);

are often synthesized with an adder, as in

count=count+temp[0];

simplifying the circuity.

dot-15.svg

dot-16.svg

24. Example Exercise and Discussion on Static/Dynamic Loop Determination

Does the for loop below represent a dynamic or static loop?

module myModule(a,...
input [4:0] a;

reg [29:0] thermocode;

integer i;


always ... begin
thermocode=0;
for(i=0; i<a;i++) begin
  thermocode[i]=1'b1;
end

end
..
endmodule

Hint1: Start by trying to unroll the loop iterations

thermocode[0]=1'b1
thermocode[1]=1'b1
thermocode[2]=1'b1
thermocode[3]=1'b1
..where to stop?

Where to stop is not obvious until a is known, which only happens at "run-time"

How to make the loop static?

Proposal 1: set a to a fixed value?

  • this would make the loop static but since a is an input from another module we want to keep the input

Next Hint: Can you predetermine the possible range of i, which is based on the range of a?

Responses:

  • 230
  • 0 to 31
  • [0,a)
  • 25

Lets try to unroll the loop starting with the first iteration i=0.

i=0 involves setting thermocode[0]

When is thermocode[0] set to 1?

Ans: when a>0, resulting in thermocode[0]=a>0

thermocode[0]=a>0
thermocode[1]=a>1
thermocode[2]=a>2
thermocode[3]=a>3
thermocode[4]=a>4
thermocode[5]=a>5
thermocode[6]=a>6
thermocode[7]=a>7
thermocode[8]=a>8
thermocode[9]=a>9
thermocode[10]=a>10
...
thermocode[20]=a>20
thermocode[21]=a>21
thermocode[22]=a>22
thermocode[23]=a>23
thermocode[24]=a>24
thermocode[25]=a>25
thermocode[26]=a>26
thermocode[27]=a>27
thermocode[28]=a>28
thermocode[29]=a>29
//thermocode[30]=a>30 xx thermocode is only 30-bit
//thermocode[31]=a>31 xx thermocode is only 30-bit

We can realize that though a can go up to 29, the thermocode only includes 30 bits

Now, lets shorthand this long set of repetitive code as a static loop:

for(i=0; i<30;i++) begin
  thermocode[i]=a>i;
end

So, should the original code be considered "synthesizable" or not? Answer: depends on the synthesizer. The context in which the loop exists (here the context involves a bounded a) may be of importance, but may or may not be taken into account by the synthesizer.

Author: Dr. Ryan Robucci

Created: 2025-04-02 Wed 11:52

Validate