Lecture 04 – Synthesis and Loops

Prof. Ryan Robucci

Single Assignment Code

Consider the following code:

a=b+1;
a=a*3;

This is the same as

a = (b+1)*3;

This code assigns only a, and there is a single assignment to it.

It can be implemented in hardware with

[Figure: data-flow graph b -> inc -> triple -> a]

or perhaps

[Figure: data-flow graph b -> triple -> + -> a, with the constant 3 as the second adder input]

Conversion to Single Assignment

There is a technique involving introduction of new variables and renaming to ensure single assignments in a section of code.

a1=b+1;
a2=a1*3;

Application of this technique simplifies identification of the source of each result, making it easier to understand the underlying data-flow structure. Data-dependency graphs can be generated more easily from single-assignment code.
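The renaming can be checked with a small C model. This is only a sketch: the function names original and single_assignment are hypothetical and exist only for illustration.

```c
/* Original form: the name a is assigned twice. */
int original(int b) {
    int a;
    a = b + 1;
    a = a * 3;
    return a;
}

/* Single-assignment form: every name is written exactly once,
   so each value has exactly one source in the data-flow graph. */
int single_assignment(int b) {
    const int a1 = b + 1;   /* first value formerly held in a  */
    const int a2 = a1 * 3;  /* second value formerly held in a */
    return a2;
}
```

Both functions compute (b+1)*3; the second simply makes the producer of each intermediate value explicit.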

MERGE

MERGE is a conceptual tool that can be introduced to facilitate code analysis.
We’ll first motivate MERGE with an example.

a=b;
for(i=1;i<6;i++){
   a=a+i;
}
//(Schaumont 2010)

A first attempt at renaming for single assignment raises a question:

a1=b;
for(i=1;i<6;i++){
  a2= a? + i; //what to use? 
             // for  first iteration need a1, 
             //      thereafter       need a2
}
//(Schaumont 2010)

The solution is to introduce the concept of a MERGE**.
A MERGE maps to a mux in hardware, typically creating a loop or feedback.

We must still ensure that no combinatorial loops are formed, adding registers and multiple clock cycles as needed (below, a2 may be implemented as a register to ensure this).

a1=b;
for(i=1;i<6;i++){
   a3=MERGE(i==1?a1:a2);    
                        //now a1 and a2 will be registers 
   a2=a3+i;
}
// (Schaumont 2010)

MERGE IS ONLY A CONCEPT

** MERGE is a concept, NOT an ACTUAL VERILOG LANGUAGE CONSTRUCT. Do not attempt to use merge in your code.
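Because MERGE is only a concept, it can be modeled in ordinary C as a 2-to-1 select. The sketch below assumes the select condition is "first iteration" and that the loop adds i on each pass, as in the original loop; the names merge and accumulate are hypothetical.

```c
/* MERGE modeled as a 2-to-1 mux: pick the initial value on the first
   iteration and the fed-back value on every later iteration.
   (merge is a hypothetical helper, not a language construct.) */
static int merge(int first, int init, int feedback) {
    return first ? init : feedback;
}

int accumulate(int b) {
    const int a1 = b;  /* initial value, written once */
    int a2 = 0;        /* models the feedback register; this initial
                          value is never selected by merge */
    for (int i = 1; i < 6; i++) {
        const int a3 = merge(i == 1, a1, a2);  /* mux in front of the adder */
        a2 = a3 + i;                           /* value stored for the next pass */
    }
    return a2;
}
```

The variable a2 plays the role of the register that breaks the feedback path; in hardware it would be updated once per clock cycle.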

Unrolling and Simplification

Unrolling can be used to create single assignment code:

a1=b;
a2=a1+1;
a3=a2+2;
a4=a3+3;
a5=a4+4;
a6=a5+5;

This might be implemented in one clock cycle with HW and 5 adders as follows.

[Figure: chain of five adders: b (a1) plus constant 1 gives a2, plus 2 gives a3, plus 3 gives a4, plus 4 gives a5, plus 5 gives a6]

Algorithmic (in this case arithmetic) simplification can trim unnecessary complexity from run time. Since 1+2+3+4+5 = 15, the five additions collapse into one:

a = b+15;

[Figure: single adder with inputs b and the constant 15, output a]

The preceding implies a single addition per clock cycle.
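The three forms discussed above (loop, unrolled, simplified) can be compared in a C sketch. The function names are hypothetical, and the model says nothing about timing; it only checks that the three descriptions compute the same value.

```c
/* Loop form: a = b; then add i for i = 1..5. */
int loop_form(int b) {
    int a = b;
    for (int i = 1; i < 6; i++) {
        a = a + i;
    }
    return a;
}

/* Unrolled single-assignment form: five chained additions. */
int unrolled_form(int b) {
    const int a1 = b;
    const int a2 = a1 + 1;
    const int a3 = a2 + 2;
    const int a4 = a3 + 3;
    const int a5 = a4 + 4;
    const int a6 = a5 + 5;
    return a6;
}

/* Arithmetically simplified form: one addition. */
int simplified_form(int b) {
    return b + 15;
}
```

A synthesizer that performs constant folding may arrive at the simplified form on its own, but that depends on the tool.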

Single Assignment Code Example

This example code describes an iterative algorithm that will be implemented with registers

Original Code

int gcd(int a , int b) {
  while ( a != b) {
    if ( a > b )
      a = a-b;
    else
      b=b-a;
  }
  return a;
}
// Schaumont 2010

Single Assignment Code:

int gcd(int a1 , int b1) {
    while (MERGE(_?a1:a2)!=
                  MERGE(_?b1:b2)){
      a3 = MERGE(_?a1:a2);
      b3 = MERGE(_?b1:b2);
      if (a3 > b3)
        a2 = a3-b3;
      else
        b2 = b3-a3;
    }
    return a2;
}
// Schaumont 2010
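A runnable C model of the single-assignment GCD is sketched below. Several details are assumptions that are not in the source: the unspecified MERGE select `_` is modeled as a first-iteration flag, the register that is not updated in a given branch is written explicitly with its held value, and the function returns a3 so that the result is defined even when the loop body never executes. The name gcd_single_assignment is hypothetical. Like the original code, it expects positive inputs.

```c
/* Model of the single-assignment GCD with MERGE expressed as a mux.
   a2 and b2 model registers; first models the MERGE select. */
int gcd_single_assignment(int a1, int b1) {
    int a2 = 0;     /* register for a; initial value never selected */
    int b2 = 0;     /* register for b; initial value never selected */
    int first = 1;  /* 1 only on the first pass: select the inputs  */

    for (;;) {
        const int a3 = first ? a1 : a2;  /* MERGE(_?a1:a2) */
        const int b3 = first ? b1 : b2;  /* MERGE(_?b1:b2) */

        if (a3 == b3) {
            return a3;                   /* loop condition a != b is false */
        }
        if (a3 > b3) {
            a2 = a3 - b3;
            b2 = b3;                     /* assumption: b register holds */
        } else {
            a2 = a3;                     /* assumption: a register holds */
            b2 = b3 - a3;
        }
        first = 0;                       /* later passes use the feedback */
    }
}
```

Each pass through the loop corresponds to one register update, so in hardware the computation would take a data-dependent number of clock cycles.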

Hardware Implementation:

[Figure: GCD hardware implementation, from Schaumont 2010]

Synthesizable Combinatorial Code with attention to Loops

Single Assignment Code

Original Code 1:

y = a+b;
s = y+c;
y = y+s*s;

Single Assignment Code 1:

y1 = a+b;
s = y1+c;
y2 = y1+s*s;

Original Code 2:

y = a+b;
if (y < 0)
  y = 0;

Single Assignment Code 2:

y1 = a+b;
flag = (y1 < 0);
y2 = flag?0:y1;
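The second conversion, where the conditional overwrite becomes a select, can be sketched in C as follows. The function names are hypothetical and used only for comparison.

```c
/* Original form: y is conditionally overwritten. */
int clamp_original(int a, int b) {
    int y = a + b;
    if (y < 0)
        y = 0;
    return y;
}

/* Single-assignment form: the if becomes a 2-to-1 select (a mux in
   hardware) driven by an explicit flag. */
int clamp_single_assignment(int a, int b) {
    const int y1 = a + b;
    const int flag = (y1 < 0);
    const int y2 = flag ? 0 : y1;
    return y2;
}
```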

Externally formed feedback loops

If a system produces its own input, an external loop is formed.

always @(*) begin
  x = y & a;  
  y=  x & b;
end

[Figure: a and y feed x; x and b feed y; y feeds back to the input of x, closing a combinational loop]

Single assignment analysis:

always @(*) begin
  x = y1 & a;  
  y2=  x & b;
end

[Figure: the same graph after renaming: a and y1 feed x; x and b feed y2; the output is fed back as y1]

In context, it becomes obvious that there is no external source for y1 and that its source in the original code is what is now called y2.
Adding an external register within the loop would be acceptable, though that is fundamentally different code.

Combinational loops can be formed among multiple blocks, and this is to be avoided as well.

always @(*) begin: process_x
  x = y & a;  
end

always @(*) begin: process_y
  y=  x & b;
end

[Figure: process_x computes x from a and y; process_y computes y from x and b; together the two blocks form a combinational loop through x and y]

Verilog Synthesis: Feedback (data dependency loops)

It is important to be able to identify data dependency loops. Unresolvable loops cannot be implemented with combinational hardware.

Example 1

always @ (a,b) begin
 y = 1;
 y = y&a;
 y = y&b;
end

Example 2

always @ (a,y) begin
 y = ~(y&a);
end

Example 3

always @ (a,b,yA,yB) begin
 yA = ~(yB&a);
 yB = ~(yA&b);
end

Example 4

always @ (posedge clk) 
begin
 y = ~(y&a);
end

[Figure: y is computed from a and the fed-back y; the feedback path from y to its own input is labeled "cycle"]

A test for data dependency loops

Remember, in pre-synthesis simulation you will see 1'bx as a result, but through synthesis the possibility of 1'bx is instead interpreted as a "don't care". If your code already achieved complete assignment without the initialization to 1'bx, then the initialization should have no effect.

'x in Simulation vs. Synthesis

'x in simulation output means that the value was undetermined in simulation. When a synthesizer analyzes code and determines that the code assigns 'x, it interprets it as a don't care and may optimize according to implementation cost.

A test for data dependency loops in mixed comb. and seq. blocks

Without combinatorial erasure

always @ (posedge clk)
begin

 y <= _y;
 _y = ~(a&b);
end
always @ (posedge clk)
begin

 _y = ~(a&b);
 y <= _y;
end

With combinational erasure

always @ (posedge clk)
begin
 _y = 1'bx;
 y <= _y;
 _y = ~(a&b);
end
always @ (posedge clk)
begin
 _y = 1'bx;
 _y = ~(a&b);
 y <= _y;
end

Synthesized Cycle Iteration Loops

module littleloop(input clk,input a,input b,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;        
        y <= _y;
        _y = ~(a&b);
     end
endmodule

[Figure: little loop. a and b feed _y; the previous-cycle value of _y feeds y]

module littlecorrection(input clk,input a,input b,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;        
        _y = ~(a&b);
        y <= _y;
     end
endmodule

[Figure: little loop, corrected. a and b feed _y; _y feeds y directly]

Module with Enable:

module enableV1(input clk,input en,input d,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;
        if (en) begin
           _y=d;           
        end else begin
           _y=y;           
        end
        y <= _y;
     end
endmodule
module enableV2(input clk,input en,input d,output reg y);
   always @ (posedge clk)
     begin: label
        reg _y;
        if (en) begin
           _y=d;           
        end
                      
        
        y <= _y;
     end
endmodule
module enableV3(input clk,input en,input d,output reg y);
   always @ (posedge clk)
     begin: label

        if (en) begin
           y<=d;           
        end
                      
        

     end
endmodule

[Figure: enable. en, d, and the fed-back y feed _y; _y feeds y]

Combinatorial Synthesis: Loops

We will classify traditionally coded loops in procedural code by how they are expected to be interpreted by synthesizers.

What can and cannot be synthesized depends on the synthesizer

The logic function is not fundamentally excluded from mapping to combinational hardware, as seen in the next example. So, while this code might not be synthesizable by every synthesizer today, others might be able to analyze the code and interpret the combinational hardware function successfully. Also, what a given synthesizer can't do today, it may be able to do tomorrow. Your synthesizer should provide documentation on which constructs are allowed, and it is worth checking yearly.

Synthesis: Feedback (data dependency loops)

Synthesis: keyword disable

Sequential Synthesis: Multi-Cycle Loops

Synthesis: Multi-cycle Operations

Multi-cycle computations

Multi-cycle computations and rarely supported constructs above are shown here for completeness. Later lectures will formally introduce methods for such descriptions. Do not attempt to use these in synthesizable code without further study.

Key Points pertaining to Loops

Generate Construct

A structural “for loop”: For....Generate

For...Generate example: Adder

An adder is a good example since it involves a repetitive structure with interconnection between instantiations (*use for param)

Updated Syntax in SystemVerilog

In SystemVerilog the generate and endgenerate keywords are optional. Under many use cases, the keywords can be obviously inferred by the interpreter, such as when the for loop utilizes a genvar variable, and thus the additional keywords are superfluous.

Hierarchical names of generate blocks

Generate if statements, case items, and for loops can be given explicit hierarchical labels by naming the begin...end block, such as some_hierarchical_label in the example.
Variables and instances defined within can be addressed with the hierarchy <parent>.<some_hierarchical_label>.<children> or, in the case of for loops, <parent>.<some_hierarchical_label>[<generation_index>].<children>.
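At least in SystemVerilog, an unnamed generate block still receives an implicit external name of the form genblk<n>, where n is the position of the generate construct within its enclosing scope. The name is not derived from the parameter being tested.

    module top;
    parameter test_bench_only = 0;
    genvar i;
    // the first generate construct in this scope is implicitly named genblk1
    if (test_bench_only) logic a; // top.genblk1.a
    else logic b; // top.genblk1.b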

Concluding Points

Interesting Synthesis Example of Control Signals and Data

A conditional addition such as

if (temp[0]==1'b1) count=count+1;

is often synthesized with an adder alone, as in

count=count+temp[0];

simplifying the circuitry.

[Figure: literal implementation. A comparator tests temp[0]==1'b1 and drives the select of a mux that chooses between count and count+1'b1; the mux output is the new count]

[Figure: simplified implementation. A single adder sums count and temp[0]; the adder output is the new count]
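The equivalence of the two forms can be checked with a small C model. This is a sketch only; the function names are hypothetical, and bit 0 of temp is extracted with a mask to stand in for temp[0].

```c
/* Literal form: compare bit 0 of temp, then conditionally increment.
   Maps to comparator + adder + mux. */
unsigned count_with_mux(unsigned count, unsigned temp) {
    if ((temp & 1u) == 1u) {
        count = count + 1u;
    }
    return count;
}

/* Simplified form: add the 1-bit control signal directly.
   Maps to a single adder. */
unsigned count_with_adder(unsigned count, unsigned temp) {
    return count + (temp & 1u);
}
```

The control signal is treated as data here: a 1-bit condition is the same thing as the value 0 or 1.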

Example Exercise and Discussion on Static/Dynamic Loop Determination

Does the for loop below represent a dynamic or static loop?

module myModule(a,...
input [4:0] a;

reg [29:0] thermocode;

integer i;


always ... begin
thermocode=0;
for(i=0; i<a;i++) begin
  thermocode[i]=1'b1;
end

end
..
endmodule

Hint1:
Start by trying to unroll the loop iterations

thermocode[0]=1'b1
thermocode[1]=1'b1
thermocode[2]=1'b1
thermocode[3]=1'b1
..where to stop?

Where to stop is not obvious until a is known, which only happens at "run-time"

How to make the loop static?

Proposal 1: set a to a fixed value?

Next Hint: Can you predetermine the possible range of i, which is based on the range of a?

Responses:

Let's try to unroll the loop starting with the first iteration, i=0.

i=0 involves setting thermocode[0]

When is thermocode[0] set to 1?

Ans: when a>0, resulting in thermocode[0]=a>0

thermocode[0]=a>0
thermocode[1]=a>1
thermocode[2]=a>2
thermocode[3]=a>3
thermocode[4]=a>4
thermocode[5]=a>5
thermocode[6]=a>6
thermocode[7]=a>7
thermocode[8]=a>8
thermocode[9]=a>9
thermocode[10]=a>10
...
thermocode[20]=a>20
thermocode[21]=a>21
thermocode[22]=a>22
thermocode[23]=a>23
thermocode[24]=a>24
thermocode[25]=a>25
thermocode[26]=a>26
thermocode[27]=a>27
thermocode[28]=a>28
thermocode[29]=a>29
//thermocode[30]=a>30 xx thermocode is only 30-bit
//thermocode[31]=a>31 xx thermocode is only 30-bit

Note that although a is 5 bits wide and can go up to 31, thermocode has only 30 bits (indices 0 to 29), so the unrolling stops at thermocode[29].

Now, let's write this long set of repetitive code in shorthand as a static loop:

for(i=0; i<30;i++) begin
  thermocode[i]=a>i;
end

So, should the original code be considered "synthesizable" or not?
Answer: depends on the synthesizer. The context in which the loop exists (here the context involves a bounded a) may be of importance, but may or may not be taken into account by the synthesizer.
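The dynamic and static formulations can be compared in a C model over the full 5-bit range of a. This is a sketch under assumptions: thermocode is held in the low 30 bits of a uint32_t, writes to indices 30 and above are dropped to mimic an out-of-range bit-select write, and the function names and THERMO_BITS are hypothetical.

```c
#include <stdint.h>

#define THERMO_BITS 30  /* width of thermocode in the example */

/* Dynamic form: the loop bound depends on the run-time value of a. */
uint32_t thermocode_dynamic(unsigned a) {
    uint32_t t = 0;
    a &= 0x1Fu;  /* a is a 5-bit input: 0..31 */
    for (unsigned i = 0; i < a; i++) {
        if (i < THERMO_BITS) {       /* out-of-range writes are dropped */
            t |= (uint32_t)1 << i;
        }
    }
    return t;
}

/* Static form: fixed trip count of 30; each bit is a comparison a > i. */
uint32_t thermocode_static(unsigned a) {
    uint32_t t = 0;
    a &= 0x1Fu;
    for (unsigned i = 0; i < THERMO_BITS; i++) {
        t |= (uint32_t)(a > i) << i;
    }
    return t;
}
```

In the static form every iteration exists unconditionally and only the data varies, which is the shape a synthesizer can unroll into 30 comparators.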