Prof. Ryan Robucci
Consider the following code:
a=b+1; a=a*3;
This is the same as
a = (b+1)*3;
This only assigning a, and there is a single assignment to it.
It can be implemented in hardware with
or perhaps
There is a technique involving introduction of new variables and renaming to ensure single assignments in a section of code.
a1=b+1;
a2=a1*3;
Application of this technique simplifies identification of the source of each result, making it easier to understand the underlying data-flow structure. Data-dependency graphs can be generated more easily from single-assignment code.
MERGE is a conceptual tool that can be introduced facilitate code analysis.
We’ll first motivate MERGE with an example.
a=b; for(i=1;i<6;i++){ a=a+i; } //(Schaumont 2010)
a1=b; for(i=1;i<6;i++){ a2= a? + i; //what to use? // for first iteration need a1, // thereafter need a2 } //(Schaumont 2010)
Solution is to use introduce the concept of a MERGE**.
A MERGE maps to a mux in hardware, typically creating a loop or feedback.
Still must ensure that no combinatorial loops are formed, adding registers and multiple clock cycles as needed (below a2 may be implemented a register to ensure this).
a1=b; for(i=1;i<6;i++){ a3=MERGE(i==1?a1:a2); //now a1 and a2 will be registers a2=a3+1; } // (Schaumont 2010)
MERGE IS ONLY A CONCEPT
** MERGE is a concept, NOT an ACTUAL VERILOG LANGUAGE CONSTRUCT. Do not attempt to use merge in your code.
Unrolling can be used to create single assignment code:
a1=b; a2=a1+1; a3=a2+2; a4=a3+3; a5=a4+4; a6=a5+5;
This might be implemented in one clock cycle with HW and 5 adders as follows.
Algorithmic, in this case arithmetic, simplification can trim unnecessary complexity from run time
a = b+15;
The preceding implies a single addition per clock cycle.
This example code describes an iterative algorithm that will be implemented with registers
Original Code
int gcd(int a , int b) { while ( a != b) { if ( a > b ) a = a-b; else b=b-a; } return a; } // Shaumont 2010
Single Assignment Code:
int gcd(int a1 , int b1) { while (MERGE(_?a1:a2)!= MERGE(_?b1:b2){ a3 = MERGE(_?a1:a2); b3 = MERGE(_?b1:b2); if (a3 > b3) a2 = a3-b3; else b2 = b3-a3; } return a2; } // Schaumont 2010
Hardware Implementation:
Think of each statement as a node on a graph with the edges denoting dependencies. Nodes can be producers and consumers of values. A graph with loops cannot be directly resolved as a combinatorial circuit.
The inputs not generated from within the code are also nodes (they represent an assignment from elsewhere).
always @ (a,b,c) begin y1 = a+b; y2 = y1+c; y3 = y1+y2*y2; end
Branches can be thought of as multiplexers that depend on the evaluation of conditional expression. A new flag variable based on the condition evaluation may be introduced to make this clear.
y1 = a+b; if (y1 < 0) y2 = 0; else y2 = y1;
always @ (a,b) begin y1 = a+b; flag = (y1 < 0); y2 = flag?0:y1; end
To achieve the status of single assignment code, every variable may only be assigned once
We may need to convert code to an equivalent single-assignment code to understand its underlying structure. To do this introduce additional variables when variables are assignment more than once
Original Code 1:
y = a+b; s = y+c; y = y+s*s;
Single Assignment Code 1:
y1 = a+b; s = y1+c; y2 = y1+s*s;
Original Code 2:
y = a+b; if (y < 0) y = 0;
Single Assignment Code 2:
y1 = a+b; flag = (y1 < 0); y2 = flag?0:y1;
If a system produces its own input, an external loop is formed.
always @(*) begin x = y & a; y= x & b; end
Single assignment analysis:
always @(*) begin x = y1 & a; y2= x & b; end
In the context of the use, it would become obvious that there is no external source for y1 and that its source in the orginal code is what is now called y2.
Providing an external register to exist in the loop would be acceptable, though it is fundamentally different code.
Combinational loops can be formed among multiple blocks, and this is to be avoided as well.
always @(*) begin: process_x x = y & a; end always @(*) begin: process_y y= x & b; end
It is important to be able to identify data dependency loops. Unresolvable loops cannot be implemented with combinational hardware.
always @ (a,b) begin y = 1; y = y&a; y = y&b; end
always @ (a,y) begin y = ~(y&a); end
always @ (a,b,yA,yB) begin yA = ~(yB&a); yB = ~(yA&b); end
always @ (posedge clk) begin y = ~(y&a); end
y
is inserted by a synthesizer, but alongside other blocks, the danger is non-deterministic pre-synthesis simulation. This code should use non-blocking assignment.
A conceptual test to understand if procedural code is implementing strictly combinatorial logic is to set every combinatorial result assigned to x at the entry point in the code, representing an erasure of any previous result lingering in the simulation
If the behavior would change in even one case in even one way, then the code was not strictly implementing combinatorial logic
always @ (a,y) begin y=1’bx; //** y = ~(y&a); end
always @(a,b,yA,yB) begin yA=1’bx; yB=1’bx; //** yA = ~(yB&a); yB = ~(yA&b); end
always @(a,b) begin y=1’bx; //** y = 0; y = y&a; y = y&b; end
always @(a,b,yA,yB) begin yA=1’bx; yB=1’bx; //** yA = ~(yB&a); yB = ~(yA&b); end
always @(a,b) begin p=1’bx; y=1’bx; //** if (a) p=1; y = ~(p&b); end
always@(a,b,c,s,y) begin p=1’bx; y=1’bx; //** if (s) begin p = (a&b); y = ~(yA|c); end else begin y = a|b; end end
Remember, in pre-synthesis simulation you will see 1'bx
as a result, but through synthesis the possibility of 1'bx
is instead interpreted as a "don't care". If your code already achieved complete assignment without the initialization to 1b'x
, then the initialization should have no affect.
'x in Simulation vs. Synthesis
'x in a resulting in simulation output means that in simulation the value was undetermined. When a synthesizer analyzes code and determines that the code assigns 'x, it interprets it as a don't care and may optimizes according to implementation cost.
Without combinatorial erasure
always @ (posedge clk) begin y <= _y; _y = ~(a&b); end
always @ (posedge clk) begin _y = ~(a&b); y <= _y; end
With combinational erasure
always @ (posedge clk) begin _y = 1’bx; y <= _y; _y = ~(a&b); end
always @ (posedge clk) begin _y = 1’bx; _y = ~(a&b); y <= _y; end
module littleloop(input clk,input a,input b,output reg y); always @ (posedge clk) begin: label reg _y; y <= _y; _y = ~(a&b); end endmodule
module littlecorrection(input clk,input a,input b,output reg y); always @ (posedge clk) begin: label reg _y; _y = ~(a&b); y <= _y; end endmodule
Module with Enable:
module enableV1(input clk,input en,input d,output reg y); always @ (posedge clk) begin: label reg _y; if (en) begin _y=d; end else begin _y=y; end y <= _y; end endmodule
module enableV2(input clk,input en,input d,output reg y); always @ (posedge clk) begin: label reg _y; if (en) begin _y=d; end y <= _y; end endmodule
module enableV3(input clk,input en,input d,output reg y); always @ (posedge clk) begin: label if (en) begin y<=d; end end endmodule
We will classify traditionally coded loops in procedural code by how they are expected to be interpreted by synthesizers.
Static Loops: Number if iterations defined at compile time.
Often synthesizers cannot convert non-static loops to combinatorial circuit. Dynamic-loops are those for which determination of the number of loop iterations required is determined at run-time, usually because the number of loops is determined by a run-time variable.
In the example below, the condition that is checked before every iteration is dependent on assignments within the body of the loop.
Furthermore, the multiple data movements are problematic
While-like Loop:
temp=datain; count =0; for (index=0;|temp;index=index+1) begin if temp[0]==1 (count=count +1) begin temp = temp>>1; end end
The previous code can be rewritten to have well-defined static loop count and no implied data movement. The constant 8
sets the statically defined loop count.
count =0; for(index=0;index<8;index=index+1) begin if temp[index]==1 (count=count +1); end
What can and cannot be synthesized depends on the synthesizer
The logic function is not fundamentally excluded from mapping to cominational hardware, as seen in the next example. So, while this code might not be synthesizable by every synthesizer today, others might be able to analyze the code and interpret the combinational hardware function successfully. Also what a given synthesizer can't do today, it may be able to do tomorrow. Your synthesizer should provide documentation on what constructs are allowed and it is worth checking yearly.
Registered output logic (mix comb and seq.) should be separated to understand the dependencies. New variables may be introduced to denote the difference in signals before and after a register.
module counter(input clk, output reg [3:0] count); parameter CNT_MAX=11; always @ (posedge clk) begin if (count == CNT_MAX) count <= 0; else count <= count + 1; end endmodule
Feedback is perhaps unclear here. See rewrite below.
module counter(input clk, output reg [3:0] count); parameter CNT_MAX=11; reg [3:0] count_prereg; always @ (posedge clk) begin count <= count_prereg; end always @ * begin: comb1 reg flag; flag = (count == CNT_MAX); count_prereg = flag?0:(count+1); //could also have written if else using flag condition end endmodule
disable
may be used to implement a “break” from a loop. Consider this not yet covered and avoid for now if without further study.Note that you may be able describe a sequential circuit with non-static loops, but it is common to be NOT SUPPORTED by synthesizers. The following suggests a multi-cycle (multiple clock periods) operation.
"while" loop with iteration synced to a clock:
count=0; for(index=0;|temp;index=index+1) begin @(posedge clk); if temp[0]==1 (count=count +1) temp>>1; end
This is a forward note, we'll learn about this more later in the course.
It is typical to employ multi-cycle operations to reduce hardware through resource sharing (reuse of hardware in difference clock cycles) and reduce the critical path lengths.
In the land of simulation one can model such behavior as follows, though it is usually not supported for synthesis
always @ (posedge clk) begin temp=a*b; @(posege clk) y=temp*c; end
We’ll want to understand how to implement multi-cycle operations using state machines to create variations based on the number of resources (e.g. multipliers, registers) affordable or desired, as factors like clock-cycle latency and throughput, power, clock freq., etc...
Multi-cycle computations
Multi-cycle computations and rarely supported constructs above are shown here for completeness. Later lectures will formally introduce methods for such descriptions. Do not attempt to use these in synthesizable code without further study.
genvar index; generate for (index=0; index < 8; index=index+1) begin: gen_code_label BUFR BUFR_inst ( .O(clk_o[index]), // Clock buffer output .CE(ce), // Clock enable input .CLR(clear), // Clock buffer reset input .I(clk_i[index]) // Clock buffer input ); end endgenerate
An adder is a good example since it involves a repetitive structure with interconnection between instantiations (*use for param)
Study the details of the carefully crafted index in the carry out of each instance
assign cin[0]=1'b0; genvar index; generate for (index=0; index < 8; index=index+1) begin: some_hierarchical_label adder adder_inst ( .cin(cin[index]), .a(a[index]), .b(b[index]), .cout(cin[index+1]), .y(u[index]) ); end endgenerate
Updated Syntax in SystemVerilog
In SystemVerilog the generate
and endgenerate
keywords are optional. Under many use cases, the keywords can be obviously inferred by the interpreter, such as when the for loop utilizes a genvar
variable, and thus the additional keywords are superfluous.
Hierarchial names of generate blocks
if statements, case-items, and for-loops,can be provided explicit hierarchical labels by providing a named the begin...end
block , such as some_hierarchical_label
in the example.
Variables and instances defined within can be addresses with the hierarchy <parent>.<some_hierarchical_label>.<children>
or in the case of for loops <parent>.some_some_hierarchical_label[<generation_index>].<children>
At least in SystemVerilog, names for conditional statements are implied for unnamed if blocks if involving simple named parameter checking
module top; parameter test_bench_only = 0; genvar i; if (test_bench_only) logic a; // top.test_bench_only.a else logic b; // top.test_bench_only.b
A conditional addition such as
if temp[0]==1'b1 (count=count +1);
are often synthesized with an adder, as in
count=count+temp[0];
simplifying the circuity.
Does the for loop below represent a dynamic or static loop?
module myModule(a,... input [4:0] a; reg [29:0] thermocode; integer i; always ... begin thermocode=0; for(i=0; i<a;i++) begin thermocode[i]=1'b1; end end .. endmodule
Hint1:
Start by trying to unroll the loop iterations
thermocode[0]=1'b1 thermocode[1]=1'b1 thermocode[2]=1'b1 thermocode[3]=1'b1 ..where to stop?
Where to stop is not obvious until a
is known, which only happens at "run-time"
How to make the loop static?
Proposal 1: set a
to a fixed value?
Next Hint: Can you predetermine the possible range of i
, which is based on the range of a
?
Responses:
Lets try to unroll the loop starting with the first iteration i=0.
i=0 involves setting thermocode[0]
When is thermocode[0] set to 1?
Ans: when a>0, resulting in thermocode[0]=a>0
thermocode[0]=a>0 thermocode[1]=a>1 thermocode[2]=a>2 thermocode[3]=a>3 thermocode[4]=a>4 thermocode[5]=a>5 thermocode[6]=a>6 thermocode[7]=a>7 thermocode[8]=a>8 thermocode[9]=a>9 thermocode[10]=a>10 ... thermocode[20]=a>20 thermocode[21]=a>21 thermocode[22]=a>22 thermocode[23]=a>23 thermocode[24]=a>24 thermocode[25]=a>25 thermocode[26]=a>26 thermocode[27]=a>27 thermocode[28]=a>28 thermocode[29]=a>29 //thermocode[30]=a>30 xx thermocode is only 30-bit //thermocode[31]=a>31 xx thermocode is only 30-bit
We can realize that though a
can go up to 29, the thermocode only includes 30 bits
Now, lets shorthand this long set of repetitive code as a static loop:
for(i=0; i<30;i++) begin thermocode[i]=a>i; end
So, should the original code be considered "synthesizable" or not?
Answer: depends on the synthesizer. The context in which the loop exists (here the context involves a bounded a
) may be of importance, but may or may not be taken into account by the synthesizer.