Lecture 11 – Synthesis and Loops
Ryan Robucci
1. Lecture 11 – Synthesis and Loops
Prof. Ryan Robucci
2. References
- (Schaumont 2010) Schaumont, Patrick
A Practical Introduction to Hardware/Software Codesign, Springer ISBN 978-1-4614-3737-6 https://www.springer.com/gp/book/9781441960009
3. Single Assignment Code
Consider the following code:
a=b+1; a=a*3;
This is the same as
a = (b+1)*3;
This only assigning a, and there is a single assignment to it.
It can be implemented in hardware with
or perhaps
4. Conversion to Single Assignment
There is a technique involving introduction of new variables and renaming to ensure single assignments in a section of code.
a1=b+1; a2=a1*3;
Application of this technique simplifies identification of the source of each result, making it easier to understand the underlying data-flow structure. Data-dependency graphs can be generated more easily from single-assignment code.
5. MERGE
- MERGE is a conceptual tool that can be introduced facilitate code analysis.
We’ll first motivate MERGE with an example.
a=b; for(i=1;i<6;i++){ a=a+i; } //(Schaumont 2010)
a1=b; for(i=1;i<6;i++){ a2= a? + i; //what to use? // for first iteration need a1, // thereafter need a2 } //(Schaumont 2010)
Solution is to use introduce the concept of a MERGE. A MERGE maps to a mux in hardware (for dynamic decisions, while for static it is optimized away), typically supporting a loop construct or feedback.
Still must ensure that no combinatorial loops are formed, adding registers and multiple clock cycles as needed (below a2 may be implemented a register to ensure this).
a1=b; for(i=1;i<6;i++){ a3=MERGE(i==1?a1:a2); //now a1 and a2 will be registers a2=a3+1; } // (Schaumont 2010)
MERGE is a concept, NOT an ACTUAL VERILOG LANGUAGE CONSTRUCT. Do not attempt to use merge in your code.
6. Unrolling and Simplification
Unrolling can be used to create single assignment code:
a1=b; a2=a1+1; a3=a2+2; a4=a3+3; a5=a4+4; a6=a5+5;
This might be implemented in one clock cycle with HW and 5 adders as follows.
Algorithmic, in this case arithmetic, simplification can trim unnecessary complexity from run time
a = b+15;
The preceding implies a single addition per clock cycle.
7. Single Assignment Code Example
This example code describes an iterative algorithm that will be implemented with registers
Original Code
int gcd(int a , int b) { while ( a != b) { if ( a > b ) a = a-b; else b=b-a; } return a; } // Shaumont 2010
Single Assignment Code:
int gcd(int a1 , int b1) { while (MERGE(_?a1:a2)!= MERGE(_?b1:b2){ a3 = MERGE(_?a1:a2); b3 = MERGE(_?b1:b2); if (a3 > b3) a2 = a3-b3; else b2 = b3-a3; } return a2; } // Schaumont 2010
Hardware Implementation:
<div class="borrowed" style="width:200px">
<div class="bottom-right">†Schaumont</div> </div>
- Single Assignment Code allows examination of data dependencies and hardware resources such as what can be done in a single clock cycle (combinatorial) and where a register is required.
- These concepts are also important when writing behavioral HDL code in Verilog or VHDL.
8. Synthesizeable Combinatorial Code with attention to Loops
- Today we'll discuss a few constructs which involve a concern of Control Loops and Data (dependency) Loops in procedural HDL
- We'll first concern ourselves with combinatorial behavior and then allow additional flexibility with sequential
- Should not attempt to synthesize any code that implements unresolvable dependency loops as combinatorial HW (executes in one clock cycle)
9. Single Assignment Code
- Think of each statement as a node on a graph with the edges denoting dependencies. Nodes can be producers and consumers of values. A graph with loops cannot be directly resolved as a combinatorial circuit.
The inputs not generated from within the code are also nodes (they represent an assignment from elsewhere).
always_comb begin //always @ (a,b,c) y1 = a+b; y2 = y1+c; y3 = y1+y2*y2; end
- Note how in this code segment alone the output is not well-defined. It is not clear of y1,y2 are used elsewhere or if they are just working variables for partial results.
Branches can be thought of as multiplexers that depend on the evaluation of conditional expression. A new flag variable based on the condition evaluation may be introduced to make this clear.
y1 = a+b; if (y1 < 0) y2 = 0; else y2 = y1;
always_comb begin //always @ (a,b) y1 = a+b; flag = (y1 < 0); y2 = flag?0:y1; end
- To achieve the status of single assignment code, every variable may only be assigned once
- We may need to convert code to an equivalent single-assignment code to understand its underlying structure. To do this introduce additional variables when variables are assignment more than once
Original Code 1:
y = a+b; s = y+c; y = y+s*s;
Single Assignment Code 1:
y1 = a+b; s = y1+c; y2 = y1+s*s;
Original Code 2:
y = a+b; if (y < 0) y = 0;
Single Assignment Code 2:
y1 = a+b; flag = (y1 < 0); y2 = flag?0:y1;
10. Externally formed feedback loops
If a system produces its own input, an external loop is formed.
<div class=grid> <div class=left>
always_comb begin u = u & a; v= v & b; end
- Some synthesis tools might produce an error and halt synthesis, but the SystemVerilog standard only states that
- Software tools should perform additional checks to warn if the behavior within an alwayscomb procedure does not represent combinational logic, such as if latched behavior can be inferred.
(At the time of this note, Vivado exhibited no difference in behavior for always_comb vs. always @(*) in this code)
An advanced code linter can find the issue:
11. Combinational Loop detection with Single Assignment Code Analysis
- In this exercise,
- let the inputs to the block be indexed with 0
- start assignments with index 1
always @(*) begin //a,b, and use-before-assignment y u = v & a; v = u & b; end
becomes
always @(*) begin //a0,b0, and use-before-assignment v0 u1 = v0 & a0; v1 = u1 & b0; end
- v0 is created nowhere, indicating use-before-assignment of v
- In the context of the use, it would become obvious that there is no external source for y0 and that its source in the original code is what is now called y1.
- Providing an external register to exist in the loop would be acceptable, though that would be fundamentally different code.
- In the context of the use, it would become obvious that there is no external source for y0 and that its source in the original code is what is now called y1.
</div> </div>
Combinational loops can be formed among multiple blocks, and this is to be avoided as well.
always @(*) begin: process_u u = y & a; end always @(*) begin: process_y y= u & b; end
12. Verilog Synthesis: Feedback (data dependency loops)
It is important to be able to identify data dependency loops. Unresolvable loops cannot be implemented with combinational hardware.
Example 1
always @ (a,b) begin y = 1; y = y&a; y = y&b; end
- No feedback after substitutions
Example 2
always @ (a,y) begin y = ~(y&a); end
- Unresolveable Feedback
Example 3
always @ (a,b,yA,yB) begin yA = ~(yB&a); yB = ~(yA&b); end
- Feedback
Example 4
always @ (posedge clk) begin y = ~(y&a); end
- This clearly does not attempt to describe combinatorial hardware it uses an edge-selective trigger to describe sequential hardware. A register `y` is inserted by a synthesizer, but alongside other blocks, the danger is non-deterministic pre-synthesis simulation. This code should use non-blocking assignment.
13. A test for data dependency loops
- A conceptual test to understand if procedural code is implementing strictly combinatorial logic is to set every combinatorial result assigned to x at the entry point in the code, representing an erasure of any previous result lingering in the simulation
If the behavior would change in even one case in even one way, then the code was not strictly implementing combinatorial logic
always @ (a,y) begin y=1’bx; //** y = ~(y&a); end
always @(a,b,yA,yB) begin yA=1’bx; yB=1’bx; //** yA = ~(yB&a); yB = ~(yA&b); end
always @(a,b) begin y=1’bx; //** y = 0; y = y&a; y = y&b; end
always @(a,b,yA,yB) begin yA=1’bx; yB=1’bx; //** yA = ~(yB&a); yB = ~(yA&b); end
always @(a,b) begin p=1’bx; y=1’bx; //** if (a) p=1; y = ~(p&b); end
always@(a,b,c,s,y) begin p=1’bx; y=1’bx; //** if (s) begin p = (a&b); y = ~(yA|c); end else begin y = a|b; end end
- Remember, in pre-synthesis simulation you will see 1'bx as a result, but through synthesis the possibility of 1'bx is instead interpreted as a "don't care".
- If your code already achieved complete assignment without the initialization to 1'bx, then the combinatorial erasure should have no affect.
- In Simulation, 'x indicates either undetermined values or explicit assignment to 'x
- When a synthesizer analyzes code and determines that the code explicitly assigns 'x, it interprets it as a don't care and may optimizes according to an implementation cost function.
14. A test for data dependency loops in mixed comb. and seq. blocks
- The combinational erasure test works for internal combinatorial values computed in edge-selective-trigger blocks.
- Setting combinatorial values to
'xat code entry may help identify issues in pre-synthesis simulation but it should not be kept in released code where a don't care for synthesis is not intended. - In the following code, can you describe a scenario in which the erasure has an effect?
Examples Without combinatorial erasure
always @ (posedge clk) begin y <= _y; //** _y = ~(a&b); //** end
always @ (posedge clk) begin _y = ~(a&b); //** y <= _y; //** end
Examples With combinational erasure
always @ (posedge clk) begin _y = 1’bx; y <= _y; _y = ~(a&b); end
always @ (posedge clk) begin _y = 1’bx; _y = ~(a&b); y <= _y; end
15. Synthesized Cycle Iteration Loops
module littleloop(input clk,input a,input b,output reg y); always @ (posedge clk) begin: label reg _y; y <= _y; _y = ~(a&b); end endmodule
module littlecorrection(input clk,input a,input b,output reg y); always @ (posedge clk) begin: label reg _y; _y = ~(a&b); y <= _y; end endmodule
Module with Enable:
- The following modules produce the same result, with feedback from the register. Some give an explicit name to the net attached to the register data input .
module enableV1(input clk,input en,input d,output reg y); always @ (posedge clk) begin: label reg _y; if (en) begin _y=d; end else begin _y=y; end y <= _y; end endmodule
module enableV2(input clk,input en,input d,output reg y); always @ (posedge clk) begin: label reg _y; if (en) begin _y=d; end y <= _y; end endmodule
module enableV3(input clk,input en,input d,output reg y); always @ (posedge clk) begin: label if (en) begin y<=d; end end endmodule
16. Combinatorial Synthesis: Loops
We will classify traditionally coded loops in procedural code by how they are expected to be interpreted by synthesizers.
- Static Loops: Number if iterations defined at compile time.
- Could directly perform finite loop unrolling
- Often synthesizers cannot convert non-static loops to combinatorial circuit.
- Dynamic-loops are those for which determination of the number of loop iterations required is determined at run-time, usually because the number of loops is determined by a run-time variable.
- In the example below, the condition that is checked before every iteration is dependent on assignments within the body of the loop.
Furthermore, the multiple data movements are problematic
While-like Loop:
temp=datain; count =0; for (index=0;|temp;index=index+1) begin if temp[0]==1 (count=count +1) begin temp = temp>>1; end end
The previous code can be rewritten to have well-defined static loop count and no implied data movement. The constant `8` sets the statically defined loop count.
count =0; for(index=0;index<8;index=index+1) begin if temp[index]==1 (count=count +1); end
The logic function is not fundamentally excluded from mapping to cominational hardware, as seen in the next example. So, while this code might not be synthesizable by every synthesizer today, others might be able to analyze the code and interpret the combinational hardware function successfully. Also what a given synthesizer can't do today, it may be able to do tomorrow. Your synthesizer should provide documentation on what constructs are allowed and it is worth checking yearly.
17. Synthesis: Feedback (data dependency loops)
Registered output logic (mix comb and seq.) should be separated to understand the dependencies. New variables may be introduced to denote the difference in signals before and after a register.
module counter(input clk, output reg [3:0] count); parameter CNT_MAX=11; always @ (posedge clk) begin if (count == CNT_MAX) count <= 0; else count <= count + 1; end endmodule
Feedback is perhaps unclear here. See rewrite below.
module counter(input clk, output reg [3:0] count); parameter CNT_MAX=11; reg [3:0] count_prereg; always @ (posedge clk) begin count <= count_prereg; end always @ * begin: comb1 reg flag; flag = (count == CNT_MAX); count_prereg = flag?0:(count+1); //could also have written if else using flag condition end endmodule
- Feedback across clock cycles is acceptable.
- No feedback in combinational partition.
18. Synthesis: keyword disable
- The keyword
disablemay be used to implement a “break” from a loop. Consider this not yet covered and avoid for now if without further study.
19. Sequential Synthesis: Multi-Cycle Loops
Note that you may be able describe a sequential circuit with non-static loops, but it is common to be NOT SUPPORTED by synthesizers. The following suggests a multi-cycle (multiple clock periods) operation.
"while" loop with iteration synced to a clock:
count=0; for(index=0;|temp;index=index+1) begin @(posedge clk); if temp[0]==1 (count=count +1) temp>>1; end
20. Synthesis: Multi-cycle Operations
- This is a forward note, we'll learn about this more later in the course.
It is typical to employ multi-cycle operations to reduce hardware through resource sharing (reuse of hardware in difference clock cycles) and reduce the critical path lengths.
In the land of simulation one can model such behavior as follows, though it is usually not supported for synthesis
always @ (posedge clk) begin temp=a*b; @(posege clk) y=temp*c; end
- We’ll want to understand how to implement multi-cycle operations using state machines to create variations based on the number of resources (e.g. multipliers, registers) affordable or desired, as factors like clock-cycle latency and throughput, power, clock freq., etc…
Multi-cycle computations and rarely supported constructs above are shown here for completeness. Later lectures will formally introduce methods for such descriptions. Do not attempt to use these in synthesizable code without further study.
21. Generate Construct
A structural “for loop”: For….Generate
- Uses a special indexing variable defined using genvar. Within a generate block indicated using keyword generate
Use for repetitive structural instantiations
genvar index; generate for (index=0; index < 8; index=index+1) begin: gen_code_label BUFR BUFR_inst ( .O(clk_o[index]), // Clock buffer output .CE(ce), // Clock enable input .CLR(clear), // Clock buffer reset input .I(clk_i[index]) // Clock buffer input ); end endgenerate
For…Generate example: Adder
An adder is a good example since it involves a repetitive structure with interconnection between instantiations (*use for param)
Study the details of the carefully crafted index in the carry out of each instance
assign cin[0]=1'b0; genvar index; generate for (index=0; index < 8; index=index+1) begin: some_hierarchical_label adder adder_inst ( .cin(cin[index]), .a(a[index]), .b(b[index]), .cout(cin[index+1]), .y(u[index]) ); end endgenerate
In SystemVerilog the generate and endgenerate keywords are optional.
Under many use cases, the keywords can be easily inferred by the interpreter, such as when the for loop utilizes a genvar variable, and thus the additional keywords are superfluous.
if statements, case-items, and for-loops,can be provided explicit hierarchical labels by providing a named the begin...end block , such as some_hierarchical_label in the example.
Variables and instances defined within can be addresses with the hierarchy <parent>.<some_hierarchical_label>.<children> or in the case of for loops <parent>.some_some_hierarchical_label[<generation_index>].<children>
At least in SystemVerilog, names for conditional statements are implied for unnamed if blocks if involving simple named parameter checking
module top; parameter test_bench_only = 0; genvar i; if (test_bench_only) logic a; // top.test_bench_only.a else logic b; // top.test_bench_only.b
22. Concluding Points
- Combinatorial dependency loops cannot be synthesized
- Static for loops can be synthesized by being unrolled
- Dynamic for Loops may not be synthesizable
- Dynamic for Loops with timing control may be synthesized as a “multicycle operation” or a state machine.
- Repetitive/Patterned Structural Instantiations may be done with for…generate loops
- We'll want to formalize multi-cycle operations as state machines.
23. Interesting Synthesis Example of Control Signals and Data
A conditional addition such as
if temp[0]==1'b1 (count=count +1);
are often synthesized with an adder, as in
count=count+temp[0];
simplifying the circuity.
24. Example Exercise and Discussion on Static/Dynamic Loop Determination
Does the for loop below represent a dynamic or static loop?
module myModule(a,... input [4:0] a; reg [29:0] thermocode; integer i; always ... begin thermocode=0; for(i=0; i<a;i++) begin thermocode[i]=1'b1; end end .. endmodule
Hint1: Start by trying to unroll the loop iterations
thermocode[0]=1'b1 thermocode[1]=1'b1 thermocode[2]=1'b1 thermocode[3]=1'b1 ..where to stop?
Where to stop is not obvious until a is known, which only happens at "run-time"
How to make the loop static?
Proposal 1: set a to a fixed value?
- this would make the loop static but since a is an input from another module we want to keep the input
Next Hint: Can you predetermine the possible range of i, which is based on the range of a?
Responses:
- 230
- 0 to 31
- [0,a)
- 25
Lets try to unroll the loop starting with the first iteration i=0.
i=0 involves setting thermocode[0]
When is thermocode[0] set to 1?
Ans: when a>0, resulting in thermocode[0]=a>0
thermocode[0]=a>0 thermocode[1]=a>1 thermocode[2]=a>2 thermocode[3]=a>3 thermocode[4]=a>4 thermocode[5]=a>5 thermocode[6]=a>6 thermocode[7]=a>7 thermocode[8]=a>8 thermocode[9]=a>9 thermocode[10]=a>10 ... thermocode[20]=a>20 thermocode[21]=a>21 thermocode[22]=a>22 thermocode[23]=a>23 thermocode[24]=a>24 thermocode[25]=a>25 thermocode[26]=a>26 thermocode[27]=a>27 thermocode[28]=a>28 thermocode[29]=a>29 //thermocode[30]=a>30 xx thermocode is only 30-bit //thermocode[31]=a>31 xx thermocode is only 30-bit
We can realize that though a can go up to 29, the thermocode only includes 30 bits
Now, lets shorthand this long set of repetitive code as a static loop:
for(i=0; i<30;i++) begin thermocode[i]=a>i; end
So, should the original code be considered "synthesizable" or not?
Answer: depends on the synthesizer. The context in which the loop exists (here the context involves a bounded a) may be of importance, but may or may not be taken into account by the synthesizer.