FSM II
Ryan Robucci
1. Finite State Machine (FSM)
- Characterized by
- A set of states
- A set of inputs and outputs
- A state transition function
- An output function
- Hardware Implementation
- Current State held in a register
- Any additional status information held in status register
- Next-State and State Control Logic Determines next state and control signals based on registers
- Datapath implements operations on data under control of sate control logic
2. Finite State Machine with Datapath (FSMD)
A very common framework being described and implemented is a Finite State Machine with a Datapath: a data path controlled by signals from a FSM
3. Software and Hardware \(^\circ\)
- customized hardware is fast
- control logic is difficult to describe
- consider:
- software for control and
- hardware for calculations.
- general purpose processor can be used for software
4. Divide and Conquer
- Datapath operations can be encoded within the same procedural code as the state machine description or can be built separately.
- First consider partitioning.
- Datapath: Identify elements in the datapath from experience with
traditional digital systems (e.g. communications modules, arithmetic
modules, multiplexer's and demultiplexers, registers and multi-word
buffers, FIFOs, IP Cores etc…).
- Control: Identify the control signals required and the status/condition information required to make decisions on the control.
- Datapath: Identify elements in the datapath from experience with
traditional digital systems (e.g. communications modules, arithmetic
modules, multiplexer's and demultiplexers, registers and multi-word
buffers, FIFOs, IP Cores etc…).
- Datapath operations can be encoded in the state machine description or
can be build separately.
- If pre-built modules are used, as is common, the datapath is necessarily a separate description.
- Often the datapath represents the algorithm calculations, separating the datapath code makes the code more readable
- Coding datapath separately can allow more explicit influence over and insight into the resources used for computation while separating details of control code
- Can sometimes think of datapath as implementing instructions for the controller to invoke/call
- Separating the datapath code can allow better reuse under a different scheduling of operations (implementing a different algorithm)
- In general, designing the datapath first and then the control is a good strategy
Controller (FSM)
- Irregular Structure
- Very well-suited for FSM-style description, difficult to describe with (boolean) expression-based
- Registers represent the familiar sense of the state of the system
Datapath
- Regular Structure
- Often easy to describe with expressions or structurally build with blocks
- Registers represent algorithmic states (intermediate or partial results)
5. State Machine Coding: Transition Coverage
- the center of focus of a state machine design is arguably the states
- the focus of state machine coding is arguably the state transitions
- code for transitions can be organized/grouped by
- source/current state
- destination/next state
- code for transitions can be organized/grouped by
- for testing:
- transition coverage is central in statemachine testing
- path coverage with reference to the primary state register is also important when extended state registers or interactions with external components with states are present
6. State Machine with Numerous Branching and Merging
- example find borg|car|cat|bat|bot|bet|bit and output \(\rm done\) flag where \(\rm done\) is a registered output
- \(\rm done\) flag logic cannot be coded directly based on CS in each final state case
- registered outputs generally coded once per transition (line of code per edge), though the use of leading defaults may save some lines of code
... always @ (posedge clk) begin //defaults CS<=failed; found<=0; done<=0; case (CS) ... recieved_a: if (input =='r') begin CS<=recieved_r_for_car; done<=1; found<=1; end else if (input =='t') begin CS<=recieved_t; done<=1; found<=1; end else begin //CS<=failed; //covered leading default done<=1; found<=0; end ...
- Alternative "3-block style" with output logic coded with use of next state NS
begin CS<=failed; found<=0; done<=0; //defaults case (NS) recieved_t: begin done<=1;found<=1; end recieved_r_for_car:begin done<=1;found<=1; end recieved_g: begin done<=1;found<=1; end failed: begin done<=1;found<=0; end default: begin done<=0; end ...
7. Algorithmic Statemachines
case(CS) S0: begin dx <= x1-x0; busy<= 1; end S1: begin dx <= y1-y0; end S2: begin dxSq <= dx*dx; end S3: begin dySq <= dy*dy; end S4: begin dsq <= dxSq+dySq; busy<= 0; end endcase
- Encoding datapath operations WITHIN in the statemachine description, along with the controller rather than in a separate code block, is common
- It is common to see "algorithmic" statemachines described with control and computation embedded in the same procedural block. These can be modules which perform complex computations over multiple cycles and require internal working registers/memory
- We'll first focus on control state machines first, with an emphasis on timing and external status and control signals then discuss computational statemachines
8. Data-Flow Graph and Datapath Hardware
- Data-Flow Graphs represents dependencies among operations in the process of an arithmetic algorithm (more compact than a full state diagram).
- A algorithmic state machine performs one or more operations in a state (i.e. clock cycle) while satisfying the required order dependencies from the graph
- Exercise: Draw the Datapath and Identify Status and Control as well as Pre-Register data names
case(CS) S0: begin dx <= x1-x0; busy<= 1; end S1: begin dx <= y1-y0; end S2: begin dxSq <= dx*dx; end S3: begin dySq <= dy*dy; end S4: begin dsq <= dxSq+dySq; busy<= 0; end endcase
(in-class drawing)
9. Extended State Registers and Code Organization
- At times we'll want to include an extra register to store information
that we don't want to code as part of our primary state register.
- Examples:
- partial results in the process of a multi-cycle computation
- saved results to provide at the output ports at a later time
- status flags for events that should be remembered and used in later processing and state decisions
- event counters
- timing counters, to control timing without creating additional states
- Examples:
- These extended state registers are not necessarily represented by the number of drawn states state transition diagram or the primary coded state register, but formally they ARE part of the state of the system
- Note the explicit state register and the additional extended state
registers
- Note that the output and status can be based one or more of the input, NS, and other state registers.
10. Code Organization (transition coverage)
- Logic based on only the current state groups updates based on the current/source state, e.g. describes all output independently of transition away from the current state
case (CS) S0: begin flag=0; end //unconditional, combinational output in S0 S1: begin flag=1; end
- Including the input signals makes the output/update input-dependent
case (CS) S0: begin if (in==0) flag=0; //conditional, combinational output while in S0, // updates immediately with input change else flag=1; //conditional, combinational output while in S0, // updates immediately with input change end S1: begin flag=1; end
- Including NS as a dependency selects edges based on the destination state. Note that NS itself may depend on the input. This is useful if complex parts of the output logic have already been capture in the NS logic description.
case (CS) S0: begin if (NS==S3) flag=0; //conditional, combinational output while in S0, // updates immediately with input change through immediate NS change else flag=1; //conditional, combinational output while in S0, // updates immediately with input change through immediate NS change end S1: begin flag=1; end
- Using a root case statement based on NS tends to organize updates based on the destination state (transitions into state). This is useful for registered output logic.
case (NS) S0: begin flag=0; end // combinational output, update seen immediately // with input update that selects destination state S0 S1: begin flag=1; end
case (NS) S0: begin if (CS==S3) flag=0; // combinational output, update applied immediately // with input update that selects destination state S0 // applies only if current state is S3 else flag=1; end S1: begin flag=1; end
Using NS useful for
- registered output logic to organize code so that assignments coincide with state symbols according to timing coincidence
always @ (posedge clk) begin case (NS) S0: begin flag<=0; end //glitch-free update when CS becomes S0 S1: begin flag<=1; end //glitch-free update when CS becomes S1
- whenever current combinational outputs are correlated to the next state
//S_found should be high whenenver the next state is S_found and should be asserted until the state has been fully exited always_comb begin if (NS==S_found) begin found=1; else found=0; end
//S_found should be high whenenver the next state is S_found and should be asserted until the state has been fully exited always_comb begin if (NS==S_found || CS=S_found) begin found=1; else found=0; end
11. Registered Output for Partitioning and Timing
One of the advantages of registered output design is in design partitioning and addressing the timing of a design within each partition.
Registered Ouput Modules:
Assembled Registered Ouput Modules:
- Length of Combinatorial Paths are better matched to those in the partitions
Irregular Modules:
Assembled Irregular Modules:
- Length of paths is determined by factors across partitions
- Logic Optimizer and designer therefore have different perspectives of the design
- Output path has under-specified timing constraint that may need to account for delays of output
Improper assembly with combinational outputs can create combinational loops, whereas having all module output be registers avoids this.
12. Module Interaction
counter
always_ff @ (posedge clk) begin counter<=counter+1; end
counter time-unraveled
counter with reset, reset at time count==17
Reference Counter with synchronous reset
always_ff @ (posedge clk) begin if (reset == 1) begin counter<=0; end else begin counter<=counter+1; end end
Desired Timeline:
counter with controller used to implement counting 0-17:
always_comb begin if (counter == 17) begin reset=1; end else begin reset=0; end end
always_ff @ (posedge clk) begin if (reset == 1) begin counter<=0; end else begin counter<=counter+1; end end
incorrect implementation and integration of controller with registered outputs:
always_ff @ (posedge clk) begin if (counter == 17) begin reset<=1; end else begin reset<=0; end end
always_ff @ (posedge clk) begin if (reset == 1) begin counter<=0; end else begin counter<=counter+1; end end
hack using prediction
- hack changing 17 to 16 implies that in the control FSM there is some predictive model of system 1's transition to the actionable state
- prediction is not always feasible
always_ff @ (posedge clk) begin if (counter == 17-1) begin //HACK: -1 to predict counter value reset<=1; end else begin reset<=0; end end
always_ff @ (posedge clk) begin if (reset == 1) begin counter<=0; end else begin counter<=counter+1; end end
use of combinatorial interface to provide in-cycle control decision
always_ff @ (posedge clk) begin if (counter_next == 17) begin reset<=1; end else begin reset<=0; end end
always_comb begin if (reset == 1) begin counter_next = 0; end else begin counter_next<=counter+1; end end always_ff @ (posedge clk) begin counter<=counter_next; end
BAD: Combinational logic FEEDBACK
combinational logic pathways from input to output can lead to combinational logic loops
always_comb begin if (counter_next == 17) begin reset=1; end else begin reset=0; end end
always_comb begin if (reset == 1) begin counter_next = 0; end else begin counter_next<=counter+1; end end always_ff @ (posedge clk) begin counter<=counter_next; end
integration as one fsm with registered outputs
bit reset_q; //bit can only be 1/0, avoids intial X in simulation assign pulse=reset; //OUTPUT LOGIC //the counter and the reset define the state since reset is not just used as output always_ff @ (posedge clk) begin automatic [4:0] counter_next; if (reset == 1) begin counter_next = 0; end else begin counter_next=counter+1; end if (counter_next == 17) begin reset<=1; counter<=0; end else begin reset<=0; counter<=counter_next; end end
13. Common Interface (AXI-like)
- the primary may assert a request until it is acknowledged
- commonly, if a request and an ack are asserted in the same cycle, then it is assume that the data is consumed
- for reading (e.g. results) the master may wait until a result is ready to read a response
- a separate flag like done or data valid (presented for consumption) is common for this
14. "For Loop" State Machine
- Extended-state registers can help implement loop behaviors
Example: create outer loop with 10 iterations and inner loop with 5 iterations
Drawing Possible Hardware
Realizations of i,j registers and support hardware
- After examination of the state diagram, three required behaviors are desired: \(\rm Hold\), \(\rm Increment\) , \(\rm Load\ 0\)
- A primary FSM will generate the control signals
- Note \(\rm i\) would be the output of the register, the output from the register could be called (\(\rm i\_next\) , \(\rm \_i\), \(\rm i\_comb\), \(\rm i\_int\), \(\rm i\_prereg\))
class discussion: timing diagram
Combinatorial vs Sequential Pitfalls
- Wrongly using the output of a register instead of the input can be a pitfall when using single-always-block (registered outputs) and extended state registers
Wrong Code State D
SA: i<=0; ... SD: i <= i+1; if (i<10) CS<=SA; else CS<=SE; ….
Correct Code State D
SD: next_i = i+1; if (next_i<10) CS<=SA; else CS<=SE; i <= next_i; ….
Tweaking Question ( a question asked in class)
- Could one just change i<10 to i<9?
Wrong Code State D
SA: i<=0; ... SD: i <= i+1; if (i<10) CS<=SA; else CS<=SE; ….
Tweaked / "Hacked"
SA: i<=0; ... SD: i <= i+1; if (i<9) CS<=SA; else CS<=SE; ….
- Perhaps yes, but the reasoning is important and better presentation of coder's intent should be considered. This may produce the correct hardware, but it is an improper presentation of coder's design intent.
- Lets say you are working on a class HW project – if you make the wrong version, run a simulation or debug in FPGA hardware, notice that one extra loop is performed and "tweak" the code to "just make it work" without understand the options, than that would not be a good reason. You want to be cognizant of and understand the different choices, then you have the knowledge and understanding to select the most appropriate implementation. Furthermore, at some point your code will be very large and systems will be complex – expecting to make many such tweaks is not a reasonable approach. You want to aim for as much to be correct as possible the first time.
15. Multi-Cycle Computations as FSMs
- In the following slides, we will consider examples of algorithmic FSMs. These are used to describe and synthesize multi-cycle computations. The number of state transitions defined are typically less than with a complex controller FSM.
- A more regular, forward, orderly progression through the states is
typical, and the designer's thought processes may be focused on the
computation being performed in each state and the resulting partial
results (in registers), as opposed to the output during the state.
- Thus, it becomes more reasonable to use a single edge-triggered always block in the design. The fewer number of transition decisions (branches) makes the concern of code bloat negligible.
16. Algorithmic State Machine Diagram
- quick reference:
- https://www.amd.com/content/dam/amd/en/documents/university/vivado-teaching/hdl-design/2015x/Verilog/docs-pdf/lab11.pdf
- if that link doesn't work: https://www.google.com/search?q=amd+asm+chart+lab
- old link:https://www.xilinx.com/content/dam/xilinx/support/documents/university/Vivado-Teaching/HDL-Design/2015x/Verilog/docs-pdf/lab11.pdf
An quick introduction to informal ASM Diagrams ** ASM Chart Elements
- State Box
- State Name (required)
- State Encoding (optional)
- Moore Machine Output Updates (combinational logic)
- Conditional Rounded Box
- Mealy Machine Output Updates (optional, combinational logic)
- Register Updates (optional, denoted with <=) non-standard addition to ASM charts
- in formal ASM diagrams, these would be represented by the combination of a register enable being asserted and data presented to the register
- internal/automatic these are another non-standard addition to ASM charts
- denoted with leading underscore_, blocking assignment
- default assumed to be don't care for branches not specified
- Decision Diamond
- Test Condition
- Exit Paths
- Two for
if - two or more for
case
- Two for
- General Notes:
- Well-defined initial state, such as single entry point with actionless path to one initial state
- Multiple exit paths may exist
- Note that registers may only be updated once per clock cycle,
- Improper to represent multiple updates in one cycle
- Commonly outputs are written just by name (rather than out=1), and it is assumed the unspecified outputs default to 0
- State Box
Example Euclid GCD:
function gcd(a, b) while (a != b) if a > b a = a − b else b = b − a return a
FSM Optimizations
- As we are designing Multi-Cycle computations, we may consider two
optimizations:
- Rescheduling – moving internal operations to other states
- Resource sharing – utilizing same component in multiple states
- The following examples based on or borrowed from \(\dagger\)\(\dagger\) Thomas&Moorby
Module Header
module FSM_opt( output reg [7:0] f, input clk, input wire [7:0] i, input wire [7:0] j, input wire [7:0] k, input rst ); reg [7:0] CS; reg [7:0] i_int,j_int,k_int,p,m; localparam S_0 = 8'b00000000; localparam S_1 = 8'b00000001; localparam S_2 = 8'b00000010;
always @ (posedge clk) begin if (rst) begin CS<=S_0; f<=0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin p<=i_int+j_int; CS<=S_2; end S_2: begin m=p*j_int; f<=m*k_int; CS<=S_0; end endcase end end endmodule
- Multi-cycle computation: Note, the single-block style here was used since the state transition sequence is straightforward. Also, the data path is combined with the controller. We are not worried about "code bloat" from having to code each output on every transition. Instead of concentrating on timing coincidence of state and outputs vs coding coincidence of state and outputs, we are focused on the coincidence of states and computations. In this single block style the computation performed and resources required for each each are clear, though the corresponding output of each computation is seen and can be used on the cycle following the corresponding state indicated in the state register. In other words, we are interested in what update to registers is being computed and prepared in each state.
Rescheduling
- Rescheduling this multiply allows for faster clock rates ( assuming two clock cycles were required at the system level). Some synthesizers may do similar types of rescheduling for you.
(Cycle) Rescheduling
always @ (posedge clk) begin if (rst) begin CS<=S_0; f<=0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin _p=i_int+j_int; m<=_p*j_int; //**to here** CS<=S_2; end S_2: begin //m=i_int*j_int; //**move from here** f<=m*k_int; CS<=S_0; end endcase end end endmodule
reference
reschedule
- Some synthesizers may do similar types of rescheduling for you. Xilinx tools command to perform a "retiming": https://www.xilinx.com/support/answers/65410.html
Tools: Retiming
- it is important to be able to reschedule manually
- but it is also important to know that tools may perform this operation automatically
- to support it, the userguide may suggest to simply insert extra registers (such as at the output) and enable register retiming
- Retiming in Vivado Synthesis: https://adaptivesupport.amd.com/s/article/934201?language=en_US
- since retiming does not change the behavior, it can even be performed after RTL synthesis on a low-level netlist during optimization for place and route
- Xilinx allows for global timing, as well as more specific custom attributes:
(* retiming_forward = 1 *) reg my_sig;RETIMINGFORWARD attribute instructs the tool to move a register forward through logic closer to the driven sequential elements(*retiming_backward = 1 *) reg my_sig;RETIMINGBACKWARD attribute instructs the tool to move a register backward through logic closer to the sequential driving elements- https://docs.amd.com/r/en-US/ug901-vivado-synthesis/RETIMING_BACKWARD
Multicycle Path Timing Constraint (Forward Discussion) \(^\circ\)
In the following version, all compuation is performed in the S2 cycle.
always @ (posedge clk) begin if (rst) begin CS<=S_0; f<=0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin CS<=S_2; end S_2: begin p=i_int+j_int; m=p*j_int; f<=m*k_int; CS<=S_0; end endcase end end endmodule
In this specific case we can provided a relaxed timing constraint for the paths from the output of the registers for i,j,k to the input of f. We will later learn about manipulation constraints for the synthesizer to allow for multi-cycle paths, but for now we assume that all combinational chains must complete work within a clock cycle.
17. Resource Sharing
Reference Computation Module for discussion
Head
module FSM_opt( output reg [7:0] f, output reg [7:0] g, input clk, input wire [7:0] i, input wire [7:0] j, input wire [7:0] k, input rst ); reg [7:0] CS; reg [7:0] i_int,j_int,k_int; localparam S_0 = 8'b00000000; localparam S_1 = 8'b00000001; localparam S_2 = 8'b00000010;
Body
always @ (posedge clk) begin if (rst) begin CS<=S_0; f<=0; h<=0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin CS<=S_2; end S_2: begin f<=i_int*j_int; h<=j_int*k_int; CS<=S_0; end endcase end end endmodule
Suggested RTL
Alternative RTL with lower gate count:
always @ (posedge clk) begin if (rst) begin f<=0; g<=0; CS<=S_0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin CS<=S_2; end S_2: begin f<=i_int*j_int; //** g<=j_int*k_int; CS<=S_0; end endcase end end endmodule
Explicitly coding the rescheduling so that only one multiply is performed per cycle is simple and seen in the next code. However, this doesn't ensure resource sharing as shown in the figure with one a single multiplier.
Alt. Module Body
always @ (posedge clk) begin if (rst) begin f<=0; g<=0; CS<=S_0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin f_int<=i_int*j_int; //**moved to here**// CS<=S_2; end S_2: begin f<=f_int; //*timed output load*// g<=k_int*j_int; CS<=S_0; end endcase end end endmodule
- Suggesting Resource Sharing guides the synthesizer to reuse hardware for operations at multiple places in the code
- The coding method depends on the synthesizer tool: the following code is more likely interpreted as resource sharing since the multiplier result is always written to the same variable
module fsm(f,g,i,j,k,rst,clk); input [7:0] i,j,k; output reg [7:0] f,g; input rst,clk; reg [3:0] CS; localparam S_0=0,S_1=1,S_2=2; always @ (posedge clk) begin:named reg [7:0] i_int,j_int,k_int,f_int; reg [7:0] mult_out; mult_out = 8'bx; if (rst) begin f<=0; g<=0; CS<=S_0; end else begin case(CS) S_0: begin i_int<=i; j_int<=j; k_int<=k; CS<=S_1; end S_1: begin mult_out = i_int*j_int; f_int <= mult_out; CS<=S_2; end S_2: begin f <= f_int; mult_out = k_int*j_int; g <= mult_out; CS<=S_0; end endcase end end endmodule
Another option is to manually extract the multiplier out of the edge-triggered code and write datapath logic explicitly for the multiplier paired with a mux along with a state machine to change the inputs
Multiplier input controlled using an explicit mux in the datapath:
module fsm(f,g,i,j,k,rst,clk); input [7:0] i,j,k; output reg [7:0] f,g; input rst,clk; reg [3:0] CS; localparam S_0=0,S_1=1,S_2=2; wire mult_a_sel; wire [7:0] mult_out,mult_a,mult_b; assign mult_out=mult_a * mult_b; assign mult_a=mult_a_sel?k_int:i_int; assign mult_b=j_int; always @ (posedge clk) begin:named1 i_int<=_i_int; j_int<=_j_int; f_int<=_k_int; f_int<=_f_int; g<=_g; f<=_f; CS<=NS; end always_comb begin:named2 reg [7:0] i_int,j_int,k_int,f_int; if (rst) begin _i_int=8'bx; _j_int=8'bx; _k_int=8'bx; _f_int=8'bx; _g=0; _f=0; NS=S_0; mult_a_sel=1'bx; end else begin _i_int=i_int; _j_int=j_int; _k_int=i_int; _f_int=f_int; _g=g; _f=f; mult_a_sel=1'bx; case(CS) S_0: begin _i_int=i; _j_int=j; _k_int=k; NS=S_1; end S_1: begin mult_a_sel=0; _f_int=mult_out; NS=S_2; end S_2: begin _f=f_int; mult_a_sel=1; _g=mult_out; NS=S_0; end endcase end end endmodule
- Another variation with the mux logic embedded within the state machine
Multiplier input controlled using an implied mux in the datapath, requires that data to be passed through the statemachine description block:
module fsm(f,g,i,j,k,rst,clk); input [7:0] i,j,k; output reg [7:0] f,g; input rst,clk; reg [3:0] CS; localparam S_0=0,S_1=1,S_2=2; reg mult_a_sel; //** now reg instead of wire wire [7:0] mult_out,mult_a,mult_b; assign mult_out=mult_a * mult_b; assign mult_b=j_int; always @ (posedge clk) begin:named1 i_int<=_i_int; j_int<=_j_int; f_int<=_k_int; f_int<=_f_int; g<=_g; f<=_f; CS<=NS; end always_comb begin:named2 reg [7:0] i_int,j_int,k_int,f_int; if (rst) begin _i_int=8'bx; _j_int=8'bx; _k_int=8'bx; _f_int=8'bx; _g=0; _f=0; NS=S_0; mult_a=1'bx; end else begin _i_int=i_int; _j_int=j_int; _k_int=i_int; _f_int=f_int; _g=g; _f=f; mult_a_sel=1'bx; case(CS) S_0: begin _i_int=i; _j_int=j; _k_int=k; NS=S_1; end S_1: begin mult_a=i_int; _f_int=mult_out; NS=S_2; end S_2: begin _f=f_int; mult_a=k_int; _g=mult_out; NS=S_0; end endcase end end endmodule
18. More Rescheduling Examples
Another Rescheduling Example
… always @ (posedge clk) … case (CS) S_0:begin q<=r+s; CS<=S_1; end S_1:begin CS<=S_2; end S_2:begin qout<=q+5; CS<=S_0; end …
… always @ (posedge clk) … case (CS) S_0:begin q<=r+s; CS<=S_1; end S_1:begin q<=q+5; CS<=S_2; end S_2:begin qout<=q; CS<=S_0; end …
Yet Another Rescheduling Example
Code Provided to the Synthesizer
input [7:0] i,j,k; output [7:0] f,h; reg [7:0] f,g,h,q,r,s; always @ (posedge clk) … case (CS) S_0:begin r<=i*2; s<=j+j; end S_1:begin f<=i+j; g<=j*23; CS<=S_1; end S_2:begin h<=f+k; CS<=S_2; end S_3:begin f<=f*g; q<=r*s; CS<=S_0; end ... …
- FSM Synthesizers can automatically try variations maintain input and
output timing:
Movement of q=r*s using a new temporary register qint:
S_0:begin r<=i+i; s<=j*2; end S_1:begin f<=i+j; g <=j*23; CS<=S_1; end S_2:begin h<=f+k; q_int<=r*s; CS<=S_2; end //** S_3:begin f<=f*g; q <=q_int; CS<=S_0; end //**
Movement of f*g: using an exisinting working register g:
... S_0:begin r<=i+i; s<=j*2; end S_1:begin f<=i+j; g<=j*23; CS<=S_1; end S_2:begin h<=f+k; g<=f*g; CS<=S_2; end //** S_3:begin f<=g; q<=r*s; CS<=S_0; end //** ...
❌ Example of (Possibly) Invalid rearrangement for i+j+k
- captures input k in state S1 instead of S2 and vice versa for input j
- unless the optimizer tool or designer can determine from elsewhere in the design that input i and j do not change in those
time cycles, input timing must be maintained just like output updates
... case (CS) S_0:begin r<=i+i; s<=j*2; end S_1:begin f<=i+k; g<=j*23; CS<=S_1; end //** S_2:begin h<=f+j; CS<=S_2; end //** S_3:begin f<=f*g; q<=r*s; CS<=S_0; end ...
19. State Encoding
- The state encoding effects the size of the decoder, speed, dependent logic optimization, etc.
- Contents of the section below (aside from tables) are directly quoted or minimally modifed from Xilinx documents http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/xst.pdf (old) https://www.xilinx.com/support/documents/sw_manuals/xilinx2022_2/ug901-vivado-synthesis.pdf
- Be familiar with Auto, One-hot, Gray, Johnson, User
State Encodings Vivado
Vivado reports encoding in logs:
INFO: [Synth 8-802] inferred FSM for state register 'state_reg' in module 'fsm_test' INFO: [Synth 8-3354] encoded FSM with state register 'state_reg' using encoding 'sequential' in module 'fsm_test
- Encodings
- Auto: In this mode, XST tries to select the best suited encoding algorithm for each FSM.
One-Hot State Encoding
- Is the default encoding scheme for a state machine, up to 32 states.
- Is usually a good choice for optimizing speed or reducing power dissipation.
- Assigns a distinct bit of code to each FSM state.
- Implements the State Register with one flip-flop for each state.
- In a given clock cycle during operation, only one bit of the State Register is asserted.
- Only two bits toggle during a transition between two states
000000010000001000000100… 0100000010000000Gray State Encoding
- Guarantees that only one bit switches between two consecutive states.
- Is appropriate for controllers exhibiting long paths without branching.
- Minimizes hazards and glitches.
- Can be used to minimize power dissipation.
0000000000000001000000110000001000000110… Johnson:
- Johnson State encoding is beneficial when using state machines containing long paths with no branching (as in Gray State Encoding).
00000000000000010000001100000111… 11111111111111101111110011111000… 10000000-> rollover to all zeros- Sequential:
- Identifies long paths
- Applies successive radix two codes to the states on these paths.
- Minimizes next state equations.
- User:
- In this mode, XST uses original encoding, specified in the HDL file. For example, if you use enumerated types for a state register, then in addition you can use the ENUMENCODING constraint to assign a specific binary value to each state. Please refer to "Design Constraints" chapter for more details.
- ROM (FSM) Synthesis (RAM-Based Finite State Machine):
- Large Finite State Machine (FSM) components can be made more compact and faster by implementing them in the block RAM resources provided in Virtex® devices and later technologies. FSM Style (FSMSTYLE) directs XST to use block RAM resources for FSMs.
20. Unreachable State Check
- One of the benefits of extracting FSMs, is that additional analysis and design checks are available: "XST can detect unreachable states in an FSM. It lists them in the log file in the HDL Synthesis step."
21. Safe Finite State Machine (FSM) Implementation
- XST can add logic to your Finite State Machine (FSM) implementation that will let your state machine recover from an invalid state. If during its execution, a state machine enters an invalid state, the logic added by XST will bring it back to a known state, called a recovery state. This is known as Safe Implementation mode.
- By default, XST automatically selects a reset state as the recovery state. …
- This feature is useful in system susceptible to corruption or as a way to handle undefined power-on initialization
22. Digital Encoding Key Points
- One-hot encoding is good for speed and simplicity of state decoding
logic to control datapath, and next-state logic is simple state transition
- easy to detect invalid state
- More compact codes such as standard binary encoding generally
require a smaller state state register than one-hot encoding at the
possible cost of size and speed.
- But this depends on the density of combinatorial logic vs. registers
the supporting HW platform and in the design.
- FPGAs have many registers and so the cost of additional combinatorial logic may large compared to the savings from needing less registers.
- But this depends on the density of combinatorial logic vs. registers
the supporting HW platform and in the design.
- Codes where only one or two bits change at a time in the state
register may be beneficial.
- Less transitions may lead to less power depending on overall design, e.g. if stage register output connects to a high-capacitance load (high-fanout)
- Can minimize the chance of metastability errors (such as systems
with tight timing, or radiation vulnerability).
- These codes may also minimize logic glitches.
23. FSM Report Log Example
- The XST log file reports the full information of recognized Finite State Machine (FSM) components during the Macro Recognition step. Moreover, if you allow XST to choose the best encoding algorithm for your FSMs, it reports the one it chose for each FSM. As soon as encoding is selected, XST reports the original and final FSM encoding. If the target is an FPGA device, XST reports this encoding at the HDL Synthesis step. If the target is a CPLD device, then XST reports this encoding at the Low Level Optimization step.
Log File Example
Synthesizing Unit <fsm_1>. Related source file is "/state_machines_1.vhd". Found finite state machine <FSM_0> for signal <state>. ------------------------------------------------------ | States | 4 | | Transitions | 5 | | Inputs | 1 | | Outputs | 4 | | Clock | clk (rising_edge) | | Reset | reset (positive) | | Reset type | asynchronous | | Reset State | s1 | | Power Up State | s1 | | Encoding | automatic | | Implementation | LUT | ------------------------------------------------------ Found 1-bit register for signal <outp>. Summary: inferred 1 Finite State Machine(s). inferred 1 D-type flip-flop(s). Unit <fsm_1> synthesized. ======================================================== HDL Synthesis Report Macro Statistics # Registers : 1 {#registers--1 } 1-bit register : 1 ======================================================== ======================================================== * Advanced HDL Synthesis * ======================================================== Advanced Registered AddSub inference ... Analyzing FSM <FSM_0> for best encoding. Optimizing FSM <state/FSM_0> on signal <state[1:2]> with gray encoding. ------------------- State | Encoding ------------------- s1 | 00 s2 | 01 s3 | 11 s4 | 10 ------------------- ======================================================= HDL Synthesis Report Macro Statistics # FSMs : 1 {#fsms--1 } =======================================================
24. Constraints/Compiler Settings and Directives
- Most synthesizers support various instructions to modify synthesis behavior
- Many options can be applied globally or on a per module instance or even per block level.
- tools
- constraints files,
- projects settings
- Additional Synthesis Options and Directives for Xilinx: https://docs.amd.com/r/en-US/ug901-vivado-synthesis/Using-Synthesis-Settings
- some as a command-line options
- custom verilog attributes (syntax for attributes is standard)
\(* fsm_encoding = "none" *\) reg [1:0] state
- inline via non-standard syntax within comments (aka meta-comments)
The comment below is an example of a synthesizer directive / inline-constraint:
casex select // synthesis full_case 4'b1xxx: res = data1; 4'bx1xx: res = data2; 4'bxx1x: res = data3; 4'bxxx1: res = data4;
25. FSM Extraction
- tools may not detect and process state machines by default, read manual for synthesizer tools
- for some tools, an explicit command is available
- Xilinx FSM Constraints
- Xilinx Synthesis Constraints for state machines
- Related constraints are:
- fsmextract
- determines if state machines are detected/extracted
- http://www.xilinx.com/itp/xilinx7/books/data/docs/cgd/cgd0093_54.html
- fsmencoding
- can set state encoding globally or per-instance
- http://www.xilinx.com/itp/xilinx7/books/data/docs/cgd/cgd0092_53.html
- fsmfftype
- use D or toggle flip flops for state register
- http://www.xilinx.com/itp/xilinx4/data/docs/cgd/f8.html
- enumencoding
- sets encoding when fsmextract is used to select user
- http://www.xilinx.com/itp/xilinx4/data/docs/cgd/e3.html
- fsmextract
26. Should a default case always be provided?
You should not automatically add default case to synthesize a FSM since the logic has to cover many unnecessary states and can create considerable extra logic
- Check documentation for your synthesizer's behavior
- Special effort may be required to have the synthesizer ignore the default case yet allow logging in simulation: http://www.trilobyte.com/pdf/golson_snug94.pdf:
// synopsys translate_off default: \$display(“He’s dead, Jim.”) ; // synopsys translate_on
27. Example state machine code if time allows
Example FFT Data Flow HW solution depicting extended state registers and state repetition
https://eclipse.umbc.edu/robucci/cmpe415/attachments/hw6/hw6.pdf
https://eclipse.umbc.edu/robucci/cmpe415/attachments/hw6/fft_state_machine.v