Lecture 08 – FSMD

Ryan Robucci

1. State Machines Basics

Mealy Machine

gen-mealy.svg

Moore Machine

gen-moore.svg

2. References

3. Implementation of Mealy and Moore Machines

20240326190932.png

4. State-Machine Implementation

20240326191122.png

  • All Finite State Machines can be partitioned into a combinatorial and sequential parts
  • Often these two pieces are coded in separate code blocks, but they may also be combined sometimes
  • Note: some simple register-specific operations such a synchronous reset are coded in either the combination or the sequential blocks, but are often coded in the sequential block to indicate selection of register configuration. An asynchronous clear should be part of the sequential block.

5. State-Machine Verilog Implementation


module FSM (clk, clear, x,y);

input logic clk, clear;
input logic [2:0] x;
output logic [1:0] y;

// Declare the symbolic names for states
          //Verilog: localparam [6:0] S1 = 7'b0000001,S2 = 7'b0000010, S7 = 7'b1000000;
  typedef enum int {S0,S1,S2} state_t;



// Declare current state and next state variables
  //Verilog: reg [2:0] CS,NS;
  state_t CS,NS;


always_ff @ (posedge clk or posedge clear) begin
  if (clear == 1'b1) CS <= S0;
  else CS <= NS;
end

always_comb begin //always @ (CS, x)
  case (CS)
    S0 : begin
         y = 2'b00;
         if (x[2] && ~x[1] && x[0])
           NS = S1;
         else if (x[2] && x[1] && ~x[0])
           NS = S2;
         else
           NS = S0;
    end
    S1 : begin
         y = 2'b10;

    ...
end

6. Mealy Implementation


module FSM (clk, clear, x,y);

input logic clk, clear;
input logic [2:0] x;
output logic [1:0] y;

// States
typedef enum int {S0,S1,S2} state_t;


// Declare current state and next state variables
state_t CS,NS;


always_ff @ (posedge CLOCK or posedge clear) begin
  if (clear == 1'b1) CS <= S1;
  else CS <= NS;
end

always_comb begin //always @ (CS, x)
  case (CS)
    S1 : begin
         y = {x[2] , ~x[1] && x[0]};
         if (x[2] && ~x[1] && x[0])
           NS = S2;
         else if (x[2] && x[1] && ~x[0])
           NS = S4;
         else
           NS = S1;
    end
    S2 : begin
         y = 2'b10;

    ...



  • every output bit must be well-defined for every case and input to avoid latches
  • for NS, every output bit is set in every case to avoid latches

7. SystemVerilog FSM State Register

reg [7:0] CS,NS; //reg!=register
localparam S_init    = 8'b00000000;
localparam S_start0 = 8'b00000001;
localparam S_start1 = 8'b00000010;

replaced by

enum int {S_init, S_start0,S_start1} state_t;
state_t CS,NS;
  • meaningful enum symbols displayed in waveform viewers and loggers
  • Symbol accessed as string through member function:

    $strobe("State %s %0d",CS.name(),CS);
    
  • use of enum int and avoiding explicit encodings avoid confusion of automatic encoding which would replace values provided with localparam

8. "Registered Output Moore" and "Registered Output Mealy" Machines

🛰️ Registered Output FSM

  • Good for describing common synchronous digital logic building blocks like counters with an output that is updated once per cycle
  • Predictable System Timing
    • Consistent Design and Build Unit with combinatorial delay and terminated with registers that output once per cycle
    • consistent with design stages of pipeline/RTL
    • Standardizes delay constraints for blocks (avoid creating long combinatorial paths through multiple blocks)
  • Avoid Glitches
    • critical for generating edge-sensitive control signals
    • may achieve lower-power operation by avoiding charge/discharge cycles especially when driving off-chip loads
  • Mitigate Metastability
    • unfortunate timing downstream, causing invalid voltage levels or late transitions
  • Adds clk cycle delay to outputs

reg_output20240326194726.png

  • Additionally, code can reuse previous outputs for future output and state updates, meaning the new registers are part of the state of the system
  • In code, you can update outputs as needed in a given state and retain values by default if not assigned

20240326194805.png

  • ✏️ Note Formal Size of State Considering any output logic register to be part of the state (serving as "extended state register"), a "Registered Output Mealy" machine is technically a Moore Machine

9. Single Always Block Style for Registered Output Logic (Mealy)

always_ff @ (posedge CLOCK or posedge CLEAR) begin: FSM
  logic _temp;

  if (CLEAR == 1'b1) begin
    CS <= S1;
  end else begin 
    case (CS) 
      S1 : begin 
        _temp = ~x[1];                     //intermediate logic (blocking assignment)
        y  <= {x[2] , _temp & x[0]};       //register intent (non-blocking assignment)
        if (x[2] && _temp && x[0])  
          CS <= S2;                        //register intent (non-blocking assignment)
        else if (x[2] && x[1] && ~x[0]) 
          CS <= S4; 
        else 
          CS <= S1;
      end
      S2 : begin 
          y  <= {x[1] && ~x[1] , x[0]};   
         . 
         . 
         . 
  • Embedded combination circuit signal assignments ended with blocking assignment and thus encode "immediate" effect. All consumers of blocking assignment results should be within this code block. Even within the block, no consumer should rely on a value assigned from a blocking statement in a previous trigger event.
  • Remember, any variable written to inside an edge-triggered block can become a register regardless of the use of blocking or non-blocking…consider every output variable, combinational and sequential, in every case and branch of decision tree and make sure assignments are always made to avoid latches

10. Coding Style for State Machines

🛰️ Registered Output and Three-Always Block

  • Registered output can cause “cycle delays” so some output transitions need to be coded along with the state transition into a state rather than with the state they are supposed to coincide with (discuss output on prev. slide)
  • Sometimes this feels like you’re coding outputs in the “previous state” or coding output ahead of time to account for register delay. I refer to it as coding the output along with the state transition it coincides with. This leads to additional lines of code as you need to code each output logic possibly for every transition to a state rather than once per state
  • To avoid this “code bloat”, yet another approach is to code the registered outputs in a separate block. This leads to three blocks:
    • Combinatorial Next State Logic along with any combinatorial outputs
    • Sequential State Register
    • Sequential Registered Outputs according to the destination (next) state from the combinatorial Block.
  • Good contrasting examples can be found here: http://www.sunburst-design.com/papers/CummingsSNUG2003SJ_SystemVerilogFSM.pdf

11. "Registered Output Moore" and "Registered Output Mealy" Machines using Three Always Blocks

Avoids delays while allowing registered outputs to be coded with the state they are intended to coincided with, by coding Next State and Output for Next state together.

20240326195614.png

  • Primary State Register

    always(posedge clk) begin
       if reset …
       else CS<=NS;
    end
    
  • Next State Logic and Any combinatorial outputs

    always_comb begin
     NS=CS;   /*NS is result of combinatorial logic */   
     case(CS)
       S_init: begin 
         if (go==1 && selAB==0) NS=S_startA;
         ...
       end;
       S_startA: begin
         NS=S_init; 
       end;
    …
    
  • Next State Registered Output Logic Coded Once Per State Rather than once per transition
    • May optionally include Transition-specific output rules based on both CS and NS

      always(posedge clk) begin
         case(NS)
         S_init: begin 
           goA<=0;
         end;
         S_startA: begin
           goA<=1;
         end;
      …
      end
      

12. Example State-Machine: An Arbitrator

Block Diagram

20240326195849.png

13. State Diagram

  • Assume slave need a start signal for two clock cycles
  • Using more states simplifies output logic

20240326195939.png

  • Using fewer states requires more output logic and, in the code shown effectively embeds extended state information in additional registers

20240326195950.png

14. Registered Outputs With Single Always Block

module control_with_state_machine2(
   input logic clk,
   input logic rst,
   output logic startA, //start signal to slave A
   output logic startB, //start signal to slave B
   input logic busyA_n,     //busy signal from slave A
   input logic busyB_n,     //busy signal from slave B
   output logic ready,      //ready signal to master
   input logic selAnB       //go signal from master
   input logic start        //go signal from master
);

enum int {S_init, S_start0,S_start1} state_t
state_t CS,NS;

assign ready = ~(selAnB?busyB_n:busyA_n); // extra combinational logic on the side


always_ff @ (posedge clk) begin
  if (rst == 1)
    CS<=S_init;
  else
    case (CS)
    S_init: begin
      if (start == 1) begin
        startA <= ~selAnB;
        startB <= selAnB;
        CS<=S_start0;
      end else begin
        startA <= 0;
        startB <= 0;
        CS<=S_init;
      end
    end
    S_start0: begin
        startA <= startA;
        startB <= startB;
        CS<=S_start1;
    end
    S_start1: begin
        startA <= 0;
        startB <= 0;
        CS<=S_init;
    end
  endcase
end
endmodule

15. Timing Diagram

Place holder for in-class timing diagram. Key concepts: * Identification of implied combinatorial signals including register inputs (outputs and state (NS)) * Arrival of input changes versus response timing * Coincidence of Code and Output Timing

timing-diagram-fsm.svg

16. Registered Outputs 3-Always-Block Style

module control_with_state_machine2(
   input logic clk,
   input logic rst,
   output logic startA, //start signal to slave A
   output logic startB, //start signal to slave B
   input logic busyA_n,     //busy signal from slave A
   input logic busyB_n,     //busy signal from slave B
   output logic ready,      //ready signal to master
   input logic selAnB       //go signal from master
   input logic start        //go signal from master
);

enum int {S_init, S_start0,S_start1} state_t
state_t CS,NS;

assign ready = ~(selAnB?busyB_n:busyA_n); // extra combinational logic outside state machine

  always_ff @ (posedge clk) begin
    if (rst == 1)
      CS<=S_init;
    else
      CS<=NS;
  end

  always_comb begin
     NS = CS;
     case (CS)
    S_init: begin
       if (go) begin
          NS = S_start0;
       end
    end
    S_start0:
      NS = S_start1;
    S_start1:
      NS = S_init;
      endcase
  end

  always_ff @ (posedge clk) begin
     case (NS)
       S_init:
         startA <= 0;   
         startB <= 0;   
       S_start0:
         startA <= ~selAnB;
         startB <= selAnB;
       S_start1:
         startA <= startA;  
         startB <= startB;  
      endcase
  end


endmodule
  • also need to consider appropriate resets and defaults logic

17. Finite State Machine (FSM)

  • Characterized by
    • A set of states
    • A set of inputs and outputs
    • A state transition function
    • An output function
  • Hardware Implementation
    • Current State held in a register
    • Any additional status information held in status register
    • Next-State and State Control Logic Determines next state and control signals based on registers
    • Datapath implements operations on data under control of sate control logic

18. Finite State Machine with Datapath (FSMD)

  • A very common framework being described and implemented is a Finite State Machine with a Datapath: a data path controlled by signals from a FSM

001.png

19. Software and Hardware \(^\circ\)

  • If available customizable hardware is fast, and control logic is difficult to describe, a good mix can be software for control and hardware for calculations. We will see later approaches that use a general purpose processor for control.

gen-fsmd.svg

<!– ## 4 Rules of Proper FSMD \(^\circ\)

  • The relate to describing a piece of hardware (or software) in a modeling language which is software.

    • 1 Neither registers nor signals can be assigned more than once during a clock cycle (covered in our Verilog code rules by the one-block assignment rule)
    • 2 No circular definitions exist between wires (i.e. no combinatorial logic loops)
    • 3 If a signal is used as an operand of an expression, it must have a known value in the same clock cycle
    • 4 All datapath outputs must be defined (assigned) during all clock cycles (in some cases a DontCare may be allowed)

    –>

<!– ## Hardware/Software Partitioning \(^\circ\)

  • For a general application, hardware is best for timing-critical (especially simultaneous processes and triggering events) while software is flexible and good for implementing algorithms with high complexity – in the sense of [Kolmogorov complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity), the length of the code to implement the algorithm to produce some output given some inputs.
  • Remember, timing-critical can refer to predictability of timing, not just how fast it can go (a real-time system is a system with timing guarantees) –>

<!– ## Control and Datapath Partitioning

  • First consider partitioning.
    • Identify elements in the datapath from experience with traditional digital systems (e.g. communications modules, arithmetic modules, multiplexor's and

demultiplexors, registers and multi-word buffers, FIFOs, IP Cores etc…).

  • Identify the control signals required and the status/condition information required to make decisions on the control.
  • Datapath operations can be encoded in the state machine description or can be build separately.
    • If pre-built modules are used, as is common, the datapath is necessarily a separate description.
    • Often the datapath represents the algorithm calculations, separating the datapath code makes the code more readable
    • Coding datapath separately can allow more explicit influence over and insight into the resources used for computation while separating details of control code
    • Can sometimes think of datapath as implementing instructions for the controller to invoke/call
    • Separating the datapath code can allow better reuse under a different scheduling of operations (implementing a different algorithm)
    • In general, designing the datapath first and then the control is a good strategy –>

<!– ## Temporal Processes of statemachine

<div align=right> <img align=right src=./media/002.png>ꭝ </div>

<div style="font-size:18pt">

  • Just after clock edge, state variables are updated. For controller this means that a state-transition is complete and state registers holds the new current state. For the data path, this means that the register variables are updated as a result of assigning (algebraic, boolean, logical, etc…) expressions to them.
  • Controller FSM Combines the control state and the data-path state to evaluate the new next state for the, at the same time it will also select what instructions should be executed by the datapath
  • The datapath FSM will evaluate the next-state for the state variables in the datapath, using the updated datapath state as well as the instructions received from the control FSM
  • Just before the next clock edge, both the control FSM and the datapath FSM have evaluated and prepared the next-state value for the control state as well as the datapath state

</div> –>

20. Divide and Conquer

  • Datapath operations can be encoded within the same procedural code as the state machine description or can be built separately.
  • First consider partitioning.
    • Datapath: Identify elements in the datapath from experience with traditional digital systems (e.g. communications modules, arithmetic modules, multiplexer's and demultiplexers, registers and multi-word buffers, FIFOs, IP Cores etc…).
    • Control: Identify the control signals required and the status/condition information required to make decisions on the control.
  • Datapath operations can be encoded in the state machine description or can be build separately.
    • If pre-built modules are used, as is common, the datapath is necessarily a separate description.
    • Often the datapath represents the algorithm calculations, separating the datapath code makes the code more readable
    • Coding datapath separately can allow more explicit influence over and insight into the resources used for computation while separating details of control code
    • Can sometimes think of datapath as implementing instructions for the controller to invoke/call
    • Separating the datapath code can allow better reuse under a different scheduling of operations (implementing a different algorithm)
    • In general, designing the datapath first and then the control is a good strategy

Controller (FSM)

  • Irregular Structure
  • Very well-suited for FSM-style description, difficult to describe with (boolean) expression-based
  • Registers represent the familiar sense of the state of the system

Datapath

  • Regular Structure
  • Often easy to describe with expressions or structurally build with blocks
  • Registers represent algorithmic states (intermediate or partial results)

21. State Machine with Numerous Branching and Merging

  • example find borg|car|cat|bat|bot|bet|bit and output \(\rm done\) flag where \(\rm done\) is a registered output

gen-word-fsm.svg

  • \(\rm done\) flag logic cannot be coded directly based on CS in each final state case
  • registered outputs in general must coded once per transition, though the existence of one common default may save some lines of code
always @ (posedge clk) begin

//defaults
CS<=failed;
found<=0; done<=0;

case (CS)
  ...
  recieved_a: if (input =='r') begin
                CS<=recieved_r_for_car;
                done<=1; found<=1;
              end else if (input =='t') begin
                CS<=recieved_t;
                done<=1; found<=1;
              end else begin
                //CS<=failed;  //covered by default
                done<=1; found<=0;
              end
  ...
  • Alternative "3-block style" with output logic coded with use of next state NS
begin
  CS<=failed; 
  found<=0; done<=0; //defaults
  case (NS)
    recieved_t:        begin done<=1;found<=1; end
    recieved_r_for_car:begin done<=1;found<=1; end
    recieved_g:        begin done<=1;found<=1; end
    failed:            begin done<=1;found<=0; end
    default:           begin done<=0; end
  ...

22. Algorithmic Statemachines

004.png

case(CS)
  S0: begin
     dx <= x1-x0;
     busy<= 1;
  end
  S1: begin
     dx <= y1-y0;
  end
  S2: begin
     dxSq <= dx*dx;
  end
  S3: begin
     dySq <= dy*dy;
  end
  S4: begin
     dsq <= dxSq+dySq;
     busy<= 0;
  end
endcase

  • Encoding datapath operations WITHIN in the statemachine description, along with the controller rather than in a separate code block, is common
  • It is common to see "algorithmic" statemachines described with control and computation embedded in the same procedural block. These can be modules which perform complex computations over multiple cycles and require internal working registers/memory
  • We'll first focus on control state machines first, with an emphasis on timing and external status and control signals then discuss computational statemachines

23. Data-Flow Graph and Datapath Hardware

2019-10-31-09-43-21.png

  • Data-Flow Graphs represents dependencies among operations in the process of an arithmetic algorithm (more compact than a full state diagram).
  • A algorithmic state machine performs one or more operations in a state (i.e. clock cycle) while satisfying the required order dependencies from the graph
  • Exercise: Draw the Datapath and Identify Status and Control as well as Pre-Register data names

004.png

case(CS)
  S0: begin
     dx <= x1-x0;
     busy<= 1;
  end
  S1: begin
     dx <= y1-y0;
  end
  S2: begin
     dxSq <= dx*dx;
  end
  S3: begin
     dySq <= dy*dy;
  end
  S4: begin
     dsq <= dxSq+dySq;
     busy<= 0;
  end
endcase

005.png (in-class drawing)

24. Extended State Registers

  • At times we'll want to include an extra register to store information that we don't want to code as part of our primary state register.
    • Examples:
      • partial results in the process of a multi-cycle computation
      • saved results to provide at the output ports at a later time
      • status flags for events that should be remembered and used in later processing and state decisions
      • event counters
      • timing counters, to control timing without creating additional states
  • These extended state registers are not necessarily represented by the number of drawn states state transition diagram or the primary coded state register, but formally they ARE part of the state of the system
  • Note the explicit state register and the additional extended state registers
    • Note that the output and status can be based one or more of the input, NS, and other state registers.

fsm-is-hardware.svg

  • Logic based on only the current state groups updates based on the current/source state, e.g. describes all output independently of transition away from the current state
case (CS)
  S0: begin flag=0; end //unconditional, combinational output in S0
  S1: begin flag=1; end
  • Including the input signals makes the output/update also input-dependant
case (CS)
  S0: begin 
    if (in==0) 
      flag=0;   //conditional, combinational output while in S0,
                //  updates immediately with input change
    else
      flag=1;   //conditional, combinational output while in S0,
                //  updates immediately with input change
    end 
  S1: begin flag=1; end
  • Including NS as a dependency selects edges based on the destination state. Note that NS itself may depend on the input. This is useful if complex parts of the output logic have already been capture in the NS logic description.
case (CS)
  S0: begin 
    if (NS==S3) 
      flag=0;   //conditional, combinational output while in S0,
                //  updates immediately with input change through immediate NS change
    else
      flag=1;   //conditional, combinational output while in S0,
                //  updates immediately with input change through immediate NS change
    end 
  S1: begin flag=1; end
  • Using a root case statement based on NS tends to organize updates based on the destination state (transitions into state). This is useful for registered output logic.
case (NS)
  S0: begin flag=0; end // combinational output, update seen immediately
                        // with input update that selects destination state S0
  S1: begin flag=1; end 
case (NS)
  S0: begin 
    if (CS==S3)   
      flag=0;          // combinational output, update applied immediately
                       // with input update that selects destination state S0
                       // applies only if current state is S3  
    else
      flag=1;
    end 
  S1: begin flag=1; end
  • Using NS useful for registered output logic.

    always @ (posedge clk) begin
      case (NS)
        S0: begin flag<=0; end //glitch-free update when CS becomes S0
        S1: begin flag<=1; end //glitch-free update when CS becomes S1
    

25. Registered Output for Partitioning and Timing

One of the advantages of registered output design is in design partitioning and addressing the timing of a design within each partition.

Registered Ouput Modules:

gen-registered-output-modules.svg

Assembled Registered Ouput Modules: * Length of Combinatorial Paths are matched to those in the partitions

gen-regular-assembled.svg

Irregular Modules:

modules-irregular.svg

Assembled Irregular Modules:

  • Length of paths is determined by factors across partitions
  • Logic Optimizer and designer therefore have different perspectives of the design
  • Output path has under-specified timing constraint that may need to account for delays of output

irregular-assembled.svg

  • Improper assembly with combinational outputs can create combinational loops, whereas having all module output be registers avoids this.

    assembled-comb-loops-bad.svg

26. Module Interaction

  • counter: temporal_fsm_Page 8.svg
  • counter time unraveled: temporal_fsm_Page 1.svg
  • counter with reset, used to count 0-17:
    temporal_fsm_Page 2.svg
  • controller used to implement counting 0-17: temporal_fsm_Page 3.svg
  • incorrect implementation and integration of controller with registered outputs:
    temporal_fsm_Page 4.svg
  • hack using prediction:
    • hack changing 17 to 16 implies that in the control FSM there is some predictive model of system 1's transition to the actionable state
    • not all systems interfaced and controlled are so easilty predictable
      temporal_fsm_Page 6.svg
  • use of combinatorial interface to provide in-cycle control decision temporal_fsm_Page 5.svg
  • integration as one fsm with registered outputs temporal_fsm_Page 7.svg

27. Control FSM Cycle Timing

  • Control FSM are commonly required to implement timing based on cycles, reading of status signals in particular cycles, and generating control signals at the appropriate time

28. Fast and Slow Systems

Many issues will arise in using systems that operate at different speeds, or don't operate in a predetermined fixed timing

fast-slow-0.svg

fast-slow-1.svg

Most generally

  • the master may assert a request until it is acknowledged
  • commonly, if a request and an ack are asserted in the same cycle, then it is assume that the data is consumed

timing-diagram-fast-slow-0.svg

  • for reading (e.g. results) the master may wait until a result is ready to read a response
    • a separate flag like done or data valid (presented for consumption) is common for this

timing-diagram-fast-slow-1.svg

29. Waiting – Conditional and Unconditional

  • It is common to require one or both of
    • conditional waiting (wait for condition)
    • unconditional waiting
  • These can be used to implement
    • Fixed waiting
      • wait a predetermined number of cycles
      • no condition
    • Minimum wait (Fixed+Conditional)
      • wait for some fix time and
      • then wait for a condition based on external input to be satisfied. This is useful when interfacing with external "slow" entities that need time after being signaled to send back a response.

Implementation of Waits

Some Options:

  • Add “top” level wait states to state machine in each place needed
  • Use a counter
    • a) Use external counter ( implement as an external state machine or component) and interface to it
    • b) embed something like a counter in the coding of the state machine, thus creating substates using the counter as an extended state register.
  • Create a single programmable wait state to jump to and return to from multiple states using a “jump” register extended state register

Unconditional Delay Using Extra States

006.png

Unconditional Delay Using Extended State Register Variable : Counter

007.png

Minimal Pause: Delay + Conditional Exit Using An Extra State (unconditional delay +conditional exit)

008.png

009.png

Minimum Wait: Delay + Conditional Exit Using Extended State Register Variable Counter

010.png

30. Programmable Wait State

Explicit Top-Level States vs Programmable State

  • Explicit

    011.png

  • Programmable using embedded state registers

    012.png

31. Thought Question

  • Is it better to grow the state register or use a separate counter variable?
    • no simple universal answer
  • Specific Example Delay/Pause/Waiting:
    • Think about the delay examples given earlier using a counter. The counter was implementing one form of hierarchical states.
    • Consider waiting for an acknowledgment signal for \(2^{20}\) clock cycles. This would require ~1 Millon states with the condition to move to the next state or proceed to end state if acknowledge was received.

32. "For Loop" State Machine

  • Extended-state registers can help implement loop behaviors
  • Example: create outer loop with 10 iterations and inner loop with 5 iterations

    013.png

33. Drawing Possible Hardware

Realizations of i,j registers and support hardware

  • After examination of the state diagram, three required behaviors are desired: \(\rm Hold\), \(\rm Increment\) , \(\rm Load\ 0\)
  • A primary FSM will generate the control signals

014.png

20240401112216.png

20240401112251.png

  • Note \(\rm i\) would be the output of the register, the output from the register could be called (\(\rm i\_next\) , \(\rm \_i\), \(\rm i\_comb\), \(\rm i\_int\), \(\rm i\_prereg\))

34. class discussion: timing diagram

017.png

35. Combinatorial vs Sequential Pitfalls

  • Wrongly using the output of a register instead of the input can be a pitfall when using single-always-block (registered outputs) and extended state registers

Wrong Code State D

SA:
  i<=0; ...
SD:
   i <= i+1;
   if (i<10) CS<=SA;
    else CS<=SE;
   ….

Correct Code State D

SD:
  next_i = i+1;
  if (next_i<10) CS<=SA;
  else      CS<=SE;
  i <= next_i;
   ….

36. Tweaking Question ( a question asked in class)

  • Could one just change i<10 to i<9?

Wrong Code State D

SA:
  i<=0; ...
SD:
   i <= i+1;
   if (i<10) CS<=SA; 
   else CS<=SE;
    ….

Tweaked / "Hacked"

SA:
  i<=0; ...
SD:
   i <= i+1;
   if (i<9) CS<=SA; 
   else CS<=SE;
    ….

  • Perhaps yes, but the reasoning is important and better presentation of coder's intent should be considered. This may produce the correct hardware, but it is an improper presentation of coder's design intent.
  • Lets say you are working on a class HW project – if you make the wrong version, run a simulation or debug in FPGA hardware, notice that one extra loop is performed and "tweak" the code to "just make it work" without understand the options, than that would not be a good reason. You want to be cognizant of and understand the different choices, then you have the knowledge and understanding to select the most appropriate implementation. Furthermore, at some point your code will be very large and systems will be complex – expecting to make many such tweaks is not a reasonable approach. You want to aim for as much to be correct as possible the first time.

37. Detecting Signal Change in Single-Clock-Domain Synchronous Logic

  • When looking for a change on an signal, avoid a careless temptation to "detect" edges using edge specifiers if it is not warranted to create a new clock domain. Ex:
always @ (posedge e)
  e_counter <= e_counter + 1;
  • Consider saving a previous version of the signal in a register, and using both present and previous input values to detect a transition:

20240401115019.png

always @ (posedge clk) begin
  if (~e_prev & e) e_counter <= e_counter + 1; 
  e_prev<=e;
end

which is the same as

always @ (posedge clk)begin
  e_prev<=e;
  if (~e_prev & e) e_counter <= e_counter + 1; 
end

One Reaction per Transition

one-action.svg

019.png

  • A slave device commonly starts processes based on a start signal from a master. A slave process may be fast or slow compared to a master. For instance, an instruction processor acting as a master controlling a slave through general I/O ports. Driven by software, it the processor may require many clock cycles to respond (software bit-banging and handshaking can be slow).
  • Considering a slow master applying a command signal, yet requiring many clock cycles to respond to a handshaking signal from a slave. The slave might falsely initiate a second round of activity if the command is asserted too long.

38. Transition detection with statemachines

change_detection_timing_dia.png

  • A slave state machine can use multiple states to analyze signals and examine changes

Rest State

rest_state.svg

NS=CS;
case (CS)
  WAIT_GO_LOW:
      if (go==0) NS=WAIT_FOR_GO_HIGH;
  WAIT_GO_HIGH:
      if (go==1) begin
        if (command='A') begin
           NS=START;
           working_value=data_in ... //can start some work
        end ...
      end
...
  • this is approach is simple if the specification allows for a rest state where a cycle is always spent waiting for go to return to go==0
  • if there are many possible source states before the direct move to WAIT_FOR_GO_HIGH, then logic must be duplicated

Rest State As-Needed

rest_state_optional.svg

case OPERATION_A: begin
      ...
   if (work_done) begin
      if (go==0) begin
        NS=WAIT_FOR_GO_LOW;
      end else begin
        NS=WAIT_FOR_GO_HIGH;
      end
      ...

using outside logic

Saving the previous input value:

always @ (posedge clk) go_prev<=go;

rest_state_optional_go_fsm.svg

This type of logic may be repeated in many states:

case WAIT_FOR_FRESH_GO:
      if (go==1 && go_prev==0)...

May generate a combinatorial fresh trigger detection to share:

assign go_fresh=go&&~go_prev;

rest_state_optional_go_fresh.svg

  • potential disadvantage is that this distributes control logic: consolidating logic in one FSM may streamline verification purposes

39. Discussion on Modern Handshaking and Stream

(on-board)

40. Limitations of FSM Model

  • FSM typically lack any hierarchy. There is no way to connect details of state-machines leading to state explosion. Otherwise, a hierarchy of states is quite useful for organizing an algorithm.
  • No run-time reconfiguration: The rigid implementation of hardware and the limited representation of FSMs do not make it very flexible model, especially if one considers run-time flexibility (state diagram is fixed and doesn't change based on MODE, etc…)

State Machine State Explosion

022_left.png

023.png

  • Consider two state machines. Maybe one captures user input like desired temperature, it has states, inputs and outputs appropriate to perform that task. Perhaps the other controls a heating element based on measured and desired temperature. Separated, they may be fairly simple, but what happens when they are described as a single, flat state machine?
  • A multiplicative effect in the number of states. Two three-state FSMs became one nine-state FSM
  • This motivates partitioning into multiple state machines in hardware design

Global Exceptions

  • Now, see what happens if a single new condition, a universal exception must be added.
  • A dramatic result from only one new condition. Readability and perhaps feasibility of mentally managing the FSM is severely impacted.

024.png


vs software…

  • A single if statement can be added, taking advantage of the hierarchy of logic embedded in code.

switch(current_state){
 case A: …
 case B: …
 case C: …
 case D: …
}

if (exception){
  do this
}else{
  switch(current_state){
   case A: …
   case B:
   case C:
   case D:
}

Delay/Pause/Waiting with Extended State Register

  • Think about the delay examples given earlier using a counter. The counter was implementing one form of hierarchical states.
  • Consider waiting for an acknowledgment signal for \(2^{20}\)220 clock cycles. This would require ~1 Millon states with the condition to move to the next state or proceed to end state if acknowledge was received.

Runtime Flexibility \({}^{\circ}\)<<>>∘

  • The rigid implementation of hardware and the limited representation of FSMs do not make it very flexible model, especially if one considers run-time flexibility

Microprogrammed Control \({}^{\circ}\)<<>>∘

  • Rigid next-state logic is replaced by a rigid next-address logic with a programmable control store.
  • Complex designs make require using/creating some form of compiler and a custom language

025.png

†Shaumont

General Processor Control \({}^{\circ}\)<<>>∘

  • Next step towards generalization is to use a general purpose processor
    • A number of models exist for doing that, from implementing custom hardware as a slave coprocessor that implements hardware-accelerated instructions (or functions) to treating the custom hardware modules and processor as cooperative peer components.
    • Interfacing is another issue and choices depend on rigidness of interfaces and which should be the master or slave.
      • Common choices:
        • Confirm hardware design to be able attach to processor bus
        • Write a hardware wrapper to attach hardware to system bus
        • Use general IO to interface processor to hardware
      • With respect to the processor, need to decide on interrupt or polling interface. Platforms should provide options for both for all the choices above

Multi-Cycle Computations as FSMs

  • In the following slides, we will consider examples of algorithmic FSMs. These are used to describe and synthesize multi-cycle computations. The number of state transitions defined are typically less than with a complex controller FSM.
  • A more regular, forward, orderly progression through the states is typical, and the designer's thought processes may be focused on the computation being performed in each state and the resulting partial results (in registers), as opposed to the output during the state.
    • Thus, it becomes more reasonable to use a single edge-triggered always block in the design. The fewer number of transition decisions (branches) makes the concern of code bloat negligible.

41. Algorithmic State Machine Diagram

<!--https://www.xilinx.com/content/dam/xilinx/support/documents/university/Vivado-Teaching/HDL-Design/2015x/Verilog/docs-pdf/lab11.pdf–>

An introduction to informal ASM Diagrams ** ASM Chart Elements

  • State Box
    • State Name (required)
    • State Encoding (optional)
    • Register Updates (denoted with <=)
    • Moore Machine Output Updates (combinational logic)
  • Conditional Rounded Box
    • Register Updates (optional, denoted with <=)
    • Mealy Machine Output Updates (optional, combinational logic)
  • Decision Diamond
    • Test Condition
    • Exit Paths
      • Two for if
      • two or more for case
  • General Notes:
    • Well-defined initial state, such as single entry point with actionless path to one initial state
    • Multiple exit paths may exist
    • Note that registers may only be updated once per clock cycle,
      • Improper to represent multiple updates in one cycle
    • Commonly outputs are written just by name (rather than out=1), and it is assumed the unspecified outputs default to 0

Example Euclid GCD:

function gcd(a, b)
    while (a != b) 
        if a > b
            a = a − b
        else
            b = b − a
    return a

gen-alg-fsm-gcd.svg

FSM Optimizations

  • As we are designing Multi-Cycle computations, we may consider two optimizations:
    • Rescheduling – moving internal operations to other states
    • Resource sharing – utilizing same component in multiple states
  • The following examples based on or borrowed from \(\dagger\)\(\dagger\) Thomas&Moorby

Module Header

module FSM_opt( 
    output reg [7:0] f,
    input clk,
    input wire [7:0] i,
    input wire [7:0] j,
    input wire [7:0] k,
    input rst     );

reg [7:0] CS;
reg [7:0] i_int,j_int,k_int,p,m;

localparam S_0 = 8'b00000000;
localparam S_1 = 8'b00000001;
localparam S_2 = 8'b00000010;
always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin
                i_int<=i; 
                j_int<=j;
                k_int<=k;
                CS<=S_1;
           end
      S_1: begin
                p<=i_int+j_int;
                CS<=S_2;
           end
      S_2: begin
                m=p*j_int;
                f<=m*k_int;
                CS<=S_0;       
            end
    endcase
  end  
end
endmodule

gen-alg-fsm-0.svg

pipeline1.svg

  • Multi-cycle computation: Note, the single-block style here was used since the state transition sequence is straightforward. Also, the data path is combined with the controller. We are not worried about "code bloat" from having to code each output on every transition. Instead of concentrating on timing coincidence of state and outputs vs coding coincidence of state and outputs, we are focused on the coincidence of states and computations. In this single block style the computation performed and resources required for each each are clear, though the corresponding output of each computation is seen and can be used on the cycle following the corresponding state indicated in the state register. In other words, we are interested in what update to registers is being computed and prepared in each state.

Rescheduling

  • Rescheduling this multiply allows for faster clock rates ( assuming two clock cycles were required at the system level). Some synthesizers may do similar types of rescheduling for you.
  • Reference Computation Module for discussion

Head

module FSM_opt( 
    output reg [7:0] f,
    output reg [7:0] g,
    input clk,
    input wire [7:0] i,
    input wire [7:0] j,
    input wire [7:0] k,
    input rst
    );
reg [7:0] CS;
reg [7:0] i_int,j_int,k_int;

localparam S_0 = 8'b00000000;
localparam S_1 = 8'b00000001;
localparam S_2 = 8'b00000010;

Body

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin 
        i_int<=i;
        j_int<=j;
        k_int<=k;
        CS<=S_1;  end        
      S_1: begin
        CS<=S_2;  end
      S_2: begin 
        f<=i_int*j_int;
        CS<=S_0;  end
    endcase
  end  
end
endmodule

Suggested RTL: 029.png

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin
                i_int<=i; 
                j_int<=j;
                k_int<=k;
                CS<=S_1;             
           end        
      S_1: begin 
                p_int=i_int+j_int;
                m<=p_int*j_int;    //**to here**
                CS<=S_2;
           end
      S_2: begin 
              //m=i_int*j_int;  //**move from here**
                f<=m*k_int; 
                CS<=S_0;       
            end
    endcase
  end  
end
endmodule

reference

gen-alg-fsm-1.svg

reschedule

gen-alg-fsm-2.svg

pipeline1_reschedule.svg

Multicycle Path Timing Constraint (Forward Discussion) \(^\circ\)

In the following version, all compuation is performed in the S2 cycle.

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin
                i_int<=i; 
                j_int<=j;
                k_int<=k;
                CS<=S_1;             
           end        
      S_1: begin 
                CS<=S_2;
           end
      S_2: begin 
                p=i_int+j_int;
                m=p*j_int;
                f<=m*k_int; 
                CS<=S_0;       
            end
    endcase
  end  
end
endmodule

pipeline1_multicyclepath.svg

In this specific case we can provided a relaxed timing constraint for the paths from the output of the registers for i,j,k to the input of f. We will later learn about manipulation constraints for the synthesizer to allow for multi-cycle paths, but for now we assume that all combinational chains must complete work within a clock cycle.

42. Resource Sharing

Reference Computation Module for discussion

Head

module FSM_opt( 
    output reg [7:0] f,
    output reg [7:0] g,
    input clk,
    input wire [7:0] i,
    input wire [7:0] j,
    input wire [7:0] k,
    input rst
    );
reg [7:0] CS;
reg [7:0] i_int,j_int,k_int;

localparam S_0 = 8'b00000000;
localparam S_1 = 8'b00000001;
localparam S_2 = 8'b00000010;

Body

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
    h<=0;
  end else begin
    case(CS)
      S_0: begin 
        i_int<=i;
        j_int<=j;
        k_int<=k;
        CS<=S_1;  end        
      S_1: begin
        CS<=S_2;  end
      S_2: begin 
        f<=i_int*j_int;
        h<=j_int*k_int; 
        CS<=S_0;  end
    endcase
  end  
end
endmodule

Suggested RTL

029.png

gen-alg-fsm-3.svg

Alternative RTL with lower gate count:

always @ (posedge clk) begin
  if (rst) begin
    f<=0; g<=0;
    CS<=S_0;
  end else begin
    case(CS)
      S_0: begin 
        i_int<=i;
        j_int<=j;
        k_int<=k;
        CS<=S_1;  end        
      S_1: begin

        CS<=S_2;  end
      S_2: begin 
        f<=i_int*j_int; //**
        g<=j_int*k_int; 
        CS<=S_0;  end
    endcase
  end  
end
endmodule

rtl_lower_gate_count.png

Explicitly coding the rescheduling so that only one multiply is performed per cycle is simple and seen in the next code. However, this doesn't ensure resource sharing as shown in the figure with one a single multiplier.

Alt. Module Body

always @ (posedge clk) begin
 if (rst) begin
   f<=0; g<=0;
   CS<=S_0;
 end else begin
  case(CS)
   S_0: begin
    i_int<=i;
    j_int<=j;
    k_int<=k;
    CS<=S_1;  end
   S_1: begin
    f_int<=i_int*j_int; //**moved to here**//
    CS<=S_2;  end
   S_2: begin
    f<=f_int;   //*timed output load*//
    g<=k_int*j_int;
    CS<=S_0;  end
  endcase
 end  
end
endmodule

gen-alg-fsm-4.svg

  • Suggesting Resource Sharing guides the synthesizer to reuse hardware for operations at multiple places in the code
  • The coding method depends on the synthesizer tool: the following code is more likely interpreted as resource sharing since the multiplier result is always written to the same variable
module fsm(f,g,i,j,k,rst,clk);
input [7:0] i,j,k;
output reg [7:0] f,g;
input rst,clk;
reg [3:0] CS;
localparam S_0=0,S_1=1,S_2=2;

always @ (posedge clk) begin:named
 reg [7:0] i_int,j_int,k_int,f_int;
 reg [7:0] mult_out;
 mult_out = 8'bx;
 if (rst) begin
   f<=0; g<=0;
   CS<=S_0;
 end else begin
  case(CS)
   S_0: begin
    i_int<=i;
    j_int<=j;
    k_int<=k;
    CS<=S_1;
   end
   S_1: begin
    mult_out = i_int*j_int;
    f_int <= mult_out;
    CS<=S_2;
   end
   S_2: begin
    f <= f_int;
    mult_out = k_int*j_int;
    g    <= mult_out;
    CS<=S_0;
   end
  endcase
 end  
end
endmodule

43. Another option is to manually extract the multiplier out of the edge-triggered code and write datapath logic explicitly for the multiplier paired with a mux along with a state machine to change the inputs

Multiplier input controlled using an explicit mux in the datapath:

module fsm(f,g,i,j,k,rst,clk);
input [7:0] i,j,k;
output reg [7:0] f,g;
input rst,clk;
reg [3:0] CS;
localparam S_0=0,S_1=1,S_2=2;

wire mult_a_sel;
wire [7:0] mult_out,mult_a,mult_b;
assign mult_out=mult_a * mult_b;
assign mult_a=mult_a_sel?k_int:i_int;
assign mult_b=j_int;


always @ (posedge clk) begin:named1
  i_int<=_i_int;
  j_int<=_j_int;
  f_int<=_k_int;
  f_int<=_f_int;
  g<=_g;
  f<=_f;
  CS<=NS;
end

always_comb begin:named2
 reg [7:0] i_int,j_int,k_int,f_int;
 if (rst) begin
  _i_int=8'bx;
  _j_int=8'bx;
  _k_int=8'bx;
  _f_int=8'bx;
  _g=0;
  _f=0;
  NS=S_0;
  mult_a_sel=1'bx;
 end else begin
  _i_int=i_int; 
  _j_int=j_int;
  _k_int=i_int;
  _f_int=f_int;
  _g=g;
  _f=f;
  mult_a_sel=1'bx;
  case(CS)
   S_0: begin
    _i_int=i;
    _j_int=j;
    _k_int=k;
    NS=S_1;
   end
   S_1: begin
    mult_a_sel=0;
    _f_int=mult_out;
    NS=S_2;
   end
   S_2: begin
    _f=f_int;   
    mult_a_sel=1;
    _g=mult_out;
    NS=S_0;
   end
  endcase
 end  
end
endmodule

  • Another variation with the mux logic embedded within the state machine

Multiplier input controlled using an implied mux in the datapath, requires that data to be passed through the statemachine description block:

module fsm(f,g,i,j,k,rst,clk);
input [7:0] i,j,k;
output reg [7:0] f,g;
input rst,clk;
reg [3:0] CS;
localparam S_0=0,S_1=1,S_2=2;

reg mult_a_sel; //** now reg instead of wire
wire [7:0] mult_out,mult_a,mult_b;
assign mult_out=mult_a * mult_b;
assign mult_b=j_int;


always @ (posedge clk) begin:named1
  i_int<=_i_int;
  j_int<=_j_int;
  f_int<=_k_int;
  f_int<=_f_int;
  g<=_g;
  f<=_f;
  CS<=NS;
end

always_comb begin:named2
 reg [7:0] i_int,j_int,k_int,f_int;
 if (rst) begin
  _i_int=8'bx;
  _j_int=8'bx;
  _k_int=8'bx;
  _f_int=8'bx;
  _g=0;
  _f=0;
  NS=S_0;
  mult_a=1'bx;
 end else begin
  _i_int=i_int; 
  _j_int=j_int;
  _k_int=i_int;
  _f_int=f_int;
  _g=g;
  _f=f;
  mult_a_sel=1'bx;
  case(CS)
   S_0: begin
    _i_int=i;
    _j_int=j;
    _k_int=k;
    NS=S_1;
   end
   S_1: begin
    mult_a=i_int;
    _f_int=mult_out;
    NS=S_2;
   end
   S_2: begin
    _f=f_int;   
    mult_a=k_int;
    _g=mult_out;
    NS=S_0;
   end
  endcase
 end  
end
endmodule

Another Rescheduling Example

always @ (posedge clk)case (CS) 
  S_0:begin q<=r+s;    CS<=S_1;  end
  S_1:begin            CS<=S_2;  end
  S_2:begin qout<=q+5; CS<=S_0;  end

always @ (posedge clk)case (CS) 
  S_0:begin q<=r+s;  CS<=S_1; end
  S_1:begin q<=q+5;  CS<=S_2; end
  S_2:begin qout<=q; CS<=S_0; end

Yet Another Rescheduling Example

Code Provided to the Synthesizer

input  [7:0] i,j,k;
output [7:0] f,h;
reg    [7:0] f,g,h,q,r,s;
always @ (posedge clk)case (CS) 
  S_0:begin r<=i*2; s<=j+j;           end
  S_1:begin f<=i+j; g<=j*23; CS<=S_1; end
  S_2:begin h<=f+k;          CS<=S_2; end
  S_3:begin f<=f*g; q<=r*s;  CS<=S_0; end
  ...
…
  • FSM Synthesizers can automatically try variations maintain input and output timing:
    • Movement of q=r*s using a new temporary register qint:

      S_0:begin r<=i+i; s<=j*2;                   end
      S_1:begin f<=i+j; g    <=j*23;    CS<=S_1;  end
      S_2:begin h<=f+k; q_int<=r*s;     CS<=S_2;  end //**
      S_3:begin f<=f*g; q    <=q_int;   CS<=S_0;  end //**
      
    • Movement of f*g: using an exisinting working register g:

      ...
      S_0:begin r<=i+i; s<=j*2;           end
      S_1:begin f<=i+j; g<=j*23; CS<=S_1; end
      S_2:begin h<=f+k; g<=f*g;  CS<=S_2; end //**
      S_3:begin f<=g;   q<=r*s;  CS<=S_0; end //**
      ...
      
  • ❌ Example of (Possibly) Invalid rearrangement for i+j+k

    • captures input k in state S1 instead of S2 and vice versa for input j
    • unless the optimizer tool or designer can determine from elsewhere in the design that input i and j do not change in those

    time cycles, input timing must be maintained just like output updates

      ...
    case (CS) 
    S_0:begin r<=i+i; s<=j*2;           end
    S_1:begin f<=i+k; g<=j*23; CS<=S_1; end //**
    S_2:begin h<=f+j;          CS<=S_2; end //**
    S_3:begin f<=f*g; q<=r*s;  CS<=S_0; end
    ...
    

State Encoding

  • The state encoding effects the size of the decoder, speed, dependent logic optimization, etc.
  • Encodings supported by Xilinx: http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/xst.pdf
    • Auto: In this mode, XST tries to select the best suited encoding algorithm for each FSM.
    • One-Hot: One-hot encoding is the default encoding scheme. Its principle is to associate one code bit and also one flip-flop to each state. At a given clock cycle during operation, one and only one bit of the state variable is asserted. Only two bits toggle during a transition between two states. One-hot encoding is very appropriate with most FPGA targets where a large number of flip-flops are available. It is also a good alternative when trying to optimize speed or to reduce power dissipation.

      00000001
      00000010
      00000100
      01000000
      10000000
    • Gray: Gray encoding guarantees that only one bit switches between two consecutive states. It is appropriate for controllers exhibiting long paths without branching. In addition, this coding technique minimizes hazards and glitches. Very good results can be obtained when implementing the state register with T flip-flops.

      00000000
      00000001
      00000011
      00000010
      00000110
    • Compact: Compact encoding consists of minimizing the number of bits in the state variables and flip-flops. This technique is based on hypercube immersion. Compact encoding is appropriate when trying to optimize area.
    • Johnson: Like Gray, Johnson encoding shows benefits with state machines containing long paths with no branching.

      00000000
      00000001
      00000011
      00000111
      11111111
      11111110
      11111100
      11111000
      10000000 -> rollover to all zeros
  • Sequential: Sequential encoding consists of identifying long paths and applying successive radix two codes to the states on these paths. Next state equations are minimized.
  • Speed1: Speed1 encoding is oriented for speed optimization. The number of bits for a state register depends on the particular FSM, but generally it is greater than the number of FSM states.
  • User: In this mode, XST uses original encoding, specified in the HDL file. For example, if you use enumerated types for a state register, then in addition you can use the ENUMENCODING constraint to assign a specific binary value to each state. Please refer to "Design Constraints" chapter for more details.
  • RAM-Based Finite State Machine (FSM) Synthesis: Large Finite State Machine (FSM) components can be made more compact and faster by implementing them in the block RAM resources provided in Virtex® devices and later technologies. FSM Style (FSMSTYLE) directs XST to use block RAM resources for FSMs.

Unreachable State Check

  • One of the benefits of extracting FSMs, is that additional analysis and design checks are available: "XST can detect unreachable states in an FSM. It lists them in the log file in the HDL Synthesis step."

Safe Finite State Machine (FSM) Implementation

  • XST can add logic to your Finite State Machine (FSM) implementation that will let your state machine recover from an invalid state. If during its execution, a state machine enters an invalid state, the logic added by XST will bring it back to a known state, called a recovery state. This is known as Safe Implementation mode.
  • By default, XST automatically selects a reset state as the recovery state. …
  • This feature is useful in system susceptible to corruption or as a way to handle undefined power-on initialization

Digital Encoding Key Points

  • One-hot encoding is good for speed and simplicity of state decoding logic to control datapath, and next-state logic is simple state transition
    • easy to detect invalid state
  • More compact codes such as standard binary encoding generally require a smaller state state register than one-hot encoding at the possible cost of size and speed.
    • But this depends on the density of combinatorial logic vs. registers the supporting HW platform and in the design.
    • FPGAs have many registers and so the cost of additional combinatorial logic may large compared to the savings from needing less registers.
  • Codes where only one or two bits change at a time in the state register may be beneficial.
    • Less transitions may lead to less power depending on overall design, e.g. if stage register output connects to a high-capacitance load (high-fanout)
    • Can minimize the chance of metastability errors (such as systems with tight timing, or radiation vulnerability).
    • These codes may also minimize logic glitches.

FSM Log Example

  • The XST log file reports the full information of recognized Finite State Machine (FSM) components during the Macro Recognition step. Moreover, if you allow XST to choose the best encoding algorithm for your FSMs, it reports the one it chose for each FSM.  As soon as encoding is selected, XST reports the original and final FSM encoding. If the target is an FPGA device, XST reports this encoding at the HDL Synthesis step. If the target is a CPLD device, then XST reports this encoding at the Low Level Optimization step.

Log File Example

Synthesizing Unit <fsm_1>.
Related source file is "/state_machines_1.vhd".
Found finite state machine <FSM_0> for signal <state>.
------------------------------------------------------
| States         | 4                 |
| Transitions    | 5                 |
| Inputs         | 1                 |
| Outputs        | 4                 |
| Clock          | clk (rising_edge) |
| Reset          | reset (positive)  |
| Reset type     | asynchronous      |
| Reset State    | s1                |
| Power Up State | s1                |
| Encoding       | automatic         |
| Implementation | LUT               |
------------------------------------------------------
Found 1-bit register for signal <outp>.
Summary:
inferred 1 Finite State Machine(s).
inferred 1 D-type flip-flop(s).
Unit <fsm_1> synthesized.
========================================================



HDL Synthesis Report
Macro Statistics
# Registers : 1 {#registers--1 }
1-bit register : 1
========================================================
========================================================
* Advanced HDL Synthesis *
========================================================
Advanced Registered AddSub inference ...
Analyzing FSM <FSM_0> for best encoding.
Optimizing FSM <state/FSM_0> on signal <state[1:2]> 
with gray encoding. 
-------------------
State | Encoding
-------------------
   s1 | 00
   s2 | 01
   s3 | 11
   s4 | 10
-------------------
=======================================================
HDL Synthesis Report
Macro Statistics
# FSMs : 1 {#fsms--1 }
=======================================================

Constraints/Compiler Directives

  • Most synthesizers support various instructions to modify synthesis behavior
  • Many options can be applied globally or on a per module instance or even per block level.
  • Some are entered in constraint files, as a command-line option or inline via special commented tags:
  • The comment below is an example of a synthesizer directive / inline-constraint:

    casex select // synthesis full_case
    4'b1xxx: res = data1;
    4'bx1xx: res = data2;
    4'bxx1x: res = data3;
    4'bxxx1: res = data4;
    

Xilinx FSM Constraints

Default case

  • You should not automatically add default case to synthesize a FSM since the logic has to cover many unnecessary states

    // synopsys translate_off
    default: \$display(“He’s dead, Jim.”) ;
    // synopsys translate_on
    

state variables

  • Explicit States are stored in the state register, but other registers for variables can exist serving as extended state variables
  • In-class example given
  • Technically, every register in a digital system is a part of the state. What variables you decide to think of as state in your state diagram and what is coded in the CS register and used in the case statement is up to you.
  • When there are many similar states, sometimes combining them in code and adding a register for a variable makes sense. This can reduce the state decoding and state logic and may make the code more maintainable and easier to read.

States, the state register, and registers for variables serving as extended state variables

  • In-class example given
  • Technically, every register in a digital system is a part of the state. What variables you decide to think of as state in your state diagram and what is coded in the CS register and used in the case statement is up to you.
  • When there are many similar states, sometimes combining them in code and adding a register for a variable makes sense. This can reduce the state decoding and state logic and may make the code more maintainable and easier to read.

Example state machine code if time allows

Author: Dr. Ryan Robucci

Created: 2024-10-14 Mon 18:27

Validate