Lecture 08 – Verilog Case-Statement Based State Machines

Ryan Robucci


Table of Contents

References

Slide Audience

Finite State Machine (FSM)

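[Figure: FSM block diagram. Combinational logic takes the input and produces the immediate output, the next state (NS), the next output, and the next status. Sequential logic holds the state register (CS) and the extended state registers, which provide the registered output and status.]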

Finite State Machine with Datapath

Book’s FSM Hardware Implementation

†Schaumont

Components of a modeling/description language for FSM

Software and Hardware

4 Rules of Proper FSMD

Hardware/Software Partitioning

Control and Datapath Partitioning

Temporal Processes of statemachine

  • Just after the clock edge, the state variables are updated. For the controller, this means that a state transition is complete and the state register holds the new current state. For the datapath, this means that the register variables have been updated as a result of assigning (algebraic, Boolean, logical, etc.) expressions to them.
  • The controller FSM combines the control state and the datapath state to evaluate the next state for the controller. At the same time, it selects which instructions the datapath should execute.
  • The datapath FSM evaluates the next state for the state variables in the datapath, using the updated datapath state as well as the instructions received from the control FSM.
  • Just before the next clock edge, both the control FSM and the datapath FSM have evaluated and prepared the next-state values for the control state and the datapath state.
†Schaumont
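
The four steps above can be sketched as one small controller/datapath pair. This is only an illustration under assumptions: the module name fsmd_sketch, the states S_LOAD and S_ADD, and the instruction signals load and add are hypothetical and do not come from the lecture or the book.

```verilog
// Hypothetical sketch: all names here are illustrative assumptions.
module fsmd_sketch(
    input            clk,
    input            rst,
    input      [7:0] d,
    output reg [7:0] acc
    );
reg CS, NS;            // control state register and its next-state value
reg load, add;         // instructions sent from the controller to the datapath
reg [7:0] acc_next;    // next-state value for the datapath register
localparam S_LOAD = 1'b0, S_ADD = 1'b1;

// Controller: combines control state and datapath state (acc[7])
// to select the next control state and the datapath instructions.
always @* begin
  load = 0; add = 0; NS = CS;
  case (CS)
    S_LOAD: begin load = 1; NS = S_ADD; end
    S_ADD:  begin
              if (acc[7]) NS = S_LOAD;  // datapath status steers the controller
              else        add = 1;
            end
  endcase
end

// Datapath: evaluates the next register value from the instructions.
always @* begin
  acc_next = acc;
  if (load)     acc_next = d;
  else if (add) acc_next = acc + d;
end

// Clock edge: both prepared next-state values are committed together.
always @(posedge clk) begin
  if (rst) begin CS <= S_LOAD; acc <= 0; end
  else     begin CS <= NS;     acc <= acc_next; end
end
endmodule
```

Between clock edges, the two combinational blocks settle on NS and acc_next; the single clocked block then updates both state registers at the edge.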

Controller FSM vs Datapath

  • Controller FSM

    • Irregular Structure
    • Very well suited to FSM-style description; difficult to describe with (Boolean) expressions
    • Registers represent the familiar sense of the state of the system
  • Datapath

    • Regular Structure
    • Often easy to describe with expressions or structurally build with blocks
    • Registers represent algorithmic states (intermediate or partial results)

Partition-and-Conquer Design

State Machine with High Amounts of Branching and Merging

Exercise: consider adding bat to the word list.

Algorithmic Statemachines

Data-Flow Graph

Number of Always Blocks

Extended State Registers

Note the explicit state register and the additional extended state registers.

[Figure: FSM block diagram. Combinatorial logic takes the input and produces the immediate output, the next state NS (labeled as prone to glitch and timing variation), the next output, and the next status. Sequential logic holds the state register (CS) and the extended state registers, which provide the registered output and status.]

Note that the output and status can be based on one or more of the input, NS, and the other state registers.

  // Output decoded from the current state only
  case (CS)
    S0: begin flag=0; end //unconditional, combinational output in S0
    S1: begin flag=1; end
  endcase

  // Output decoded from the current state and the input
  case (CS)
    S0: begin 
      if (in==0) 
        flag=0;   //conditional, combinational output while in S0,
                  //  updates immediately with input change
      else
        flag=1;   //conditional, combinational output while in S0,
                  //  updates immediately with input change
      end 
    S1: begin flag=1; end
  endcase

  // Output decoded from the current state and the next state
  case (CS)
    S0: begin 
      if (NS==S3) 
        flag=0;   //conditional, combinational output while in S0,
                  //  updates immediately with input change through immediate NS change
      else
        flag=1;   //conditional, combinational output while in S0,
                  //  updates immediately with input change through immediate NS change
      end 
    S1: begin flag=1; end
  endcase

  // Output decoded from the next state only
  case (NS)
    S0: begin flag=0; end // combinational output, update seen immediately
                          // with input update that selects destination state S0
    S1: begin flag=1; end 
  endcase

  // Output decoded from the next state and the current state
  case (NS)
    S0: begin 
      if (CS==S3)   
        flag=0;          // combinational output, update applied immediately
                         // with input update that selects destination state S0
                         // applies only if current state is S3  
      else
        flag=1;
      end 
    S1: begin flag=1; end
  endcase

Registered Output for Partitioning and Timing

One of the advantages of registered output design is in design partitioning and addressing the timing of a design within each partition.
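
A minimal sketch of a registered-output module follows. The module name reg_out_sketch, its ports, and the adder used as stand-in logic are assumptions made for illustration.

```verilog
// Hypothetical sketch: names and the adder are illustrative assumptions.
module reg_out_sketch(
    input            clk,
    input            rst,
    input      [7:0] a,
    input      [7:0] b,
    output reg [7:0] y       // driven directly by a flip-flop
    );
wire [7:0] y_next = a + b;   // all combinational delay stays inside the module

always @(posedge clk) begin
  if (rst) y <= 0;
  else     y <= y_next;
end
endmodule
```

Because y comes straight from a register, a downstream module sees a signal that changes only just after the clock edge, so the combinational paths of each partition can be timed separately.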

Registered Output Modules:

[Figure: modules A, B, and C. In each module a logic block feeds an output register: LA->RA, LB->RB, LC->RC.]

Assembled Registered Output Modules:

Irregular Modules:

[Figure: modules E, F, and G. Module E: register RE feeds logic LE (RE->LE). Module F: logic LF feeds register RF (LF->RF). Module G: logic LG only.]

Assembled Irregular Modules:

Control FSM Cycle Timing

Fast and Slow Systems

Many issues arise when using systems that operate at different speeds or that do not operate with predetermined, fixed timing.

[Figure: slow master and fast slave connected by Req and Ack signals]

[Figure: fast master and slow slave connected by Req and Ack signals]

Most generally

Waiting – Conditional and Unconditional

Implementation of Waits

Some Options:

Unconditional Delay Using Extra States

Unconditional Delay Using Extended State Register Variable : Counter
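
One possible form of a counter-based unconditional delay is sketched below. The module name wait_counter_sketch, the signals go and done, the state names, and the value of DELAY are all assumptions for illustration.

```verilog
// Hypothetical sketch: names and the DELAY value are illustrative assumptions.
module wait_counter_sketch(
    input      clk,
    input      rst,
    input      go,
    output reg done
    );
reg [1:0] CS;
reg [3:0] count;              // extended state register
localparam S_IDLE = 2'd0, S_WAIT = 2'd1, S_DONE = 2'd2;
localparam DELAY  = 4'd10;    // number of cycles spent in S_WAIT

always @(posedge clk) begin
  if (rst) begin
    CS <= S_IDLE; count <= 0; done <= 0;
  end else begin
    case (CS)
      S_IDLE: begin
        done  <= 0;
        count <= 0;
        if (go) CS <= S_WAIT;
      end
      S_WAIT: begin
        count <= count + 1;
        // compare with DELAY-1: count still holds its pre-edge value here
        if (count == DELAY-1) CS <= S_DONE;
      end
      S_DONE: begin
        done <= 1;
        CS   <= S_IDLE;
      end
      default: CS <= S_IDLE;
    endcase
  end
end
endmodule
```

A single wait state plus a counter replaces a chain of DELAY extra states, at the cost of a comparator and an incrementer.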

Minimal Pause: Delay + Conditional Exit Using an Extra Final Wait State (unconditional delay + conditional exit)

Another Variant:

Minimum Wait: Delay + Conditional Exit Using Extended State Register Variable Counter
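
A sketch of a minimum wait follows: the counter enforces a fixed number of cycles, after which the state exits only when an external condition holds. The module name min_wait_sketch, the ready input, and the MIN_WAIT value are assumptions for illustration.

```verilog
// Hypothetical sketch: names and the MIN_WAIT value are illustrative assumptions.
module min_wait_sketch(
    input      clk,
    input      rst,
    input      go,
    input      ready,         // external condition that permits the exit
    output reg done
    );
reg [1:0] CS;
reg [3:0] count;              // extended state register
localparam S_IDLE = 2'd0, S_WAIT = 2'd1, S_DONE = 2'd2;
localparam MIN_WAIT = 4'd4;   // minimum number of cycles spent in S_WAIT

always @(posedge clk) begin
  if (rst) begin
    CS <= S_IDLE; count <= 0; done <= 0;
  end else begin
    case (CS)
      S_IDLE: begin
        done  <= 0;
        count <= 0;
        if (go) CS <= S_WAIT;
      end
      S_WAIT: begin
        if (count != MIN_WAIT-1) count <= count + 1; // counter saturates
        else if (ready)          CS <= S_DONE;       // conditional exit
      end
      S_DONE: begin
        done <= 1;
        CS   <= S_IDLE;
      end
      default: CS <= S_IDLE;
    endcase
  end
end
endmodule
```

The counter stops at MIN_WAIT-1 rather than wrapping, so the state can be held for any number of cycles while ready stays low.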

Programmable Wait State

Thought Question

“For Loop” State Machine

Drawing Possible Hardware

Realizations of i,j registers and support hardware

class discussion: timing diagram

Combinatorial vs Sequential Pitfalls

Wrong Code State D

SA:
  i<=0; ...
SD:
  i <= i+1;
  if (i<10)   // BUG: tests the old value of i; the update above takes effect after the clock edge
    CS<=SA;
  else 
    CS<=SE;

Correct Code State D

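SD:
  next_i = i+1;     // blocking assignment: next_i is usable immediately
  if (next_i<10)    // tests the value that i will hold after the clock edge
    CS<=SA;
  else 
    CS<=SE;
  i <= next_i;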

Tweaking Question (a question asked in class)

Wrong Code State D

SA:
  i<=0; ...
SD:
   i <= i+1;
   if (i<10) CS<=SA; 
   else CS<=SE;

Tweaked / “Hacked”

SA:
  i<=0; ...
SD:
   i <= i+1;
   if (i<9) CS<=SA;   // old i<9 is equivalent to (i+1)<10
   else CS<=SE;

Detecting Signal Change in Single-Clock-Domain Synchronous Logic

always @ (posedge clk) begin
  if (~e_prev & e) e_counter <= e_counter + 1; 
  e_prev<=e;
end

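which is the same as the following, because nonblocking assignments read the values held before the clock edge, so the statement order does not matter: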

always @ (posedge clk)begin
  e_prev<=e;
  if (~e_prev & e) e_counter <= e_counter + 1; 
end

One Reaction per Transition

[Figure: FSM slave and slow master connected by Ack and Req signals]

Transition detection condition for statemachines
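
One way a fast FSM slave might react exactly once per Req assertion from a slow master is sketched below, reusing the previous-value register idea from the edge-detection code. The module name req_once_sketch, the n_served counter, and the state names are assumptions, and req is assumed to be already synchronous to clk.

```verilog
// Hypothetical sketch: names are illustrative assumptions.
module req_once_sketch(
    input            clk,
    input            rst,
    input            req,       // assumed already synchronous to clk
    output reg       ack,
    output reg [7:0] n_served   // counts serviced requests
    );
reg req_prev;
wire req_rise = ~req_prev & req;  // high for one cycle per 0->1 transition
reg CS;
localparam S_IDLE = 1'b0, S_ACK = 1'b1;

always @(posedge clk) begin
  req_prev <= req;
  if (rst) begin
    CS <= S_IDLE; ack <= 0; n_served <= 0;
  end else begin
    case (CS)
      S_IDLE: if (req_rise) begin   // react to the transition, not the level
                ack      <= 1;
                n_served <= n_served + 1;
                CS       <= S_ACK;
              end
      S_ACK:  if (!req) begin       // hold ack until the master releases req
                ack <= 0;
                CS  <= S_IDLE;
              end
    endcase
  end
end
endmodule
```

Even if the master holds req high for many slave clock cycles, n_served increases by one per request, because the slave waits in S_ACK until req returns low.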

Limitations of FSM

State Machine State Explosion

†Schaumont

Global Exceptions

†Schaumont

Global exception in software

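// Without a global exception: each state handles its own transitions
switch (current_state) {
  case A: /* ... */ break;
  case B: /* ... */ break;
  case C: /* ... */ break;
  case D: /* ... */ break;
}

// With a global exception: tested once, ahead of the per-state dispatch
if (exception) {
  /* do this */
} else {
  switch (current_state) {
    case A: /* ... */ break;
    case B: /* ... */ break;
    case C: /* ... */ break;
    case D: /* ... */ break;
  }
}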
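
The same structure carries over to a Verilog case-statement state machine: the exception test wraps the case statement. The sketch below is only an illustration; the module name global_exc_sketch, the linear A to D state sequence, and the choice of A as the recovery state are assumptions.

```verilog
// Hypothetical sketch: names and transitions are illustrative assumptions.
module global_exc_sketch(
    input            clk,
    input            rst,
    input            exception,
    output reg [1:0] CS
    );
localparam A = 2'd0, B = 2'd1, C = 2'd2, D = 2'd3;

always @(posedge clk) begin
  if (rst)
    CS <= A;
  else if (exception)
    CS <= A;              // global exception: one test covers every state
  else begin
    case (CS)
      A: CS <= B;
      B: CS <= C;
      C: CS <= D;
      D: CS <= A;
    endcase
  end
end
endmodule
```

Writing the exception once outside the case avoids repeating the same transition in every state, which is one way to limit the growth of the state-transition description.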

Delay/Pause/Waiting with Extended State Register

Runtime Flexibility

Microprogrammed Control

†Schaumont

General Processor Control

Multi-Cycle Computations as FSMs

Algorithmic State Machine Diagram

An introduction to informal ASM Diagrams

Example Euclid GCD:

function gcd(a, b)
    while (a != b) 
        if a > b
            a = a − b
        else
            b = b − a
    return a

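[ASM diagram: state INIT with decision start (false: remain in INIT; true: A<=X, B<=Y). State COMPUTE with decision A>B (true: A<=A-B; false: B<=B-A) and decision A==B (false: remain in COMPUTE; true: go to STOP). State STOP.]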
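
A case-statement FSMD for the GCD pseudocode might look like the sketch below. It follows the pseudocode rather than transcribing the ASM diagram exactly. The module name gcd_sketch, the done output, and the handshake on start are assumptions, and X and Y are assumed nonzero (as in the pseudocode, a zero operand would never terminate).

```verilog
// Hypothetical sketch: port names and handshake are illustrative assumptions.
module gcd_sketch(
    input            clk,
    input            rst,
    input            start,
    input      [7:0] X,       // assumed nonzero
    input      [7:0] Y,       // assumed nonzero
    output reg [7:0] A,       // holds gcd(X,Y) when done is high
    output reg       done
    );
reg [7:0] B;
reg [1:0] CS;
localparam INIT = 2'd0, COMPUTE = 2'd1, STOP = 2'd2;

always @(posedge clk) begin
  if (rst) begin
    CS <= INIT; done <= 0;
  end else begin
    case (CS)
      INIT: begin
        done <= 0;
        if (start) begin A <= X; B <= Y; CS <= COMPUTE; end
      end
      COMPUTE: begin               // one subtraction per clock cycle
        if (A == B)     CS <= STOP;
        else if (A > B) A <= A - B;
        else            B <= B - A;
      end
      STOP: begin
        done <= 1;
        if (!start) CS <= INIT;    // wait for start to be released
      end
      default: CS <= INIT;
    endcase
  end
end
endmodule
```

A and B act as extended state registers, while CS carries the control state; the while loop of the pseudocode becomes repeated visits to COMPUTE.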

FSM Optimizations

Module Header

module FSM_opt( 
    output reg [7:0] f,
    input clk,
    input wire [7:0] i,
    input wire [7:0] j,
    input wire [7:0] k,
    input rst     );

reg [7:0] CS;
reg [7:0] i_int,j_int,k_int,p,m;

localparam S_0 = 8'b00000000;
localparam S_1 = 8'b00000001;
localparam S_2 = 8'b00000010;
always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin
                i_int<=i; 
                j_int<=j;
                k_int<=k;
                CS<=S_1;
           end
      S_1: begin
                p<=i_int+j_int;
                CS<=S_2;
           end
      S_2: begin
                m=p*j_int;
                f<=m*k_int;
                CS<=S_0;       
            end
    endcase
  end  
end
endmodule

[State diagram: S0: i_int<=i, j_int<=j, k_int<=k; S1: p<=i_int+j_int; S2: with m=p*j_int, f<=m*k_int]

Rescheduling

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin
                i_int<=i; 
                j_int<=j;
                k_int<=k;
                CS<=S_1;             
           end        
      S_1: begin 
                p=i_int+j_int;   // blocking: p is a same-cycle intermediate value
                m<=p*j_int;
                CS<=S_2;
           end
      S_2: begin 
                f<=m*k_int; 
                CS<=S_0;       
            end
    endcase
  end  
end
endmodule

[State diagram: S0: i_int<=i, j_int<=j, k_int<=k; S1: with p=i_int+j_int, m<=p*j_int; S2: f<=m*k_int]

Some synthesizers may do similar types of rescheduling for you.
Xilinx tools command to perform a “retiming”: https://www.xilinx.com/support/answers/65410.html

Multicycle Path Timing Constraint (Forward Discussion)

In the following version, all computation is performed in the S_2 cycle.

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
  end else begin
    case(CS)
      S_0: begin
                i_int<=i; 
                j_int<=j;
                k_int<=k;
                CS<=S_1;             
           end        
      S_1: begin 
                CS<=S_2;
           end
      S_2: begin 
                p=i_int+j_int;
                m=p*j_int;
                f<=m*k_int; 
                CS<=S_0;       
            end
    endcase
  end  
end
endmodule

In this specific case we can provide a relaxed timing constraint for the paths from the outputs of the registers for i, j, and k to the input of f, because those registers are loaded in S_0 and f is not captured until the end of the S_2 cycle. We will later learn how to manipulate constraints for the synthesizer to allow for multi-cycle paths, but for now we assume that every combinational chain must complete its work within one clock cycle.

Resource Sharing

Reference Computation Module for discussion

Head

module FSM_opt( 
    output reg [7:0] f,
    output reg [7:0] g,
    input clk,
    input wire [7:0] i,
    input wire [7:0] j,
    input wire [7:0] k,
    input rst
    );
reg [7:0] CS;
reg [7:0] i_int,j_int,k_int;

localparam S_0 = 8'b00000000;
localparam S_1 = 8'b00000001;
localparam S_2 = 8'b00000010;

Body

always @ (posedge clk) begin
  if (rst) begin
    CS<=S_0;
    f<=0;
    g<=0;
  end else begin
    case(CS)
      S_0: begin 
        i_int<=i;
        j_int<=j;
        k_int<=k;
        CS<=S_1;  end        
      S_1: begin
        CS<=S_2;  end
      S_2: begin 
        f<=i_int*j_int;
        g<=j_int*k_int;   // two multiplies in the same cycle
        CS<=S_0;  end
    endcase
  end  
end
endmodule

Suggested RTL

[State diagram: S0: i_int<=i, j_int<=j, k_int<=k; S1: no operation; S2: f<=i_int*j_int, h<=j_int*k_int (h in the figure corresponds to output g in the code)]

Alternative RTL with lower gate count

always @ (posedge clk) begin
  if (rst) begin
    f<=0; g<=0;
    CS<=S_0;
  end else begin
    case(CS)
      S_0: begin 
        i_int<=i;
        j_int<=j;
        k_int<=k;
        CS<=S_1;  end        
      S_1: begin

        CS<=S_2;  end
      S_2: begin 
        f<=i_int*j_int; //**
        g<=j_int*k_int; 
        CS<=S_0;  end
    endcase
  end  
end
endmodule

Explicitly coding the rescheduling so that only one multiply is performed per cycle is simple, as seen in the next code. However, this alone does not ensure the resource sharing shown in the figure, which uses a single multiplier.

Alt. Module Body

always @ (posedge clk) begin
 if (rst) begin
   f<=0; g<=0;
   CS<=S_0;
 end else begin
  case(CS)
   S_0: begin
    i_int<=i;
    j_int<=j;
    k_int<=k;
    CS<=S_1;  end
   S_1: begin
    f_int<=i_int*j_int; //**moved to here**// (f_int is an additional reg [7:0])
    CS<=S_2;  end
   S_2: begin
    f<=f_int;   //*timed output load*//
    g<=k_int*j_int;
    CS<=S_0;  end
  endcase
 end  
end
endmodule

[State diagram: S0: i_int<=i, j_int<=j, k_int<=k; S1: h_int<=j_int; S2: f<=i_int*j_int, h<=h_int]

module fsm(f,g,i,j,k,rst,clk);
input [7:0] i,j,k;
output reg [7:0] f,g;
input rst,clk;
reg [3:0] CS;
localparam S_0=0,S_1=1,S_2=2;

always @ (posedge clk) begin:named
 reg [7:0] i_int,j_int,k_int,f_int;
 reg [7:0] mult_out;
 mult_out = 8'bx;
 if (rst) begin
   f<=0; g<=0;
   CS<=S_0;
 end else begin
  case(CS)
   S_0: begin
    i_int<=i;
    j_int<=j;
    k_int<=k;
    CS<=S_1;  end
   S_1: begin
    mult_out=i_int*j_int;  //** combinational mult. output; mux multiplier input from i
    f_int<=mult_out;       //** save multiplier output for output f
    CS<=S_2;  end
   S_2: begin
    f<=f_int;              // reveal new f
    mult_out=k_int*j_int;  //** combinational mult. output; mux multiplier input from k
    g<=mult_out;           //** save multiplier output for g
    CS<=S_0;  end
  endcase
 end  
end
endmodule

Multiplier input controlled using an explicit mux in the datapath:

module fsm(f,g,i,j,k,rst,clk);
input [7:0] i,j,k;
output reg [7:0] f,g;
input rst,clk;
reg [3:0] CS,NS;
localparam S_0=0,S_1=1,S_2=2;

reg [7:0] i_int,j_int,k_int,f_int;       // state registers
reg [7:0] _i_int,_j_int,_k_int,_f_int;   // register inputs (next values)
reg [7:0] _f,_g;
reg mult_a_sel;                          // mux control, driven by the FSM block
wire [7:0] mult_out,mult_a,mult_b;
assign mult_out=mult_a * mult_b;         // combinational multiplier output
assign mult_a=mult_a_sel?k_int:i_int;    // multiplier input multiplexer
assign mult_b=j_int;


always @ (posedge clk) begin:named1
  i_int<=_i_int;
  j_int<=_j_int;
  k_int<=_k_int;
  f_int<=_f_int;
  g<=_g;
  f<=_f;
  CS<=NS;
end

always_comb begin:named2
 if (rst) begin
  _i_int=8'bx; //avoid generating unnecessary enable or reset for control signal rst
  _j_int=8'bx;
  _k_int=8'bx;
  _f_int=8'bx;
  _g=0;            //reset
  _f=0;            //reset
  NS=S_0;          //reset state
  mult_a_sel=1'bx; //don't care for mux control on reset
 end else begin
  _i_int=i_int;    //default register inputs are register outputs (i.e. no change)
  _j_int=j_int;
  _k_int=k_int;
  _f_int=f_int;
  _g=g;
  _f=f;
  mult_a_sel=1'bx; //mux control default don't care
  case(CS)
   S_0: begin
    _i_int=i;
    _j_int=j;
    _k_int=k;
    NS=S_1;  end
   S_1: begin
    mult_a_sel=0;    //mux control: mult_out=i_int*j_int //**
    _f_int=mult_out; //f_int register update //**
    NS=S_2;  end
   S_2: begin
    _f=f_int;   
    mult_a_sel=1;    //mux control: mult_out=k_int*j_int //**
    _g=mult_out;     //g register update //**
    NS=S_0;  end
   default: NS=S_0;  //added: unused encodings return to S_0, avoids a latch on NS
  endcase
 end  
end
endmodule

Multiplier input controlled using an implied mux in the datapath. This requires the data to be passed through the state machine description block:

module fsm(f,g,i,j,k,rst,clk);
input [7:0] i,j,k;
output reg [7:0] f,g;
input rst,clk;
reg [3:0] CS,NS;
localparam S_0=0,S_1=1,S_2=2;

reg [7:0] i_int,j_int,k_int,f_int;       // state registers
reg [7:0] _i_int,_j_int,_k_int,_f_int;   // register inputs (next values)
reg [7:0] _f,_g;
wire [7:0] mult_out,mult_b;
reg [7:0] mult_a;                  //** set inside the FSM block
assign mult_out=mult_a * mult_b;   // multiplication with inputs
assign mult_b=j_int;               // fix one multiplier input


always @ (posedge clk) begin:named1
  i_int<=_i_int;
  j_int<=_j_int;
  k_int<=_k_int;
  f_int<=_f_int;
  g<=_g;
  f<=_f;
  CS<=NS;
end

always_comb begin:named2
 if (rst) begin
  _i_int=8'bx; //avoid unnecessary enables or resets
  _j_int=8'bx;
  _k_int=8'bx;
  _f_int=8'bx;
  _g=0;
  _f=0;
  NS=S_0;
  mult_a=8'bx; //**
 end else begin
  _i_int=i_int; 
  _j_int=j_int;
  _k_int=k_int;
  _f_int=f_int;
  _g=g;
  _f=f;
  mult_a=8'bx; //** set multiplier default input as don't care
  case(CS)
   S_0: begin
    _i_int=i;
    _j_int=j;
    _k_int=k;
    NS=S_1;  end
   S_1: begin
    mult_a=i_int;    //** set multiplier input; mux control generated implicitly
    _f_int=mult_out;       
    NS=S_2;  end
   S_2: begin
    _f=f_int;   
    mult_a=k_int;    //** set multiplier input; mux control generated implicitly
    _g=mult_out;          
    NS=S_0;  end
   default: NS=S_0;  //added: unused encodings return to S_0, avoids a latch on NS
  endcase
 end  
end
endmodule

Another Rescheduling Example

Version 1

always @ (posedge clk) case (CS) 
  S_0:begin q<=r+s;    CS<=S_1;  end
  S_1:begin            CS<=S_2;  end
  S_2:begin qout<=q+5; CS<=S_0;  end
endcase

Version 2

always @ (posedge clk) case (CS) 
  S_0:begin q<=r+s;  CS<=S_1; end
  S_1:begin q<=q+5;  CS<=S_2; end
  S_2:begin qout<=q; CS<=S_0; end
endcase

Yet Another Rescheduling Example

Code Provided to the Synthesizer

input  [7:0] i,j,k;
output [7:0] f,h;
reg    [7:0] f,g,h,q,r,s;
always @ (posedge clk)case (CS) 
  S_0:begin r<=i*2; s<=j+j;           end
  S_1:begin f<=i+j; g<=j*23; CS<=S_1; end
  S_2:begin h<=f+k;          CS<=S_2; end
  S_3:begin f<=f*g; q<=r*s;  CS<=S_0; end
  ...

State Encoding

Unreachable States

Safe Finite State Machine (FSM) Implementation

Encoding Review

FSM Log File

Log File Example

Synthesizing Unit <fsm_1>.
    Related source file is "/state_machines_1.vhd".
    Found finite state machine <FSM_0> for signal <state>.
    ------------------------------------------------------
    | States             | 4                             |
    | Transitions        | 5                             |
    | Inputs             | 1                             |
    | Outputs            | 4                             |
    | Clock              | clk (rising_edge)             |
    | Reset              | reset (positive)              |
    | Reset type         | asynchronous                  |
    | Reset State        | s1                            |
    | Power Up State     | s1                            |
    | Encoding           | automatic                     |
    | Implementation     | LUT                           |
    ------------------------------------------------------
    Found 1-bit register for signal <outp>.
    Summary:
        inferred 1 Finite State Machine(s).
        inferred 1 D-type flip-flop(s).
Unit <fsm_1> synthesized.

========================================================
HDL Synthesis Report

Macro Statistics
# Registers        : 1
 1-bit register    : 1
========================================================

========================================================
*              Advanced HDL Synthesis                  *
========================================================

Advanced Registered AddSub inference ...
Analyzing FSM <FSM_0> for best encoding.
Optimizing FSM <state/FSM_0> on signal <state[1:2]> with gray encoding.
-------------------
 State | Encoding
-------------------
 s1    | 00
 s2    | 01
 s3    | 11
 s4    | 10
-------------------

=======================================================
HDL Synthesis Report

Macro Statistics
# FSMs             : 1
=======================================================

Constraints/Compiler Directives

Xilinx FSM Constraints

Default case
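
A default branch in the case statement is one common way to give unused state encodings a defined exit. The sketch below is an illustration only: the module name default_case_sketch, the input in, and the three-state sequence are assumptions. Whether a synthesizer keeps this recovery logic after FSM extraction and re-encoding depends on the tool and its settings.

```verilog
// Hypothetical sketch: names and transitions are illustrative assumptions.
module default_case_sketch(
    input            clk,
    input            rst,
    input            in,
    output reg [1:0] CS
    );
localparam S0 = 2'd0, S1 = 2'd1, S2 = 2'd2;  // encoding 2'd3 is unused

always @(posedge clk) begin
  if (rst) CS <= S0;
  else begin
    case (CS)
      S0: if (in) CS <= S1;
      S1: CS <= S2;
      S2: CS <= S0;
      default: CS <= S0;   // unused encoding: return to a known state
    endcase
  end
end
endmodule
```

Without the default branch, a state register that lands on 2'd3 (for example after an upset) would hold that value indefinitely, since no case item assigns CS.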

state variables

Additional Synthesis Options and Directives

http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/pp_db_xst_hdl_synthesis_options.htm

Example state machine code if time allows
