Lecture 16 – Signed Integers and Arithmetic

Ryan Robucci

Table of Contents

References

Signed and Unsigned Integers

Unsigned Interpretation:
$$\sum_{i=0}^{n-1} 2^i x_i = \textcolor{red}{2^{n-1} x_{n-1}} + \sum_{i=0}^{n-2} 2^i x_i$$

Signed Interpretation (MSB used for representing Negative Numbers)

$$\textcolor{red}{-2^{n-1} x_{n-1}} + \sum_{i=0}^{n-2} 2^i x_i$$
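The two interpretations differ only in the weight of the MSB. A minimal C sketch of both weighted sums follows; the function names `interpret_unsigned` and `interpret_signed` are illustrative and not part of the lecture, and the bit pattern is assumed to be passed in the low `n` bits of a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned interpretation: sum of 2^i * x_i over all n bits. */
uint32_t interpret_unsigned(uint32_t bits, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t value = 0;
    for (unsigned i = 0; i < n; i++) {
        uint32_t x_i = (bits >> i) & 1u;   /* extract bit i */
        value += x_i << i;                 /* weight +2^i */
    }
    return value;
}

/* Signed interpretation: same sum for bits 0..n-2, but the MSB has weight -2^(n-1). */
int32_t interpret_signed(uint32_t bits, unsigned n)
{
    assert(n >= 1 && n <= 31);
    int32_t value = 0;
    for (unsigned i = 0; i + 1 < n; i++) {
        int32_t x_i = (int32_t)((bits >> i) & 1u);
        value += x_i * ((int32_t)1 << i);  /* weight +2^i */
    }
    int32_t msb = (int32_t)((bits >> (n - 1)) & 1u);
    value -= msb * ((int32_t)1 << (n - 1)); /* weight -2^(n-1) */
    return value;
}
```

For the 4-bit pattern 1111, `interpret_unsigned(0xF, 4)` gives 15 while `interpret_signed(0xF, 4)` gives -1.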

Negative Numbers in Two's Complement ($2^N - X$)

Two’s Complement Wheel

Helps visualize

  • bit interpretation
  • modular arithmetic
    • e.g. $(X+Y) \% 2^4$
  • overflow
Image source: https://stackoverflow.com/questions/55145028/binary-ones-complement-in-python-3, though origin uncertain

Conversion Arithmetic

Conversion for 4-bit Data

| Bits $x_3x_2x_1x_0$ | Unsigned | Relation | Signed |
|---|---|---|---|
| 1111 | 15 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-1} = 15 - 2^4$ |
| 1110 | 14 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-2} = 14 - 2^4$ |
| 1101 | 13 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-3} = 13 - 2^4$ |
| 1100 | 12 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-4} = 12 - 2^4$ |
| 1011 | 11 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-5} = 11 - 2^4$ |
| 1010 | 10 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-6} = 10 - 2^4$ |
| 1001 | 9 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-7} = 9 - 2^4$ |
| 1000 | 8 | $\xtofrom[+16]{-16}$ | $\textcolor{red}{-8} = 8 - 2^4$ |
| 0111 | $\textcolor{blue}{7}$ | = | $\textcolor{blue}{+7}$ |
| 0110 | $\textcolor{blue}{6}$ | = | $\textcolor{blue}{+6}$ |
| 0101 | $\textcolor{blue}{5}$ | = | $\textcolor{blue}{+5}$ |
| 0100 | $\textcolor{blue}{4}$ | = | $\textcolor{blue}{+4}$ |
| 0011 | $\textcolor{blue}{3}$ | = | $\textcolor{blue}{+3}$ |
| 0010 | $\textcolor{blue}{2}$ | = | $\textcolor{blue}{+2}$ |
| 0001 | $\textcolor{blue}{1}$ | = | $\textcolor{blue}{+1}$ |
| 0000 | $\textcolor{blue}{0}$ | = | $\textcolor{blue}{+0}$ |
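The relation column amounts to subtracting $2^n$ when the MSB is set (unsigned to signed) and adding $2^n$ when the value is negative (signed to unsigned). A short C sketch of that rule, generalized from 4 bits to `n` bits; the helper names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned n-bit code -> signed value: subtract 2^n when the MSB is set. */
int32_t unsigned_to_signed(uint32_t u, unsigned n)
{
    assert(n >= 1 && n <= 30);
    int32_t modulus = (int32_t)1 << n;          /* 2^n */
    int32_t half = modulus >> 1;                /* 2^(n-1), weight of the MSB */
    int32_t value = (int32_t)(u & (uint32_t)(modulus - 1));
    return (value >= half) ? value - modulus : value;
}

/* Signed value -> unsigned n-bit code: add 2^n when the value is negative. */
uint32_t signed_to_unsigned(int32_t s, unsigned n)
{
    assert(n >= 1 && n <= 30);
    int32_t modulus = (int32_t)1 << n;
    assert(s >= -(modulus / 2) && s < modulus / 2); /* must be representable */
    return (uint32_t)((s < 0) ? s + modulus : s);
}
```

With `n = 4`, `unsigned_to_signed(10, 4)` returns -6 and `signed_to_unsigned(-6, 4)` returns 10, matching the 1010 row.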

Visual Example Sine Wave transmitted using 5-bit data:

Complement and Increment

Ones Complement

Sign-Magnitude

Long/short unsigned/signed Conversion Rules

(C/C++) Combinations of (dest type) = (source type) to consider

$$
\begin{align*}
\rm LHS &= \rm RHS \\
\text{(unsigned long)} &= \text{(unsigned long)} \\
\text{(unsigned long)} &= \text{(signed long)} \\
\text{(signed long)} &= \text{(unsigned long)} \\
\text{(signed long)} &= \text{(signed long)} \\
\\
\text{(unsigned short)} &= \text{(unsigned long)} \\
\text{(unsigned short)} &= \text{(signed long)} \\
\text{(signed short)} &= \text{(unsigned long)} \\
\text{(signed short)} &= \text{(signed long)} \\
\\
\text{(unsigned long)} &= \text{(unsigned short)} \\
\text{(unsigned long)} &= \text{(signed short)} \\
\text{(signed long)} &= \text{(unsigned short)} \\
\text{(signed long)} &= \text{(signed short)} \\
\\
\text{(unsigned short)} &= \text{(unsigned short)} \\
\text{(unsigned short)} &= \text{(signed short)} \\
\text{(signed short)} &= \text{(unsigned short)} \\
\text{(signed short)} &= \text{(signed short)}
\end{align*}
$$
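A few of these combinations can be exercised in C as a rough sketch. Fixed-width types (`uint8_t`, `int8_t`, `uint16_t`, `int16_t`) stand in for short and long here, which is an assumption made so the widths are unambiguous; the function names are illustrative only. Narrowing into a signed type is left out because its result for out-of-range values is implementation-defined in C before C23.

```c
#include <stdint.h>

/* Widening: the SOURCE type decides between zero extension and sign extension. */
uint16_t widen_u8_to_u16(uint8_t x) { return x; }            /* zero-extends */
uint16_t widen_s8_to_u16(int8_t x)  { return (uint16_t)x; }  /* sign-extends, then reinterpreted modulo 2^16 */
int16_t  widen_u8_to_s16(uint8_t x) { return x; }            /* zero-extends */
int16_t  widen_s8_to_s16(int8_t x)  { return x; }            /* sign-extends */

/* Narrowing into an unsigned type keeps the low 8 bits (value modulo 2^8). */
uint8_t narrow_u16_to_u8(uint16_t x) { return (uint8_t)x; }
uint8_t narrow_s16_to_u8(int16_t x)  { return (uint8_t)x; }
```

For example, `widen_s8_to_u16(-1)` yields 65535 while `widen_u8_to_u16(0xFF)` yields 255: the destination type does not change how the upper bits are filled.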

Long/short unsigned/signed Verilog Conversion Rule Demo

module conversion_demo;

wire [7:0] 
  u8x = 8'b11111111;              
wire signed [7:0] 
  s8x = 8'b11111111;   
wire [15:0]       
  u16x = 16'b1111_1111_1111_1111;    
wire signed [15:0] 
  s16x = 16'b1111_1111_1111_1111;   
wire        [15:0] u16y_ux = u8x;   
wire        [15:0] u16y_sx = s8x;
wire signed [15:0] s16y_ux = u8x;   
wire signed [15:0] s16y_sx = s8x;   

wire        [7:0] u8y_ux = u16x;    
wire        [7:0] u8y_sx = s16x;
wire signed [7:0] s8y_ux = u16x;    
wire signed [7:0] s8y_sx = s16x;   

initial begin
#0;
$display("u16y_ux:%16b,%7d",u16y_ux,u16y_ux);
//u16y_ux: 0000000011111111,     255
$display("u16y_sx:%16b,%7d",u16y_sx,u16y_sx);
//u16y_sx: 1111111111111111,   65535
$display("s16y_ux:%16b,%7d",s16y_ux,s16y_ux);
//s16y_ux: 0000000011111111,     255
$display("s16y_sx:%16b,%7d",s16y_sx,s16y_sx);
//s16y_sx: 1111111111111111,      -1
$display(" u8y_ux: %16b, %7d",u8y_ux,u8y_ux);
//u8y_ux:         11111111,      255
$display(" u8y_sx: %16b, %7d",u8y_sx,u8y_sx);
//u8y_sx:         11111111,      255
$display(" s8y_ux: %16b, %7d",s8y_ux,s8y_ux);
//s8y_ux:         11111111,       -1
$display(" s8y_sx: %16b, %7d",s8y_sx,s8y_sx);
//s8y_sx:         11111111,       -1
end
endmodule

Review Points on Bit Length

Adder Structures

Knowing the options for implementation of addition in the context of algorithms is important. Options are overviewed in later lectures.

Review: Sign Extension

Verilog: Sign Extension

wire signed [7:0] x,y;
wire signed [15:0] s1,s2;

assign s1 = {{8{x[7]}},x} + 
            {{8{y[7]}},y}; // explicit sign
                           // extension

assign s2 = x + y; // implicit sign extension
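The same idea can be sketched in C. The explicit version replicates bit 7 into the upper byte by hand, working on raw bit patterns; the implicit version relies on C's integer promotion of signed operands. The function names are illustrative and not taken from the lecture.

```c
#include <stdint.h>

/* Explicit sign extension: replicate bit 7 into bits 15..8. */
uint16_t sign_extend_8_to_16(uint8_t x)
{
    uint16_t upper = (x & 0x80u) ? 0xFF00u : 0x0000u;
    return (uint16_t)(upper | x);
}

/* 16-bit sum of two 8-bit two's-complement codes after explicit extension. */
uint16_t add_extended(uint8_t x, uint8_t y)
{
    return (uint16_t)(sign_extend_8_to_16(x) + sign_extend_8_to_16(y));
}

/* Implicit extension: signed operands are promoted to int with their sign preserved. */
int16_t add_implicit(int8_t x, int8_t y)
{
    return (int16_t)(x + y);
}
```

Both forms agree on the bit pattern: `add_extended(0xFF, 0xFF)` returns 0xFFFE, which is the 16-bit code for the -2 returned by `add_implicit(-1, -1)`.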

Run-time Overflow Detection for Addition

Takeaway

Run-time Overflow Detection for Subtraction

Takeaway

Run-time Overflow Detection for Addition

wire signed [7:0] x,y,s;
wire flagOverflow;

assign s = x+y; //context determined 
                //    8-bit addition
//overflow case is when the sign of the 
//  input operands are the same and
//  sign of result does not match
assign flagOverflow = (x[7] == y[7]) && 
                      (y[7] != s[7]);
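The rule in the comments (operands share a sign, result sign differs) can also be written in C. This is a sketch on raw 8-bit patterns, using unsigned arithmetic so the wraparound is well defined; `add8` and `add8_signed_overflow` are names assumed for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit wraparound sum computed on the raw bit patterns. */
uint8_t add8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x + y);
}

/* Signed overflow flag: input sign bits match, but the result sign bit differs. */
bool add8_signed_overflow(uint8_t x, uint8_t y)
{
    uint8_t s = add8(x, y);
    bool same_sign_inputs = ((x ^ y) & 0x80u) == 0;
    bool result_sign_differs = ((x ^ s) & 0x80u) != 0;
    return same_sign_inputs && result_sign_differs;
}
```

Adding 0x7F (+127) and 0x01 (+1) sets the flag, while adding 0x7F (+127) and 0xFF (-1) does not, since operands of opposite sign cannot overflow.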

Run-time Overflow Detection for Subtraction

Rounding Errors float to int

Takeaway

To achieve common rounding when casting from float to int

Affecting Conversion Rounding Error with Numerator Bias

Assume unsigned int A,B;

Example:

Takeaway

For positive integers, bias the numerator with half of the denominator to approximate division followed by rounding
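A minimal C sketch of this takeaway for unsigned operands follows. The function names are illustrative, and the sketch assumes `a + b / 2` does not wrap around the range of `unsigned`.

```c
#include <assert.h>

/* Plain integer division: the fractional part is discarded. */
unsigned div_truncate(unsigned a, unsigned b)
{
    assert(b != 0);
    return a / b;
}

/* Numerator biased by half the denominator: approximates round(a / b). */
unsigned div_round(unsigned a, unsigned b)
{
    assert(b != 0);
    return (a + b / 2) / b;   /* assumes a + b/2 does not overflow */
}
```

For A = 7 and B = 4 the exact quotient is 1.75; `div_truncate` returns 1 and `div_round` returns 2.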

First go big or first go small: choices for order of operations affect result

Takeaway

Impacts of Limited Precision in DSP Filters

Eliminating Overflow or Saturation with Input Prescaling

Increasing Effective Computation Precision with Input Prescaling

Log Scale

Avoid Unnecessary Operations

Saturation

Soft Max

Instantaneous value mapping and effect on sinusoidal time series:

Quantization / Round-off Noise

Limit Cycles

Working with signed and unsigned reg and wire

Verilog 2001 provides signed reg and wire vectors

Casting to and from signed may be implicit, or may be made explicit by using the $signed() and $unsigned() system functions

Verilog: Signed/Unsigned Casting

Truncation and Extension

Truncation

Signed and Unsigned Extension

Verilog Rules for expression bit lengths

Verilog Context-Determined Addition

Self-Determined Expression and Self-Determined Operands

Rules for expression bit lengths from IEEE Standards

5.4.1 Rules for expression bit lengths

The rules governing the expression bit lengths have been formulated so that most practical situations have a natural solution.

The number of bits of an expression (known as the size of the expression) shall be determined by the operands involved in the expression and the context in which the expression is given.

A self-determined expression is one where the bit length of the expression is solely determined by the expression itself—for example, an expression representing a delay value.

A context-determined expression is one where the bit length of the expression is determined by the bit length of the expression and by the fact that it is part of another expression. For example, the bit size of the right-hand expression of an assignment depends on itself and the size of the left-hand side.

Table 5-22 shows how the form of an expression shall determine the bit lengths of the results of the expression. In Table 5-22, i, j, and k represent expressions of an operand, and L(i) represents the bit length of the operand represented by i.

Multiplication may be performed without losing any overflow bits by assigning the result to something wide enough to hold it.
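C has a comparable concern, although its rules differ from Verilog's: the product is computed at the width of the (promoted) operands, so widening has to happen before the multiply rather than at the assignment. A small sketch, assuming a platform where `int` is at most 32 bits; the function names are illustrative.

```c
#include <stdint.h>

/* Product computed at 32 bits: the upper half is discarded (result modulo 2^32). */
uint32_t mul_narrow(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Widen one operand first so the multiply is carried out at 64 bits. */
uint64_t mul_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With both operands equal to 0x10000, `mul_narrow` returns 0 while `mul_wide` returns 0x100000000.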

Reference from IEEE Standards

IEEE Standard for Verilog Hardware Description Language IEEE Std 1364-2005

Obtaining the IEEE Verilog Specification

At UMBC, or offsite using the UMBC single-sign-on option, go to http://ieeexplore.ieee.org/
Search for IEEE Std 1364-2005
You’ll find

Addition and Mixed Sign

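module mixed_sign_demo;
// Declarations are not shown on the original slide; these widths and
// signedness are assumptions chosen to match the printed results.
reg signed [7:0]  neg_two, s;
reg        [7:0]  u;
reg        [15:0] x1, x2;
reg signed [15:0] y1, y2;
initial begin
neg_two = -2;
s = 1; u = 1;
x1 = u + neg_two; x2 = s + neg_two;
y1 = u + neg_two; y2 = s + neg_two;
$display("%b",x1); $display("%b",x2);
$display("%b",y1); $display("%b",y2); end
endmodule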

Result:

0000000011111111
1111111111111111
0000000011111111
1111111111111111

Signed/Unsigned Pitfall Example

. . .
reg [3:0] bottleStock = 10; //**unsigned**
always @ (posedge clk, negedge rst_)
if (rst_==0)
  bottleStock<=10;
else if (bottleStock >= 0) //always TRUE!!!
  bottleStock <= bottleStock-1;

Signed/Unsigned Pitfall Example

. . .
input wire [2:0] remove;
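reg signed [4:0] remainingStock = 10;//**signed**, 5 bits so that 10 fits (4-bit signed range is -8..7)
always @ (posedge clk, negedge rst_)
if (rst_==0)
  remainingStock<=10;
else if ((remainingStock-remove) >= 0) //always TRUE!!! unsigned remove makes the whole expression unsigned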
  remainingStock <= remainingStock-remove;

Scaling by Powers of Two using Shift

Takeaway

Multiplication

1011 (M bits) x 10010 (N bits):

Software Multiplication using Smaller Hardware Multipliers

Takeaway

EMULATING LARGE MULTIPLIERS

Multiplication Overflow

Multiplication by Low-Density Constants

Takeaway

Rounded Integer Division

$|\rm round(float(A)/float(B))| = (|A|+|B/2|)/|B|$

Takeaway

Rounded Division example with even denominator

$$
\begin{alignedat}{5}
&\text { floating-point division: } && \rm && 7.0 \underset{float}/ -4 && &&= -1.75 \\
&\text { rounded result: } && \rm round(&& 7.0 \underset{float}/ -4 && ) &&= -2.0 \\
&\text { cast result to int: } && \rm int( && 7.0 \underset{float}/ -4 && ) &&= -1 \\
&\text { bias float before cast: } && \rm int(( && 7.0 \underset{float}/ -4 && ) \red{- 0.5}) &&= \\
& && \rm int(( && \blue{-1.75} &&) \red{- 0.5}) &&= \\
& && \rm int( && && \blue{-2.25}) &&= -2 \\
&\text { integer biasing: } && \rm (( && 7\red{+(4/2)}) \underset{\rm int}/ -4 && ) &&= \\
& && \rm fix(( && 7.0\red{+(2)}) \underset{\rm float}/ -4 && ) &&= \\
& && \rm fix( && \blue{-2.25} && ) &&= -2 \\
\end{alignedat}
$$

Rounded Division example with odd denominator

$$
\begin{alignedat}{5}
&\text { floating-point division: } && \rm && 8.0 \underset{float}/ -3 && &&= -2.666... \\
&\text { rounded result: } && \rm round(&& 8.0 \underset{float}/ -3 && ) &&= -3.0 \\
&\text { cast result to int: } && \rm int( && 8.0 \underset{float}/ -3 && ) &&= -2 \\
&\text { bias float before cast: } && \rm int(( && 8.0 \underset{float}/ -3 && ) \red{- 0.5}) &&= \\
& && \rm int( && \blue{-2.666...} && \red{- 0.5} ) &&= \\
& && \rm int( && && \blue{-3.16...}) &&= -3 \\
&\text { integer biasing: } && \rm (( && 8\red{+(3/2)}) \underset{\rm int}/ -3 && ) &&= \\
& && \rm fix(( && 8.0\red{+(1)}) \underset{\rm float}/ -3 && ) &&= \\
& && \rm fix( && \blue{-3.0} && ) &&= -3 \\
\end{alignedat}
$$
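The two worked examples (7 / -4 and 8 / -3) can be reproduced with a sign-aware integer bias. The sketch below is one possible formulation, not necessarily the one intended by the slides: it adds half of the denominator when the operand signs match and subtracts it otherwise, then relies on C's truncating division. `round_div` is a hypothetical name, and the sketch assumes the biased numerator does not overflow `int`.

```c
#include <assert.h>

/* Rounded signed division: ties round away from zero. */
int round_div(int a, int b)
{
    assert(b != 0);
    int half = b / 2;               /* truncates toward zero, keeps the sign of b */
    if ((a >= 0) == (b >= 0)) {
        return (a + half) / b;      /* quotient >= 0: push numerator away from zero */
    } else {
        return (a - half) / b;      /* quotient <= 0: push numerator away from zero */
    }
}
```

`round_div(7, -4)` returns -2 and `round_div(8, -3)` returns -3, whereas the plain truncating expressions `7 / -4` and `8 / -3` give -1 and -2.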

Floor vs Fix vs Round for Signed Values

Positive-Biased Value Before Fix

Negative-Biased Value Before Fix

Sign-Dependent Bias Before Fix

To Properly Mimic Signed-Integer Division by a Power of Two with Shift, a Sign-Dependent Pre-Shift Bias is Required

Takeaway

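Division of $x$ by $2^k$ is defined to ROUND TOWARDS ZERO, whereas an arithmetic right shift rounds towards negative infinity. Mimicking division with an arithmetic right shift therefore requires pre-biasing negative $x$ by adding $2^k-1$: ( x + ((1<<k)-1) ) >> k when x < 0, and x >> k otherwise.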
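A C sketch of the sign-dependent pre-shift bias follows. It assumes that `>>` on a negative signed value is an arithmetic shift, which is implementation-defined in C (before C23) but is the behavior of common compilers; `div_pow2_shift` is a name chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Divide x by 2^k with round-toward-zero semantics, using only add and shift. */
int32_t div_pow2_shift(int32_t x, unsigned k)
{
    assert(k < 31);
    /* Bias of 2^k - 1 is applied only to negative inputs. */
    int32_t bias = (x < 0) ? (int32_t)((1u << k) - 1u) : 0;
    return (x + bias) >> k;   /* assumes arithmetic right shift for negative values */
}
```

For x = -7 and k = 2, the unbiased shift would give -2 on such platforms, while `div_pow2_shift(-7, 2)` returns -1, matching `-7 / 4`.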

Arbitrary Precision Arithmetic

Error Propagation

Error introduced by division:

Error propagation by multiplication:

Error introduced by multiplication:

Error propagated by division:

Error Metric:

Earlier examples illustrated error generation and propagation under reordering of operands.
Keep the decision to go big or go small in mind at each step of implementation in the next topic.
We will formally introduce an additional scale at each step and per operand.

Fixed-point arithmetic

Floating Point Math and Fixed-Point Math

Fixed-point arithmetic

QM.N notation

Addition and Subtraction with scale S (operands with same scale)

Multiplication with scale S

Division with scale S:

Bias before reducing precision

Example Fixed-Point Arith Library Code

Code examples in the below block are copied and provided under the Creative Commons Attribution-ShareAlike License: https://creativecommons.org/licenses/by-sa/3.0/

Wikipedia contributors. (2021, November 24). Q (number format). In Wikipedia, The Free Encyclopedia. Retrieved 18:10, November 29, 2021, from https://en.wikipedia.org/w/index.php?title=Q_(number_format)&oldid=1056933643

int16_t q_add_sat(int16_t a, int16_t b)
{
    int16_t result;
    int32_t tmp;

    tmp = (int32_t)a + (int32_t)b;
    if (tmp > 0x7FFF)
        tmp = 0x7FFF;
    if (tmp < -1 * 0x8000)
        tmp = -1 * 0x8000;
    result = (int16_t)tmp;

    return result;
}

// precomputed value:
#define K   (1 << (Q - 1))
 
// saturate to range of int16_t
int16_t sat16(int32_t x)
{
	if (x > 0x7FFF) return 0x7FFF;
	else if (x < -0x8000) return -0x8000;
	else return (int16_t)x;
}

int16_t q_mul(int16_t a, int16_t b)
{
    int16_t result;
    int32_t temp;

    temp = (int32_t)a * (int32_t)b; // result type is operand's type
    // Rounding; mid values are rounded up
    temp += K;
    // Correct by dividing by base and saturate result
    result = sat16(temp >> Q);

    return result;
}

int16_t q_div(int16_t a, int16_t b)
{
    /* pre-multiply by the base (Upscale to Q16 so that the result will be in Q8 format) */
    int32_t temp = (int32_t)a << Q;
    /* Rounding: mid values are rounded up (down for negative values). */
    /* OR compare most significant bits i.e. if (((temp >> 31) & 1) == ((b >> 15) & 1)) */
    if ((temp >= 0 && b >= 0) || (temp < 0 && b < 0)) {   
        temp += b / 2;    /* OR shift 1 bit i.e. temp += (b >> 1); */
    } else {
        temp -= b / 2;    /* OR shift 1 bit i.e. temp -= (b >> 1); */
    }
    return (int16_t)(temp / b);
}

Additional Operations

Equality Comparison

wire signed [7:0] x,y;
wire flagEq;

assign flagEq = (x==y);

This example may be implemented by eight xnor2 gates followed by an and8.
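The XNOR-then-AND structure can be mirrored bitwise in C as a rough illustration. `equal8` is a hypothetical helper, and a compiler would normally just use `x == y`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Equality as a reduction: per-bit XNOR, then an 8-input AND. */
bool equal8(uint8_t x, uint8_t y)
{
    uint8_t xnor_bits = (uint8_t)~(x ^ y);  /* bit i is 1 when x_i == y_i */
    return xnor_bits == 0xFFu;              /* all eight XNOR outputs are 1 */
}
```

Because only bit patterns are compared, the same circuit serves both signed and unsigned operands of equal width.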

Magnitude Comparator