L16 FPGA Technology III

Prof. Ryan Robucci

References

Mux vs LUT (repeat)

Mux

LUT

LUT as RAM (LUTRAM)

Hierarchy: LC, Slice, CLB

Complex Logic Block Cells (repeat)

Xilinx Datasheet

“Hard” IP vs “Soft” IP vs “Firm” IP (repeat)


Block RAM (repeat)


RAM / ROM with FPGA Designs

Registers plus three primary choices for larger memories give four options for implementing storage:

  1. Register Memory: slices/logic blocks have a number of built-in slice registers ("slice" is a Xilinx term)

    • fast
    • allows collocating memory and computation
    • can reduce routing
    • can serve as a local buffer for block RAM and external memory
    • DISADVANTAGE: limited number available
  2. Distributed RAM

    • a LUT (normally used for logic) or any other memory within a configurable cell can be used as distributed RAM
      • LUTRAM: LUT used as RAM
    • fast
    • allows collocating memory and computation
    • can reduce routing
    • can serve as a local buffer for block RAM and external memory
    • DISADVANTAGE: consumes resources otherwise used for logic implementation
  3. Block RAM

    • High-Density Dedicated RAM
    • less flexible
    • limited access, e.g. a dual read port allows reading only two values at a time
    • may require extensive routing and/or repeatedly copying contents to/from distributed RAM for many types of computations
    • may have registered and non-registered (e.g. for a large combinatorial LUT) options
  4. External RAM

    • large capacity SDRAM (synchronous DRAM) can be used off-chip
    • large memory applications require this
    • route and multiplex, cache into local memory
    • usually a synthesized or HARD memory controller for interfacing with memory is available on an FPGA platform

Depiction of a small region of an FPGA with Block RAM and Slices with LUTs:

Example report showing LUTs used as memory and Block RAM utilization:

...

1. Slice Logic
--------------

+----------------------------+-------+-------+------------+-----------+-------+
|          Site Type         |  Used | Fixed | Prohibited | Available | Util% |
+----------------------------+-------+-------+------------+-----------+-------+
| Slice LUTs*                | 18777 |     0 |          0 |     20800 | 90.27 |
|   LUT as Logic             | 18629 |     0 |          0 |     20800 | 89.56 |
|   LUT as Memory            |   148 |     0 |          0 |      9600 |  1.54 |
|     LUT as Distributed RAM |   148 |     0 |            |           |       |
|     LUT as Shift Register  |     0 |     0 |            |           |       |
| Slice Registers            | 17050 |     0 |          0 |     41600 | 40.99 |
|   Register as Flip Flop    | 16916 |     0 |          0 |     41600 | 40.66 |
|   Register as Latch        |   134 |     0 |          0 |     41600 |  0.32 |
| F7 Muxes                   |   671 |     0 |          0 |     16300 |  4.12 |
| F8 Muxes                   |    30 |     0 |          0 |      8150 |  0.37 |
+----------------------------+-------+-------+------------+-----------+-------+

...
 
2. Memory
---------
+-------------------+------+-------+------------+-----------+-------+
|     Site Type     | Used | Fixed | Prohibited | Available | Util% |
+-------------------+------+-------+------------+-----------+-------+
| Block RAM Tile    | 36.5 |     0 |          0 |        50 | 73.00 |
|   RAMB36/FIFO*    |   36 |     0 |          0 |        50 | 72.00 |
|     RAMB36E1 only |   36 |       |            |           |       |
|   RAMB18          |    1 |     0 |          0 |       100 |  1.00 |
|     RAMB18E1 only |    1 |       |            |           |       |
+-------------------+------+-------+------------+-----------+-------+

...
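As a quick check on how to read the report, the Util% column can be recomputed from the Used and Available columns. The sketch below is only illustrative: the helper name `util_percent` is made up here, the numbers are copied from the report above, and the half-tile accounting assumes that one RAMB18 occupies half of a 36 Kb Block RAM tile, which is consistent with the 36.5 shown in the Block RAM Tile row.

```python
def util_percent(used, available):
    """Return utilization as a percentage, rounded to two decimals."""
    return round(100.0 * used / available, 2)

# (site type, used, available) copied from the report above
rows = [
    ("Slice LUTs", 18777, 20800),
    ("LUT as Memory", 148, 9600),
    ("Slice Registers", 17050, 41600),
    ("Block RAM Tile", 36.5, 50),
]

for name, used, available in rows:
    print(f"{name:16s} {util_percent(used, available):6.2f}%")

# Assumption: a RAMB18 counts as half of a 36 Kb tile,
# so 36 RAMB36 + 1 RAMB18 gives the 36.5 tiles reported.
bram_tiles = 36 + 1 * 0.5
```

Note that "LUT as Memory" is measured against 9600 rather than 20800: only a subset of the LUTs in this device can be configured as memory.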




Block RAM:

Global view highlighting used cells:

Enlarged view of the same, with individual BMEM cells outlined next to the areas of programmable fabric:

Sync/Async Reads

Memory Inference Capabilities

(Xilinx:)
Memory inference capabilities include the following:
https://docs.xilinx.com/r/en-US/ug901-vivado-synthesis/Choosing-Between-Distributed-RAM-and-Dedicated-Block-RAM

Provided that only one write port is described, Vivado synthesis can identify RAM descriptions with two or more read ports that access the RAM contents at addresses different from the write address.
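One way to picture a memory with a single write port and several read ports is as replicated storage: every copy receives the same write, and each read port indexes its own copy. The Python model below is a behavioral sketch of that idea only. The class name `MultiReadRam` and its methods are hypothetical, and it is not a description of what Vivado actually infers for a given RAM description.

```python
class MultiReadRam:
    """One write port, several read ports, modeled by replicating the contents."""

    def __init__(self, depth, num_read_ports):
        # One full copy of the memory per read port (illustrative assumption).
        self.copies = [[0] * depth for _ in range(num_read_ports)]

    def clock(self, we, waddr, wdata):
        """Rising clock edge: a write updates every copy at the same address."""
        if we:
            for copy in self.copies:
                copy[waddr] = wdata

    def read(self, port, raddr):
        """Asynchronous read: each port indexes its own copy independently."""
        return self.copies[port][raddr]


ram = MultiReadRam(depth=16, num_read_ports=2)
ram.clock(we=True, waddr=3, wdata=0xA5)
ram.clock(we=True, waddr=7, wdata=0x5A)

# Two different addresses are read at once, one per read port.
values = (ram.read(0, 3), ram.read(1, 7))
```

The cost of this arrangement is storage: each additional read port adds another copy of the contents.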

Dual-Port RAM


(https://docs.xilinx.com/r/en-US/ug958-vivado-sysgen-ref/Dual-Port-RAM)

Modes for how synchronous writes affect reads from the same address in the same cycle

(based on Xilinx documentation)

  1. Read-first (old data read)
    • when a read and a write occur at the same address, old content is read before new content is loaded
  2. Write-first (new data read)
    • data written is immediately available in the same cycle for read
    • also known as read-through.
  3. No-change
    • active data write prevents read data output updates
    • must be followed by explicit read operation in a following cycle to see the result
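The three modes above can be compared with a small behavioral model. This is a sketch under stated assumptions, not vendor code: it models a single synchronous port with one shared address and a registered output, and the names `SyncRamPort`, `clock`, and `demo` are made up for illustration.

```python
class SyncRamPort:
    """Single synchronous RAM port with a selectable write mode."""

    def __init__(self, depth, mode="read_first"):
        assert mode in ("read_first", "write_first", "no_change")
        self.mem = [0] * depth
        self.mode = mode
        self.dout = 0  # registered read output

    def clock(self, addr, we=False, din=0):
        """One rising clock edge; returns the output register afterwards."""
        old = self.mem[addr]
        if we:
            self.mem[addr] = din
            if self.mode == "read_first":
                self.dout = old   # old content appears on the output
            elif self.mode == "write_first":
                self.dout = din   # new content appears on the output
            # no_change: the output register keeps its previous value
        else:
            self.dout = old       # ordinary read
        return self.dout


def demo(mode):
    port = SyncRamPort(depth=8, mode=mode)
    port.clock(addr=2, we=True, din=11)                 # preload address 2
    port.clock(addr=5)                                  # plain read of address 5
    during_write = port.clock(addr=2, we=True, din=22)  # overwrite address 2
    after_read = port.clock(addr=2)                     # explicit read next cycle
    return during_write, after_read


results = {m: demo(m) for m in ("read_first", "write_first", "no_change")}
```

In this model `results` holds `(11, 22)` for read-first, `(22, 22)` for write-first, and `(0, 22)` for no-change, where the `0` is the value left on the output by the earlier read of address 5.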

RAM/ROM Coding and Initialization

Hardware Multipliers (repeat)

Accumulators and MAC (repeat)

DSP slices (Xilinx)

source: https://docs.amd.com/r/2021.2-English/ug1483-model-composer-sys-gen-user-guide/DSP48E1

The Xilinx DSP48E block is an efficient building block for DSP applications that use supported devices. The DSP48E combines an 18-bit by 25-bit signed multiplier with a 48-bit adder and programmable mux to select the adder’s input.
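The arithmetic of such a block can be sketched as a multiply-accumulate with fixed operand widths. The Python model below is a rough illustration only: the function names `to_signed` and `mac` are hypothetical, the widths (25-bit and 18-bit signed operands, 48-bit result) are taken from the description above, and wrapping on overflow is an assumption of this sketch. The adder-input mux is modeled simply by what the caller passes as `acc`.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, a, b):
    """Return acc + a*b for 25-bit signed a, 18-bit signed b, wrapped to 48 bits."""
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, 48)


# Accumulate a short dot product: 3*4 + (-2)*5 + 1000*(-7)
acc = 0
for a, b in [(3, 4), (-2, 5), (1000, -7)]:
    acc = mac(acc, a, b)

# Passing 0 as acc corresponds to selecting a zero adder input (multiply only).
product_only = mac(0, 6, 7)
```

Feeding the previous result back as `acc` gives an accumulator; passing some other value gives a multiply-add.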

Embedded Processors (repeat)

Support Implementation of processing using traditional software (C code)

General Purpose IO (repeat)

Gigabit Transceivers (repeat)

What needs to be configured? (repeat)

The bitstream contains the data that is loaded to configure the FPGA.
this includes

Configuration Files

FPGA Configuration

🧐 Other Configuration Modes


There are other configuration modes that are discussed in the book, but you don’t need to be concerned with these right now. Just note that there are options.



Clock Trees and Clock Managers Overview (repeat)

Clock Tree and Manager Features Details (mentioned earlier)

(crop to image:)

Sample Listing of FPGA Features (repeat)

Text/table/image contents from https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf

🖹 Advertised Features

  • Advanced high-performance FPGA logic based on real 6-input look-up table (LUT) technology configurable as distributed memory.
  • 36 Kb dual-port block RAM with built-in FIFO logic for on-chip data buffering.
  • High-performance SelectIO™ technology with support for DDR3 interfaces up to 1,866 Mb/s.
  • High-speed serial connectivity with built-in multi-gigabit transceivers from 600 Mb/s to max. rates of 6.6 Gb/s up to 28.05 Gb/s, offering a special low-power mode, optimized for chip-to-chip interfaces.
  • A user configurable analog interface (XADC), incorporating dual 12-bit 1MSPS analog-to-digital converters with on-chip thermal and supply sensors.
  • DSP slices with 25 x 18 multiplier, 48-bit accumulator, and pre-adder for high-performance filtering, including optimized symmetric coefficient filtering.
  • Powerful clock management tiles (CMT), combining phase-locked loop (PLL) and mixed-mode clock manager (MMCM) blocks for high precision and low jitter.
  • Quickly deploy embedded processing with MicroBlaze™ processor.
  • Integrated block for PCI Express® (PCIe), for up to x8 Gen3 Endpoint and Root Port designs.
  • Wide variety of configuration options, including support for commodity memories, 256-bit AES encryption with HMAC/SHA-256 authentication, and built-in SEU detection and correction.
  • Low-cost, wire-bond, bare-die flip-chip, and high signal integrity flip-chip packaging offering easy migration between family members in the same package. All packages available in Pb-free and selected packages in Pb option.
  • Designed for high performance and lowest power with 28 nm, HKMG, HPL process, 1.0V core voltage process technology and 0.9V core voltage option for even lower power.

FPGA is more than a sea-of-gates (repeat)

IP-Centric Design

SoC Design (repeat)

Soft Processor Development

Diagram labels: wizard/GUI to create customized processor; custom software libraries; processor hardware description; compiled processor hardware; user code; compiler & linker; executable.

Configuration and Loading

Diagram sequence:

  1. FPGA: IP core, user core (custom HDL)
  2. load bitstream
  3. FPGA: synthesized processor, IP core, user core (custom HDL)
  4. load software binary
  5. FPGA: synthesized processor, IP core, user core (custom HDL), compiled code

A Typical SoC-FPGA-based System

Diagram labels: user terminal/console showing "hello world"; user application; usb-tty driver; virtual COM port; PC; USB cable; USB-Serial; USB-JTAG; JTAG interface; synthesized processor; IP core; user core (custom HDL); compiled code; other HW; FPGA; FPGA board.

Modern SoC-Centric Design Software

High-Level C/C++ -like Design Flow

Hardware Implications in Coding

Embedded Processor-Centric Design Flow

Hardware Software Codesign

The Design Warrior's Guide to FPGAs, ISBN 0750676043, Copyright © 2004 Mentor Graphics Corp

DSP Design Flows


Silicon Virtual Prototyping

Board and multi-FPGA development

Concluding Points on Development Flows
