# HW 6 (Part A & B)

## Objective

In this project students will learn to use generated IP cores (block RAM and a CORDIC processor for calculating square roots). An essential skill to develop is coding a control FSM to interface modules. You will also practice using parameters in a design to create efficient simulations.

Below is the datapath, not including control signals (the registers AH, AL, DH, DL and the output MUX may be embedded within your state machine if you so desire). You must identify the status and control signals and create a state machine to control them.

> ![alt text](data_path.svg)

## Due Dates

* Due Tuesday Dec 11 (no late submissions will be accepted)

## Requirements

* Your design should receive 4 bytes from the serial port to initiate action
* Your implementation should assume the 4 bytes are sent in this order: ADDRESS-HIGH, ADDRESS-LOW, DATA-HIGH, DATA-LOW
* After 4 bytes are received your design should
    * read the contents of the RAM and send the HIGH-BYTE, then the LOW-BYTE, back through the UART
    * send the data bytes through the CORDIC sqrt processor to compute the result
    * store the sqrt result just computed in the RAM at the same address that was just read
* your hardware design should use a 115200 baud rate
* you must include a simulation of the top-level design, with a parameter to control the baud rate, and discuss it in your report. It is strongly recommended that you consider modifying the baud rate for the purpose of top-level simulation -- this should be controlled by a SINGLE parameter set in your testbench. You should not modify your UART implementation files between simulation and implementation.
* if required you can instantiate other reduced-complexity versions of hardware to speed up simulation (such as a smaller RAM), though your final hardware implementation should meet the full project requirements
* ****your FSM controller must, as much as possible, rely on status and control signals to manage the datapath timing.
You should not rely on "fixed" waits such as a count of the number of clock cycles the square root operation takes. Waiting on status signals rather than counting cycles is a better abstraction.****

### Generating a multi-cycle, pipelined square root CORDIC processor

Follow these steps to generate a multi-cycle pipelined square root processor.

#### Step 1: Select IP Core

* Open the IP Catalog: Menu Window -> IP Catalog
* Search for cordic
* Many listings of CORDIC will be shown. They are all the same, just listed in different categories. Double-click any one of them to open Customize IP

#### Step 2: Customize IP Core

You must now configure the options in two tabs. Documentation for the CORDIC core explaining the options and how to use the generated core can be found here: https://www.xilinx.com/support/documentation/ip_documentation/cordic/v6_0/pg105-cordic.pdf

Choose Square Root, Unsigned Integer. This assumes the data are whole integers rather than a fixed-point representation.

<img width=50% src=cordic_option_1.png>
<img width=50% src=cordic_option_2.png>

Click OK and the generate window will be shown.

#### Step 3: Generate IP Core

<img width=35% src=cordic_generate.png>

* Finally click Generate -- ****The generation process may take some time****

You should now have a cordic_0 module available in your design.

<img width=35% src=cordic_in_design.png>

#### Step 4: Instantiate IP Core

To instantiate the core, you can get an instantiation template. Under the Sources window, click the IP Sources tab. Then navigate to cordic_0.veo.

<img width=75% src=cordic_instatiation_template_annotated.png>

Here are instructions on use of the ports:

* s_axis_cartesian_tvalid : set high for ONE CLOCK CYCLE to indicate new data
* s_axis_cartesian_tdata : 16-bit input data
* m_axis_dout_tvalid : indicates when the output has been updated in response to the input tvalid being set high. If you pulse the input tvalid for one clock cycle, this will pulse for one clock cycle.
* m_axis_dout_tdata : 16-bit output data, OUT STD_LOGIC_VECTOR(15 DOWNTO 0)

### Generating a RAM

We'll generate a RAM using the IP core generator. We'll choose block RAM (which uses specialized blocks of memory hardware on the FPGA). Alternatively, the Distributed RAM option would have used the LUTs as RAM. You do not need to use an IP core generator or vendor-specific tools to create RAM, but it is typically recommended. Use of vendor-specific, tool-generated cores would be avoided for portable implementations.

Here is the selection of Block Memory Generator:

Documentation: https://www.xilinx.com/support/documentation/ip_documentation/blk_mem_gen/v8_4/pg058-blk-mem-gen.pdf

On the ****Port A Options Tab****, set ****Write Depth to 4096**** to allow 4096 memory locations.

Your final instantiation template should look something like this:

~~~verilog
blk_mem_gen_0 your_instance_name (
  .clka(clka),    // input clka
  .wea(wea),      // input [0 : 0] wea
  .addra(addra),  // input [11 : 0] addra
  .dina(dina),    // input [15 : 0] dina
  .douta(douta)   // output [15 : 0] douta
);
~~~

### UART

* you can use code provided at https://eclipse.umbc.edu/robucci/cmpe415/attachments/uart.v
* Documentation: https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/reference-manual#usb-uart_bridge_serial_port You'll need to configure use of the C4 and D4 pins in the Constraints/.xdc file.
* If using Windows, I strongly recommend use of RealTerm

### Controller FSM

* You'll need to design a case-statement-based FSM to control the modules and make the "top-level" system functional

## Timing Analysis Report and Clock Constraint (PART B)

Your system clock is configured for 100 MHz. Run the STATIC timing analysis and include the worst-case slack for setup and hold in your report. To do this, implement your design and use the guide below:

<img width=50% src=timing_report_annotated.png>

Now, modify the clock constraint to see if your design could work with a 200 MHz (5 ns instead of 10 ns) clock.
Use the guide below:

<img width=50% src=clock.png>

Note that this does not actually change the clock frequency on the FPGA board, since that is driven by a fixed oscillator on the board. It merely allows you to see whether the design would be able to operate at a faster rate. Rerun the timing analysis report and provide the new worst-case slack. Briefly comment on the change in your report.

REMEMBER TO RESTORE THE CLOCK CONSTRAINT TO 10 NS BEFORE SUBMITTING YOUR DESIGN.

## What to be sure to turn in

* Create a report that briefly explains your design and your testing. You must have one testbench for each module.
* Include the output of your Verilog testbench(es) in your report (THIS IS EXPLICITLY GRADED) with additional explanation about each testbench as needed to convince someone that each part of your design works and that your simulation-based testing of each module is sufficient.
* Include the worst-case setup and hold slack for the 100 MHz clock as well as the same for a 200 MHz clock. REMEMBER TO RESTORE THE CLOCK CONSTRAINT TO 10 NS BEFORE SUBMITTING YOUR DESIGN.

Provided Files:

* [uart_loopback.v](../attachments/hw6/uart_loopback.v)
* [uart_loopback_tb.v](../attachments/hw6/uart_loopback_tb.v)
* [memory_hex_in.txt](../attachments/hw6/memory_hex_in.txt)

Real Term:

* [Real Term Test with Windows](../realterm/index.html)
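
As an illustration of the status-driven control asked for under Requirements and Controller FSM, here is a minimal sketch of the square-root-and-store portion of a controller. It is not a complete solution and not the expected design. The module name `sqrt_store_ctrl` and the `start` and `busy` signals are hypothetical. The CORDIC and RAM signal names follow the port descriptions and instantiation template above. The sketch assumes that `s_axis_cartesian_tdata` and `addra` are held stable by the rest of your design while `busy` is high.

~~~verilog
// Sketch only: "sqrt_store_ctrl", "start", and "busy" are hypothetical names.
// CORDIC and RAM signal names follow the port lists given in this handout.
module sqrt_store_ctrl (
    input  wire        clk,
    input  wire        rst,                      // synchronous, active high (assumption)
    input  wire        start,                    // hypothetical: pulse after DATA-LOW is received
    input  wire        m_axis_dout_tvalid,       // status: CORDIC result is valid
    input  wire [15:0] m_axis_dout_tdata,        // CORDIC result
    output reg         s_axis_cartesian_tvalid,  // control: one-cycle pulse to the CORDIC
    output reg         wea,                      // control: one-cycle RAM write enable
    output reg  [15:0] dina,                     // RAM write data
    output wire        busy                      // hypothetical status for the main FSM
);
    localparam [1:0] S_IDLE  = 2'd0,
                     S_ISSUE = 2'd1,
                     S_WAIT  = 2'd2,
                     S_WRITE = 2'd3;

    reg [1:0] state;

    assign busy = (state != S_IDLE);

    always @(posedge clk) begin
        // Defaults: each control strobe lasts exactly one clock cycle.
        s_axis_cartesian_tvalid <= 1'b0;
        wea                     <= 1'b0;

        if (rst) begin
            state <= S_IDLE;
            dina  <= 16'd0;
        end else begin
            case (state)
                S_IDLE: begin
                    if (start) state <= S_ISSUE;
                end
                S_ISSUE: begin
                    // Tell the CORDIC that new input data is present.
                    s_axis_cartesian_tvalid <= 1'b1;
                    state <= S_WAIT;
                end
                S_WAIT: begin
                    // No cycle counting: wait for the CORDIC status signal.
                    if (m_axis_dout_tvalid) begin
                        dina  <= m_axis_dout_tdata;
                        state <= S_WRITE;
                    end
                end
                S_WRITE: begin
                    // Write the captured result back to the RAM.
                    wea   <= 1'b1;
                    state <= S_IDLE;
                end
                default: state <= S_IDLE;
            endcase
        end
    end
endmodule
~~~

Because the wait state advances only when `m_axis_dout_tvalid` is seen, the same controller works unchanged if the CORDIC latency differs between configurations. The same pattern (assert a control strobe, then wait on a status signal) can be applied to the UART transmit steps, using whatever status signals your UART module actually provides.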