Multi-clock/Async Communications and Clock Management

Ryan Robucci

References

For classroom presentation,

Clock Uncertainty

  • Correct physical design requires an appreciation for systematic and random errors
  • For synchronous design, we are concerned with clock skew
    • can be systematic, i.e. inherent to the design (imbalanced wiring delays, buffers)
    • can be random (e.g. wiring and device variation per manufacturing run, and even between each transistor in a design)
  • We are also concerned with temporal randomness, e.g. Clock Jitter
  • In general
    • systematic errors can be corrected for at design-time
    • manufacturing randomness can be mitigated through design or corrected (potentially completely compensated for) by post-manufacturing tunable circuits
    • temporal noise like Clock Jitter cannot be corrected at design time; in general it can only be mitigated, in some cases by filtering

Example of bad clock distribution 18^{\dagger\dagger_{18}}

  • Imbalanced (systematic/design)
  • Balanced

  • Remember, clock skew matters; delay may be OK if balanced

Multiple Clock Domains 19^{\dagger\dagger_{19}}

  • Many digital systems have more than one clock domain:

  • Need to synchronize the two clock domains, two basic building blocks:

    • Phase-locked loop (PLL)
    • Delay-locked loop (DLL)

Example: Classical clock recovery 20^{\dagger\dagger_{20}}

  • Clocking information embedded in data stream
  • Use PLL to recover the clock
  • State of system is stored in analog loop filter

Oversampled Clock/Data Recovery 21^{\dagger\dagger_{21}}

  • Oversample the data and perform phase alignment digitally
  • De-couples clock generation from tracking of data
  • Data must guarantee transitions (edges) to ensure tracking
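
A minimal sketch of digital phase alignment, assuming 4x oversampling and a simple rule of sampling half a bit after the most common edge position. The function names and the oversampling ratio are illustrative assumptions, not taken from a specific design.

```python
from collections import Counter

def pick_sampling_phase(samples, osr=4):
    """Choose the sample phase (0..osr-1) farthest from where data edges occur.

    samples: list of 0/1 values taken at osr samples per bit.
    """
    # Histogram of edge positions: an edge at index i means the value
    # changed between sample i-1 and sample i.
    edges = Counter(i % osr for i in range(1, len(samples))
                    if samples[i] != samples[i - 1])
    if not edges:
        # Without transitions there is nothing to align to.
        raise ValueError("no transitions: cannot track phase")
    edge_phase = edges.most_common(1)[0][0]
    # Sample half a bit after the most common edge position.
    return (edge_phase + osr // 2) % osr

def recover_bits(samples, osr=4):
    """Keep one sample per bit, taken at the chosen phase."""
    phase = pick_sampling_phase(samples, osr)
    return [samples[i] for i in range(phase, len(samples), osr)]
```

The exception on an edge-free input mirrors the last bullet: with no transitions in the data, the phase cannot be tracked.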

Phase Alignment in Source Synchronous Systems 22^{\dagger\dagger_{22}}

  • Timing information carried by reference clock
  • Use DLL to ensure proper clock phase for sampling
  • Can correct for skew using DLL or PLL

Purpose of PLLs and DLLs

  • Phase-Locked Loops (PLLs) and Delay-Locked Loops (DLLs) are two options for removing clock delay
  • Additional Features:
    • Frequency Synthesis (clock multiplication and clock division and the combination of both)
    • Clock Conditioning (modifying duty cycle and phase shifting)
    • Multi-clock output (such as multiple clocks with fixed phase or fixed frequency)
      • Allows specification of related clock domains (e.g. known phase)
    • Run-Time Adjustments (adjust delay/phase or frequency at run-time)

PLLs and DLLs

https://www.xilinx.com/support/documentation/application_notes/xapp132.pdf

  • DLL: based on a variable delay line, implemented either as a voltage-controlled delay (analog) or as a chain of delay elements with a selectable output tap along the chain
  • PLL: based on a variable (voltage-controlled) oscillator
  • Each, in addition to its tunable element, includes a Phase Detector, which compares the distributed clock to a reference clock in order to generate a correction signal

  • The control loop can employ a filter to make small adjustments over time

  • Parts of DLLs and PLLs:

    • Phase-Frequency Detector
    • Charge Pump
    • Loop Filter
    • Tunable Delay/Freq
    • Feedback Divider

What is a Delay locked loop? 23^{\dagger\dagger_{23}}

  • based on delaying input reference clock, so reference clock jitter passes to output
  • no phase error accumulation
  • frequency synthesis is difficult, PLL is well-suited for this

Typical Loop Filter L17:S30^{\dagger_{L17:S30}}

A “charge-pump” acting as an ideal integrator

  • Incremental update signals increase or decrease charge over many cycles

Typical Oscillation/Delay Elements

(oscillator shown)

What is a Phase locked loop? 24^{\dagger\dagger_{24}}

  • frequency multiplication is easy
  • can perform frequency synthesis according to ratio of integers
  • reference clock jitter reduced by filtering
  • suffers phase error accumulation

Use of a divider for Freq Multiplication

  • ignoring the initial locking procedure, by keeping every n-th edge of the output in sync with the reference we can achieve a synchronized, higher-frequency clock
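
A small sketch of the edge relationship for an ideal, already-locked loop with a divide-by-n feedback path. The 10 ns reference period and n = 4 are hypothetical values chosen for illustration; exact fractions are used so that edge times compare exactly.

```python
from fractions import Fraction

def edge_times(period, count):
    """Rising-edge times of an ideal clock starting at t = 0."""
    return [k * period for k in range(count)]

def locked_output_edges(ref_period, n, ref_cycles):
    """Output edges of an ideal locked PLL with divide-by-n feedback."""
    out_period = Fraction(ref_period) / n   # f_out = n * f_ref
    return edge_times(out_period, n * ref_cycles)

ref_period = Fraction(10)   # hypothetical 10 ns reference (100 MHz)
n = 4
ref = edge_times(ref_period, 3)
out = locked_output_edges(ref_period, n, 3)

# Every n-th output edge coincides with a reference edge.
print(out[::n] == ref)      # → True
```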

\dagger\dagger\dagger\dagger\dagger\dagger (Rabaey et al.)

Linear Phase Detectors L17:S27^{\dagger_{L17:S27}}

  • Correction signal is linearly proportional to error

  • XOR phase detector - 90°
    • sensitive to input duty cycle
  • SR phase detector - 180°
    • 1-shots remove duty cycle sensitivity
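
A rough numerical sketch of the XOR detector's transfer characteristic, assuming ideal square waves modeled at 1-degree resolution. The function name and the `duty_deg` parameter are illustrative. It assumes the loop settles where the average output is mid-scale, which for 50% duty-cycle inputs is a 90° offset.

```python
def xor_pd_average(shift_deg, duty_deg=180):
    """Average XOR output for two square waves offset by shift_deg degrees.

    duty_deg is the high time of both inputs in degrees (180 = 50% duty).
    """
    high = 0
    for i in range(360):
        a = i < duty_deg
        b = ((i - shift_deg) % 360) < duty_deg
        high += a != b          # XOR of the two inputs at this instant
    return high / 360

# 50% duty: output rises linearly with phase, mid-scale at 90 degrees.
for s in (0, 45, 90, 135, 180):
    print(s, xor_pd_average(s))

# 25% duty: the curve saturates early, so the characteristic changes.
print(xor_pd_average(135, duty_deg=90))
```

With 50% duty the averages are 0, 0.25, 0.5, 0.75 and 1.0. With 25% duty the output is already stuck at 0.5 for a 135° offset, which is one way to see the duty-cycle sensitivity noted above.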

Non-linear Phase Detector [EE 371 Lecture 17 29]

  • An ideal flip-flop should force the loop to lock at zero phase offset
  • The set-up time of the flip-flop will introduce phase offset
    • Symmetric structures can eliminate this problem [16]
    • Can be used to cancel the set-up time of an input sampler [13]
  • The loop dynamics change:

    • The loop is now a “bang-bang” system which dithers around a locking point
    • Risky for a PLL, routinely done for DLLs
  • The dither magnitude depends on the delay through the loop and the “loop-gain”
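
A toy simulation of the bang-bang behavior, assuming a detector that reports only the sign of the phase error and a fixed correction step standing in for the loop gain. Units are arbitrary and all names are illustrative; this is not a model of any particular DLL.

```python
def bang_bang_loop(initial_error, step, loop_delay=1, cycles=40):
    """Simulate the phase error of a bang-bang loop.

    initial_error: starting phase error (arbitrary units)
    step:          correction applied per cycle (stands in for loop gain)
    loop_delay:    cycles between detecting the sign and applying the correction
    Returns the phase error observed at each cycle.
    """
    error = initial_error
    pending = [0] * loop_delay          # corrections still in flight
    history = []
    for _ in range(cycles):
        history.append(error)
        # Binary phase detector: only the sign of the error is known.
        decision = -step if error > 0 else step
        pending.append(decision)
        error += pending.pop(0)         # apply the oldest queued correction
    return history

def dither(history, tail=20):
    """Peak-to-peak phase error over the last `tail` cycles."""
    settled = history[-tail:]
    return max(settled) - min(settled)

print(dither(bang_bang_loop(0.5, step=1, loop_delay=0)))
print(dither(bang_bang_loop(0.5, step=1, loop_delay=1)))
print(dither(bang_bang_loop(0.5, step=2, loop_delay=0)))
```

In this model the loop never settles. The peak-to-peak dither grows with both the step size and the loop delay, consistent with the last bullet.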

Timing Loop Performance Parameters 25^{\dagger\dagger_{25}}

  • Phase Jitter:

    Variance in cycle-to-cycle clock period
    Related to ability to "filter" reference clock jitter (which could be desired or not)
  • Phase Offset: Error between output phase and reference phase
  • Bandwidth: rate at which output phase tracks reference, ability to track changing clock frequency/phase

  • Acquisition time (to lock)

  • Frequency range (lock range)

  • Practical Note on Lock:

    • Common to have an output "LOCK" signal to indicate when it is valid to start system operation. Can tie to a reset signal in your design

Zero Delay Buffer

Clock Management with DLL 26^{\dagger\dagger_{26}}

  • Can eliminate on-chip clock delay (so can PLL)
  • can also eliminate on-board clock delay by returning a reference from the board (so can PLL)
  • 4 fixed-phase outputs (0, 90, 180, 270) (can be implemented by multiple taps along a delay chain)
    • Multiple outputs are always phase-coherent to each other (as opposed to parallel PLLs)
  • Selectable (run-time) phase shift of n/256 of the period, set through configuration or adjusted through increment/decrement in steps of 1/256 of the clock period or 50 picoseconds
  • Frequency synthesis (2x is common, PLL more capable)

DLL in Xilinx Virtex data/clock alignment 27^{\dagger\dagger_{27}}

Xilinx DLL with various phase outputs 28^{\dagger\dagger_{28}}

  • DLLs are commonly used to generate outputs at the same frequency but different phases, just by having multiple taps along the delay line

Using DLL to de-skew onboard clock signals 29^{\dagger\dagger_{29}}

  • FPGA can drive external clocks, and by passing back an external (board level) reference signal they can be adjusted to match a reference clock (at a known phase)
  • This compensates for unknown loads (common to see with external memory ICs)

Altera Cyclone II PLL (1) 30^{\dagger\dagger_{30}}

  • Phase-locked loop (PLL) is a closed-loop frequency-control system based on the phase difference between the input clock signal and the feedback clock signal of a controlled oscillator.

Main components:

  • Phase frequency detector (PFD)
  • Charge pump & loop filter
  • Voltage controlled oscillator (VCO)
  • Counters (N – pre-scale, M – feedback, C – post-scale)

Altera Cyclone II PLL (3) 32^{\dagger\dagger_{32}}

  • The output frequency is given by:
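
The equation itself did not survive in these notes. Assuming the counter roles listed on the earlier Cyclone II slide (N pre-scale, M feedback, C post-scale), the usual relationship is sketched below; check it against the device handbook before relying on it.

```latex
f_{VCO} = f_{IN}\,\frac{M}{N},
\qquad
f_{OUT} = \frac{f_{VCO}}{C} = f_{IN}\,\frac{M}{N \cdot C}
```

As a purely illustrative example with hypothetical values, f_IN = 50 MHz, M = 8, N = 1 and C = 4 would give f_VCO = 400 MHz and f_OUT = 100 MHz.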

References for this topic 33^{\dagger\dagger_{33}}

Chapter 8, pp. 757-773, Digital Design Principles & Practices, John Wakerly.
“Metastability in Altera Devices”, Altera App Note 42.

“Using the ClockLock & ClockBoost PLL Features in APEX Devices”, Altera App Note 115.
“Using the Virtex Delay-Locked Loop”, XAPP-132.
“Advantages of APEX PLLs Over Virtex DLLs”, Altera TB60.

UART\dagger\dagger\dagger

  • UART: universal asynchronous receiver/transmitter
  • basic UART system provides robust, moderate-speed, full-duplex communication with only three signals: Tx (transmitted serial data), Rx (received serial data), and ground. In contrast to other protocols such as SPI and I2C, no clock signal is required because the user gives the UART hardware the necessary timing information.
  • Internal “clocks” of systems must be synchronized based on the data signal
  • System must both know the approximate clock rate and generate internal clocks
  • The error in the rates must be low enough to assume synchronization for one frame, so frames are kept short (<10 bits)
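
A back-of-the-envelope sketch of why short frames help. It assumes the receiver resynchronizes on each start edge, samples every bit at its nominal center, and tolerates up to half a bit of accumulated drift. It ignores oversampling quantization and edge rise times, so real budgets are tighter. The function name is illustrative.

```python
def max_rate_mismatch(frame_bits):
    """Largest relative baud-rate mismatch that keeps the last sample in its bit.

    The last bit is sampled frame_bits - 0.5 bit times after the start edge,
    and the accumulated drift there must stay under half a bit.
    """
    last_sample = frame_bits - 0.5
    return 0.5 / last_sample

print(f"{max_rate_mismatch(10):.1%}")   # frame of about 10 bits
print(f"{max_rate_mismatch(100):.1%}")  # a much longer frame, for comparison
```

Under these assumptions a frame of about 10 bits tolerates roughly 5% total mismatch between transmitter and receiver, while a 100-bit frame would tolerate only about 0.5%.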

Start and Stop bits \dagger\dagger\dagger

  • Start bit: The first bit of a one-byte UART transmission. It indicates that the data line is leaving its idle state. The idle state is typically logic high, so the start bit is logic low.
  • The start bit is an overhead bit; this means that it facilitates communication between receiver and transmitter but does not transfer meaningful data.
  • Stop bit: The last bit of a one-byte UART transmission. Its logic level is the same as the signal’s idle state, i.e., logic high. This is another overhead bit.
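
A minimal sketch of how one byte is framed on the line. It assumes 8 data bits sent least-significant bit first with no parity, which is a common convention but is not stated in these notes. The function name is illustrative.

```python
def uart_frame(byte):
    """Return the line levels for one byte: start bit, 8 data bits, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data = [(byte >> i) & 1 for i in range(8)]   # LSB first (assumed)
    return [0] + data + [1]                      # start = low, stop = high (idle)

print(uart_frame(0x41))   # → [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```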

Baud Rate \dagger\dagger\dagger

  • Baud rate: The approximate rate (in bits per second, or bps) at which data can be transferred.
  • A more precise definition is the frequency (in bps) corresponding to the time (in seconds) required to transmit one bit of digital data.
  • For example, with a 9600-baud system, one bit requires 1/(9600 bps) ≈ 104.2 µs. The system cannot actually transfer 9600 bits of meaningful data per second because additional time is needed for the overhead bits and perhaps for delays between one-byte transmissions.
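
The arithmetic above can be sketched as follows, assuming one start bit and one stop bit per 8-bit byte and back-to-back frames with no idle gaps. The function names are illustrative.

```python
def bit_time_us(baud):
    """Duration of one bit in microseconds."""
    return 1e6 / baud

def payload_rate_bps(baud, data_bits=8, overhead_bits=2):
    """Best-case payload rate with back-to-back frames (no idle gaps)."""
    return baud * data_bits / (data_bits + overhead_bits)

print(round(bit_time_us(9600), 1))   # → 104.2
print(payload_rate_bps(9600))        # → 7680.0
```

So a 9600-baud link carries at most 7680 bits of payload per second under these assumptions, and less if a parity bit or idle time between bytes is added.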

Parity Bit \dagger\dagger\dagger

  • Parity bit: An error-detection bit added to the end of the byte.
  • There are two types—“odd parity” means that the parity bit will be logic high if the data byte contains an even number of logic-high bits, and “even parity” means that the parity bit will be logic high if the data byte contains an odd number of logic-high bits. This may seem counterintuitive, but the idea is that the parity bit ensures that the number of logic-high bits is always even (for even parity) or odd (for odd parity). So if you’re using even parity and the byte has three logic-high bits, the parity bit will be logic high, so that the total number of logic-high bits in the transmitted data (i.e., the byte itself plus the parity bit) is even.
    • By forcing the number of logic-high bits to be always even (for even parity) or odd (for odd parity), the parity bit provides a crude error-detection mechanism—if a bit gets flipped somewhere in the transmission process, the number of logic-high bits won’t match the chosen parity mode. Of course, the strategy breaks down if two bits are flipped, so the parity bit is far from bulletproof. If you have a serious need for error-free communication, I recommend a CRC.
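
A short sketch of the parity rule described above, including the case where it fails. Function names are illustrative.

```python
def parity_bit(byte, mode="even"):
    """Parity bit that makes the total count of 1s even (or odd)."""
    if mode not in ("even", "odd"):
        raise ValueError("mode must be 'even' or 'odd'")
    ones = bin(byte & 0xFF).count("1")
    bit = ones % 2            # even parity: 1 when the data has an odd number of 1s
    return bit if mode == "even" else bit ^ 1

def parity_ok(byte, received_parity, mode="even"):
    """True when the received byte and parity bit are consistent."""
    return parity_bit(byte, mode) == received_parity

sent = 0b00000111                     # three 1s, so the even-parity bit is 1
p = parity_bit(sent, "even")
print(parity_ok(0b00000110, p))       # one bit flipped: → False (detected)
print(parity_ok(0b00000100, p))       # two bits flipped: → True (missed)
```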

Synchronizing and Sampling \dagger\dagger\dagger

  • To ensure that an active edge of the receiver clock can occur near the middle of the bit period, the frequency of the baud-rate clock sent to the receiver module is much higher (by a factor of 8 or 16 or even 32) than the actual baud rate.
  • Let’s say that one bit period corresponds to 16 receiver clock cycles. In this case, synchronization and sampling can proceed as follows:
    • The receive process is initiated by the falling edge of the start bit.
    • The receiver waits for 8 clock cycles, in order to establish a sampling point that is near the middle of the bit period.
    • The receiver then waits 16 clock cycles, which brings it to the middle of the first data-bit period.
    • The first data bit is sampled and stored in the receive register, and then the module waits another 16 clock cycles before sampling the second data bit.
    • This process repeats until all data bits have been sampled and stored, and then the rising edge of the stop bit returns the UART interface to its idle state.
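
The steps above can be sketched in software, assuming 16 receiver clock cycles per bit, 8 data bits sent LSB first, and no parity. The helper names are illustrative, and the start-bit and stop-bit checks are added sanity checks rather than part of the procedure as listed.

```python
OSR = 16  # receiver clock cycles per bit period

def oversample(bits, osr=OSR, idle=8):
    """Expand line levels into per-clock samples, with idle-high padding."""
    line = [1] * idle
    for b in bits:
        line += [b] * osr
    return line + [1] * idle

def uart_receive(samples, osr=OSR, data_bits=8):
    """Recover one byte from oversampled line levels (LSB first assumed)."""
    # 1. Find the falling edge of the start bit.
    start = next(i for i in range(1, len(samples))
                 if samples[i - 1] == 1 and samples[i] == 0)
    # 2. Wait half a bit period to reach the middle of the start bit.
    t = start + osr // 2
    if samples[t] != 0:
        raise ValueError("false start bit")
    # 3. Step one full bit period at a time, sampling each data bit mid-period.
    value = 0
    for k in range(data_bits):
        t += osr
        value |= samples[t] << k
    # 4. One more bit period lands in the stop bit, which must be high.
    t += osr
    if samples[t] != 1:
        raise ValueError("framing error: stop bit not high")
    return value

frame = [0] + [(0x41 >> i) & 1 for i in range(8)] + [1]
print(hex(uart_receive(oversample(frame))))   # → 0x41
```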

RS232

  • Hardware Flow Control: RTS (Request to Send) and CTS (Clear to Send) are optional additional hardware signals supporting hardware flow control. (additional control signals exist as well)
    • Disadvantage is additional physical conductors.
  • Software flow control instead uses special characters Xoff & Xon which a “receiver” sends to the “sender” to pause and resume transmission respectively.
    • Could be sent by either end
    • Disadvantage that Xon and Xoff characters may exist in the actual data being transmitted, requiring alternative encoding, sometimes using escape sequences or other methods like headers
    • Xon and Xoff arrive later than hardware flow control signals would, so the sender may transmit more data before it sees Xoff; it should not send data too quickly
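
One possible escape-sequence scheme is sketched below. Xon and Xoff are the ASCII control characters DC1 (0x11) and DC3 (0x13). The escape byte 0x7D and the XOR-with-0x20 transformation are hypothetical choices for this sketch, not something specified by RS232 or by these notes.

```python
XON, XOFF = 0x11, 0x13      # ASCII DC1 / DC3
ESC = 0x7D                  # hypothetical escape byte for this sketch
RESERVED = {XON, XOFF, ESC}

def escape(payload: bytes) -> bytes:
    """Encode payload so that XON/XOFF never appear in the data stream."""
    out = bytearray()
    for b in payload:
        if b in RESERVED:
            out += bytes([ESC, b ^ 0x20])   # escaped form is never XON/XOFF/ESC
        else:
            out.append(b)
    return bytes(out)

def unescape(stream: bytes) -> bytes:
    """Reverse escape(): the byte after ESC is XORed back with 0x20."""
    out = bytearray()
    it = iter(stream)
    for b in it:
        out.append(next(it) ^ 0x20 if b == ESC else b)
    return bytes(out)

data = bytes([0x41, XON, XOFF, ESC, 0x42])
encoded = escape(data)
print(XON in encoded, XOFF in encoded)   # → False False
print(unescape(encoded) == data)         # → True
```

The cost is that each reserved byte in the payload takes two bytes on the wire, so the worst-case throughput is halved.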