Lecture 15 – Multi-clock/Async Communications and Clock Management
Ryan Robucci
References used and others helpful as concise resources:
-
-
http://www.ee.ic.ac.uk/pcheung/teaching/ee3_DSD//Topic 6 - Clocking & Metastability.pdf
- Imperial College OF SCIENCE, TECHNOLOGY AND MEDICINE
- University of London
Department of Electrical & Electronic Engineering
B.Eng. & M.Eng. Third Year Course
Digital System Design (DSD) Spring 2009
Peter Y. K. Cheung
Course Website: http://www.ee.ic.ac.uk/pcheung/teaching/ee3_DSD/
-
https://www.allaboutcircuits.com/technical-articles/back-to-basics-the-universal-asynchronous-receiver-transmitter-uart/
-
Lecture 22: PLLs and DLLs, Weste and Harris
-
† Rabaey, Chandrakasan, and Nikolic, Digital integrated circuits: a design perspective, Prentice-Hall, 2003.
-
Intel® Quartus® Prime Timing Analyzer Cookbook https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_timequest_cookbook.pdf
-
Michael H. Perrot, Tutorial on Phase-Locked Loops, CICC 2009 : https://www.cppsim.com/PLL_Lectures/digital_pll_cicc_tutorial_perrott.pdf
-
Prof. Steve Long, PLL Circuits handout USC : https://web.ece.ucsb.edu/~long/ece594a/PLL_intro_594a_s05.pdf
-
Phase Detector: digital, linear, mixer . .
The phase detector is a key element of a phase locked loop and many other circuits. There are several types ranging from digital to analogue mixer and more. https://www.electronics-notes.com/articles/radio/pll-phase-locked-loop/phase-detector-digital-analogue-mixer.php
-
Phase-Locked Loop (PLL) Fundamentals, Ian Collins https://www.analog.com/en/analog-dialogue/articles/phase-locked-loop-pll-fundamentals.html
-
S. I. Ahmed and T. A. Kwasniewski, "Overview of oversampling clock and data recovery circuits," Canadian Conference on Electrical and Computer Engineering, 2005., Saskatoon, Sask., 2005, pp. 1876-1881, doi: 10.1109/CCECE.2005.1557348.
-
Jaeha Kim and Deog-Kyoon Jeong, "Multi-gigabit-rate clock and data recovery based on blind oversampling," in IEEE Communications Magazine, doi: 10.1109/MCOM.2003.1252801.
- clocking is a concern ...
- within clock domains, especially when one considers differences in arrival time of source and destination clock edges and various variations in data path delays
- between clock domains especially in cases with unknown clock signal relationships
- for input and output data especially with unknown or non-existent external clocks and unknown time constraints on external paths
Clock and Data Distribution
- at the PCB level, commonly-sourced clock edges can arrive at various ICs at different times because of different buffering and signal propagation delays
- external (PCB) and internal (to IC) clocks can be skewed
- delay compensation (delay/phase tuning circuitry) can be used to account for this
- external IC datapath delays must be considered along with clock delays
- note that the delay between two systems can be larger than a clock period
- example, long transmission line
- systems cannot handshake at the clock rate
- Internal delays within an IC are typically well-controlled and predictable compared with those faced in external communication
- The clock distribution and digital logic network must ensure that input data is sampled at the correct time, with safety margins chosen appropriately
- Synchronous
- Systems have same clock frequency, known (relative) phase
- Mesochronous
- Systems have same frequency, unknown (relative) phase
- Plesiochronous
- systems have almost same frequency (such as with independently generated clocks)
- apparent relative phase of clocks can drift
- an offset of an entire clock period can accumulate, which is called a bit slip
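As a rough illustration of the plesiochronous drift described above: one full period of offset accumulates every 1/|f_tx - f_rx| seconds. The Python sketch below assumes two ideal, constant-frequency clocks; the function name and the example frequencies are illustrative assumptions, not values from the lecture.

```python
def bit_slip_interval(f_tx_hz, f_rx_hz):
    """Return (seconds, tx_cycles) between bit slips for two free-running clocks.

    Assumes ideal clocks with constant frequencies (no jitter or wander).
    """
    delta = abs(f_tx_hz - f_rx_hz)
    if delta == 0:
        raise ValueError("equal frequencies never accumulate a full-period offset")
    seconds = 1.0 / delta              # time for the phase offset to grow by one period
    return seconds, f_tx_hz * seconds  # same interval expressed in transmitter cycles


# Hypothetical example: nominal 100 MHz clocks that differ by 100 ppm
seconds, cycles = bit_slip_interval(100_000_000, 100_010_000)
print(f"one bit slip every {seconds * 1e6:.1f} us ({cycles:.0f} cycles)")
```

A 100 ppm mismatch therefore slips a bit roughly once every 10,000 cycles, which is why plesiochronous links need some way to absorb or remove slipped bits.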
Clock Signal Management and Processing Building Blocks
- handling of clock signals is quite different from Boolean logic design and requires new building blocks
- clock signal distribution (buffering) is handled by special circuitry on the FPGA as part of a clock network
- The following building blocks allow for many clock adjustments, including tuning delay, changing clock frequency, and more (discussed later)
- Phase-locked loop (PLL)
- Delay-locked loop (DLL)
Systematic and Random Clock Error
-
Correct physical design requires an appreciation for systematic and random errors
-
For synchronous design, concerned with clock skew
- can be systematic, i.e. inherent to the design (imbalanced wiring delays, buffers)
- can be random (e.g. wiring and device variation per manufacturing run, and even between each transistor in a design)
-
We are also concerned with temporal randomness, e.g. Clock Jitter
-
In general
- systematic errors can be corrected for at design-time
- manufacturing randomness can be mitigated through design or corrected (potentially completely compensated for) by post-manufacturing tunable circuits
- Temporal noise like Clock Jitter generally cannot be corrected at design time; it can only be mitigated, and in some cases reduced through filtering
-
A PLL/DLL can be provided to account for known systematic error, can be tuned in-system to account for manufacturing and component variations (like cable length), and can continue to perform correctly under temporal variations like a drifting clock frequency or clock jitter
- source clock signal is explicitly transmitted
- commonly concerned with serialized data that must be transmitted at a fast rate
- delays can be large compared to bit periods, so mismatch between clock and data path delays becomes significant relative to a bit period and the optimum sampling point might not be ensured
- in the case of long-distance communication, the delay can span multiple clock periods
- adjustments at receiver can tune a clock offset to set optimal sampling point of transmitted data signal
- can maximize timing safety margins to minimize bit error rates
- clock tuning building block PLL or DLL is required
Clock and Data Recovery (CDR)
-
Clock signal is NOT transmitted directly
-
Clock is recovered from the data input at the receiver
- Transition edges of data are detected and used to find the clock frequency and phase (time offset)
-
Avoids need to transmit a well-aligned clock in a parallel path
-
PLL/DLL circuits can monitor edges of incoming data to infer clock phase and frequency
-
Requires that incoming data have sufficiently small intervals between data transitions (there cannot be long transition-free runs such as 111... or 000...)
-
Simple example scheme to minimize transition free runs: insert an extra opposing bit after 4 consecutive matching bits (digital logic can implement an encoder that inserts bits after stretches and decoder detects stretches and removes inserted bit)
Input:   001111100001
Encoded: 00111101000011 (a 0 is inserted after 1111 and a 1 is inserted after 0000)
-
RLL: run-length limited coding describes other methods (https://en.wikipedia.org/wiki/Run-length_limited)
- interestingly, RLL coding is used in magnetic and optical media to infer spatial alignment on physical surfaces
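The simple bit-insertion scheme described above (insert an opposing bit after 4 consecutive matching bits) can be sketched in Python as follows. This is a minimal illustration, not a standard RLL code; the function names are hypothetical, and the sketch assumes the inserted bit itself starts a new run of length 1.

```python
def stuff_bits(bits, max_run=4):
    """Insert an opposing bit after every run of max_run equal bits."""
    out = []
    last, run = None, 0
    for b in bits:
        out.append(b)
        if b == last:
            run += 1
        else:
            last, run = b, 1
        if run == max_run:
            stuffed = "1" if b == "0" else "0"
            out.append(stuffed)
            last, run = stuffed, 1   # assumption: the stuffed bit begins a new run
    return "".join(out)


def unstuff_bits(bits, max_run=4):
    """Remove the bits inserted by stuff_bits (mirrors the encoder's run tracking)."""
    out = []
    last, run = None, 0
    skip_next = False
    for b in bits:
        if skip_next:                # this bit was inserted by the encoder: drop it
            last, run = b, 1
            skip_next = False
            continue
        out.append(b)
        if b == last:
            run += 1
        else:
            last, run = b, 1
        if run == max_run:
            skip_next = True
    return "".join(out)


encoded = stuff_bits("001111100001")
print(encoded)                  # 00111101000011
print(unstuff_bits(encoded))    # 001111100001
```

With this run tracking, the encoded stream never contains more than 4 equal bits in a row, at the cost of a data-dependent increase in length.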
Ref: Jaeha Kim and Deog-Kyoon Jeong, "Multi-gigabit-rate clock and data recovery based on blind oversampling," in IEEE Communications Magazine, doi: 10.1109/MCOM.2003.1252801.
- This approach utilizes digital processing to recover the clock
- Data is sampled at multiple phases, and digital processing examines all of the samples to infer the location of data edges and select the best (most reliable) sample to use as the data result
- The digital processing complexity can vary, but it introduces costs such as delays in data propagation
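A heavily simplified sketch of the oversampling idea above: the line is sampled several times per bit, the phase where transitions cluster is taken as the edge location, and the sample about half a bit later is used as the data. This is an illustrative toy and not the algorithm from the cited paper; the function name, the edge-histogram approach, and the fixed 4x oversampling ratio are assumptions made for the example.

```python
def recover_bits(samples, osr=4):
    """Pick the sampling phase farthest from the observed data edges.

    samples: list of 0/1 values taken osr times per bit at an unknown phase.
    Returns (bits, pick) where pick is the chosen sample phase (0..osr-1).
    """
    # Histogram of where transitions occur, binned by sample phase.
    edge_hist = [0] * osr
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edge_hist[i % osr] += 1
    edge_phase = max(range(osr), key=lambda p: edge_hist[p])
    # Sample about half a bit period after the edge phase.
    pick = (edge_phase + osr // 2) % osr
    return [samples[i] for i in range(pick, len(samples), osr)], pick


# Hypothetical test stream: each bit held for 4 samples, first 3 samples lost,
# so the receiver does not know where bit boundaries fall.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stream = [b for b in bits for _ in range(4)][3:]
recovered, pick = recover_bits(stream)
print(pick, recovered)
```

In this example the first transmitted bit is mostly cut off and is not recovered; the remaining bits are. A practical design would also track the edge phase over time and handle noise and jitter.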
Purpose of PLLs and DLLs
PLLs and DLLs
https://www.xilinx.com/support/documentation/application_notes/xapp132.pdf
- DLL: based on a variable delay line, implemented either as a voltage-controlled (analog) delay or as a chain of delay elements with a selectable output tap along the chain
- PLL: based on a variable (voltage controlled) oscillator
-
In addition to the tunable element, each includes a block called a Phase Detector, which compares a distributed clock to a reference clock in order to generate a correction signal
-
The control loop can employ a filter to make small adjustments over time
-
Parts of DLLs and PLLs:
- Phase-Frequency Detector
- Charge Pump
- Loop Filter
- Tunable Delay/Freq
- Feedback Divider
- based on reproducing, with a delay, each edge of the input reference clock, so the input clock directly drives the production of the output signal
- the immediate inter-edge intervals of the input clock are propagated to the output
- cycle-to-cycle reference clock timing jitter therefore is not removed
- as compared to a PLL there is no short term phase error accumulation, since the output responds to the input reference clock directly
- as compared to a PLL, frequency synthesis (freq other than that of the reference clock) is difficult
A representative first-order analog loop filter:
- Incremental update signals increase or decrease charge over many cycles. A pulse of current provides the incremental charge and corresponding voltage update.
- The output response is like a low-pass filter responding slowly to an input signal.
- The designed constants for the rate of change affect the properties of this low-pass filter
- Faster low-pass filters respond more quickly but propagate fast variations which may be noise that one would rather reject
- Slow filters respond slowly but can reject undesirable high frequency fluctuations
- Higher-order filters can be implemented if required; a first-order filter is depicted
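The speed versus noise-rejection trade-off above can be illustrated with a discrete-time first-order low-pass filter. This is a behavioral sketch only, not a model of the charge pump circuit; the update rule and the coefficient name alpha are assumptions chosen for the example.

```python
def first_order_lowpass(x, alpha, y0=0.0):
    """Discrete first-order low-pass: each input nudges the state by a fraction alpha.

    Larger alpha responds faster; smaller alpha rejects more fast variation.
    """
    y = y0
    out = []
    for sample in x:
        y += alpha * (sample - y)   # small incremental update toward the input
        out.append(y)
    return out


# Step response: the fast filter settles sooner.
step = [1.0] * 50
fast_step = first_order_lowpass(step, alpha=0.5)
slow_step = first_order_lowpass(step, alpha=0.05)
print(f"after 10 samples: fast={fast_step[9]:.3f}, slow={slow_step[9]:.3f}")

# Alternating (noise-like) input: the slow filter passes much less ripple.
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
fast_ripple = max(abs(v) for v in first_order_lowpass(noise, 0.5)[-20:])
slow_ripple = max(abs(v) for v in first_order_lowpass(noise, 0.05)[-20:])
print(f"ripple: fast={fast_ripple:.3f}, slow={slow_ripple:.3f}")
```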
Delay and Oscillator Blocks
- a power-supply voltage regulator can be set by the control signal to change the delay of an inverter, providing the basis for the delay line required by a DLL
- several inverters in series create a delay chain
- the delay is proportional to the number of inverters and the delay of each inverter
- tunable capacitance, resistance, and current drive are all possible, Harris (slide 10)
- using an odd number of inverters with a feedback connection implements a structure known as a ring oscillator, a free-running oscillator that requires no input clock; it is suitable for frequency synthesis and is used in the PLL
- multiple taps can be accessed to provide multiple phase offsets (multiple synchronized clocks) or online digitally selectable phase tuning
- an "analog" VCO with fewer stages can also be used, but does not provide multiple taps. Commonly, PLLs may not provide multiple taps
-
DLLs can generate multiple outputs at the same frequency that are coherent but at different constant phase offsets, just by having multiple taps along the delay line (it is common to have 0°, 90°, 180°, and 270° available)
-
Feedback may be selected from probes of the locally generated clk signal, global or regional clocks, or even external (board) references to compensate for skew
-
While it is typical to see 2x frequency multiplication (which can be cascaded for 4x), the PLL blocks provided are typically more capable
-
the duty cycle correction option (enabled by default) creates outputs with a 50/50 duty cycle even though the input clock may not have one
-
typically support run-time configuration of delay, such as by multiplexing delay taps
- oscillator generates a clock signal
- loop filter output can adjust the clock frequency
- phase detector compares the generated clock to a reference clock to compute an error signal which is used to create correcting control signals for the loop filter
- later slides detail this component
- frequency multiplication is easy as compared to a DLL: frequency synthesis by an integer multiple is performed by adding a digital clock divider in the feedback loop (discussed later)
- frequency synthesis according to a ratio of integers is also possible by adding digital clock dividers in both the feedback loop and the output path (discussed later)
- reference clock jitter is reduced by the filtering effect: each edge of the input is only used to make a small adjustment to the output clk and does not directly drive it
- suffers phase error accumulation (cannot immediately correct its phase to an input clock with changing phase/ inter-edge intervals)
- The term linear here refers to the fact that the correction signal is linearly related to the error
- Small errors produce small corrections, and large errors produce large corrections.
- Here, the width of the correction signal is considered to be the magnitude of the correction

-
Nonlinear phase detectors have a binary (all-or-nothing) correction signal that is not directly proportional to the amount of phase error

-
As compared to a linear detector, this detector always generates a correction signal, either up or down
-
A system with this type of control tends to settle into alternating up and down corrections that balance on average but create a steady variation known as dither noise (common in "bang-bang" control systems)

- Therefore common for DLLs but less desirable for PLLs, where it also complicates the control dynamics
- The amount of dither noise at the output depends on the loop filter: slower-response filters reject more dither noise
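The dither behavior described above can be reproduced with a very small behavioral model of a bang-bang delay loop. This is a sketch under simplifying assumptions (a fixed step size, an ideal early/late decision, no loop filter); the function name and the numbers are illustrative.

```python
def bang_bang_lock(target, step, n_cycles, delay=0.0):
    """Adjust a delay toward target using only a binary up/down decision."""
    history = []
    for _ in range(n_cycles):
        up = delay < target              # detector reports direction only, not magnitude
        delay += step if up else -step   # full-size correction every cycle
        history.append(delay)
    return history


history = bang_bang_lock(target=1.03, step=0.1, n_cycles=40)
print([round(d, 2) for d in history[-6:]])   # alternates around the target
```

Once the delay is within one step of the target, the loop never settles: it keeps alternating above and below the target, which is the dither noise. A smaller step (or a slower loop filter) reduces the dither amplitude but lengthens the time needed to lock.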
-
a clock divider (toggle flip-flop or other counter) can be used in the feedback loop to slow the frequency of the feedback signal, requiring the oscillator to run faster to compensate
-
ignoring the initial locking procedure, by keeping every m-th edge in sync with the reference we can achieve a synchronized, higher-frequency clock
† (Rabaey et al.)
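The effect of a feedback divider can be seen in a minimal phase-domain model of a PLL. This sketch assumes a very simple loop (VCO frequency proportional to the accumulated phase error, no separate loop filter); the function name, parameter names, and values are illustrative assumptions, not a model of any particular device.

```python
def simulate_pll(f_ref, m, f_free, k_vco, dt=0.01, steps=2000):
    """Return the VCO frequency after the loop settles.

    f_ref:  reference frequency (Hz)
    m:      feedback divider ratio
    f_free: free-running VCO frequency (Hz)
    k_vco:  VCO gain (Hz per cycle of phase error)
    """
    phase_err = 0.0                                 # reference phase minus feedback phase (cycles)
    f_vco = f_free
    for _ in range(steps):
        f_vco = f_free + k_vco * phase_err          # control signal tunes the oscillator
        phase_err += dt * (f_ref - f_vco / m)       # phase error integrates the frequency error
    return f_vco


f_out = simulate_pll(f_ref=10.0, m=8, f_free=70.0, k_vco=20.0)
print(f"locked VCO frequency: {f_out:.3f} Hz")      # close to m * f_ref = 80 Hz
```

The divided-down feedback only matches the reference when the oscillator runs at m times the reference frequency, so the loop is driven to that point. In this simple model a constant phase offset remains at lock; practical loops add an integrating loop filter to remove it.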
-
Phase (Timing) Noise (Jitter):

Variance in cycle-to-cycle clock period
Related to ability to "filter" reference clock jitter (which could be desired or not)
-
Phase Offset: Error between output phase and reference phase

$\phi_{error} = \frac{T_{offset}}{T} \times 2\pi$ [rad]
$\phi_{error} = \frac{T_{offset}}{T} \times 360$ [deg]
-
"Loop" Bandwidth: determines rate at which output phase tracks reference changes, ability to track changing clock frequency/phase

- for periodic signals, a delay of an integer number of periods is the same as no delay
- A DLL can provide an external feedback port to compensate for external delays and achieve an apparent zero delay between the reference input and the feedback input, for instance by ensuring a total delay of exactly one period
- PLL can do the same, adjusting its phase until the point of reference is in-phase with the reference signal
https://www.intel.com/content/www/us/en/programmable/documentation/mcn1401782837027.html#mcn1401871785037

The main blocks of the PLL are the
- phase frequency detector (PFD)
- charge pump,
- loop filter,
- VCO, and
- counters, such as a
- feedback counter (M),
- a pre-scale counter (N), and
- post-scale counters (C).
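The output frequency is set by the counters:
- The VCO frequency is $F_{VCO} = F_{IN} \times M / N$; the feedback counter divides it by M so that the feedback signal matches the reference $F_{IN} / N$ at the PFD
- The output signal frequency is $F_{OUT} = F_{IN} \times M / (N \times C)$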
Note that multiple post-scale factors can be provided sharing only one PLL feedback loop
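The counter relationships above can be checked with a short calculation. The function name and the example counter values below are illustrative assumptions and do not correspond to a specific device configuration; real devices also restrict the legal VCO range, which is not modeled here.

```python
from fractions import Fraction


def pll_frequencies(f_in, m, n, c_list):
    """Return (f_ref, f_vco, [f_out, ...]) for counters M, N and post-scale counters C."""
    f_ref = Fraction(f_in) / n               # pre-scale counter N divides the input
    f_vco = f_ref * m                        # loop forces f_vco / M to equal f_ref
    f_outs = [f_vco / c for c in c_list]     # each post-scale counter divides the VCO
    return f_ref, f_vco, f_outs


# Hypothetical example: 50 MHz input, M = 12, N = 1, two outputs sharing one loop
f_ref, f_vco, f_outs = pll_frequencies(50_000_000, m=12, n=1, c_list=[6, 24])
print(int(f_vco), [int(f) for f in f_outs])
```

Here a single 600 MHz VCO yields both a 100 MHz and a 25 MHz output, showing how several post-scale counters can share one feedback loop.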
Example:

- UART: universal asynchronous receiver/transmitter
- two systems communicate with a predetermined clock rate
- (receivers could use signal processing to infer an unknown clock rate, but this is not typical)
Start and Stop bits
-
start bit: provides transition from high-to-low for frame alignment
-
baud rate: determines interval between bits (bit rate)
-
stop bit: ensures return to high regardless of final bit to facilitate next alignment event (configurable rest interval between packets; stop bit length can be 1, 1.5, or 2 bit periods)
-
word-length: typically 8 data bits are sent, though can be 9
-
parity bit: optional, can be added at end of data packet before stop bit for error detection
-
Receiver Operation
- Receiver operates on a clock faster than the baud-rate, typically at least 16x
- receiver oversamples the input stream
- the start bit provides an alignment event
- a counter is used to generate read-enable signals ideally in the middle of valid data bits. Requires a known relationship between the sample clock and the baud-rate
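The receiver steps above can be sketched as a small software model. This is an illustration under assumptions, not a hardware description: it assumes an 8N1 frame (8 data bits sent least-significant bit first, no parity, one stop bit), a 16x sample clock, and a noise-free line; the function names are hypothetical.

```python
def uart_frame(byte):
    """Build one 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_receive(samples, oversample=16):
    """Decode one 8N1 frame from a list of 0/1 samples of the line."""
    # Alignment event: the first high-to-low transition marks the start bit.
    start = next((i for i in range(1, len(samples))
                  if samples[i - 1] == 1 and samples[i] == 0), None)
    if start is None:
        raise ValueError("no start bit found")
    mid = start + oversample // 2        # counter target: middle of the start bit
    last = mid + 9 * oversample          # middle of the stop bit
    if last >= len(samples):
        raise ValueError("frame truncated")
    if samples[mid] != 0:
        raise ValueError("false start (line went back high)")
    byte = 0
    for k in range(8):
        # Read-enable once per bit period, in the middle of each data bit.
        byte |= samples[mid + (k + 1) * oversample] << k
    if samples[last] != 1:
        raise ValueError("framing error: stop bit not high")
    return byte


# Idle-high line, one frame carrying 0x5A, each bit held for 16 samples.
line = [1] * 5 + [b for b in uart_frame(0x5A) for _ in range(16)] + [1] * 5
print(hex(uart_receive(line)))   # 0x5a
```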
Bit Slippage and Clock Drift
- Bit Slippage (loss of a bit or sampling a bit twice) occurs when the clocks of the transmitter and receiver are not well-matched (or the system cannot account for the mismatch)
- Packets are limited in length (e.g. 8 data bits) so as to avoid bit slippage towards end of packet
Alignment worsens the longer it has been since the alignment event, which is why packets are limited in length.
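A rough estimate of how much clock mismatch a frame can tolerate follows from the reasoning above: if sampling is aligned to the middle of each bit, the last sample must still land inside the last bit. The sketch below is an idealized calculation (no noise, no rise/fall time); the function name is hypothetical, and the optional oversampling term is an assumed allowance of up to one sample period of uncertainty in locating the start edge.

```python
def max_clock_mismatch(frame_bits=10, oversample=None):
    """Largest fractional mismatch between TX and RX bit clocks before a bit slips.

    frame_bits: total bits in the frame, counting start and stop bits
                (10 for 8 data bits, no parity, one stop bit).
    oversample: receiver samples per bit; if given, one sample period is
                subtracted from the margin for start-edge uncertainty.
    """
    last_sample = frame_bits - 0.5       # middle of the last bit, in bit periods
    margin = 0.5                         # distance from mid-bit to the bit edge
    if oversample is not None:
        margin -= 1.0 / oversample
    return margin / last_sample


print(f"ideal:        {max_clock_mismatch(10) * 100:.2f} %")
print(f"16x sampling: {max_clock_mismatch(10, 16) * 100:.2f} %")
```

For a 10-bit frame the ideal limit is about 5 percent total mismatch, and it shrinks as the frame gets longer, which matches the reason given for limiting packet length.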
Hardware and Software Flow Control
- Hardware flow control: uses extra "wires" to manage data flow
- Software flow control: reserves certain codes (Xoff & Xon) for handshaking
- Be sure these options are set correctly in your terminal program or in the two systems you are using