© 2019 Ashwarya Rajwardan

#### RECEIVER EQUALIZATION FOR A 10 GIGABIT PER SECOND HIGH-SPEED SERIAL LINK IN 65 NM CMOS TECHNOLOGY

BY

#### ASHWARYA RAJWARDAN

#### THESIS

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering in the Graduate College of the University of Illinois at Urbana-Champaign, 2019

Urbana, Illinois

Adviser:

Professor José Schutt-Ainé

# ABSTRACT

This thesis addresses the receiver equalization techniques for a 10 Gbps USB 3.1 link in 65 nm CMOS technology. Two types of equalizers are implemented: a continuous time linear equalizer (CTLE) and a 1-tap full-rate decision feedback equalizer (DFE). The combined CTLE and DFE architecture is simulated with an rms receiver clock jitter of 5.3 ps and achieves a BER  $< 10^{-12}$  while consuming 3.3 mW at the Nyquist frequency of 5 GHz.

To my parents and Aruneema, for their love and support.

# ACKNOWLEDGMENTS

I have been a part of ECE Illinois since fall 2014 and making it this far would have been impossible without the help and encouragement of some of the most talented people at the University of Illinois at Urbana-Champaign and my family.

First and foremost, I wish to thank my incredible advisor, Professor José Schutt-Ainé. I joined his research group in fall 2015 as an undergraduate research assistant and have learned a great deal about signal integrity ever since. I am immensely thankful to him for pushing me to approach difficult problems piece by piece and develop a mental framework for research. He has been like a father figure to me and always found time to discuss issues whenever I showed up in his office. He is an inspirational figure and I am grateful to him for making me more capable of facing challenges in the future.

My parents, Dr. Arun Sharma and Sanyukta, have been a constant pillar of support. I cannot thank them enough for being bold enough to send me abroad for higher education. Without their love and a bold vision for my career, I would not have had a chance to attend such a great institution and make it this far in my career. I would also like to thank my sister, Aruneema, for checking up on me frequently.

I am very fortunate to have interacted with some of the incredibly smart students at UIUC. The knowledge and guidance provided by Donkgwook Kim and Da Wei on equalizer circuits were indispensable. In addition to them, fellow graduate students Thong Nguyen, Xinying Wang, Bobi Shi and Assistant Professor Xu Chen also helped me shape my thesis project. I am also immensely grateful to Professor Chandrasekhar Radhakrishna for guiding me throughout my undergraduate and graduate career on academic life and career goals.

Besides students and colleagues at UIUC, Jen Carlson, Assistant Director of Academic Programs, and James Hutchinson, Publications Editor, deserve a token of appreciation. Jen's advice on navigating the graduate curriculum has been very helpful. James' thorough review and multiple revisions made sure that my thesis comes out in the best possible form.

A special mention must be made for Arunita Kar. She has been bringing happiness into my life since 2017 and I am thankful for all her love and encouragement to push through the graduate program at UIUC.

I am also thankful to have Sakshi Srivastava, Rushabh Mehta, Anurag Choudhary, and Naishadh Sambrani as my friends who are smart, talented and caring and have encouraged me to be a better version of myself.

Lastly, I am thankful to these amazing people: Nick Ratajczyk, Corey Snyder, Brendan Eng, Prateek Garag, Jigar Patel, Ali Kourani, Varun Krishna, Chris Yim, Vishesh Verma and Robert Kummerer for a good social life outside academia.

# TABLE OF CONTENTS

- LIST OF ABBREVIATIONS
- CHAPTER 1 INTRODUCTION
  - 1.1 Motivation
  - 1.2 Outline of Thesis
- CHAPTER 2 OVERVIEW OF EQUALIZERS
  - 2.1 Transmitter Equalizers
  - 2.2 Receiver Equalizers
- CHAPTER 3 CONTINUOUS-TIME LINEAR EQUALIZER
  - 3.1 Design Overview
  - 3.2 Transistor Implementation
  - 3.3 Simulation Setup and Results
- CHAPTER 4 DECISION FEEDBACK EQUALIZER
  - 4.1 Design Overview
  - 4.2 Transistor Implementation
  - 4.3 Simulation Setup and Results
- CHAPTER 5 CONCLUSION
  - 5.1 Summary
  - 5.2 Future Work
- REFERENCES

# LIST OF ABBREVIATIONS

- BER: Bit Error Rate
- CLK: Clock
- CTLE: Continuous Time Linear Equalizer
- dB: Decibels
- DFE: Decision Feedback Equalizer
- FFE: Feed Forward Equalizer
- Gbps: Gigabits/sec
- GTps: Gigatransfers/sec
- nm: Nanometer
- NMOS: N-Type Metal Oxide Semiconductor
- PMOS: P-Type Metal Oxide Semiconductor
- Pk-Pk: Peak-to-peak
- RX: Receiver
- RMS: Root Mean Square
- SNR: Signal to Noise Ratio
- TX: Transmitter
- UI: Unit Interval
- $\mu$m: Micrometer

# CHAPTER 1

# INTRODUCTION

### 1.1 Motivation

High-speed serial links (HSSL) form the heart of reliable wired communication in almost every device around us. USB 3.1, HDMI 2.0, PCIe, SATA III and Thunderbolt are some popular examples. Table 1.1 summarizes the data rates of some of the popular standards today. For USB in particular, data rates have increased significantly from USB Gen 1.0 to USB Gen 4.0, as shown in figure 1.1 [1].

Table 1.1: Serial Link Data Rates

| Standard        | Data Rate |
|-----------------|-----------|
| Thunderbolt 3.0 | 40 Gbps   |
| HDMI 2.0        | 18 Gbps   |
| PCIe 4.0        | 16 GTps   |
| USB 3.1         | 10 Gbps   |
| SATA III        | 6 Gbps    |

One can observe from the graph that there is an ever-increasing demand for high-bandwidth, robust wireline communication systems. One way to increase the bandwidth of wireline communication is to use several parallel links. However, parallel links pose issues such as crosstalk, timing skew, large on-chip area and high manufacturing cost. Fortunately, many advances in serial links now enable very high data rates. High-bandwidth equalizers are one such advance. They are typically implemented on both the transmitter and the receiver side and help counteract channel losses, inter-symbol interference (ISI) and jitter. In this thesis two types of receiver equalizers, namely CTLE and DFE, are discussed and designed. Both circuits are commonly used, and many variations available today achieve high data rates while being highly power efficient. For this thesis, the equalizers are implemented using conventional topologies to serve as proof of concept for future designs.

Figure 1.1: USB trend.

### 1.2 Outline of Thesis

Chapter 2 provides a brief overview of equalizers and their usage in high-speed serial links. Chapter 3 discusses the transistor-level implementation and simulation results of a CTLE, followed by the transistor-level design and simulation results of a combined CTLE and DFE circuit in Chapter 4. Chapter 5 concludes the thesis with a discussion of design improvements and scope for future work.

# CHAPTER 2

# OVERVIEW OF EQUALIZERS

In the previous chapter we introduced the concept of equalizers and how they can be implemented on the transmitter and receiver side of the serial link. Figure 2.1 [2] shows the placement of equalizers.



Figure 2.1: High-speed serial link with equalizers. Adapted from [2].

In a typical serializer-deserializer circuit, parallel data bits coming from the source are first serialized, sent over the channel and then converted back to parallel bits using a de-serializer. If the serialized data is sent over the channel in its raw form, the received pulse looks like the one shown in figure 2.2.

A single pulse spreads out over multiple symbol periods, thereby creating ISI. The ISI is quantified by terms called pre-cursors and post-cursors. In figure 2.2, $a_{-1}$ represents the first pre-cursor, $a_1, a_2, a_3$ represent post-cursors and $a_0$ represents the main cursor, which is the sampling point for the actual data bit. The TX FIR equalizer, RX CTLE and DFE distort the pulse to suppress the pre- and post-cursors. The TX clock, generated using a PLL, drives the serializer and TX equalizer. The RX clock, recovered from data bits using a CDR, drives the DFE for sampling and the de-serializer. This thesis implements equalizers for a USB 3.1 channel derived from an ADS workspace [3] as shown in figure 2.3. The channel loss is shown in figure 2.4.

Figure 2.2: Pulse response of a channel.



Figure 2.3: A typical USB 3.1 link with vias and type A/B receptacles.



Figure 2.4: Channel loss (SDD21) of a typical USB 3.1 link with vias and receptacles.

### 2.1 Transmitter Equalizers

The objective of TX equalizers is to pre-distort the pulses to negate the effects of the channel. They can address both pre-cursor and post-cursor ISI. The TX equalizers achieve ISI cancellation using FIR filters that implement pre-emphasis or de-emphasis. Both boost the high-frequency components of the signal relative to its low-frequency components: pre-emphasis amplifies the transition bits, while de-emphasis attenuates the non-transition bits. Figure 2.5 shows a block diagram of a pre-emphasis FIR filter with three taps. The original data bit is delayed to get $D_{n-1}$ and $D_{n-2}$, each delayed copy is multiplied by a tap weight $C_i$, and the products are added at the summing node to get the pre-distorted data.
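The tap-and-sum operation described above can be illustrated with a short numerical sketch. The tap weights and bit pattern below are hypothetical and chosen only for illustration; they are not the values used in any figure of this thesis.

```python
import numpy as np


def tx_fir(bits, taps):
    """Causal FIR: out[n] = sum_i taps[i] * d[n - i], with d in {-1, +1}."""
    symbols = 2.0 * np.asarray(bits, dtype=float) - 1.0  # map 0/1 to -1/+1
    # np.convolve returns len(symbols) + len(taps) - 1 samples; keep the causal part.
    return np.convolve(symbols, taps)[: len(symbols)]


# Hypothetical tap weights C_0, C_1, C_2 (assumed, not taken from the thesis).
taps = [0.75, -0.20, -0.05]
bits = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
out = tx_fir(bits, taps)
print(np.round(out, 2))
```

With these assumed weights, the first bit after a transition is sent at full amplitude while a long run of identical bits settles to a smaller level, which is the relative high-frequency boost that the filter is meant to provide.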



Figure 2.5: Pre-emphasis FIR filter. Adapted from [4].



Figure 2.6: Intel Stratix IV GX 1-tap pre-emphasis simulation. Adapted from [5].

Figure 2.6 shows the result of pre-emphasis for a 1-tap FIR filter. A pre-emphasis filter has some limitations, as discussed in [4].

- 1. The pre-emphasis filter cannot improve SNR.
- 2. It requires a large voltage swing for good equalization, which results in crosstalk.
- 3. It requires high-resolution DACs.
- 4. Some residual ISI terms remain despite pre-emphasis, reducing voltage and timing margins.

### 2.2 Receiver Equalizers

The objective of receiver equalizers is to improve the BER by boosting the high-frequency and attenuating the low-frequency components of the received signal and by removing the post-cursors to eliminate ISI. This can be achieved in the analog domain or the digital domain. In the past, digital equalizers relied on power-hungry ADCs [6], but recent low-power architectures such as the ones proposed in [7] show great potential. Despite these advances in digital equalization, analog equalization remains a popular choice because it does not require high-speed ADCs. In the analog domain, equalization can be achieved using a continuous-time equalizer and a decision feedback equalizer (DFE). A continuous-time equalizer can be implemented either passively, as shown in figure 2.7, or actively using MOSFETs, as shown in figure 3.1.



Figure 2.7: Passive continuous equalizer. Adapted from [4].

$$H(s) = \frac{R_2}{R_1 + R_2} \frac{1 + R_1 C_1 s}{1 + \frac{R_1 R_2}{R_1 + R_2} (C_1 + C_2) s} \tag{2.1}$$
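To see why this network can only attenuate, equation 2.1 can be evaluated at very low and very high frequency. The sketch below does this numerically; the component values are hypothetical and serve only to illustrate the two limits, $R_2/(R_1+R_2)$ at DC and $C_1/(C_1+C_2)$ at high frequency.

```python
import math


def passive_eq_gain_db(f_hz, r1, r2, c1, c2):
    """Magnitude of equation 2.1 in dB at frequency f_hz."""
    s = 1j * 2 * math.pi * f_hz
    r_par = r1 * r2 / (r1 + r2)
    h = (r2 / (r1 + r2)) * (1 + r1 * c1 * s) / (1 + r_par * (c1 + c2) * s)
    return 20 * math.log10(abs(h))


# Hypothetical component values (assumed for illustration only).
R1, R2, C1, C2 = 300.0, 100.0, 1e-12, 0.2e-12
dc_db = passive_eq_gain_db(1e3, R1, R2, C1, C2)    # approaches R2 / (R1 + R2)
hf_db = passive_eq_gain_db(1e12, R1, R2, C1, C2)   # approaches C1 / (C1 + C2)
print(f"low-frequency gain:  {dc_db:.2f} dB")
print(f"high-frequency gain: {hf_db:.2f} dB")
```

Both limits are ratios smaller than one, so the response is shaped by attenuating low frequencies rather than by amplifying high ones.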

The passive version has some major drawbacks [4]:

- 1. It can cause impedance mismatches, thereby causing a need to use inductors that can be too large for on-chip integration.
- 2. There is no improvement in SNR. The passive circuit cannot provide any gain over 0 dB.

An active continuous-time equalizer (CTLE), as implemented in Chapter 3, gives more control over the transfer function, can provide a boost greater than 0 dB and ensures a larger eye opening. A CTLE is usually used in conjunction with a DFE because the data bits can have residual post-cursors, which the DFE can cancel almost completely. Chapter 4 discusses the design and implementation of a conventional current-summer DFE architecture.

# CHAPTER 3

# CONTINUOUS-TIME LINEAR EQUALIZER

### 3.1 Design Overview

A continuous-time linear equalizer (CTLE), as shown in figure 3.1, is essentially a differential amplifier with RC source degeneration. The resistive $(R_S)$ and capacitive $(C_S)$ degeneration produces a two-pole/one-zero system whose peaking gain counteracts the channel loss at the Nyquist frequency. Depending on the peaking gain, the CTLE can provide gain and equalization with low power and small on-chip area. Although the differential CTLE architecture is immune to common-mode noise that couples into its input, its high-frequency boost also amplifies high-frequency noise [4]. The noise boost issue can be addressed by using a DFE after the CTLE.



Figure 3.1: CTLE.

The transfer function is as follows:

$$H(s) = \frac{\frac{g_m}{C_L} \left(s + \frac{1}{R_s C_s}\right)}{\left(s + \frac{1 + g_m R_s/2}{R_s C_s}\right) \left(s + \frac{1}{R_L C_L}\right)} \tag{3.1}$$

The poles and zero are typically chosen on the basis of the channel response. Consider the frequency response of a USB C link model as shown in figure 2.4. At 5 GHz, the insertion loss is about 12 dB. We could design a CTLE with a peaking gain of 12 dB at 5 GHz. However, this would require a very large transconductance $g_m$ and consume a lot of power. Since the CTLE is followed by a DFE in this thesis, a moderate peaking gain of 5 dB is sufficient to open the eye. The parameters chosen for the design are shown in table 3.1.

Table 3.1: CTLE Design Parameters

| Parameter          | Expression                          | Value   |
|--------------------|-------------------------------------|---------|
| Zero $(f_z)$       | $\frac{1}{2\pi R_s C_s}$            | 500 MHz |
| Pole 1 $(f_{p1})$  | $\frac{1+g_m R_s/2}{2\pi R_s C_s}$  | 1 GHz   |
| Pole 2 $(f_{p2})$  | $\frac{1}{2\pi R_L C_L}$            | 10 GHz  |
| Peaking Gain       | $g_m R_L$                           | 5 dB    |
| DC Gain            | $\frac{g_m R_L}{1+g_m R_s/2}$       | -1 dB   |
| Load Capacitance   | $C_L$                               | 20 fF   |
| Supply Voltage     | $V_{dd}$                            | 1.2 V   |
Since the input capacitance of the DFE discussed in this thesis is about 1 fF, designing CTLE for a load of 20 fF should prevent any loading effect. Typically variable  $R_S$  and  $C_S$  are implemented to change CTLE characteristics once the design is set.

#### 3.1.1 Design Equations

$$R_L = \frac{1}{2\pi C_L f_{p2}} \approx 800 \ \Omega \tag{3.2}$$

$$R_S = 2 R_L \left(10^{-DCGain/20} - 10^{-PeakingGain/20}\right) \approx 890 \ \Omega \tag{3.3}$$

$$g_m = \frac{10^{DCGain/20}}{R_L - 10^{DCGain/20} R_S/2} \approx 2.8 \text{ mS} \tag{3.4}$$

$$C_S = \frac{1}{2\pi R_S f_z} \approx 360 \text{ fF} \tag{3.5}$$
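Equations 3.2 to 3.5 can be chained in a few lines as a cross-check. The sketch below evaluates them literally with the targets from table 3.1; the function name and argument names are illustrative. The $R_L$, $R_S$ and $C_S$ it returns should land close to the rounded values above, while the $g_m$ it returns is simply what equation 3.4 yields for these inputs and may differ from the quoted figure, so it is best treated as a sanity check rather than a design value.

```python
import math


def ctle_design(c_load, f_p2, f_z, dc_gain_db, peak_gain_db):
    """Evaluate equations 3.2 to 3.5 in order; gains are given in dB."""
    a_dc = 10 ** (dc_gain_db / 20)
    a_pk = 10 ** (peak_gain_db / 20)
    r_l = 1 / (2 * math.pi * c_load * f_p2)      # eq 3.2
    r_s = 2 * r_l * (1 / a_dc - 1 / a_pk)        # eq 3.3
    g_m = a_dc / (r_l - a_dc * r_s / 2)          # eq 3.4
    c_s = 1 / (2 * math.pi * r_s * f_z)          # eq 3.5
    return r_l, r_s, g_m, c_s


# Targets taken from table 3.1.
r_l, r_s, g_m, c_s = ctle_design(
    c_load=20e-15, f_p2=10e9, f_z=500e6, dc_gain_db=-1.0, peak_gain_db=5.0
)
print(f"R_L = {r_l:.0f} ohm, R_S = {r_s:.0f} ohm")
print(f"g_m = {g_m * 1e3:.2f} mS, C_S = {c_s * 1e15:.0f} fF")
```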

### 3.2 Transistor Implementation

The CTLE schematic in Cadence Virtuoso is shown in figure 3.2. The transistor width, length and bias current parameters are shown in table 3.2. The transistor parameters were chosen to obtain the desired $g_m$. The parametric sweep feature of Cadence Virtuoso was used to arrive at these values.



Figure 3.2: CTLE Cadence schematic.

| Parameter | Value           |
|-----------|-----------------|
| Vdd       | 1.2 V           |
| idc       | 500 $\mu$A      |
| $R_L$     | 800 $\Omega$    |
| $R_S$     | 890 $\Omega$    |
| $C_L$     | 20 fF           |
| M0: W/L   | 15 $\mu$m/60 nm |
| M1: W/L   | 15 $\mu$m/60 nm |
| M2: W/L   | 200 nm/60 nm    |
| M4: W/L   | 200 nm/60 nm    |
| M6: W/L   | 200 nm/60 nm    |

Table 3.2: CTLE Transistor Parameters

### 3.3 Simulation Setup and Results

#### 3.3.1 AC Analysis

To check the frequency response of our CTLE circuit a test-bench for ac analysis was implemented as shown in figure 3.3.

The transmitter is implemented using **vdc** in the simulation tool for a common-mode voltage of 600 mV, and a **vsin** and a **vcvs** for a differential sinusoid of 250 mV peak-to-peak. The simulation was run from 10 MHz to 100 GHz and the following transfer functions were used:

- 1. For CTLE: $20\log_{10}\left(\frac{voutp-voutn}{vip-vin}\right)$
- 2. For Channel: $20\log_{10}\left(\frac{vch\_p-vch\_n}{vip2-vin2}\right)$
- 3. For CTLE + Channel: $20\log_{10}\left(\frac{voutp2-voutn2}{vip2-vin2}\right)$

In figure 3.4 the peaking gain, which is the difference between the peak gain and DC gain, is about 5 dB. Thus, the overall system response sees a boost of 5 dB at Nyquist frequency.
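For comparison with the simulated curves, the ideal transfer function of equation 3.1 can be swept over the same 10 MHz to 100 GHz range. The sketch below uses the component values from section 3.1.1 and table 3.2. It is a small-signal idealization that ignores transistor parasitics and the channel, so its numbers are not expected to match figure 3.4 exactly.

```python
import numpy as np


def ctle_gain_db(f_hz, g_m, r_s, c_s, r_l, c_l):
    """Magnitude of equation 3.1 in dB over an array of frequencies."""
    s = 1j * 2 * np.pi * np.asarray(f_hz, dtype=float)
    zero = 1 / (r_s * c_s)
    pole1 = (1 + g_m * r_s / 2) / (r_s * c_s)
    pole2 = 1 / (r_l * c_l)
    h = (g_m / c_l) * (s + zero) / ((s + pole1) * (s + pole2))
    return 20 * np.log10(np.abs(h))


freqs = np.logspace(7, 11, 401)  # 10 MHz to 100 GHz
gain = ctle_gain_db(freqs, g_m=2.8e-3, r_s=890.0, c_s=360e-15, r_l=800.0, c_l=20e-15)

dc_gain_db = gain[0]             # low-frequency end of the sweep
peak_db = gain.max()
f_peak = freqs[gain.argmax()]
print(f"low-frequency gain: {dc_gain_db:.2f} dB")
print(f"peak gain: {peak_db:.2f} dB at {f_peak / 1e9:.2f} GHz")
print(f"peaking: {peak_db - dc_gain_db:.2f} dB")
```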

#### 3.3.2 Pulse Response

Figure 3.3: CTLE ac analysis test-bench.

The test-bench for pulse response is the same as that for transient analysis except **vsource** was replaced by **vpulse**. The following parameters were set:

- 1. Voltage 1 = -600 mV
- 2. Voltage 2 = 600 mV
- 3. Period = 60 ns
- 4. Rise time = fall time = 35 ps
- 5. Pulse width = 100 ps

After running the transient simulation, the pulse response before and after CTLE was obtained as shown in figure 3.5. It can be seen that the frequency boosting action of CTLE knocks down the post-cursors significantly. The pulse before CTLE has first post-cursor value of 0.39 and the pulse after CTLE has the first post-cursor value of 0.15, which is a 2.6X reduction from channel response. Other post-cursors are reduced to zero thereby relaxing the requirements of DFE circuit in the next stage.
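Cursor values such as these can be read off a pulse response by sampling it at UI spacing around the main cursor. The sketch below shows one way to do that, assuming the main cursor sits at the peak of the pulse. The waveform is a synthetic first-order (RC-like) response with a hypothetical 100 ps time constant, not the simulated channel data of figure 3.5.

```python
import numpy as np


def extract_cursors(t, v, ui, n_pre=1, n_post=3):
    """Sample a pulse response at UI spacing around its peak (assumed main cursor)."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    t_main = t[np.argmax(v)]
    offsets = np.arange(-n_pre, n_post + 1)        # -1, 0, 1, 2, 3 by default
    samples = np.interp(t_main + offsets * ui, t, v)
    normalized = samples / samples[n_pre]          # scale the main cursor to 1
    return dict(zip(offsets.tolist(), normalized.tolist()))


# Synthetic data: a 100 ps pulse through a first-order low-pass (assumed values).
UI_PS = 100.0
TAU_PS = 100.0
t_ps = np.arange(0.0, 1000.0, 1.0)
rising = 1.0 - np.exp(-t_ps / TAU_PS)
falling = (1.0 - np.exp(-UI_PS / TAU_PS)) * np.exp(-(t_ps - UI_PS) / TAU_PS)
pulse = np.where(t_ps <= UI_PS, rising, falling)

cursors = extract_cursors(t_ps, pulse, UI_PS)
for k, value in cursors.items():
    print(f"a[{k:+d}] = {value:.3f}")
```

Running the same function on the waveforms before and after the CTLE would give normalized cursor values that can be compared directly.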

#### 3.3.3 Transient Analysis

Figure 3.4: Frequency response: CTLE, CTLE+Channel, Channel.

Figure 3.5: CTLE pulse response at input (green) and output (red).

To run the transient simulation, the same setup as shown in figure 3.3 was used, except that **vdc** was replaced by **vsource**. The modified Tx circuit is used to generate a fully differential square wave signal. In the properties of **vsource** a PN10 PRBS sequence with the following properties was chosen:

- 1. Zero value = -600 mV
- 2. One value = 600 mV
- 3. Bit period = 100 ps
- 4. Rise time = fall time = 0.35\*bit period
- 5. Edge type = halfsine

One can choose the PN sequence number in the **LFSR Mode** section of the **vsource** properties.
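For readers without access to the simulator source, a PN10 pattern can be generated with a 10-bit linear feedback shift register. The sketch below assumes the commonly used PRBS10 polynomial $x^{10} + x^7 + 1$; the polynomial and seed that Cadence uses internally for its PN10 mode are not stated here, so this is an assumption. The bit levels follow the list above.

```python
def prbs(order=10, tap=7, seed=1, n_bits=None):
    """Fibonacci LFSR for x^order + x^tap + 1 (assumed PRBS10 polynomial)."""
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    if n_bits is None:
        n_bits = mask                      # one full period: 2^order - 1 bits
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        out.append(new_bit)
    return out


seq = prbs()
# Map bits to the differential levels used in the test-bench (in mV).
levels_mv = [600 if bit else -600 for bit in seq]
print(len(seq), sum(seq))
```

A maximal-length sequence of order 10 repeats every 1023 bits, so the 1000-bit simulation window described below covers slightly less than one full period.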

Then in the **ADE L** window after setting all the transistor parameters a transient simulation was run with stop-time of 105 ns to cover 1000 bits. The eye-diagram was plotted for CTLE output **voutp2-voutn2** and channel output **vch\_p2-vch\_n2** with the following settings:

- 1. Start time: 0 ns
- 2. End time: 105 ns
- 3. Period: 200 ps (2 UI)
- 4. Intensity option: Checked on

| Parameter              | Before CTLE | After CTLE |
|------------------------|-------------|------------|
| Mean One Level         | 296 mV      | 350 mV     |
| Mean Zero Level        | -335 mV     | -370 mV    |
| Vertical Eye Opening   | 140 mV      | 371 mV     |
| Horizontal Eye Opening | 58 ps       | 82 ps      |
| Period Jitter (pk-pk)  | 40.35 ps    | 20 ps      |

Table 3.3: CTLE eye-diagram results

From table 3.3 and figure 3.6 it can be observed that there has been about a 2.6X increase in vertical eye-opening (140 mV to 371 mV) and a 1.4X increase in horizontal eye-opening (58 ps to 82 ps). The peak-to-peak period jitter also decreased by about 2X (40.35 ps to 20 ps).



Figure 3.6: Eye-diagrams before and after CTLE equalization.

# CHAPTER 4

# DECISION FEEDBACK EQUALIZER

### 4.1 Design Overview

A decision feedback equalizer (DFE) is a non-linear feedback circuit implemented on the receiver side of a high-speed link. Its main goal is to minimize the post-cursors of the data pulse, thereby reducing ISI and improving the bit error rate (BER). The circuit has three main parts: a summer, a slicer and an FIR feedback filter (see figure 4.1). The summing node sums the incoming symbol d[n] with weighted (by $w_i$), time-shifted versions of itself to cancel the post-cursors and produce an output y[n], as shown in equation 4.1. The weights $w_i$ are derived from the post-cursor values.

$$y[n] = d[n] - w_1 d[n-1] - w_2 d[n-2] - \cdots - w_N d[n-N] \tag{4.1}$$

where $N$ is the number of feedback taps.
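The feedback operation of equation 4.1 can be sketched in a few lines. In this illustration the delayed terms are the sliced decisions held by the feedback path, which is an assumption about how the delayed symbols are realized, and the channel is a hypothetical one with a single post-cursor of 0.4 relative to the main cursor.

```python
def dfe(received, weights):
    """Subtract weighted previous decisions from each sample, then slice to +/-1."""
    history = [0.0] * len(weights)         # previous decisions, most recent first
    summed, decisions = [], []
    for sample in received:
        y = sample - sum(w * d for w, d in zip(weights, history))
        decision = 1.0 if y >= 0 else -1.0
        summed.append(y)
        decisions.append(decision)
        history = [decision] + history[:-1]
    return summed, decisions


# Hypothetical channel: main cursor 1.0, first post-cursor 0.4 (assumed values).
symbols = [1, -1, -1, 1, 1, -1, 1]
post_cursor = 0.4
received = [
    s + post_cursor * (symbols[i - 1] if i > 0 else 0)
    for i, s in enumerate(symbols)
]

summed, decisions = dfe(received, weights=[post_cursor])
print([round(y, 2) for y in summed])
print(decisions)
```

Because the feedback weight matches the post-cursor in this idealized case, the summer output returns to the transmitted levels as long as the previous decisions are correct.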



Figure 4.1: Block diagram of DFE.

The slicer decides whether the symbol is a 1 or a 0. The feedback filter comprises a flip-flop and a current steering DAC. The flip-flop acts as a memory element to store the previous data bit, and the current steering DAC sinks current from the summing node to cancel the post-cursor. For the DFE to work, the most important condition to be met is settling time. The first loop of the DFE must settle within 1 Unit Interval. This implies that for the first loop,

$$T_{D1} = T_{setup} + T_{CK \to Q} + T_{settle} < 1\,UI \tag{4.2}$$

and for the second loop,

$$T_{D2} = 2T_{setup} + 2T_{CK \to Q} + T_{settle} < 2\,UI \tag{4.3}$$

and so on. $T_{setup}$ refers to the setup time of the data before the clock edge, $T_{CK \to Q}$ refers to the CLK-to-Q delay of the slicer and $T_{settle}$ refers to the RC settling time at the summer node. The number of taps required in the DFE circuit is determined by the length of the post-cursor tail. The longer the tail, the greater the number of taps. However, increasing the number of taps results in greater power consumption, as shown in figure 4.2, and reduces the circuit's bandwidth because of the added drain capacitance of the taps.



Figure 4.2: Power consumption vs. number of taps for 10 Gbps (QPSK) DFE. Adapted from [8].

Unlike the CTLE, the DFE does not amplify noise while boosting the high-frequency component of the input signal. This helps increase the SNR of the signal. However, the circuit has certain drawbacks [2]:

- 1. It cannot cancel pre-cursors.
- 2. It is difficult to meet timing requirements for the feedback path.

### 4.2 Transistor Implementation

This thesis implements the summer using a current-summer architecture and the slicer using a strong-arm latch followed by an $\tilde{S}\tilde{R}$ latch. The $\tilde{S}\tilde{R}$ latch acts as a memory element. The schematics are shown in figure 4.3 and figure 4.5.



Figure 4.3: Conventional current summer.

A current summer topology is used for its ease of implementation. However, it suffers from one major drawback: high power consumption. The biasing current sources $I_{cursor}/2$ and $I_{tap1}$ are implemented using NMOS transistors M7 and M8, which are in turn biased using current mirror transistors M6 and M5 respectively. The actual implementation in Cadence Virtuoso is shown in figure 4.4. For the slicer, a strong-arm latch is cascaded with an $\tilde{S}\tilde{R}$ latch implemented using the NAND gate topology, as shown in figure 4.5. One can use either a strong-arm latch topology or a CML latch topology. The former has no static power dissipation but is slower than the latter.



Figure 4.4: Summer implementation in Cadence Virtuoso.



Figure 4.5: Slicer components.

#### 4.2.1 Design Procedure

1. Slicer

The slicer is composed of a strong-arm comparator [9] and an $\tilde{S}\tilde{R}$ latch as shown in figure 4.5. For the first design iteration, the width/length ratio of transistors M1, M2, M3, M4, M5 and M6 was set to the minimum dimension of 200 nm/60 nm. For the reset transistors S1, S2, S3 and S4 it was set to 400 nm/60 nm so that they are strong enough to reset the nodes S, R, P and Q. The tail NMOS M7 was also given a 400 nm/60 nm ratio so that it can handle the currents from M1 and M2. Because of the offset issue discussed in [9], offset cancellation capacitors were added at nodes P and Q. Since the capacitors need to discharge nodes P and Q unequally, they were given the values of 1.5p and 1p respectively. For the $\tilde{S}\tilde{R}$ latch, the ratio was chosen as 200 nm/60 nm for M1, M2, M3 and M4 and 800 nm/60 nm for M5, M6, M7 and M8. The PMOS width was chosen to be 4x the NMOS width to ensure an equal pulse width for 0 and 1.

The strong-arm/$\tilde{S}\tilde{R}$ latch combination was tested with a load capacitance of 10 fF. It was observed that the transistors in the strong-arm latch were too small to drive the $\tilde{S}\tilde{R}$ latch. To solve the issue, a parametric sweep was performed on the width/length ratio of the strong-arm transistors to arrive at the optimum values shown in table 4.1 and table 4.2.

Since the slicer is a clocked circuit, an ideal clock with artificial random jitter was used in Cadence Virtuoso. The clock delay was adjusted to 30 ps to ensure that the data symbols are sampled near the main cursor, and the rms random jitter was chosen to be 5% of the UI. In a practical scenario the clock signal is recovered from the Rx bits using a clock and data recovery (CDR) circuit.

Table 4.1: Strong-Arm Latch Parameters

| Parameter                   | Value          |
|-----------------------------|----------------|
| W/L: M1, M2, M3, M4, M5, M6 | 1 $\mu$m/60 nm |
| W/L: S1, S2, S3, S4         | 2 $\mu$m/60 nm |
| W/L: M7                     | 2 $\mu$m/60 nm |

Table 4.2: $\tilde{S}\tilde{R}$ Latch Parameters

| Parameter           | Value        |
|---------------------|--------------|
| W/L: M1, M2, M3, M4 | 200 nm/60 nm |
| W/L: M5, M6, M7, M8 | 800 nm/60 nm |

2. Current Summer and Tap

The current summer node is essentially a differential amplifier with resistor source degeneration for linearity. To get the values of $R_L$, $R_S$, $C_L$ and the width/length ratio, consider the single-ended model of the circuit shown in figure 4.6.



Figure 4.6: Single-ended model of DFE.

First we assume $C_{par} = 10$ fF. This includes parasitic capacitance and load capacitance. We choose this value because the input capacitance of the slicer is on the order of 1 fF, so a load capacitance $C_L$ of 8 fF is sufficient to counter the loading. To obtain the value of $R_L$ we use the settling condition of the DFE. For the first tap, the DFE must settle within 1 unit interval (UI) as shown in equation 4.2. The slicer designed in this thesis has $T_{CK \to Q} \approx 35$ ps as shown in figure 4.7.



Figure 4.7: Slicer clock-to-Q delay.

In an $\tilde{S}\tilde{R}$ latch, the input data is tracked while the clock pulse is high. By definition, setup time is the minimum duration for which the input must be stable before a clock edge. Since we are using a level-triggered device, the input stabilizes after the positive edge of the clock. Thus, the setup time comes out to be negative. For our calculation $T_{setup}$ was chosen to be -20 ps. Thus,

$$-20 \text{ ps} + 35 \text{ ps} + T_{settle} < 100 \text{ ps} \tag{4.4}$$

This gives $T_{settle} < 85$ ps. Now

$$T_{settle} = 3\tau = 3 R_L C_{par} \tag{4.5}$$

Using $C_{par} = 10$ fF we get $R_L < 2800\ \Omega$.
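The timing budget of equations 4.4 and 4.5 is easy to re-evaluate when the clock-to-Q delay, setup time or parasitic capacitance changes. The sketch below is a minimal helper for that purpose; the function name and the `n_tau` argument (the number of time constants counted as settled, 3 in this thesis) are illustrative.

```python
def max_load_resistance(ui, t_setup, t_clk_q, c_par, n_tau=3):
    """Return the settling budget and the largest R_L that meets it."""
    t_settle_max = ui - t_setup - t_clk_q          # eq 4.4 rearranged
    r_l_max = t_settle_max / (n_tau * c_par)       # eq 4.5 rearranged
    return t_settle_max, r_l_max


t_settle_max, r_l_max = max_load_resistance(
    ui=100e-12, t_setup=-20e-12, t_clk_q=35e-12, c_par=10e-15
)
print(f"T_settle < {t_settle_max * 1e12:.0f} ps")
print(f"R_L < {r_l_max:.0f} ohm")
```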

The width/length ratio of the transistors can be obtained from the transconductance $g_m$ of the NMOS. To obtain this we first establish the unity-gain bandwidth of our circuit $(f_{ugb})$.

$$f_{ugb} = \frac{G_m}{2\pi C_{par}} \tag{4.6}$$

where $G_m$ is the transconductance of the entire current-summer circuit.

Since our circuit needs to operate well at a Nyquist frequency of 5 GHz, we choose $f_{ugb} = 10$ GHz. One can choose any $f_{ugb}$ greater than 5 GHz; however, a higher bandwidth leads to greater power consumption and larger transistors. For $C_{par} = 10$ fF we get $G_m = 0.628$ mS. Any value of $G_m \ge 0.628$ mS should be adequate for our circuit. The value of $G_m$ is governed by $R_s$ and the transconductance $g_m$ of the NMOS as shown in equation 4.7, and $g_m$ is given by equation 4.8. The value of $R_s$ is set by the linearity requirement. In this thesis it is chosen to be 100 $\Omega$ as a starting point.

$$G_m = \frac{g_m}{1 + g_m\frac{R_s}{2}} \tag{4.7}$$

$$g_m = \frac{2I_{bias}}{V_{ov}} \tag{4.8}$$

$I_{bias}$ is the bias current flowing through the NMOS transistors M0 and M1 and is related to $I_{cursor}$ as $I_{bias} = \frac{I_{cursor}}{2}$, and $V_{ov} = V_{gs} - V_{th}$ is the overdrive voltage of the NMOS. Now, for a good SNR we want the swing at the summer/output node to be greater than or equal to the swing at the input of the DFE. The swing depends on the DC gain of the circuit, given by $DCgain = G_m R_L$. To ensure a good margin for the timing requirement given in equation 4.4, $R_L$ was chosen to be 750 $\Omega$. For a DC gain of 1.5, $G_m = 2$ mS. This translates into $g_m \approx 2.3$ mS. To obtain the width of the NMOS, the testbench shown in figure 4.8 was used. $V_{gs}$ and $V_{ds}$ were set to $\frac{V_{dd}}{2} = 0.6$ V and the width of the NMOS was swept from 200 nm to 5 $\mu$m.



Figure 4.8: Testbench to extract width of NMOS.

From the simulation $V_{ov}$ was observed to be about 400 mV. This gives $I_{bias} = 460 \ \mu\text{A}$ and $I_{cursor} = 920 \ \mu\text{A}$. From the simulation result shown in figure 4.9 it can be observed that a width of 4 $\mu$m works well. The length was kept at 60 nm.

To extract the value of  $I_{tap}$  equation 4.9 was used.

$$I_{tap} = G_m V_{ISI} \tag{4.9}$$

From the pulse response in figure 4.12, $V_{ISI}$ is observed to be 102 mV. This gives $I_{tap} = 204 \ \mu$A. However, simulations showed that 204 $\mu$A over-equalizes the input signal. The value of $I_{tap}$ was parametrically reduced to arrive at the optimum value shown in table 4.3. The width/length ratios of tap transistors M2 and M3 were chosen to ensure that the taps sink the desired $I_{tap}$. For the current mirror pairs M4,M5 and M6,M7 the width/length ratios were chosen to ensure good current matching within each pair and to sink the desired currents, since minimum-sized devices have poor current matching.
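The chain of hand calculations in this design procedure, from equation 4.6 through equation 4.9, can be collected into one helper as a cross-check. The sketch below uses the values quoted in the text; the function and key names are illustrative. Because the text rounds intermediate results (for example $g_m \approx 2.3$ mS), the bias currents it prints are close to, but not identical with, the quoted figures.

```python
import math


def summer_design(f_ugb, c_par, dc_gain, r_l, r_s, v_ov, v_isi):
    """First-pass current-summer sizing from equations 4.6 to 4.9."""
    g_total_min = 2 * math.pi * f_ugb * c_par      # eq 4.6 solved for G_m
    g_total = dc_gain / r_l                        # DC gain = G_m * R_L
    g_m = g_total / (1 - g_total * r_s / 2)        # eq 4.7 solved for g_m
    i_bias = g_m * v_ov / 2                        # eq 4.8 solved for I_bias
    return {
        "g_total_min": g_total_min,
        "g_total": g_total,
        "g_m": g_m,
        "i_bias": i_bias,
        "i_cursor": 2 * i_bias,                    # I_bias = I_cursor / 2
        "i_tap": g_total * v_isi,                  # eq 4.9
    }


result = summer_design(
    f_ugb=10e9, c_par=10e-15, dc_gain=1.5, r_l=750.0,
    r_s=100.0, v_ov=0.4, v_isi=0.102,
)
for name, value in result.items():
    print(f"{name} = {value:.3e}")
```

The tap current from equation 4.9 is only a starting point, which is consistent with the parametric reduction described above.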



Figure 4.9: NMOS drain current vs. width.

### 4.3 Simulation Setup and Results

This section discusses the simulation setup of CTLE and DFE combination. In the previous chapter we saw how the CTLE is powerful enough to open the eye for a 10 Gbps signal by 2.6x. In this section we will be opening the eye further and also improving the SNR.

#### 4.3.1 AC Analysis

The schematic for ac analysis is shown in figure 4.10 and input parameters are shown in table 4.4.

 $V_{in\_ac}$  was chosen to be 600 mV so that the differential swing at the channel input is 1.2 V. The frequency response is shown in figure 4.11.

#### 4.3.2 Pulse Response

| Parameter    | Value            |
|--------------|------------------|
| W/L: M0, M1  | 4 $\mu$m/60 nm   |
| W/L: M2, M3  | 1 $\mu$m/60 nm   |
| W/L: M4      | 1.5 $\mu$m/100 nm |
| W/L: M6      | 300 nm/100 nm    |
| W/L: M5      | 10 $\mu$m/1 $\mu$m |
| W/L: M7, M8  | 500 nm/100 nm    |
| $R_L$        | 750 $\Omega$     |
| $R_s$        | 100 $\Omega$     |
| $C_L$        | 8 fF             |
| $I_{cursor}$ | 900 $\mu$A       |
| $I_{tap1}$   | 140 $\mu$A       |

Table 4.3: DFE Parameters

| Parameter                 | Value       |
|---------------------------|-------------|
| Common Mode Voltage (Vcm) | 600 mV      |
| $V_{in\_ac}$              | 600 mV      |
| egain                     | 1           |
| Zo                        | 50 $\Omega$ |

Table 4.4: AC Analysis Input Parameters

The test-bench for pulse response is the same as that for transient analysis except **vsource** was replaced by **vpulse**. The following parameters were set:

- 1. Voltage 1 = -600 mV
- 2. Voltage 2 = 600 mV
- 3. Period = 60 ns
- 4. Rise time = fall time = 35 ps
- 5. Pulse width = 100 ps

After running the transient simulation, the pulse responses before and after the DFE were obtained, as shown in figure 4.12. The DFE cancels the first post-cursor completely: the pulse before the DFE has a first post-cursor value of 0.2, and the pulse after the DFE has a first post-cursor value of 0.
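
The cancellation described above can be illustrated with a minimal behavioral sketch of a 1-tap DFE. This is not the transistor-level circuit simulated in the thesis. It assumes UI-spaced samples of a normalized pulse, unipolar slicer decisions (0 or 1), and a hypothetical cursor value, threshold and residual tail; only the first post-cursor value of 0.2 is taken from the text.

```python
def one_tap_dfe(samples, tap_weight, threshold):
    """Run a behavioral 1-tap DFE over UI-spaced samples.

    Returns (corrected, decisions): the summer-node values and the slicer outputs.
    """
    corrected = []
    decisions = []
    previous_decision = 0  # assumption: the line is idle before the pulse arrives
    for sample in samples:
        # Summer node: subtract the ISI predicted from the previous decision.
        summed = sample - tap_weight * previous_decision
        decision = 1 if summed > threshold else 0
        corrected.append(summed)
        decisions.append(decision)
        previous_decision = decision
    return corrected, decisions


# Hypothetical normalized pulse response; only the 0.2 post-cursor comes from the text.
pulse_before_dfe = [0.0, 0.6, 0.2, 0.05, 0.0]
pulse_after_dfe, slicer_bits = one_tap_dfe(pulse_before_dfe, tap_weight=0.2, threshold=0.3)

print("before DFE:", pulse_before_dfe)
print("after DFE: ", pulse_after_dfe)
print("decisions: ", slicer_bits)
```

With the tap weight equal to the first post-cursor, the sample that follows the cursor is driven to zero while the cursor and the later samples are left untouched, which is the behavior reported for figure 4.12.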

Figure 4.10: DFE ac analysis testbench.

Figure 4.11: Frequency response of equalizers.

#### 4.3.3 Transient Analysis

The schematic for transient analysis is the same as the testbench for ac analysis, except that the vdc source was replaced with vsource. To simulate the circuit, a PRBS PN10 sequence was chosen with a bit period of 100 ps and a rise/fall time of 35 ps. The transient simulation was run for 105 ns to cover 1000 random bits. The effect of random jitter was modeled by using an ideal clock with an rms random jitter value of 5 ps. The eye-diagrams (2 UI) for the input, channel output, CTLE output and DFE summer node were obtained. They are shown in figures 4.13, 4.14, 4.15 and 4.16. Table 4.5 summarizes the results of the eye before and after equalization.
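
The stimulus described in this subsection can be sketched in a few lines of Python. The text only states that a PN10 sequence was used, so the LFSR taps (stages 10 and 7, a common PRBS10 choice), the seed, and the ±600 mV NRZ levels are assumptions made for illustration and not the settings of the simulator source.

```python
BIT_PERIOD_PS = 100           # bit period, from the text
NUM_BITS = 1000               # number of simulated bits, from the text
SIM_TIME_PS = 105_000         # 105 ns simulation window, from the text
HIGH_MV, LOW_MV = 600, -600   # assumed NRZ drive levels


def prbs10(n_bits, seed=0x3FF):
    """Generate n_bits from a 10-stage Fibonacci LFSR with taps at stages 10 and 7."""
    state = seed & 0x3FF
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    bits = []
    for _ in range(n_bits):
        feedback = ((state >> 9) ^ (state >> 6)) & 1  # XOR of stage 10 and stage 7
        bits.append(feedback)
        state = ((state << 1) | feedback) & 0x3FF     # shift the new bit in
    return bits


bits = prbs10(NUM_BITS)
levels_mv = [HIGH_MV if bit else LOW_MV for bit in bits]
pattern_time_ps = NUM_BITS * BIT_PERIOD_PS

print("pattern length:", pattern_time_ps / 1000, "ns")
print("fits in simulation window:", pattern_time_ps <= SIM_TIME_PS)
```

The 1000 bits occupy 100 ns, so the 105 ns window leaves a small margin for start-up. A full PN10 period is 1023 bits, so the simulated pattern covers slightly less than one period.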

Figure 4.12: Pulse response at DFE input (red) and summer node (blue).

| Parameter                                | After Channel | After CTLE | After DFE |
|------------------------------------------|---------------|------------|-----------|
| Mean One Level $\mu_1$                   | 296 mV        | 305 mV     | 353 mV    |
| Standard Deviation One Level $\sigma_1$  | 104 mV        | 49 mV      | 40 mV     |
| Mean Zero Level $\mu_0$                  | -335 mV       | -350 mV    | -412 mV   |
| Standard Deviation Zero Level $\sigma_0$ | 162 mV        | 56 mV      | 36 mV     |
| Vertical Eye Opening                     | 140 mV        | 367 mV     | 660 mV    |
| Horizontal Eye Opening                   | 54 ps         | 82.5 ps    | 83.5 ps   |
| Jitter (pk-pk)                           | 48 ps         | 19.2 ps    | 16.4 ps   |
| SNR (dB)                                 | 7.5           | 15.9       | 20        |
| BER                                      | 8.8E-03       | 2.3E-10    | 6.26E-24  |

Table 4.5: DFE eye-diagram results summary

From the table one can observe that the CTLE and DFE combination results in a 4.7X increase in vertical eye opening, a 1.54X increase in horizontal eye opening, a 3X decrease in peak-to-peak jitter and a 12.5 dB improvement in SNR. The DFE alone causes a 1.8X increase in vertical eye opening, a 1.2X decrease in jitter and a 4 dB improvement in SNR. The average power consumption of the overall circuit was observed to be 3.3 mW. The CTLE eases the requirements on the DFE by removing most of the post-cursors. Note that the SNR and BER values in table 4.5 are overestimates because the transient analysis was run for only 1000 bits. The statistical formulas used to derive SNR and BER from the eye-diagram are given in equations 4.10 and 4.11.

$$SNR\ (V/V) = \frac{\mu_1 - \mu_0}{\sigma_1 + \sigma_0} \tag{4.10}$$

$$BER = 0.5\,\mathrm{erfc}\left(\frac{SNR}{\sqrt{2}}\right) \tag{4.11}$$
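
As a cross-check, the sketch below evaluates the eye SNR and BER from the level statistics listed in table 4.5. It assumes the noise term is the sum of the two standard deviations (the usual Q-factor form) and that the dB value is 20 log10 of the voltage ratio; the function names are illustrative and not taken from the thesis.

```python
import math


def eye_snr(mu1, sigma1, mu0, sigma0):
    """Voltage-ratio SNR of an eye from its one/zero level statistics."""
    return (mu1 - mu0) / (sigma1 + sigma0)


def snr_in_db(snr):
    """Convert a voltage ratio to dB."""
    return 20 * math.log10(snr)


def bit_error_rate(snr):
    """BER estimate assuming Gaussian noise on both levels."""
    return 0.5 * math.erfc(snr / math.sqrt(2))


# (mu1, sigma1, mu0, sigma0) in mV, copied from table 4.5.
EYE_LEVELS_MV = {
    "After Channel": (296, 104, -335, 162),
    "After CTLE": (305, 49, -350, 56),
    "After DFE": (353, 40, -412, 36),
}

results = {}
for stage, (mu1, sigma1, mu0, sigma0) in EYE_LEVELS_MV.items():
    snr = eye_snr(mu1, sigma1, mu0, sigma0)
    results[stage] = (snr_in_db(snr), bit_error_rate(snr))
    print(f"{stage}: SNR = {results[stage][0]:.1f} dB, BER = {results[stage][1]:.2e}")
```

The computed SNR values agree with the tabulated 7.5, 15.9 and 20 dB to within rounding. The BER values land in the same decade as the table; small differences are expected because the tabulated means and standard deviations are themselves rounded.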



Figure 4.13: Eye-diagram of channel input.



Figure 4.14: Eye-diagram of channel output.

#### 4.3.4 Effect of Tap Current on the Eye

The tap currents of the DFE determine the amount of post-cursor cancellation. Figure 4.17 shows that if the tap current is less than the optimum value the signal is under-equalized, and if it is greater than the optimum value the signal is over-equalized. In both cases the vertical opening of the eye is reduced. The horizontal opening is not affected much.
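
The under- and over-equalization trend can be reproduced with a simple first-order model in which the vertical opening shrinks by the magnitude of the residual first post-cursor. This is a sketch and not the simulation behind figure 4.17: the cursor and post-cursor amplitudes are hypothetical values in arbitrary integer units, the tap weight stands in for the voltage produced by the tap current, and the horizontal opening is not modeled.

```python
def vertical_opening(cursor, first_post_cursor, tap_weight):
    """First-order vertical eye opening with one ISI term and a 1-tap DFE."""
    residual = first_post_cursor - tap_weight  # ISI left after the feedback tap
    return cursor - abs(residual)


CURSOR = 600             # hypothetical cursor amplitude
FIRST_POST_CURSOR = 200  # hypothetical first post-cursor amplitude

sweep = {
    tap: vertical_opening(CURSOR, FIRST_POST_CURSOR, tap)
    for tap in range(0, 401, 50)
}
best_tap = max(sweep, key=sweep.get)

for tap, opening in sweep.items():
    if tap < best_tap:
        label = "under-equalized"
    elif tap > best_tap:
        label = "over-equalized"
    else:
        label = "optimum"
    print(f"tap = {tap:3d}  opening = {opening:3d}  ({label})")
```

In this model the opening peaks when the tap weight equals the post-cursor and falls off symmetrically on either side, matching the qualitative behavior described above.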



Figure 4.15: Eye-diagram of CTLE output.



Figure 4.16: Eye-diagram of DFE summer node.

Figure 4.17: Effect of itap variation on the eye-diagram parameters.

#### 4.3.5 Effect of Clock Random Jitter on DFE

To observe the effect of random jitter in the receiver clock, the **Random Jitter (RJ)** field of the clock source was parametrically swept from 0 ps to 25 ps. The results are shown in figure 4.18. The DFE designed in this thesis works optimally until the rms jitter of the receiver clock reaches about 7.5 ps. Beyond 7.5 ps the eye width and height start decreasing rapidly. Typically the rms random jitter tolerance for the receiver clock is set to less than 0.1 UI, which in this case is less than 10 ps.
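
For reference, the jitter budget quoted in this subsection follows directly from the 10 Gbps data rate. The worked arithmetic below uses only numbers already stated in the text:

$$UI = \frac{1}{10\ \mathrm{Gbps}} = 100\ \mathrm{ps}, \qquad 0.1 \times UI = 10\ \mathrm{ps}, \qquad \frac{7.5\ \mathrm{ps}}{100\ \mathrm{ps}} = 0.075\ UI$$

The observed 7.5 ps knee therefore sits at 0.075 UI, inside the typical 0.1 UI tolerance.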



Figure 4.18: Effect of random jitter on the eye-diagram parameters.

# CHAPTER 5

# CONCLUSION

### 5.1 Summary

This thesis extends the work in [10] to build and simulate a complete receiver equalizer in 65 nm CMOS technology. The focus of [10] was on behavioral modeling of feed-forward equalizers in the TX path and a transistor-level implementation of a CTLE in the RX path for a 6 Gbps link. In this thesis, a conventional CTLE and a conventional full-rate 1-tap DFE are implemented for a 10 Gbps USB 3.1 link model. The combined CTLE and DFE architecture achieved a 4.7X increase in vertical eye opening, a 1.54X increase in horizontal eye opening, a 3X decrease in peak-to-peak jitter and a 12.5 dB improvement in SNR with a power consumption of 3.3 mW.

There is considerable scope for improving the architecture in terms of power consumption and eye opening. With the increasing push for higher data rates, lower power consumption is critical, especially for mobile devices. As for eye opening, the DFE topology in this thesis uses a conventional current summer, which has a low voltage swing, and a slow slicer, which makes it difficult to close the first feedback loop. As the number of taps increases, the present topology is likely to break down. To solve the voltage swing issue, a current-integrating summer can be used. To improve the speed of the slicer, an optimized RS latch [11] with a strong-arm latch can be used.

### 5.2 Future Work

The focus of this thesis was purely on receiver equalizers. No provision was made to cancel the pre-cursors of the data bits. To improve the overall eye opening of the USB 3.1 link used in this thesis, a transistor-level FFE also needs to be implemented. The DFE in this thesis relied on an ideal clock with artificially induced random jitter. In an actual serial link, the correct sampling time and receiver clock are extracted by the CDR circuit. Hence, to correctly model the non-idealities in a high-speed link, a 5 GHz low-jitter CDR circuit needs to be designed.

There is also scope for making the DFE adaptive. After tape-out, the channel characteristics can vary with PVT variations. In addition, new parasitics are introduced that are not modeled well in simulation tools. In such a case an adaptive DFE is the optimal equalizer. A generic block diagram of an adaptive DFE is shown in figure 5.1, where e[n] represents the error between the digital slicer output and eye-performance. The eye-monitor block digitizes the eye voltage, and the update block processes the error term on the basis of a particular adaptive algorithm to adjust the tap currents. Figure 10 in [12] provides one such implementation.

Figure 5.1: Conceptual representation of an adaptive DFE. Adapted from [12].

There are many algorithms to implement the adaptive block of the DFE. However, they are all implemented in the digital domain and hence necessitate the use of ADCs. One of the most popular algorithms is least mean squares (LMS). The LMS algorithm aims to minimize the error $e_k$ in equation 5.1 [13].

$$c_{j,k+1} = c_{j,k} + he_k d_{k-j} \tag{5.1}$$

where $c_{j,k}$ is the $j$-th tap coefficient of the DFE at step $k$, $h$ is the step size, and $d_{k-j}$ is the slicer output $j$ bits earlier. The LMS algorithm is difficult to implement because it needs the values of $e_k$ and $d_k$, which are extracted from ADCs. An alternative algorithm called sign-sign LMS (SS-LMS) provides faster convergence. It relies only on the signs of $e_k$ and $d_k$, which can be extracted using comparators. Interested readers can refer to figure 10 in [13] for more details on implementation.
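
The update rule in equation 5.1 and its sign-sign variant can be sketched for a single tap ($j = 1$) as follows. This is a behavioral illustration under stated assumptions, not an implementation from [12] or [13]: the channel is a noiseless two-term model with hypothetical cursor and post-cursor amplitudes, the data symbols are ±1, the error is measured against a known reference level equal to the cursor, and the step sizes are chosen arbitrarily.

```python
import random


def sign(value):
    """Return +1, -1 or 0, as a comparator-style sign extraction."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def adapt_one_tap_dfe(num_bits, step, sign_sign, cursor=0.6, post_cursor=0.2, seed=0):
    """Adapt a single DFE tap with LMS (sign_sign=False) or SS-LMS (sign_sign=True)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    tap = 0.0
    previous_bit = 1           # assumed symbol sent before the pattern starts
    previous_decision = 1
    for _ in range(num_bits):
        bit = rng.choice((-1, 1))
        # Hypothetical channel: cursor plus first post-cursor ISI, no noise.
        received = cursor * bit + post_cursor * previous_bit
        summed = received - tap * previous_decision   # DFE summer node
        decision = 1 if summed >= 0 else -1           # slicer
        error = summed - cursor * decision            # e_k against the reference level
        if sign_sign:
            tap += step * sign(error) * sign(previous_decision)
        else:
            tap += step * error * previous_decision   # equation 5.1 with j = 1
        previous_bit = bit
        previous_decision = decision
    return tap


lms_tap = adapt_one_tap_dfe(num_bits=500, step=0.05, sign_sign=False)
ss_lms_tap = adapt_one_tap_dfe(num_bits=500, step=0.01, sign_sign=True)
print(f"LMS tap after 500 bits:    {lms_tap:.4f}")
print(f"SS-LMS tap after 500 bits: {ss_lms_tap:.4f}")
```

In this model the LMS tap settles on the post-cursor amplitude, while the SS-LMS tap moves in fixed increments and then dithers within about one step of it. The SS-LMS update needs only the two sign bits, which is why comparators suffice in hardware.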

LMS-based algorithms typically rely on a training sequence to obtain correct sampling points. However, [14] shows that convergence in LMS is achievable without a training sequence. Adaptive DFE architectures other than LMS, such as eye-opening adaptive DFE, jitter-based adaptive DFE and blind ADC-based adaptive DFE, offer faster convergence and greater control over the eye. Readers are encouraged to refer to [13] and [12] for further information.

### REFERENCES

- [1] "What are the USB data transfer rates and specifications?" https://www.sony.com/electronics/support/articles/00024571, accessed 2019-03-19.
- [2] S. Palermo, "ECEN689: Special topics in high-speed links circuits and systems. Lecture 19: RX DFE equalization," Texas A&M University, 2010.
- [3] "Compliance test benches keysight (formerly agilents electronic measurement), keysight.com, 2019." [Online]. Available: https://www.keysight.com/main/editorial.jspx?action=downloadcc= USlc=engckey=2691925id=2691925ml=eng
- [4] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, "Equalizers for high-speed serial links," *International Journal of High Speed Electronics and Systems*, vol. 15, no. 02, pp. 429–458, 2006.
- [5] Altera, "Understanding the pre-emphasis and linear equalization features in transmitter pre-emphasis in Stratix IV GX devices," Innovation, pp. 1–16, 2010. [Online]. Available: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an602.pdf
- [6] E. H. Chen, R. Yousry, and C. K. K. Yang, "Power optimized ADC-based serial link receiver," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, 2012.
- [7] P. Chiu, M. Liu, Q. Tang, and C. H. Kim, "A 2.1 pj/bit, 8 gb/s ultra-low power in-package serial link featuring a time-based front-end and a digital equalizer," in 2018 IEEE Asian Solid-State Circuits Conference (A-SSCC), Nov 2018, pp. 187–190.
- [8] C. Thakkar, L. Kong, K. Jung, A. Frappé, and E. Alon, "A 10gb/s 45mw adaptive 60ghz baseband in 65nm cmos," in 2011 Symposium on VLSI Circuits - Digest of Technical Papers, June 2011, pp. 24–25.
- [9] B. Razavi, "The strongarm latch [a circuit for all seasons]," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 2, pp. 12–17, Spring 2015.

- [10] A. Jain, "Equalization in continuous and discrete time for high speed links using 65 nm technology," master's thesis, University of Illinois, May 2016. [Online]. Available: http://hdl.handle.net/2142/90516
- [11] B. Nikolic, V. Stojanovic, V. G. Oklobdzija, J. Chiu, and M. Leung, "Sense amplifier-based flip-flop," in 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278), Feb 1999, pp. 282–283.
- [12] H. Higashi, S. Masaki, M. Kibune, S. Matsubara, T. Chiba, Y. Doi, H. Yamaguchi, H. Takauchi, H. Ishida, K. Gotoh, and H. Tamura, "A 5-6.4-gb/s 12-channel transceiver with pre-emphasis and equalization," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 4, pp. 978–985, April 2005.
- [13] F. Yuan, A. R. AL-Taee, A. Ye, and S. Sadr, "Design techniques for decision feedback equalisation of multi-giga-bit-per-second serial data links: A state-of-the-art review," *IET Circuits, Devices Systems*, vol. 8, no. 2, pp. 118–130, March 2014.
- [14] J. Labat, O. Macchi, C. Laot, and N. Le Squin, "Is training of adaptive equalizers still useful?" in *Proceedings of GLOBECOM'96. 1996 IEEE Global Telecommunications Conference*, vol. 2, Nov 1996, pp. 968–972.