# Modeling and Analysis of High-Speed I/O Links

Ganesh Balamurugan, Member, IEEE, Bryan Casper, James E. Jaussi, Member, IEEE, Mozhgan Mansuri, Frank O'Mahony, and Joseph Kennedy, Member, IEEE

Abstract-Improvements in signaling methods, circuits and process technology have allowed input/output (I/O) data rates to scale beyond 10 Gb/s over several legacy channels. In this regime, it is critical to accurately model and comprehend channel/circuit nonidealities in order to co-optimize the link architecture, circuits, and interconnect. Empirical and worst-case analysis methods used at lower rates are inadequate to account for several deterministic and random noise sources present in I/O links today. In this paper, we review models and methods for statistical signaling analysis of high-speed links, and also propose a new way to integrate behavioral modeling approaches with analytical methods. A computationally efficient segment-based analysis method is shown to accurately capture the effect of transmit jitter and its interaction with the channel. In addition, a new jitter interpretation approach is proposed to enable the analysis of arbitrary I/O clocking topologies. We also present some examples to illustrate the practical utility of these analysis methods in the realm of high-speed I/O design.

*Index Terms*—High-speed I/O, I/O power optimization, link analysis tools, signaling analysis, statistical signaling analysis.

## I. INTRODUCTION

Over the past decade, high-speed input/output (I/O) data rates have scaled from a few hundred Mb/s to > 10 Gb/s. This has been possible due to improvements in link architecture (e.g., point-to-point instead of multidrop), signaling methods (e.g., transmit pre-emphasis), circuits (e.g., low noise receivers and precision clocking), and semiconductor process technology. A typical multi-Gb/s I/O link today (e.g., [1]–[3]) includes several of the components shown in Fig. 1: equalizers at TX/RX to compensate for channel intersymbol interference (ISI), high-speed clock generation/distribution circuits, and clock-data recovery (CDR) circuits to optimally sample the incoming data. This increased complexity has required significant improvements in link modeling and analysis techniques to enable design optimization and validation. Advanced signaling analysis tools are also used today to help draft I/O standards and specifications. By comprehensively modeling both the circuits and interconnect in a high-speed link, these tools enable system level co-optimization of channel components, link architecture and low level circuits.

The authors are with the Microprocessor Technology Laboratories, Intel Corporation, Hillsboro, OR 97124 USA (e-mail: ganesh.balamurugan@intel.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TADVP.2008.2011366

Fig. 1. Components of a high-speed I/O link.

In the past, link analysis was based on empirical simulations or worst-case analysis. In the former approach, a transient analysis was done using a circuit/interconnect simulator to compute performance margins from the receiver eye obtained with a few thousand bits. With increasing link complexity (as shown in Fig. 1) and the strong interaction between various link components (e.g., channel characteristics and CDR performance), it is impractical to model the entire link using a SPICE-like time domain simulator. Since it is not possible to increase the number of bits used in the simulation to validate the low BERs needed in links today ($< 10^{-12}$), it is difficult to predict performance margins with any certainty using a solely empirical method. Worst-case link analysis methods (e.g., peak distortion analysis in [4], [5]) circumvent the need for time consuming simulations by analytically determining the worst-case eye from the channel response and noise models. While these methods are useful for applications like equalizer optimization due to their computational simplicity [6], they are not suited to high-speed link performance analysis. This is because the results of a worst-case analysis in which the worst-case effects of several noise and interference sources are simply superimposed can be highly pessimistic (this is especially true when modulation schemes more complex than NRZ are used). Due to these drawbacks of empirical and worst-case analysis methods, almost all link analysis tools used today (e.g., [7]–[9]) are statistical in nature.
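To make the limitation of empirical simulation concrete, the following is a rough sketch (not taken from the paper) of how many error-free bits would have to be simulated to claim a BER bound with a given confidence. It assumes independent bit errors and zero observed errors; the function name `bits_required` and the 95% confidence level are illustrative choices.

```python
import math

def bits_required(target_ber: float, confidence: float = 0.95) -> float:
    """Error-free bits needed to claim BER < target_ber at the given confidence.

    Assumes independent bit errors and zero observed errors, so that
    (1 - BER)^N <= 1 - confidence, i.e. N >= ln(1 - confidence) / ln(1 - BER).
    """
    # log1p keeps precision for very small BER values
    return math.log(1.0 - confidence) / math.log1p(-target_ber)

if __name__ == "__main__":
    n = bits_required(1e-12)
    print(f"bits needed for BER < 1e-12 at 95% confidence: {n:.3e}")
    # Compare with a transient simulation covering a few thousand bits
    print(f"ratio to a 5000-bit simulation: {n / 5000:.1e}")
```

Under these assumptions the required bit count is on the order of $10^{12}$, many orders of magnitude beyond the few thousand bits of a typical transient simulation.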

Statistical link analysis attempts to account for all relevant noise and interference sources in a probabilistic fashion to derive statistical performance metrics like BER analytically, without extensive behavioral/circuit simulations. With appropriate models for various link components, this type of analysis enables computationally efficient validation and performance characterization for very low target BERs ( $< 10^{-12}$ ). In this paper, we present models for various key blocks in a high-speed link and techniques to use them for statistical link analysis.

Manuscript received June 27, 2008; revised November 22, 2008. This work was recommended for publication by Associate Editor W. Beyene upon evaluation of the reviewers' comments.

IEEE TRANSACTIONS ON ADVANCED PACKAGING



Fig. 2. I/O channel modeling: (a) physical channel example, (b) model using an impulse response matrix.



Fig. 3. Transmitter model.

Besides reviewing existing statistical techniques [4], [7]–[9], we present a method to combine them with behavioral modeling approaches to improve accuracy, particularly with respect to link jitter. The next section covers link modeling, while the analysis methods are described in Section III. Section IV addresses ways to model actual TX/RX clock jitter to fit within the analysis framework of Section III. In Section V, we present some examples to illustrate the utility of statistical link analysis in high-speed I/O design.

## II. LINK MODELING

Analyzing the performance of an I/O link requires models for both the communication channel and the various link components shown in Fig. 1. At the transmit side of the link, a PLL generates a high-speed clock that is usually buffered and distributed to several transmitters. Within each transmitter, data bits may be coded (e.g., for dc balancing or clock recovery) and modulated before being sent to the output driver. In many instances, the output driver is implemented using a segmented DAC for transmit pre-emphasis [1]. At the receiver, the incoming data is conditioned by an analog front-end (AFE) before being sampled and quantized. A clock and data recovery loop (CDR) determines the optimal sampling instants. Below, we describe ways to model each of these link components.

## A. Channel

Physically, an I/O channel may be composed of a combination of I/O pads at TX/RX, package traces/pins, sockets, connectors, FR4 traces, and a backplane (see Fig. 2). This channel can usually be assumed to be linear and time-invariant (LTI), and can thus be described using an impulse or frequency response. A multiport LTI model (Fig. 2) is necessary to capture the effect of all relevant crosstalk sources, in addition to intersymbol interference (ISI). It is important to include the effective capacitance seen at the TX/RX pads (due to ESD, device and interconnect parasitics) in the overall channel model, since it can significantly limit performance over certain channels.

## B. Transmitter

Fig. 4. Receiver model.

While the LTI assumption is usually valid for the physical I/O channel, it may be stretched in the case of the output driver [10]. This is primarily due to nonlinearities and parasitics of the active devices in the driver. Techniques to model these have been proposed in [10] and [11]. For our purposes, we will assume that an equivalent LTI model can be derived using a method such as the least mean squares approach in [11]. The segment-based analysis method described in Section III-A can comprehend any dc nonlinearity in the output DAC. TX pre-emphasis can be modeled by a discrete-time linear filter with the appropriate tap weights (Fig. 3). The TX peak output constraint implies that the sum of the absolute values of the tap weights should be unity [12]. Jitter in the TX clock arises from various sources such as device noise, PLL reference clock and power supply noise [13]. This modulates the phase of the TX output and can cause significant degradation of margins at high jitter frequencies [14]. A model for the transmitter that includes all these elements is shown in Fig. 3. Jitter modeling is described in detail in Section IV.
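The pre-emphasis model and the peak output constraint can be illustrated with a minimal sketch. The tap values, the helper names `normalize_taps` and `apply_fir`, and the data pattern are hypothetical; symbols are expressed in units of the full TX swing.

```python
def normalize_taps(taps):
    """Scale FIR tap weights so the sum of their absolute values is 1."""
    total = sum(abs(w) for w in taps)
    if total == 0:
        raise ValueError("at least one tap must be nonzero")
    return [w / total for w in taps]

def apply_fir(symbols, taps):
    """Discrete-time FIR: y[n] = sum_k taps[k] * symbols[n - k] (zero initial state)."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, w in enumerate(taps):
            if n - k >= 0:
                acc += w * symbols[n - k]
        out.append(acc)
    return out

if __name__ == "__main__":
    taps = normalize_taps([1.0, -0.25])   # hypothetical 2-tap filter: main cursor + 1 postcursor
    data = [1, 1, 1, -1, -1, 1, -1, 1]    # NRZ symbols in units of full TX swing
    levels = apply_fir(data, taps)
    print(taps)
    print(levels)
    # With sum(|taps|) == 1 and |symbol| <= 1, the output never exceeds the peak swing
    print(max(abs(v) for v in levels) <= 1.0 + 1e-12)
```

Normalizing the taps this way means that pre-emphasis trades low-frequency amplitude for high-frequency boost instead of increasing the peak driver output.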

## C. Receiver

The AFE in the receiver is a signal conditioning circuit that may be a low pass filter (to filter out of band noise), an automatic gain control unit, or a continuous time linear equalizer (to mitigate ISI [1]). In all cases, it can be modeled as a filter that modifies the effective channel response (Fig. 4). The AFE is usually followed by a sampler and regenerative latches [15]. These blocks can be modeled by an ideal quantizer with input offset (arising from device mismatches) and input-referred noise (due to device noise and other on-chip noise sources), as shown in Fig. 4. Note that the noise here also includes any output noise from the AFE. Similar to TX pre-emphasis, the DFE can be modeled by a discrete-time linear feedback filter. For high-speed I/O where target BERs are  $< 10^{-12}$ , error propagation in the DFE has negligible effect on performance. This allows the linearization of DFE as described in [16] enabling its effect to be analyzed using a modified single bit response. The most common implementation of RX CDR determines and tracks the RX sampling phase based on data/edge samples (e.g., [2]). To calculate the jitter in the RX sampling clock, all relevant sources of jitter such as reference clock noise, VCO noise, charge-pump mismatch, and self-noise of bang-bang phase detectors [17] need to be included. Channel ISI/crosstalk and transmit data coding also have an effect on CDR performance. Due to the nonlinear nature of most CDRs used today and the multitude of noise sources, it is preferable to use a behavioral modeling/Monte Carlo simulation approach to capture all the relevant nonidealities. The data from such a simulation can be postprocessed (as described in Section IV) to extract jitter parameters that can be used in the analysis described in Section III.
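As a rough illustration of the linearized DFE, the sketch below forms a modified single-bit response by subtracting the feedback tap weights from the corresponding postcursor samples. It assumes all past decisions are correct (no error propagation); the function name and the sample values are hypothetical.

```python
def apply_ideal_dfe(pulse, cursor_index, dfe_taps):
    """Return the single-bit response seen by the slicer when past decisions are correct.

    pulse        : UI-spaced samples of the single-bit response (after TX FIR and AFE)
    cursor_index : index of the main cursor sample in `pulse`
    dfe_taps     : feedback weights for postcursors 1, 2, ... UI after the cursor
    """
    modified = list(pulse)
    for i, w in enumerate(dfe_taps, start=1):
        idx = cursor_index + i
        if idx < len(modified):
            modified[idx] -= w   # feedback cancels (part of) this postcursor
    return modified

if __name__ == "__main__":
    pulse = [0.02, 0.6, 0.25, 0.1, 0.04]   # hypothetical pulse response, cursor at index 1
    print(apply_ideal_dfe(pulse, cursor_index=1, dfe_taps=[0.25, 0.1]))
```

The modified response can then be fed to the same statistical analysis as an unequalized channel, which is what makes the linearization convenient.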

## III. STATISTICAL SIGNALING ANALYSIS

In this section, we describe methods to comprehend the dominant link interference and noise sources in a statistical manner.



Fig. 5. Steps in statistical signaling analysis.

While some of these techniques have been described in previous publications [4], [7], [8], we include them here for review and also to set the framework for a new jitter interpretation approach proposed in the next section. Using the models of Section II, the following analysis will allow us to characterize the link performance in terms of a statistical BER eye, which is the link BER as a function of receiver sampling offset in voltage and time domains. A BER eye can be considered to be a 2-D extension of the more conventional bathtub plot that characterizes BER as a function of timing offset alone. In addition to the LTI assumption, the following key assumptions are made in the analysis: 1) the TX data bit sequence in each channel is independent and identically distributed (i.i.d.) and independent of data sent over other channels, and 2) noise/jitter distributions are stationary and independent of the data sequence. Although the analysis methods described here can be used for arbitrary jitter distributions, we will use the commonly used dual-Dirac normal distribution (a combination of Gaussian and bi-modal distributions [18]) for illustration. The analysis flow is outlined in Fig. 5 and the various steps are explained below.
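For reference, a common form of the dual-Dirac jitter model is two impulses convolved with a Gaussian. The sketch below assumes a symmetric, equal-weight bimodal component, which is a simplification (the illustration in Fig. 6 uses an asymmetric one); the parameter names `dj_pp` and `rj_rms` are illustrative.

```python
import math

def gaussian(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def dual_dirac_pdf(t, dj_pp, rj_rms):
    """Two equal-weight Dirac impulses at +/- dj_pp/2 convolved with N(0, rj_rms^2)."""
    half = 0.5 * dj_pp
    return 0.5 * gaussian(t, -half, rj_rms) + 0.5 * gaussian(t, half, rj_rms)

if __name__ == "__main__":
    # Hypothetical values in UI: 0.05 UI p-p bimodal jitter, 0.01 UI rms Gaussian jitter
    step = 0.0005
    grid = [i * step for i in range(-400, 401)]
    area = sum(dual_dirac_pdf(t, 0.05, 0.01) for t in grid) * step
    print(f"area under PDF: {area:.6f}")
```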

## A. ISI + Untracked TX Jitter

The first step in the analysis is to determine the distribution of RX samples due to just ISI and TX jitter that is not tracked by the receiver. The strong interaction between these two signaling impairments has been noted in previous publications [7], [14], [19]. Intuitively, this is because TX jitter (at high frequencies) modulates TX symbol widths and hence the amount of ISI seen at the receiver. If TX jitter is white, the RX sample distribution can be obtained using the segment-based analysis described in [7]. In reality, TX jitter can be highly correlated and part of it is tracked and cancelled by the RX CDR. In Section IV, we describe a way to map the actual correlated TX/RX jitter to equivalent white TX and RX jitter.

In a segment-based analysis, the TX data stream is divided into segments centered on the nominal data transitions as shown in Fig. 6. The idea is to compute the contributions of individual segments and appropriately combine them to get the aggregate distribution of RX samples. To do this, we start by tabulating all valid realizations of a segment in terms of the initial and final values and the transition edge jitter (Fig. 6). Note that dc nonlinearity in the output driver can be comprehended at this stage by appropriately modifying the initial and final values based on a characterization of the output driver circuit (see Fig. 3). The contribution of each segment to the aggregate sample PDF (called the "transition PDF" in [7]) is determined by convolving the segment with the channel impulse response, sampling the result and weighting by the TX jitter distribution (Fig. 6). Once the transition PDFs (TPDFs) of all the precursor and postcursor segments have been computed in this manner, they are combined to obtain the aggregate RX sample PDF.

This can be done in a sequential fashion as shown in the example in Fig. 7. For the binary NRZ case with no TX pre-emphasis, there are four possible transitions for each pre-/post-cursor segment as shown. Starting from the last significant postcursor segment, at each step, we average selected transition PDFs and combine them with appropriate transition PDFs in the neighboring segment. For example, we average the TPDFs corresponding to the "00" and "10" transitions in postcursor segment #2 and convolve it with the TPDF of "01" in segment #1 to obtain the contribution to the RX sample PDF of all possible data patterns that have a 0-to-1 transition 1 UI before the cursor. This step is repeated until all the segments have been traversed and we converge at the cursor segment. Note that this approach can be easily extended to include multilevel signaling (M-PAM), TX pre-emphasis, and TX duty cycle error (or more generally, any deterministic TX jitter) by appropriately augmenting the set of possible segments and the rules for combining them. For example, in the case of TX duty cycle error, we would combine (i.e., convolve) the TPDFs of the wider segments with those of the narrower segments only and vice versa. Formulas describing this iterative process for arbitrary number of signaling levels and TX pre-emphasis taps can be found in [7].

Fig. 6. Transition PDF computation in segment-based analysis of ISI and TX jitter. An asymmetric bi-modal TX jitter distribution is used for illustration.

Fig. 7. Example of combination of segments in segment-based analysis.

## B. Pre-aperture BER Eye

After computing the effect of ISI and untracked TX jitter, we add the effect of crosstalk to the RX sample distribution. Assuming i.i.d data streams, the interference distribution due to crosstalk can be easily computed from the crosstalk channel responses using the approach outlined in [4] to compute the distribution of ISI. As the data in crosstalk channels is assumed to be independent of that in the primary channel, the resulting interference distribution can be directly convolved with the sample distribution obtained in the previous step (Fig. 5). Note that the effect of crosstalk depends on the phase shift between the clocks in the aggressor and victim channels. For plesiochronous clocking scenarios, the crosstalk PDF can be computed by averaging the results for all possible relative phases. Using the resulting distribution that comprehends ISI, crosstalk and (uncorrelated) TX jitter, we can compute the BER conditioned on a given RX sampling voltage/timing offset. This can be done by integrating appropriate areas in the aggregate sample PDF just computed (see Fig. 8). The result is a 2-D BER plot as a function of RX sampling offsets, referred to as the pre-aperture BER eye, since it assumes an ideal receiver (with no voltage noise/sampling jitter).
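The convolution-based construction of an interference distribution and the conditional BER can be sketched as follows. This is a simplified illustration that ignores TX jitter and timing offset: it assumes binary NRZ with i.i.d. symbols in {-1, +1}, a single sampling phase, and hypothetical cursor values; the function names are not from the paper.

```python
from collections import defaultdict

def interference_pmf(cursors):
    """PMF of sum_k b_k * h_k for i.i.d. b_k in {-1, +1}, built by repeated convolution."""
    pmf = {0.0: 1.0}
    for h in cursors:
        nxt = defaultdict(float)
        for v, p in pmf.items():
            nxt[round(v + h, 12)] += 0.5 * p
            nxt[round(v - h, 12)] += 0.5 * p
        pmf = dict(nxt)
    return pmf

def convolve_pmf(a, b):
    """PMF of the sum of two independent discrete random variables."""
    out = defaultdict(float)
    for va, pa in a.items():
        for vb, pb in b.items():
            out[round(va + vb, 12)] += pa * pb
    return dict(out)

def ber_at_offset(main_cursor, pmf, v_offset):
    """BER for an ideal slicer with threshold v_offset and equiprobable +/-1 bits."""
    err1 = sum(p for v, p in pmf.items() if main_cursor + v <= v_offset)   # sent +1, read as -1
    err0 = sum(p for v, p in pmf.items() if -main_cursor + v > v_offset)   # sent -1, read as +1
    return 0.5 * (err1 + err0)

if __name__ == "__main__":
    isi = interference_pmf([0.1, 0.05, -0.03])    # hypothetical non-cursor ISI samples
    xtalk = interference_pmf([0.04, 0.02])        # hypothetical crosstalk response samples
    total = convolve_pmf(isi, xtalk)              # independent data streams: convolve the PMFs
    for offset in (0.0, 0.29):
        print(offset, ber_at_offset(0.5, total, offset))
```

Sweeping the voltage offset (and repeating the calculation at each sampling phase) would give one way of filling in a 2-D BER map of the kind described above.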

## C. RX Aperture and Final BER Eye

RX aperture captures the uncertainty in the sampling clock/ voltage reference that is used to sample/resolve the incoming RX data. Assuming the timing and voltage noise sources are independent, the 2-D RX aperture distribution can be computed simply by multiplying the individual distributions [Fig. 9(a)]. The final BER eye can now be obtained by convolving the preaperture BER eye with the RX aperture as shown in Fig. 9(b). In addition to the nominal BER (BER at the center of the BER eye), this statistical BER eye can be used to compute voltage and timing margins at the target BER as shown in Fig. 9(b).

Fig. 8. Pre-aperture BER eye computation from RX sample distribution.

Fig. 9. Computation of (a) RX aperture, and (b) final BER eye. VM and TM are the statistical voltage and timing margins at the target BER.

Fig. 10. Difference between effect of jitter injected at TX and RX.

Much of the computational complexity of the statistical signaling analysis method just described is due to the integrated analysis of ISI, TX pre-emphasis and TX jitter in Section III-A. The computation time can be significantly reduced by combining TX and RX jitter into an effective RX sampling jitter. This can however result in significant errors when the channel exhibits ISI and when TX pre-emphasis is used. Fig. 10 shows a comparison of BER eyes obtained for 5 Gb/s data transfer over a server backplane with 2-tap pre-emphasis when 2 ps rms jitter is injected at the transmitter or receiver. Neglecting the interaction between TX jitter and the channel overestimates the voltage margin at $10^{-20}$ BER by ~25%. This underlines the importance of comprehending the eye closure (both vertical and horizontal) resulting from the modulation of TX data symbols by high-frequency TX jitter.

## IV. JITTER MODELING AND EXTRACTION

Jitter in an I/O link can arise from a variety of sources such as device noise, supply noise, and device mismatch [5], [8] causing both deterministic and random errors in timing references at the transmitter and receiver. In general, the actual TX and RX jitter are both individually and jointly correlated. Accurately accounting for TX/RX jitter with arbitrary jitter distributions and correlation statistics is a computationally daunting task, and necessitates simplifying assumptions. The simplest of these is to subtract the actual TX jitter from the RX jitter to compute the effective sampling jitter. This however neglects the interaction between TX jitter and the channel, which can be quite significant for lossy channels [14], [19]. Another approach is to map Gaussian TX and RX jitter distributions to an equivalent voltage noise that degrades the receiver sensitivity [8]. Besides being restricted to Gaussian jitter, this approach assumes that noise due to jitter is uncorrelated with the transmit data sequence.

Below, we propose a method that combines the advantages of behavioral and statistical approaches to handle jitter more accurately. In the context of I/O clocking, behavioral modeling approaches are useful for several reasons: 1) a wide variety of clocking topologies and clock recovery methods can be modeled; 2) the complex interactions between various clocking components can be extracted; and 3) the colored (nonwhite) nature of most jitter sources can be comprehended. However, due to their computational complexity, they can only predict BERs $> 10^{-6}$. By appropriately using the results from a behavioral simulation to parameterize the necessarily simpler model assumed for statistical analysis, behavioral model results can be extrapolated to the lower BERs of $\sim 10^{-12}$ needed for I/O links.

The statistical analysis detailed in Section III assumed that the actual correlated TX/RX jitter in the link can be mapped to a model with two jitter components: white TX jitter that modulates the ISI distribution (in Section III-A) and RX sampling jitter that determines the RX sampling aperture (in Section III-C). The uncorrelated (white) nature of TX jitter allows the various segment transition PDFs to be combined in a computationally efficient way using the approach shown in Fig. 7. In this section, we describe a method to postprocess the actual TX/RX jitter to derive jitter parameters that can be used in the analysis of Section III. In the following discussion, we will refer to the latter as interpreted TX/RX jitter to distinguish them from the actual TX/RX jitter.

Fig. 11. Jitter interpretation problem definition.

Jitter PDFs are commonly modeled using the dual-Dirac normal distribution as mentioned in Section III. The required jitter interpretation can be formulated as a parameter estimation problem, where the parameters to be estimated are the dual-Dirac distribution parameters of the interpreted TX/RX jitter. To frame the problem, we note that though jitter is noise in the time domain, it can be converted into equivalent voltage noise [5], [8], [14]. Thus, the jitter interpretation problem can be defined in terms of the voltage noise induced by TX/RX jitter as shown in Fig. 11. The sequence $X_n$ in Fig. 11 is noise due to the actual TX/RX jitter, while $Y_n$ is the noise sequence due to the interpreted jitter. We would like to find parameters of the interpreted TX/RX jitter model assumed in Section III so that the two sequences $X_n$ and $Y_n$ in Fig. 11 are statistically similar. The voltage noise distribution due to the actual TX/RX jitter ($X_n$) can be obtained by a behavioral simulation that includes models of TX/RX clocking circuits with all the relevant noise sources (see Fig. 12 for an example). In Appendix I, we describe a systematic way to compute the parameters that best fit the observed distribution of noise caused by jitter, obtained from such a simulation. A second-order model (an extension of the model in [8]) is used to convert jitter into equivalent voltage noise and compute its distribution. Together with second-order correlation statistics of the actual noise caused by jitter, this can be used to estimate the dual-Dirac model parameters for interpreted TX/RX jitter.

An example of the result of this process applied to the clock topology of Fig. 12 is shown in Fig. 13. Fig. 12 shows the clock path in a forwarded clock link with various components such as the TX PLL, RX DLL, and clock distribution buffers whose characteristics determine TX/RX jitter. Behavioral models for these individual blocks can be used in a time/phase-step simulation to extract (actual) jitter in the TX data clock and RX sampling clock in Fig. 12. An example of such a model for a clock buffer chain is shown in the insert in Fig. 12. This model includes deterministic and random noise sources in the buffer that add to its input jitter and also a transfer function (in terms of a jitter impulse response [19]) that filters it. Fig. 13(a) shows the actual TX/RX jitter sequence obtained from a behavioral phase step simulation of the clock topology in Fig. 12 for a backplane link operating at 8 Gb/s. Note that the receiver tracks and rejects much of the low frequency TX jitter (while adding some jitter due to active circuits in the RX clock path). The distribution of the voltage noise resulting from the actual TX/RX jitter (the sequence $X_n$ in Fig. 11) is shown by the dots in Fig. 13(b). Despite the highly correlated nature of the actual TX/RX jitter, the noise distribution obtained using the interpreted jitter parameters (distribution of sequence $Y_n$ in Fig. 11 obtained using the approach in Appendix I), overlaid on the same graph, shows a good fit to the simulated data. The jitter interpretation process described here can be used to extract effective TX/RX jitter parameters from behavioral models of arbitrary clock topologies, enabling CDR performance comparison and sensitivity analysis.

Fig. 12. Clock topology in a forwarded clock link showing the main components in TX, RX clock paths. The insert shows a behavioral model of a buffer chain.

Fig. 13. Example showing the result of jitter interpretation. The dots represent simulation data while the solid line is the distribution using interpreted TX and RX jitter parameters.



Fig. 14. Channel responses of two backplane topologies.

TABLE I. Assumptions for Backplane Channel Analysis

| Category | Parameter | Value |
|----------|-----------|-------|
| General | Modulation | Binary NRZ with data scrambling |
| General | BER target | $10^{-12}$ |
| General | TX swing | ±500 mV |
| TX FIR (pre-emphasis) | Coefficient resolution | 30 mV |
| TX FIR (pre-emphasis) | 2 tap | 1 postcursor tap, 0 precursor taps |
| TX FIR (pre-emphasis) | 3 tap | 1 postcursor tap, 1 precursor tap |
| TX FIR (pre-emphasis) | 4 tap | 2 postcursor taps, 1 precursor tap |
| TX FIR (pre-emphasis) | 5 tap | 3 postcursor taps, 1 precursor tap |
| TX FIR (pre-emphasis) | 6 tap | 4 postcursor taps, 1 precursor tap |
| DFE | Coefficient resolution | 1 mV |
| DFE | Error propagation | None |
| Jitter and noise | TX duty-cycle error (modeled as uncorrelated bimodal jitter) | 0.01 UI p-p |
| Jitter and noise | TX uncorrelated Gaussian jitter | 0.01 UI rms |
| Jitter and noise | RX duty-cycle error (modeled as bimodal jitter) | 0.01 UI p-p |
| Jitter and noise | RX sampling uniform jitter | 0.2 UI p-p |
| Jitter and noise | RX sampling Gaussian jitter | 0.01 UI rms |
| Jitter and noise | RX slicer noise | 1 mV rms |
| Channel | Pad capacitance | RX = 400 fF, TX = 200 fF |
| Channel | FR4 total length | 22 in. |
| Channel | Crosstalk | Routing constrained to avoid near-end crosstalk. Far-end crosstalk assumed. |
| Channel | Composition | Channel includes socket-Ts, FCI Airmax™ connectors, 2 daughter cards, 1 BP and vias. |

## V. EXAMPLES

The link modeling and analysis techniques described so far can be applied to resolve a variety of issues that arise in the design and optimization of a high-speed link. In this section, we provide two examples to highlight the utility of link analysis tools. The first example considers two topologies of the common backplane channel and compares them in terms of achievable data rates for several equalization architectures [7]. The second example shows how statistical link analysis can help in link power optimization.

## A. Backplane Channel Design

Backplanes used in current high-capacity routers, switches and blade server systems need to be designed to support multi-Gb/s data rates. Suppose we want to compare the performance of two backplane (BP) topologies in terms of their achievable data rates for various equalization architectures. The two BPs are identical except for the length of their via stubs (Fig. 14): one has a 5 mm BP via stub at each connector location while the via stubs in the other BP are limited to < 1 mm by backdrilling the connector vias. Fig. 14 shows that the two channels can have significantly different frequency responses due to the difference in via stubs. The effect of the via stubs on achievable bandwidth can be quantified using a statistical link analysis based on the jitter/noise assumptions listed in Table I. For the various equalization schemes listed in the table, the link performance is quantified in terms of the maximum achievable data rate (MADR) for a target BER of $10^{-12}$. Fig. 15 demonstrates that simply backdrilling the BP vias significantly improves the MADR regardless of equalization complexity: an improvement in bandwidth close to $2\times$ can be obtained with 4 taps of DFE and 4 taps of pre-emphasis. Alternatively, backdrilling enables a given bandwidth with significantly lower equalization complexity. For example, 12 Gb/s over a non-backdrilled channel requires 4-tap pre-emphasis and 16-tap DFE, while a simple 2-tap pre-emphasis solution is adequate in the backdrilled case. Thus, backdrilling can result in significantly lower power, area, complexity, and risk. For more details on cost analysis, see [7].

Fig. 15. Effect of BP via stubs on MADR.

Fig. 16. Transceiver power optimization at 15 Gb/s over single-board channel [15].

## B. Power Optimization

I/O power is becoming a significant portion of the total power in many digital systems. Minimizing I/O power involves co-optimizing several link components such as the output driver, receiver front-end, clocking circuits, and the channel [15]. If the clocking power is amortized across several lanes, the link power is dominated by the power consumed in the driver ($P_{drv}$) and receiver front-end ($P_{rxfe}$). $P_{drv}$ and $P_{rxfe}$ are related, since a more sensitive receiver allows the reduction of TX signal swing (and hence $P_{drv}$) at the cost of $P_{rxfe}$ (required to achieve the improved sensitivity). Statistical link analysis can help to jointly optimize $P_{drv}$ and $P_{rxfe}$ to deliver a certain bandwidth over a given channel. An example of this power tradeoff for 15 Gb/s data transfer over a chip-to-chip interconnect in a single board (similar to that described in [15]) is shown in Fig. 16. For each RXFE sensitivity setting, the minimum TX signal swing required to achieve a BER $< 10^{-20}$ is determined using the statistical signaling analysis techniques described in Section III and the timing noise assumptions shown in Fig. 16. The RXFE power required for each sensitivity value is estimated assuming an inverse quadratic relationship between sensitivity and power (using the measured RXFE power of 4 mW for 1 mV-rms noise from [15] as a reference). The results of Fig. 16 show that suboptimal receiver designs (whose sensitivities fall outside 1–2 mV-rms in this case) can have a significant transceiver power cost. Similar power analyses can be done with the methods described in this paper to optimize clocking circuits (where again a nonlinear relationship exists between jitter and power) and the interconnect [7].
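The inverse quadratic RXFE power estimate can be written out as a short sketch. Only the reference point (4 mW at 1 mV-rms) comes from the text; the function name and the sensitivity values swept are illustrative, and the driver power, which depends on the minimum swing found by the statistical analysis, is not modeled here.

```python
def rxfe_power_mw(sensitivity_mv_rms, ref_power_mw=4.0, ref_sensitivity_mv_rms=1.0):
    """RXFE power under an inverse-quadratic power/sensitivity law.

    The default reference point (4 mW at 1 mV-rms input-referred noise) is the
    measured value the text cites from [15].
    """
    if sensitivity_mv_rms <= 0:
        raise ValueError("sensitivity must be positive")
    return ref_power_mw * (ref_sensitivity_mv_rms / sensitivity_mv_rms) ** 2

if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0, 4.0):   # hypothetical sensitivity settings in mV-rms
        print(f"{s:.1f} mV-rms -> {rxfe_power_mw(s):.2f} mW")
```

Halving the input-referred noise quadruples the estimated front-end power, which is why pushing sensitivity well below the optimum becomes expensive.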

## VI. CONCLUSION

With I/O data rates moving into the multi-Gb/s regime, signaling analysis techniques have evolved from simple empirical and worst-case analysis to a more comprehensive statistical analysis. This paper presented computationally efficient models and methods to analyze I/O link performance in the presence of deterministic interference sources like ISI/crosstalk, and random timing/voltage noise. Over lossy channels, there can be significant amplification of TX jitter, which can be comprehended by the iterative segment-based analysis described here. We also presented a method to combine the strengths of behavioral and statistical modeling approaches to improve the accuracy of jitter analysis for arbitrary clocking topologies.

#### APPENDIX

Jitter in the transmitter and receiver can be converted to equivalent voltage noise in the received samples using a Taylor's series expansion [8]. This allows the received sample values to be expressed as a sum of two terms

$$x_n = x_{n,nj} + \Delta x_n \tag{1}$$

where $x_{n,nj}$ is the sampled value in the absence of any TX/RX jitter (but includes ISI) and $\Delta x_n$ is the noise induced by both TX and RX jitter. $x_{n,nj}$ can be expressed in terms of the discrete-time channel pulse response $h_n$ and transmit symbols $b_n$

$$x_{n,nj} = \sum_{k} b_k h_{n-k}. \tag{2}$$
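Equation (2) is a discrete convolution of the symbol sequence with the pulse response. A minimal NumPy sketch is given below, assuming a causal pulse response stored with its first sample at index 0 and symbols before $n = 0$ taken as zero. The function name `ideal_samples` and the numerical values are illustrative only.

```python
import numpy as np


def ideal_samples(symbols, pulse_response):
    """Jitter-free samples x_{n,nj} = sum_k b_k h_{n-k} (causal h assumed)."""
    b = np.asarray(symbols, dtype=float)
    h = np.asarray(pulse_response, dtype=float)
    # Full convolution, truncated to one output sample per transmitted symbol.
    return np.convolve(b, h)[: len(b)]


if __name__ == "__main__":
    h = [0.6, 0.3, 0.1]  # illustrative pulse response: cursor plus two postcursors
    b = [1, -1, 1, 1]    # illustrative 2-PAM symbols
    print(ideal_samples(b, h))
```

A pulse response with precursor taps would need an index offset so that the cursor aligns with the sampling instant.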

A second-order model for the equivalent voltage noise due to jitter can be derived from a Taylor's series approximation to be

$$\Delta x_n = \sum_k (s_n - q_k) v_k f_{n-k} + \frac{1}{2} \sum_k (s_n - q_k)^2 v_k g_{n-k} \tag{3}$$

where $q_n$ is the TX jitter sequence, $s_n$ is the RX jitter sequence, $f_n$ is the sampled impulse response, $g_n$ is the sampled derivative of the impulse response, and $v_n = (b_n - b_{n-1})$ is the differential data sequence. The dual-Dirac approximation [18] allows us to model the distributions of $q_n$ and $s_n$ as a combination of bi-modal and Gaussian distributions. The jitter interpretation problem of Section IV involves determining the parameters of the distributions of $q_n$ and $s_n$ that best fit the distribution of $\Delta x_n$ obtained from a behavioral simulation (see Fig. 11). The parameter fit process can be simplified by deriving the relationship between the second-order statistics of $\Delta x_n$ and the parameters of $q_n$ and $s_n$. It can be shown [using (2), (3)] that the cross correlation of $\Delta x_n$ with the ideal RX samples $x_{n,nj}$ is proportional to the sum of the variances of TX and RX jitter ($\sigma_q^2$ and $\sigma_s^2$, respectively). For an alternating data pattern, this relationship is given by

$$E[x_{n,nj}\Delta x_n] = (\sigma_s^2 + \sigma_q^2) \sum_k (-1)^k \sum_l g_l h_{k-l}. \tag{4}$$

Since the channel model is known, (4) can be used to estimate the sum of the variances of the interpreted TX and RX jitter. Similarly, the variance of the interpreted TX jitter can be estimated from the variance of $\Delta x_n$ using (3) and (4). Having computed the individual variances of $q_n$ and $s_n$, it remains to determine how these variances are divided between the Gaussian and bi-modal jitter distributions of the dual-Dirac model. This can be easily done using a least mean squares fit of the simulated jitter noise distribution to that obtained analytically using (3).
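To make the quantities in (3) concrete, the sketch below draws jitter samples from a dual-Dirac model and evaluates (3) by direct summation. It is a minimal illustration and not the fitting procedure itself. The function names are hypothetical, $f_n$ and $g_n$ are assumed causal and of equal length, the first symbol is assumed to carry no transition ($v_0 = 0$), and jitter and channel quantities are assumed to be in consistent units.

```python
import numpy as np


def dual_dirac_samples(n, dj_pp, rj_rms, rng):
    """Draw n samples: +/- dj_pp/2 with equal probability plus Gaussian RJ."""
    dirac = rng.choice([-0.5 * dj_pp, 0.5 * dj_pp], size=n)
    return dirac + rng.normal(0.0, rj_rms, size=n)


def jitter_noise(b, q, s, f, g):
    """Evaluate Eq. (3) for TX jitter q, RX jitter s, and causal f, g."""
    b, q, s = (np.asarray(a, dtype=float) for a in (b, q, s))
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    v = np.diff(b, prepend=b[0])  # v_n = b_n - b_{n-1}, with v_0 = 0 assumed
    dx = np.zeros(len(b))
    for n in range(len(b)):
        for m in range(min(len(f), n + 1)):  # m = n - k
            k = n - m
            d = s[n] - q[k]
            dx[n] += d * v[k] * f[m] + 0.5 * d * d * v[k] * g[m]
    return dx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    b = rng.choice([-1.0, 1.0], size=n)
    q = dual_dirac_samples(n, dj_pp=0.02, rj_rms=0.01, rng=rng)  # TX jitter
    s = dual_dirac_samples(n, dj_pp=0.0, rj_rms=0.01, rng=rng)   # RX jitter
    dx = jitter_noise(b, q, s, f=[0.8, 0.3, 0.1], g=[0.5, -0.2, -0.1])
    print("std of jitter-induced noise:", dx.std())
```

For this dual-Dirac model the variance of each jitter sequence is $(DJ_{pp}/2)^2 + \sigma_{RJ}^2$, so a given variance estimate still leaves the split between the bi-modal and Gaussian parts to be determined by the distribution fit.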

#### REFERENCES

- [1] B. Casper *et al.*, "A 20 Gb/s forwarded clock transceiver in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2006, pp. 263–272.
- [2] J. L. Zerbe *et al.*, "Equalization and clock recovery for a 2.5–10-Gb/s 2-PAM/4-PAM backplane transceiver cell," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2121–2130, Dec. 2003.
- [3] K. Krishna *et al.*, "A multigigabit backplane transceiver core in 0.13-μm CMOS with a power-efficient equalization architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2658–2666, Dec. 2005.
- [4] B. Casper, M. Haycock, and R. Mooney, "An accurate and efficient analysis method for multi-Gb/s chip-to-chip signaling schemes," in *IEEE Very Large Scale (VLSI) Circuits Symp. Tech. Papers*, Jun. 2002, pp. 54–57.
- [5] P. K. Hanumolu *et al.*, "Analysis of PLL clock jitter in high-speed serial links," *IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 879–886, Nov. 2003.
- [6] J. Ren and M. Greenstreet, "A unified optimization framework for equalization filter synthesis," in *Proc. 42nd Design Automat. Conf.*, Jun. 2005, pp. 638–643.
- [7] B. Casper *et al.*, "Future microprocessor interfaces: Analysis, design and optimization," in *IEEE Custom Integrated Circuits Conf.*, 2007, pp. 479–486.
- [8] V. Stojanovic and M. Horowitz, "Modeling and analysis of high speed links," in *IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers*, 2003, pp. 589–594.
- [9] Stateye. [Online]. Available: http://www.stateye.org/
- [10] A. Amirkhany, A. Abbasfar, J. Savoj, and M. A. Horowitz, "Time-variant characterization and compensation of wideband circuits," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2007, pp. 487–490.
- [11] J. Savoj, A. Abbasfar, A. Amirkhany, B. W. Garlepp, and M. A. Horowitz, "A new technique for characterization of digital-to-analog converters in high-speed systems," in *Design, Automat. Test Eur. Conf. Exhibit.*, Apr. 2007, pp. 1–6.
- [12] V. Stojanovic, A. Amirkhany, and M. A. Horowitz, "Optimal linear precoding with theoretical and practical data rates in high-speed serial-link backplane communication," in *2004 IEEE Int. Conf. Commun.*, Jun. 2004, vol. 5, pp. 2799–2806.
- [13] M. Mansuri and C.-K. K. Yang, "Jitter optimization based on phase-locked loop design parameters," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1375–1382, Nov. 2002.
- [14] G. Balamurugan and N. Shanbhag, "Modeling and mitigation of jitter in multi-Gbps source-synchronous I/O links," in *Int. Conf. Comput. Design*, Oct. 2003, pp. 254–260.

- [15] G. Balamurugan *et al.*, "A scalable 5–15 Gbps, 14–75 mW low-power I/O transceiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 1010–1019, Apr. 2008.
- [16] E. Lee and D. Messerschmitt, *Digital Communication*. Norwell, MA: Kluwer, 1999.
- [17] J. L. Sonntag and J. Stonick, "A digital clock and data recovery architecture for multi-Gigabit/s binary links," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1867–1875, Aug. 2006.
- [18] R. Stephens, *Jitter Analysis: The Dual-Dirac Model, RJ/DJ, and Q-scale*, 2004. [Online]. Available: http://cp.literature.agilent.com/litweb/pdf/5989-3206EN.pdf
- [19] W. Beyene, "Modeling and analysis techniques of jitter enhancement across high-speed interconnect systems," in *IEEE Electrical Performance Electron. Packag.*, Oct. 29–31, 2007, pp. 29–32.



**Ganesh Balamurugan** (S'99–M'04) received the B.Tech. degree in electrical engineering from Indian Institute of Technology, Madras, India, in 1996, and the Ph.D. degree in electrical engineering from the University of Illinois, Urbana-Champaign, in 2004.

Since 2004, he has been with Intel's Circuit Research Laboratory, Hillsboro, OR. His research interests include low power equalization and clock recovery in high-speed links, I/O link modeling and analysis techniques, and adaptation methods for link performance optimization.



**Bryan Casper** received the M.S. degree in electrical engineering from Brigham Young University, Provo, UT.

He is currently leading the high-speed signaling team of Intel's Circuit Research Laboratory, Hillsboro, OR. In 1998, he joined the Performance Microprocessor Division of Intel Corporation and worked on the development of Pentium and Xeon processors. Since 2000, he has been a circuit researcher responsible for the research, design, validation, and characterization of high-speed mixed-signal circuits and I/O systems.



**James E. Jaussi** (M'01) received the B.S. and M.S. degrees in electrical engineering from Brigham Young University, Provo, UT, in 2000. He is currently working toward the Ph.D. degree in electrical engineering at Oregon State University, Corvallis.

For the past seven years, he has worked for Intel Laboratories, Hillsboro, OR. His main focus is research, design and characterization of high-speed CMOS transceivers and mixed signal circuits, with an emphasis in receiver equalization and clock recovery architectures.



**Mozhgan Mansuri** was born in Tehran, Iran. She received the B.S. and M.S. degrees in electronics engineering from Sharif University of Technology, Tehran, Iran, in 1995 and 1997, respectively, and the Ph.D. degree in electrical engineering from the University of California, Los Angeles, in 2003.

She was a Design Engineer with Kavoshgaran Company, Tehran, Iran, where she worked on the design of 46–49 MHz cordless and 900 MHz cellular phones from 1997 to 1999. In 2003, she joined Intel Corporation, Hillsboro, OR. Her research interests include low-power low-jitter clock synthesis/recovery circuits (PLL and DLL) and low-power high-speed I/O links.



**Frank O'Mahony** received the B.S., M.S., and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1997, 2000, and 2004, respectively. His doctoral research focused on resonant clock distribution techniques for high-performance microprocessors.

Since 2003, he has been with Intel's Circuit Research Laboratory, Hillsboro, OR. His research interests include high-speed and low-power data links, clock generation and distribution, and design techniques for low-noise, variation-tolerant clocking and signaling circuits.

Dr. O'Mahony received the 2003 Jack Kilby Award for Outstanding Student Paper at ISSCC.



**Joseph Kennedy** (S'88–M'91) received the B.S. degree in electrical and computer engineering from Oregon State University, Corvallis, in 1991.

He is a Senior Circuits Researcher with Intel's Circuits Research Laboratories, Hillsboro, OR. Since joining Intel in 1995, his responsibilities have included all aspects of research and design of high-speed mixed-signal circuits and I/O systems. His recent focus has been on low-power scalable-rate CMOS I/O transceivers. Prior to joining Intel, he spent four years with Lattice Semiconductor, where he worked as a lead circuit designer developing data-path circuits and I/O interfaces for electrically programmable logic components.