[IEEE 2011 IEEE International Symposium on Circuits and Systems (ISCAS) - Rio de Janeiro, Brazil (2011.05.15-2011.05.18)] 2011 IEEE International Symposium of Circuits and Systems

A 10 Gb/s Low-Power SerDes Receiver Based on a Hybrid Speculative/SAR Digitization Technique

Arash Zargaran-Yazd, Shahriar Mirabbasi, and Res Saleh

Department of Electrical and Computer Engineering, University of British Columbia, 2332 Main Mall, Vancouver, BC, Canada V6T 1Z4

Email: {arashz, shahriar, res}@ece.ubc.ca

Abstract: This paper presents a DSP-based SerDes receiver that utilizes a hybrid speculative/successive-approximation register (SAR) analog-to-digital converter (ADC) to digitize the data at the receiver’s analog front-end. The proposed digitization technique addresses the high-power drawback of conventional flash-based ADCs used in sampling-based wireline receivers, while allowing for digital equalization and data recovery at 10 Gb/s through five time-interleaved channels. The receiver chip is designed and laid out in a 65 nm CMOS process. The chip occupies 1.53 mm², and based on simulation results, it consumes 263 mW from a 1 V core supply.

I. INTRODUCTION

As the analog performance of devices deteriorates in scaled CMOS technologies, the trend of moving as many signal processing tasks as possible from the analog domain to the digital domain is becoming increasingly popular. Following this trend, advanced high-speed serializer/deserializer (SerDes) circuits are becoming more DSP-based to take advantage of the improved functionality and flexibility of digital clock and data recovery (CDR) and equalization [1]–[4]. As wireline data rates increase in the multi-Gb/s range in emerging standards, e.g., the IEEE 802.3ba Ethernet standard, sophisticated feed-forward equalization (FFE) and decision feedback equalization (DFE) are needed to compensate for the inter-symbol interference (ISI) that results from the band-limited nature of the channel at multi-Gb/s speeds [5]. Such elaborate equalization translates into tens of FFE and DFE taps which, if implemented in the analog domain, result in a large silicon area and a power consumption beyond what can be afforded in practical SerDes receivers. Therefore, the evolution towards a digital architecture seems vital to enhance the power/area scalability, achieve bit error rates (BERs) equal to or better than 10^−12 under high channel loss [5], and simplify the migration of circuits to more advanced process technologies.

Although promising, the performance merits of DSP-based receivers are confined by metrics such as power consumption, speed, digitization resolution, and the impairments of the front-end ADC [2], [4]. Complicating the problem is the fact that the AFE’s power consumption trades off against these factors, primarily because of the power-hungry comparators. Recently published works [6], [7] use interpolating-flash and pipeline ADC topologies for the front-end ADC to decrease the number of comparators from the 2^n regime of full-flash n-bit ADCs [1], [8] to 17 [6] and 6 [7] comparators per sub-ADC for 5 bits of resolution, respectively. This is achieved at the price of limiting the speed of each sub-ADC to 2.5 GS/s [6] and 1.2 GS/s [7]. In this paper, we propose a DSP-based receiver that utilizes a hybrid digitization algorithm in which only 2 comparators are used in each sub-ADC to achieve a digitization rate of 2 GS/s per channel.

The organization of this paper is as follows. The key concepts of the proposed receiver are discussed in Section II. In the subsequent subsections, we focus on the required characteristics of the front-end ADC and the proposed hybrid digitization algorithm. Simulation results and a comparison with recent work are presented in Section III. Finally, Section IV concludes the paper.

This research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC. CAD support and access to technology is facilitated by CMC Microsystems.

Fig. 1. Block diagram of the proposed receiver. All clocks and analog signals are differential.

[Figure 1 labels: RX, Termination & Buffers, Pipelined T/H, C1, C2, DAC (two instances), ADC, DSP core, Clock Recovery, ÷5, CLK1 (2 GHz), CLK1b, CLK2 (10 GHz), ab(i), d(i) (5 bits), b(i), 5×5 bits. Timing inset: Stage 1: Hold / Stage 2: Track, Stage 1: Track / Stage 2: Hold, ½ cycle delay, total hold time, SAR cycles.]

II. THE PROPOSED DSP-BASED RECEIVER

Figure 1 shows a block diagram of the proposed DSP-based receiver. Following the termination and buffer block, the received data from the channel is sampled and digitized at baud rate in the receiver’s analog front-end (AFE) by a 5-channel time-interleaved track-and-hold (T/H) and ADC pair, which are controlled by the differential clocks CLK1 and CLK1b and by the 5-phase CLK2, respectively. In this figure, ab(i) is the received bit, which has been distorted by the channel and has an analog nature, d(i) is the digitized value of ab(i), and b(i) denotes the recovered bit, which is known to the receiver only after data recovery. The T/H is implemented as two pipelined stages controlled by the 2 GHz clock CLK1. This topology ensures that the sampled signal is kept constant for five SAR cycles, which are triggered by the 10 GHz clock CLK2. As will be explained in Section II-B, each sub-ADC has two comparators whose thresholds are adjusted in the analog domain using DACs. The latches inside the comparators also operate in a pipelined manner and are controlled by CLK2, which is ideally extracted by recovering the clock from the recovered data. CLK1 is generated locally by dividing CLK2 by five. The value of d(i) is needed by DSP-based clock recovery algorithms such as Mueller-Muller [9] to update the phase of CLK2. Following the AFE, the digitization and data recovery functions are completed in the DSP core based on the binary data produced by the two comparators during the five SAR cycles. A 5-tap DFE is used to cancel the ISI from 5 post-cursor components. Thus, the digitized level d(i) of the most recently sampled bit ab(i) is assumed to be mostly influenced by the five preceding sampled bits ab(i − 1), ab(i − 2), ab(i − 3), ab(i − 4), and ab(i − 5). Decision feedback equalization and data recovery are jointly performed in the digital domain by comparing d(i) with a value from a lookup table selected by the five-bit word b(i − 1) to b(i − 5), with b(i − 5) as the MSB. The lookup table values can be determined by characterizing the channel during the start-up cycle of the chip or through adaptive algorithms during operation.

978-1-4244-9474-3/11/$26.00 ©2011 IEEE

Fig. 2. Data distortion due to inter-symbol interference. The two patterns differ in the received bit at Time = 1.1 ns.

[Figure 2 annotations: b(i) = 1, b(i) = 0, Speculative Digitization (DSP), SAR Digitization (C1), SAR Digitization (C2).]
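The joint DFE and data-recovery step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names and the threshold values in the table are hypothetical, and only the indexing rule (five previous bits, b(i − 5) as the MSB) and the comparison of d(i) against a lookup-table value come from the text.

```python
def lut_index(prev_bits):
    """Pack [b(i-1), ..., b(i-5)] into a 5-bit index with b(i-5) as the MSB."""
    assert len(prev_bits) == 5
    index = 0
    for k, bit in enumerate(prev_bits):   # k = 0 is b(i-1), k = 4 is b(i-5)
        index |= (bit & 1) << k
    return index


def recover_bit(d, prev_bits, lut):
    """Slice the 5-bit code d(i) against the threshold chosen by the history."""
    return 1 if d > lut[lut_index(prev_bits)] else 0


# Hypothetical table: the same mid-scale threshold for every history.
# In the receiver, the 32 entries would come from channel characterization
# at start-up or from an adaptive algorithm.
lut = [16] * 32
history = [1, 0, 0, 0, 1]                 # b(i-1) = 1, ..., b(i-5) = 1
print(lut_index(history), recover_bit(20, history, lut))
```

Because the table is indexed by already-recovered bits, the equalization costs one table read and one comparison per bit in this sketch, independent of how the 32 thresholds were obtained.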

A. ADC Performance Requirements

The ADC topology and its digitization resolution depend on the speed, power, and BER requirements of the receiver. For multi-Gb/s receivers, the flash topology is usually the premier choice [1], [4], [8]. However, the high power consumption and capacitive loading of a flash ADC make its use in high-speed SerDes (and the move from conventional receivers to ADC-based receivers) debatable [3]. Recently, the SAR topology has shown promising results in achieving speeds up to 24 GS/s [10] and 40 GS/s [11]. Nevertheless, such ADCs do not necessarily consume less power than a full-flash ADC [12]. As the maximum conversion rate of a single SAR ADC is still less than 100 MS/s [13], it must be highly interleaved (e.g., 160× in [10]) to reach sampling rates beyond 10 GS/s. Such an immense level of interleaving results in high area overhead due to peripheral circuitry, and an elaborate calibration mechanism is needed to cancel the sub-ADC non-idealities and avoid pattern noise in the ADC array [14]. In our design, by using the proposed speculative digitization technique in the SAR ADC, we have limited the interleaving factor of the ADC to five for a 10 GS/s speed.

To determine the sampling frequency of the ADC, a trade-off must be made between the speed of data recovery and the complexity of the clock recovery function [6]. In this work, as in [1], the overall ADC sampling and digitization rate is equal to the baud rate to achieve a data recovery rate equal to the highest clock rate in the system. Hence, the Mueller-Muller algorithm [9] would be the optimum scheme for CDR. By contrast, a blind-sampling CDR scheme [6] obviates the need for a phase-tracking feedback loop but limits the data recovery rate to half of the ADC’s sampling rate.

The band-limited nature of the channel reduces the amount of voltage change between consecutive bits (Fig. 2). Also, in multi-level signalling schemes, the maximum number of signal levels that can be transmitted is confined by the resolution of the receiver’s ADC [2]. Additionally, with higher ISI, the data needs to be sampled by an ADC with higher resolution to allow for optimum equalization and clock recovery with the aid of the DSP core. Therefore, the required ADC resolution depends on the channel characteristics and the signalling scheme used in the transmitter. In this work, the hybrid digitization algorithm achieves an overall resolution of 5 bits.
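The rate and clock relationships quoted in this section can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (10 GS/s overall, five interleaved channels, five SAR cycles per sample, 100 MS/s for a conventional SAR sub-ADC); the variable names are illustrative.

```python
BAUD_RATE_GSPS = 10.0     # overall sampling rate, equal to the baud rate
INTERLEAVE = 5            # sub-ADCs in this work
SAR_CYCLES = 5            # threshold iterations per held sample

per_channel_gsps = BAUD_RATE_GSPS / INTERLEAVE    # rate of each sub-ADC
clk1_ghz = per_channel_gsps                       # T/H clock (CLK2 divided by 5)
clk2_ghz = clk1_ghz * SAR_CYCLES                  # SAR clock
sar_cycle_ps = 1e3 / clk2_ghz                     # one CLK2 period
hold_window_ps = SAR_CYCLES * sar_cycle_ps        # time the sample must stay constant

# Interleaving factor needed if every sub-ADC ran at only 100 MS/s.
conventional_factor = BAUD_RATE_GSPS * 1e3 / 100.0

print(per_channel_gsps, clk2_ghz, sar_cycle_ps, hold_window_ps, conventional_factor)
```

Under these numbers each comparison has a 100 ps slot and the T/H must hold for 500 ps, while a sub-ADC limited to 100 MS/s would require 100-way interleaving for the same aggregate rate.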

B. Digitization Algorithm

Figure 2 shows two distorted 10 Gb/s patterns at the receiver end of the channel, which differ only in the bit received at time 1.1 ns. Due to the lowpass nature of the channel, the received data can no longer maintain the original voltage levels that represent a ’0’ or a ’1’. The high-frequency signal components are filtered out and hence, if a consecutive ’0’ and ’1’ pattern is transmitted, the difference between the values of ab(i) and ab(i − 1) is reduced compared to that at the transmit side. The value of ab(i) depends on its preceding bits, as if the channel has a memory [15]. Hence, if a full-flash ADC is used in the AFE, only a few comparators will be decisive in digitizing ab(i) at each sampling cycle of any sub-ADC, as ab(i) will not cross the thresholds of the remaining comparators. Through the hybrid algorithm of Fig. 3, we exploit the channel’s memory to quickly speculate the digitized value of each received bit based on the five previously recovered bits, and then use conventional SAR digitization cycles to fine-tune the speculated value to yield d(i). As shown in the algorithm of Fig. 3, a finite state machine (FSM) uses the speculated value to narrow down the SAR search domain. The FSM takes over the SAR step of the hybrid algorithm (dashed horizontal lines in Fig. 2) from the points specified by the two speculated values (solid vertical lines in Fig. 2). As there are typically two separate domains, depending on whether a ’1’ or a ’0’ was transmitted, two separate comparators are used in each sub-ADC to perform the SAR cycles for the two cases b(i)=0 and b(i)=1 in parallel. Breaking the SAR search domain into two separate paths enhances the speed of the digitization and data recovery functions. As shown in Figs. 2 and 3, in the speculative step of the hybrid algorithm, the thresholds of comparators C1 and C2 are simultaneously adjusted at the beginning of each CLK1 cycle based on the two speculated values, one for each b(i) case. This point coincides with the beginning of the SAR cycles, while the T/H keeps the sampled signal constant for five CLK2 cycles, so that the SAR algorithm can go through five threshold-adjustment iterations per sampled bit. As a result, the effective digitization speed of each sub-ADC is 2 GS/s, yielding an overall baud-rate ADC sampling/digitization speed of 10 GS/s. The speculated and SAR values for the comparator thresholds are converted to the analog domain using current-steering DACs [16].

Fig. 3. The hybrid speculative/SAR digitization algorithm implemented as a finite state machine. [] and {} indicate IF and DO statements, respectively.

[Figure 3 labels. States: SAR Comparison; State1=1 (Data Recovery=done, Digitization=pending); State2=1 (Data Recovery=pending, Digitization=pending); State3=1 (Data Recovery=done, Digitization=done); Calibration of Speculative Digitization data. Entry point: Start of SAR (CLK2) cycles. Conditions and actions: [C1==1 && C2==1] {b(i)=1}; [C1==0 && C2==0] {b(i)=0}; [C1==0 && C2==1] {cycle++, SAR1--, SAR2++}; [State1=0]; [State1=1]; [State2=0]; [State2=1] {d(i)=SAR1}; [State2=1] {d(i)=SAR2}; [b(i)=1] {SAR1++}; [b(i)=0] {SAR2--}; [b(i)=1] {d(i)=SAR1}; [b(i)=0] {d(i)=SAR2}; {cycle++}; [cycle<=5]; [cycle>5 && state3 != 1]; [cycle>5 && state3==1]; {Spec1=SAR1, Spec2=SAR2, Reset all States, cycle=1}.]
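To make the speculative step concrete, the sketch below builds a table of two speculated levels (one for b(i)=0, one for b(i)=1) for each of the 32 possible five-bit histories. It rests on assumptions that are not in the paper: a linear superposition model of the channel, hypothetical tap values expressed in LSBs, and hypothetical function names. The paper states only that the speculated values come from channel characterization or adaptation.

```python
def speculate_levels(prev_bits, main_cursor, post_cursors):
    """Predict the sampled level for b(i)=0 and b(i)=1 from the last five bits.

    prev_bits    : [b(i-1), ..., b(i-5)], each 0 or 1
    main_cursor  : contribution of the current bit, in LSBs (assumed)
    post_cursors : [h1, ..., h5], ISI left by each earlier bit, in LSBs (assumed)
    """
    isi = sum(h * b for h, b in zip(post_cursors, prev_bits))
    return round(isi), round(isi + main_cursor)


def build_speculation_table(main_cursor, post_cursors):
    """One (level_if_zero, level_if_one) pair per five-bit history."""
    table = []
    for index in range(32):
        prev_bits = [(index >> k) & 1 for k in range(5)]   # bit k is b(i-1-k)
        table.append(speculate_levels(prev_bits, main_cursor, post_cursors))
    return table


# Hypothetical taps chosen so every level fits in a 5-bit code (0 to 31).
table = build_speculation_table(14, [6, 4, 3, 2, 1])
print(table[0], table[1], table[31])
```

With such a table, the two comparator thresholds can start close to the only two levels the sample is likely to take, which is what lets the SAR cycles that follow search a small neighbourhood instead of the full range.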

C. Data Recovery

Using two comparators in each sub-ADC, as opposed to the conventional SAR architecture which uses only one comparator, provides several advantages. Having two parallel comparators boosts the speed of each sub-ADC and allows the first step of data recovery to be integrated into the ADC. At the speculative step, the thresholds of the two comparators in each sub-ADC are adapted so that one of them falls in the vicinity of the sampled data. Thus, the decisions of the comparators can promptly point to the ’0’ or ’1’ nature of the distorted data. Only simple logic is needed to perform the second stage of data recovery in the DSP core. As shown in the diagram of Fig. 3, if at the start of the SAR cycles C1=1 and C2=1, or C1=0 and C2=0 (C1 and C2 represent the decisions of the two comparators), the data can be promptly recovered as b(i)=1 or b(i)=0, respectively. However, if C1=0 and C2=1, ab(i) lies somewhere between the thresholds of the two comparators. If the channel has been characterized properly, then ab(i) should be close to the threshold of one of the comparators. Therefore, SAR iteration cycles are needed to slightly change the thresholds in the vicinity of the speculated values. Consequently, the data is recovered as soon as either C1=1 and C2=1 (hence b(i)=1) or C1=0 and C2=0 (hence b(i)=0) is reached. If neither case is reached by the end of the SAR cycles, a flag indicates the need for calibration of the speculative data in the lookup tables. This can be avoided if an adaptive real-time algorithm for channel characterization exists.
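The decision logic above, together with the labels of Fig. 3, can be read as the following behavioural sketch. It is one possible interpretation rather than the authors' FSM: it assumes a comparator outputs 1 when the sample exceeds its threshold, a step of 1 LSB per SAR cycle, that C1 holds the higher (b(i)=1) threshold, and that the output code is the threshold at which a comparator flips. The function name and return convention are hypothetical.

```python
def hybrid_digitize(sample, spec1, spec2, max_cycles=5):
    """Return (bit, code, needs_calibration) for one held sample.

    sample : held analog value, expressed in LSBs
    spec1  : speculated threshold for C1 (the b(i)=1 domain), spec1 > spec2
    spec2  : speculated threshold for C2 (the b(i)=0 domain)
    """
    sar1, sar2 = spec1, spec2
    bit = None
    for cycle in range(1, max_cycles + 1):
        c1, c2 = sample > sar1, sample > sar2
        if bit is None:
            if c1 and c2:
                bit = 1
            elif not c1 and not c2:
                bit = 0
            elif c2:
                # C1=0, C2=1: sample lies between the thresholds, pull both inward.
                sar1 -= 1
                sar2 += 1
                continue
            else:
                break  # thresholds crossed each other: speculation is inconsistent
            if cycle > 1:
                # A threshold has just crossed the sample, so it is also the code.
                return bit, (sar1 if bit else sar2), False
        # Bit known from the first comparison: step the matching threshold
        # away from its speculated value until the comparator flips.
        if bit == 1:
            if not c1:
                return 1, sar1, False
            sar1 += 1
        else:
            if c2:
                return 0, sar2, False
            sar2 -= 1
    return bit, None, True   # cycles exhausted: flag the lookup table for calibration


print(hybrid_digitize(20.4, 18, 6))   # resolved at once, then fine-tuned
print(hybrid_digitize(15.5, 18, 6))   # resolved while the thresholds close in
print(hybrid_digitize(12.0, 18, 6))   # never resolved within five cycles
```

The third call shows the calibration flag: when the sample sits far from both speculated values, five 1-LSB steps cannot reach it, which in the paper's terms signals that the speculative data in the lookup table needs to be recalibrated.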

D. Power/Resolution Trade-off in ADC and DAC

In the proposed ADC, although the number of comparators remains constant as the resolution increases, the resolution of the DAC must increase accordingly. Therefore, at first glance, it may seem that the power consumption of the DAC offsets the power savings of the proposed ADC, resulting in an overall power consumption comparable to that of a full-flash architecture. However, a closer look at the elements of the high-speed comparator and DAC reveals that the power-consumption/resolution trade-off in high-speed ADCs is substantially more aggressive than in DACs operating at comparable speeds. Our comparators are similar to those in [17], having a preamplifier, two CML latches, and a CML-to-CMOS stage, all of which draw static current. On the other hand, if we neglect the typically low power consumption of the digital logic, the price that we pay to increase the resolution of a current-steering DAC is confined to that of a differential pair with a tail current source [16]. It is important to mention that while all of the CML blocks inside the comparator constantly draw static current, the current sources inside the DAC can be completely switched on or off based on the value of the digital code, further decreasing the DAC’s power consumption at the price of a slightly compromised conversion speed. With the proposed ADC structure, we are pushing the power toll of increased resolution towards the digital logic (low power consumption) and the DAC (fair power consumption), as opposed to the power-hungry comparators of a flash ADC.
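The scaling argument can be illustrated by counting the static-current blocks as resolution grows. The sketch below is a rough count, not a power model: it uses the standard 2^n − 1 comparator count for a full-flash converter, the fixed two comparators of the proposed sub-ADC, and assumes (hypothetically) one binary-weighted current-steering DAC per comparator with one switched cell per bit.

```python
def flash_comparators(bits):
    """A full-flash converter needs one comparator per decision level."""
    return 2 ** bits - 1


def hybrid_comparators(bits):
    """The hybrid sub-ADC keeps two comparators regardless of resolution."""
    return 2


def dac_cells(bits, dacs_per_sub_adc=2):
    """Assumed binary-weighted DACs: one differential-pair cell per bit."""
    return bits * dacs_per_sub_adc


for n in (4, 5, 6, 7):
    print(n, flash_comparators(n), hybrid_comparators(n), dac_cells(n))
```

Each extra bit doubles the comparator count in the flash case but, under the stated assumption, adds only one cell per DAC in the hybrid case, which is the asymmetry the paragraph above relies on.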

III. SIMULATION RESULTS AND COMPARISON

To verify the proposed receiver architecture, we first performed system-level simulations in Matlab/Simulink that include non-idealities in the channel response, comparator threshold offset, and clock jitter [18]. A legacy channel with −12 dB attenuation at 2 GHz is used to incorporate ISI into the signal. Five DFE taps are found to be adequate to cancel the ISI at a 10 Gb/s data rate. The receiver architecture is then designed, simulated, and laid out in a 65 nm CMOS process operating from a 1 V core supply and a 2.5 V I/O supply. Virtually error-free data recovery is achieved at 10 Gb/s. The AFE, DACs, and DSP core consume 32 mW, 219 mW, and 12 mW, respectively. Figure 4 shows the chip layout, which occupies 1.53 mm² (excluding the pads). The AFE and DACs occupy 60% of the overall circuitry die area. Table I compares the characteristics and performance metrics of this work with recently published ADC-based receivers.

Fig. 4. Layout of the proposed receiver in TSMC 65 nm technology.

[Figure 4 labels: AFE & DACs, DSP; die dimensions 1242 µm × 1239 µm.]
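The reported figures can be cross-checked against each other with simple arithmetic. The snippet below uses only numbers stated in the text and in Fig. 4; the variable names are illustrative.

```python
power_mw = {"AFE": 32, "DACs": 219, "DSP core": 12}
total_mw = sum(power_mw.values())                   # matches the 263 mW in the abstract
shares = {name: p / total_mw for name, p in power_mw.items()}

width_um, height_um = 1242, 1239                    # die dimensions from Fig. 4
area_mm2 = width_um * height_um * 1e-6              # about 1.54 mm^2, reported as 1.53

print(total_mw, round(area_mm2, 3))
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The breakdown shows that the DACs account for most of the simulated power budget, with the comparator-based AFE and the DSP core contributing comparatively little.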

TABLE I
COMPARISON OF THE PROPOSED RECEIVER WITH RECENT PUBLISHED WORK

| | ISSCC 2007 [1] | JSSC 2008 [8] | JSSC 2009 [7] | ISSCC 2009 [19] | JSSC 2010 [6] | This work |
|---|---|---|---|---|---|---|
| Technology (nm) | 65 | 90 | 130 | 65 | 65 | 65 |
| Data rate (Gb/s) | 12.5 | 10 | 4.8 | 10.31 | 5 | 10 |
| Power (W) | 0.33 | 4.5 | 0.3 | 1.2 | 0.178 | 0.263 |
| Area (mm²) | 0.45 | 32 | 1.69 | 15 | 0.51 | 1.53 |
| ADC topology | Full-flash | Pipeline | Pipeline | Full-flash | Interpolating-flash | Hybrid SAR |
| Nominal ADC resolution (bits) | 4.5 | 8 | 5 | 6 | 5 | 5 |
| Oversampling factor | 1 | 1 | 1 | 1 | 2 | 1 |
| Interleaving factor | 2 | 8 | 4 | 4 | 2 | 5 |
| DFE taps | 5 | No | 6 | Dynamic | No | 5 |
| FFE taps | 2 | 25 | No | Dynamic | 2 | No |
| Off-chip block | No | No | Clock recovery | No | No | DLL |
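One way to read the power and data-rate rows together is energy per bit, computed as power divided by data rate. This metric is derived here and does not appear in the paper; it should be taken as indicative only, since the designs differ in scope (for example, [8] is a full transceiver) and the figure for this work comes from simulation.

```python
# (data rate in Gb/s, power in W), copied from the table above
receivers = {
    "[1] ISSCC 2007": (12.5, 0.33),
    "[8] JSSC 2008": (10.0, 4.5),
    "[7] JSSC 2009": (4.8, 0.3),
    "[19] ISSCC 2009": (10.31, 1.2),
    "[6] JSSC 2010": (5.0, 0.178),
    "This work": (10.0, 0.263),
}


def energy_per_bit_pj(rate_gbps, power_w):
    """W / (Gb/s) gives nJ/bit; multiplying by 1000 gives pJ/bit."""
    return power_w / rate_gbps * 1e3


for name, (rate, power) in receivers.items():
    print(f"{name:16s} {energy_per_bit_pj(rate, power):7.1f} pJ/bit")
```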

IV. CONCLUSION

A low-power architecture for high-speed digital SerDes receivers is presented. A custom digitization algorithm is developed for the front-end ADC that exploits the channel’s low-pass characteristic to speculate the digitized value of the sampled signal. This algorithm speeds up the subsequent low-power SAR digitization step, allowing a sampling/digitization speed of 2 GS/s per channel to be reached. Additional power savings are achieved by obviating the need for the multi-stage buffers preceding the ADC, and by embedding parts of the data recovery and DFE functions within the ADC. The data recovery speed is enhanced by using 5-way interleaving in the AFE and a high order of parallelism in the DSP core.

REFERENCES

[1] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, and S. Batty, “A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery,” in ISSCC, Feb. 2007, pp. 436–437.

[2] A. Sheikholeslami, “Multi-level signaling for chip-to-chip and backplane communication (a tutorial),” in ISMVL, May 2009, pp. 203–207.

[3] A. Sheikholeslami, B. Payne, and J. Lin, “Will ADCs overtake binary frontends in backplane signaling (special evening session, SE3),” ISSCC, p. 514, Feb. 2009.

[4] C.-K. K. Yang and E.-H. Chen, “ADC-based serial I/O receivers,” in CICC, Sept. 2009, pp. 323–330.

[5] O. Ishida, “40/100GbE technologies and related activities of IEEE standardization,” Mar. 2009, pp. 1–29.

[6] O. Tyshchenko, A. Sheikholeslami, H. Tamura, M. Kibune, H. Yamaguchi, and J. Ogawa, “A 5-Gb/s ADC-based feed-forward CDR in 65 nm CMOS,” JSSC, vol. 45, no. 6, pp. 1091–1098, Jun. 2010.

[7] A. Varzaghani and C.-K. Yang, “A 4.8 GS/s 5-bit ADC-based receiver with embedded DFE for signal equalization,” JSSC, vol. 44, no. 3, pp. 901–915, Mar. 2009.

[8] O. Agazzi, M. Hueda, D. Crivelli, H. Carrer, A. Nazemi, and G. Luna, “A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic dispersion compensation of multimode optical fibers at 10 Gb/s,” JSSC, vol. 43, no. 12, pp. 2939–2957, Dec. 2008.

[9] K. Mueller and M. Muller, “Timing recovery in digital synchronous data receivers,” IEEE Transactions on Communications, vol. 24, no. 5, pp. 516–531, May 1976.

[10] P. Schvan, J. Bach, C. Falt, P. Flemke, R. Gibbins, Y. Greshishchev, N. Ben-Hamida, D. Pollex, J. Sitch, S.-C. Wang, and J. Wolczanski, “A 24 GS/s 6b ADC in 90 nm CMOS,” in ISSCC, Feb. 2008, pp. 544–545.

[11] Y. Greshishchev, J. Aguirre, M. Besson, R. Gibbins, C. Falt, and P. Flemke, “A 40 GS/s 6b ADC in 65 nm CMOS,” Feb. 2010, pp. 390–391.

[12] B. Murmann, “A/D converter trends: Power dissipation, scaling and digitally assisted architectures,” in CICC, Sept. 2008, pp. 105–112.

[13] Y. Zhu, C.-H. Chan, U.-F. Chio, S.-W. Sin, U. Seng-Pan, and R. Martins, “A 10-bit 100-MS/s reference-free SAR ADC in 90 nm CMOS,” JSSC, vol. 45, no. 6, pp. 1111–1121, Jun. 2010.

[14] O. Agazzi and V. Gopinathan, “Background calibration of interleaved analog to digital converters for high-speed communications using interleaved timing recovery techniques,” in ISCAS, May 2005, pp. 1390–1393, vol. 2.

[15] J. Proakis and M. Salehi, Digital Communications, 5th ed. New York, NY, USA: McGraw-Hill, Inc., 2008.

[16] C.-H. Lin, F. van der Goes, J. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci, X. Liu, and K. Bult, “A 12 bit 2.9 GS/s DAC with IM3 ≪ −60 dBc beyond 1 GHz in 65 nm CMOS,” JSSC, vol. 44, no. 12, pp. 3285–3293, Dec. 2009.

[17] M. Choi and A. Abidi, “A 6-b 1.3-Gsample/s A/D converter in 0.35-µm CMOS,” JSSC, vol. 36, no. 12, pp. 1847–1858, Dec. 2001.

[18] P. Chopp and A. Hamoui, “Discrete-time modeling of clock jitter in continuous-time delta-sigma modulators,” May 2007, pp. 497–500.

[19] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, and A. Garg, “A 500 mW digitally calibrated AFE in 65 nm CMOS for 10 Gb/s serial links over backplane and multimode fiber,” in ISSCC, Feb. 2009, pp. 370–371.
