412 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 51, NO. 2, FEBRUARY 2016
A 6.7 MHz to 1.24 GHz 0.0318 mm² Fast-Locking All-Digital DLL Using Phase-Tracing Delay Unit in 90 nm CMOS
Min-Han Hsieh, Member, IEEE, Liang-Hsin Chen, Shen-Iuan Liu, Fellow, IEEE, and Charlie Chung-Ping Chen
Abstract—In this paper, an all-digital delay-locked loop (ADDLL) with a phase-tracing delay unit (PTDU) is proposed to achieve a wide operating frequency range, low power, and low cost. For the wide-range DLL, the long delay line is replaced by a PTDU, which includes two gated ring oscillators (GROs) for generating the wide delay range with a reduced die area. According to the dual-loop control scheme in this work, the input clock rising edge and falling edge are tracked independently to ensure that the ADDLL output maintains the duty cycle of the input reference. Furthermore, the ADDLL utilizes an open-loop scheme to achieve a fast lock time of five clock cycles for all supported input frequencies. The proposed ADDLL has been fabricated in TSMC 90 nm CMOS technology and supports a wide operating frequency range from 6.7 MHz to 1.24 GHz within a small active area of 0.0318 mm². The measured peak-to-peak and root-mean-square jitter at 1.24 GHz are 2.22 ps and 424.62 fs, respectively. The ADDLL consumes 14.5 mW while operating at 1.24 GHz.
Index Terms—All-digital, delay-locked loop (DLL), fast locking, open-loop locking, phase-tracing delay unit (PTDU), wide range.
I. INTRODUCTION
IN RECENT years, advanced deep-submicron CMOS technologies have provided IC designers with the capability to design high-quality, high-reliability systems on chip (SoCs). Delay-locked loops (DLLs) have been frequently used to seamlessly connect the custom IP blocks in SoCs, and they are integral in enabling clock generation and synchronization for high-speed data communications. Although many new standards have been proposed for high-speed communications, those standards should be compatible with the standards that are currently being used. For example, in memory systems where DLLs are commonly used, the maximum I/O clock frequency is 1066 MHz in the latest DDR3 standard, while the minimum I/O clock frequency is 100 MHz in the original DDR standard. As the technology keeps moving forward, it is foreseeable that an ultra-wide-range DLL will be required to support all frequencies of all standards. In the past, customized analog DLLs were used for high-speed I/O [1], [2].
Manuscript received January 30, 2015; revised July 18, 2015; accepted October 11, 2015. Date of publication November 03, 2015; date of current version January 29, 2016. This paper was approved by Associate Editor Pavan Kumar Hanumolu.
The authors are with the Graduate Institute of Electronics Engineering,National Taiwan University, Taipei 10617, Taiwan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2015.2494603
However, analog DLLs face some problems, such as high leakage current, low supply voltage, and increased PVT variations due to CMOS scaling. In order for highly integrated digital systems to meet their reliability requirements, many IC designers have shifted their focus to digitally assisted and all-digital DLLs (ADDLLs) [3]–[6], forgoing analog DLLs despite their finer timing resolution and resulting low jitter. Moreover, digital DLLs achieve fast locking times and low power consumption while also having simplified designs. There are two locking mechanisms used to realize ADDLLs: closed loop and open loop. The closed-loop scheme is widely adopted in conventional phase-locked loops (PLLs) and DLLs. Closed-loop ADDLLs update the control code every reference clock cycle depending on the detected phase error. The output jitter is due not only to noise but is also a function of the delay resolution. As a result, closed-loop ADDLLs suffer from worse jitter performance due to cycle-to-cycle control code variations. Although some papers discuss loop filter designs for reducing output jitter, jitter amplification is still an important problem in closed-loop ADDLLs [7]. On the other hand, because DLLs are only concerned with the time delay from the input signal of a given frequency, open-loop ADDLLs can provide better jitter performance. Usually the peak-to-peak jitter is a few picoseconds and the root-mean-square jitter is at the femtosecond level because the control code does not toggle for a fixed delay. Additionally, open-loop DLLs also achieve fast locking times.
To achieve a wide frequency range, some engineers recommend integrating two or more DLLs with different operating frequencies. A band selector is then necessary to select the appropriate DLL for a given standard. The drawback of this method, however, is its high cost. To minimize components and therefore reduce cost, the practical solution is to include only one DLL with a wide operating frequency range [8]–[14] to satisfy different standards. In wide-range DLLs, harmonic locking is an important issue due to the wide delay range. Even though there is only one DLL, the long delay line needed for the wide delay range usually occupies a large area and thus also increases cost and power.
In this work, a wide-range, fast-locking, low-cost, and low-power ADDLL is proposed. The ADDLL utilizes a phase-tracing delay unit (PTDU) to replace the massively deployed delay cells in the delay line. The PTDU provides a wide delay range using two simple gated ring oscillators (GROs) to simultaneously reduce the area cost and power consumption of a wide-range DLL. An open-loop scheme is adopted so that the ADDLL can achieve a fast locking time of five cycles regardless of input frequency. The open-loop scheme also achieves better jitter performance, with 2.22 ps peak-to-peak jitter and 424.64 fs root-mean-square jitter at 1.24 GHz. The operating frequency range of the proposed ADDLL is from 6.7 MHz to 1.24 GHz. Because the PTDU tracks both the rising edge and falling edge of the reference clock independently, the ADDLL's output maintains the duty cycle of the reference clock. In this paper, the main innovation and some analysis of the proposed ADDLL are described in Section II. The architecture and circuit implementation details are presented in Section III. Measurement results are shown in Section IV. Finally, a brief conclusion is given in Section V.

0018-9200 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
HSIEH et al.: 6.7 MHz TO 1.24 GHz 0.0318 mm2 FAST-LOCKING ADDLL 413
Fig. 1. Delay line in delay-locked loops. (a) Traditional serially linked delay line. (b) Proposed GRO-based delay line.
II. INNOVATION AND ANALYSIS OF THE PROPOSED ADDLL
The main idea of the proposed ADDLL involves replacing the long serially linked delay cells with two NOR-GROs. The function of the DLL is to generate an output clock CKDLL, which is synchronized with the input clock CKREF. In conventional DLLs, the delay line consists of many delay cells that are used to generate delays up to a full clock period, as shown in Fig. 1(a). Because each delay cell is composed of two inverters, the delay of each cell is 2t, where t is the inverter delay. When N delay cells are used, the longest clock period supported is 2Nt. Therefore, more delay cells are required to support a wider frequency range, and wide-range DLLs incur a high area cost. In the proposed ADDLL, the long delay line is replaced by two NOR-GROs, each with n stages, where n should be an odd number. One GRO is gated by CKREF while the other is gated by /CKREF, as shown in Fig. 1(b). Because a NOR gate is held low while its gating input is high, the NOR-GRO gated by /CKREF oscillates when CKREF is high, and the NOR-GRO gated by CKREF oscillates when CKREF is low. As a result, a synchronous output clock CKDLL is generated by sensing the desired edge of each of the two GRO outputs, representing the delay associated with the rising and falling edges of CKREF. Consequently, the area cost is reduced to only 2n delay cells. Assuming each gate delay is t, the oscillation period of an n-stage GRO is 2nt with a 50% duty cycle. Therefore, the estimated delay range of the proposed ADDLL is
$$2nt \le T \le 2(2K + 1)nt \tag{1}$$
where K represents the number of GRO oscillation cycles as determined by the maximum value of the pulse counter. To achieve the same operating range, a serially linked delay line requires N = (2K + 1)n delay cells and Ndelayline gates, where

$$N_{\text{delayline}} = 2N = 2(2K + 1)n. \tag{2}$$
Because each delay cell in the conventional delay line consists of two inverter gates, twice as many gates are needed as delay cells. Thus, the conventional serially linked delay line requires 2K + 1 times as many gates as the GRO-based delay line to achieve the same delay range.
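The gate-count comparison in (1) and (2) can be checked numerically. The following is a minimal sketch, not the authors' tooling: the function names are hypothetical, delays are expressed in units of the gate delay t, and the values n = 3 and K = 320 are borrowed from Section III purely as an illustration.

```python
def gro_delay_range(n, K, t=1.0):
    """Delay range (T_min, T_max) of Eq. (1) for two n-stage GROs."""
    return 2 * n * t, 2 * (2 * K + 1) * n * t


def serial_line_gates(n, K):
    """Inverter gates of a serial delay line with the same T_max, Eq. (2)."""
    cells = (2 * K + 1) * n   # N delay cells
    return 2 * cells          # two inverters per delay cell


def gro_line_gates(n):
    """Gates of the GRO-based delay line: two n-stage oscillators."""
    return 2 * n


# Illustrative values only (assumed here, taken from Section III).
n, K = 3, 320
t_min, t_max = gro_delay_range(n, K)
ratio = serial_line_gates(n, K) // gro_line_gates(n)
print(f"T in [{t_min:.0f}t, {t_max:.0f}t], gate ratio = {ratio}")
```

For these values the ratio evaluates to 2K + 1 = 641, which matches the statement above.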
To make the comparison fairer, the control logic that is usually used for automatic frequency detection should be included in the analysis. Fig. 2 shows the period detector in the traditional wide-range DLL [12]. Another N DFFs are applied, and each DFF is triggered by the corresponding clock signal in the traditional serially linked delay line. Thus, the reference clock period is determined according to the DFF outputs Q[N:1], which form a thermometer code. The overall cell number in the traditional wide-range DLL is

$$N_{\text{traditional}} = N_{\text{delayline}} + N_{\text{DFF}} = 3N = 3(2K + 1)n. \tag{3}$$
Another innovation of the proposed ADDLL is the design of the pulse counter, which is also the period detector of the proposed ADDLL. There are three ways to count the number of GRO pulses, as shown in Fig. 3. The first involves using a simple binary counter, which allows for maximal power and area savings, as shown in Fig. 3(a). However, although the binary counter uses the fewest DFFs, log2 K, and saves the most area, it may fail at the high output frequency of the GROs. The second way involves serially linking DFFs into a shift-register, as shown in Fig. 3(b). With this method, the speed is limited by the propagation delay, setup time, and hold time of the DFFs. Although it is easier to meet timing requirements at high GRO frequencies with a serially linked DFF-based pulse counter, a long DFF chain with K DFFs must be employed to support a wide frequency range. This would effectively negate the power and area savings one would hope to achieve by switching to a GRO-based delay line in the first place. Thus, we use a combination of the two methods to address this issue, as shown in Fig. 3(c). We adopt x serially linked DFFs for high speed, followed by a counter with y bits to support a wide frequency range. Therefore, the maximum pulse count K is

Fig. 2. Clock period detector for the traditional wide-range DLLs. (a) Architecture. (b) Waveform diagram.
Fig. 3. GRO pulse counters. (a) Binary counter. (b) Shift-register type. (c) Proposed combinational type.
$$K = x \cdot 2^{y}. \tag{4}$$

Then, y can be derived as

$$y = \log_2\left(\frac{K}{x}\right). \tag{5}$$

However, we only need x + y DFFs, which is

$$x + y = x + \log_2\left(\frac{K}{x}\right). \tag{6}$$
Note that since two GROs are used, twice this number of DFFs is actually required. As a result, the number of logic gates grows only logarithmically with K. The proposed architecture is only meaningful when the following condition holds:

$$N_{\text{traditional}} > 2(n + x + y). \tag{7}$$
From (3), we can rewrite the condition as

$$3(2K + 1)n > 2\left[n + x + \log_2\left(\frac{K}{x}\right)\right]. \tag{8}$$
Usually, K is related to the DLL operating frequency range, and n is the number of gates in a GRO, which is related to the oscillating frequency. x is the shift-register length, which should be determined by the speed requirement of the binary counter used in pulse counting. We can select proper values of K, n, and x for a wide frequency range with area savings. For the same number of cells 2[n + x + log2(K/x)], the conventional serially linked delay line provides the maximum delay TA,bound

$$T_{A,\text{bound}} = \frac{4}{3}\left[n + x + \log_2\left(\frac{K}{x}\right)\right]t. \tag{9}$$
Fig. 4. Comparison of the traditional serially linked delay line and the proposed GRO-based delay line (with automatic period detector). (a) Delay range versuscell number. (b) Operating frequency versus power consumption.
Equation (9) shows that if the longest required clock period is larger than TA,bound, the GRO-based delay line achieves a smaller area for a constant 2[n + x + log2(K/x)] delay cells. Fig. 4(a) shows the cost comparison between a conventional delay line and the GRO-based delay line for selected n, x, and K values. As the required delay increases, the number of cells required by a conventional delay line increases proportionally. The GRO-based delay line, in contrast, requires a constant number of delay cells 2[n + x + log2(K/x)] to support the wide frequency range. However, since true single-phase clock (TSPC) logic is adopted for the DFFs in this work, the area of a DFF is 4× larger than that of one inverter. To save silicon area, the condition expressed in (8) should be rewritten as

$$6(2K + 1)n > 2\left[n + 4x + 4\log_2\left(\frac{K}{x}\right)\right]. \tag{10}$$
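The cell-count conditions (8) and (10) are easy to evaluate for a candidate design point. The sketch below is illustrative only: the helper names are hypothetical, and n = 3, x = 5, K = 320 are assumed here from the implementation values quoted in Section III.

```python
import math


def pulse_counter_dffs(K, x):
    """DFFs in one pulse counter, Eq. (6): x shift stages plus log2(K/x) bits."""
    return x + math.log2(K / x)


def traditional_cells(n, K):
    """Left side of Eq. (8): delay-line inverters plus period-detector DFFs."""
    return 3 * (2 * K + 1) * n


def proposed_cells(n, K, x):
    """Right side of Eq. (8): two GROs, each with its own pulse counter."""
    return 2 * (n + pulse_counter_dffs(K, x))


def traditional_area(n, K, dff_weight=4):
    """Left side of Eq. (10): 2N inverters plus N DFFs weighted by area."""
    cells = (2 * K + 1) * n
    return 2 * cells + dff_weight * cells


def proposed_area(n, K, x, dff_weight=4):
    """Right side of Eq. (10), in inverter-equivalent area units."""
    return 2 * (n + dff_weight * pulse_counter_dffs(K, x))


n, x, K = 3, 5, 320  # assumed illustrative values
print(traditional_cells(n, K), proposed_cells(n, K, x))
print(traditional_area(n, K), proposed_area(n, K, x))
```

With these assumed values both inequalities hold by a wide margin, which is the regime the analysis targets.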
In terms of power consumption, the GRO-based delay line also performs better than the serially linked delay line. In a traditional wide-range DLL, all of the delay cells in the long, serially linked delay line switch at the reference frequency. Because the power consumption is proportional to the operating frequency, a higher operating frequency leads to higher power consumption. Additionally, the number of cells in the delay line also affects the system power consumption, so every cell in the delay line must be counted when calculating the power of the system. In a traditional wide-range DLL, many inverters are used to enable a wide delay range. While operating with a 2mt clock period, although only m buffers are required to produce enough delay, the power of all N buffers should be included. As a result, the total power consumption of the 2N gates is

$$P_{\text{delayline}} = CV^2 \frac{1}{t}\left(\frac{N}{m}\right). \tag{11}$$
The power consumption is proportional to N/m. No matter what the operating frequency is, all delay cells are switching. Therefore, lower power consumption is attainable at lower frequencies, where the number of useful delay cells m approaches the total number of delay cells N. On the other hand, at higher frequencies, where fewer buffers are useful, a much higher power consumption of N/m times CV²(1/t) is necessary. The minimum power is CV²(1/t) at the lowest operating frequency, where m is equal to N. Since each DFF in the period detector is triggered in every reference clock period 2mt and the power of the TSPC DFF is roughly 4× larger than that of an inverter, the power consumption of the DFFs is

$$P_{\text{DFF}} = CV^2 \frac{1}{t}\left(\frac{2N}{m}\right). \tag{12}$$
As a result, the overall power consumption of the traditional DLL is

$$P_{\text{total,traditional}} = P_{\text{delayline}} + P_{\text{DFF}} = CV^2 \frac{1}{t}\left(\frac{3N}{m}\right). \tag{13}$$
However, for the proposed GRO-based delay line, because two GROs are oscillating with a period of 2nt and each GRO is oscillating in only half of the input reference cycle, the power of the two GROs is

$$P_{\text{GRO}} = CV^2 \frac{1}{t} \cdot \frac{1}{2}. \tag{14}$$
In the pulse counter, the shift-register is still triggered at the GRO oscillation frequency of 1/(2nt), and the switching frequency of the binary counter has been slowed down to 1/(2ntx) by the shift-register. Thus, the power consumption of the pulse counter is (DFF power is 4× larger than that of an inverter)

$$P_{\text{PC}} = CV^2 \frac{1}{t} \cdot \frac{2}{nx}\left[x^2 + \log_2\left(\frac{K}{x}\right)\right]. \tag{15}$$
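Consequently, the total power is

$$\begin{aligned}
P_{\text{total,proposed}} &= P_{\text{GRO}} + P_{\text{PC}} \\
&= CV^2 \frac{1}{t} \cdot \frac{1}{2} + CV^2 \frac{1}{t} \cdot \frac{2}{nx}\left[x^2 + \log_2\left(\frac{K}{x}\right)\right] \\
&= CV^2 \frac{1}{t} \cdot \left\{\frac{1}{2} + \frac{2}{nx}\left[x^2 + \log_2\left(\frac{K}{x}\right)\right]\right\}.
\end{aligned} \tag{16}$$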
Fig. 5. Proposed ADDLL. (a) Architecture. (b) Waveform diagram.
Since the proposed GRO-based delay line consumes a constant power regardless of the DLL input frequency, the GRO-based delay line profits from lower power, as compared to conventional delay lines, at higher operating frequencies. For a serially linked delay line with N buffers consuming the same power, the operating clock period TP,bound is

$$T_{P,\text{bound}} = 2m_{P,\text{bound}} \cdot t = \frac{3N}{\frac{1}{4} + \frac{1}{nx}\left[x^2 + \log_2(K/x)\right]} \cdot t. \tag{17}$$
Therefore, when the operating clock period is smaller than TP,bound, the GRO-based delay line saves power over the conventional delay line by drawing a constant power of Ptotal,proposed. Based on (13) and (16), the mathematical condition can be derived as

$$\frac{3N}{m} > \frac{1}{2} + \frac{2}{nx}\left[x^2 + \log_2\left(\frac{K}{x}\right)\right]. \tag{18}$$
Since N, m, and K are related to the operating frequency range, the proposed architecture provides power and area advantages in supporting ultra-wide frequency ranges, provided that the shift-register length x and the GRO gate number n are selected to meet the conditions in (10) and (18). Fig. 4(b) shows a rough power analysis. The proposed ADDLL consumes a constant power, which is independent of the operating frequency. DLLs are usually used to compensate for the phase difference between the original input clock and the target clock. Because the GROs can track the rising and falling edges of the target clock independently, we can create an output clock in phase with the target clock from the original input clock by combining the tracking edges. Furthermore, we do not need to calibrate the mismatch between the two GROs caused by PVT variations because the two GROs are totally independent and uncorrelated.
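The power model of (13), (16), and (17) can be explored with a few lines of code. This is a rough sketch under stated assumptions: all powers are normalized to CV²/t, the function names are hypothetical, and the design point n = 3, x = 5, K = 320 with N = (2K + 1)n is assumed for illustration rather than taken from the analysis itself.

```python
import math


def p_traditional(N, m):
    """Eq. (13), normalized to CV^2/t: delay line plus period-detector DFFs."""
    return 3.0 * N / m


def p_proposed(n, x, K):
    """Eq. (16), normalized to CV^2/t: two GROs plus the pulse counters."""
    return 0.5 + (2.0 / (n * x)) * (x ** 2 + math.log2(K / x))


def t_p_bound(N, n, x, K):
    """Eq. (17): clock period (in units of t) at which both powers are equal."""
    return 3.0 * N / (0.25 + (1.0 / (n * x)) * (x ** 2 + math.log2(K / x)))


n, x, K = 3, 5, 320          # assumed illustrative values
N = (2 * K + 1) * n          # serial line sized for the same delay range
bound = t_p_bound(N, n, x, K)
print(f"proposed power   = {p_proposed(n, x, K):.3f} CV^2/t")
print(f"break-even period = {bound:.1f} t")
# For clock periods below the break-even value, Eq. (18) holds and the
# GRO-based line draws less power than the serial line.
```

Under this simple model the traditional line only wins at the long-period end of the range, where most of its cells are actually needed.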
III. ARCHITECTURE AND CIRCUIT IMPLEMENTATION OF THE PROPOSED ADDLL
Fig. 5(a) shows the proposed architecture of the ADDLL. The ADDLL consists of the PTDU, digital phase selector (DPS), digital phase mixer (DPM), edge combiner (EC), and control unit. In the PTDU, two GROs are included to convert the input clock CKREF into two pulse signals T_L and T_H, which carry most of the delay to the rising and falling edges of CKREF, respectively. The PTDU is the frequency detector and coarse-tuning block of the wide-range DLL. Because of the poor timing resolution of the PTDU, the DPS and DPM are adopted as fine-tuning blocks following the PTDU to improve the timing resolution. However, the delays of the DPS and DPM should be defined correctly for minimum output phase error. Therefore, the control unit, including two replica delay lines (RDLs) and code generators, generates the control codes DH/L and FH/L to determine the delays of the DPS and DPM correctly. Furthermore, an open-loop locking scheme is adopted for fast locking. In the end, because H and L are pulse signals representing the delay of the rising and falling edges of CKREF, an EC is required to recover an output clock with the duty cycle of CKREF.
Fig. 5(b) shows the waveform diagram of the ADDLL. In this work, a synchronous output clock CKDLL is generated by combining two phases H and L, which represent the delay of the falling edge and the rising edge of CKREF. H and L are generated from the PTDU, DPS, and DPM operations, where the PTDU provides most of the (coarse) signal delay, and the DPS and DPM finely adjust the delay to minimize the output phase error. In the PTDU, two NOR-GROs are adopted in lieu of a long chain of delay cells to generate T_H and T_L. One NOR-GRO is gated by /CKREF and oscillates when CKREF is high; the other is gated by CKREF and oscillates when CKREF is low. To achieve automatic locking, the PTDU is adopted as the frequency detector in the proposed wide-range ADDLL. The operating frequency is detected by the counted pulse number j of the GRO, which takes different values at different operating frequencies. By sensing a proper NOR-GRO rising edge prior to the rising edge and falling edge of CKREF, T_L and T_H are generated to track the rising and falling edges of CKREF, respectively. In this work, the third-from-last NOR-GRO rising edge [edge j − 2 in Fig. 5(b)] is selected to provide enough time for the following DPS, DPM, and EC blocks to operate correctly. The PTDU requires a smaller area and achieves a wider operating frequency range without using a long delay line. As shown in Fig. 5(b), assuming each gate delay is t, the oscillation period of the GRO is 6t with a 50% duty cycle. Consequently, the estimated ADDLL delay time range is

$$21t \le T \le [6(K + 1) + 3]t. \tag{19}$$
Because the maximum value of the pulse counter K is 320 in this work, a pulse counter consisting of a five-stage shift-register and a 6-bit binary counter is adopted.
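The relation between the counter structure and the delay range can be illustrated with a small behavioral model. This is a sketch under assumptions, not the circuit: the class and function names are hypothetical, and the way the shift-register stage wraps and clocks the binary counter is a simplification of the description above.

```python
class HybridPulseCounter:
    """Behavioral model: x-stage shift-register followed by a y-bit counter."""

    def __init__(self, x=5, y=6):
        self.x = x
        self.y = y
        self.position = 0  # pulses seen since the shift-register last wrapped
        self.count = 0     # value of the y-bit binary counter

    def pulse(self):
        """Register one GRO pulse."""
        self.position += 1
        if self.position == self.x:
            # The last shift stage fires: reset the stages, clock the counter.
            self.position = 0
            self.count = (self.count + 1) % (1 << self.y)

    def total(self):
        """Pulses counted so far, valid below the capacity x * 2**y."""
        return self.count * self.x + self.position


def delay_range_in_t(K):
    """Eq. (19): (T_min, T_max) in units of the gate delay t."""
    return 21, 6 * (K + 1) + 3


counter = HybridPulseCounter()
for _ in range(317):
    counter.pulse()
print(counter.count, counter.position, counter.total())
print(delay_range_in_t(5 * 2 ** 6))
```

The binary counter in this model toggles only once every x pulses, which is the speed relaxation that motivates the combined structure.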
A. Phase-Tracing Delay Unit
To track the falling edge and rising edge of CKREF independently, the PTDU is divided into two parts: the H-part for falling-edge tracing (with output signal T_H) and the L-part for rising-edge tracing (with output signal T_L), as shown in Fig. 6(a). As described later, to compensate for the intrinsic delay of the various ADDLL blocks, the third-from-last rising edge [edge j − 2 in Fig. 5(b)] of the falling-edge/rising-edge tracing GRO is output as T_H/T_L. The H-part and L-part each consist of a cyclic pulse generator (CPG), a path selector, a timing controller, a MUX, and a NOR-GRO, which is gated by CKREF in the L-part and /CKREF in the H-part. The CPG and the counter in the timing controller make up the pulse counter. In the CPG, Q1 to Q5 are generated from serially linked and chronologically triggered DFFs when the GRO oscillates. Because these DFFs are reset by Q5 or the input reference clock, each signal from Q1 to Q5 may have multiple pulses in a given clock cycle. Two steps are required to select a rising edge from Q1 to Q5 corresponding to the (j − 2)th rising edge of the GRO. The first step, performed by the path selector, is to select the correct QN. Since there are many pulses in the selected QN, the second step is to select the last pulse of QN, which corresponds to the third-from-last rising edge (edge j − 2) of the GRO, as the tracing signal.

When the GRO stops oscillating at the end of each input cycle, the registers in the path selector capture the values of Q1 to Q5 for generating C[5:1]. Since Q1 to Q5 are triggered chronologically, C[5:1] is a thermometer code. Therefore, there is only one nonzero bit in S[5:1] due to the XOR gates. As a result, S[5:1] determines which QN should be selected. Due to the input clock jitter, DFF setup and hold time violations may induce bubble codes in S[5:1]. Therefore, a finite-state machine (FSM) is necessary to generate a stable control code Sel[5:1]. As shown in Fig. 6(b), the FSM determines whether to refresh the control code Sel[5:1] by comparing the current Sel[5:1] with the renewed control code S[5:1]. When each Sel[i] is obtained, Sel[5:1] will remain the same value if S[i − 1], S[i], or S[i + 1] is the same as S[i]. Otherwise, Sel[5:1] will be refreshed to S[5:1].

If QN has multiple rising edges within a clock period, only the last edge should be selected as the tracing signal. Therefore, a timing controller is adopted to determine when to output the selected QN. The timing controller consists of a counter triggered by the rising edge of Q5 and reset by the input clock, a register storing the current count, a subtractor, and a comparator. Because there may be several pulses in Q5, the counter in the timing controller counts the current pulse number and stores the number in the register. When Q1 or Q2 is selected in the path selector, the subtrahend of the subtractor is zero because Sel[3] or Sel[4] is true and Sel[1], Sel[2], and Sel[5] are false in binary. Otherwise, the subtrahend of the subtractor is 1. Therefore, when the value of the counter is the same as the output of the subtractor, the selected signal QN is passed to the output. Fig. 7(a) shows the waveform diagram of how Q2 is selected, and Fig. 7(b) shows the waveform diagram of how Q5 is selected in the L-part.
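The path-selector logic described above (thermometer code, XOR into a one-hot code, then an FSM that filters bubble codes) can be sketched in software. The following is one possible reading, not the circuit of Fig. 6: the function names are hypothetical, the XOR is assumed to act on adjacent bits with C[6] taken as 0, and the hold rule is assumed to keep Sel when the new code has a bit set at the currently selected position or next to it.

```python
def thermometer_to_one_hot(c):
    """c[0] is C[1], ..., c[4] is C[5]; S[i] = C[i] XOR C[i+1], with C[6] = 0."""
    padded = list(c) + [0]
    return [padded[i] ^ padded[i + 1] for i in range(len(c))]


def update_selection(sel, s):
    """Hold Sel if S marks the selected position or a neighbour, else refresh.

    This hold rule is an assumed interpretation of the FSM description.
    """
    if 1 not in sel:
        return list(s)                    # nothing selected yet: take S
    i = sel.index(1)
    neighbourhood = s[max(i - 1, 0): i + 2]
    return list(sel) if 1 in neighbourhood else list(s)


clean = thermometer_to_one_hot([1, 1, 1, 0, 0])    # Q1..Q3 captured high
bubble = thermometer_to_one_hot([1, 1, 0, 1, 0])   # captured with a bubble
print(clean, bubble)
print(update_selection(clean, bubble))             # bubble near Sel: held
print(update_selection(clean, [0, 0, 0, 0, 1]))    # far from Sel: refreshed
```

A clean thermometer code yields exactly one set bit, while a bubble produces several, which is why some filtering stage is needed before the MUX select is updated.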
Fig. 6. Proposed PTDU. (a) PTDU architecture and circuit diagram of L-part. (b) FSM circuit diagram.

B. Digital Phase Selector, Digital Phase Mixer, and Edge Combiner

After the PTDU provides the two tracing phases T_H and T_L with coarse delay times, the DPS and DPM [9] finely tune the delays of T_H and T_L independently, resulting in signals H and L that are used for recovering the output clock CKDLL via the EC. In the DPS, two adjacent phases of a short delay line are selected by MUXs for output according to the control code DH/L[5:0], as shown in Fig. 8(a). The timing resolution of the DPS is limited by the buffer delay τ, which is usually around 40 ps in 90 nm CMOS technology. Therefore, without the DPM, the maximum output phase error is 40 ps due to the open-loop locking method applied in this work. Thus, to reduce the phase error, a DPM is used to improve timing resolution. After two adjacent phases are selected by the DPS, the DPM interpolates the timing of these two phases for better timing precision by controlling the current distribution to the delay cells for each phase input. As shown in Fig. 8(b), the two selected phases PH/L<i> and PH/L<i+1> are fed into the DPM, which is composed of two independent delay cells with a shared output. Each delay cell is constructed of eight gated inverter cells controlled by the thermometer code TH/L[6:0] and one dummy cell. The dummy cell for PH/L<i> is always "ON" and the dummy cell for PH/L<i+1> is always "OFF." Since the number of "ON" inverters in the DPM is constant, we can interpolate the phases PH/L<i> and PH/L<i+1> by distributing the number of "ON" inverters between the DPM delay cells according to TH/L[6:0]. Because there are eight "ON" inverters to distribute between the two delay cells, the timing resolution is improved to τ/8, which is around 4–6 ps depending on the timing resolution of the DPS. The DPS and DPM thus finely define the delay of the phases H and L given the tracing phases T_H and T_L. Fig. 9(a) shows the simulated overall delay of the DPS and DPM associated with the corresponding control codes DH/L[5:0] and TH/L[6:0]. The delay of the DPS and DPM increases monotonically with increasing control codes, so there is no false locking problem in the ADDLL. Fig. 9(b) shows the simulated INL and DNL of the DPS and DPM delay. The maximum delay difference between two adjacent codes (DNL) is 7 ps, which also represents the maximum phase error of the output.
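The coarse and fine delay steps described above can be illustrated with an idealized model. This is a sketch only: it assumes perfectly linear interpolation, τ = 40 ps as quoted in the text, seven thermometer-controlled inverters per phase plus the dummy cell, and hypothetical function names. The simulated results in Fig. 9 show that the real steps are not perfectly uniform.

```python
TAU_PS = 40.0  # DPS buffer delay, approximate value quoted in the text


def dpm_on_inverters(fine_code):
    """Split the eight ON inverters between P<i> and P<i+1>.

    fine_code (0..7) is the number of thermometer bits that are set.
    """
    if not 0 <= fine_code <= 7:
        raise ValueError("fine_code must be in 0..7")
    on_late = fine_code                # inverters driving P<i+1>
    on_early = (7 - fine_code) + 1     # remaining inverters plus the ON dummy
    return on_early, on_late


def tuned_delay_ps(coarse_code, fine_code, tau_ps=TAU_PS):
    """Idealized DPS + DPM delay: coarse steps of tau, fine steps of tau/8."""
    on_early, on_late = dpm_on_inverters(fine_code)
    weight = on_late / (on_early + on_late)   # interpolation weight
    return coarse_code * tau_ps + weight * tau_ps


for fine in range(8):
    print(fine, dpm_on_inverters(fine), tuned_delay_ps(3, fine))
```

In this model the total number of ON inverters stays at eight for every code, and the delay rises in 5 ps steps between two adjacent DPS taps.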
Fig. 7. Waveform diagram of L-part in PTDU. (a) Q2 is selected. (b) Q5 is selected.
Fig. 8. (a) DPS. (b) DPM.
Fig. 9. Simulated delay of the DPS and DPM. (a) Delay versus control code. (b) INL and DNL.
Fig. 10. Edge combiner.

Finally, an EC is used to reconstruct the output waveform. Fig. 10 shows the circuit diagram of the EC. In the EC, when the H signal is high, the EC discharges the output CKDLL to low. On the other hand, the output CKDLL of the EC is high when the L signal is high. Because H and L are pulse signals, H and L are never high at the same time. If both H and L are low, the output CKDLL is held by the latch at the previous level. The timing diagram associated with the DPS and DPM is shown in Fig. 11, where the rising and falling edge timings of the reference clock are represented by signals L and H.
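The behavior of the EC can be summarized as a set/reset latch driven by the two pulse signals. The following is a minimal behavioral sketch on sampled waveforms; the function name and the sample sequences are hypothetical and only serve to show how the duty cycle is recovered.

```python
def edge_combiner(h_samples, l_samples, initial=0):
    """Rebuild the output clock from pulse signals H (reset) and L (set)."""
    output = []
    level = initial
    for h, l in zip(h_samples, l_samples):
        if h and l:
            raise ValueError("H and L must never be high at the same time")
        if h:
            level = 0      # H pulse pulls the output low (falling edge)
        elif l:
            level = 1      # L pulse drives the output high (rising edge)
        # With both low, the latch keeps the previous level.
        output.append(level)
    return output


l_pulse = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
h_pulse = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
print(edge_combiner(h_pulse, l_pulse))
```

Because the L and H pulses carry the rising and falling edge timings separately, the spacing between them sets the duty cycle of the recovered clock.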
In summary, after the H-part and L-part of the PTDU generate the tracing phases T_H and T_L with rough and imprecise delays, the DPS and DPM finely tune the delays from T_H and T_L to H and L with better timing resolution to minimize output phase error. The DPS compensates for the output phase error with a resolution of τ, and the DPM compensates for the phase error with a resolution of τ/8. Finally, the EC recovers the output waveform from H and L while maintaining the input clock's duty cycle. However, the DPS, DPM, and EC all possess intrinsic minimum delays, as shown in Fig. 11. To compensate for these delays, the PTDU selects the third-from-last rising edge (j − 2 in Fig. 5) of each GRO output. To ensure that the ADDLL will lock, the tuning range of the DPS and DPM should be at least twice as large as the PTDU timing resolution because bubble codes may occur in the PTDU, i.e., 6τ should not be less than 2(6t). Since each buffer in the DPS includes two inverters, we make τ equal to 2t by using the same transistor size. Thus, 6τ is equal to 2(6t).
C. Control Unit
Since the PTDU is able to provide two proper phases T_H and T_L automatically, a control unit is required to define the delays of the DPS and DPM to minimize the phase error. The control unit includes two RDLs, as shown in Fig. 12(a). The first RDL provides the minimum delay from the DPS to the EC with an output clock CKPSR. Because both the DPS and DPM in the first RDL are at minimum delay, the delay of the DPS in the main delay line can be defined by quantizing the phase difference between CKPSR and CKREF with a timing resolution of τ. In the second RDL, an output clock CKPMR is generated with a minimum delay from the DPM to the EC. Therefore, the delay of the DPM in the main delay line can be determined by quantizing the phase difference between CKPMR and CKREF with a timing resolution of τ/8. Fig. 12(b) shows the timing diagram for the control unit.
The decision code DH/L[5:0] for defining the DPS delay is generated by the DPS code generator in the first RDL, which is in fact a delay-line time-to-digital converter (TDC), as shown in Fig. 13(a). Note that the delay lines of the TDC and DPS should be matched, so each delay cell should have a delay of τ. Thus, the DPS code generator is able to quantize the phase difference between CKPSR and CKREF with a timing resolution of τ. As a result, we can specify the delay required of the DPS in the main delay line to compensate for the output phase error with a timing resolution of τ. An encoder is necessary to convert the thermometer code QH/L[6:1], triggered chronologically by the DFFs, into the binary code DH/L[5:0]. However, as in the PTDU, an FSM is also required in the DPS code generator because the decisions may suffer from bubble codes due to DFF setup/hold time violations. After the delay of the DPS in the main delay line is defined, the second RDL with a DPM code generator creates CKPMR with the minimum delay in the DPM. The timing diagram is shown in Fig. 13(b).
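The quantization performed by a delay-line TDC can be sketched as follows. This is an idealized illustration, not the circuit of Fig. 13: the function names are hypothetical, τ = 40 ps is the approximate value quoted earlier, the sampling direction is assumed, and the leading-ones count below is a simple stand-in for the encoder and FSM rather than a model of them.

```python
def tdc_thermometer(delta_ps, tau_ps=40.0, stages=6):
    """Thermometer code Q[1..stages]: bit k is set once delta covers k taps."""
    return [1 if delta_ps >= k * tau_ps else 0 for k in range(1, stages + 1)]


def thermometer_to_count(q):
    """Count the leading ones; anything after the first zero is ignored."""
    count = 0
    for bit in q:
        if not bit:
            break
        count += 1
    return count


delta = 130.0  # hypothetical phase difference between CKPSR and CKREF, in ps
q = tdc_thermometer(delta)
steps = thermometer_to_count(q)
residual = delta - steps * 40.0
print(q, steps, residual)
```

The residual left after the coarse quantization is smaller than τ, and it is this remainder that the DPM code generator resolves in τ/8 steps.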
Since it is hard to build a TDC with a timing resolution ofτ/8, a tunable delay unit (TDU) is used in the DPM code gen-erator, as shown in Fig. 14(a). There are two different delaysin the TDU, which are Ttdu and Ttdu + Ui. Ui is the resolutionfactor. The delay of each signal path in the TDU is controlledby the selection signal S. While S is high, the delay from IN1 toOUT1 is Ttdu and the delay from IN2 to OUT2 is Ttdu + Ui.On the contrary, the delay from IN1 to OUT1 is Ttdu + Ui andthe delay from IN2 IN2 to OUT2 is Ttdu, while S is low. For
HSIEH et al.: 6.7 MHz TO 1.24 GHz 0.0318 mm2 FAST-LOCKING ADDLL 421
Fig. 11. Timing diagram of DPS, DPM, and EC.
Fig. 12. Control unit. (a) Block diagram. (b) Timing waveform.
fast locking, the successive approximation technique is appliedin the DPM code decision process. Fig. 14(b) shows the DPMcode decision flowchart. If CKPMR leads CKREF in the begin-ning, which means the delay is too large, FH/L[2 : 0] shouldbe directly set to “000” for minimum delay. If CKPMR lagsCKREF in the beginning, the DPM code decision process willstart. It takes three TDU stages to generate the DPM code sincethere are three bits of FH/L[2 : 0]. In each stage, only onebit will be determined according to the input phase relation.When CKPMR leads CKREF, CKREF will be delayed by Ttdu
and CKPMR will be delayed by Ttdu + Ui. On the contrary, ifCKPMR lags CKREF, CKREF will be delayed by Ttdu + Ui
Fig. 13. DPS code generator. (a) Block diagram. (b) Timing waveform.
and CKPMR will be delayed by Ttdu. This decision process is then repeated recursively. Therefore, the phase difference between CKREF and CKPMR is reduced with a timing resolution of Ui within each recursive cycle. The timing resolution factor Ui used in each stage should be τ/2, τ/4, and τ/8, respectively, since the successive approximation for the control code is performed in binary. The timing resolution factor Ui can be set carefully by transistor sizing. We generate Ttdu by one buffer with a transistor size of W1/Lmin
and generate Ttdu + Ui by another buffer with a transistor size of W2/Lmin. Therefore, we can define the timing resolution factor Ui by choosing the transistor sizes W1/Lmin and W2/Lmin. Taking PVT variations and device mismatch of the buffers into account, the timing error of Ui should be smaller than τ/16 to ensure that there is no missing code in the DPM
Fig. 14. DPM code generator. (a) TDU. (b) Decision flow. (c) Block diagram and timing waveform.
Fig. 15. Locking procedure of the proposed ADDLL.
code generator. Thus, the DPM code generator remains functional under PVT variations and device mismatch. In the circuit implementation of the DPM code generator, we can accomplish three recursive decision steps in only one clock cycle because of the TDU. Fig. 14(c) shows the block diagram and timing waveform of the DPM code generator. Three serially linked TDUs with different timing resolution factors Ui are used. The input phase
Fig. 16. Simulated power consumption of the ADDLL.
relation at each stage is determined by the bang-bang phase detector (BBPD) between each TDU. At the first TDU, where U1 is τ/2, because CKREF always leads CKPMR, CKPMR
will be delayed by Ttdu + τ/2 and CKREF will be delayed by Ttdu, resulting in O1 and R1, respectively, in the next stage. FH/L[2] is decided according to the phase relation of O1 and R1. FH/L[2] also determines the signal paths of the second TDU stage, where U2 is τ/4. The second TDU creates a time difference of τ/4 and results in signals O2 and R2. The phase
Fig. 17. (a) Chip micrograph. (b) Area breakdown.
Fig. 18. Locked state of the ADDLL. (a) 6.7 MHz. (b) 1.24 GHz.
relation between these two signals determines the value of FH/L[1]. Similarly, FH/L[0] is decided according to the phase relation between O3 and R3 after the third TDU creates a final τ/8 time difference. Because the three required TDU stages are in one signal path, the DPM code generator needs only one clock cycle to determine the DPM delay.
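The three-stage successive approximation can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not the circuit itself: the function name `dpm_code`, the bit polarity (1 when CKREF still leads after a stage), and the sign convention (the extra delay Ui is added to whichever edge currently leads, following the decision rule given with the flowchart) are all assumptions. The common delay Ttdu is applied to both paths and therefore cancels in the phase difference.

```python
def dpm_code(lag, tau):
    """Return (F2, F1, F0) for an initial CKPMR lag behind CKREF.

    lag >= 0 means CKPMR lags CKREF by `lag` (expected range [0, tau)).
    lag < 0 means CKPMR leads at the start, so the code is forced to 000.
    """
    if lag < 0:
        return (0, 0, 0)
    diff = lag  # t(CKPMR) - t(CKREF); T_tdu is common to both paths
    bits = []
    for u in (tau / 2, tau / 4, tau / 8):
        if diff >= 0:
            diff -= u   # CKREF leads: CKREF path gets T_tdu + U_i
        else:
            diff += u   # CKPMR leads: CKPMR path gets T_tdu + U_i
        # BBPD decision after this TDU (assumed polarity).
        bits.append(1 if diff >= 0 else 0)
    return tuple(bits)


example = dpm_code(0.3, 1.0)   # a lag of 0.3*tau
```

Under these assumptions the three bits, read as a binary number, equal the initial lag quantized in steps of τ/8. For a lag of 0.3τ the result is (0, 1, 0), that is, two steps of τ/8.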
The ADDLL is locked by the sixth clock cycle, and the waveform diagram is shown in Fig. 15. The PTDU requires two clock cycles to generate the tracing phases T_H and T_L. The first cycle is for generating Sel[5:1] in the path selector, and the second cycle is needed for the registers in the timing controller to store the count. CKPSR and CKPMR in the two RDLs are generated in the third cycle. The DPS code generator defines the DPS delay by DH/L[5:0] in the fourth cycle. During the fifth cycle, the delay of the DPM is determined by the DPM code generator. The ADDLL is able to provide a synchronous output clock CKDLL by the start of the sixth cycle. As a result, the locking time of the ADDLL is five cycles regardless of the operating frequency. For system clock generation, because the ADDLL applies the same procedure to track any clock phase, creating an output clock that is in phase with the tracked clock takes only the same five cycles used for locking. Fig. 16 shows the simulated power breakdown of the proposed ADDLL. As in the power analysis in Section II, the PTDU which uses the proposed GRO-based
delay line dissipates constant power independent of the reference frequency. However, the power consumption of the DPS, the DPM, the EC, and the control logic is still proportional to the operating frequency. The simulated power of the PTDU increases slightly with frequency because some buffers in the PTDU operate at the reference frequency. As a result, the power consumption of those buffers is still frequency dependent and follows P = CV²f.
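The five-cycle locking sequence described in this section can be tabulated, and the absolute lock time at a given reference frequency follows directly from the cycle count. The sketch below is illustrative only; `LOCK_SCHEDULE` and `lock_time_seconds` are hypothetical names, and the per-cycle descriptions simply paraphrase the text.

```python
# Cycle number -> activity, paraphrased from the locking procedure.
LOCK_SCHEDULE = [
    (1, "PTDU: path selector generates Sel[5:1]"),
    (2, "PTDU: timing controller registers store the count"),
    (3, "RDLs generate CKPSR and CKPMR"),
    (4, "DPS code generator defines the DPS delay"),
    (5, "DPM code generator defines the DPM delay"),
]


def lock_time_seconds(f_ref_hz, cycles=len(LOCK_SCHEDULE)):
    """Lock time in seconds for a fixed number of reference clock cycles."""
    return cycles / f_ref_hz


for f_ref in (6.7e6, 1.24e9):
    print(f"{f_ref / 1e6:8.1f} MHz -> {lock_time_seconds(f_ref) * 1e9:8.2f} ns")
```

At the two ends of the reported range, five cycles correspond to roughly 746 ns at 6.7 MHz and roughly 4 ns at 1.24 GHz.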
IV. EXPERIMENTAL RESULTS
The proposed ADDLL has been fabricated in TSMC 90 nm CMOS technology and occupies an active silicon area of 0.0318 mm². The chip micrograph and the area breakdown are shown in Fig. 17, including the PTDU, DPS, DPM, and two RDLs in the open-loop control logic. The operating range of the ADDLL is from 6.7 MHz to 1.24 GHz. Fig. 18 shows the output signal waveform while the ADDLL is locked. CKDLL is aligned with CKREF at both 1.24 GHz and 6.7 MHz. Because the delays associated with the rising and falling edges of CKREF are locked independently, the output duty cycle should be the same as the input duty cycle. The measured root-mean-square jitters are 424.64 fs and 6.92 ps at 1.24 GHz and 6.7 MHz, respectively, as shown in Fig. 19. In addition, the
Fig. 19. Jitter histogram of the ADDLL. (a) 6.7 MHz. (b) 1.24 GHz.
Fig. 20. Measured locking time of the ADDLL. (a) 10 MHz. (b) 525 MHz.
measured peak-to-peak jitters are 2.22 ps at 1.24 GHz and 40 ps at 6.7 MHz. Of the 40 ps peak-to-peak jitter at 6.7 MHz, 36 ps comes from the signal generator. The measured locking time is five cycles regardless of the operating frequency, as shown in Fig. 20. The PTDU generates T_H and T_L with coarse delay times in the first two cycles. Then, the control logic generates two outputs, CKPSR and CKPMR, through the two RDLs in the third cycle. The control logic uses the fourth cycle to define the delay in the DPS and the fifth cycle to define the delay in the DPM. The locking procedure is then complete, and the ADDLL provides an output clock synchronized with the input clock by the sixth cycle. The ADDLL consumes 14 mW at 1.24 GHz with a 1.2 V supply voltage.
Compared to previous wide-range DLLs, the proposed ADDLL achieves the widest frequency range while maintaining low power consumption and small silicon area. In Fig. 21(a), the frequency range is defined as the ratio of the highest supported frequency to the lowest supported frequency, which is related to the delay line length. In addition, we normalize the area to the technology node in order to compare the area fairly. Since the power is related to the operating range, which is proportional to the number of delay cells, the power consumption should
also be normalized to the supply voltage and operating range, as shown in Fig. 21(b). Furthermore, frequency range, power consumption, and area are all correlated in wide-range DLLs. A wider operating frequency range requires a larger area, which leads to higher power consumption. We define figures of merit (FOMs) to compare the wide-range DLL performance in area and power independently. The power and area FOMs are defined as follows:
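$$\mathrm{FOM}_{\mathrm{Power}} = \frac{\mathrm{Power}}{\mathrm{Supply}^{2} \cdot \dfrac{f_{\mathrm{Max}}}{f_{\mathrm{Min}}}} \tag{20}$$

$$\mathrm{FOM}_{\mathrm{Area}} = \frac{\mathrm{Area}}{L^{2} \cdot \dfrac{f_{\mathrm{Max}}}{f_{\mathrm{Min}}}}. \tag{21}$$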
However, to evaluate the overall performance of a wide-range DLL, the overall FOM should be
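$$\mathrm{FOM}_{\mathrm{DLL}} = \frac{\mathrm{Power} \cdot \mathrm{Area}}{\mathrm{Supply}^{2} \cdot L^{2} \cdot \dfrac{f_{\mathrm{Max}}}{f_{\mathrm{Min}}}}. \tag{22}$$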
The definition of the FOM is based on normalized power, normalized area, and frequency range. Table I summarizes and
Fig. 21. Performance summary. (a) Operating frequency range. (b) Power consumption.
TABLE I. COMPARISON OF WIDE-RANGE DLLS
compares the performance of the proposed ADDLL with other works. Because of the open-loop scheme, the ADDLL achieves better peak-to-peak and root-mean-square jitter performance. Furthermore, taking frequency range, power consumption, and area into consideration, the proposed ADDLL achieves the best FOMDLL of 0.206.
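As a worked check of the figures of merit in (20)–(22), the sketch below evaluates them for the reported measurements. The units are an assumption, since the text does not state them: power in mW, supply in V, area in mm², and channel length L in µm. With these units and the measured 14 mW at 1.2 V, the overall figure reproduces the quoted 0.206. The function names are hypothetical.

```python
def fom_power(power_mw, supply_v, f_max_hz, f_min_hz):
    """Eq. (20): power normalized to supply squared and frequency ratio."""
    return power_mw / (supply_v ** 2 * (f_max_hz / f_min_hz))


def fom_area(area_mm2, length_um, f_max_hz, f_min_hz):
    """Eq. (21): area normalized to L squared and frequency ratio."""
    return area_mm2 / (length_um ** 2 * (f_max_hz / f_min_hz))


def fom_dll(power_mw, area_mm2, supply_v, length_um, f_max_hz, f_min_hz):
    """Eq. (22): combined power-area figure of merit."""
    return (power_mw * area_mm2) / (
        supply_v ** 2 * length_um ** 2 * (f_max_hz / f_min_hz)
    )


# Reported values: 14 mW, 1.2 V, 0.0318 mm^2, 90 nm, 6.7 MHz to 1.24 GHz.
# The unit choices (mW, V, mm^2, um) are assumptions, as noted above.
overall = fom_dll(14.0, 0.0318, 1.2, 0.09, 1.24e9, 6.7e6)
print(round(overall, 3))  # 0.206
```

Lower values are better for all three figures, since each divides a cost (power, area, or their product) by the frequency ratio that the design covers.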
V. CONCLUSION
In this paper, a wide-range ADDLL is proposed for applications that require fast locking and low cost. The GRO-based delay line replaces conventional long serially linked delay lines and saves area. The GRO-based delay line not only achieves a wide operating frequency range with small area, but it also minimizes
power consumption due to the use of fewer delay cells. Because of the open-loop locking scheme, the ADDLL requires only five cycles to lock. Two cycles are required for the PTDU output phases T_H and T_L to be ready, with one cycle for the phase selector and one cycle for the timing control. After the PTDU is locked, it takes one cycle to determine the DPS delay and another cycle to define the DPM delay in the control unit that utilizes two RDLs. As a result, we obtain a synchronous output clock at the end of the fifth cycle. Because the ADDLL locks to the rising and falling edges independently, the ADDLL is able to recover the input duty cycle at the output clock. Therefore, if the input duty cycle is 50%, we do not need an extra duty cycle correction (DCC) circuit in system applications. In general, the frequency range, power consumption,
and area are correlated in a wide-range DLL. However, the proposed architecture utilizing a GRO-based delay line breaks the relationship between these metrics. With silicon area and power consumption comparable to previous designs, the ADDLL achieves a much wider frequency range. Consequently, the ADDLL has the best reported FOMArea, FOMPower, and overall FOMDLL. Because digital DLLs are highly suited for technology scaling, the proposed wide-range, fast-locking, and low-cost ADDLL architecture enables higher frequencies and a wider operating frequency range, which would be advantageous in new SoC communication systems in advanced technologies.
ACKNOWLEDGMENT
The authors would like to thank the National Chip Implementation Center (CIC), Taiwan, for chip fabrication.
REFERENCES
[1] S. J. Kim et al., “A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726–734, Jun. 2002.
[2] B.-G. Kim, L.-S. Kim, K.-I. Park, Y.-H. Jun, and S.-I. Cho, “A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522–1530, May 2009.
[3] M. Hossain et al., “A 400 MHz–1.6 GHz fast lock, jitter filtering ADDLL based burst mode memory interface,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2013, pp. 244–245.
[4] J.-S. Wang, T.-M. Wang, C.-H. Chen, and T.-C. Liu, “An ultra-low-power fast-lock-in small-jitter all-digital DLL,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 422–607.
[5] R.-J. Yang and S.-I. Liu, “A 2.5 GHz all-digital delay-locked loop in 0.13 µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2338–2347, Nov. 2007.
[6] B. Mesgarzadeh and A. Alvandpour, “A low-power digital DLL-based clock generator in open-loop mode,” IEEE J. Solid-State Circuits, vol. 44, no. 7, pp. 1907–1913, Jul. 2009.
[7] M.-J. E. Lee et al., “Jitter transfer characteristics of delay-locked loops-theories and design techniques,” IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 614–621, Apr. 2003.
[8] J.-S. Wang, C.-Y. Cheng, J.-C. Liu, Y.-C. Liu, and Y.-M. Wang, “A duty-cycle-distortion-tolerant half-delay-line low-power fast-lock-in all-digital delay-locked loop,” IEEE J. Solid-State Circuits, vol. 45, no. 5, pp. 1036–1047, May 2010.
[9] H.-H. Chang and S.-I. Liu, “A wide-range and fast-locking all-digital cycle-controlled delay-locked loop,” IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 661–670, Mar. 2005.
[10] W.-J. Yun, H. W. Lee, D. Shin, S. D. Kang, J. Y. Yang, and H. O. Lee, “A 0.1-to-1.5 GHz 4.2 mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66 nm CMOS technology,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2008, pp. 282–613.
[11] R.-J. Yang and S.-I. Liu, “A 40–550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm,” IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 361–373, Feb. 2007.
[12] D. Shin, J. Song, H. Chae, and C. Kim, “A 7 ps jitter 0.053 mm² fast lock all-digital DLL with a wide range and high resolution DCC,” IEEE J. Solid-State Circuits, vol. 44, no. 9, pp. 2437–2451, Sep. 2009.
[13] M.-H. Hsieh, L.-H. Chen, S.-I. Liu, and C.-P. Chen, “A 6.7 MHz-to-1.24 GHz 0.0318 mm² fast-locking all-digital DLL in 90 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2012, pp. 244–246.
[14] A. Elshazly, A. Balankutty, Y.-Y. Huang, K. Yu, and F. O’Mahony, “A 2 GHz-to-7.5 GHz quadrature clock-generator using digital delay locked loops for multi-standard I/Os in 14 nm CMOS,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2014, pp. 1–2.
[15] X. Yu, W. Rhee, Z. Wang, J.-B. Lee, and C. Kim, “A 0.4-to-1.6 GHz low-OSR ΔΣ DLL with self-referenced multiphase generation,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2009, pp. 398–399.
[16] B.-G. Kim and L.-S. Kim, “A 250-MHz–2-GHz wide-range delay-locked loop,” IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1310–1321, Jun. 2005.
Min-Han Hsieh (S’11–M’15) was born in Kaohsiung, Taiwan, in 1984. He received the M.S. and Ph.D. degrees in electrical engineering from the National Taiwan University (NTU), Taipei, Taiwan, in 2009 and 2015, respectively.
From 2013 to 2014, he was a Visiting Scholar at Berkeley Wireless Research Center (BWRC), University of California at Berkeley (UCB), Berkeley, CA, USA, sponsored by the National Science Council (NSC), Taipei, Taiwan. His research interests include domino logic circuits, mixed-signal integrated circuits, and powerline communication systems.
Liang-Hsin Chen was born in Taoyuan, Taiwan, in 1986. He received the B.S. degree in electrical engineering from the National Central University (NCU), Taoyuan, Taiwan, in 2009, and the M.S. degree in electronics engineering from the National Taiwan University (NTU), Taipei, Taiwan, in 2011.
His research interests include mixed-mode integration circuits, high-speed SerDes, and PLLs.
Shen-Iuan Liu (S’88–M’93–SM’03–F’10) was born in Keelung, Taiwan, in 1965. He received the B.S. and Ph.D. degrees in electrical engineering from the National Taiwan University (NTU), Taipei, Taiwan, in 1987 and 1991, respectively.
From 1991 to 1993, he served as a Second Lieutenant with the Chinese Air Force. From 1991 to 1994, he was an Associate Professor with the Department of Electronic Engineering, National Taiwan Institute of Technology, Taipei, Taiwan. He joined the Department of Electrical Engineering, NTU, in 1994, where he has been a Professor since 1998. He has been a Distinguished Professor with NTU since August 2010. He is also the Director of the Graduate Institute of Electronics Engineering with NTU. His research interests include analog and digital integrated circuits and systems.
Dr. Liu has served as a Technical Program Committee Member for ISSCC from 2006 to 2008, IEEE VLSI-DAT from 2008 to 2012, and A-SSCC from 2005 to 2012. He also served as the Technical Program Committee Co-Chair and Chair for A-SSCC 2010 and 2011, respectively. He was an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS from 2006 to 2009 and a Guest Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS Special Issue between December 2008 and November 2012. He was an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS from 2006 to 2007. He was an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS from 2008 to 2009. He was on the Editorial Board of Research Letters in Electronics from 2008 to 2009. He was an Associate Editor for IEICE (The Institute of Electronics, Information and Communication Engineers) Transactions on Electronics from 2008 to 2011. He is an Associate Editor for ETRI Journal, and also an Associate Editor for Journal of Semiconductor Technology and Science, Korea, in 2009. He also joined the Editorial Board for International Scholarly Research Network (ISRN) Electronics in 2011. He is a member of IEICE. He has served as General Chair of the 15th VLSI Design/CAD Symposium, Taiwan, in 2004, and as Program Co-Chair of the Fourth IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Fukuoka, Japan, in 2004. He has served as a Chair of the IEEE SSCS Taipei Chapter, from 2004 to 2008, which achieved the Best Chapter Award in 2009. He was the recipient of the Engineering Paper Award from the Chinese Institute of Engineers in 2003, the Young Professor Teaching Award from MXIC Inc., the Research Achievement Award from NTU, and the Outstanding Research Award from National Science Council in 2004.
Charlie Chung-Ping Chen received the B.S. degree in computer science and information engineering from the National Chiao-Tung University, Hsinchu, Taiwan, in 1990, and the M.S. and Ph.D. degrees in computer science from the University of Texas at Austin, Austin, TX, USA, in 1996 and 1998, respectively.
From 1996 to 1999, he was with Intel Corporation, Strategic CAD Labs, Hillsboro, OR, USA, as a Senior CAD Engineer. Since 1999, he has been an Assistant Professor with the ECE Department, University of Wisconsin, Madison, WI, USA. Since 2003, he has been an Associate Professor with the EE Department, National Taiwan University, Taipei, Taiwan. Currently, he is a Professor with the GIEE, BIO, and EE Department, National Taiwan University. His research interests include EDA and BIO topics such as computer-aided design and microprocessor circuit design with an emphasis on interconnect and circuit optimization, circuit simulation, statistical design, and signal/power/thermal integrity analysis and optimization.
Dr. Chen has served on the Program Committee and/or as an Organizer of DAC, ICCAD, DATE, ISPD, ASPDAC, ISQED, SASIMI, VLSI/CAD Symposium, and ITRS. He received the D2000 Award from Intel Corp. and the National Sciences Foundation Faculty Early Career Development Award (CAREER) from 1999 to 2001, respectively. He also received the 2002 SIGDA/ACM Outstanding Young Faculty Award and the 2002 IBM Peter Schneider Faculty Development Award.