
412 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 51, NO. 2, FEBRUARY 2016

A 6.7 MHz to 1.24 GHz 0.0318 mm² Fast-Locking All-Digital DLL Using Phase-Tracing Delay Unit in 90 nm CMOS

Min-Han Hsieh, Member, IEEE, Liang-Hsin Chen, Shen-Iuan Liu, Fellow, IEEE, and Charlie Chung-Ping Chen

Abstract—In this paper, an all-digital delay-locked loop (ADDLL) with a phase-tracing delay unit (PTDU) is proposed to achieve a wide operating frequency range, low power, and low cost. For the wide-range DLL, the long delay line is replaced by a PTDU that includes two gated ring oscillators (GROs) for generating the wide delay range with a reduced die area. With the dual-loop control scheme in this work, the rising and falling edges of the input clock are tracked independently to ensure that the ADDLL output maintains the duty cycle of the input reference. Furthermore, the ADDLL uses an open-loop scheme to achieve a fast lock time of five clock cycles for all supported input frequencies. The proposed ADDLL has been fabricated in TSMC 90 nm CMOS technology and supports a wide operating frequency range from 6.7 MHz to 1.24 GHz within a small active area of 0.0318 mm². The measured peak-to-peak and root-mean-square jitter at 1.24 GHz are 2.22 ps and 424.62 fs, respectively. The ADDLL consumes 14.5 mW while operating at 1.24 GHz.

Index Terms—All-digital, delay-locked loop (DLL), fast locking, open-loop locking, phase-tracing delay unit (PTDU), wide range.

I. INTRODUCTION

IN RECENT years, advanced deep-submicron CMOS technologies have given IC designers the capability to design high-quality and high-reliability systems-on-chip (SoCs). Delay-locked loops (DLLs) are frequently used to seamlessly connect the custom IP blocks in SoCs, and they are integral to clock generation and synchronization for high-speed data communications. Although many new standards have been proposed for high-speed communications, they must remain compatible with the standards currently in use. For example, in memory systems, where DLLs are commonly used, the maximum I/O clock frequency is 1066 MHz in the latest DDR3 standard, while the minimum I/O clock frequency is 100 MHz in the original DDR standard. As technology keeps moving forward, it is foreseeable that an ultra-wide-range DLL will be required to support all frequencies of all standards. In the past, customized analog DLLs were used for high-speed I/O [1], [2].

Manuscript received January 30, 2015; revised July 18, 2015; accepted October 11, 2015. Date of publication November 03, 2015; date of current version January 29, 2016. This paper was approved by Associate Editor Pavan Kumar Hanumolu.

The authors are with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: scy256@gmail.com; [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2015.2494603

However, analog DLLs face several problems due to CMOS scaling, such as high leakage current, low supply voltage, and increased PVT variations. For highly integrated digital systems to meet their reliability requirements, many IC designers have shifted their focus to digitally assisted and all-digital DLLs (ADDLLs) [3]–[6], forgoing analog DLLs despite their finer timing resolution and resulting low jitter. Moreover, digital DLLs achieve fast locking and low power consumption with simpler designs. There are two locking mechanisms used to realize ADDLLs: closed loop and open loop. The closed-loop scheme is widely adopted in conventional phase-locked loops (PLLs) and DLLs. Closed-loop ADDLLs update the control code every reference clock cycle according to the detected phase error, so the output jitter is not only due to noise but is also a function of the delay resolution. As a result, closed-loop ADDLLs suffer from worse jitter performance due to cycle-to-cycle control code variations. Although some papers discuss loop filter design for reducing output jitter, jitter amplification is still an important problem in closed-loop ADDLLs [7]. On the other hand, because DLLs are only concerned with the time delay applied to an input signal of a given frequency, open-loop ADDLLs can provide better jitter performance. Typically, the peak-to-peak jitter is a few picoseconds and the root-mean-square jitter is at the femtosecond level, because the control code does not toggle once a fixed delay is set. Additionally, open-loop DLLs also achieve fast locking.
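To make the jitter argument concrete, the toy model below contrasts a closed-loop ADDLL whose code moves by one LSB every reference cycle (a bang-bang update is assumed here; the paper does not prescribe a particular loop filter) with an open-loop ADDLL that computes its code once and holds it. Noise is ignored, so any residual delay variation comes purely from code toggling. The function names and the 10 ps step are hypothetical.

```python
def closed_loop_codes(target_ps: float, step_ps: float, cycles: int, code0: int = 0) -> list[int]:
    """Toy bang-bang closed loop: each reference cycle the code moves one LSB
    toward the target, using only the sign of the detected phase error."""
    code = code0
    history = []
    for _ in range(cycles):
        error = target_ps - code * step_ps
        code += 1 if error > 0 else -1
        history.append(code)
    return history


def open_loop_code(target_ps: float, step_ps: float) -> int:
    """Toy open loop: measure once, compute the nearest code, then hold it."""
    return round(target_ps / step_ps)


if __name__ == "__main__":
    step = 10.0  # hypothetical delay resolution (ps)
    hist = closed_loop_codes(123.0, step, cycles=40)
    tail = hist[-20:]  # steady state, after the initial ramp
    print("closed-loop p-p delay variation:", (max(tail) - min(tail)) * step, "ps")
    print("open-loop code:", open_loop_code(123.0, step), "(held, no toggling)")
```

Even without noise, the closed loop dithers between two adjacent codes, so its peak-to-peak delay variation equals one delay step; the open-loop code never changes after it is set.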

To achieve a wide frequency range, some designers recommend integrating two or more DLLs with different operating frequencies, which requires a band selector to choose the appropriate DLL for a given standard. The drawback of this method, however, is its high cost. To minimize components and therefore reduce cost, a more practical solution is to include only one DLL with a wide operating frequency range [8]–[14] to satisfy different standards. In wide-range DLLs, harmonic locking becomes an important issue because of the wide delay range. Moreover, even with only one DLL, the long delay line needed for a wide delay range usually occupies a large area and thus increases cost and power.

In this work, a wide-range, fast-locking, low-cost, and low-power ADDLL is proposed. The ADDLL uses a phase-tracing delay unit (PTDU) to replace the large number of delay cells deployed in a conventional delay line. The PTDU provides a wide delay range using two simple gated ring oscillators (GROs), simultaneously reducing the area cost and power consumption of a wide-range DLL. An open-loop scheme is adopted so that the ADDLL achieves a fast-locking time of five cycles regardless of input frequency. The open-loop scheme also yields better jitter performance, with 2.22 ps peak-to-peak jitter and 424.64 fs root-mean-square jitter at 1.24 GHz. The operating frequency range of the proposed ADDLL is from 6.7 MHz to 1.24 GHz. Because the PTDU tracks the rising edge and falling edge of the reference clock independently, the ADDLL's output maintains the duty cycle of the reference clock. The rest of this paper is organized as follows. The main innovation of the proposed ADDLL and its analysis are described in Section II. The architecture and circuit implementation details are presented in Section III. Measurement results are shown in Section IV. Finally, a brief conclusion is given in Section V.

0018-9200 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

HSIEH et al.: 6.7 MHz TO 1.24 GHz 0.0318 mm² FAST-LOCKING ADDLL 413

Fig. 1. Delay line in delay-locked loops. (a) Traditional serially linked delay line. (b) Proposed GRO-based delay line.

II. INNOVATION AND ANALYSIS OF THE PROPOSED ADDLL

The main idea of the proposed ADDLL is to replace the long chain of serially linked delay cells with two NOR-GROs. The function of the DLL is to generate an output clock CKDLL that is synchronized with the input clock CKREF. In conventional DLLs, the delay line consists of many delay cells that generate delays of up to a full clock period, as shown in Fig. 1(a). Because each delay cell is composed of two inverters, the delay of each cell is 2t, where t is the inverter delay. When N delay cells are used, the longest supported clock period is 2Nt. Therefore, more delay cells are required to support a wider frequency range, and wide-range DLLs incur a high area cost. In the proposed ADDLL, the long delay line is replaced by two NOR-GROs, each with n stages, where n must be odd. One GRO is gated by CKREF while the other is gated by /CKREF, as shown in Fig. 1(b). Because a NOR gate forces its output low whenever its gating input is high, the NOR-GRO gated by CKREF oscillates when CKREF is low, and the NOR-GRO gated by /CKREF oscillates when CKREF is high. As a result, a synchronous output clock CKDLL is generated by sensing the desired edge of each of the two GRO outputs, which represent the delays associated with the rising and falling edges of CKREF. Consequently, the area cost is reduced to only 2n gates. Assuming each gate delay is t, the oscillation period of an n-stage GRO is 2nt with a 50% duty cycle. Therefore, the estimated delay range of the proposed ADDLL is

$$2nt \le T \le 2(2K+1)nt \tag{1}$$

where K represents the number of GRO oscillation cycles, as determined by the maximum value of the pulse counter. To achieve the same operating range, a serially linked delay line requires N = (2K + 1)n delay cells and N_delayline inverter gates, where

$$N_{\text{delayline}} = 2N = 2(2K+1)n. \tag{2}$$

Because each delay cell in the conventional delay line consists of two inverter gates, twice as many gates as delay cells are needed. Thus, the conventional serially linked delay line requires 2K + 1 times as many gates as the GRO-based delay line to achieve the same delay range.
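The gate-count comparison in (1) and (2) can be checked numerically. The sketch below is a minimal illustration rather than the authors' tooling; the function names are hypothetical, and n = 3 and K = 320 are borrowed from the implementation values given in Section III (a GRO period of 6t = 2nt and a maximum pulse count of 320).

```python
def gro_delay_range(n: int, K: int, t: float) -> tuple[float, float]:
    """Delay range of the GRO-based delay line, eq. (1): 2nt <= T <= 2(2K+1)nt."""
    if n % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 2 * n * t, 2 * (2 * K + 1) * n * t


def conventional_gates(n: int, K: int) -> int:
    """Inverters in a serially linked line covering the same range, eq. (2)."""
    N = (2 * K + 1) * n  # two-inverter delay cells
    return 2 * N


def gro_gates(n: int) -> int:
    """Two n-stage GROs, one per clock phase."""
    return 2 * n


if __name__ == "__main__":
    n, K = 3, 320
    print(gro_delay_range(n, K, t=1.0))                  # range in units of t
    print(conventional_gates(n, K), "vs", gro_gates(n))  # gate counts
    print(conventional_gates(n, K) // gro_gates(n))      # ratio, equal to 2K + 1
```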

To make the comparison fairer, the control logic usually used for automatic frequency detection should be included in the analysis. Fig. 2 shows the period detector in the traditional wide-range DLL [12]. An additional N DFFs are used, each triggered by the corresponding delayed clock signal of the traditional serially linked delay line. The reference clock period is then determined from the DFF outputs Q[N:1], which form a thermometer code. The overall cell count of the traditional wide-range DLL is

$$N_{\text{traditional}} = N_{\text{delayline}} + N_{\text{DFF}} = 3N = 3(2K+1)n. \tag{3}$$

Fig. 2. Clock period detector for the traditional wide-range DLLs. (a) Architecture. (b) Waveform diagram.

Fig. 3. GRO pulse counters. (a) Binary counter. (b) Shift-register type. (c) Proposed combinational type.

Another innovation of the proposed ADDLL lies in its pulse counter, which also serves as its period detector. There are three ways to count the number of GRO pulses, as shown in Fig. 3. The first uses a simple binary counter, as shown in Fig. 3(a), which allows maximal power and area savings. However, although the binary counter uses the fewest DFFs (log2 K) and thus saves the most area, it may fail at the high output frequencies of the GROs. The second links DFFs serially into a shift register, as shown in Fig. 3(b). With this method, the speed is limited by the propagation delay, setup time, and hold time of the DFFs. Although a serially linked DFF-based pulse counter meets timing requirements more easily at high GRO frequencies, a long chain of K DFFs must be employed to support a wide frequency range. This would effectively negate the power and area savings that motivate switching to a GRO-based delay line in the first place. Thus, we combine the two methods to address this issue, as shown in Fig. 3(c): x serially linked DFFs handle the high-speed GRO pulses, followed by a y-bit counter that supports a wide frequency range. Therefore, the maximum pulse count K is

$$K = x \cdot 2^{y}. \tag{4}$$

Then, y can be derived as

$$y = \log_2\!\left(\frac{K}{x}\right). \tag{5}$$

Hence, only x + y DFFs are required:

$$x + y = x + \log_2\!\left(\frac{K}{x}\right). \tag{6}$$
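The combinational pulse counter can be modeled behaviorally: an x-stage shift register absorbs the fast GRO pulses, and a y-bit counter advances once per x pulses, so the pulse count is recovered as counter · x + stage. The sketch below also compares DFF counts for the three options in Fig. 3. It is an idealized model under stated assumptions (no setup/hold violations, the counter wraps modulo 2^y); the function names are hypothetical, and x = 5 and K = 320 are taken from Section III.

```python
import math


def dff_counts(K: int, x: int) -> dict[str, int]:
    """DFFs needed to count up to K GRO pulses with each counter style (Fig. 3)."""
    return {
        "binary": math.ceil(math.log2(K)),               # Fig. 3(a)
        "shift_register": K,                             # Fig. 3(b)
        "combined": x + math.ceil(math.log2(K / x)),     # Fig. 3(c), eq. (6)
    }


def count_pulses(pulses: int, x: int, y: int) -> tuple[int, int]:
    """Behavioral model of Fig. 3(c): returns (counter value, shift-register stage)."""
    stage, counter = 0, 0
    for _ in range(pulses):
        stage += 1
        if stage == x:                      # the shift register wraps ...
            stage = 0
            counter = (counter + 1) % 2**y  # ... and clocks the slow y-bit counter
    return counter, stage


if __name__ == "__main__":
    x, y = 5, 6
    print("K =", x * 2**y)                  # eq. (4)
    print(dff_counts(K=x * 2**y, x=x))
    c, s = count_pulses(123, x, y)
    print("recovered count:", c * x + s)
```

Only the first x DFFs toggle at the GRO frequency; the counter sees one clock per x pulses, which is what relaxes its speed requirement.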

Note that, because two GROs are used, twice the number of DFFs given by (6) is actually required. As a result, the number of DFFs in the pulse counter grows only logarithmically with K. The proposed architecture is only worthwhile when the following condition holds:

$$N_{\text{traditional}} > 2(n + x + y). \tag{7}$$

From (3), the condition can be rewritten as

$$3(2K+1)n > 2\left[n + x + \log_2\!\left(\frac{K}{x}\right)\right]. \tag{8}$$

Generally, K is related to the DLL operating frequency range; n, the number of gates in a GRO, is related to the oscillation frequency; and x, the shift-register length, should be determined by the speed requirement of the binary counter used in pulse counting. Proper values of K, n, and x can thus be selected to achieve a wide frequency range with area savings. For the same number of cells, 2[n + x + log2(K/x)], the conventional serially linked delay line, whose total cell count is 3N from (3), contains N = (2/3)[n + x + log2(K/x)] delay cells and therefore provides a maximum delay of 2Nt:

$$T_{A,\text{bound}} = \frac{4}{3}\left[n + x + \log_2\!\left(\frac{K}{x}\right)\right]t. \tag{9}$$


Fig. 4. Comparison of the traditional serially linked delay line and the proposed GRO-based delay line (with automatic period detector). (a) Delay range versus cell number. (b) Operating frequency versus power consumption.

Equation (9) shows that if the longest required clock period is larger than T_A,bound, the GRO-based delay line, with its constant 2[n + x + log2(K/x)] cells, achieves a smaller area. Fig. 4(a) compares the cost of a conventional delay line and the GRO-based delay line for selected n, x, and K values. As the required delay increases, the number of cells required by a conventional delay line increases proportionally. In contrast, the GRO-based delay line requires a constant number of cells, 2[n + x + log2(K/x)], to support the wide frequency range. However, since true single-phase clock (TSPC) logic is adopted for the DFFs in this work, the area of a DFF is about four times that of an inverter. To save silicon area, the condition expressed in (8) should therefore be rewritten as

$$6(2K+1)n > 2\left[n + 4x + 4\log_2\!\left(\frac{K}{x}\right)\right]. \tag{10}$$

In terms of power consumption, the GRO-based delay line also outperforms the serially linked delay line. In a traditional wide-range DLL, all the delay cells in the long, serially linked delay line switch at the reference frequency. Because power consumption is proportional to the operating frequency, a higher operating frequency leads to higher power consumption. The number of cells in the delay line also affects the system power consumption, and every cell in the delay line must be accounted for when calculating the power of the system. In a traditional wide-range DLL, many inverters are used to enable a wide delay range. When operating with a clock period of 2mt, although only m buffers are required to produce enough delay, the power of all N buffers must be included. As a result, the total power consumption of the 2N gates is

$$P_{\text{delayline}} = CV^2\,\frac{1}{t}\left(\frac{N}{m}\right). \tag{11}$$

The power consumption is proportional to N/m. Regardless of the operating frequency, all delay cells switch. Therefore, lower power consumption is obtained at lower frequencies, where the number of useful delay cells m approaches the total number of delay cells N. On the other hand, at higher frequencies, where fewer buffers are useful, a much higher power consumption of N/m times CV²(1/t) is incurred. The minimum power is CV²(1/t), reached at the lowest operating frequency, where m equals N. Since each DFF in the period detector is triggered once every reference clock period 2mt and the power of a TSPC DFF is roughly four times that of an inverter, the power consumption of the DFFs is

$$P_{\text{DFF}} = CV^2\,\frac{1}{t}\left(\frac{2N}{m}\right). \tag{12}$$

As a result, the overall power consumption of the traditional DLL is

$$P_{\text{total,traditional}} = P_{\text{delayline}} + P_{\text{DFF}} = CV^2\,\frac{1}{t}\left(\frac{3N}{m}\right). \tag{13}$$

In contrast, for the proposed GRO-based delay line, because the two GROs oscillate with a period of 2nt and each GRO oscillates during only half of the input reference cycle, the power of the two GROs is

$$P_{\text{GRO}} = CV^2\,\frac{1}{t}\cdot\frac{1}{2}. \tag{14}$$

In the pulse counter, the shift register is still triggered at the GRO oscillation frequency of 1/(2nt), while the switching frequency of the binary counter is slowed down to 1/(2ntx) by the shift register. Thus, with the DFF power taken as four times that of an inverter, the power consumption of the pulse counter is

$$P_{\text{PC}} = CV^2\,\frac{1}{t}\cdot\frac{2}{nx}\left[x^2 + \log_2\!\left(\frac{K}{x}\right)\right]. \tag{15}$$

Consequently, the total power is

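$$\begin{aligned}
P_{\text{total,proposed}} &= P_{\text{GRO}} + P_{\text{PC}} \\
&= CV^2\,\frac{1}{t}\cdot\frac{1}{2} + CV^2\,\frac{1}{t}\cdot\frac{2}{nx}\left[x^2 + \log_2\!\left(\frac{K}{x}\right)\right] \\
&= CV^2\,\frac{1}{t}\left\{\frac{1}{2} + \frac{2}{nx}\left[x^2 + \log_2\!\left(\frac{K}{x}\right)\right]\right\}.
\end{aligned} \tag{16}$$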
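Equations (11)–(16) can be evaluated side by side. The sketch below expresses all powers in units of CV²/t, as the text does; the function names are hypothetical, and the parameters (n = 3, x = 5, K = 320, N = (2K + 1)n) are illustrative choices that give both delay lines the same range according to (1) and (2).

```python
import math


def p_traditional(N: int, m: int) -> float:
    """Eq. (13): delay line (11) plus period-detector DFFs (12), in units of CV^2/t."""
    p_delayline = N / m          # 2N gates toggling once per period 2mt
    p_dff = 2 * N / m            # N TSPC DFFs, each about 4x an inverter
    return p_delayline + p_dff


def p_proposed(n: int, x: int, K: int) -> float:
    """Eq. (16): two half-duty GROs (14) plus the pulse counter (15), units of CV^2/t."""
    p_gro = 0.5
    p_pc = 2 / (n * x) * (x**2 + math.log2(K / x))
    return p_gro + p_pc


if __name__ == "__main__":
    n, x, K = 3, 5, 320
    N = (2 * K + 1) * n
    for m in (N, N // 4, N // 16):   # m = T / (2t): smaller m means a shorter period
        print(f"m={m:5d}  traditional={p_traditional(N, m):7.3f}  "
              f"proposed={p_proposed(n, x, K):7.3f}")
```

The traditional figure scales as 3N/m and so grows as the clock period shrinks, whereas the proposed figure has no dependence on m at all.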


Fig. 5. Proposed ADDLL. (a) Architecture. (b) Waveform diagram.

Since the proposed GRO-based delay line consumes constant power regardless of the DLL input frequency, it consumes less power than conventional delay lines at higher operating frequencies. For a serially linked delay line with N buffers consuming the same power, the operating clock period T_P,bound is

$$T_{P,\text{bound}} = 2m_{P,\text{bound}}\,t = \frac{3N}{\dfrac{1}{4} + \dfrac{1}{nx}\left[x^2 + \log_2\!\left(\dfrac{K}{x}\right)\right]}\,t. \tag{17}$$

Therefore, when the operating clock period is smaller than T_P,bound, the GRO-based delay line saves power over the conventional delay line by drawing a constant power of P_total,proposed. Based on (13) and (16), this condition can be written as

$$\frac{3N}{m} > \frac{1}{2} + \frac{2}{nx}\left[x^2 + \log_2\!\left(\frac{K}{x}\right)\right]. \tag{18}$$
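The crossover period (17) and condition (18) can be cross-checked: at T = T_P,bound, i.e., m = T_P,bound/(2t), the two sides of (18) should be equal. The sketch below verifies this and tests (18) at a short period and at the longest period the line supports. Names are hypothetical; power is in units of CV²/t, time in units of t, and n = 3, x = 5, K = 320, N = (2K + 1)n are illustrative parameters.

```python
import math


def p_proposed(n: int, x: int, K: int) -> float:
    """Right-hand side of eq. (18), i.e., eq. (16) in units of CV^2/t."""
    return 0.5 + 2 / (n * x) * (x**2 + math.log2(K / x))


def t_p_bound(N: int, n: int, x: int, K: int, t: float = 1.0) -> float:
    """Eq. (17): clock period at which a conventional line matches the GRO power."""
    return 3 * N / (0.25 + (x**2 + math.log2(K / x)) / (n * x)) * t


def gro_saves_power(N: int, m: float, n: int, x: int, K: int) -> bool:
    """Eq. (18): True when the conventional line (3N/m) draws more power."""
    return 3 * N / m > p_proposed(n, x, K)


if __name__ == "__main__":
    n, x, K = 3, 5, 320
    N = (2 * K + 1) * n                  # same range as the GRO line, eq. (2)
    T_bound = t_p_bound(N, n, x, K)
    m_bound = T_bound / 2                # T = 2mt with t = 1
    print(f"T_P,bound = {T_bound:.1f} t")
    print("sides of (18) equal at the bound:",
          math.isclose(3 * N / m_bound, p_proposed(n, x, K)))
    print("m = 100 (short period):", gro_saves_power(N, 100, n, x, K))
    print("m = N (longest period):", gro_saves_power(N, N, n, x, K))
```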

Since N, m, and K are related to the operating frequency range, the proposed architecture provides power and area advantages in supporting ultra-wide frequency ranges when the shift-register length x and the GRO gate number n are chosen to meet the conditions in (10) and (18). Fig. 4(b) shows a rough power analysis: the proposed ADDLL consumes constant power, independent of the operating frequency. DLLs are usually used to compensate for the phase difference between the original input clock and the target clock. Because the GROs track the rising and falling edges of the target clock independently, an output clock in phase with the target clock can be created from the original input clock by combining the tracked edges. Furthermore, the mismatch between the two GROs caused by PVT variations does not need to be calibrated, because the two GROs operate fully independently and are uncorrelated.

III. ARCHITECTURE AND CIRCUIT IMPLEMENTATION OF THE PROPOSED ADDLL

Fig. 5(a) shows the architecture of the proposed ADDLL. The ADDLL consists of the PTDU, a digital phase selector (DPS), a digital phase mixer (DPM), an edge combiner (EC), and a control unit. In the PTDU, two GROs convert the input clock CKREF into two pulse signals, T_L and T_H, which carry most of the delay to the rising and falling edges of CKREF, respectively. The PTDU serves as the frequency detector and coarse-tuning block of the wide-range DLL. Because of the coarse timing resolution of the PTDU, the DPS and DPM follow it as fine-tuning blocks to improve the timing resolution. However, the delays of the DPS and DPM must be set correctly to minimize the output phase error. Therefore, the control unit, which includes two replica delay lines (RDLs) and code generators, generates the control codes DH/L and FH/L that set the delays of the DPS and DPM. Furthermore, an open-loop locking scheme is adopted for fast locking. Finally, because L and H are pulse signals representing the delays of the rising and falling edges of CKREF, respectively, an EC is required to recover an output clock with the duty cycle of CKREF.

Fig. 5(b) shows the waveform diagram of the ADDLL. In this work, a synchronous output clock CKDLL is generated by combining two phases, H and L, which represent the delays of the falling edge and the rising edge of CKREF, respectively. H and L are generated by the PTDU, DPS, and DPM: the PTDU provides most of the (coarse) signal delay, and the DPS and DPM finely set the delay to minimize the output phase error. In the PTDU, two NOR-GROs are used in lieu of a long chain of delay cells to generate T_H and T_L. One NOR-GRO is gated by /CKREF and oscillates when CKREF is high; the other is gated by CKREF and oscillates when CKREF is low. To achieve automatic locking, the PTDU also serves as the frequency detector of the proposed wide-range ADDLL. The operating frequency is detected through the counted GRO pulse number j, which therefore takes different values at different operating frequencies. By sensing an appropriate NOR-GRO rising edge prior to the rising and falling edges of CKREF, T_L and T_H are generated to track the rising and falling edges of CKREF, respectively. In this work, the third-from-last rising edge of the NOR-GRO [edge j − 2 in Fig. 5(b)] is selected to leave enough time for the subsequent DPS, DPM, and EC blocks to operate correctly. The PTDU requires a smaller area and achieves a wider operating frequency range without a long delay line. As shown in Fig. 5(b), assuming each gate delay is t, the oscillation period of the GRO is 6t with a 50% duty cycle. Consequently, the estimated ADDLL delay range is

$$21t \le T \le \left[6(K+1) + 3\right]t. \tag{19}$$

Because the maximum pulse count K is 320 in this work, the pulse counter consists of a five-stage shift register and a 6-bit binary counter, which satisfies (4) with x = 5 and y = 6 (5 · 2^6 = 320).
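As a quick consistency check, the sketch below evaluates the delay range (19) in units of t and sizes the pulse counter from (4) as the smallest y with x · 2^y ≥ K. The function names are hypothetical; the numbers are those stated in the text (K = 320, a five-stage shift register, and a GRO period of 6t, which corresponds to n = 3 stages since the period is 2nt).

```python
import math


def addll_delay_range(K: int, t: float = 1.0) -> tuple[float, float]:
    """Eq. (19) for the implemented 3-stage NOR-GRO (period 6t)."""
    return 21 * t, (6 * (K + 1) + 3) * t


def counter_bits(K: int, x: int) -> int:
    """Smallest binary-counter width y such that x * 2**y >= K, cf. eq. (4)."""
    return math.ceil(math.log2(K / x))


if __name__ == "__main__":
    K, x = 320, 5
    y = counter_bits(K, x)
    print("y =", y, "bits; capacity x * 2**y =", x * 2**y)
    print("delay range in units of t:", addll_delay_range(K))
```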

A. Phase-Tracing Delay Unit

To track the falling edge and rising edge of CKREF independently, the PTDU is divided into two parts, as shown in Fig. 6(a): the H-part for falling-edge tracing (with output signal T_H) and the L-part for rising-edge tracing (with output signal T_L). As described later, to compensate for the intrinsic delay of the various ADDLL blocks, the third-from-last rising edge [edge j − 2 in Fig. 5(b)] of the falling-edge/rising-edge tracing GRO is output as T_H/T_L. The H-part and L-part each consist of a cyclic pulse generator (CPG), a path selector, a timing controller, a MUX, and a NOR-GRO, which is gated by CKREF in the L-part and by /CKREF in the H-part. The CPG and the counter in the timing controller together form the pulse counter. In the CPG, Q1 to Q5 are generated by serially linked DFFs that are triggered in sequence while the GRO oscillates. Because these DFFs are reset by Q5 or by the input reference clock, each of the signals Q1 to Q5 may carry multiple pulses in a given clock cycle.

Two steps are required to select the rising edge among Q1 to Q5 that corresponds to the (j − 2)th rising edge of the GRO. The first step, performed by the path selector, selects the correct QN. Since the selected QN may contain many pulses, the second step selects the last pulse of QN, which corresponds to the third-from-last rising edge (edge j − 2) of the GRO, as the tracing signal. When the GRO stops oscillating at the end of each input cycle, the registers in the path selector capture the values of Q1 to Q5 to generate C[5:1]. Since Q1 to Q5 are triggered in sequence, C[5:1] is a thermometer code, and the XOR gates convert it into S[5:1], which has only one nonzero bit. As a result, S[5:1] determines which QN should be selected. Because of input clock jitter, DFF setup- and hold-time violations may introduce bubble codes into S[5:1]. Therefore, a finite-state machine (FSM) is needed to generate a stable control code Sel[5:1]. As shown in Fig. 6(b), the FSM decides whether to refresh Sel[5:1] by comparing the current Sel[5:1] with the renewed code S[5:1]: if Sel[i] is the asserted bit, Sel[5:1] keeps its value when S[i − 1], S[i], or S[i + 1] equals Sel[i]; otherwise, Sel[5:1] is refreshed to S[5:1].

If QN has multiple rising edges within a clock period, only the last edge should be selected as the tracing signal. Therefore, a timing controller determines when to output the selected QN. The timing controller consists of a counter triggered by the rising edge of Q5 and reset by the input clock, a register storing the current count, a subtractor, and a comparator. Because Q5 may contain several pulses, the counter in the timing controller tracks the current pulse number and stores it in the register. When Q1 or Q2 is selected by the path selector (i.e., Sel[3] or Sel[4] is 1 while Sel[1], Sel[2], and Sel[5] are 0), the subtrahend of the subtractor is zero; otherwise, it is 1. Therefore, when the counter value equals the subtractor output, the selected signal QN is passed to the output. Fig. 7(a) shows the waveform diagram of how Q2 is selected, and Fig. 7(b) shows how Q5 is selected in the L-part.
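The path-selector decoding and the FSM's bubble suppression can be sketched at the bit level. In the sketch below, list index 0 holds bit 1 (e.g., c[0] is C[1]). The XOR decoding and the subtrahend rule follow the text; the hold rule reads the FSM description as "keep Sel when S is asserted at the currently selected position or an adjacent one," with no wrap-around between bits 1 and 5. That reading and all function names are assumptions, not the authors' netlist.

```python
def thermometer_to_onehot(c: list[int]) -> list[int]:
    """Path selector: S[k] = C[k] XOR C[k+1] (C beyond the top bit taken as 0)."""
    padded = c + [0]
    return [padded[k] ^ padded[k + 1] for k in range(len(c))]


def fsm_update(sel: list[int], s: list[int]) -> list[int]:
    """Hold Sel if S is asserted at, or next to, the current selection; else refresh.
    Assumption: no wrap-around between the first and last positions."""
    active = [i for i, bit in enumerate(sel) if bit]
    if len(active) == 1:
        i = active[0]
        near = [j for j in (i - 1, i, i + 1) if 0 <= j < len(s)]
        if any(s[j] for j in near):
            return list(sel)               # small disturbance: keep the old code
    return list(s)                         # refresh to the renewed code


def subtrahend(sel: list[int]) -> int:
    """Timing controller: 0 when Sel[3] or Sel[4] is set (Q1 or Q2 selected), else 1."""
    return 0 if sel[2] or sel[3] else 1


if __name__ == "__main__":
    print(thermometer_to_onehot([1, 1, 1, 0, 0]))         # C[1..3] high
    print(thermometer_to_onehot([1, 0, 1, 0, 0]))         # a bubble gives several 1s
    print(fsm_update([0, 0, 1, 0, 0], [0, 0, 0, 1, 0]))   # neighbour: hold
    print(fsm_update([0, 0, 1, 0, 0], [1, 0, 0, 0, 0]))   # far away: refresh
    print(subtrahend([0, 0, 1, 0, 0]), subtrahend([0, 0, 0, 0, 1]))
```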

B. Digital Phase Selector, Digital Phase Mixer, and EdgeCombiner

After the PTDU provides the two tracing phases T_H and T_L with coarse delay times, the DPS and DPM [9] finely tune the delays of T_H and T_L independently, producing the signals H and L that are used to recover the output clock CKDLL via the EC. In the DPS, two adjacent phases of a short delay line are selected by MUXs according to the control code DH/L[5:0], as shown in Fig. 8(a). The timing resolution of the DPS is limited by the buffer delay τ, which is typically around 40 ps in 90 nm CMOS technology. Therefore, without the DPM, the maximum output phase error would be 40 ps because of the open-loop locking method applied in this work. To reduce the phase error, a DPM is used to improve the timing resolution. After two adjacent phases are selected by the DPS, the DPM interpolates between them for better timing precision by controlling the current distribution to the delay cells for each phase input. As shown in Fig. 8(b), the two selected phases PH/L<i> and PH/L<i+1> are fed into the DPM, which is composed of two independent delay cells with a shared output. Each delay cell is constructed of eight gated inverter cells controlled by the thermometer code TH/L[6:0] and one dummy cell. The dummy cell for PH/L<i> is always "ON" and the dummy cell for PH/L<i+1> is always "OFF." Since the number of "ON" inverters in the DPM is constant, we can interpolate the phases PH/L<i> and PH/L<i+1> by distributing the "ON" inverters between the two DPM delay cells according to TH/L[6:0]. Because there are eight "ON" inverters to distribute between the two delay cells, the timing resolution improves to τ/8, which is around 4–6 ps depending on the timing resolution of the DPS. The DPS and DPM thus finely define the delays of the phases H and L, given the tracing phases T_H and T_L. Fig. 9(a) shows the simulated overall delay of the DPS and DPM versus the corresponding control codes DH/L[5:0] and TH/L[6:0]. The delay of the DPS and DPM increases monotonically with the control codes, so the ADDLL has no false-locking problem. Fig. 9(b) shows the simulated INL and DNL of the DPS and DPM delay. The maximum delay difference between two adjacent codes (DNL) is 7 ps, which also represents the maximum output phase error.

Fig. 6. Proposed PTDU. (a) PTDU architecture and circuit diagram of L-part. (b) FSM circuit diagram.
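To illustrate how the DPS and DPM combine, the following sketch models an idealized delay: the DPS contributes an integer number of buffer delays τ, and the DPM moves the output edge linearly between P<i> and P<i+1> in eight steps. Perfect linear interpolation, six DPS taps, and τ = 40 ps are assumptions for illustration only; a real DPM is not perfectly linear (the simulated DNL above is 7 ps, versus an ideal τ/8 step).

```python
from typing import List

TAU_PS = 40.0   # DPS buffer delay (~40 ps in 90 nm CMOS, per the text)
DPM_STEPS = 8   # eight "ON" inverters shared between the two DPM delay cells


def dps_dpm_delay(tap: int, n_on_next: int, tau: float = TAU_PS) -> float:
    """Idealized added delay (ps) above the intrinsic minimum.

    tap       -- DPS tap index i, selecting the pair (P<i>, P<i+1>)
    n_on_next -- number of "ON" inverters moved to the P<i+1> cell (0..7)
    """
    if not 0 <= n_on_next < DPM_STEPS:
        raise ValueError("n_on_next must be in [0, 7]")
    weight = n_on_next / DPM_STEPS        # share of drive given to P<i+1>
    return tap * tau + weight * tau       # linear interpolation (assumption)


def delay_table(n_taps: int = 6, tau: float = TAU_PS) -> List[float]:
    """Delays for every (tap, DPM code) pair, ordered by the combined code."""
    return [dps_dpm_delay(i, k, tau) for i in range(n_taps) for k in range(DPM_STEPS)]


if __name__ == "__main__":
    table = delay_table()
    steps = [b - a for a, b in zip(table, table[1:])]
    print(f"LSB = {TAU_PS / DPM_STEPS} ps, largest step = {max(steps)} ps")
    print("monotonic:", all(s > 0 for s in steps))
```

In this ideal model every step equals τ/8, so the combined code is monotonic by construction; the measured-style DNL of a real interpolator would show up as unequal steps.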


HSIEH et al.: 6.7 MHz TO 1.24 GHz 0.0318 mm2 FAST-LOCKING ADDLL 419

Fig. 7. Waveform diagram of L-part in PTDU. (a) Q2 is selected. (b) Q5 is selected.

Fig. 8. (a) DPS. (b) DPM.

Finally, an EC is used to reconstruct the output waveform; Fig. 10 shows its circuit diagram. In the EC, when the H signal is high, the EC discharges the output CKDLL to low. Conversely, the output CKDLL of the EC is driven high when the L signal is high. Because H and L are pulse signals, they are never high at the same time. If both H and L are low, the latch holds CKDLL at its previous level. The timing diagram associated with the DPS and DPM is shown in Fig. 11, where the rising and falling edge timings of the reference clock are represented by the signals L and H, respectively.

Fig. 9. Simulated delay of the DPS and DPM. (a) Delay versus control code. (b) INL and DNL.

Fig. 10. Edge combiner.
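The EC's set, reset, and hold behavior can be captured with a sampled model. In this sketch (Python; the function name and the sample-based time axis are illustrative assumptions), an L pulse marks each reference rising edge and an H pulse each falling edge, so the output reproduces the input duty cycle.

```python
from typing import List


def edge_combiner(h_sig: List[int], l_sig: List[int], initial: int = 0) -> List[int]:
    """Sampled EC model: H high -> CKDLL low, L high -> CKDLL high,
    both low -> the latch holds the previous level."""
    if len(h_sig) != len(l_sig):
        raise ValueError("H and L must have the same length")
    level, out = initial, []
    for hv, lv in zip(h_sig, l_sig):
        if hv and lv:
            raise ValueError("H and L pulses must not overlap")
        if hv:
            level = 0          # H pulse discharges CKDLL
        elif lv:
            level = 1          # L pulse pulls CKDLL high
        out.append(level)      # neither pulse: the latch holds the level
    return out


if __name__ == "__main__":
    period, n = 10, 40
    l_pulse = [1 if k % period == 0 else 0 for k in range(n)]  # rising-edge pulses
    h_pulse = [1 if k % period == 3 else 0 for k in range(n)]  # falling-edge pulses
    ck = edge_combiner(h_pulse, l_pulse)
    print("duty cycle:", sum(ck) / len(ck))
```

With a 10-sample period, an L pulse at sample 0 and an H pulse at sample 3, the output is high for 3 of every 10 samples, i.e., a 30% duty cycle that matches the assumed input.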

In summary, after the H-part and L-part of the PTDU generate the tracing phases T_H and T_L with coarse, imprecise delays, the DPS and DPM finely tune the delays from T_H and T_L to H and L with better timing resolution to minimize the output phase error. The DPS compensates for the output phase error with a resolution of τ, and the DPM compensates for it with a resolution of τ/8. Finally, the EC recovers the output waveform from H and L while maintaining the input clock's duty cycle. However, the DPS, DPM, and EC all possess intrinsic minimum delays, as shown in Fig. 11. To compensate for these delays, we have the PTDU select the third-from-last rising edge (j − 2 in Fig. 5) of each GRO output. To ensure that the ADDLL locks, the tuning range of the DPS and DPM should be at least twice the PTDU timing resolution, because bubble codes may occur in the PTDU; i.e., 6τ should not be less than 2(6t). Since each buffer in the DPS includes two inverters, we make τ equal to 2t by using the same transistor sizes. Thus, 6τ equals 2(6t).
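The lock condition stated above can be written out explicitly. Here 6τ is the DPS and DPM tuning range, 6t is the PTDU timing resolution, and t is the inverter delay, as in the text:

$$
6\tau \;\ge\; 2\,(6t), \qquad \tau = 2t \;\Longrightarrow\; 6\tau = 6\,(2t) = 12t = 2\,(6t).
$$

The condition is therefore met with equality: the design provides exactly the required factor of two, without additional margin.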

C. Control Unit

Since the PTDU automatically provides the two proper phases T_H and T_L, a control unit is required to define the delays of the DPS and DPM that minimize the phase error. The control unit includes two RDLs, as shown in Fig. 12(a). The first RDL provides the minimum delay from the DPS to the EC, with an output clock CKPSR. Because both the DPS and DPM in the first RDL are set to minimum delay, the delay of the DPS in the main delay line can be defined by quantizing the phase difference between CKPSR and CKREF with a timing resolution of τ. In the second RDL, an output clock CKPMR is generated with minimum delay from the DPM to the EC. Therefore, the delay of the DPM in the main delay line can be determined by quantizing the phase difference between CKPMR and CKREF with a timing resolution of τ/8. Fig. 12(b) shows the timing diagram of the control unit.
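The two-RDL scheme effectively splits a required delay into a coarse part at resolution τ and a fine part at resolution τ/8. The sketch below shows only that arithmetic; τ = 40 ps and the function name are assumptions, and the DPS range limit and the intrinsic minimum delays are ignored.

```python
from typing import Tuple


def coarse_fine_codes(delta_ps: float, tau_ps: float = 40.0) -> Tuple[int, int]:
    """Split a required delay into (DPS steps of tau, DPM steps of tau/8)."""
    if delta_ps < 0:
        raise ValueError("delta_ps must be non-negative")
    coarse = int(delta_ps // tau_ps)        # first RDL: resolution tau
    residual = delta_ps - coarse * tau_ps   # what is left after the DPS setting
    fine = int(residual // (tau_ps / 8))    # second RDL: resolution tau/8
    return coarse, fine


if __name__ == "__main__":
    c, f = coarse_fine_codes(97.0)
    err = 97.0 - (c * 40.0 + f * 5.0)
    print(f"coarse={c}, fine={f}, residual error={err} ps")
```

For a 97 ps requirement this gives two τ steps (80 ps) plus three τ/8 steps (15 ps), leaving a 2 ps error below the 5 ps fine resolution.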

The decision code DH/L[5:0] that defines the DPS delay is generated by the DPS code generator in the first RDL, which is actually a delay-line time-to-digital converter (TDC), as shown in Fig. 13(a). Note that the delay lines of the TDC and the DPS should be matched, so each delay cell should have a delay of τ. Thus, the DPS code generator quantizes the phase difference between CKPSR and CKREF with a timing resolution of τ. As a result, we can specify the delay required of the DPS in the main delay line to compensate for the output phase error with a timing resolution of τ. An encoder is needed to convert the thermometer code QH/L[6:1], triggered chronologically by the DFFs, into the binary code DH/L[5:0]. However, as in the PTDU, an FSM is also required in the DPS code generator, because the decisions may suffer from bubble codes caused by DFF setup/hold time violations. After the delay of the DPS in the main delay line is defined, the second RDL, with a DPM code generator, creates CKPMR with the minimum delay in the DPM. The timing diagram is shown in Fig. 13(b).
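A delay-line TDC of the kind described above can be modeled as a row of flip-flops that record whether the CKPSR edge, delayed by k·τ, has already arrived when CKREF samples it. Note that this sketch removes bubbles by counting ones, a common alternative encoder and not the FSM the text describes; the six-stage line, τ = 40 ps, and the function names are assumptions.

```python
from typing import List


def tdc_thermometer(delta_ps: float, tau_ps: float = 40.0, n_stages: int = 6) -> List[int]:
    """Thermometer output Q[1..n]: Q[k] = 1 if the CKPSR edge delayed by k*tau
    has arrived when CKREF samples the DFFs (i.e. delta >= k*tau)."""
    return [1 if delta_ps >= k * tau_ps else 0 for k in range(1, n_stages + 1)]


def ones_count(thermo: List[int]) -> int:
    """Bubble-tolerant thermometer-to-binary conversion by counting ones
    (a substitute for the FSM-based bubble handling in the text)."""
    return sum(thermo)


if __name__ == "__main__":
    q = tdc_thermometer(97.0)
    print(q, "->", ones_count(q))
    bubbly = [1, 0, 1, 0, 0, 0]  # one stage misread, e.g. after a setup violation
    print(bubbly, "->", ones_count(bubbly))
```

Counting ones maps both the clean code and the single-bubble code to the same value, which is the property any bubble-suppression scheme must provide.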

Since it is hard to build a TDC with a timing resolution of τ/8, a tunable delay unit (TDU) is used in the DPM code generator, as shown in Fig. 14(a). The TDU has two different delays, Ttdu and Ttdu + Ui, where Ui is the resolution factor. The delay of each signal path in the TDU is controlled by the selection signal S. While S is high, the delay from IN1 to OUT1 is Ttdu and the delay from IN2 to OUT2 is Ttdu + Ui. Conversely, while S is low, the delay from IN1 to OUT1 is Ttdu + Ui and the delay from IN2 to OUT2 is Ttdu. For fast locking, the successive approximation technique is applied in the DPM code decision process. Fig. 14(b) shows the DPM code decision flowchart. If CKPMR leads CKREF at the beginning, which means the delay is too large, FH/L[2:0] is directly set to "000" for minimum delay. If CKPMR lags CKREF at the beginning, the DPM code decision process starts. Generating the DPM code takes three TDU stages, since FH/L[2:0] has three bits. In each stage, only one bit is determined according to the input phase relation. When CKPMR leads CKREF, CKREF is delayed by Ttdu and CKPMR is delayed by Ttdu + Ui. Conversely, if CKPMR lags CKREF, CKREF is delayed by Ttdu + Ui and CKPMR is delayed by Ttdu. This decision process is then repeated recursively, so the phase difference between CKREF and CKPMR is reduced with a timing resolution of Ui in each recursive cycle. Because the successive approximation of the control code is performed in binary, the timing resolution factors Ui used in the three stages should be τ/2, τ/4, and τ/8, respectively. These factors can be set carefully by transistor sizing: we generate Ttdu with one buffer of transistor size W1/Lmin and Ttdu + Ui with another buffer of transistor size W2/Lmin. Therefore, the timing resolution factor Ui is defined by the choice of W1/Lmin and W2/Lmin. Taking PVT variations and device mismatch of the buffers into account, the timing error of Ui should be smaller than τ/16 to ensure that there is no missing code in the DPM code generator. Under this condition, the DPM code generator remains functional under PVT variations and device mismatch.

In the circuit implementation of the DPM code generator, the TDUs allow all three recursive decision steps to be completed in only one clock cycle. Fig. 14(c) shows the block and waveform diagram of the DPM code generator. Three serially linked TDUs with different timing resolution factors Ui are used, and the input phase relation at each stage is determined by the bang-bang phase detector (BBPD) between consecutive TDUs. At the first TDU, where U1 is τ/2, because CKREF always leads CKPMR, CKPMR is delayed by Ttdu + τ/2 and CKREF is delayed by Ttdu, producing O1 and R1, respectively, for the next stage. FH/L[2] is decided according to the phase relation of O1 and R1. FH/L[2] also determines the signal paths of the second TDU stage, where U2 is τ/4. The second TDU creates a time difference of τ/4 and produces the signals O2 and R2. The phase relation between these two signals determines the value of FH/L[1]. Similarly, FH/L[0] is decided according to the phase relation between O3 and R3 after the third TDU creates a final τ/8 time difference. Because the three TDU stages lie in one signal path, the DPM code generator needs only one clock cycle to determine the DPM delay.

Fig. 11. Timing diagram of DPS, DPM, and EC.

Fig. 12. Control unit. (a) Block diagram. (b) Timing waveform.

Fig. 13. DPS code generator. (a) Block diagram. (b) Timing waveform.

Fig. 14. DPM code generator. (a) TDU. (b) Decision flow. (c) Block diagram and timing waveform.

Fig. 15. Locking procedure of the proposed ADDLL.

Fig. 16. Simulated power consumption of the ADDLL.

Fig. 17. (a) Chip micrograph. (b) Area breakdown.

Fig. 18. Locked state of the ADDLL. (a) 6.7 MHz. (b) 1.24 GHz.
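The three serial TDU stages behave like a non-restoring successive approximation: the first stage unconditionally adds τ/2 to CKPMR's path, and each later stage gives the next Ui to whichever path moves the residual toward zero, as chosen by the preceding BBPD bit. The sketch below models only that arithmetic with a signed residual, where a positive residual means CKPMR still needs more delay; this sign convention, τ = 40 ps, and the function name are our assumptions, and circuit effects such as the τ/16 error budget on Ui are ignored.

```python
def tdu_sar_code(required_delay_ps: float, tau_ps: float = 40.0) -> int:
    """3-bit DPM code F[2:0] from three serial TDU stages (non-restoring SAR).

    Assumption: a positive residual means CKPMR still needs more delay.
    """
    if required_delay_ps <= 0:
        return 0  # flowchart branch: set F[2:0] = "000" directly
    residual = required_delay_ps - tau_ps / 2   # TDU 1: fixed tau/2 on CKPMR's path
    code = 0
    for next_u in (tau_ps / 4, tau_ps / 8, None):
        bit = 1 if residual >= 0 else 0         # BBPD decision after this TDU
        code = (code << 1) | bit
        if next_u is not None:
            # The decided bit selects which path of the next TDU gets the
            # extra Ui, moving the residual toward zero.
            residual = residual - next_u if bit else residual + next_u
    return code


if __name__ == "__main__":
    for d in (3.0, 17.0, 25.0, 38.0):
        print(f"{d:5.1f} ps -> F = {tdu_sar_code(d):03b}")
```

With τ = 40 ps the code equals floor(delay / 5 ps) for required delays from 0 to 39 ps, the same result a restoring (trial-and-keep) SAR would give, and it saturates at "111" above that range.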

The ADDLL is locked by the sixth clock cycle; the waveform diagram is shown in Fig. 15. The PTDU requires two clock cycles to generate the tracing phases T_H and T_L: the first cycle generates Sel[5:1] in the path selector, and the second cycle is needed for the registers in the timing controller to store the count. CKPSR and CKPMR in the two RDLs are generated in the third cycle. The DPS code generator defines the DPS delay through DH/L[5:0] in the fourth cycle. During the fifth cycle, the delay of the DPM is determined by the DPM code generator. The ADDLL thus provides a synchronous output clock CKDLL by the start of the sixth cycle. As a result, the locking time of the ADDLL is five cycles regardless of the operating frequency. For system clock generation, because the ADDLL applies the same procedure to track any clock phase, creating an output clock that is in phase with the tracked clock takes only the same five cycles used for locking. Fig. 16 shows the simulated power breakdown of the proposed ADDLL. As in the power analysis of Section II, the PTDU, which uses the proposed GRO-based delay line, consumes constant power independent of the reference frequency. However, the power consumption of the DPS, the DPM, the EC, and the control logic is still proportional to the operating frequency. The simulated power of the PTDU increases slightly with frequency because some buffers in the PTDU operate at the reference frequency. The power consumption of those buffers is therefore frequency dependent and follows P = CV²f.
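Because the lock time is a fixed cycle count, its absolute duration scales with the reference period. The short sketch below lists the five-cycle schedule as described above and converts it into nanoseconds at the two ends of the operating range; the names `LOCK_SCHEDULE` and `lock_time_ns` are hypothetical.

```python
LOCK_CYCLES = 5  # stated locking time, independent of frequency

# Per-cycle activity as described in the text (cycle index -> action)
LOCK_SCHEDULE = {
    1: "PTDU path selector generates Sel[5:1]",
    2: "timing-controller registers store the count; T_H and T_L ready",
    3: "RDLs generate CKPSR and CKPMR",
    4: "DPS code generator sets DH/L[5:0]",
    5: "DPM code generator sets FH/L[2:0]",
}


def lock_time_ns(f_ref_hz: float, cycles: int = LOCK_CYCLES) -> float:
    """Wall-clock lock time for a fixed cycle count at reference frequency f_ref_hz."""
    if f_ref_hz <= 0:
        raise ValueError("f_ref_hz must be positive")
    return cycles / f_ref_hz * 1e9


if __name__ == "__main__":
    for cycle, action in sorted(LOCK_SCHEDULE.items()):
        print(f"cycle {cycle}: {action}")
    for f in (6.7e6, 1.24e9):
        print(f"{f / 1e6:8.1f} MHz -> {lock_time_ns(f):8.3f} ns")
```

At 6.7 MHz the five cycles take roughly 746 ns, while at 1.24 GHz they take roughly 4.03 ns.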

IV. EXPERIMENTAL RESULTS

The proposed ADDLL has been fabricated in TSMC 90 nm CMOS technology and occupies 0.0318 mm² of active silicon area. The chip micrograph and the area breakdown are shown in Fig. 17, including the PTDU, DPS, DPM, and the two RDLs in the open-loop control logic. The operating range of the ADDLL is from 6.7 MHz to 1.24 GHz. Fig. 18 shows the output signal waveform while the ADDLL is locked; CKDLL is aligned with CKREF at both 1.24 GHz and 6.7 MHz. Because the delays associated with the rising and falling edges of CKREF are locked independently, the output duty cycle should be the same as the input duty cycle. The measured root-mean-square jitters are 424.64 fs and 6.92 ps at 1.24 GHz and 6.7 MHz, respectively, as shown in Fig. 19. In addition, the measured peak-to-peak jitters are 2.22 ps at 1.24 GHz and 40 ps at 6.7 MHz. Of the 40 ps peak-to-peak jitter at 6.7 MHz, 36 ps comes from the signal generator. The measured locking time is five cycles regardless of the operating frequency, as shown in Fig. 20. The PTDU generates T_H and T_L with coarse delay times in the first two cycles. Then, the control logic generates the two outputs CKPSR and CKPMR through the two RDLs in the third cycle. The control logic uses the fourth cycle to define the delay in the DPS and the fifth cycle to define the delay in the DPM. The locking procedure is then complete, and the ADDLL provides an output clock synchronized with the input clock by the sixth cycle. The ADDLL consumes 14 mW at 1.24 GHz with a 1.2 V supply voltage.

Fig. 19. Jitter histogram of the ADDLL. (a) 6.7 MHz. (b) 1.24 GHz.

Fig. 20. Measured locking time of the ADDLL. (a) 10 MHz. (b) 525 MHz.

Compared with previous wide-range DLLs, the proposed ADDLL achieves the widest frequency range while maintaining low power consumption and small silicon area. In Fig. 21(a), the frequency range is defined as the ratio of the highest supported frequency to the lowest, which is related to the delay line length. In addition, we normalize the area to the technology node to compare area fairly. Since the power is related to the operating range, which is proportional to the number of delay cells, the power consumption should also be normalized to the supply voltage and the operating range, as shown in Fig. 21(b). Furthermore, frequency range, power consumption, and area are all correlated in wide-range DLLs: a wider operating frequency range requires a larger area, which leads to higher power consumption. We define figures of merit (FOMs) to compare wide-range DLL performance in area and power independently. The power and area FOMs are defined as follows:

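$$
\mathrm{FOM}_{\mathrm{Power}} = \frac{\mathrm{Power}}{\mathrm{Supply}^{2} \cdot \dfrac{f_{\mathrm{Max}}}{f_{\mathrm{Min}}}} \tag{20}
$$

$$
\mathrm{FOM}_{\mathrm{Area}} = \frac{\mathrm{Area}}{L^{2} \cdot \dfrac{f_{\mathrm{Max}}}{f_{\mathrm{Min}}}} \tag{21}
$$

where Supply is the supply voltage and L is the technology node. To evaluate the overall performance of a wide-range DLL, the overall FOM is defined as

$$
\mathrm{FOM}_{\mathrm{DLL}} = \frac{\mathrm{Power} \cdot \mathrm{Area}}{\mathrm{Supply}^{2} \cdot L^{2} \cdot \dfrac{f_{\mathrm{Max}}}{f_{\mathrm{Min}}}}. \tag{22}
$$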

These FOM definitions are based on normalized power, normalized area, and frequency range. Table I summarizes and compares the performance of the proposed ADDLL with other works. Because of the open-loop scheme, the ADDLL achieves better peak-to-peak and root-mean-square jitter performance. Furthermore, taking frequency range, power consumption, and area into consideration, the proposed ADDLL achieves the best FOM_DLL of 0.206.

Fig. 21. Performance summary. (a) Operating frequency range. (b) Power consumption.

TABLE I: COMPARISON OF WIDE-RANGE DLLS
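Eqs. (20)–(22) are easy to evaluate directly. The sketch below assumes units of mW for power, mm² for area, V for the supply, and µm for the technology node L; these units are not stated in the text, but with the 14 mW measured at 1.24 GHz, the 0.0318 mm² area, the 1.2 V supply, L = 0.09 µm, and the 6.7 MHz to 1.24 GHz range, they reproduce the reported FOM_DLL of 0.206. The function names are hypothetical.

```python
def fom_power(power_mw: float, supply_v: float, f_max_hz: float, f_min_hz: float) -> float:
    """Eq. (20): power normalized to Supply^2 and to the frequency ratio."""
    return power_mw / (supply_v ** 2 * (f_max_hz / f_min_hz))


def fom_area(area_mm2: float, node_um: float, f_max_hz: float, f_min_hz: float) -> float:
    """Eq. (21): area normalized to L^2 and to the frequency ratio."""
    return area_mm2 / (node_um ** 2 * (f_max_hz / f_min_hz))


def fom_dll(power_mw: float, area_mm2: float, supply_v: float, node_um: float,
            f_max_hz: float, f_min_hz: float) -> float:
    """Eq. (22): overall FOM; the frequency ratio appears only once."""
    return (power_mw * area_mm2) / (supply_v ** 2 * node_um ** 2 * (f_max_hz / f_min_hz))


if __name__ == "__main__":
    p, a, v, l_um, fmax, fmin = 14.0, 0.0318, 1.2, 0.09, 1.24e9, 6.7e6
    print("FOM_Power =", round(fom_power(p, v, fmax, fmin), 4))
    print("FOM_Area  =", round(fom_area(a, l_um, fmax, fmin), 4))
    print("FOM_DLL   =", round(fom_dll(p, a, v, l_um, fmax, fmin), 3))
```

Note that FOM_DLL is not simply the product of (20) and (21), since the frequency ratio enters the denominator once rather than twice. Using the 14.5 mW figure quoted in the abstract instead gives about 0.214.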

V. CONCLUSION

In this paper, a wide-range ADDLL is proposed for applications that require fast locking and low cost. The GRO-based delay line replaces the conventional long, serially linked delay line and saves area. It not only achieves a wide operating frequency range with small area but also minimizes power consumption by using fewer delay cells. Because of the open-loop locking scheme, the ADDLL requires only five cycles to lock. Two cycles are required for the PTDU output phases T_H and T_L to be ready: one cycle for the path selector and one cycle for the timing controller. After the PTDU is locked, the control unit, which utilizes two RDLs, takes one cycle to determine the DPS delay and another cycle to define the DPM delay. As a result, a synchronous output clock is obtained at the end of the fifth cycle. Because the ADDLL locks to the rising and falling edges independently, it recovers the input duty cycle at the output clock. Therefore, if the input duty cycle is 50%, no extra duty-cycle correction (DCC) circuit is needed in system applications. In general, the frequency range, power consumption, and area are correlated in a wide-range DLL. However, the proposed architecture utilizing a GRO-based delay line breaks this relationship between the metrics. With silicon area and power consumption comparable to previous designs, the ADDLL achieves a much wider frequency range. Consequently, the ADDLL has the best reported FOM_Area, FOM_Power, and overall FOM_DLL. Because digital DLLs are well suited to technology scaling, the proposed wide-range, fast-locking, and low-cost ADDLL architecture enables higher frequencies and wider operating frequency ranges, which would be advantageous for new SoC communication systems in advanced technologies.

ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center (CIC), Taiwan, for chip fabrication.

REFERENCES

[1] S. J. Kim et al., “A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726–734, Jun. 2002.

[2] B.-G. Kim, L.-S. Kim, K.-I. Park, Y.-H. Jun, and S.-I. Cho, “A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522–1530, May 2009.

[3] M. Hossain et al., “A 400 MHz–1.6 GHz fast lock, jitter filtering ADDLL based burst mode memory interface,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2013, pp. 244–245.

[4] J.-S. Wang, T.-M. Wang, C.-H. Chen, and T.-C. Liu, “An ultra-low-power fast-lock-in small-jitter all-digital DLL,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 422–607.

[5] R.-J. Yang and S.-I. Liu, “A 2.5 GHz all-digital delay-locked loop in 0.13 µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2338–2347, Nov. 2007.

[6] B. Mesgarzadeh and A. Alvandpour, “A low-power digital DLL-based clock generator in open-loop mode,” IEEE J. Solid-State Circuits, vol. 44, no. 7, pp. 1907–1913, Jul. 2009.

[7] M.-J. E. Lee et al., “Jitter transfer characteristics of delay-locked loops - theories and design techniques,” IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 614–621, Apr. 2003.

[8] J.-S. Wang, C.-Y. Cheng, J.-C. Liu, Y.-C. Liu, and Y.-M. Wang, “A duty-cycle-distortion-tolerant half-delay-line low-power fast-lock-in all-digital delay-locked loop,” IEEE J. Solid-State Circuits, vol. 45, no. 5, pp. 1036–1047, May 2010.

[9] H.-H. Chang and S.-I. Liu, “A wide-range and fast-locking all-digital cycle-controlled delay-locked loop,” IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 661–670, Mar. 2005.

[10] W.-J. Yun, H. W. Lee, D. Shin, S. D. Kang, J. Y. Yang, and H. O. Lee, “A 0.1-to-1.5 GHz 4.2 mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66 nm CMOS technology,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2008, pp. 282–613.

[11] R.-J. Yang and S.-I. Liu, “A 40–550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm,” IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 361–373, Feb. 2007.

[12] D. Shin, J. Song, H. Chae, and C. Kim, “A 7 ps jitter 0.053 mm² fast lock all-digital DLL with a wide range and high resolution DCC,” IEEE J. Solid-State Circuits, vol. 44, no. 9, pp. 2437–2451, Sep. 2009.

[13] M.-H. Hsieh, L.-H. Chen, S.-I. Liu, and C.-P. Chen, “A 6.7 MHz-to-1.24 GHz 0.0318 mm² fast-locking all-digital DLL in 90 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2012, pp. 244–246.

[14] A. Elshazly, A. Balankutty, Y.-Y. Huang, K. Yu, and F. O’Mahony, “A 2 GHz-to-7.5 GHz quadrature clock-generator using digital delay locked loops for multi-standard I/Os in 14 nm CMOS,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2014, pp. 1–2.

[15] X. Yu, W. Rhee, Z. Wang, J.-B. Lee, and C. Kim, “A 0.4-to-1.6 GHz low-OSR ΔΣ DLL with self-referenced multiphase generation,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2009, pp. 398–399.

[16] B.-G. Kim and L.-S. Kim, “A 250-MHz–2-GHz wide-range delay-locked loop,” IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1310–1321, Jun. 2005.

Min-Han Hsieh (S’11–M’15) was born in Kaohsiung, Taiwan, in 1984. He received the M.S. and Ph.D. degrees in electrical engineering from the National Taiwan University (NTU), Taipei, Taiwan, in 2009 and 2015, respectively.

From 2013 to 2014, he was a Visiting Scholar at the Berkeley Wireless Research Center (BWRC), University of California at Berkeley (UCB), Berkeley, CA, USA, sponsored by the National Science Council (NSC), Taipei, Taiwan. His research interests include domino logic circuits, mixed-signal integrated circuits, and powerline communication systems.

Liang-Hsin Chen was born in Taoyuan, Taiwan, in 1986. He received the B.S. degree in electrical engineering from the National Central University (NCU), Taoyuan, Taiwan, in 2009, and the M.S. degree in electronics engineering from the National Taiwan University (NTU), Taipei, Taiwan, in 2011.

His research interests include mixed-mode integrated circuits, high-speed SerDes, and PLLs.

Shen-Iuan Liu (S’88–M’93–SM’03–F’10) was born in Keelung, Taiwan, in 1965. He received the B.S. and Ph.D. degrees in electrical engineering from the National Taiwan University (NTU), Taipei, Taiwan, in 1987 and 1991, respectively.

From 1991 to 1993, he served as a Second Lieutenant with the Chinese Air Force. From 1991 to 1994, he was an Associate Professor with the Department of Electronic Engineering, National Taiwan Institute of Technology, Taipei, Taiwan. He joined the Department of Electrical Engineering, NTU, in 1994, where he has been a Professor since 1998. He has been a Distinguished Professor with NTU since August 2010. He is also the Director of the Graduate Institute of Electronics Engineering, NTU. His research interests include analog and digital integrated circuits and systems.

Dr. Liu has served as a Technical Program Committee Member for ISSCC from 2006 to 2008, IEEE VLSI-DAT from 2008 to 2012, and A-SSCC from 2005 to 2012. He also served as the Technical Program Committee Co-Chair and Chair for A-SSCC 2010 and 2011, respectively. He was an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS from 2006 to 2009 and a Guest Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS Special Issue between December 2008 and November 2012. He was an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS from 2006 to 2007, and for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS from 2008 to 2009. He was on the Editorial Board of Research Letters in Electronics from 2008 to 2009. He was an Associate Editor for IEICE (The Institute of Electronics, Information and Communication Engineers) Transactions on Electronics from 2008 to 2011. He is an Associate Editor for ETRI Journal, and also became an Associate Editor for the Journal of Semiconductor Technology and Science, Korea, in 2009. He also joined the Editorial Board of International Scholarly Research Network (ISRN) Electronics in 2011. He is a member of IEICE. He served as General Chair of the 15th VLSI Design/CAD Symposium, Taiwan, in 2004, and as Program Co-Chair of the Fourth IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Fukuoka, Japan, in 2004. He served as Chair of the IEEE SSCS Taipei Chapter from 2004 to 2008, which received the Best Chapter Award in 2009. He was the recipient of the Engineering Paper Award from the Chinese Institute of Engineers in 2003, the Young Professor Teaching Award from MXIC Inc., the Research Achievement Award from NTU, and the Outstanding Research Award from the National Science Council in 2004.


Charlie Chung-Ping Chen received the B.S. degree in computer science and information engineering from the National Chiao-Tung University, Hsinchu, Taiwan, in 1990, and the M.S. and Ph.D. degrees in computer science from the University of Texas at Austin, Austin, TX, USA, in 1996 and 1998, respectively.

From 1996 to 1999, he was with Intel Corporation, Strategic CAD Labs, Hillsboro, OR, USA, as a Senior CAD Engineer. In 1999, he joined the ECE Department, University of Wisconsin, Madison, WI, USA, as an Assistant Professor. In 2003, he joined the EE Department, National Taiwan University, Taipei, Taiwan, as an Associate Professor. Currently, he is a Professor with the GIEE, BIO, and EE Department, National Taiwan University. His research interests include EDA and BIO topics such as computer-aided design and microprocessor circuit design with an emphasis on interconnect and circuit optimization, circuit simulation, statistical design, and signal/power/thermal integrity analysis and optimization.

Dr. Chen has served on the Program Committee and/or as an Organizer of DAC, ICCAD, DATE, ISPD, ASPDAC, ISQED, SASIMI, VLSI/CAD Symposium, and ITRS. He received the D2000 Award from Intel Corp. and the National Science Foundation Faculty Early Career Development Award (CAREER) in 1999 and 2001, respectively. He also received the 2002 SIGDA/ACM Outstanding Young Faculty Award and the 2002 IBM Peter Schneider Faculty Development Award.