TIMING SKEW CALIBRATION FOR TIME INTERLEAVED ANALOG TO DIGITAL CONVERTERS

by

Luke Wang

A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright 2014 by Luke Wang

Abstract

Timing Skew Calibration for Time Interleaved Analog to Digital Converters

Luke Wang
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2014

For multi-Gb/s data rates, time interleaving is an attractive solution in ADC design. This approach however introduces time varying errors including gain, offset and timing skew mismatch. A novel timing skew correction algorithm utilizing statistical characteristics of input sinusoids is proposed. Two calibration sequences which extend the algorithm to ADCs with more than 2 channels are also proposed.

A prototype 10GS/s 8 bit time-interleaved (TI) SAR ADC is realized in TSMC 65nm CMOS process. To improve the ADC performance, gain, offset, radix and proposed skew calibration are performed off-chip. Due to the inadequate range of the delay buffers in the clock path to fully compensate the skew, 4 channels were used to achieve 5GS/s. The prototype achieves a SNDR of 38.57dB with a FOM of 738fJ/conversion-step at Nyquist frequency of 2.5GHz. The total power consumption is 138.6mW from a 1V supply and the ADC occupies an area of 3.74mm².

Acknowledgements

First and foremost, I would like to express my gratitude to my supervisor Professor Anthony Chan Carusone for providing me with guidance and support. His advice, both technical and non-technical, has been invaluable during the past two years. I would also like to thank Professor Liscidini, Professor Sheikholeslami, and Professor Trescases for their participation in my MASc exam committee.

This project was completed in collaboration with Jeffrey (Qiwei) Wang, who designed the individual 1.25GS/s sub-ADC used in the time-interleaved 10GS/s ADC. I would also like to acknowledge and thank Victor Kozlov for the design and layout of the one-shot and thermometer-to-binary decoder which were used in the ADC prototype.

I would like to thank everyone in BA5000 for all the interesting conversations we had, ranging from stories about unplanned excursions to wacky ideas that will surely not come to fruition. I would like to thank Shayan and Alireza for providing help with ADS and high speed signal integrity issues. I would like to thank Amer for giving me advice on phase noise and transmission line issues. A special thanks goes out to Jeffrey (Qiwei) Wang for all the help and discussions we had which led to the completion of this project. Definitely couldn’t have done it without you Jeff. I would also like to thank Dawei for providing excellent Visio and UNIX support, and for keeping me company when I was on an internship in California. I would like to thank Mike and Rosanna for letting me “borrow” krusty and homer.

I would like to thank NSERC for providing funding through the CGS scholarship and the Province of Ontario for providing the OGS scholarship.

Finally, I would like to thank my parents for the understanding, love and support during my study.

Contents

Table of Contents
List of Figures
List of Tables
List of Acronyms

1 Introduction
  1.1 Motivation
  1.2 Outline

2 Time Interleaved ADC Overview
  2.1 Time Interleaved ADC Errors
    2.1.1 Offset Mismatch
    2.1.2 Gain Mismatch
    2.1.3 Timing Skew
    2.1.4 Bandwidth Mismatch
    2.1.5 Jitter
    2.1.6 Summary
  2.2 Calibration of Time Interleaved ADC
    2.2.1 Calibration of Static Errors
    2.2.2 Overview of Calibration Techniques for Timing Skew
      El-Chammas Crosscorrelation Approach
      Stepanovic Taylor Approximation Approach
      Huang and Wang Zero Crossing Detection Approach
      Razavi Autocorrelation Approach

3 Proposed Timing Skew Calibration Technique
  3.1 Cost Function Construction using Clock Swapper
  3.2 Cost Function Construction using Existing Phases
  3.3 Extending the Calibration to Converters with N Channels Using Calibration Sequence A
  3.4 Limitations of Calibration Sequence A Using Proposed Cost Function
  3.5 Extending the Calibration to Converters with N Channels Using Calibration Sequence B
  3.6 Calibration Sequence A Using Razavi’s Cost Function
  3.7 Implementation Using LMS Update
  3.8 Accuracy and Additional Signal Constraints
  3.9 Simulation Results
  3.10 Summary and Comparison to Existing Works

4 Time-Interleaved ADC Implementation
  4.1 Time-Interleaved ADC Architecture
  4.2 Clock Generation
  4.3 Clock and Input Distribution
  4.4 CML to CMOS Converter and CMOS Delay Line
  4.5 Delay Code Input Interface
  4.6 Output Data Conditioning
    4.6.1 Retimer
    4.6.2 Decimator

5 Measurement Results
  5.1 Test Setup
    5.1.1 Device Under Test
    5.1.2 Printed Circuit Board
    5.1.3 Equipment Setup
  5.2 ADC Measurement Results
    5.2.1 Single sub-ADC Performance
    5.2.2 Timing Skew Calibration Performance for 2 Channels
    5.2.3 Timing Skew Calibration Performance for 8 Channels
    5.2.4 5GS/s 4 Channel Time Interleaved ADC Performance
    5.2.5 Performance Summary and Comparison

6 Conclusion
  6.1 Summary
  6.2 Future Work

A Chip Pinout

List of Figures

2.1 Time-Interleaved ADC
2.2 2 Channel Time-Interleaved ADC with Offset Mismatch
2.3 8-bit 4 Channel Time-Interleaved ADC Spectrum with Offset Mismatch (Markers: Fundamental - Diamond, Offset Distortion - Circular)
2.4 2 Channel Time-Interleaved ADC with Gain Mismatch
2.5 8-bit 4 Channel Time-Interleaved ADC Spectrum with Gain Mismatch (Markers: Fundamental - Diamond, Gain Distortion - Circular)
2.6 Time Interleaved ADC Structure with Front-end Sampler
2.7 2 Channel Time-Interleaved ADC with Timing Skew
2.8 8-bit 4 Channel Time-Interleaved ADC Spectrum with Timing Skew (Markers: Fundamental - Diamond, Skew Distortion - Circular)
2.9 Statistical Bound on Timing Skew
2.10 Complete Model of Time Varying Errors of a Time Interleaved ADC (O as offset mismatch, G as frequency dependent gain mismatch, and ∆t as frequency dependent timing skew)
2.11 Calibration Schemes for Time-Interleaved ADC (Top: All Digital Scheme, Bottom: Mixed Signal Scheme)
2.12 Radix, Gain and Offset Foreground Calibration
2.13 Crosscorrelation Maximization using Reference ADC
2.14 Implementation of Crosscorrelation Maximization For 8 Channel ADC
2.15 Error Estimation using First Order Taylor Series
2.16 Implementation of Error Estimation using Bandwidth Mismatch
2.17 Sampling Sequence for 2 Channel Time-Interleaved ADC (red samples by shifted φ0)
2.18 ZC Detection Implementation for 8 Channel ADC
2.19 (a) Effect of Timing Mismatch on 2 Channel ADC, and (b) Block Diagram Implementation of Autocorrelation Approach

3.1 Error Generation using Clock Swapper
3.2 Generalization of Skew for a 2 Channel Time-Interleaved ADC
3.3 Scale Factor K(ω) for −Ts ≤ ∆t ≤ Ts and frequencies 0 < ω < π/Ts
3.4 Block Diagram of Skew Calibration for 2 Channel Time Interleaved ADC
3.5 Calibration Sequence A for a 4 Channel Time-Interleaved ADC (k = 2)
3.6 Nonlinear |K(ω)|, −Ts ≤ ∆t ≤ Ts, ω = 0.45 · 2π/Ts
3.7 Calibration Sequence B for a 4 Channel Time-Interleaved ADC
3.8 Calibration Sequence B for a 4 Channel Time-Interleaved ADC Using φ0 As Reference
3.9 Block Diagram of Skew Calibration Engine
3.10 Skew Calibration of Input FM Signal Using Sequence A and Proposed Cost Function 5000 Points FFT Spectrum (Top: Before Calibration, Bottom: After Calibration)
3.11 Delay Code for Skew Calibration of Input Signal Using Sequence A and Razavi’s Cost Function
3.12 Delay Code for Skew Calibration of Input Signal Using Sequence B and Proposed Cost Function

4.1 Time-Interleaved ADC Architecture
4.2 Time-Interleaved ADC Timing for Two Level Interleaving
4.3 CML Divider for Multiphase Clock Generation
4.4 On-chip Transmission Line and Termination
4.5 Clock Distribution using H-bridge
4.6 Unit CML Buffer Schematic
4.7 CML to CMOS Converter and CMOS Delay Line
4.8 Delay Code Input Interface
4.9 Top Level Output Data Retiming
4.10 Illustration of Decimate by 81 Sequence
4.11 Decimator Circuit Block Diagram

5.1 Time-Interleaved ADC Die Photo
5.2 Printed Circuit Board Block Diagram
5.3 Printed Circuit Board Photo
5.4 Test Setup for Evaluating ADC Prototype
5.5 Cost Function vs SNDR/SDR for 2 Channel System
5.6 SNDR Convergence for 2 Channel System Using LMS
5.7 Delay Code Convergence for 8 Channel System Using LMS
5.8 Total Channel Loss versus Input Frequency (including the effect of all on-chip interconnect and ADC)
5.9 Transmission Line Loss versus Frequency
5.10 Delay Code Convergence for 4 Channel System Using LMS
5.11 FFT Spectrum Before (Top) and After (Bottom) Calibration for 4GHz Input 2500 Points (Markers: Diamond - Fundamental, Circular - Skew, Downward Triangle - 2nd Harmonic, Upward Triangle - 3rd Harmonic)
5.12 FFT Spectrum After Calibration for 2.5GHz Nyquist Input 2500 Points (Markers: Diamond - Fundamental, Circular - Skew, Downward Triangle - 2nd Harmonic, Upward Triangle - 3rd Harmonic)
5.13 SNDR versus Frequency 5GS/s Time-Interleaved ADC
5.14 High Speed ADC Performance Comparison (Markers: This Work - Star, Time-Interleaved ADC - Diamond, Non Time-Interleaved ADC - Circular)

A.1 Chip Pinout

List of Tables

2.1 Summary of Time-Interleaved Mismatches for N sub-ADCs with Input Frequency of ω, k = 1, 2, ..., N−1

3.1 Summary of Background Timing Skew Calibration Techniques For Converters With Number of Channels Greater Than 2
3.2 Comparison of Proposed Cost Function and Razavi’s Cost Function Using Different Calibration Sequences for N Channel Converter

4.1 Design specification of the 1.25GS/s time-interleaved C-2C SAR ADC

5.1 List of Key PCB Components
5.2 List of External Equipments Used
5.3 Performance Summary of Prototype ADC
5.4 ADC Performance Comparison with Other Published Works

List of Acronyms

AAC Accumulator

AAR Accumulator with Reset

ADC Analog-to-Digital Converter

CAD Computer Aided Design

CAL Calibration

CICC Custom Integrated Circuits Conference

CML Current Mode Logic

CPW Coplanar Waveguide

CTLE Continuous-Time Linear Equalization

DAC Digital-to-Analog Converter

DDJ Data Dependent Jitter

DFE Decision-Feedback Equalization

DLL Delay Locked Loop

DNL Differential Non-Linearity

DSP Digital Signal Processing (or Processor)

DUT Device Under Test

ENOB Effective Number Of Bits

FIFO First in, First out


FIR Finite Impulse Response

FM Frequency Modulation

FOM Figure of Merit

Gb/s Giga-Bits per Second

GS/s Giga-Samples per Second

HCF Highest Common Factor

INL Integral Non-Linearity

ISI Inter-Symbol Interference

ILO Injection Locked Oscillator

I/O Input/Output

JSSC Journal of Solid-State Circuits

LMS Least Mean Squares

LSB Least Significant Bit

MIM Metal Insulator Metal

MOSFET Metal-Oxide-Semiconductor Field-Effect Transistor

MPCG Multiphase Clock Generator

MSB Most Significant Bit

NMOS N-Channel MOSFET

PCB Printed Circuit Board

PCIe Peripheral Component Interconnect Express

PLL Phase Locked Loop

PMOS P-Channel MOSFET

PRBS Pseudo-random Binary Sequence

SAR Successive Approximation Register


SFDR Spurious-Free Dynamic Range

SNDR Signal-to-Noise-and-Distortion Ratio

SNR Signal-to-Noise Ratio

T&H Track and Hold

TI Time-Interleaved

TTLVHT Typical-Typical Low Voltage High Temperature

USB Universal Serial Bus

VLSI Very Large Scale Integration

VNA Vector Network Analyzer

ZC Zero-Crossing

µC Microprocessor


Chapter 1

Introduction

1.1 Motivation

Analog to digital converters (ADCs) are the interface between the real world and the digital abstraction that made electronics integral to society today. At present, electronics in the form of silicon integrated circuits represent the most cost effective and efficient means of storing and processing large amounts of information. Information is transmitted both wirelessly and through wireline connections using copper and, more recently, optical links. Interconnects may range from a few millimetres between chips to kilometres between a base station and a user’s cellphone. As the data rate increases, the interconnects themselves begin to impact the information quality long before the Shannon channel capacity is reached. For instance, a well known standard such as Peripheral Component Interconnect Express (PCIe) has grown from generation 1 at 2.5GS/s in 2003 to generation 3 at 8GS/s in 2010. PCIe generation 4 is projected to arrive in 2014 at 16GS/s [1], with each generation requiring more sophisticated signal processing to ensure integrity of the links.

Inter-symbol interference (ISI) introduced by the communication channel must be corrected through equalization, and forward error correction may also be required. Digital circuit solutions have become more attractive for these functions compared to their analog counterparts due to the ease of digital abstraction, reduced circuit area, and hence cost, afforded by technology scaling, and zero static power dissipation. In addition, advanced computer-aided design (CAD) tools have enabled automatic place-and-route of digital blocks, leading to additional cost savings. Therefore, equalization solutions are often implemented in the digital domain, where finite impulse response (FIR) [2] filters are common, and decision feedback equalizers (DFE) [3] have become popular. ADC based receivers are gaining momentum as a result [4] [5] [6] [7]. Most communication links function at multi-Gb/s (10GBASE-T, 40GbE, 100GbE), but a single ADC cannot function at this rate. The fastest single channel CMOS ADC ever published is a flash ADC functioning at 7.5GS/s [8]. As the sampling rate approaches the technology limit of CMOS transistors at a particular process node, the power dissipation increases super-linearly. Therefore it is more efficient, as with multi-core computers, to exploit parallelism by using time-interleaving. A time-interleaved ADC uses multiple ADCs in parallel, where each has its sampling time shifted so that the signal is sampled uniformly in time by different but functionally identical ADCs. The outputs are then multiplexed back together. This allows the individual ADCs to operate at a lower frequency and in general offers significant power savings. By operating at a lower frequency (below a given technology’s bandwidth limits), the designer is also afforded the opportunity to use different ADC architectures that are inherently more power efficient. Specifically, successive approximation register (SAR) converters have proven to operate with excellent figures of merit in nanoscale CMOS technologies, but must operate well below a technology’s bandwidth limit since each conversion requires many clock cycles. In reality each unit ADC in the time interleaved system is unique when fabricated and this introduces time varying errors. The correction of these time varying errors, in particular phase or timing skew errors, is the focus of this thesis.

1.2 Outline

This thesis is organized into 6 chapters. Chapter 2 describes the time-interleaved ADC architecture and the errors associated with this approach. It also provides an overview of existing calibration solutions used to mitigate these errors. Chapter 3 presents a new timing skew calibration method. It includes the derivation of a cost function and two calibration sequences, which are the major contributions of this thesis. Chapter 4 outlines the implementation of a time-interleaved ADC for exploration of the new skew calibration technique, including the choice of the sub-ADC topology, time-interleaved ADC specifications, the clock distribution and output data conditioning, and top level integration. The sub-ADC used in the time interleaved structure was designed by another Master’s student, Qiwei Wang [9]. Chapter 5 showcases the measurement results, focusing on the proposed skew calibration performance. Finally, Chapter 6 presents conclusions and recommendations from this work.


Chapter 2

Time Interleaved ADC Overview

The first time interleaved (TI) ADC was proposed by Black in 1980, functioning at a speed of 2.5MS/s [10]. Since then multi-GS/s TI ADCs have become common in the open literature. TI ADCs operate by using several ADCs, referred to here as sub-ADCs, in parallel. Figure 2.1 shows a time-interleaved ADC with N sub-ADCs. The input is sampled sequentially and uniformly in time starting with sub-ADC 0 and ending with sub-ADC N−1, then cycling back to sub-ADC 0 again repetitively. The sampling rate of each sub-ADC is fs/N, where fs is the aggregate sampling rate of the TI ADC. Ideally the sampling edges φ0 to φN−1 are evenly spaced at a spacing of Ts. The output of the ith sub-ADC is given by

\[ y_i[n] = x(t_i[n]) = x([nN + i]T_s) \tag{2.1} \]

The sub-ADC outputs yi[n] are multiplexed to create y[n] such that the ideal TI ADC output is equivalent to sampling the input with a single ADC.

\begin{align*}
y[n] &= y_i[(n - i)/N] \quad \text{where } i = n \bmod N \\
     &= x([n - i + i]T_s) \\
     &= x(nT_s) \\
     &= x(t)|_{t = nT_s} \tag{2.2}
\end{align*}
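The multiplexing in equations (2.1) and (2.2) can be illustrated with a minimal Python sketch. The function names and the example input below are illustrative assumptions and are not taken from the thesis; the sketch only checks that ideal interleaving of N sub-ADC streams reproduces uniform sampling at Ts.

```python
import math


def sub_adc_output(x, i, n, num_channels, ts):
    """Sample n of ideal sub-ADC i, following eq. (2.1): x([nN + i] Ts)."""
    return x((n * num_channels + i) * ts)


def interleave(x, num_channels, ts, num_samples):
    """Multiplex the sub-ADC streams into y[n], following eq. (2.2)."""
    y = []
    for n in range(num_samples):
        i = n % num_channels          # sub-ADC that produced output sample n
        m = (n - i) // num_channels   # index of that sample in the sub-ADC stream
        y.append(sub_adc_output(x, i, m, num_channels, ts))
    return y


if __name__ == "__main__":
    fs = 10e9                         # assumed aggregate sampling rate
    ts = 1.0 / fs

    def tone(t):
        # Assumed test input: a 1 GHz cosine with an arbitrary phase
        return math.cos(2 * math.pi * 1e9 * t + 0.3)

    y = interleave(tone, num_channels=8, ts=ts, num_samples=32)
    direct = [tone(n * ts) for n in range(32)]
    print(max(abs(a - b) for a, b in zip(y, direct)))
```

With ideal sub-ADCs and ideal clocks the interleaved stream and the directly sampled stream agree sample for sample, which is the statement of equation (2.2).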

In reality the sub-ADCs are not identical, nor are the clocks that define the sampling instants necessarily evenly spaced. This causes the TI architecture to experience several types of errors including:

1. Offset Mismatch

2. Gain Mismatch

3. Timing Skew (Phase Mismatch)

4. Bandwidth Mismatch

In addition to these errors, circuits in general experience performance degradation due to jitter. This chapter examines each of these errors to determine their impact on the TI ADC performance.

Figure 2.1: Time-Interleaved ADC

2.1 Time Interleaved ADC Errors

2.1.1 Offset Mismatch

For simplicity, consider two sub-ADCs interleaved together as shown in figure 2.2. Sub-ADC 0 has an offset of o0 and sub-ADC 1 has an offset of o1. This can be caused, for instance, by a mismatch in the comparator thresholds of flash sub-ADCs. The effect of offset is to create a non-zero output signal for a zero input signal. The aggregate sampling rate of the TI ADC is fs, with Ts = 1/fs and ωs = 2πfs. Given a sinusoidal input x(t) = cos(ωt + θ), if quantization noise is neglected, the outputs of the sub-ADCs will be

\begin{align*}
y_0[n] &= \cos(\omega n T_s + \theta) + o_0, \quad n \text{ even} \\
y_1[n] &= \cos(\omega n T_s + \theta) + o_1, \quad n \text{ odd} \tag{2.3}
\end{align*}

Figure 2.2: 2 Channel Time-Interleaved ADC with Offset Mismatch

Given that (−1)^n = cos(nπ) = cos[(ωs n Ts)/2] and defining ∆o = (o0 − o1)/2 and o = (o0 + o1)/2, these two equations can be combined to obtain

\begin{align*}
y[n] &= \cos(\omega n T_s + \theta) + o + (-1)^n \Delta o \\
     &= \cos(\omega n T_s + \theta) + o + \Delta o \cos\left(\frac{\omega_s}{2} n T_s\right) \tag{2.4}
\end{align*}

From this result, it becomes clear that the mismatched offset between the sub-ADCs introduces a DC tone and a tone at ωs/2. The DC tone is simply the mean offset of the two channels and the amplitude of the high frequency tone is proportional to the mismatch between the two offsets. The error introduced is however independent of the input frequency ω. It can be shown [11] that for N sub-ADCs, in addition to a DC tone, the tones generated will be at frequencies

\[ \frac{k}{N}\omega_s, \quad k = 1, \ldots, N-1 \tag{2.5} \]

Figure 2.3 illustrates the impact of offset mismatch for a 4 channel ADC. The number of points in the FFT is 1000. The fundamental is at a frequency of fs/10 as denoted by the diamond marker. The distortion tones, denoted by the circular markers, are generated at fs/2, fs/4 and DC.
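The tone locations predicted by equation (2.5) can be checked numerically. The following is a minimal sketch and not the simulation used for figure 2.3: the offset values are arbitrary assumptions, quantization is neglected, and a direct DFT from the standard library stands in for an FFT.

```python
import cmath
import math


def interleaved_samples(offsets, n_points, input_bin, theta=0.0):
    """Unit cosine sampled coherently, with a per-channel offset added.

    offsets[i] is the (assumed) offset of sub-ADC i; sample n is taken by
    sub-ADC n mod N, as in eq. (2.3) generalized to N channels.
    """
    num_channels = len(offsets)
    return [
        math.cos(2 * math.pi * input_bin * n / n_points + theta)
        + offsets[n % num_channels]
        for n in range(n_points)
    ]


def tone_bins(samples, threshold=1e-6):
    """Return the DFT bins from DC to fs/2 whose magnitude exceeds threshold."""
    n = len(samples)
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) / n > threshold:
            bins.append(k)
    return bins


if __name__ == "__main__":
    # 4 channels, 1000 points, fundamental at fs/10 (bin 100)
    samples = interleaved_samples([0.02, -0.01, 0.015, 0.005], 1000, 100)
    print(tone_bins(samples))
```

For the 1000 point, 4 channel case the only bins above the threshold are 0, 100, 250 and 500, that is DC, the fundamental at fs/10, and the offset tones at fs/4 and fs/2. Moving the fundamental to a different bin leaves the offset tones where they are, consistent with the error being independent of ω.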

2.1.2 Gain Mismatch

Figure 2.3: 8-bit 4 Channel Time-Interleaved ADC Spectrum with Offset Mismatch (Markers: Fundamental - Diamond, Offset Distortion - Circular). Axes: Amplitude (dBFS) versus Normalized Frequency (f/fs).

Gain can be defined as the slope of the linear input to output transfer characteristic of an ADC. For instance in flash ADCs, a pre-amplifier is usually included before the comparators to reduce input-referred offset and kickback. Mismatch in the common mode of this amplifier between different comparators creates gain mismatch. Consider again two sub-ADCs with an input sinusoid as shown in figure 2.4. The outputs with gain mismatch coefficients G0 and G1 will be

\begin{align*}
y_0[n] &= G_0 \cos(\omega n T_s + \theta), \quad n \text{ even} \\
y_1[n] &= G_1 \cos(\omega n T_s + \theta), \quad n \text{ odd} \tag{2.6}
\end{align*}

Figure 2.4: 2 Channel Time-Interleaved ADC with Gain Mismatch

Using the same steps from the calculation of offset mismatch and defining ∆G = (G0 − G1)/2 and G = (G0 + G1)/2, the final interleaved output y[n] can be shown to be

\begin{align*}
y[n] &= [G + (-1)^n \Delta G]\cos(\omega n T_s + \theta) \\
     &= \left[G + \Delta G \cos\left(\frac{\omega_s}{2} n T_s\right)\right]\cos(\omega n T_s + \theta) \tag{2.7}
\end{align*}

This equation can be simplified to show only the terms within the Nyquist band

\[ y[n] = G\cos(\omega n T_s + \theta) + \Delta G \cos\left[\left(\omega - \frac{\omega_s}{2}\right) n T_s + \theta\right] \tag{2.8} \]

The input tone is scaled by G and a distortion term appears at ω − ωs/2. The location of this tone depends on the input frequency ω, but its magnitude does not. This is similar to amplitude modulation (AM), where side-bands are created around the carrier frequency ωc at ωc ± ω. For N sub-ADCs, the distortion terms will be located at [11]

\[ \pm\omega + \frac{k}{N}\omega_s, \quad k = 1, \ldots, N-1 \tag{2.9} \]

Figure 2.5 illustrates the impact of gain mismatch for a 4 channel ADC. The number of points in the FFT is 1000. The fundamental, denoted by a diamond marker, is at a frequency of fs/10. The distortion tones, denoted by circular markers, are at locations 7fs/20, 2fs/5, and 3fs/20.
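The tone locations quoted for figure 2.5 follow from equation (2.9) once each term is aliased back into the first Nyquist zone. The sketch below is an illustrative helper, with assumed function names, that works in units of fs using exact fractions.

```python
from fractions import Fraction


def fold_to_nyquist(f):
    """Alias a frequency given in units of fs into the range [0, 1/2]."""
    f = f % 1
    return min(f, 1 - f)


def gain_spur_locations(f_in, num_channels):
    """Distinct in-band locations of +/-f_in + k/N for k = 1..N-1, eq. (2.9)."""
    spurs = set()
    for k in range(1, num_channels):
        for sign in (1, -1):
            spurs.add(fold_to_nyquist(sign * f_in + Fraction(k, num_channels)))
    return sorted(spurs)


if __name__ == "__main__":
    # 4 channels with the fundamental at fs/10
    print(gain_spur_locations(Fraction(1, 10), 4))
```

For a fundamental at fs/10 and 4 channels the six terms of equation (2.9) fold onto three distinct locations, 3fs/20, 7fs/20 and 2fs/5, matching the distortion tones listed above.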

2.1.3 Timing Skew

For a TI ADC at a sampling rate of fs, N sub-ADCs each sample at fs/N. Assuming each sub-ADC has its own track and hold, the clocks to the sub-ADCs are evenly spaced Ts apart. If a front-end track and hold sampler is used, as shown in red in figure 2.6, then the sampling points are perfectly defined for all sub-ADCs and timing skew would not be present. However, designing a front-end track and hold at multi-GHz operation with good linearity is very difficult and usually avoided. If the sampling instants are not uniformly spaced in time, then by a generalization of the Shannon-Nyquist sampling theorem the signal can still be reconstructed with an appropriate set of FIR filters [12]. However, distortions are introduced when practical reconstruction is used [11].

Consider two sub-ADCs, where the clock to one sub-ADC is skewed by ∆t as shown in figure 2.7.

Figure 2.5: 8-bit 4 Channel Time-Interleaved ADC Spectrum with Gain Mismatch (Markers: Fundamental - Diamond, Gain Distortion - Circular; axes: Normalized Frequency (f/fs) versus Amplitude (dBFS))

Figure 2.6: Time Interleaved ADC Structure with Front-end Sampler

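\[
\begin{aligned}
y_0[n] &= \cos(\omega n T_s + \theta), && n \text{ even} \\
y_1[n] &= \cos(\omega t + \theta)\big|_{t = nT_s + \Delta t}, && n \text{ odd}
\end{aligned}
\tag{2.10}
\]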

Figure 2.7: 2 Channel Time-Interleaved ADC with Timing Skew

Without loss of generality, assume that θ = 0. The two equations above can then be combined to yield y[n] in the form of

\[
y[n] = \cos\left[\omega\left(nT_s + \frac{\Delta t}{2} - (-1)^n\frac{\Delta t}{2}\right)\right]
\tag{2.11}
\]

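The cosine can be expanded using the trigonometric relationship cos(α − β) = cos(α)cos(β) + sin(α)sin(β) to obtain

\[
y[n] = \cos\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\cos\left[(-1)^n\frac{\omega\Delta t}{2}\right]
     + \sin\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\sin\left[(-1)^n\frac{\omega\Delta t}{2}\right]
\tag{2.12}
\]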

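Noting that cos[(−1)^n γ] = cos(γ) and sin[(−1)^n γ] = cos(nπ)sin(γ) = sin(γ − nπ), equation (2.12) can be simplified [13] to

\[
\begin{aligned}
y[n] &= \cos\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\cos\left(\frac{\omega\Delta t}{2}\right)
      + \sin\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\cos(n\pi)\sin\left(\frac{\omega\Delta t}{2}\right) \\
     &= \cos\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\cos\left(\frac{\omega\Delta t}{2}\right)
      + \sin\left[\omega\left(nT_s + \frac{\Delta t}{2}\right) - n\pi\right]\sin\left(\frac{\omega\Delta t}{2}\right) \\
     &= \cos\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\cos\left(\frac{\omega\Delta t}{2}\right)
      + \sin\left[\omega\left(nT_s + \frac{\Delta t}{2}\right) - \frac{\omega_s n T_s}{2}\right]\sin\left(\frac{\omega\Delta t}{2}\right) \\
     &= \underbrace{\cos\left[\omega\left(nT_s + \frac{\Delta t}{2}\right)\right]\cos\left(\frac{\omega\Delta t}{2}\right)}_{\text{fundamental}}
      + \underbrace{\sin\left[\left(\omega - \frac{\omega_s}{2}\right)nT_s + \frac{\omega\Delta t}{2}\right]\sin\left(\frac{\omega\Delta t}{2}\right)}_{\text{distortion}}
\end{aligned}
\tag{2.13}
\]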

The first term in equation (2.13) is the fundamental tone, which has been phase shifted and amplitude modulated. If the skew ∆t is small then cos(ω∆t/2) ≈ 1, making these effects negligible.

Next, comparing equation (2.13) to equation (2.8), the distortion term here is at the same frequency as the term generated by gain mismatch but with a 90° phase shift. However, in addition to the frequency location, the amplitude of the distortion also depends on the input frequency ω. This is intuitive, since for slower signals a deviation of the sampling point introduces a smaller error than for faster signals. For N sub-ADCs, the distortion terms will be located at [11]

\[
\pm\omega + \frac{k}{N}\omega_s, \qquad k = 1, \ldots, N-1
\tag{2.14}
\]
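The decomposition of the skewed 2 channel output into a fundamental and a distortion term can be checked numerically. The sketch below is only an illustration under stated assumptions: time is normalized so that Ts = 1 (hence ωs = 2π), θ = 0, and the function names are hypothetical.

```python
import math


def skewed_sample(n, f_norm, skew_norm):
    """Direct model: odd samples are taken skew_norm (in units of Ts) late."""
    w = 2 * math.pi * f_norm
    return math.cos(w * (n + skew_norm / 2 - (-1) ** n * skew_norm / 2))


def fundamental_and_distortion(n, f_norm, skew_norm):
    """The two terms of the final line of the simplified 2 channel expression."""
    w = 2 * math.pi * f_norm
    ws = 2 * math.pi  # sampling frequency in rad/sample when Ts = 1
    fundamental = math.cos(w * (n + skew_norm / 2)) * math.cos(w * skew_norm / 2)
    distortion = math.sin((w - ws / 2) * n + w * skew_norm / 2) * math.sin(w * skew_norm / 2)
    return fundamental, distortion


# The distortion amplitude sin(w * skew / 2) grows with input frequency
for f_norm in (0.05, 0.2, 0.45):
    print(f_norm, math.sin(2 * math.pi * f_norm * 0.01 / 2))
```

The printed amplitudes increase with the normalized input frequency for a fixed skew of 1% of Ts, which is the frequency dependence that separates timing skew from gain mismatch.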

Figure 2.8 illustrates the impact of timing skew for a 4 channel ADC. The number of points in the FFT is 1000. The fundamental, denoted by a diamond marker, is at a frequency of fs/10. The distortion tones, denoted by circular markers, are at 7fs/20, 2fs/5, and 3fs/20.

The speed of the signal is also captured in the autocorrelation function R(τ) for a wide-sense stationary process. It is possible to derive statistical bounds on the skew as done in [14]. Consider the output of the TI ADC y[n], which can be separated into two terms: xo[n] and a residue error term e[n].

\[
y[n] = x_o[n] + e[n]
\tag{2.15}
\]

The term xo[n] is a uniformly sampled version of x(t) that is the best fit to the output, with a skew of ∆t. In general there will be an optimal ∆t for an N channel converter in which each channel has a different skew value ∆ti.

\[
x_o[n] = x(nT_s - \Delta t)
\tag{2.16}
\]

Figure 2.8: 8-bit 4 Channel Time-Interleaved ADC Spectrum with Timing Skew (Markers: Fundamental - Diamond, Skew Distortion - Circular; axes: Normalized Frequency (f/fs) versus Amplitude (dBFS))

To find this optimal value, the mean square error defined by E[e[n]²] − E[e[n]]² is minimized, which occurs when the autocorrelation is maximized as shown in equation (2.17), where ∆ti again are the skews of each sub-ADC.

\[
\Delta t = \underset{\Delta t}{\arg\max}\sum_{i=0}^{N-1} R(\Delta t_i - \Delta t)
\tag{2.17}
\]

Given that the signal-to-noise ratio (SNR) due to B-bit quantization is SNR = (3/2)(2^(2B)), it can be shown [14] that the variance of the timing skew must be bounded by equation (2.18) for a B-bit N-channel time-interleaved ADC.

\[
\sigma_{\Delta t}^2 \le \left(\frac{N}{N-1}\right)\left(\frac{2}{3\,(2^{2B})\,|R''(0)|}\right)
\tag{2.18}
\]

The second derivative of the autocorrelation function R''(0) is equal to −(2πf)² for a sinusoidal input signal of frequency f and the bound becomes

\[
\sigma_{\Delta t}^2 \le \left(\frac{N}{N-1}\right)\left(\frac{2}{3\,(2^{2B})\,(2\pi f)^2}\right)
\tag{2.19}
\]

This result is intuitive as the requirement on skew tightens as the number of bits, interleaved channels, or input frequency increases. Figure 2.9 shows a plot of the standard deviation σ∆t as a function of input frequency and ADC resolution for an 8 channel time-interleaved ADC. Note that a sub-picosecond standard deviation for skew is generally required for high speed (> 5 GHz) and high resolution (> 5 bits) ADCs. For an ADC resolution of 6 bits, the standard deviation must be less than 434 fs for a 5 GHz input and 217 fs for a 10 GHz input. For an ADC resolution of 8 bits, the standard deviation must be less than 108.5 fs for a 5 GHz input and 54.3 fs for a 10 GHz input.
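The quoted limits can be reproduced directly from the sinusoidal-input bound. The following is a minimal sketch; the function name and argument names are hypothetical, and the input frequency is given in Hz.

```python
import math


def skew_sigma_bound(n_channels, bits, f_in_hz):
    """Upper bound on the skew standard deviation (seconds) for a sinusoidal input."""
    variance = (n_channels / (n_channels - 1)) * 2.0 / (
        3.0 * 2 ** (2 * bits) * (2 * math.pi * f_in_hz) ** 2
    )
    return math.sqrt(variance)


# 8 channel ADC, as in the figure 2.9 discussion
for bits in (6, 8):
    for f_in in (5e9, 10e9):
        sigma_fs = skew_sigma_bound(8, bits, f_in) * 1e15
        print(f"{bits} bits, {f_in / 1e9:.0f} GHz: {sigma_fs:.1f} fs")
```

Each additional bit of resolution halves the allowed standard deviation, as does each doubling of the input frequency.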

Figure 2.9: Statistical Bound on Timing Skew (Standard Deviation of Skew (ps) versus ADC Resolution [Bits], with curves for f = 2.5 GHz, 5 GHz, 7.5 GHz and 10 GHz)

2.1.4 Bandwidth Mismatch

Bandwidth mismatch results from the use of distributed track and hold samplers. If the front-end sampler in figure 2.6 did not exist, then each sub-ADC would have its own individual track and hold, which in general are not identical. Bandwidth mismatch can most easily be thought of as a combination of AC gain and phase mismatch/timing skew. This implies that the distortion generated will be at the same frequencies as listed above for gain and timing skew. Indeed, for a 2 channel ADC, the interleaved output is given by equation (2.20) [15], where the distortion appears at ωs/2 − ω.

\[
y[n] = B_s\cos(\omega n T_s + T_s + \theta_s) + B_n\cos\left[\left(\frac{\omega_s}{2} - \omega\right) n T_s + T_s + \theta_n\right]
\tag{2.20}
\]

Just like timing skew, the degradation is much worse at high frequency. In general, it is considered a second order effect since it can be remedied by careful design of the track and hold in front of the sub-ADC. Calibration can also be performed using a small test signal and FIR filters [15] [16]. Bandwidth mismatch is compensated in this work by performing gain and timing skew calibration at each frequency point.

2.1.5 Jitter

Sampling time uncertainty depends on both skew and jitter. While skew is a deterministic error in the sampling point, jitter is a random process that is modelled by a Gaussian distribution. In general, measured jitter profiles need not be Gaussian, as deterministic jitter resulting from systematic errors in the system changes the distribution. Data dependent jitter (DDJ), duty cycle distortion jitter, and sinusoidal jitter are other types of deterministic jitter that are modelled through probability density fitting [17]. All electronic systems experience jitter and it is not a problem unique to time-interleaved ADCs. When designing oscillators, designers evaluate the performance based on phase noise, which is the power spectral density of jitter. Phase noise analysis is complex; for an integrated circuit oscillator, phase noise is comprised primarily of thermal noise and up-converted 1/f flicker noise around the oscillator fundamental frequency. A clock’s Gaussian jitter distribution is completely defined by its variance (since it may always be normalized to have a mean of zero). The bound on jitter can be derived using equation (2.19) since it is equivalent to having an infinite number of time-interleaved ADCs sampling at an infinite number of skewed sampling points [14]. The expression is therefore equation (2.21) shown below, as verified in [18].

\[
\sigma^2 \le \frac{2}{3\,(2^{2B})\,(2\pi f)^2}
\tag{2.21}
\]

To achieve an ADC resolution of 6 bits, the standard deviation of jitter must be less than 406 fs for 5 GHz and 203 fs for 10 GHz inputs.

2.1.6 Summary

Table 2.1 summarizes the effects of different types of mismatch on time-interleaved ADC performance. Since the magnitude of the error caused by gain and offset mismatch is independent of input frequency, they are referred to as static mismatches. Static mismatches are generally easy to correct as shown in the next section. Timing skew on the other hand is difficult to correct as the error grows with input frequency, making it especially important for broadband time-interleaved ADCs. The model in figure 2.10 can be used to include all the effects described in this section. The offset mismatch can be modelled by the addition of a DC term O, the gain mismatch as a multiplicative term G and the timing skew as a shift in the sampling time position of ∆t. Bandwidth mismatch has been modelled by including an input frequency f dependence on G and ∆t.

Table 2.1: Summary of Time-Interleaved Mismatches for N sub-ADCs with Input Frequency of ω, k = 1, 2, ..., N-1

Type of Mismatch    Distortion Frequency    Dependency on Input Magnitude    Dependency on Input Frequency
Offset              (k/N)ωs                 Independent                      Independent
Gain                (k/N)ωs ± ω             Linearly dependent               Independent
Timing Skew         (k/N)ωs ± ω             Linearly dependent               Linearly dependent
Bandwidth           (k/N)ωs ± ω             Nonlinearly dependent            Nonlinearly dependent

2.2 Calibration of Time Interleaved ADC

In general, even with careful design, it is impossible to constrain the effects of time varying errors within the limits of the design specifications. For instance, it is impossible to guarantee a sub-picosecond timing skew standard deviation given process and layout mismatches in the clock paths in sub-micron CMOS processes to achieve 5 bit resolution at 5 GS/s. Therefore calibration is necessary for a time-interleaved ADC. Calibration schemes can be grouped into two categories: background and foreground calibration. For background calibration schemes, the ADC is able to function normally while the calibration is taking place. Foreground calibration on the other hand interrupts the normal operation, for instance by requiring a special input to be applied. Although common foreground calibration can be performed at start-up, such as offset calibration for differential comparators by shorting the input terminals, in general background calibration is preferred as it can be performed over time as the circuit behaviour changes. Calibration can be done completely in the digital domain or using a mixed-signal approach where detection is performed in the digital domain and correction is applied to analog/mixed signal components in the circuit, as shown in figure 2.11 in the context of TI ADCs. This section outlines the calibration schemes for gain, offset and timing skew mismatch. It also briefly examines nonlinearity correction in the context of the successive approximation register (SAR) ADC implemented in this project. The rest of this section is dedicated to existing timing skew correction schemes.

Figure 2.10: Complete Model of Time Varying Errors of a Time Interleaved ADC (O as offset mismatch, G as frequency dependent gain mismatch, and ∆t as frequency dependent timing skew)

Figure 2.11: Calibration Schemes for Time-Interleaved ADC (Top: All Digital Scheme, Bottom: Mixed Signal Scheme)

2.2.1 Calibration of Static Errors

Recall that gain and offset mismatches are considered static errors due to their independence of input frequency. Detecting the offset is simple as it is equivalent to estimating the mean of each channel (assuming a zero mean input signal). Correction is then done by subtracting the mean so that each channel has zero offset. Similarly, gain errors can be detected by estimating the signal power [y(n)]². Several implementations are available in literature [19] [20] [21].

In addition to gain and offset mismatch, non-linearity, characterized using differential non-linearity (DNL) and integral non-linearity (INL) metrics, also impacts the performance of ADCs. Specifically in SAR ADCs, non-linearity in the radix or base is caused by parasitic capacitance in the capacitive DAC array. Each bit is no longer weighted by a perfect power of 2 in a radix-2 converter. In this work a simple foreground approach is chosen to correct these errors. This approach is taken from [22] and illustrated in figure 2.12. After applying a known sinusoidal input to each ADC channel, the gain G, offset o, and phase information θ are extracted from the fast Fourier transform (FFT) spectrum. A sinusoid is then reconstructed from this information with an ideal radix-2 weighting, i.e. the least significant bit (LSB) is weighted by 2^0 and the most significant bit (MSB) is weighted by 2^(N−1) for N-bit quantization. This result is compared to the actual output and the weights α0...N−1 are adjusted using a least mean squares (LMS) algorithm. If the ADC were ideal, all weights α0...N−1 would converge to 1.
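As a rough illustration of the static error detection described above, the sketch below estimates each channel's offset from its mean and each channel's gain from its power, then normalizes every channel to channel 0. This is an assumed minimal model, not the implementation of [19] [20] [21]; the function name is hypothetical and the input is assumed to be zero mean.

```python
import math


def calibrate_static(y, n_channels):
    """Estimate per-channel offset (mean) and gain (RMS relative to channel 0),
    then return the corrected interleaved sequence with both estimates."""
    channels = [y[i::n_channels] for i in range(n_channels)]
    offsets = [sum(c) / len(c) for c in channels]
    centered = [[v - o for v in c] for c, o in zip(channels, offsets)]
    powers = [sum(v * v for v in c) / len(c) for c in centered]
    gains = [math.sqrt(p / powers[0]) for p in powers]
    corrected = list(y)
    for i in range(n_channels):
        corrected[i::n_channels] = [v / gains[i] for v in centered[i]]
    return corrected, offsets, gains


# Example: 2 channel ADC with assumed gain and offset mismatch on a sinusoid
w = 2 * math.pi * 0.309017
true_gain, true_offset = (1.0, 1.1), (0.02, -0.03)
y = [true_gain[n % 2] * math.sin(w * n + 0.3) + true_offset[n % 2] for n in range(20000)]
_, est_offsets, est_gains = calibrate_static(y, 2)
print(est_offsets, est_gains)
```

Using channel 0 as the gain reference is a design choice made here for simplicity; normalizing to the average power of all channels would work equally well.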

2.2.2 Overview of Calibration Techniques for Timing Skew

Timing skew can be corrected using digital means by finite impulse response (FIR) filters [20] [23] [24]. In [20] this required a significant area of 5 mm² in a 0.35-µm CMOS process and also required the adaptive filters to run at full speed, therefore consuming 190 mW, comparable to the analog blocks themselves which consume 171 mW. To simplify the digital backend, several mixed signal approaches have been proposed. Generally a cost function is used to change the phase of the clock by using a delay in the clock path. As long as the extra delay buffers inserted do not contribute additional phase noise, the performance impact will be minimal.

El-Chammas Crosscorrelation Approach

Equation (2.17) indicated that maximizing the SNR is equivalent to maximizing the autocorrelation function. However, since the autocorrelation of the input signal cannot be computed using only sub-ADC outputs, El-Chammas’ scheme in [25] uses an extra ADC channel to compute the crosscorrelation. Figure 2.13 illustrates the scheme in detail.

Figure 2.12: Radix, Gain and Offset Foreground Calibration

Figure 2.13: Crosscorrelation Maximization using Reference ADC

The reference ADC, referred to as the calibration (CAL) ADC, is used to compute an approximation of the crosscorrelation R(τ) given by equation (2.22), where y[n] is the output of the actual ADC and yc[n] is the output of the CAL ADC. Note that if yc[n] is taken at sampling time kTs, then y[n] is taken at sampling time kTs + τ. The number of samples used in this computation is M.

\[
R(\tau) = \frac{1}{M}\sum_{n=1}^{M} y[n]\,y_c[n]
\tag{2.22}
\]

Note that the CAL ADC is provided with a clock φcal which is an ideal clock that the clock φ must align to. The output of the digital backend detection is used to tune an analog delay such that the edge φ advances to φa as the crosscorrelation is maximized. The CAL ADC can in fact be a comparator since the crosscorrelation is simply modified to be (2/π) sin⁻¹(R(τ)) according to the Van Vleck relationship [26], which is still monotonic. It is however more susceptible to offset, which introduces a flat region [25]. Another advantage is that the digital backend can operate at a reduced speed, thereby saving power. A full implementation using 8 channels is shown in figure 2.14. The 8 phases are generated by a multiphase clock generator (MPCG) and are individually controlled using delay buffers. The generation of φcal introduces extra circuitry in the form of a phase locked loop (PLL) or an equivalent clock generator since the reference phases must be almost ideal/skew-less. This is easier to accomplish as the phases can be locally generated next to the CAL comparator at a slower speed. The clock φcal must switch between the edges of all sub-ADCs φ0 to φ7 while the calibration keeps track of which crosscorrelation it is computing. In other words the reference CAL samples together with channel 1, then channel 2 and so on. Since the calibration is based on crosscorrelation, the accuracy depends on the number of samples M. El-Chammas demonstrates the loop adapting using a steepest descent optimizer which shows convergence in 20 cycles, each consisting of 500,000 samples, hence 10^7 total samples. Note that the presence of the additional CAL ADC may introduce additional errors into the system such as kickback when it is active.

Figure 2.14: Implementation of Crosscorrelation Maximization For 8 Channel ADC
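The behaviour of the crosscorrelation estimate can be illustrated with a simplified model. The sketch below is an assumption-laden illustration rather than El-Chammas' implementation: time is normalized to Ts = 1, the input is a single sinusoid, the CAL channel samples at every k (not at a reduced rate), and the function name and default parameters are hypothetical.

```python
import math


def crosscorrelation(tau, f_norm=0.309017, m=20000, theta=0.3):
    """Average of y[k] * yc[k] over m samples, with y taken tau later than yc."""
    w = 2 * math.pi * f_norm
    acc = 0.0
    for k in range(m):
        yc = math.cos(w * k + theta)          # CAL ADC, ideal instant k*Ts
        y = math.cos(w * (k + tau) + theta)   # sub-ADC, instant k*Ts + tau
        acc += y * yc
    return acc / m


# Sweep the relative delay and keep the value that maximizes the estimate
taus = [i * 0.05 for i in range(-4, 5)]
best_tau = max(taus, key=crosscorrelation)
print(best_tau)
```

In this model the estimate peaks when the sub-ADC edge coincides with the CAL edge, which is the condition the analog delay is tuned towards.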

Stepanovic Taylor Approximation Approach

Stepanovic’s approach [27] is to use a first order Taylor expansion to approximate the error using the function’s derivative as shown in figure 2.15. If an estimate of the derivative D is available then the timing skew ∆t between the ideal clock edge and actual clock edge can be estimated by dividing the error by D.

Figure 2.15: Error Estimation using First Order Taylor Series
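The first order estimate can be tried on a known waveform. The sketch below assumes a 1 GHz sinusoid with an analytically known derivative and a 1 ps skew; the function name and all numeric values are illustrative assumptions, not taken from [27].

```python
import math


def estimate_skew(x, dx_dt, t_ideal, t_actual):
    """First order Taylor estimate: error between the two samples divided by D."""
    error = x(t_actual) - x(t_ideal)
    return error / dx_dt(t_ideal)


f_in = 1e9  # assumed 1 GHz input


def x(t):
    return math.sin(2 * math.pi * f_in * t)


def dx_dt(t):
    return 2 * math.pi * f_in * math.cos(2 * math.pi * f_in * t)


estimated = estimate_skew(x, dx_dt, 0.1e-9, 0.1e-9 + 1e-12)
print(estimated)
```

The estimate degrades near the peaks of the input where D approaches zero, so a practical loop would average over many samples instead of relying on a single division.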

The ADC prototype uses 24 time-interleaved SAR ADCs. In order to estimate the derivative and calibrate for non-linearity effects, two extra channels SARt and SAR0 are added to the time-interleaved ADC. The idea is illustrated in figure 2.16 [27]. Similar to El-Chammas’ approach, the clock to the reference channel SAR0 is at a frequency of fs/(N+1) while all other channels sample at fs/N, so that the reference SAR0 samples together with channel 1, then channel 2 and so on. The function of SARt, which always samples together with SAR0, is to generate the derivative estimation by introducing a bandwidth mismatch. The bandwidth mismatch is implemented as an addition of ∆R in the signal path. The transfer function D(s) of the subtraction of two voltages going through two paths of different bandwidth includes a zero at ω = 0, therefore approximating the derivative. The two high frequency poles in the transfer function introduce error in the estimation but do not degrade the performance. Once the derivative is known, an LMS engine drives an analog delay ∆tx to correct the skew of SARx. The calibration convergence time is improved as it does not require a crosscorrelation computation, at the expense of an additional channel SARt. In this case SARt must have the same resolution as all the other SAR channels. Since this approach utilizes the derivative information, the derivative should be stationary for convergence.

Figure 2.16: Implementation of Error Estimation using Bandwidth Mismatch. The transfer function given in the figure is

\[
D(s) = V_{OUT2}(s) - V_{OUT1}(s) = \frac{sC\Delta R}{(1 + sCR)\left(1 + sC(R + \Delta R)\right)}
\]

Huang and Wang Zero Crossing Detection Approach

The approach [28] [29] is to use zero crossing information from the sub-ADC outputs to estimate the skew. The advantage is that no extra ADC channels are needed. Consider a 2 channel ADC sampled by 2 phases φ0 and φ1 to yield x0[k] and x1[k] as shown in figure 2.17. A zero-crossing (ZC) occurs if the polarity of the signal switches, such as between x0[1] and x1[1]. It can be shown [28] that the probability of a ZC between two adjacent samples xj[k] and xj+1[k] is proportional to the skew ∆t and Ts. Zero crossing information can be considered as a 1 bit correlation between adjacent samples. For instance, consider again the 2 channel system. The number of zero crossings is counted between odd samples sampled by φ0 and even samples sampled by φ1. Then the number of zero crossings is counted between even samples and odd samples. If these two values are equal, both 1 as shown by the change in polarity of the black samples in the figure, then the skew is zero. However, if φ0 is shifted so that the input is no longer sampled uniformly, as shown by the samples in red, then the numbers of zero crossings will not be equal. In the case illustrated in the figure, the number of zero crossings is 2 between odd samples and even samples, and 0 between even samples and odd samples.

Figure 2.17: Sampling Sequence for 2 Channel Time-Interleaved ADC (red samples by shifted φ0)

A full implementation of the system is shown in figure 2.18 [28] for an 8 channel ADC. The ZC detector detects ZC events zj[k], which are compared to m[k]; their difference, denoted by U[k], is accumulated. The sequence m[k] is generated by a ZC recorder which adds all ZC from channel 1 to channel 8 such that the average of this sequence represents the nominal sampling interval. The signal U[k] is first accumulated by an accumulator with reset (AAR) into three levels [−1, +1, 0] when the thresholds are [≤ Nc, ≥ Nc, otherwise]. The result S[k] is accumulated again by accumulator ACC to generate Tj[k], which controls the delay of the jth analog clock buffer. The channels are calibrated sequentially with one channel functioning as reference. For instance, channel 0 is used as the starting reference and channel 1 is calibrated to minimize the skew between itself and channel 0. Then channel 1 is used as the reference to calibrate channel 2 and so on. Since this scheme depends on the probability of ZC, it requires a significant number of samples. In the example given in [29] the calibration required 2^24 samples to converge. In addition, the ZC probability depends on the input frequency such that the calibration fails for certain frequencies when the clock becomes synchronous to the data, that is fin/fclk = a/b, where a and b are mutually prime integers [28].
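The dependence of the zero crossing counts on skew can be illustrated for the 2 channel case. The sketch below is a simplified model under stated assumptions, not the hardware of [28] [29]: time is normalized to Ts = 1, the input is a sinusoid whose frequency is deliberately not a simple fraction of the clock, the skew is applied to the odd samples, and the function name is hypothetical.

```python
import math


def zero_crossing_counts(f_norm, skew_norm, n_samples=100000, theta=0.3):
    """Count polarity changes from even to odd samples and from odd to even
    samples when the odd samples are taken skew_norm (in units of Ts) late."""
    w = 2 * math.pi * f_norm
    positive = []
    for n in range(n_samples):
        t = n + (skew_norm if n % 2 else 0.0)
        positive.append(math.sin(w * t + theta) > 0)
    even_to_odd = sum(positive[n] != positive[n + 1] for n in range(0, n_samples - 1, 2))
    odd_to_even = sum(positive[n] != positive[n + 1] for n in range(1, n_samples - 1, 2))
    return even_to_odd, odd_to_even


f_norm = (math.sqrt(5) - 1) / 4  # assumed input frequency, about 0.309 fs
print(zero_crossing_counts(f_norm, 0.0))
print(zero_crossing_counts(f_norm, 0.05))
```

With no skew the two counts are nearly equal, while a late odd sample lengthens the even-to-odd interval and raises that count at the expense of the other. The difference between the two counts therefore carries the sign of the skew.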

Figure 2.18: ZC Detection Implementation for 8 Channel ADC (LMS Update: ∆tj[k] = ∆tj,0 + µTj[k])

Razavi Autocorrelation Approach

The work by Razavi was first published at the Custom Integrated Circuits Conference (CICC) in September 2012 [30] and later appeared in the Journal of Solid-State Circuits (JSSC) in 2013 [31]. The background mixed signal calibration approach also uses a correlation, however it only needs to use the existing ADC channels. Consider a 2 channel ADC which has a sampling sequence as shown in figure 2.19a. The sampling period is Ts and each phase has a period of TCK = 2Ts. Suppose that a skew ∆t exists so that the phase φ1 is too slow. Then the time difference between the even sample y1[k−1] and the previous odd sample y0[k−1] is Ts + ∆t, and between the even sample y1[k−1] and the next odd sample y0[k] it is Ts − ∆t. Consider the products given below and their equivalent expressions when the sample time of y1[k−1] is set to reference time 0.

\[
\begin{aligned}
g_{0,1} &= y_0[k]\,y_1[k-1] = x(T_s - \Delta t)\,x(0) \\
g_{1,0} &= y_1[k-1]\,y_0[k-1] = x(0)\,x[-(T_s + \Delta t)]
\end{aligned}
\tag{2.23}
\]

It is now clear that the mean or expected values E[g0,1] and E[g1,0] are the values of the autocorrelation function Rx(τ) evaluated at Ts − ∆t and −(Ts + ∆t) respectively. The difference of these two values is a function of the skew ∆t, and a linear function if ∆t is small, as shown in equation (2.24).

\[
\begin{aligned}
E[g_{0,1}] - E[g_{1,0}] &= R_x(T_s - \Delta t) - R_x[-(T_s + \Delta t)] \\
&= R_x(T_s - \Delta t) - R_x(T_s + \Delta t) \\
&\approx -2\Delta t\,\frac{dR_x(\tau)}{d\tau}\bigg|_{\tau = T_s}
\end{aligned}
\tag{2.24}
\]

This difference can be used as a cost function as shown in the block diagram implementation in figure 2.19b. Although Razavi did not expand this approach to ADCs with more than 2 channels, it can be used for ADCs with a larger number of channels as explained later in this chapter.
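A small numerical experiment shows how this difference tracks the skew. The sketch below is an illustration under assumptions, not Razavi's implementation: time is normalized to Ts = 1, the input is a unit-amplitude sinusoid (so Rx(τ) = cos(ωτ)/2), the skew is applied to the φ1 samples, and the function name is hypothetical.

```python
import math


def autocorrelation_cost(f_norm, skew_norm, n_pairs=50000, theta=0.3):
    """Time average of y0[k]*y1[k-1] minus time average of y1[k-1]*y0[k-1]."""
    w = 2 * math.pi * f_norm

    def x(t):
        return math.cos(w * t + theta)

    g01 = 0.0
    g10 = 0.0
    for k in range(1, n_pairs + 1):
        y0_prev = x(2 * k - 2)              # y0[k-1]
        y1_prev = x(2 * k - 1 + skew_norm)  # y1[k-1], late by the skew
        y0_now = x(2 * k)                   # y0[k]
        g01 += y0_now * y1_prev
        g10 += y1_prev * y0_prev
    return (g01 - g10) / n_pairs


f_norm = (math.sqrt(5) - 1) / 4  # assumed input frequency, about 0.309 fs
for skew in (-0.05, 0.0, 0.05):
    print(skew, autocorrelation_cost(f_norm, skew))
```

For this input the cost changes sign with the skew and is close to zero when the skew is zero, so a simple sign-based update of the clock delay is sufficient to drive it towards alignment.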

Figure 2.19: (a) Effect of Timing Mismatch on 2 Channel ADC, and (b) Block Diagram Implementation of Autocorrelation Approach

Chapter 3

Proposed Timing Skew Calibration Technique

The previous chapter introduced several methods of timing skew calibration. In general the approach is to estimate the error with a method, such as a cost function, and then correct it by applying a delay to the appropriate clock path. The approaches taken by El-Chammas and Stepanovic both required extra reference ADC channel(s). The reference channel also needed a skew-less clock generator, thus requiring additional analog circuitry. The work by Huang does not require an extra channel but has a long convergence time due to the use of zero crossing information. The calibration scheme described in this chapter is proposed as a background mixed-signal approach utilizing only the existing ADC channels while achieving fast convergence time. It is very similar to Razavi’s approach. The proposed approach was arrived at independently given that Razavi’s work was unpublished at the time. Key differences will be noted between the proposed approach and Razavi’s approach. An extension of both the proposed approach and Razavi’s approach to converters with more than 2 channels is made using two different calibration sequences A and B.

3.1 Cost Function Construction using Clock Swapper

The first order approximation of the Taylor series expansion for a function f(t) around t0 is given in equation (3.1), where D represents the first derivative, D′ represents the second derivative and so on. This is utilized in deriving the cost function for this approach.

\[
\begin{aligned}
f(t) &= f(t_0) + [D(t_0)](t - t_0) + \frac{1}{2}[D'(t_0)](t - t_0)^2 + \cdots \\
     &\approx f(t_0) + [D(t_0)](t - t_0)
\end{aligned}
\tag{3.1}
\]

Note that the skew in general should be much less than one sampling clock cycle given reasonable circuit design.

Consider a 2 channel time interleaved ADC and suppose there existed a clock swapper block that allowed the clock phases to the ADC channels 0 and 1 to be exchanged as shown in figure 3.1. In one case, ADC0 samples first, and in the other case ADC1 samples first. There exists a skew ∆t in the sampling path of ADC1. However, by using the clock swapper, this skew will shift the sampling time of ADC0 or ADC1 depending on the swapping sequence. In case 1 (figure 3.1a), ADC0 samples before ADC1 at kTs, then ADC1 samples at (k+1)Ts + ∆t. In case 2 (figure 3.1b), ADC1 samples first at jTs + ∆t, then ADC0 samples at (j+1)Ts.

Following the Taylor expansion evaluated at the sampling time t0 of the 0° clock, the errors f(t) − f(t0) will be

\[
\begin{aligned}
e_1 &= f(t) - f(t_0) \\
    &= [D(t_0)](t - t_0) \\
    &= [D(kT_s)]\left((k+1)T_s + \Delta t - kT_s\right) \\
    &= [D(kT_s)](T_s + \Delta t) \\
    &= D_1(T_s + \Delta t) \\
e_2 &= f(t) - f(t_0) \\
    &= [D(t_0)](t - t_0) \\
    &= [D(jT_s + \Delta t)]\left((j+1)T_s - (jT_s + \Delta t)\right) \\
    &= [D(jT_s + \Delta t)](T_s - \Delta t) \\
    &= D_2(T_s - \Delta t)
\end{aligned}
\tag{3.2}
\]

where D1 and D2 are the derivatives evaluated at time kTs and jTs + ∆t respectively. This eliminates the need for an additional ADC channel where an ideal clock is used as reference. Consider the cost function given below in equation (3.3), where the operator E denotes the expectation or time average for an ergodic process. If E[|D1|] = E[|D2|] = C, a constant, then the cost function reduces to a linear function in terms of skew.

\[
\begin{aligned}
g &= E[|e_1|] - E[|e_2|] \\
  &= E[|D_1(T_s + \Delta t)|] - E[|D_2(T_s - \Delta t)|], && \Delta t \ll T_s \\
  &= (T_s + \Delta t)E[|D_1|] - (T_s - \Delta t)E[|D_2|], && E[|D_1|] = E[|D_2|] = C \\
  &= 2\Delta t\,C
\end{aligned}
\tag{3.3}
\]
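The behaviour of this cost function can be explored with an idealized clock swapper model. The sketch below makes several assumptions that are not part of the original text: time is normalized to Ts = 1, the input is a slow sinusoid so that the first order approximation is reasonable, both cases are averaged over the same sample indices, and the function name is hypothetical.

```python
import math


def swapper_cost(f_norm, skew_norm, n_pairs=100000, theta=0.3):
    """g = E[|e1|] - E[|e2|] for an ideal, skew-less clock swapper."""
    w = 2 * math.pi * f_norm

    def x(t):
        return math.sin(w * t + theta)

    e1 = 0.0
    e2 = 0.0
    for m in range(n_pairs):
        k = 2 * m
        # Case 1: ADC0 samples at k*Ts, ADC1 at (k+1)*Ts + skew
        e1 += abs(x(k + 1 + skew_norm) - x(k))
        # Case 2: ADC1 samples at k*Ts + skew, ADC0 at (k+1)*Ts
        e2 += abs(x(k + 1) - x(k + skew_norm))
    return (e1 - e2) / n_pairs


f_norm = (math.sqrt(5) - 1) / 20  # assumed slow input, about 0.062 fs
c = 2 * math.pi * f_norm * 2 / math.pi  # E[|D|] for a unit-amplitude sinusoid
for skew in (-0.05, 0.0, 0.05):
    print(skew, swapper_cost(f_norm, skew), 2 * skew * c)
```

For this slow input the simulated cost should come out close to the first order prediction 2∆tC printed alongside it, with the same sign as the skew and a value of zero when the skew is zero.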


Figure 3.1: Error Generation using Clock Swapper. a) ADC0 samples at kTs and ADC1 at (k+1)Ts + ∆t (spacing Ts + ∆t, error term D1(Ts + ∆t)); b) ADC1 samples at jTs + ∆t and ADC0 at (j+1)Ts (spacing Ts − ∆t, error term D2(Ts − ∆t)).


Assuming ergodicity, the values of E[|D1|] and E[|D2|] are the M sample means given by

\[
\begin{aligned}
E[|D_1|] &= \frac{1}{M} \sum_{k=1}^{M} |D(kT_s)| \\
E[|D_2|] &= \frac{1}{M} \sum_{j=1}^{M} |D(jT_s + \Delta t)|
\end{aligned}
\tag{3.4}
\]

and the constant value criterion implies that the derivatives should be stationary.

Assuming all the higher order derivatives exist and are stationary, the remaining terms of the Taylor expansion will be of the form (x + y)^n − (x − y)^n where x = Ts and y = ∆t. This can be expanded using the binomial theorem.

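\[
\begin{aligned}
(x+y)^n - (x-y)^n &= \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k} - \sum_{k=0}^{n} \binom{n}{k} x^{n-k} (-1)^{k} y^{k} \\
&= \sum_{k=0}^{n} \binom{n}{k} x^{n-k} \left( y^{k} - (-1)^{k} y^{k} \right) \\
&= 2 \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{2k+1} x^{n-(2k+1)} y^{2k+1}
\end{aligned}
\tag{3.5}
\]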
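The identity in equation (3.5) can be verified with exact integer arithmetic. This is a small sketch; the helper name `odd_terms` is hypothetical, and the sum is stopped at ⌊(n−1)/2⌋, which is equivalent because the binomial coefficient is zero whenever 2k+1 exceeds n.

```python
from math import comb

def odd_terms(x, y, n):
    """Twice the sum of the odd-power terms of the binomial expansion of (x + y)^n."""
    return 2 * sum(
        comb(n, 2 * k + 1) * x ** (n - (2 * k + 1)) * y ** (2 * k + 1)
        for k in range((n - 1) // 2 + 1)
    )

x, y = 7, 3  # arbitrary integer test values
for n in range(1, 9):
    lhs = (x + y) ** n - (x - y) ** n
    # The two sides agree, and flipping the sign of y flips the sign of the result.
    print(n, lhs, odd_terms(x, y, n), odd_terms(x, -y, n))
```

Only odd powers of y survive, which is why the remainder changes sign with y.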

Equation (3.5) guarantees that this remainder is an odd function of y, or ∆t in this case, since it is an odd-powered polynomial. This means the cost function is an odd function and is therefore, even considering the error terms, always equal to 0 when the skew is 0. For small ∆t the absolute value of the cost function |g| is thus an even function that has a single minimum at ∆t = 0.

This is equivalent to Razavi’s correlation approach in the sense that two skewed versions are created and their difference is used as the cost function. In Razavi’s case the two skewed versions are the autocorrelation function at Ts + ∆t and Ts − ∆t. Thus a DC term proportional to the skew is generated by multiplying/mixing the appropriate samples together. In the proposed skew calibration, a DC term proportional to the skew is created by using the absolute difference. Minimizing this cost function would reduce the skew to 0 if the clock swapper itself were skewless. However, this is not realistic in a hardware implementation. In addition, switching clocks generates transients that may impact the performance of the track and hold and therefore the ADC. The approach to remove the clock swapper is described next.


3.2 Cost Function Construction using Existing Phases

The purpose of this section is to show that the existing cost function can be used without a clock swapper. This in turn will lead to the natural extension of the cost function for calibrating skew of time interleaved ADCs with more than 2 channels.

For a sinusoidal function of frequency ω, f(t) = cos(ωt), consider the generic error terms given in equation (3.6) where α and β are timing skews.

\[
\begin{aligned}
e_1 &= f(t) - f(t + \alpha) \\
e_2 &= f(t) - f(t - \beta)
\end{aligned}
\tag{3.6}
\]

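To arrive at analytical expressions for the error terms, consider the summation of two general complex exponentials Ae^{j(ωt+θ1)} and Be^{j(ωt+θ2)}. By converting between rectangular and polar forms, one arrives at the result in equation (3.7).

\[
\begin{aligned}
A e^{j(\omega t + \theta_1)} + B e^{j(\omega t + \theta_2)} &= e^{j\omega t} \left( A e^{j\theta_1} + B e^{j\theta_2} \right) \\
&= e^{j\omega t} \left[ A\cos(\theta_1) + B\cos(\theta_2) + j\left( A\sin(\theta_1) + B\sin(\theta_2) \right) \right] \\
&= \sqrt{[A\cos(\theta_1) + B\cos(\theta_2)]^2 + [A\sin(\theta_1) + B\sin(\theta_2)]^2} \\
&\quad \times e^{j\left\{ \omega t + \tan^{-1}\left[ \frac{A\sin(\theta_1) + B\sin(\theta_2)}{A\cos(\theta_1) + B\cos(\theta_2)} \right] \right\}}
\end{aligned}
\tag{3.7}
\]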

Using equation (3.7), the error terms can now be expressed as

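\[
\begin{aligned}
e_1 &= \sqrt{2 - 2\cos(\omega\alpha)}\, \cos\left[ \omega t - \tan^{-1}\left( \cot\left( \frac{\omega\alpha}{2} \right) \right) \right] \\
e_2 &= \sqrt{2 - 2\cos(\omega\beta)}\, \cos\left[ \omega t + \tan^{-1}\left( \cot\left( \frac{\omega\beta}{2} \right) \right) \right]
\end{aligned}
\tag{3.8}
\]

The operation scales the magnitude of the function by √(2 − 2cos(θ)), which is monotonic for θ ≤ π and 2π periodic, and also adds a phase shift.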

Consider next the 2 channel system in figure 3.2 where the phase φ1 is skewed by ∆t. Note that the skew is equivalent to (α − β)/2. The cost function |E[|e1|] − E[|e2|]| given in equation (3.9) is exactly equal to zero when α = β, so that f(t) is equidistant between the two skewed versions of itself. A new constant K(ω) can be defined as √(2 − 2cos(ωα)) − √(2 − 2cos(ωβ)), knowing that the average value of the function |f|, denoted f̄, is the same no matter the phase shift.

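\[
\begin{aligned}
g &= \left| E[|e_1|] - E[|e_2|] \right| \\
&= \left| \left[ \sqrt{2 - 2\cos(\omega\alpha)} \right] \bar{f} - \left[ \sqrt{2 - 2\cos(\omega\beta)} \right] \bar{f} \right| \\
&= \left| \sqrt{2 - 2\cos(\omega\alpha)} - \sqrt{2 - 2\cos(\omega\beta)} \right| \bar{f} \\
&= |K(\omega)|\, \bar{f}
\end{aligned}
\tag{3.9}
\]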

Figure 3.2: Generalization of Skew for a 2 Channel Time-Interleaved ADC. The samples are f(t − β) = x[(2i)Ts] on φ0 (0°), f(t) = x[(2i+1)Ts] on φ1 (180°) and f(t + α) = x[(2i+2)Ts] on φ0 (360°), with α = Ts + ∆t and β = Ts − ∆t.

The idea of the clock swapper was to generate these two skewed versions. Note that if the skew is small, then using the approximation cos(θ) ≈ 1 − θ²/2, K(ω) reduces to ω(α − β) = 2ω∆t, which is a linear function of skew, agreeing with the approximation from equation (3.3). As shown in figure 3.3, with ω ranging from 0 to the Nyquist frequency π/Ts, K(ω) is non-linear but monotonic for −Ts ≤ ∆t ≤ Ts, guaranteeing that all realistic skew can be covered. From this it becomes clear that it is possible to use outputs sampled by existing phases as the two skewed versions of f(t), therefore arriving at Razavi’s solution but with a different cost function.
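The scale factor K(ω) and its small-angle approximation can be tabulated directly. The sketch below assumes the 2 channel case with α = Ts + ∆t and β = Ts − ∆t; the function names `scale` and `k_factor` and the sample values are illustrative. Note that the linear form 2ω∆t relies on √(2 − 2cos θ) ≈ θ, so it is only expected to be accurate when ωα and ωβ are themselves small.

```python
import math

def scale(theta):
    """Magnitude factor sqrt(2 - 2*cos(theta)) from equation (3.8)."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.cos(theta)))

def k_factor(omega, ts, dt):
    """K(omega) for alpha = Ts + dt and beta = Ts - dt (2 channel case)."""
    return scale(omega * (ts + dt)) - scale(omega * (ts - dt))

ts = 1.0
for omega in (0.1, 0.5 * math.pi, 0.9 * math.pi):
    for dt in (-0.5, -0.01, 0.01, 0.5):
        print(f"omega={omega:.3f} dt={dt:+.2f} "
              f"K={k_factor(omega, ts, dt):+.5f} 2*omega*dt={2 * omega * dt:+.5f}")
```

At low frequency and small skew the two printed columns should agree closely; at higher frequencies K(ω) departs from the linear form but keeps the sign of ∆t.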

Consider the timing for a two channel ADC illustrated in figure 3.2. The sampling phases are φ0 and φ1, or 0° and 180°. The 360° phase is simply the next edge of φ0. Ideally the 180° phase must be equidistant between the 0° and 360° phases. Therefore these two phases can be used as the reference phases instead of generating them using a clock swapper. The skew can then be minimized by tuning the delay of φ1, the 180° phase. The advantage of this is that the clock phases 0° and 360° are guaranteed to have a fixed spacing or zero average skew (since they are both edges of φ0). Therefore they can be used as an accurate starting point. A block diagram is shown in figure 3.4 with the sampling waveform on the left annotated with e1 and e2.

Figure 3.3: Scale Factor K(ω) for −Ts ≤ ∆t ≤ Ts and frequencies 0 < ω < π/Ts (horizontal axis: ∆t/Ts; vertical axis: Value).

Figure 3.4: Block Diagram of Skew Calibration for 2 Channel Time Interleaved ADC. Left: sampling waveform x(t) with samples y0[k−1], y1[k−1] and y0[k] taken on φ0, φ1 and φ0, spaced Ts + ∆t and Ts − ∆t and annotated with e1 and e2. Right: block labels sub-ADC 0, sub-ADC 1, z⁻¹ delays, ABS and LMS Engine.

This concept is similar to that of a phase locked loop (PLL) system where the phase difference is minimized between the reference clock and a divided version of the voltage controlled oscillator (VCO) output. In a sub-sampling PLL (SSPLL), a sub-sampling phase detector is directly used to sample the VCO output and this is compared to the reference without the need of a divider. The work in [32] uses this concept to introduce a charge pump structure that is less sensitive to mismatch. If the phase difference is zero, then the average value of the subsampled version of the VCO output is the same as the DC value of the VCO output. Therefore it also uses amplitude information to generate phase error information. In contrast, in the proposed skew calibration, two average errors e1 and e2 are generated from the amplitude differences of the corresponding phases and compared to each other.

3.3 Extending the Calibration to Converters with N Channels Using Calibration Sequence A

There are two ways to generalize this process to an N channel ADC where N is a power of 2. In this section calibration sequence A will be discussed. Consider a 4 channel ADC; the calibration sequence, shown in figure 3.5, follows by natural extension using equidistant phases. The phase φ0 is used as the reference phase. In part a, phase φ2 is tuned to the correct position using two neighbouring samples taken using φ0. This first step is the same as the 2 channel case, except now the two phases are ideally kTs apart, where k is 2 in this case. In part b, phase φ1 is tuned to the correct position using neighbouring samples from φ0 and φ2. Finally in part c, phase φ3 is tuned to the correct position using neighbouring samples from φ2 and φ0. The cost function as before is

\[
g = \left| \frac{1}{M} \sum_{o=0}^{M-1} \left( |e_1| - |e_2| \right) \right|
\tag{3.10}
\]

The error terms are generated by selecting the appropriate channels and subtracting the samples at each iteration step. Each step consists of taking the appropriate 3M samples to compute the time average of the errors over a range of NM samples. The time required to converge all channels will be (N − 1)(NMH)Ts, where H is the number of iterations to converge one channel and N is the total number of channels. Since each step depends on the result of the previous step, there will be an error propagation effect if convergence is not achieved. If the maximum residue skew error in any step is ∆, then the maximum residue error that can exist in the system is N∆/2: 1 phase will have an error of ∆, 2 phases will have an error of 2∆, 4 phases will have an error of 3∆ and so on. If the number of calibration steps is low, then this effect will be negligible, as the probability of each phase converging with a maximum residue error is low. One way to combat this is to use a smaller step (finer resolution) in the tuning delay buffer for each phase. Note that this calibration sequence creates additional signal constraints, since the samples it uses are now more than one Ts apart. Intuitively this means some information has been lost because the Nyquist sampling criterion has been violated.
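The ordering of calibration sequence A can be written down as a short routine. This is a sketch only: the 4 channel order matches the description above, while the extension to larger N by repeatedly halving the spacing is an assumption made here for illustration. The names `sequence_a` and `convergence_samples` are hypothetical.

```python
def sequence_a(n):
    """Return (tuned_phase, left_reference, right_reference) steps for n channels.

    n is assumed to be a power of 2; phase 0 is the fixed reference and
    index n wraps around to the next edge of phase 0.
    """
    steps = []
    span = n // 2
    while span >= 1:
        for phase in range(span, n, 2 * span):
            steps.append((phase, phase - span, (phase + span) % n))
        span //= 2
    return steps

def convergence_samples(n, m, h):
    """Sample periods needed to converge all channels: (N - 1)(N*M*H)."""
    return (n - 1) * (n * m * h)

print(sequence_a(4))
print(convergence_samples(4, 5000, 80))
```

For 4 channels the steps come out as φ2 between two φ0 edges, then φ1 between φ0 and φ2, then φ3 between φ2 and φ0; multiplying the sample count by Ts gives the convergence time.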


Figure 3.5: Calibration Sequence A for a 4 Channel Time-Interleaved ADC (k = 2). Panels a), b) and c) show the phases φ0 (0°), φ1 (90°), φ2 (180°), φ3 (270°) and φ0 (360°), with α = kTs + ∆t and β = kTs − ∆t.


3.4 Limitations of Calibration Sequence A Using Proposed Cost Function

Consider K(ω) from equation (3.9). The absolute value of this function, |K(ω)| = |K1(ω) − K2(ω)|, is plotted in figure 3.6, where K1(ω) = √(2 − 2cos(ωα)) and K2(ω) = √(2 − 2cos(ωβ)), for ω = 0.45·2π/Ts. Looking at the positive skew range, this function is non-monotonic for ∆t ≥ 0.22Ts and therefore may render the cost function non-monotonic for realistic skew. The reason is the periodic nature of K(ω) and the fact that the time difference is now much larger (ideally kTs between phases for a 2k channel converter in the first step of the calibration sequence). Also note that α + β = 2kTs, a constant, so that K1(ω) and K2(ω) are shifted versions of the same function. The monotonicity is violated when K1(ω) or K2(ω) is zero. The function K1(ω) is zero when the angle inside the cosine, ωα, is p2π, p ∈ Z.

Figure 3.6: Nonlinear |K(ω)|, −Ts ≤ ∆t ≤ Ts, ω = 0.45·2π/Ts (curves: K1(ω), K2(ω), |K(ω)|; horizontal axis: ∆t/Ts; vertical axis: Value).

For a converter with 2k channels, letting ω = q2πfs = q2π/Ts, q ∈ [0, 0.5], and letting ∆t = rTs, r ∈ [0, 1], the nulls occur when r = p/q − k.

\[
\begin{aligned}
\omega\alpha &= p\,2\pi \\
\omega (kT_s + \Delta t) &= p\,2\pi \\
\frac{q\,2\pi}{T_s} (k + r) T_s &= p\,2\pi \\
r &= \frac{p}{q} - k \qquad p \in \mathbb{Z}
\end{aligned}
\tag{3.11}
\]

Therefore equation (3.11) gives the tolerable skew range where monotonicity is maintained as the minimum r (where the first null occurs) multiplied by Ts. For k = 2, q = 0.45, the first null of concern r (when p = 1) is calculated to be 0.22, as was shown in figure 3.6. Depending on the parameters, the replicas K1(ω) and K2(ω) shift closer to one another, causing the nulls to appear and the tolerable skew range to decrease. For a specific k there is a set of frequencies that have 0 tolerable skew range:

\[
q = \frac{p}{k} \qquad p \in \mathbb{Z},\; q \in [0, 0.5]
\tag{3.12}
\]

For instance, for k = 8, the frequencies are fs/8, fs/4, 3fs/8 and fs/2. As the input moves away from these frequencies the tolerable skew range increases. These frequencies are also poor due to their rational nature, as discussed later in section 3.8. Note that for k = 1, i.e. a 2 channel converter, there is no such frequency (except a DC signal of course), and monotonicity is guaranteed up to a skew of Ts. As evident from equation (3.12), the number of frequencies which cause the calibration to fail increases as k increases, essentially imposing a bandwidth limit on the input signal. Therefore care must be exercised when using this approach by investigating the statistics of the input signal.
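Equations (3.11) and (3.12) are easy to evaluate with exact rational arithmetic. The sketch below assumes q is supplied as a positive fraction of fs; the names `first_null` and `zero_tolerance_frequencies` are hypothetical, and the returned r is not clipped to the range [0, 1] used in the text.

```python
import math
from fractions import Fraction

def first_null(k, q):
    """Smallest non-negative r = p/q - k over integers p (equation 3.11), for q > 0."""
    p = math.ceil(k * q)  # smallest integer p with p/q >= k
    return p / q - k

def zero_tolerance_frequencies(k):
    """Normalized frequencies q = p/k in (0, 0.5] with zero tolerable skew (equation 3.12)."""
    return [Fraction(p, k) for p in range(1, k // 2 + 1)]

print(float(first_null(2, Fraction(9, 20))))   # k = 2, q = 0.45
print(zero_tolerance_frequencies(8))           # k = 8
print(zero_tolerance_frequencies(1))           # k = 1, 2 channel converter
```

With k = 2 and q = 0.45 the first null is at r = 2/9, i.e. about 0.22Ts, and for k = 8 the list reproduces fs/8, fs/4, 3fs/8 and fs/2.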

This is especially true for broadband input signals with spectral content around problematic frequency locations. For instance, given a converter with 4 channels (maximum k = 2) and a pseudo-random (PRBS7) input signal at a rate of fs, the cost function behaves well if the channel loss has a single pole and the loss is 6dB at Nyquist, but does not when the loss is 3dB at the Nyquist frequency. This is because, when the signal has spectral content at higher frequencies, the bandwidth restriction can be relaxed if the low pass nature of the channel attenuates those high frequency components.


3.5 Extending the Calibration to Converters with N Channels Using Calibration Sequence B

Another possibility is to modify the calibration sequence so that k is always equal to 1 at every step of the calibration. This will allow the calibration to occur for all skew up to Ts and for any number of channels. Consider the situation where the odd phases are calibrated first, then the even phases, using only phases 1 Ts away from them, as illustrated in figure 3.7 for a 4 channel system. In part a, the initial spacings α0 to α3 are shown; ideally they will all be Ts. In part b, the first step of the sequence calibrates all odd phases. The effect is that each of the odd phases is moved to the centre position using the two reference phases on either side, and the spacings/skews are averaged as shown. In part c, the second step of the sequence calibrates all even phases, averaging the spacing once again to obtain equidistant timing differences.

For a 4 channel system, this averaging effect propagates very fast so that only 2 cycles are needed. As the number of channels increases, so does the number of cycles needed, so this process of calibrating odd then even phases is repeated.
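The averaging effect of sequence B can be illustrated with an idealized model in which each calibrated phase lands exactly on the midpoint of its two neighbours. This is a sketch under that assumption (no residue error, no noise); the function name `midpoint_step` and the initial spacings are illustrative.

```python
from fractions import Fraction

def midpoint_step(spacings, phases):
    """Move each listed phase to the midpoint of its neighbours.

    spacings[i] is the gap between phase i and phase i + 1 (circular), so
    phase i sits between spacings[i - 1] and spacings[i].
    """
    s = list(spacings)
    for i in phases:
        s[i - 1] = s[i] = (s[i - 1] + s[i]) / 2
    return s

# Initial spacings alpha0..alpha3 in units of Ts (they sum to 4 Ts).
start = [Fraction(6, 5), Fraction(9, 10), Fraction(1), Fraction(9, 10)]
after_odd = midpoint_step(start, (1, 3))       # step 1: odd phases
result = midpoint_step(after_odd, (0, 2))      # step 2: even phases
print(after_odd)
print(result)
```

After the odd step the spacings are pairwise averaged, and after the even step all four equal the mean spacing of one Ts.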

Figure 3.7: Calibration Sequence B for a 4 Channel Time-Interleaved ADC. a) Initial spacings α0, α1, α2, α3. b) After calibrating the odd phases: (α0+α1)/2, (α0+α1)/2, (α2+α3)/2, (α2+α3)/2. c) After calibrating the even phases: all spacings equal (α0+α1+α2+α3)/4.


Since all the phases are being moved, any error will cause the phases to drift together even after convergence is reached. For instance, if after convergence phase φ0 gets perturbed slightly by 1%, all the other phases will also move by 1%. Alternatively, φ0 can be used as a fixed reference once again. This will slow down convergence. The sequence for 4 channels is shown in figure 3.8. The second and third spacing will converge to the optimal value after 2 iterations; however, the first and last spacing will take much longer to converge.

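Note that from figure 3.8, the value of the first spacing changes every 2 steps (since odd phases are moved then even phases) or every even step L when the even phases are moved. It also must be a linear combination of α0 to α3. For L > 1, notice that all the α terms grow by 2 every even step. Therefore the spacing is

\[
\frac{2(\alpha_0 + \alpha_1) + \sum_{l=2}^{L} 2^{l-2} (\alpha_0 + \alpha_1 + \alpha_2 + \alpha_3)}{2^{L+1}}
\]

where the first term in the expression accounts for step L = 1 when the terms α2 and α3 do not appear. This is a simple series which can be expanded into

\[
\begin{aligned}
\frac{2(\alpha_0 + \alpha_1) + (2^{L-1} - 1)(\alpha_0 + \alpha_1 + \alpha_2 + \alpha_3)}{2^{L+1}} &= \frac{2^{L-1} (\alpha_0 + \alpha_1 + \alpha_2 + \alpha_3) + \alpha_0 + \alpha_1 - \alpha_2 - \alpha_3}{2^{L+1}} \\
&= \frac{\alpha_0 + \alpha_1 + \alpha_2 + \alpha_3}{4} + \frac{\alpha_0 + \alpha_1 - \alpha_2 - \alpha_3}{2^{L+1}}
\end{aligned}
\tag{3.13}
\]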

The first term in equation (3.13) is the optimal spacing and the second term is the residue error. Using the substitution L → ⌈L/2⌉ to account for all steps, including both odd and even steps, the expression for the error is given below and is valid for L > 2.

\[
\frac{\alpha_0 + \alpha_1 - \alpha_2 - \alpha_3}{2^{\lceil L/2 \rceil + 1}}
\tag{3.14}
\]
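The closed form in equation (3.13) can be cross-checked against a direct simulation of sequence B with φ0 held fixed. The sketch below again assumes ideal midpoint placement at every step and counts L in odd/even cycles, reading the first spacing right after the odd step of each cycle; the function names and initial spacings are illustrative.

```python
from fractions import Fraction

def midpoint_step(spacings, phases):
    """Move each listed phase to the midpoint of its two neighbours (circular)."""
    s = list(spacings)
    for i in phases:
        s[i - 1] = s[i] = (s[i - 1] + s[i]) / 2
    return s

def first_spacing_history(spacings, cycles):
    """First spacing after the odd step of each cycle, with phase 0 never moved."""
    s = list(spacings)
    history = []
    for _ in range(cycles):
        s = midpoint_step(s, (1, 3))   # odd phases
        history.append(s[0])
        s = midpoint_step(s, (2,))     # even phases except the fixed reference
    return history

def predicted_first_spacing(spacings, L):
    """Equation (3.13): optimal spacing plus a residue that halves every cycle."""
    a0, a1, a2, a3 = spacings
    return (a0 + a1 + a2 + a3) / 4 + (a0 + a1 - a2 - a3) / 2 ** (L + 1)

alphas = [Fraction(6, 5), Fraction(9, 10), Fraction(1), Fraction(9, 10)]
for L, value in enumerate(first_spacing_history(alphas, 6), start=1):
    print(L, value, predicted_first_spacing(alphas, L))
```

In this idealized model the simulated and predicted values coincide, and the residue term shrinks by a factor of 2 per cycle.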

Note that each cycle needs a certain number of iterations to converge the appropriate channels to their centre position relative to the phases beside them. Therefore although all odd phases or even phases can be calibrated at the same time, as opposed to one at a time, calibration sequence B in general takes much longer to converge than sequence A. Any residue error will cause the remaining skew to move around the optimal value. At best the residue error would be the delay code step size µ, but it could be much worse due to error propagation. For instance, if the LMS update converges to 2µ away from the optimal midpoint for phase φx, then this error will propagate to all the other phases. Then if phase φx converges to −µ away in a subsequent cycle, all the other phases will be off by 3µ from φx in that cycle. This effect is also present in sequence A but to a lesser degree due to the small number of steps. One solution would be to budget a smaller delay step size at the cost of additional delay stages, which consume more power and degrade jitter performance. Using an M sample mean as before, the time required to converge all channels is 2(NMHL)Ts, where again H is the number of iterations to converge all odd or even channels and N is the total number of channels.

3.6 Calibration Sequence A Using Razavi’s Cost Function

Note that the autocorrelation cost function used by Razavi is the same as using (e1)² + (e2)² with e1 and e2 defined in equation (3.6). For a cosine input, a DC term proportional to the skew, K(ω) = cos(ωα) − cos(ωβ), is generated. This term is also periodic and non-monotonic. Monotonicity is violated when K(ω) reaches a maximum, or in other words when the two cosine terms are π apart. For input frequencies ω = q2π/Ts, q ∈ [0, 0.5], this occurs when the skew is Ts/(4q), and is independent of k, where again k is the spacing (multiple of Ts) between adjacent phases used in the calibration sequence.

\[
\begin{aligned}
\omega\alpha - \omega\beta &= \pi \\
\alpha - \beta &= \frac{\pi}{\omega} \\
\alpha - \beta &= \frac{T_s}{2q} \\
\Delta t &= \frac{T_s}{4q}
\end{aligned}
\tag{3.15}
\]

The maximum q is 0.5 and therefore the maximum skew range for which monotonicity is maintained is 0.5Ts. If the system is guaranteed by design to have skew less than this, then calibration sequence A can be used for an N channel converter. Since calibration sequence A is faster and less prone to error propagation, using Razavi’s cost function is the best solution when the skew is less than 0.5Ts.

3.7 Implementation Using LMS Update

A block diagram is shown in figure 3.9, where the channel selector is selecting phase φ1, as in part b of figure 3.5 when using sequence A, or tuning it as one of the odd phases as part of sequence B. An LMS iterative update can be used to tune the delay to minimize the cost function, according to equation (3.16). Generally, the speed of convergence may be controlled by the constant µ, with larger values of µ leading to faster convergence, but pushing the adaptation algorithm towards instability. However, due to the dependency of the cost function on the input frequency ω, it may be difficult to find an optimal step size µ.

Figure 3.8: Calibration Sequence B for a 4 Channel Time-Interleaved ADC Using φ0 As Reference. Panels a) to f) show the spacings after successive steps, starting from α0, α1, α2, α3; the first spacing takes the values (α0+α1)/2, (3α0+3α1+α2+α3)/8 and (5α0+5α1+3α2+3α3)/16, and the last spacing (α2+α3)/2, (α0+α1+3α2+3α3)/8 and (3α0+3α1+5α2+5α3)/16.

\[
\Delta t[n+1] = \Delta t[n] - \mu \left[ g(n) - g(n-1) \right] \left[ \frac{g(n) - g(n-1)}{\Delta t(n) - \Delta t(n-1)} \right]
\tag{3.16}
\]

One may also use the magnitude-sign variation, equation (3.17), or the sign-sign variation, equation (3.18), of LMS. The sign-sign variation results in fixed steps. This simplifies the adaptation algorithm’s digital implementation at the cost of reduced convergence speed. One advantage of this scheme is that the calibration need not be run at the full rate of the ADC. In this work, the output of the ADC is down-sampled and the calibration is carried out in MATLAB using the down-sampled outputs.

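\[
\Delta t[n+1] = \Delta t[n] - \mu \left[ g(n) - g(n-1) \right] \operatorname{sign}\left[ \frac{g(n) - g(n-1)}{\Delta t(n) - \Delta t(n-1)} \right]
\tag{3.17}
\]

\[
\Delta t[n+1] = \Delta t[n] - \mu \operatorname{sign}\left[ g(n) - g(n-1) \right] \operatorname{sign}\left[ \frac{g(n) - g(n-1)}{\Delta t(n) - \Delta t(n-1)} \right]
\tag{3.18}
\]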
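The fixed-step behaviour of a sign-based update can be illustrated on a noise-free model of the 2 channel cost function. The sketch below is a simplification and not a transcription of equation (3.18): it steps against the sign of the finite-difference slope of g only. The cost model |K(ω)| with the constant f̄ dropped, the input frequency, the step size and the function names are all assumptions made for illustration.

```python
import math

def cost(dt, omega=2.0, ts=1.0):
    """Noise-free g = |K(omega)| for alpha = Ts + dt, beta = Ts - dt (f-bar dropped)."""
    a = abs(math.sin(omega * (ts + dt) / 2))
    b = abs(math.sin(omega * (ts - dt) / 2))
    return abs(2 * a - 2 * b)

def sign(x):
    return (x > 0) - (x < 0)

def sign_descent(dt0, mu=0.01, iterations=200):
    """Fixed-step descent: move against the sign of the finite-difference slope of g."""
    prev_dt, prev_g = dt0, cost(dt0)
    dt = dt0 + mu  # arbitrary first probe to obtain a slope estimate
    for _ in range(iterations):
        g = cost(dt)
        # Sign of (g(n) - g(n-1)) / (dt(n) - dt(n-1)); default to +1 if it is zero.
        slope = sign(g - prev_g) * sign(dt - prev_dt) or 1
        prev_dt, prev_g = dt, g
        dt -= mu * slope
    return dt

print(sign_descent(0.3), sign_descent(-0.4))
```

In this model the skew estimate walks towards zero in steps of µ and then dithers within about one step of the minimum, which mirrors the residue error of one delay code step discussed for the calibration sequences.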

Figure 3.9: Block Diagram of Skew Calibration Engine. Block labels: ADC0, ADC1, ADC2, ..., Channel Selector, Accumulator (M samples), LMS and Tune Select, REG (clocked at clk/M), and Delay elements on the clock phases φ0, φ1, φ2.

3.8 Accuracy and Additional Signal Constraints

Since the cost function relies on the amplitude of the samples, gain mismatch negatively impacts the calibration. In most cases it will cause convergence to a non-zero residual skew value. Therefore gain, offset, and radix calibration must be performed before timing skew calibration. The calibration accuracy depends on the mean estimate γ̂x, which depends on the number of samples M. For a mean ergodic process, the variance of the mean estimate [33] is given by

\[
\begin{aligned}
\mathrm{Var}[\hat{\gamma}_x] &= E[\hat{\gamma}_x^2] - (E[\hat{\gamma}_x])^2 \\
&= E[\hat{\gamma}_x^2] - \gamma_x^2 \\
&= E\left[ \left( \frac{1}{M} \sum_{j=0}^{M-1} X(j) \right) \left( \frac{1}{M} \sum_{k=0}^{M-1} X(k) \right) \right] - \gamma_x^2 \\
&= \frac{1}{M} \sum_{j=-(M-1)}^{M-1} \left( 1 - \frac{|j|}{M} \right) r_x(j) - \gamma_x^2
\end{aligned}
\tag{3.19}
\]

An accuracy estimate can therefore be made using the mean γx and auto-correlation rx of the input signal. Calibration sequence A has a tolerable skew dependent on input frequency ω and k, where the number of channels is N = 2k. Calibration sequence B always has a tolerable skew of Ts. If Razavi’s cost function is used, then the maximum range is 0.5Ts with either sequence. In terms of additional signal constraints not discussed previously, this calibration suffers from the same problem as Huang and Wang’s approach [28] [29]. El-Chammas’ approach is also subject to this constraint, as described in his book [34] in section 3.5.3. The problem occurs when the ratio of the input frequency to the sampling frequency can be expressed as a rational fraction, i.e. fin/fs = a/b, where a and b are co-prime. Let HCF denote the highest common factor, the largest positive integer that divides two numbers without remainder. If the ratio is a rational fraction, then the number of distinct errors that are available for the calibration is Ne = b/HCF(b, N) no matter how many samples are taken. Therefore an accurate estimate of the average cannot be obtained. Note that the analysis in the preceding sections was, for simplicity, done using a single tone sinusoidal input. In general the calibration works for broadband signals, as shown in one of the simulation examples in the next section.
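The count Ne = b/HCF(b, N) can be reproduced by brute force. The sketch below interprets a "distinct error" as a distinct input phase seen by one sub-ADC, which is an assumption made here for illustration; the function names are hypothetical, and the example ratio 717/5000 with N = 8 is chosen to match the FM carrier used in section 3.9.

```python
from math import gcd

def distinct_errors(b, n_channels):
    """Ne = b / HCF(b, N) for fin/fs = a/b with a and b co-prime."""
    return b // gcd(b, n_channels)

def distinct_phases_seen(a, b, n_channels, channel=0, samples=10000):
    """Count distinct input phases (in units of 1/b of a cycle) seen by one sub-ADC."""
    return len({(a * (channel + m * n_channels)) % b for m in range(samples)})

print(distinct_errors(5000, 8))
print(distinct_phases_seen(717, 5000, 8))
```

Both counts agree at 625 for this ratio; taking more samples does not add new phases once the pattern repeats.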

3.9 Simulation Results

The cost function and skew calibration algorithm using calibration sequence A were simulated in MATLAB using an 8 channel 8 bit ADC with a skew standard deviation of 0.3Ts and fs = 10GHz. Sign-sign LMS was used to descend the cost function. Except for the quantization noise and added Gaussian noise, the ADC is ideal (no offset and gain mismatch or non-linearity effects). The spectrum before the calibration and after the calibration of a broadband frequency-modulated (FM) signal is shown in figure 3.10. The input FM signal is of the form Ac·cos[ωc·t + h·cos(ωm·t)] where Ac = 0.5, h = 20, ωc = 2π(717/5000)fs, and ωm = 2π(10MHz). Observe that the distortion terms are significantly reduced after calibration: the skew is reduced from 12.6ps-rms to 679fs-rms using a delay code step of 400fs. In the simulation, the initial skew conditions are set and can be observed by keeping track of the delay code/calibration steps. The algorithm converged relatively fast, in 80 iterations per phase using 5000 samples per iteration. If Razavi’s cost function is used, it converges a bit slower to 650fs-rms using 120 iterations per phase. Next a direct comparison is made between an extension of Razavi’s cost function using sequence A and the proposed cost function using sequence B.

A sinusoidal signal with frequency ω = 2π(2459/5000)fs is used for an 8 channel 8 bit ADC. Using Razavi’s cost function and sign-sign LMS as before, convergence is achieved using sequence A, 50 iterations per phase and 40000 samples per iteration. The standard deviation of skew improves from 5.2ps-rms to 486fs-rms. Figure 3.11 shows the delay codes as a function of iteration. If sequence B is used with the proposed calibration cost function instead, it requires 50 cycles, 50 iterations per cycle and 40000 samples per iteration. The standard deviation of skew improves from 5.2ps-rms to 800fs-rms. The delay codes converge to approximately the same value from the initial value of 0, as shown in figure 3.12 versus cycle number. If additional cycles are performed, the residue error ranges from the step size of the delay code to 6 times that due to error propagation.

3.10 Summary and Comparison to Existing Works

Table 3.1 outlines qualitative differences between this approach and the previous works described in the previous chapter for converters with more than 2 channels. All approaches are classified as background calibration, but each one is subject to certain signal constraints as described previously in this chapter. This work benefits from the lack of extra analog circuitry. Fast convergence time is achievable due to the utilization of existing full resolution sub-ADCs. Calibration sequence A can be used if some signal properties are known, as it is more restrictive on the range of input frequency. A slower background calibration using calibration sequence B may be used for all frequencies, subject to a performance penalty due to additional error propagation.

An extension of Razavi’s approach to converters with more than 2 channels using calibration sequence A offers the best solution for skew less than 0.5Ts. Compared with Razavi’s cost function specifically, the proposed cost function has a larger skew tolerance for a 2 channel system but suffers from more signal constraints when the number of channels is greater than 2. Table 3.2 summarizes the key differences.


Figure 3.10: Skew Calibration of Input FM Signal Using Sequence A and Proposed Cost Function, 5000 Points FFT Spectrum (Top: Before Calibration, Bottom: After Calibration). Horizontal axis: Normalized Frequency (f/fs); vertical axis: Amplitude (dBFS).


Figure 3.11: Delay Code for Skew Calibration of Input Signal Using Sequence A and Razavi’s Cost Function. Horizontal axis: Iteration Number; vertical axis: Delay Code; curves: Ch1 to Ch7.

Figure 3.12: Delay Code for Skew Calibration of Input Signal Using Sequence B and Proposed Cost Function. Horizontal axis: Cycle Number; vertical axis: Delay Code.


Table 3.1: Summary of Background Timing Skew Calibration Techniques For Converters With Number of Channels Greater Than 2

Calibration | Extra Circuitry | Complexity of Digital Backend | Convergence Speed
El-Chammas [25] | ADC Channel, PLL/DLL | Low | Medium
Štepanovic [27] | 2 ADC Channels, PLL/DLL | Low | Fast
Huang/Wang [28] [29] | None | Low | Slow
This work | None | Low | Fast/Medium

Table 3.2: Comparison of Proposed Cost Function and Razavi’s Cost Function Using Different Calibration Sequences for N Channel Converter

Approach | Maximum Skew Tolerance | Samples Required
Calibration Sequence A, Proposed Cost Function | rTs, with r = min(p/q − N/2), p ∈ Z, q ∈ [0, 0.5]; r = 1 (N = 2); r = 0.5 (N = 4, fin < 0.4fs) | (N − 1)(NMH); M samples, H iterations
Calibration Sequence A, Razavi’s Cost Function | 0.5Ts | (N − 1)(NMH); M samples, H iterations
Calibration Sequence B, Proposed Cost Function | Ts; prone to error propagation | 2(NMHL); H iterations, L cycles


Chapter 4

Time-Interleaved ADC Implementation

This chapter outlines the circuit implementation of the time-interleaved ADC. The ADC architecture is described with its design specifications. The clock generation, input and clock distribution, output data conditioning, and layout floorplan will be discussed in detail.

4.1 Time-Interleaved ADC Architecture

Due to the abundance of standards and receivers operating at 10GHz, the time-interleaved ADC sampling rate was chosen as 10GS/s. From section 2.1.5, a 7 bit resolution or effective number of bits (ENOB) at the Nyquist frequency of 5GHz requires a jitter of 200fs, which is achievable [35]. The resolution of the ADC was therefore chosen as 8 bits so that quantization noise will not degrade the performance. The unit ADC was chosen to be a successive approximation register (SAR) type ADC for its power efficiency. The SAR ADC architecture has benefited from CMOS scaling since it consists mainly of digital components such as the logic for the finite state machine. The capacitive digital to analog converter (DAC) that is usually used also benefits from scaling, with unit capacitor values being reduced to the atto-Farad range in some cases: 50aF in [27] and 500aF in [36].
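The 200fs figure can be cross-checked with the standard jitter-limited SNR relation for a full-scale sine wave, SNR = -20·log10(2π·fin·σt). The Python sketch below is an illustration only and is not taken from section 2.1.5: the function names are hypothetical, and it assumes the ideal conversion SNR = 6.02·ENOB + 1.76dB with jitter as the only noise source.

```python
import math

def snr_db_for_enob(enob_bits):
    """Ideal SNR (dB) corresponding to a given effective number of bits."""
    return 6.02 * enob_bits + 1.76

def max_rms_jitter(enob_bits, f_in_hz):
    """RMS jitter (s) at which jitter noise alone yields the target ENOB.

    Assumes SNR_jitter = -20*log10(2*pi*f_in*sigma_t) for a full-scale sine.
    """
    snr_db = snr_db_for_enob(enob_bits)
    return 10 ** (-snr_db / 20) / (2 * math.pi * f_in_hz)

# 7 bit ENOB at the 5GHz Nyquist frequency of the 10GS/s converter
sigma_t = max_rms_jitter(enob_bits=7, f_in_hz=5e9)
print(f"required rms jitter: {sigma_t * 1e15:.0f} fs")
```

Under these assumptions the result comes out at roughly 200fs, in line with the value quoted above.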

The rationale for time interleaving is to relax the speed requirement of the circuit blocks and reduce the overall power consumption, albeit at the cost of increased system complexity. For instance, the work in [37] shows that the power consumption for a particular pipeline ADC at a specific sampling rate is reduced to a minimum for 4 time-interleaved channels. Razavi [31] shows that if the samplers are the speed bottleneck for a B-bit ADC then there's an advantage to interleaving when the number of time-interleaved channels is

N < (B+1)ln(2) (4.1)


This assumes that the B-bit ADC must settle to 0.5LSB. For instance, in this case 8 bits will mean there will be negligible advantage to interleaving beyond 4 channels assuming a sampler bottleneck. For this project, the timing constraint was in the operation of the unit SAR ADC, which includes the settling of the sampler, DAC, and digital logic delay. In order to simplify the unit ADC structure and timing requirement, a two stage interleaving scheme similar to [38] was used. The top level architecture is shown in figure 4.1.
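The bound in equation 4.1 is easy to evaluate numerically. The following Python sketch is only an illustration: the function name is hypothetical, and restricting the channel count to powers of two is an assumption made here to relate the bound to the 4 channel figure quoted above.

```python
import math

def interleaving_bound(bits):
    """Right-hand side of equation 4.1: N < (B + 1) * ln(2)."""
    return (bits + 1) * math.log(2)

bound = interleaving_bound(8)
# Largest power-of-two channel count that stays below the bound
# (the power-of-two restriction is an assumption of this sketch).
n_pow2 = 2 ** math.floor(math.log2(bound))
print(f"bound = {bound:.2f}, largest power-of-two N below it = {n_pow2}")
```

For B = 8 the bound evaluates to about 6.2, so 4 is the largest power-of-two channel count below it.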

Figure 4.1: Time-Interleaved ADC Architecture (block diagram: Vin feeds 8 sub-ADCs in total, sub-ADC0 to sub-ADC7, clocked by φ0 at 0° through φ7 at 315°; each sub-ADC has a T/H followed by 10 × 125MS/s C-2C SAR ADCs, each with SAR logic, DAC and Vref, giving the 8 bit output Bout)

The top level is composed of 8 sub-ADCs each at a sampling rate of 1.25GS/s for an aggregate sampling rate of 10GS/s. Each sub-ADC has a buffer (PMOS source follower) followed by a bootstrapped track and hold sampler. The number of distinct evenly spaced phases required is therefore 8, φ0 to φ7. At the second level of interleaving, each sub-ADC is composed of 10 unit synchronous C-2C SAR ADCs, each operating at 125MS/s. Each of the 10 unit SAR ADCs takes 10 clock cycles to complete a full conversion, therefore one sub-ADC only requires a single 1.25GHz clock signal. Note that due to this structure of interleaving, timing skew exists only between the 8 track-and-hold sampling clocks, φ0 to φ7; there is no concern for skew between the 10 unit SAR ADCs within each sub-ADC. Assuming all sub-ADCs initialize at SAR0, the timing is illustrated in figure 4.2. Sub-ADC i always samples with φi no matter which SAR it is using. The C-2C SAR sub-ADC was implemented by Qiwei Wang, a Masters student who collaborated with me on this project [9]. Please refer to his thesis for additional details. The prototype ADC was implemented in the TSMC 65nm GP CMOS process at a supply of 1V. All calibrations were performed off-chip using MATLAB. Table 4.1 summarizes the design specifications for the ADC prototype.

Table 4.1: Design specification of the 1.25GS/s time-interleaved C-2C SAR ADC

Sampling Rate | 10GS/s | Bandwidth | 5GHz
Resolution | 8 bit | Process | CMOS 65nm GP
Supply | 1V | Power | <0.5pJ/conversion step
Number of Channels | 8 | Number of Unit ADCs | 80

Figure 4.2: Time-Interleaved ADC Timing for Two Level Interleaving (timeline: sub-ADC0 SAR0 samples on φ0 at 0 and sub-ADC1 SAR0 on φ1 at 100ps, continuing to sub-ADC7 SAR0; sub-ADC0 SAR1 follows at 800ps, one period of the 1.25GHz clock; after sub-ADC7 SAR9 the sequence returns to sub-ADC0 SAR0 at 8ns; the first, second and third completed samples, from SAR0 of sub-ADC0, sub-ADC1 and sub-ADC2, are marked near 8ns, 8.1ns and 8.2ns)
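The two level interleaving order can be written down compactly. The Python sketch below is an illustration under the assumption stated in the text that all sub-ADCs initialize at SAR0; the function name and the 0-based sample index are conventions introduced here.

```python
N_SUB = 8      # sub-ADCs (first level), one sampling phase each
N_UNIT = 10    # unit SAR ADCs per sub-ADC (second level)
TS_PS = 100    # aggregate sample period at 10GS/s, in ps

def schedule(n):
    """Return (time_ps, sub_adc, unit_sar) for aggregate sample index n (0-based)."""
    sub_adc = n % N_SUB                # phase index, equal to the sub-ADC number
    unit_sar = (n // N_SUB) % N_UNIT   # advances once per 1.25GHz clock cycle
    return n * TS_PS, sub_adc, unit_sar

for n in (0, 1, 7, 8, 79, 80):
    t_ps, sub, sar = schedule(n)
    print(f"n={n:2d}  t={t_ps:5d}ps  sub-ADC{sub}  SAR{sar}")
```

Each unit SAR ADC is revisited every 80 samples, that is every 8ns, which corresponds to the 125MS/s unit rate.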

4.2 Clock Generation

As discussed previously, the TI ADC requires 8 phases of a 1.25GHz clock signal. There are two constraints for clock generation: low skew between the phases and low jitter. In general, for applications where the ADC is part of a system such as a receiver [6], it makes sense to construct an on-chip PLL. To achieve the low-jitter target, it is anticipated that an LC oscillator would likely be incorporated into the PLL, providing only two clock phases at 5GHz (ring oscillators and/or quadrature LC oscillators generally have higher jitter for the same power consumption). In this prototype, a low-jitter 5GHz differential clock is supplied from off-chip. Hence, the ADC requires a circuit to generate 8 clock phases at 1.25GHz from a single differential 5GHz clock. This can be performed with either a DLL or a clock divider. A DLL is a first order system that is guaranteed to be stable. A DLL however suffers from jitter accumulation effects. Shift-register based solutions such as synchronous clock dividers are a better choice [39]. In essence, clock dividers are injection locked oscillators where the injection signal, the higher frequency input clock, is injected at multiple points in the loop, thus correcting the zero crossings and preventing jitter accumulation [40]. The divided outputs are locked to a sub-harmonic of the input clock. A 4 stage clock divider was used to generate 4 differential phases from a differential 5GHz off-chip input clock. The 4 differential phases are eventually converted into 8 differential phases by duplicating and inverting. The clock divider was implemented using current mode logic (CML) to ensure lower jitter, especially in the presence of supply voltage noise. It functions up to 1.5GHz at the typical process corner with a free running frequency of 1.3GHz. The input CML clock VCLK needs a common mode of 700mV (set off-chip) and a swing of 400mV peak differential. A block diagram and the unit CML latch are shown in figure 4.3.
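The phase arithmetic behind the divider can be tabulated in a few lines. This Python sketch is an illustration only: it assumes that the inverted output of each differential phase supplies the phase 180° away, and the variable names are hypothetical.

```python
F_IN_GHZ = 5.0       # off-chip differential input clock
DIVIDE_RATIO = 4     # 5GHz / 4 = 1.25GHz
N_DIFF_PHASES = 4    # differential outputs of the divider

f_out_ghz = F_IN_GHZ / DIVIDE_RATIO
period_ps = 1000.0 / f_out_ghz          # period of the 1.25GHz clock in ps
step_deg = 180.0 / N_DIFF_PHASES        # spacing between adjacent phases

phases = []
for k in range(N_DIFF_PHASES):
    true_phase = k * step_deg           # positive output of differential phase k
    phases.append(true_phase)
    phases.append(true_phase + 180.0)   # inverted output of the same phase
phases.sort()

for k, deg in enumerate(phases):
    print(f"phi{k}: {deg:5.1f} deg = {deg / 360.0 * period_ps:5.1f} ps")
```

The eight phases are spaced 45° apart, which is 100ps of the 800ps period and equals the aggregate sample period Ts.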

4.3 Clock and Input Distribution

For low timing skew, clock distribution is critical in minimizing the systematic mismatches in clock paths. Any mismatch in the input signal path may also contribute to shifting the sampling point and therefore has the same effect as clock skew. In order to minimize the layout effects, a transmission line and an H-bridge are used. The transmission line is a coplanar waveguide (CPW) style line with ground shields around the centre signal path as shown in figure 4.4a. The bottom ground plane is implemented using segmented metal 4 (M4) and the signal runs on metal 9 (M9). On-chip termination is provided as shown in figure 4.4b. The simulated insertion loss of the transmission line is 0.71dB at the Nyquist frequency of 5GHz.

The input and clock signals are brought to the middle of the chip using transmission lines from the right and left side respectively. On the input path, the transmission line is terminated and connected directly to the H-bridge (implemented on M7 and M8, width is 1µm). On the clock path, after termination the signal is fed to the divider which outputs 4 differential phases. These phases are buffered by 2 stages of CML buffers before they are converted into 8 differential phases and connected to the H-bridge. CML style distribution is used in order to combat power supply noise induced jitter. A low swing is used to save power. CML buffer repeaters are also used along the distribution path due to long routing distances. The CML clocks are converted into CMOS levels and fed through delay stages before arriving at the track and hold of each sub-ADC. The delay stages, under the control of the calibration algorithm, provide the necessary skew compensation. The concept is illustrated in figure 4.5. The sub-ADCs are arranged in a grid with the relevant clock phases shown. The approximate dimensions are also indicated. The unit CML buffer is shown in figure 4.6.

Figure 4.3: CML Divider for Multiphase Clock Generation (block diagram of the divider built from CML latches driven by the differential 5GHz clock, and schematic of the unit CML latch with inputs VINP/VINN, clock inputs VCLKP/VCLKN at a 700mV common mode, outputs VOUTP/VOUTN and bias current IBIAS)

Figure 4.4: On-chip Transmission Line and Termination (a: cross-section with the M9 signal line, M9 ground shields and segmented M4 ground plane; b: termination with two 50Ω resistors and a 20pF capacitor)

On the input distribution side, the insertion loss of the H-bridge after extraction is approximately 0.7dB at the Nyquist frequency of 5GHz. The clock distribution network uses a separate power supply for jitter isolation. For testing purposes, the lowest jitter clock source available in the lab had a jitter of 300fs rms. Therefore the entire clock distribution network was simulated at the typical-typical low voltage high temperature (TTLVHT) corner after extraction with power supply noise and an input source with a jitter of 300fs. The total jitter is approximately 430fs rms, which limits the ENOB to approximately 6 at the Nyquist frequency of 5GHz. It's possible to reduce this further by adding stronger and/or additional buffer stages, but the power consumption would not be acceptable. An improvement can be made if the sub-ADC size can be reduced such that the routing distances are shorter. Unfortunately this was not possible due to the large size of the metal-insulator-metal (MIM) capacitors in the C-2C DAC of the unit SAR ADC. In the next iteration of this project a smaller unit capacitor can be designed instead of using the default MIM structure available in the kit. Due to space constraints, the clock paths are not shielded on both sides by ground. This may have also contributed to a higher jitter.
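The jitter numbers above can be related to ENOB with a short calculation. The Python sketch below is an illustration only: it assumes the standard relation SNR = -20·log10(2π·fin·σt) with ENOB = (SNR − 1.76)/6.02, and it assumes the source jitter and the jitter added by the distribution network are independent so that they combine as a root sum of squares. The function name is hypothetical.

```python
import math

def jitter_limited_enob(sigma_t_s, f_in_hz):
    """ENOB when rms jitter sigma_t_s is the only noise source (full-scale sine)."""
    snr_db = -20 * math.log10(2 * math.pi * f_in_hz * sigma_t_s)
    return (snr_db - 1.76) / 6.02

total_jitter = 430e-15    # simulated total rms jitter, from the text
source_jitter = 300e-15   # rms jitter of the clock source, from the text

enob_nyquist = jitter_limited_enob(total_jitter, 5e9)
# Contribution of the on-chip network, assuming independent jitter sources.
added_jitter = math.sqrt(total_jitter ** 2 - source_jitter ** 2)

print(f"jitter-limited ENOB at 5GHz: {enob_nyquist:.2f}")
print(f"jitter added by the distribution network: {added_jitter * 1e15:.0f} fs rms")
```

Under these assumptions the ENOB comes out slightly below 6 bits, and the distribution network contributes roughly as much jitter as the source itself.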

Figure 4.5: Clock Distribution using H-bridge (floorplan: the transmission line feeds the divider, whose 4 phases φ0 to φ3 are buffered, expanded to the 8 phases φ0-3 and φ4-7, and routed through the H-bridge to the sub-ADC grid: sub-ADC0 at 0°, sub-ADC1 at 45°, sub-ADC2 at 90°, sub-ADC3 at 135°, sub-ADC4 at 180°, sub-ADC5 at 225°, sub-ADC6 at 270°, sub-ADC7 at 315°)

Figure 4.6: Unit CML Buffer Schematic (differential pair with inputs VINP/VINN, outputs VOUTP/VOUTN and bias current IBIAS)

4.4 CML to CMOS Converter and CMOS Delay Line

The CML clocks are converted into CMOS logic levels for the track and hold at each sub-ADC. After conversion, the clocks are fed into delay lines that enable skew control for the calibration algorithm. The block diagram is shown in figure 4.7. The delay lines are implemented using CMOS inverters with varactors. To save area, the varactors are binary weighted MOS capacitors constructed by shorting the drain and source nodes together. Assuming the trip point of the inverter is VDD/2, the delay is proportional to the load capacitance CL and inversely proportional to the inverter current I, given by

$$\Delta t_{delay} = \frac{\Delta C_L \, V_{DD}}{2I} \qquad (4.2)$$

The array of MOS capacitors is controlled by a 7 bit code. To improve the monotonicity of the code, the 4 most significant bits (MSB) are thermometer encoded while the rest are binary encoded. To further improve the monotonicity of the delay line, it was implemented in 4 stages. Dividing it into 4 stages reduces the effects of thermal and power supply induced jitter by limiting the change in delay to 30% of the inverter delay such that the slew rate is not significantly impacted [34]. Monotonicity is important for convergence/stability of the least mean square (LMS) algorithm. The delay line has a resolution of 440fs with a range of ±28ps. Again the skew resolution corresponds to approximately 6 ENOB at Nyquist.
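The segmented code can be illustrated with a small model. The Python sketch below is not the on-chip decoder: it assumes each thermometer element is worth 8 unit capacitors, that the 3 remaining bits are plain binary weights, and that the delay is ideally linear at 440fs per code. All names are hypothetical.

```python
LSB_FS = 440                      # delay step per code in fs, from the text
N_BITS = 7
N_THERMO_BITS = 4                 # MSBs, thermometer encoded
N_BINARY_BITS = N_BITS - N_THERMO_BITS

def encode(code):
    """Split a 7 bit delay code into thermometer and binary control lines."""
    if not 0 <= code < 2 ** N_BITS:
        raise ValueError("delay code out of range")
    msb = code >> N_BINARY_BITS                     # 0..15
    lsb = code & (2 ** N_BINARY_BITS - 1)           # 0..7
    thermo = [1 if i < msb else 0 for i in range(2 ** N_THERMO_BITS - 1)]
    binary = [(lsb >> i) & 1 for i in range(N_BINARY_BITS)]  # LSB first
    return thermo, binary

def units_switched(code):
    """Unit capacitors switched in; equals the code for an ideal array."""
    thermo, binary = encode(code)
    binary_units = sum(bit << i for i, bit in enumerate(binary))
    return sum(thermo) * 2 ** N_BINARY_BITS + binary_units

def delay_fs(code):
    """Delay relative to code 0, assuming an ideal linear step of LSB_FS."""
    return units_switched(code) * LSB_FS

print(f"full range: {delay_fs(127) / 1000:.1f} ps")
```

With 127 steps of 440fs the full range is close to 56ps, consistent with the ±28ps quoted above. In this model a step across a thermometer boundary, such as from code 63 to 64, turns on one more thermometer element and clears the binary bits.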

Figure 4.7: CML to CMOS Converter and CMOS Delay Line (block diagram: CML to CMOS converter with inputs VINP/VINN and outputs VOUTP/VOUTN, followed by 4 delay stages of unit inverters, P: 960n/60n and N: 480n/60n, whose varactor loads are set by a 7 bit decoder with 4 thermometer-encoded bits)

4.5 Delay Code Input Interface

The delay codes are updated by the off-chip calibration algorithm and sent on-chip via an input interface. This interface uses a ready signal DATA_RDY and a clock signal CLK_DELAY in phase with the delay code data D<7:0> as shown in figure 4.8. The 3 least significant bits D<2:0> are used as a select signal. When DATA_RDY is low, D<2:0> are set to the channel number to select which channel's delay code should be updated; this value is stored in a 3 bit register. When DATA_RDY is high, this register is disabled and a binary to one-shot converter converts the 3 bit channel number into an 8 bit one-shot number O<7:0> (in this case all bits are high except for one). Each bit is OR-ed with the inverted DATA_RDY signal to generate an 8 bit D_ENAB signal. The 8 bit D_ENAB signal is used to enable one of eight 7 bit registers (not shown in figure) used to store the delay code values for the 8 sub-ADC channels.
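A behavioural model helps to follow the two-step write sequence. The Python sketch below is only an interpretation of the description above: it assumes the register enables are active low (inferred from the one-shot word having all bits high except one), that the delay code is taken from the lower 7 bits of D, and that one call to `clock` represents one active clock edge. The class and method names are hypothetical.

```python
class DelayCodeInterface:
    """Behavioural sketch of the delay code input interface (assumptions above)."""

    def __init__(self):
        self.channel = 0              # 3 bit channel-select register
        self.delay_codes = [0] * 8    # eight 7 bit delay code registers

    @staticmethod
    def one_shot(channel):
        """8 bit word O<7:0> with every bit high except the selected one."""
        return 0xFF & ~(1 << channel)

    def d_enab(self, data_rdy):
        """O<7:0> OR-ed bitwise with the inverted DATA_RDY signal."""
        not_rdy = 0x00 if data_rdy else 0xFF
        return self.one_shot(self.channel) | not_rdy

    def clock(self, data_rdy, d):
        """Apply one clock edge with bus value d."""
        if not data_rdy:
            self.channel = d & 0b111          # latch channel number from D<2:0>
            return
        enables = self.d_enab(data_rdy)
        for ch in range(8):
            if not (enables >> ch) & 1:       # a low bit enables that register
                self.delay_codes[ch] = d & 0x7F

iface = DelayCodeInterface()
iface.clock(data_rdy=0, d=3)      # DATA_RDY low: select channel 3
iface.clock(data_rdy=1, d=100)    # DATA_RDY high: write code 100 to channel 3
print(iface.delay_codes)
```

While DATA_RDY is low every D_ENAB bit is high, so no delay register can be overwritten during channel selection.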

4.6 Output Data Conditioning

4.6.1 Retimer

Recall the outputs of the TI ADC (all the sub-ADCs) are available every Ts = 100ps. Each sub-ADC output is available every T = 800ps corresponding to the 1.25GHz clock signal. The outputs of different sub-ADCs appear at different delays (phases) of the 1.25GHz clock. For each sub-ADC, there's a multiplexer so that all the output bits are synchronous with a digital clock replicated from the clock used by the track and hold. This clock is provided at the output of each sub-ADC for the retiming concept described here. The output consists of 10 bits: 8 data bits and 2 additional flag bits, flagSAR and flagALL. FlagSAR is high when sub-ADC0 SAR0 outputs valid data. FlagALL is high when any of the SAR0 units outputs valid data. Using these two bits it's possible to align data taken on different occasions to enable faster calibration. In order to process data off-chip, a retiming structure was used at the topmost level instead of a multiplexer. This structure is illustrated in figure 4.9. For instance, data from sub-ADC0 is synchronous with the 0° clock phase and data from sub-ADC2 is synchronous with the 90° clock phase. Data from both sub-ADCs can then be resampled by registers using the 270° clock phase. This clock phase is used to give some margin to the retimer due to long interconnect routing distances. This concept is repeated until all the sub-ADCs' data (8x10 bits) is synchronous with a single 1.25GHz clock phase (45°). Common data acquisition tools cannot process this much data at the specified data rate. One solution is to use on-chip memory to capture a sufficient number of digitized samples to validate the ADC performance, such as in [41]. Alternatively, the digital output can be downsampled and buffered off-chip for data acquisition, such as in [34]. In this project, downsampling was used as described in the next section.

Figure 4.8: Delay Code Input Interface (block diagram: D<2:0> is stored in a register clocked by CLK_DATA and enabled by DATA_RDY; a binary to one-shot converter produces O<7:0>, which is OR-ed bitwise to give D_ENAB<7:0>)

4.6.2 Decimator

Figure 4.9: Top Level Output Data Retiming (retimer tree: the 1x10 bit outputs of sub-ADC0 to sub-ADC7, each synchronous with its own phase from 0° to 315°, are resampled in stages into 2x10 bit, 4x10 bit and finally 8x10 bit words on a single 1.25GHz clock phase)

Downsampling or decimating can be used to reduce the speed at which the data acquisition occurs. The signal and all harmonics and other distortion terms will experience aliasing due to decimation. As long as the major harmonics and distortions do not alias onto the fundamental signal tone(s), decimation does not improve or degrade the signal to noise and distortion ratio (SNDR). A decimation factor of 81 was chosen because there are 80 individual unit ADCs in total. This decimation factor enables samples to be collected from all 80 ADCs. For instance, a decimation factor of 80 would cause the outputs to be repeatedly taken from one ADC only. The decimation sequence is illustrated in figure 4.10 with the numbers representing sample numbers. The samples taken are shown with a red background. Decimating by 81 essentially means taking a sample from every sub-ADC sequentially. Because the retimer aligns the outputs of all 8 sub-ADCs together in time, the decimator must take a sample every 10 cycles of the 1.25GHz clock, from 0 to 70T. When the decimator arrives at sub-ADC 7, it has to go back to sub-ADC 0. This roll-over needs to be implemented carefully. Instead of taking the 641st sample at 80T, the 649th must be taken at 81T. Therefore in total two counters are needed: one to count from 0 to 80 and reset at 80, and another from 0 to 7 to select the correct sub-ADC channel. The block diagram is shown in figure 4.11.

Figure 4.10: Illustration of Decimate by 81 Sequence
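The choice of 81 over 80 can be checked by enumerating which unit ADC each retained sample comes from. The Python sketch below is an illustration that assumes the ideal interleaving order with all sub-ADCs starting at SAR0; the function name and the 0-based indexing are conventions introduced here.

```python
N_SUB, N_UNIT = 8, 10

def unit_adcs_visited(decimation, n_samples=80):
    """Set of (sub_adc, unit_sar) pairs hit when keeping every decimation-th sample."""
    visited = set()
    for k in range(n_samples):
        n = k * decimation                          # aggregate sample index (0-based)
        visited.add((n % N_SUB, (n // N_SUB) % N_UNIT))
    return visited

print(len(unit_adcs_visited(80)))   # 1
print(len(unit_adcs_visited(81)))   # 80

# 1.25GHz clock cycle in which each of the first nine retained samples is taken
cycles = [(81 * k) // N_SUB for k in range(9)]
print(cycles)
```

Because 81 and 80 share no common factor, 80 consecutive retained samples land on 80 different unit ADCs. The cycle list reproduces the roll-over described above: samples are taken at cycles 0 to 70 in steps of 10, and the ninth sample falls in cycle 81 rather than 80.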

The decimator was implemented using custom digital logic. Both counters are reset to 0 at the beginning of the operation. The 7 bit counter counts from 0 to 80 with reset at 80. The output of the counter is fed to a combinatoric logic block which functions as a comparator. It outputs MCLK which is high when the counter is 0, 10, 20, 30, 40, 50, 60, or 70. MCLK is made synchronous with CLK which is the clock used at the last stage of the retimer. It also enables the 3 bit synchronous counter which selects the sub-ADC channel. The synchronous MCLK_RT is used to clock eight 10-bit registers which are connected to the last stage of the retimer, while the output of the 3 bit counter selects the correct sub-ADC channel for the 8-to-1 multiplexer (MUX). The output of the 8-to-1 MUX is 10 bits at a rate of 10/81 GHz or approximately 123MHz. A duty cycle corrected version of the clock MCLK_RT is also outputted off-chip for sampling this data. All outputs are buffered through large inverters to the pads. The buffers use a separate supply to avoid large transients that may affect other circuits.
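The two-counter scheme can be modelled cycle by cycle to confirm that it picks every 81st sample. The Python sketch below is an idealized behavioural model, not the custom logic itself: it ignores the retiming latency of MCLK_RT and assumes that in 1.25GHz cycle c the retimer presents aggregate samples 8c to 8c+7. The function name is hypothetical.

```python
def decimator(n_cycles):
    """Yield the aggregate sample index selected on each MCLK pulse."""
    counter = 0     # 7 bit counter, counts 0..80
    select = 0      # 3 bit counter, sub-ADC select for the 8-to-1 MUX
    for cycle in range(n_cycles):                 # one pass per 1.25GHz cycle
        if counter % 10 == 0 and counter <= 70:   # comparator output MCLK
            yield 8 * cycle + select              # sample from sub-ADC `select`
            select = (select + 1) % 8
        counter = 0 if counter == 80 else counter + 1

picked = list(decimator(3 * 81))
print(picked[:10])
print(f"output rate: {10e9 / 81 / 1e6:.1f} MS/s")
```

In this model the selected indices are the multiples of 81, and 8 samples are produced every 81 cycles of the 1.25GHz clock, which gives the 10/81 GHz output rate stated above.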

Figure 4.11: Decimator Circuit Block Diagram (7 bit counter with reset at 80, combinatoric comparator logic asserting MCLK for counts {0, 10, 20, 30, 40, 50, 60, 70}, D flip-flop producing MCLK_RT, 3 bit counter driving SEL of the 8-to-1 MUX, and registers capturing the 8x10 bit retimer outputs from sub-ADC0 to sub-ADC7 at 1.25GHz)


Chapter 5

Measurement Results

This chapter summarizes the measurement results of the prototype ADC described in chapter 4. The calibration of the time interleaved ADC is performed off-chip. For the gain, offset, and radix calibration, the method presented in section 2.2.1 is used. The timing skew calibration uses the proposed algorithm presented in section 3.2.

5.1 Test Setup

The packaged chip is mounted on a printed circuit board for evaluation using laboratory equipment. The evaluation process is controlled through a MATLAB interface on a laptop computer using a USB connection to communicate with an on-board microcontroller. The microcontroller controls the reading of the output data from the chip, and the writing of the delay codes to tune the analog delays for the timing skew calibration. The entire test setup is described in detail in the following 3 sections.

5.1.1 Device Under Test

The prototype ADC was fabricated using the TSMC 65nm 1P7M GP CMOS process. It occupies an area of 2.2mm by 1.7mm or 3.74mm². It has a total of 48 pins and is packaged using QFN-48. The cavity of the package is approximately 7mm x 7mm. For good input performance, the chip was shifted within the cavity so that the input signal traverses the shortest possible bond-wires. The die photo is shown in figure 5.1. The locations of the 8 sub-ADCs, the input and clock transmission lines, and the H-bridge are illustrated. A pinout of the chip is available in appendix A.

Figure 5.1: Time-Interleaved ADC Die Photo (annotated with sub-ADC0 to sub-ADC7, the input and clock transmission lines, and the H-bridge)

5.1.2 Printed Circuit Board

A 4 layer printed circuit board (PCB) was fabricated with two signal layers, a power plane and a ground plane. The power plane was partitioned into 4 sections, for the 5V board supply, the 1V supply for the clock network VDDCLK, the 1V supply for the output buffers VDDBUFF, and the 1V supply for the rest of the chip. A block diagram of the PCB is shown in figure 5.2 and a populated PCB photo is shown in figure 5.3. The 5V board supply and on-board regulators are used to generate the other 3 required voltages. The reference voltages for the SAR ADC and the common mode voltages of the input data and clock are generated using resistor ladders with unity gain buffers. On-board bias-tees are used to set the common mode voltages. The input data and clock signals are sent via edge-mount SMA connectors and on-board transmission lines to the packaged chip. After downsampling by the on-chip decimator, the device under test (DUT) outputs 10 bits (8 bits of data and 2 flag bits) with a clock that can be used to re-sample the data. The data is sent and stored in first-in-first-out (FIFO) memory before being read off the board by the microcontroller (µC). The microcontroller has a serial to USB interface which is used to communicate with the laptop computer. The microcontroller is also responsible for sending the 7 bit delay code to the DUT using the results of the calibration algorithm running on the computer. A list of key board components is shown in table 5.1.

Figure 5.2: Printed Circuit Board Block Diagram (DUT with differential data and clock SMA inputs fed through on-board transmission lines and bias tees that set the data and clock common mode voltages; voltage reference generator, current bias generator and voltage regulators supplying the DUT; decimated data and clock outputs going to the FIFO and on to the microcontroller with USB interface, which also programs the delay code)

Figure 5.3: Printed Circuit Board Photo


Table 5.1: List of Key PCB Components

Item | Manufacturer | Part Number | Part Description
Microcontroller | Atmel | ATmega2560 | 8-bit Microcontroller
Regulators | Linear Technology | LT3060ETS8 | Linear Voltage Regulator
Regulators | Analog Devices | ADP1715 | Linear Voltage Regulator
Voltage Reference | Linear Technology | LT1807CS8 | Op-Amp
FIFO | Texas Instruments | SN74V293-7PZA | FIFO with 64K x 18 Memory
Bias T | Mini-Circuits | TCBT-14+ | Wideband Bias Tee

5.1.3 Equipment Setup

The external equipment setup used to evaluate the prototype ADC is shown in figure 5.4. It required a laptop computer to run the calibration, two single-ended signal sources and a single 5V power supply. Hybrid couplers are used to convert the single-ended data and clock inputs into differential inputs. For low frequencies, a balun board is used for differential conversion on the input side due to the limited range of the coupler (500MHz - 7GHz). The list of equipment used is shown in table 5.2. The Centellax clock synthesizer provides a low jitter (300fs rms) 5GHz clock for the prototype ADC. The HP signal source works up to 20GHz.

Figure 5.4: Test Setup for Evaluating ADC Prototype (HP signal source driving the Krytar hybrid coupler or TI wideband balun board for the differential data input; Centellax 5GHz clock synthesizer driving the M/A-COM hybrid coupler for the differential clock input; HP 5V DC power supply; computer connected over USB to the PCB with the prototype ADC)

Table 5.2: List of External Equipment Used

Item | Part Number | Part Description
HP Signal Source | hp83712B | 10MHz-20GHz Synthesized CW Generator
Centellax Clock Synthesizer | TG1C1-A | 0.5GHz-13.5GHz Square Wave Clock Synthesizer
HP DC Power Supply | hpE3631A | 80W Triple Output Power Supply
Krytar Hybrid Coupler | 4005070 | 0.5GHz-7GHz 180° Hybrid Coupler
TI Wideband Balun Board | ADC-WB-BB | 4.5MHz-3GHz Wideband Balun Board
M/A-COM Hybrid Coupler | 2031-6334-00 | 4-8GHz 3dB Hybrid Coupler

Data collection and evaluation proceed by first setting the signal generators to the correct frequency and amplitude, resetting the FIFO and DUT using the µC, and then using the µC to initialize the write operation of the FIFO. Once a certain number of samples is taken, the write operation is stopped and the read operation is started. When reading is complete, calibration is performed on the laptop computer and the delay code is updated and written to the DUT using the µC. The FIFO is then reset and the read/write process repeats. Recall that gain, offset and radix calibration must be performed prior to skew calibration. In addition, a large number of samples is required due to the fact that there are 80 unit ADCs and each one must be calibrated. Thus this calibration is performed once at start-up. The weights are then saved and used in the subsequent skew calibration iterations.

5.2 ADC Measurement Results

This section presents the measurement results for the time interleaved ADC, focusing on the performance of the proposed timing skew calibration.

5.2.1 Single sub-ADC Performance

Performance of the single 1.25GS/s sub-ADC is summarized in Qiwei's thesis [9] and outlined here briefly for completeness. Recall that timing skew doesn't exist for a single sub-ADC due to the existence of a front-end track and hold sampler. After gain, offset and radix calibration, DNL improves from +6.2/-1.0 LSB to +1.9/-1.0 LSB, and INL improves from +5.6/-7.4 LSB to +2.2/-1.7 LSB. The SNDR is approximately 39.4dB at low frequency and 37.9dB at the Nyquist frequency of 625MHz.
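For reference, the SNDR values above can be converted to ENOB with the usual relation ENOB = (SNDR − 1.76)/6.02. The short Python sketch below is an illustration; the function name is hypothetical and the ENOB values it prints are derived here rather than quoted from [9].

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits for a given SNDR in dB."""
    return (sndr_db - 1.76) / 6.02

for label, sndr_db in (("low frequency", 39.4), ("Nyquist (625MHz)", 37.9)):
    print(f"{label}: {enob_from_sndr(sndr_db):.2f} bits")
```

Both values sit close to 6 bits, which is in the same range as the jitter and delay resolution limits discussed in chapter 4.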


5.2.2 Timing Skew Calibration Performance for 2 Channels

To evaluate the performance of the timing skew calibration, gain, offset, and radix calibration are performed first for all 80 unit SAR ADCs. The calibration utilized 20000 samples or equivalently 250 samples per ADC. The skew calibration algorithm implemented in MATLAB is then used iteratively to update the delay codes after data collection. The calibration consists of iterations as described in section 3.2, where each iteration requires a certain number of samples. Note that calibration sequence A was used since the input frequencies can be chosen so that the signal constraints are satisfied. The first step is to tune the 180° phase to the correct location. This step requires using data from sub-ADC 0 and sub-ADC 4, which are 180° apart. This is essentially a sub-sampling 2 channel ADC. To verify that the cost function is indeed monotonic, the delay code of channel 4 was adjusted from its minimum value of 0 to its maximum value of 127 in steps of 5, while the delay code of the reference channel 0 was fixed at 127. An input signal at 886MHz was used with a total sample number of 1250 per code. The resulting cost function is plotted in figure 5.5. Also plotted in the figure are the SNDR and the signal to skew distortion ratio (SDR) as a function of delay code. SDR is defined here as the input signal power divided by the power of the skew distortion tone (only 1 exists since this is a 2 channel system). It can be seen clearly that the cost function has a single minimum and the SNDR/SDR is maximized when it is minimized.

Figure 5.5: Cost Function vs SNDR/SDR for 2 Channel System (cost function, SDR and SNDR in dB plotted against delay code from 0 to 127)
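The SDR defined above can be reproduced for an idealized 2 channel system. The Python sketch below is a simulation under stated assumptions, not the measurement procedure: it models two ideal channels at an aggregate 2.5GS/s with no noise, quantization or decimation, delays every second sample by a fixed skew, and measures the signal and skew tone with single-bin DFTs. The 5ps skew is an arbitrary example value and all names are hypothetical. For this model the expected ratio is cot(π·fin·Δt).

```python
import cmath
import math

def tone_amplitude(x, k):
    """Amplitude of bin k of a real sequence x, using a single-bin DFT."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return 2 * abs(acc) / n

def sdr_two_channel(f_in, f_s, skew, n=1024):
    """Simulated signal to skew distortion ratio (dB) and the coherent f_in used."""
    k = round(f_in / f_s * n)      # nearest coherent bin
    f_in = k * f_s / n
    ts = 1 / f_s
    # Odd samples (second channel) are taken `skew` seconds late.
    x = [math.sin(2 * math.pi * f_in * (i * ts + (skew if i % 2 else 0.0)))
         for i in range(n)]
    signal = tone_amplitude(x, k)
    image = tone_amplitude(x, n // 2 - k)   # skew tone at f_s/2 - f_in
    return 20 * math.log10(signal / image), f_in

SKEW = 5e-12   # example skew, chosen arbitrarily for this sketch
sdr_sim, f_used = sdr_two_channel(f_in=886e6, f_s=2.5e9, skew=SKEW)
sdr_theory = 20 * math.log10(1 / math.tan(math.pi * f_used * SKEW))
print(f"simulated SDR: {sdr_sim:.2f} dB, closed form: {sdr_theory:.2f} dB")
```

In this model the SDR grows without bound as the skew approaches zero, whereas a measured SNDR saturates at the level set by noise and other distortion.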

To test the convergence time, codes for both channels were initialized to the maximum value of 127 and the phase for channel 4 was adjusted. The calibration converges in approximately 18 cycles when magnitude-sign LMS is used. Figure 5.6 shows this for both a single tone input and an FM input. The FM input is a 10MHz signal with a frequency deviation of 10MHz superimposed on the carrier tone of 886MHz. In both cases the algorithm converges to approximately the same delay code value. The SNDR is improved from 23.6dB to 37.7dB.

[Figure 5.6 plot: Delay Code and SNDR (dB) versus Iteration; traces: Single Tone, FM]

Figure 5.6: SNDR Convergence for 2 Channel System Using LMS
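The iterative behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the update rule of section 3.2: the cost function below is a hypothetical quadratic with its minimum at code 83, the slope is estimated by a finite difference between neighbouring codes, and the sign of the slope sets the direction while its magnitude (scaled by a hypothetical gain `mu`) sets the step. The 7 bit code range of 0 to 127 is taken from the text; every name in the code is invented for this example.

```python
def sign_magnitude_update(cost, code=127, mu=25.0, lo=0, hi=127, iterations=20):
    """Move an integer delay code toward the minimum of cost(code).

    Returns the list of codes visited, starting with the initial code.
    """
    history = [code]
    for _ in range(iterations):
        # Finite-difference slope from the neighbouring codes, clamped to the range.
        up = min(code + 1, hi)
        down = max(code - 1, lo)
        slope = (cost(up) - cost(down)) / max(up - down, 1)
        if slope == 0:
            # Flat slope: treat the current code as converged.
            history.append(code)
            continue
        direction = 1 if slope > 0 else -1
        step = max(1, round(mu * abs(slope)))  # at least one code per update
        code = min(hi, max(lo, code - direction * step))
        history.append(code)
    return history


def toy_cost(code):
    """Hypothetical single-minimum cost with its minimum at code 83."""
    return 0.01 * (code - 83) ** 2


if __name__ == "__main__":
    print(sign_magnitude_update(toy_cost))
```

Starting from code 127, the toy loop settles on the minimum well within 20 updates. With a measured cost function, the noise in each estimate would instead make the code dither around the minimum.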

5.2.3 Timing Skew Calibration Performance for 8 Channels

Unfortunately the skew calibration could not be performed properly for all 8 channels. The maximum timing skew (between the fastest and slowest sub-ADC phases) is greater than the full range of the delay line (56ps at TT corner). As before, the delay codes of all 8 sub-ADCs were initialized to 127, with channel 0 fixed at 127 and used as the reference channel. The calibration algorithm settles the delay codes of the other 7 channels as shown in figure 5.7. The delay code for channel 3 floors to zero and dithers around this value. This problem is potentially due to the power distribution network for the supply of the clocking network. The clocking network supply was distributed from one side (top-left) of the chip (see pinout diagram in Appendix A). Clock domain power-tiles cover only a small fraction of the chip, mainly on the left side, as the main power domain tiles were given priority. Since the clocking network is mostly CML based, the large currents that it draws could contribute to significant IR drop, especially given the large size of the chip. When the supply was raised to 1.2V from 1V, the skew greatly improved, although it was still not within the range of the delay line. This is because as the delay line supply increases, and therefore the supply of the CMOS inverters in the delay line increases, the range of the delay decreases proportionally. Not counting channel 3, the maximum skew is 31ps based on the converged delay code value.

[Figure 5.7 plot: Delay Code versus Iteration; traces: Ch1, Ch2, Ch3, Ch4, Ch5, Ch6, Ch7]

Figure 5.7: Delay Code Convergence for 8 Channel System Using LMS

Another problem encountered during testing was the significant channel loss experienced by the input data when the frequency is higher than 4GHz. Figure 5.8 plots the channel loss versus input frequency. This includes the loss due to the on-board transmission line, bias-tees, wire-bond and packaging, on-chip transmission line, distribution network, track and hold and PMOS source follower buffer. The loss at the Nyquist frequency of 5GHz is approximately -14dB. The degradation in performance could be due to the poor characteristic of the transmission line on the PCB. Since the insertion loss cannot be directly measured using the Vector Network Analyzer (VNA) due to the absence of a connector on the chip side, the single-ended return loss was measured instead. An approximation of the insertion loss was made by using $|S_{21}| = \sqrt{1-|S_{11}|^2}$ and is plotted in figure 5.9. This includes the loss of both the on-board transmission line and the on-chip transmission line up to the on-chip termination. Another factor in the sharp roll-off may be the bandwidth of the track and hold sampler and PMOS source follower buffer. Due to these two issues, the time interleaved ADC was evaluated as a 5GS/s 4 channel ADC so that the input experiences reasonable loss at the Nyquist frequency of 2.5GHz and the skew can be fully compensated using the available delay line range.
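The approximation of insertion loss from return loss can be written out as a short calculation. The sketch below assumes a lossless, reciprocal two-port, so that all power that is not reflected is transmitted; the function name and the example return-loss values are illustrative and are not measured data from this work.

```python
import math


def insertion_loss_db_from_return_loss(return_loss_db):
    """Estimate insertion loss (positive dB) from return loss (positive dB).

    Uses |S21|^2 = 1 - |S11|^2, which holds only for a lossless network.
    """
    if return_loss_db <= 0:
        raise ValueError("return loss must be a positive number of dB")
    s11_mag = 10 ** (-return_loss_db / 20.0)  # reflection magnitude
    s21_mag = math.sqrt(1.0 - s11_mag ** 2)   # transmitted magnitude
    return -20.0 * math.log10(s21_mag)


if __name__ == "__main__":
    for rl_db in (6.0, 10.0, 20.0):
        il_db = insertion_loss_db_from_return_loss(rl_db)
        print(f"return loss {rl_db} dB -> insertion loss {il_db:.2f} dB")
```

Because conductor and dielectric losses are ignored, an estimate of this kind captures only the mismatch loss and should be read as a lower bound on the true insertion loss.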


[Figure 5.8 plot: Channel Loss (dB) versus Input Frequency (GHz)]

Figure 5.8: Total Channel Loss versus Input Frequency (including the effect of all on-chip interconnect and ADC)

[Figure 5.9 plot: Insertion Loss (dB) versus Frequency (GHz)]

Figure 5.9: Transmission Line Loss versus Frequency


5.2.4 5GS/s 4 Channel Time Interleaved ADC Performance

Given the problems outlined in the previous section, the time-interleaved ADC was reduced to a 4 channel ADC by using half of the available sub-ADCs. This means sub-ADC 0, 2, 4, and 6 were used. The sampling rate is then reduced from 10GS/s to 5GS/s. As before, the gain, offset, and radix calibration were performed first, then the timing skew calibration was performed. Figure 5.10 shows the delay codes converge in 25 iteration cycles, with channel 0 being the reference and staying at code 127. Each cycle uses approximately 2000 samples. Note that the phases are calibrated sequentially, so that the total calibration time is $2000 \times 25 \times 3 \times T_s = 30\,\mu\text{s}$, where $T_s = 200$ps is the sampling period at 5GS/s. The maximum compensated skew inferred from the converged delay code values is 21.9ps.

[Figure 5.10 plot: Delay Code versus Iteration; traces: Ch1, Ch2, Ch3]

Figure 5.10: Delay Code Convergence for 4 Channel System Using LMS
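The total calibration time quoted above follows from simple bookkeeping, sketched here for convenience. The helper name is hypothetical; the inputs (about 2000 samples per iteration, 25 iterations, 3 sequentially calibrated phases, 5GS/s) are the values stated in the text.

```python
def calibration_time_s(samples_per_iteration, iterations, phases, sample_rate_hz):
    """Total time to calibrate the phases one after another, in seconds."""
    total_samples = samples_per_iteration * iterations * phases
    return total_samples / sample_rate_hz


if __name__ == "__main__":
    t_cal = calibration_time_s(2000, 25, 3, 5e9)
    print(f"calibration time = {t_cal * 1e6:.1f} us")
```

The same helper shows how the time would scale if more phases were calibrated sequentially, assuming the per-iteration sample count and the iteration count stayed the same.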

The improvement in SNDR from 11.1dB before skew calibration to 31.4dB after skew calibration for a 4GHz input signal is shown in figure 5.11. Note that the 5GS/s 4 channel time interleaved ADC acts like a sub-sampling ADC for frequencies larger than 2.5GHz. The fundamental tone at 3.998GHz, which aliases to 1.002GHz after sampling at 2.5GS/s and then aliases again to 14.35MHz after downsampling by 81, is denoted by a diamond marker. The skew distortion tones are denoted by circular markers, the second harmonic is denoted by a downward triangle, and the third harmonic is denoted by an upward triangle. The spurious free dynamic range (SFDR) is limited by the skew distortion tone before calibration. After calibration, the SFDR is limited by the residual distortion tones and harmonics. The residual skew is approximately 400fs-rms. Figure 5.12 shows the output spectrum for an input at the Nyquist frequency (2.498GHz aliased to 28.86MHz) after calibration. Note that the SFDR is limited by the second harmonic at 40.4dB, which indicates that the differential matching of the system is poor, again possibly due to the construction of the transmission line on the PCB.

[Figure 5.11 plots: Amplitude (dBFS) versus Frequency (MHz). Top: SNDR = 11.1dB, SFDR = 14.1dB. Bottom: SNDR = 31.4dB, SFDR = 38.1dB]

Figure 5.11: FFT Spectrum Before (Top) and After (Bottom) Calibration for 4GHz Input, 2500 Points (Markers: Diamond - Fundamental, Circular - Skew, Downward Triangle - 2nd Harmonic, Upward Triangle - 3rd Harmonic)
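The aliased tone locations quoted above can be checked with a small folding calculation. This is a sketch under one assumption that the text does not state explicitly: that the decimated output rate is 5GS/s divided by 81 (about 61.7MHz), as suggested by the downsampling factor. The function name is hypothetical.

```python
def alias_frequency(f_in_hz, f_s_hz):
    """Fold f_in_hz into the first Nyquist zone [0, f_s_hz / 2] of sample rate f_s_hz."""
    return abs(f_in_hz - f_s_hz * round(f_in_hz / f_s_hz))


if __name__ == "__main__":
    f_s = 5e9               # interleaved sampling rate
    f_decimated = f_s / 81  # assumed output rate after downsampling by 81

    print(alias_frequency(3.998e9, f_s) / 1e9, "GHz")          # first alias of the 3.998GHz tone
    print(alias_frequency(3.998e9, f_decimated) / 1e6, "MHz")  # location in the decimated spectrum
    print(alias_frequency(2.498e9, f_decimated) / 1e6, "MHz")  # the Nyquist-rate test tone
```

Under that assumption the folded frequencies land close to the 1.002GHz, 14.35MHz and 28.86MHz values given in the text.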

[Figure 5.12 plot: Amplitude (dBFS) versus Frequency (MHz); SNDR = 33.2dB, SFDR = 40.4dB]

Figure 5.12: FFT Spectrum After Calibration for 2.5GHz Nyquist Input, 2500 Points (Markers: Diamond - Fundamental, Circular - Skew, Downward Triangle - 2nd Harmonic, Upward Triangle - 3rd Harmonic)

Figure 5.13 shows the effect of calibration across all frequencies. The highest input frequency used was 4GHz. Channel loss was compensated by increasing the input amplitude. At low frequencies, the effect of timing skew is small so the SNDR is limited by the tones generated from gain, offset and radix mismatch. The SNDR improves significantly after calibration is performed for static mismatches. At high frequencies, the skew tones become dominant, and only with the addition of skew calibration does the SNDR improve. The SNDR is 16.2dB before calibration and 33.25dB after calibration at the Nyquist frequency of 2.5GHz. The jitter can be estimated by removing all the harmonics and distortion terms from the spectrum. The total noise power remaining, $\sigma_T^2$, is the sum of the quantization/thermal noise $\sigma_q^2$ and the noise due to jitter $\sigma_j^2$. At 4GHz, the total noise power is approximately -35dB. The quantization noise can be extracted from a lower frequency signal where the contribution of jitter is minimal. The quantization noise power at an input of 46MHz is -40.6dB. The jitter power is then -41.46dB, corresponding to an ENOB of 6.6. Using equation (2.21) and a frequency of 4GHz, the standard deviation is estimated to be 336fs.
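The last step of the jitter estimate can be reproduced numerically. The sketch below assumes that equation (2.21) is the usual jitter-limited SNR relation for a sine input, $\mathrm{SNR} = -20\log_{10}(2\pi f_{in}\sigma_j)$, and that the jitter power is expressed in dB relative to the signal power; both are assumptions, since equation (2.21) is not reproduced in this section. The function names are hypothetical.

```python
import math


def rms_jitter_s(jitter_noise_db, f_in_hz):
    """RMS jitter from jitter noise power (dB relative to the signal) at f_in_hz.

    Assumes SNR = -20*log10(2*pi*f_in*sigma_j) for a sinusoidal input.
    """
    return 10 ** (jitter_noise_db / 20.0) / (2 * math.pi * f_in_hz)


def enob_from_snr(snr_db):
    """Effective number of bits from an SNR in dB."""
    return (snr_db - 1.76) / 6.02


if __name__ == "__main__":
    sigma_j = rms_jitter_s(-41.46, 4e9)
    print(f"rms jitter = {sigma_j * 1e15:.0f} fs")
    print(f"jitter-limited ENOB = {enob_from_snr(41.46):.1f} bits")
```

With a jitter power of -41.46dB at 4GHz, this relation gives a value close to the 336fs and 6.6 bits stated in the text.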

The total power consumption of the 10GS/s time interleaved ADC is 277.3mW excluding the I/O cells. The clock generation and peripheral blocks consume 31.3mW. The 8 sub-ADCs consume 246mW, approximately 30.75mW per sub-ADC. Due to the use of CML buffers in the clock path, the power consumption for clock generation is comparable to that of a single sub-ADC. Since a single clock path and single sub-ADC are essentially replicated in layout, the 5GS/s 4 channel ADC would consume half of the total power, or 138.6mW.

[Figure 5.13 plot: SNDR (dB) versus Frequency (GHz); traces: NO CAL, +RADIX/GAIN/OFFSET CAL, +SKEW CAL]

Figure 5.13: SNDR versus Frequency, 5GS/s Time-Interleaved ADC

5.2.5 Performance Summary and Comparison

The ADC performance is summarized in Table 5.3. The main characteristics of this ADC are that it is implemented in the TSMC 65nm GP process, runs with a 1V supply, and has a sampling rate of 5GS/s. The input full scale range is 1Vpp differential and the power consumption is approximately 138.6mW. The SNDR is 38.57dB at low frequency and 33.25dB at Nyquist frequency. The Walden figure-of-merit (FOM) is calculated with the equation

$$\mathrm{FOM} \equiv \frac{\mathrm{Power}}{f_s \times 2^{\mathrm{ENOB}}} \qquad (5.1)$$

and it is 401fJ/conv-step and 738fJ/conv-step for low and Nyquist input frequencies respectively. Table 5.4 compares this work with other relevant works. The FOM is comparable to another TI SAR [42], and two flash ADCs [8] [43]. The SNDR is better than the two other flash ADCs [44] [45] at 5GS/s. The work in [46] is also a time interleaved SAR ADC which achieves significantly better FOM due to the use of 0.5fF unit capacitances in the DAC and large on-chip reference capacitors that are 80pF for fast settling. By using custom capacitors and stacking, the ADC only occupies an area of 195 × 130 µm². Although at 5GS/s, single channel flash architecture ADCs are operable, recall that this work was originally designed as a 10GS/s ADC where time interleaving would be more obviously beneficial. The FOM and SNDR performance vs sampling rate are also shown in figure 5.14, where this work is denoted by a star, and the other works are separated into time-interleaved ADCs, denoted by diamond markers, and non time-interleaved ADCs, denoted by circular markers. ADCs [4] [5] [6] [47] [25] at a sampling frequency of 10GS/s are also included to illustrate that, had the skew range problem been resolved, the prototype ADC would achieve comparable performance to recent publications at 10GS/s (the majority of which are in fact time-interleaved SAR ADCs).

Table 5.3: Performance Summary of Prototype ADC

| Parameter | Value |
| --- | --- |
| Process | TSMC 65nm 1P9M GP |
| Active area | 3.74mm² |
| Resolution | 8 bits |
| Sample rate | 5GS/s |
| Supply voltage | 1.0V |
| Input range | 1.0Vpp differential |
| Power consumption | 138.6mW |
| DNL | +1.9/-1.0 |
| INL | +2.2/-1.7 |

| | fin = 46MHz | fin = 2.5GHz | fin = 4GHz |
| --- | --- | --- | --- |
| SNDR | 38.57dB | 33.25dB | 31.44dB |
| ENOB | 6.11 bits | 5.23 bits | 4.93 bits |
| FOM | 401fJ/conv-step | 738fJ/conv-step | |
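The figures of merit in Table 5.3 can be reproduced from equation (5.1) together with the standard conversion from SNDR to ENOB. The following is a minimal sketch with hypothetical function names; it uses the power, sampling rate and SNDR values reported in this section. Small differences from the tabulated values can arise from rounding the ENOB before applying equation (5.1).

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, f_s_hz, sndr_db):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (f_s_hz * 2 ** enob_from_sndr(sndr_db))


if __name__ == "__main__":
    power_w = 138.6e-3  # 5GS/s 4 channel configuration
    f_s_hz = 5e9
    for label, sndr_db in (("46MHz", 38.57), ("2.5GHz", 33.25)):
        fom = walden_fom(power_w, f_s_hz, sndr_db)
        print(f"fin = {label}: ENOB = {enob_from_sndr(sndr_db):.2f} bits, "
              f"FOM = {fom * 1e15:.0f} fJ/conv-step")
```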


[Figure 5.14 plots: SNDR (dB) versus Sampling Frequency (GS/s) and FOM (fJ/conv-step) versus Sampling Frequency (GS/s); marker groups: TI, Non TI; labelled works: Chung 2009, Poulton 2002, Park 2006, Paulus 2004, Janssen 2013, Deguchi 2007, Wu 2013, Chen 2013, Choi 2008, Kull 2013, Verma 2013, Zhang 2013, Cao 2009, Tabasy 2013, ElChammas 2011]

Figure 5.14: High Speed ADC Performance Comparison (Markers: This Work - Star, Time-Interleaved ADC-Diamond, Non Time-Interleaved ADC-Circular)


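Table 5.4: ADC Performance Comparison with Other Published Works

| Ref | Resolution [Bits] | Sample Rate [GS/s] | Power [mW] | SNDR [dB] | FOM [fJ/conv-step] | Arch | Tech |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [46] | 8 | 8.8 | 35 | 38.5 | 58 | SAR TI | 32nm |
| [8] | 4.5 | 7.5 | 52 | 24.5 | 497 | Flash | 65nm |
| [48] | 8 | 4 | 4600 | 39.1 | 15642 | Pipeline TI | 0.35µm |
| [49] | 4 | 4 | 89 | 22 | 2163 | Flash | 0.18µm |
| [50] | 6 | 4 | 990 | 30 | 9581 | Flash | 0.13µm |
| [42] | 11 | 3.6 | 795 | 50 | 539 | SAR TI | 65nm |
| [43] | 6 | 3.5 | 98 | 31.2 | 946 | Flash | 90nm |
| [51] | 12 | 5.4 | 500 | 50 | 358.4 | Pipeline TI | 28nm |
| [44] | 6 | 5 | 8.5 | 30.9 | 59.4 | Flash | 32nm |
| [45] | 6 | 5 | 320 | 32 | 1968 | Flash | 65nm |
| This work | 8 | 5 | 138.6 | 33.25 | 738 | SAR TI | 65nm |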


Chapter 6

Conclusion

6.1 Summary

Digital circuit solutions have become attractive for the gains in power, area, and ease of design as CMOS processes continue to scale. For communication links, new architectures which move the majority of the processing to the digital back-end necessitate a front-end analog-to-digital converter (ADC). In both wireline and wireless systems, the data rate continues to increase and channel losses must be compensated through equalization, which requires high resolution ADCs. Time interleaving (TI) is a solution in the design of multi-Gb/s ADCs for reasonable power consumption. This solution however introduces time varying errors due to the mismatches between the individual ADC channels. These mismatches include gain, offset and phase/timing skew mismatch. Chapter 2 outlines the effects of these mismatches on the performance of the TI ADC and existing methods used to mitigate these errors.

The mismatches are generally categorized into static mismatches and dynamic mismatches. Static mismatches, namely gain and offset mismatch, create distortion tones whose magnitude does not depend on the input frequency. In contrast, dynamic mismatch such as timing skew mismatch causes the distortion to grow with input frequency. For multi-Gb/s ADCs with moderate or high resolution, sub-picosecond timing skew is needed. Due to layout and device mismatch effects, this skew requirement usually cannot be met through design alone, therefore leading to the use of calibration. Chapter 3 presents the proposed approach to timing skew correction. The proposed calibration is a background mixed-signal approach which uses a cost function and an iterative LMS update to traverse the cost function by changing the delay of the delay buffers of the appropriate clock paths. Two calibration sequences are proposed which extend the cost function for use in ADCs with more than 2 channels. For single frequency inputs, calibration sequence A has some restrictions on frequency locations due to non-monotonicity of the cost function. The situation is more complicated for broadband inputs that contain some spectral content around the problematic input frequencies. The sequence may either converge or diverge depending upon the precise spectral content, which may be examined with behavioural simulation. Calibration sequence B resolves this problem but is more prone to error propagation.

Chapter 4 presents the TI ADC architecture and circuit level implementation of several blocks including the clock generator, distribution, and output data conditioning. Finally, chapter 5 summarizes the performance of the TI ADC, which was fabricated in a TSMC 65nm CMOS process. Due to the inadequate range of the delay buffers in the clock path to fully compensate the skew, 4 channels were used to achieve 5GS/s. The proposed skew calibration improves the SNDR from 16.2dB before calibration to 33.25dB after calibration at the Nyquist frequency of 2.5GHz. The ADC achieves a FOM of 738fJ/conversion-step at Nyquist frequency. The SNDR after calibration is 38.57dB with a FOM of 401fJ/conversion-step at low frequency. The total power consumption is 138.6mW from a 1V supply and the ADC occupies an area of 3.74mm².

6.2 Future Work

The time-interleaved ADC can be improved in several ways. The large skew variance can be reduced. First, the ADC chip size can be reduced by more than half by using a smaller unit sized capacitor in the DAC of the unit SAR ADC. Since the mismatch of the DAC is static, radix calibration should be able to remove non-linearity effects no matter how small the unit capacitors are. This will help reduce the clock skew and also help in reducing the power consumption of the ADC. Secondly, to reduce cross-talk and layout mismatches in the vicinity of the clock paths, ground shields can be added. The channel loss can be improved by re-evaluating the transmission line network both on the PCB and chip. In terms of the calibration algorithm, methods to reduce error propagation in calibration sequence B may be investigated.


Appendix A

Chip Pinout

The pinout of the chip is shown in figure A.1. The chip uses 3 supply voltages in total: V DD CLK for the clock domain circuitry, V DD BUFF for the output buffers, and V DD for the rest of the chip. The reference voltages V REFP, V REFN and common mode voltage VCM are provided for each elementary SAR ADC. A reset signal (RST) is also available to reset all ADCs and the decimator. Comparator current bias (IBIAS CMP), input buffer current bias (IBIAS BUFF), and CML divider current bias (IBIAS DIV) are also provided from off-chip. The output interface consists of the two flag bits (FLAG ALL, FLAG SAR) and the 8 bit data (DATA OUT <7:0>) in addition to a clock that can be used to read the data (CLK DATA). The input interface consists of the 7 bit delay code, a clock (CLK DELAY), and a ready signal (DATA RDY).


Figure A.1: Chip Pinout


Bibliography

[1] C. Castellanos, “PCI-SIG Announces PCI Express 4.0 Evolution to 16GT/S, Twice the Throughput of PCI Express 3.0 Technology,” 2011. [Online]. Available: http://www.pcisig.com/news room/Press Releases/November 29 2011 Press Release /

[2] J. L. Zerbe, C. W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. F. Stonecypher, A. Ho, T. P. Thrush, R. T. Kollipara, M. A. Horowitz, and K. S. Donnelly, “Equalization and Clock Recovery for a 2.5-10-Gb/s 2-PAM/4-PAM Backplane Transceiver Cell,” IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2121–2130, 2003.

[3] Y. Liu, B. Kim, T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, “A 10Gb/s Compact Low-Power Serial I/O with DFE-IIR Equalization in 65nm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 182–184, 2009.

[4] S. Verma, S. Zogopoulos, and S. Sidiropoulos, “A 10.3-GS/s, 6-Bit Flash ADC for 10G Ethernet Applications,” IEEE Journal of Solid-State Circuits, vol. 48, no. 12, pp. 1–11, 2013.

[5] B. Zhang, A. Nazemi, A. Garg, N. Kocaman, M. R. Ahmadi, M. Khanpour, H. Zhang, J. Cao, and A. Momtaz, “A 195mW/55mW Dual-Path Receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 34–36, 2013.

[6] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, D. Pi, B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, “A 500mW Digitally Calibrated AFE in 65nm CMOS for 10Gb/s Serial Links over Backplane and Multimode Fiber,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 370–372, 2009.

[7] D. Crivelli, M. Hueda, H. Carrer, J. Zachan, V. Gutnik, M. Del Barco, R. Lopez, G. Hatcher, J. Finochietto, M. Yeo, A. Chartrand, N. Swenson, P. Voois, and O. Agazzi, “A 40nm CMOS Single-Chip 50Gb/s DP-QPSK/BPSK Transceiver With Electronic Dispersion Compensation for Coherent Optical Channels,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 328–330, 2012.

[8] H. Chung, A. Rylyakov, Z. T. Deniz, J. Bulzacchelli, G.-y. Wei, and D. Friedman, “A 7.5-GS/s 3.8-ENOB 52-mW Flash ADC with Clock Duty Cycle Control in 65nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 268–269, 2009.

[9] Q. Wang, “A 1.25GS/S 8-Bit Time-Interleaved C-2C SAR ADC For Wireline Receiver Applications,” MASc Dissertation, University of Toronto, 2013.

[10] W. Black and D. Hodges, “Time Interleaved Converter Arrays,” IEEE Journal of Solid-State Circuits, vol. 15, no. 6, pp. 1022–1029, 1980.

[11] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, and K. Kobayashi, “Explicit Analysis of Channel Mismatch Effects in Time-Interleaved ADC Systems,” IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 48, no. 3, pp. 261–271, 2001.

[12] P. Vaidyanathan, “Generalizations of the sampling theorem: Seven decades after Nyquist,” IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 48, no. 9, pp. 1094–1109, 2001.

[13] E. Alpman, “A 7-Bit 2.5GS/sec Time-Interleaved C-2C SAR ADC for 60GHz Multi-Band OFDM-Based Receivers,” Ph.D. dissertation, 2009.

[14] M. El-Chammas and B. Murmann, “General Analysis on the Impact of Phase-Skew in Time-Interleaved ADCs,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 56, no. 5, pp. 902–910, 2009.

[15] T.-h. Tsai, P. J. Hurst, and S. H. Lewis, “Bandwidth Mismatch and Its Correction in Time-Interleaved Analog-to-Digital Converters,” IEEE Transactions on Circuits and Systems-II: Express Briefs, vol. 53, no. 10, pp. 1133–1137, 2006.

[16] P. Satarzadeh, B. C. Levy, and P. J. Hurst, “Adaptive Semiblind Calibration of Bandwidth Mismatch for Two-Channel Time-Interleaved ADCs,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 56, no. 9, pp. 2075–2088, 2009.

[17] B. Analui, J. F. Buckwalter, and A. Hajimiri, “Data-Dependent Jitter in Serial Communications,” IEEE Transactions on Microwave Theory and Techniques, vol. 53, no. 11, pp. 3388–3397, 2005.


[18] N. Da Dalt, M. Harteneck, C. Sandner, and A. Wiesbauer, “On the Jitter Requirements of the Sampling Clock for Analog-to-Digital Converters,” IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 49, no. 9, pp. 1354–1360, Sep. 2002.

[19] D. Fu, K. C. Dyer, S. H. Lewis, and P. J. Hurst, “A Digital Background Calibration Technique for Time-Interleaved Analog-to-Digital Converters,” IEEE Journal of Solid-State Circuits, vol. 33, no. 12, pp. 1904–1911, 1998.

[20] S. M. Jamal, D. Fu, N. C. Chang, P. J. Hurst, and S. H. Lewis, “A 10-b 120-Msample/s Time-Interleaved Analog-to-Digital Converter With Digital Background Calibration,” IEEE Journal of Solid-State Circuits, vol. 37, no. 12, pp. 1618–1627, 2002.

[21] W. Liu and Y. Chiu, “Time-Interleaved Analog-to-Digital Conversion With Online Adaptive Equalization,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 59, no. 7, pp. 1384–1395, 2012.

[22] S. M. Chen and R. W. Brodersen, “A 6-bit 600-MS/s 5.3-mW Asynchronous ADC in 0.13-um CMOS,” IEEE Journal of Solid-State Circuits, vol. 41, no. 12, pp. 2669–2680, 2006.

[23] H. Johansson and P. Lowenborg, “Reconstruction of Nonuniformly Sampled Bandlimited Signals by Means of Digital Fractional Delay Filters,” IEEE Transactions on Signal Processing, vol. 50, no. 11, pp. 2757–2767, 2002.

[24] D. Marelli, K. Mahata, and M. Fu, “Linear LMS Compensation for Timing Mismatch in Time-Interleaved ADCs,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 56, no. 11, pp. 2476–2486, 2009.

[25] M. El-Chammas and B. Murmann, “A 12-GS/s 81-mW 5-bit Time-Interleaved Flash ADC With Background Timing Skew Calibration,” IEEE Journal of Solid-State Circuits, vol. 46, no. 4, pp. 838–847, Apr. 2011.

[26] J. Van Vleck and D. Middleton, “The Spectrum of Clipped Noise,” Proceedings of the IEEE, vol. 54, no. 1, pp. 2–19, 1966.

[27] D. Stepanovic and B. Nikolic, “A 2.8GS/s 44.6mW Time-Interleaved ADC Achieving 50.9dB SNDR and 3dB Effective Resolution Bandwidth of 1.5GHz in 65nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 84–85, 2012.


[28] C. Wang and J. Wu, “A Multiphase Timing-Skew Calibration Technique Using Zero-Crossing Detection,” IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 56, no. 6, pp. 1102–1114, 2009.

[29] C. Huang, C. Wang, and J. Wu, “A CMOS 6-Bit 16-GS/s Time-Interleaved ADC Using Digital Background Calibration Techniques,” IEEE Journal of Solid-State Circuits, vol. 46, no. 4, pp. 848–858, Apr. 2011.

[30] B. Razavi, “Problem of timing mismatch in interleaved ADCs,” Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, pp. 1–8, Sep. 2012.

[31] B. Razavi, “Design Considerations for Interleaved ADCs,” IEEE Journal of Solid-State Circuits, vol. 48, no. 8, pp. 1806–1817, Aug. 2013.

[32] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, “A 2.2GHz 7.6mW Sub-Sampling PLL with -126dBc/Hz In-Band Phase Noise and 0.15psrms Jitter in 0.18µm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 392–394, 2009.

[33] S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, 2nd ed. England: John Wiley & Sons Ltd, 2000, vol. 9.

[34] M. El-Chammas and B. Murmann, Background Calibration of Time-Interleaved Data Converters. New York: Springer, 2012.

[35] B. Shen, G. Unruh, M. Lugthart, C.-h. Lee, and M. Chambers, “An 8.5 mW, 0.07 mm^2 ADPLL in 28 nm CMOS with sub-ps resolution TDC and < 230 fs RMS jitter,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 192–193, 2013.

[36] A. Shikata, R. Sekimoto, T. Kuroda, and H. Ishikuro, “A 0.5V 1.1MS/sec 6.3fJ/conversion-step SAR-ADC with Tri-Level Comparator in 40nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 262–263, 2011.

[37] L. Sumanen, M. Waltari, and K. Halonen, “A 10-bit 200-MS/s CMOS Parallel Pipeline A/D Converter,” IEEE Journal of Solid-State Circuits, vol. 36, no. 7, pp. 1048–1055, 2001.

[38] P. Schvan, R. Gibbins, Y. Greshishchev, and N. Ben-hamida, “A 24GS/s 6b ADC in 90nm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 544–546, 2008.


[39] X. Gao, E. Klumperink, and B. Nauta, “Advantages of Shift Registers Over DLLs for Flexible Low Jitter Multiphase Clock Generation,” IEEE Transactions on Circuits and Systems-II: Express Briefs, vol. 55, no. 3, pp. 244–248, 2008.

[40] J.-c. Chien and L.-h. Lu, “Analysis and Design of Wideband Injection-Locked Ring Oscillators With Multiple-Input Injection,” IEEE Journal of Solid-State Circuits, vol. 42, no. 9, pp. 1906–1915, 2007.

[41] K. Poulton, R. Neff, B. Setterberg, B. Wuppermann, T. Kopley, R. Jewett, J. Pernillo, C. Tan, and A. Montijo, “A 20GS/s 8b ADC with a 1MB Memory in 0.18µm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 318–496, 2003.

[42] E. Janssen, K. Doris, A. Zanikopoulos, A. Murroni, G. V. D. Weide, Y. Lin, L. Alvado, F. Darthenay, and Y. Fregeais, “An 11b 3.6GS/s Time-Interleaved SAR ADC in 65nm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 464–466, 2013.

[43] K. Deguchi, N. Suwa, M. Ito, and T. Kumamoto, “A 6-bit 3.5-GS/s 0.9V 98-mW Flash ADC in 90nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 64–65, 2007.

[44] V. H. Chen and L. Pileggi, “An 8.5mW 5GS/s 6b Flash ADC with Dynamic Offset Calibration in 32nm CMOS SOI,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 264–265, 2013.

[45] M. Choi, J. Lee, J. Lee, and H. Son, “A 6-bit 5-GSample/s Nyquist A/D converter in 65nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 16–17, Jun. 2008.

[46] L. Kull, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, M. Braendli, M. Kossel, T. Morf, T. M. Andersen, and Y. Leblebici, “A 35mW 8b 8.8GS/s SAR ADC with Low-Power Capacitive Reference Buffers in 32 nm Digital SOI CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 260–261, 2013.

[47] E. Z. Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, “A 6b 10GS/s TI-SAR ADC with Embedded 2-Tap FFE/1-Tap DFE in 65nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 274–275, 2013.


[48] K. Poulton, R. Neff, A. Burstein, and M. Heshamp, “A 4GSample/s 8b ADC in 0.35um CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 127–434, 2002.

[49] S. Park, Y. Palaskas, and M. P. Flynn, “A 4GS/s 4b Flash ADC in 0.18um CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2006.

[50] C. Paulus, H.-m. Bluthgen, M. Low, E. Sicheneder, N. Briils, A. Courtois, M. Tiebout, and R. Thewes, “A 4GS/s 6b Flash ADC in 0.13um CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 420–423, 2004.

[51] J. Wu, A. W.-t. Chou, C.-h. Yang, Y. Ding, Y.-j. Ko, S.-t. Lin, W. Liu, C.-m. Hsiao, M.-h. Hsieh, C.-c. Huang, J.-j. Hung, K. Y. Kim, M. Le, T. Li, W.-t. Shih, A. Shrivastava, Y.-c. Yang, C.-y. Chen, and H.-s. Huang, “A 5.4GS/s 12b 500mW Pipeline ADC in 28nm CMOS,” VLSI Circuits Symposium, Digest of Technical Papers, pp. 92–93, 2013.