Achieving High Data Rates in a Distributed MIMO System

Achieving High Data Rates in a Distributed MIMO System

Horia Vlad Balan Ryan Rogalin Antonios Michaloliakos Konstantinos PsounisGiuseppe Caire

University of Southern California{hbalan,rogalin,michalol,kpsounis,caire}@usc.edu

A distributed MIMO system consists of several accesspoints connected to a central server and operating as a largedistributed multi-antenna access point. In theory, such asystem enjoys all the significant performance gains of a tra-ditional MIMO system, and it may be deployed in an en-terprise WiFi like setup. In this paper, we investigate theefficiency of such a system in practice. Specifically, we buildupon our prior work on developing a distributed MIMOtestbed, and study the performance of such a system whenboth full channel state information is available to the trans-mitters and when no channel state information is available.In the full channel state information scenario, we imple-ment Zero-Forcing Beamforming (ZFBF) and Tomlinson-Harashima Precoding (THP) which is provably near-optimalin high SNR conditions. In the scenario where no channelinformation is available, we implement Blind InterferenceAlignment (BIA), which achieves a higher multiplexing gain(degrees of freedom) than conventional TDMA. Our exper-imental results show that the performance of our imple-mentation is very close to the theoretically predicted per-formance and offers significant gains over optimal TDMA.We also discuss medium access layer issues in detail for bothscenarios. To the best of our knowledge, this is the first timethat the theoretical high data rates of multiuser MIMO sys-tems have been showcased in a real world distributed MIMOtestbed.

Categories and Subject Descriptors

C.2.2 [Computer System Organization]: Computer Com-munication Networks

General Terms

Design, Experimentation, Performance

Keywords

Software Radio, Synchronization, Virtual MIMO, Wireless

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.MobiCom’12, August 22–26, 2012, Istanbul, Turkey.Copyright 2012 ACM 978-1-4503-1159-5/12/08 ...$15.00.

1. INTRODUCTIONOne of the most fundamental challenges in wireless net-

working today is the increasing demand for higher datarates. In theory, denser deployments, which suppress in-terference and achieve high spectral efficiency via spatialmultiplexing, may be realized in a cost effective manner viaa distributed multiuser MIMO system. More specifically,in an enterprise environment, several access points (APs)can be connected to a central server and operate as a largedistributed multi-antenna AP, ensuring that all transmittedsignal power serves the purpose of data transmission, ratherthan creating “interference.”

In this work, we investigate the efficiency of such a systemin practice by focusing on the downlink, that is, the trans-missions from the APs to the clients. The first challengethat needs to be addressed is how to achieve tight phase andtiming synchronization of the different APs in order to allowfor distributed multiuser MIMO precoding. We address thischallenge by building upon our prior work on developing adistributed MIMO testbed (see Section 3). We then pro-ceed to study the performance of such a system both whenfull channel state information is available to the transmit-ters (full CSIT) and when no channel state information isavailable (no CSIT).

In the full CSIT scenario we implement two schemes (seeSection 4): Zero-Forcing Beamforming (ZFBF) and Tomlinson-Harashima Precoding (THP) [30]. ZFBF is known for itssimplicity and high performance (see [6] and references therein),and has been implemented and studied recently in the caseof a centralized MIMO system [2]. THP is provably near-optimal in high SNR conditions, the typical use case forWiFi. In the no CSIT scenario, we implement the so calledBlind Interference Alignment (BIA) scheme [16], which ach-ieves higher multiplexing gain than orthogonal access (TDMA)by exploiting antenna switching at the receivers in conjunc-tion with a special form of linear precoding at the trans-mitters (see Section 5). It is worth noting that to the bestof our knowledge, neither THP nor BIA have been imple-mented in the past even in the simpler case of a centralizedMIMO setup.

We study the performance gains of the three schemes inour distributed MIMO testbed, comprised of two APs andtwo clients with one antenna each (see Section 6). Ourresults are particularly promising. Our implementation ofboth ZFBF and THP achieve close to 85% of the theoreti-cally achievable gain, an 85% improvement in absolute termsin comparison to TDMA (this gain grows linearly with thenumber of clients when enough antennas and a sufficiently

41

rich environment is present). Also, our implementation ofBIA achieves up to 60% of the theoretically achievable gain,i.e. a 22% improvement in absolute terms in comparisonto optimal TDMA. We also study the performance of theseschemes through simulations in a variety of scenarios, andwe compare it to that of Dirty Paper Coding (DPC), aninformation theoretic coding scheme that is known to be ca-pacity achieving in the full CSIT case, but is too complex tobe implemented in practice. As a matter of fact, THP canbe regarded as the practical approximation of the DPC idea.Finally, we discuss medium access layer issues for both thefull CSIT and the no CSIT scenarios (see Section 7). To thebest of our knowledge, this is the first time that the theoret-ical high data rates of a MIMO system have been showcasedin a real world distributed MIMO testbed.

2. A MULTIUSER MIMO PRIMER

2.1 Precoding schemesSince the introduction of multiuser MIMO [7] and the so-

lution of its information theoretic capacity region [29], therehas been extensive research on low-complexity practical pre-coding schemes able to achieve a large fraction of capacitywith low complexity. Under the assumption of full CSIT,the capacity achieving scheme consists of Dirty-Paper Cod-ing (DPC) [10] combined with optimal power allocation.

While DPC is essentially an information theoretic toolfor realizing perfect interference pre-cancellation with nopower penalty, its practical implementation is notoriouslydifficult [4,13]. THP is a well-known low-complexity schemefor the pre-cancellation of known interference which was pro-posed and widely studied in the context of precoding forInter-Symbol Interference channels [14]. At high SNR, THPachieves the same rate as DPC minus a fixed penalty calledthe “shaping loss”, which is an asymptotic power loss (seeSection 4.2). For this reason, THP was proposed as a low-complexity alternative to DPC for multiuser MIMO in [30].

An even simpler alternative to non-linear precoding (DPCand THP) is represented by linear precoding, i.e., beamform-ing. Among linear beamforming schemes, the simplest andbest known is Zero Forcing Beamforming (ZFBF), consistingof a column-normalized version of the right pseudo-inverseof the channel matrix. ZFBF may outperform THP at lowSNR. However, when ZFBF is used in conjunction with prac-tical QAM coded modulation [22] it is affected by the sameshaping loss as THP at high SNR. In addition, it has beenshown [31] that when the number of users is larger than theone of the transmit antennas, ZFBF may use user selectionto approach the system sum capacity.

The conclusion of the above discussion is that, in general,there is no clear a priori ranking of the achievable perfor-mance between THP and ZFBF: this depends on the actualchannel statistics (scattering environment), operating SNR,number of clients K and jointly coordinated AP antennasM . However, when ZFBF is used with practical codes basedon QAM constellations, THP is generally superior to ZFBFunless the system works in low SNR (typically not the casefor WLAN broadband access, targeting high spectral effi-ciency and operating in high SNR) or the number of clientsis significantly larger than the number of AP antennas, andgreedy selection of the clients is performed. The reason whyZFBF has been the preferred choice in testbed implemen-tations (see Section 2.3) is that it is considered to be easier

and lower complexity than THP. Our work, reported in thispaper, shows that this is not the case.

The need for CSIT imposes certain constraints on the sys-tem hardware, and PHY/MAC protocols. For example, in aTime-Division Duplexing (TDD) scenario (common in wire-less LANs), CSIT can be acquired at the APs from incominguplink pilot symbols, provided that the RF hardware is de-signed in order to preserve the reciprocity of the uplink anddownlink baseband equivalent channels. In order to be ableto schedule the downlink clients, it is necessary that theseclients send their uplink pilots right before the downlink slot,such that their channel state information is not outdated.This requires the design of special uplink pilots and downlinkdata frames, which are different from legacy 802.11 WLANs(see the discussion in Section 7). When the hardware is notspecifically designed to achieve uplink-downlink reciprocityand/or when legacy MAC protocols must be used, it is im-possible to have fresh CSIT of the scheduled clients for eachdownlink slot and precoding schemes based on CSIT can-not be used. In this case, it is known that if the clientsare statistically equivalent (i.e., they have identical marginalchannel statistics of their channel vectors), then TDMA isoptimal. In order to achieve a performance gain over conven-tional TDMA it is therefore necessary to induce artificiallydifferent channel statistics for the clients.

Following this idea, a scheme known as Blind InterferenceAlignment (BIA) was proposed in [16]. The scheme assumesthat each client terminal has M antennas, but only one RFchain. At each point in time, only the output of one antennais demodulated and sampled. Therefore, the complexity andpower consumption of the client terminals is the same as ina conventional single antenna terminal, with the additionof a switch that connects the different antennas to the RFdemodulation front-end. The client channel statistics aredifferentiated by assigning to each client a specific antennaswitching pattern. In this way, each client “sees” a differenttime-varying sequence of channel vectors. In [16] it is shownthat BIA can send MK independent information streamsover blocks of M + K − 1 channel uses, and, in idealizedconditions, each client is able to perfectly remove the in-terference from streams destined to the other clients, eventhough the APs have no CSIT at all.

2.2 PerformanceDifferent schemes can be evaluated in terms of their achiev-

able rates, implicitly assuming ideal coding. Studying achiev-able rates is much easier than implementing a whole codingscheme and evaluating throughput and packet error rate,since the achievable rates can be calculated on the basis ofclosed-form information theoretic formulas (see Sections 4and 5). In general, a coarse performance measure is pro-vided by the so-called “system degrees of freedom” (DoF),also known as “spatial multiplexing gain”. This is the num-ber of non-interfering data streams that we can simultane-ously send to the clients, per time-frequency slot. Mathe-matically, the systems DoFs are given by the limit:

dsum = limSNR→∞

sum rate(SNR)

log SNR, (1)

where SNR denotes the ratio between total transmit energyper channel use and noise power spectral density. A systemachieving a total DoFs count equal to dsum has high-SNRsum rate of sum rate(SNR) = dsum log(SNR)+o(log(SNR)).

42

It turns out that for all schemes considered in this work, theo(log(SNR)) terms are effectively a constant, i.e. a O(1)term. It is well-known that THP and ZFBF achieve underfull CSIT dsum = min{M,K} DoFs (recall that K is thenumber of clients and M the number of antennas), whichis the maximum achievable. BIA achieves dsum = MK

M+K−1,

which is close to min{M,K} when either M ≫ K or K ≫M . Finally, TDMA (or, equivalently, any orthogonal multi-ple access scheme) achieves dsum = 1.

The notion of DoFs becomes particularly relevant whenfixed QAM constellations are used (e.g., determined by somestandard coded modulation scheme), such that the maxi-mum number of information bits per symbol per stream iscapped by log2(constellation size). With fixed size QAMconstellations, the achieved spectral efficiency at high SNRapproaches dsum × log2(constellation size). We shall see inSection 6 that our testbed, comprising two coordinated APsM = 2 and two clients K = 2, and transmitting datastreams formed by 16QAM symbols (carrying a maximumof 4 information bits per symbol per stream), achieves sumrates close to 8 bps/Hz for the CSIT schemes (THP andZFBF), sum rates close to 16

3bps/Hz for the BIA scheme,

and sum rate close to 4 bps/Hz for TDMA, reflecting theDoFs that we expect from the theory.

2.3 Related work with software radio testbedsThe performance of multiuser precoding schemes in real

world scenarios has been investigated recently using soft-ware radio testbeds. A related study [2] focuses on ZFBFperformance when using a single access point with multi-ple antennas. Achieving multiplexing gains with distributed

transmit antennas (i.e., using different APs jointly coordi-nated by a central server through a wired backhaul network)is discussed in [15] for systems that do not provide phasesynchronization between the different APs. This precludesthe use of the known full-multiplexing achieving precodingschemes. Therefore, in [15] an alternative based on interfer-ence alignment and cancellation is proposed. Other streamsof work related to spatial multiplexing in wireless systemshave considered the use of directional antennas to limit in-terference [19], the use of additional antennas to transmitconcurrently with other nodes without harming the ongoingtransmissions [18], and applying multiuser detection tech-niques to the WiFi uplink [28].

Other related implementations are SourceSync [23], achiev-ing diversity gains through the use of distributed space timecoding, and Fine Grained Channel Access [27], which im-plements practical uplink bandwidth sharing. These sys-tems rely on frame alignment but do not achieve tight phasesynchronization. Our solution (see Section 3) achieves suf-ficiently accurate phase synchronization across the differentjointly coordinated APs such that full distributed precodingis possible. This condition is necessary to implement fullCSIT precoding schemes (in particular THP and ZFBF) fordistributed multiuser MIMO. It should be remarked, how-ever, that BIA only requires frame synchronization and suf-ficient carrier phase stability, and therefore BIA might besupported by system solutions such as SourceSync and FineGrained Channel Access.

3. DISTRIBUTED MIMOIn this section we briefly describe the distributed MIMO

testbed over which we implement THP, ZFBF, and BIA. We

pay special attention to the requirement to synchronize theclocks of the APs which transmit simultaneously.

3.1 The Synchronization RequirementMost precoding schemes designed for multiuser MIMO

transmission assume that multiple transmit antennas arehosted on the same AP and share the baseband circuitryand the clocking circuitry which produces the passband sig-nals. In fact, for schemes exploiting full CSIT it is essentialthat the signal amplitudes and the relative phase and timingoffsets of the signals received from different antennas remainunchanged between channel estimation and the actual trans-mission.

In contrast, schemes such as Blind Interference Align-ment [16] and distributed space-time coding through Alam-outi encoding [1] and its generalizations do not require anyCSIT and can be implemented under relaxed synchroniza-tion requirements, limited to frame synchronization and suf-ficient phase stability over the number of packets spanning aprecoding block. For example, for the BIA scheme that wehave implemented we need phase stability over three consec-utive packets, forming a BIA precoding block, as detailed inSection 5.

Since full CSIT schemes achieve the full sum-DoFs of thedistributed MIMO system, and in high SNR the advantagein spectral efficiency can be very significant, it is interestingto explore the possibility of achieving this level of synchro-nization in practice. We have implemented a testbed thatachieves phase synchronization and enables full CSIT pre-coding for distributed multiuser MIMO. It is also evidentthat the schemes requiring less strict synchronization canbe implemented in our testbed as well.

3.2 Platform DescriptionWe have implemented our synchronization method in the

FPGA of the WARP [24] software radio platform. The ra-dio platform is comprised of a central board containing theprogrammable logic circuitry and up to four daughterboardsserving as radio frequency front-ends. The FPGA and thedigital-to-analog and analog-to-digital circuitry present onthe daughterboards are clocked from a single oscillator calledthe sampling clock, while the transceivers on the RF front-ends derive their carrier frequency signals used in modula-tion and demodulation from a second oscillator, called thecarrier clock.

All transmitters are connected to a central server throughindividual gigabit Ethernet links. The server is responsi-ble for the joint encoding of the transmitted signals and forpassing the resulting waveforms, in the form of frequencydomain soft symbols, to the transmitter radios.

3.3 The Synchronization MethodPhase drifts and oscillators. Any discussion on phase

synchronization of distributed wireless transmitters mustnecessarily start with the mechanisms through which phaseerrors occur. Digital wireless transmission systems are con-structed using a number of clock sources, among which thetwo most important ones are the sampling clock and thecarrier clock. In a typical system, signals are created in adigital form in baseband at a sampling rate on the orderof megahertz, then passed through a digital-to-analog con-verter (DAC). Through the use of interpolators and filters,the DAC creates a smooth analog waveform signal which is

43

5 10 15 20 25 30 35−180

−120

−60

0

60

120

180

Time (OFDM Symbols)

Phase (

Degre

es)

Figure 1: Pilot phases

then multiplied by a sinusoidal carrier produced by the car-rier clock. The result is a passband signal which is then sentover the antenna.

Wireless receivers, in turn, use a chain of signal multi-plications and filters to create a baseband version of thepassband signal received over the antenna. Some designs,such as the common superheterodyne receivers, use multi-ple high frequency clocks and convert a signal first to anintermediate frequency before bringing it back to baseband.Other designs simply use a carrier clock operating at thesame nominal frequency as the carrier clock of the transmit-ter and perform the passage from passband to baseband ina single step. We will be focusing on such designs in theensuing discussion. After baseband conversion, the signal issampled and the resulting digital waveform is decoded.

There are four clocks in the signal path: the transmitter’ssampling clock and carrier clock and the receiver’s carrierclock and sampling clock. All four clocks manifest phasedrift and jitter. The drift effect, when linear in time andhappening at a relatively stable rate, can be assimilated tothe presence of a carrier frequency offset.

Denote by ωn = 2πnNTs

the subcarrier frequency and withωc the carrier frequency. Let the timing error of the sam-pling clock be ∆ts and the timing error of the carrier clockbe ∆tc and assume that they are on the same order of mag-nitude. The phase error due to the sampling clock will beωn∆ts while the phase error due to the carrier clock will beφi = ωc∆tc. Since ωc is much greater than ωn, the domi-nant phase rotation is due to the carrier clock and does notdepend on the subcarrier frequency. Moreover, since timeerrors are additive, if the time error is approximately linearin time (linear clock drift) then the phase error will also belinear in time and almost equal for all subcarriers.

The assumptions behind the above statement are verifiedby the results presented in Figure 1. We have constructedan experiment in which a transmitter sends several tone sig-nals, i.e. simple unmodulated sine waves, on several differentsubcarrier frequencies. In the absence of phase drift each ofthe tone signals would exhibit a constant phase when mea-sured over several OFDM frames. In reality, the phase is notconstant and the frame to frame phase shifts of the signalscan be measured and recorded. In the figure, phase driftshave been plotted over the duration of a few tens of frames,a time length corresponding to that of a packet transmissionaccording to the WiFi standards. Our experiment confirmswhat was anticipated above: the drift is indeed linear anddoes not depend on the subcarrier frequency. This allows us

FFT

Figure 2: AirSync Schematic. The baseband signals are pro-cessed through an FFT which feeds phase estimates into aKalman Filter. The IFFT produces a phase-adjusted datasignal, with the same phase drift as the main transmit-ter. The modulation and demodulation use the same carrierclock.

to design a scheme through which the drift can be trackedand predicted.

Design of a synchronization method. In a nutshell,our method consists of performing frame alignment simi-larly to prior work [23,27], and then broadcasting a referencesignal from a master AP, on to which the rest of the APstransmitters will lock. Thanks to the high clock precision ofthe WARP platform, the carrier offset is sufficiently low topreclude the need for further frequency offset compensationat baseband frequency before decoding the signal througha Fourier transform. In general, when it is needed, suchadditional carrier estimation and compensation can be per-formed using standard methods [25]. No matter how smallthe residual carrier frequency offset is, it will lead to phasechanges in the signal received from a transmitter, from onesymbol to the next. We chose to track these phase shifts inreal-time and compensate them separately for every frame.We base the tracking of the phase drift on pilot tone signalsbroadcasted by a master transmitter.

In order to reduce self-interference at the secondary trans-mitters, the tone signals are placed outside the data band,from which they are separated by a guard interval. The sec-ondary transmitters place an analog baseband filter aroundtheir data band further limiting their interference with thepilots. Self-interference could have been avoided using anumber of other techniques such as nulling by antenna place-ment [9], digital compensation [11], or simply relying on theOFDMA-like property of a frame aligned system [27] andpreventing the secondary transmitters from using the pilotsubcarriers.

In addition to tracking the common drift, the secondaryAPs measure the initial phase of each tone of the master APin order to obtain absolute estimates, that is the interceptsof the lines describing the phase drifts in Figure 1. For this,the master AP transmits an initial synchronization headercontaining a set of known channel estimation symbols. Theinitial phase estimate, combined with the phase drift mea-sured using the pilot signals, suffices to predict the absolutephase of any particular tone.

Figure 2 illustrates the process of creating a phase syn-chronous signal at the secondary transmitter. The secondarytransmitter overhears a packet sent by the primary trans-mitter and uses the initial pseudo-noise sequence in order todetermine the block boundary timing of this packet. Usinga discrete Fourier transform the secondary transmitter de-codes the successive frames of the incoming packet. It thenemploys the CORDIC algorithm on the complex-valued re-ceived soft symbols in order to obtain their phases in radi-

44

ans. The phases of the out-of-band pilot signals are trackedthroughout the entire packet transmission in order to esti-mate the phase drift from the primary sender. The mea-surements from the four different pilots are averaged andpassed through a simplified Kalman filter which maintainsan accurate estimate and predicts, based on the current driftestimate, the phase drift after the passage of a few furtherframes.

The phase estimates are used in synthesizing a synchro-nized signal. The secondary transmitter uses an inverse dis-crete Fourier transform, whose output frames are timed suchthat they align with the frames of the main sender’s signal.For every subcarrier the secondary transmitter rotates thesoft symbol to be sent by an angle corresponding to the sub-carrier’s estimated phase offset. The result is a tone that,while not having the same phase as the corresponding tonefrom the main transmitter, follows that tone at a fixed, pre-known phase difference.

Since the subcarrier tones of all transmitters are now syn-chronized, the transmitters act together as a commonly hostedset of MIMO radio-frequency front-ends.

3.4 Centralized joint encodingBy transmitting phase synchronous signals from multiple

access points we have created the equivalent of a distributedMIMO transmitter, capable of employing multiuser MIMOprecoding strategies in order to transmit to multiple usersat the same time. However, the use of multiple access pointscomplicates the design of the transmitter system. For mostof the precoding schemes available, the encoding of the wave-forms to be transmitted over the antennas must be donejointly, since reaching a single user usually involves trans-mitting over multiple antennas. While in theory the jointencoding process could be duplicated at each access pointgiven the binary information destined to each user, we choseto do the encoding only once, at a central server and send theresulting waveforms to each access point for transmission1.

Our central server has an individual gigabit Ethernet con-nection to each of the WARP radios serving as access points.We divide the downlink time into slots and in each slotschedule for transmission a number of packets destined tovarious users. For each of the access points, the server com-putes the waveform of the signal to be transmitted in thenext downlink slot. However, it does not perform any phasecorrection at this point. The only information used in theprecoding is the data to be transmitted and the channelstate information between each access point antenna andeach user antenna. The server assumes that all access pointsare phase synchronous, like in a normal MIMO system. Theserver transmits their corresponding waveforms to all sec-ondary transmitters and finishes by sending the last wave-form to the primary transmitter. The primary transmitterstarts transmitting right away and the secondary transmit-ters follow.

The design of the phase alignment method ensures itsscalability. There is no added overhead for synchronizinga larger number of secondary transmitters.

1This approach is practical in enterprise networks where anumber of access points are already connected to a commonserver.

4. PRECODING USING TRANSMITTER CSIConsider a system with M single-antenna jointly coor-

dinated access points and K single-antenna clients. Whenusing OFDM, the time and frequency selective fading chan-nel is decomposed into subcarriers, where for each OFDMsymbol and subcarrier the channel is characterized by asingle frequency-domain complex coefficient for each trans-mit/receive antenna pair. The resulting baseband channelmodel for a single subcarrier over any given OFDM symbolis described by

y = HHx+ z, (2)

where the vectors y, x and z and the matrix matrix H havecomplex entries, and H denotes Hermitian transpose. For thesake of notation simplicity, we neglect in Equation(2) boththe OFDM symbol and the subcarrier (time and frequencyindices), since they are irrelevant at this point. The receivedsignal sample at client receiver k is the k-th entry of thevector y, the symbol transmitted by access point j is thej-th entry of the vector x, and the (j, k)-th element of thematrix H indicates the channel coefficient between the j-thaccess point transmitter and the k-th client receiver. Finally,z denotes a K-component vector of independent, identicallydistributed, additive white circularly symmetric Gaussiannoise samples (AWGN).

Let u be a K-vector containing the symbols destined toeach user. The transmitted vector x is obtained as a func-tion of u. The mapping u 7→ x is called precoding. Whenthe mapping function depends on the matrix H, assumedknown to the transmitters, we say that the scheme makesuse of full CSIT. If the mapping is independent of H, thenthe scheme is “blind”. Next, we review the two full CSITschemes that we have implemented in our testbed: ZFBF (alinear mapping) and THP (a non-linear mapping).

4.1 Zero-Forcing BeamformingThe precoding mapping in ZFBF is given by

x = Vu (3)

whereV is a scaled version of the Moore-Penrose right pseudo-inverse ofHH, normalized in order to have unit-norm columns.Assuming M ≥ K, 2 the matrix H has rank K with proba-bility 1 for any practically relevant scattering environment.The pseudo-inverse takes on the form H† = H(HHH)−1 andthe precoding matrix V is given by

V = H†Λ

1/2 (4)

where Λ = diag(λ1, . . . , λK) and

λk =1

[(HHH)−1]k,k

(inverse of the (k, k) diagonal element of (HHH)−1). It isimmediate to check that V in (4) has unit norm columns.

With ZFBF precoding, the original channel (Equation 2)

reduces to y = Λ1/2u + z. Hence, each k-th client seesa separate (spatially decoupled) channel of the form yk =√λkuk+zk. The total transmit power is given by E[‖x‖2] =

E[‖u‖2]∑Kk=1 qk, where qk = E[|uk|2] is the power allocated

to the k-th data stream. The corresponding achievable sum

2If K > M , greedy user selection is used to serve a numberof clients not larger than M .

45

rate assuming a Gaussian random coding ensemble is givenby

Rzfbfsum(SNR) =

K∑

k=1

log(1 + λkqk), (5)

where, without loss of generality, we normalize the noisevariance per component to 1, and let

∑k qk = SNR. The

above rate can be maximized with respect to the power al-location to the data streams, i.e., with respect to q1, . . . , qK ,subject to the constraint that the total power must be notlarger than SNR.

In the case of a distributed MIMO system, both inaccura-cies in channel state information and phase synchronizationerrors between the transmitter antennas can lead to powerleakage from each client’s channel to the others, creating anerror floor for downlink transmission. In previous work, dur-ing the development of our system, we have measured theamount of leakage arising from this cumulated effects. Wehave discovered that the residual self-interference power dueto imperfect zero-forcing is, on average, about 24 dB belowthe useful power.

4.2 Tomlinson-Harashima PrecodingIn THP the mapping from the data symbol vector u to

the transmitted symbol vector x is non-linear. Consideragain the channel model (2). THP imposes a given precod-ing ordering, and it pre-cancels sequentially the interferenceof already precoded signals. Without loss of generality, con-sider the natural precoding ordering to be from 1 to K, andlet hk denote the k-th client’s M × 1 channel vector. LetH = QR be the QR factorization of H, such that R isK ×K upper triangular with real non-negative diagonal co-efficients, and Q is such that QHQ = I. THP precoding isformed by the concatenation of a linear mapping, definedby the unitary matrix Q, with a non-linear mapping thatdoes the interference pre-cancellation. Let u = THP(u) de-note the non-linear mapping of the data vector u into anintermediate vector u, that will be defined later. The linearmapping component of THP is then given by

x = QΣ1/2

u, (6)

where Σ = diag(q1, . . . , qK), and qk is, as before, the powerallocated to the k-th stream. It follows that the channel re-duces to y = LΣ1/2u+z, where L = RH is lower triangular.The signal seen at client k receiver is given by

yk = [L]k,k√qkuk +

∑

j<k

[L]k,j√qjuj

︸︷︷︸interference

+zk (7)

Next, we look at the non-linear mapping u 7→ u. The goal isto pre-cancel the term indicated by “interference” in Equa-tion 7. Notice that this term depends only on symbols uj

with j < k. Therefore, the elements u1, . . . , uK can be calcu-lated sequentially. A simple pre-subtraction of the interfer-ence term at each step would increase the effective transmitpower.

The key idea of THP is to introduce a modulo operationthat limits the transmit power of each precoded stream uk.This is defined as follows. Assume that the data symbolsuk are points from a QAM constellation uniformly spacedin the squared region of the complex plane bounded by theinterval [−τ/2, τ/2] on both the real axis and the imaginary

axis. Then, for a complex number s, let s modulo τ begiven by [s]mod τ = s − Qτ (s), where Qτ (s) is the point(n + jm)τ with integers n,m closest to s. In short, Qτ (s)is the quantization of s with respect to a square grid withminimum distance τ on the complex plane, and [s]mod τ isthe quantization error. We let

uk =

[

uk −∑

j<k[L]k,j√qj uj

[L]k,k√qk

]

mod τ

. (8)

In this way, the symbol uk is necessarily bounded into thesquared region of side τ , and its variance (assuming a uni-form distribution over the squared region, which is approx-imately true when we use a QAM constellation inscribed inthe square) is given by E[|uk|2] = τ 2/6. Letting τ =

√6 we

have that the precoded symbols have unit energy and thatthe transmit power for stream k is exactly given by qk.

Let’s focus now on receiver k and see how the moduloprecoding can be undone. The receiver scales the receivedsymbol yk by [L]k,k

√qk and applies again the same the mod-

ulo τ non-linear mapping. Simple algebra then shows that

yk =

[uk +

zk[L]k,k

√qk

]

mod τ

. (9)

It follows that the interference term is perfectly removed, butwe have introduced a distortion in the noise term. Namely,while uk is unchanged by the modulo operation, since byconstruction it is a point inside the square, the noise term

zk[L]k,k

√qk

is “folded” by the modulo operation, i.e., the tails

of the Gaussian noise distribution are folded on the squaredregion. Noise folding is a well-known effect of THP [14].

As far as the achievable rate is concerned, it is possible toshow (see [5,12]) that this is given by

Rthpsum(SNR) =

K∑

k=1

[log(1 + |[L]k,k|2qk)− log(πe/6)

]+,

(10)where [·]+ indicates the positive part. Again, this sum ratecan be optimized with respect to the power allocation q1, . . . , qK ,subject to the sum power constraint

∑Kk=1 qk ≤ SNR. The

rate penalty term log(πe/6) is the shaping loss, due to thefact that THP produces a signal which is uniformly dis-tributed in the square region (therefore, a codeword of n sig-nal components is uniformly distributed in an n-dimensionalcomplex hypercube). 3

5. PRECODING WITHOUT TRANSMITTER

CSIIn the case when CSIT cannot be reliably acquired, when

tight phase synchronization cannot be achieved, or whenchannel reciprocity does not hold because of hardware lim-itations, we need to resort to “blind” approaches. The clas-sical choice, already implemented in software radio [23], con-sists of distributed space time coding (e.g., distributed Alam-outi [1]). This, however, improves reliability through diver-sity, but achieves the single DoF of TDMA. Here, we im-plemented the BIA scheme proposed in [16], able to achievedsum = MK

M+K−1. We briefly outline the case of M = K = 2,

3It should be noticed that the same shaping loss is incurredby any other scheme such as ZFBF, BIA, as well as TDMAif practical QAM constellations are used instead of the the-oretical Gaussian coding ensemble.

46

Slot t = t1 t = t2 t = t3

[Tx1 Sends

Tx2 Sends

]x[1] + x[2] =

[u[1]1 + u

[2]1

u[1]2 + u

[2]2

]x[1] =

[u[1]1

u[1]2

]x[2] =

[u[2]1

u[2]2

]

User 1 Antenna A B A

User 2 Antenna A A B

User 1 Receives y1(t1) = hH

1A(x[1] + x[2]) + z1(t1) y1(t2) = hH

1Bx[1] + z1(t2) y1(t3) = hH

1Ax[2] + z1(t3)

User 2 Receives y2(t1) = hH

2A(x[1] + x[2]) + z2(t1) y2(t2) = hH

2Ax[1] + z2(t2) y2(t3) = hH

2Bx[2] + z2(t3)

User 1 Decodesy1(1) = y1(t1)− y1(t3)

= hH

1Ax[1] + z1(t1) − z1(t3)

y1(2) = y1(t2)

= hH

1Bx[1] + z1(2)

⇒ x

[1] =

[hH

1A

hH

1B

]−1 [y1(1)

y1(2)

]

User 2 Decodesy2(1) = y2(t1)− y2(t2)

= hH

2Ax[2] + z2(t1) − z2(t2)

y2(2) = y2(t3)

= hH

2Bx[2] + z2(3)

⇒ x[2] =

[hH

2A

hH

2B

]−1 [y2(1)

y2(2)

]

Table 1: Blind Interference Alignment for the 2× 2 scenario

when each client has two antennas connected through aswitch to a single RF front-end.

The fundamental idea of BIA is to differentiate the usersby inducing special signatures in their channel temporal vari-ations. This is obtained by allocating to each user an an-tenna switching sequence, according to which they demodu-late the signal from one of their antennas. Only one antennain every given slot is used, so that a single RF front-end anddemodulation chain are needed.

The scheme that we have implemented sends 2 indepen-dent streams per client to two clients, over 3 time slots. Fig-ure 4 contains a sketch of the testbed. Receiver 1 uses theswitching sequence A,B,A, indicating that it uses antennaA in slots 1 and 3 and antenna B in slot 2 of a precodingframe formed by 3 slots. Receiver 2 uses the switching se-

quence A,A,B, with analogous meaning. Denoting by u[k]i

the i-th data symbol of user k, with i = 1, 2 and k = 1, 2,the BIA scheme transmits x[1] + x[2] in the first slot, x[1]

in the second slot, and x[2] in the third slot, where x[1] andx[2] are formed out of the symbol streams as illustrated inTable 4.2. Letting hkA and hkB denote the 2 × 1 channelvectors seen at antennas A and B of user k, we observe thatthe 2 × 2 matrix with columns [hkA,hkB] has rank 2, andthat the channels remain constant over the precoding blockspanning 3 slots.

After linear interference cancellation at each client, theachievable sum rate with Gaussian random coding ensemblesis given by [16]:

Rsum =

K∑

k=1

E

[log det

(I+ (K+M−1)P

M2KHH

k Hk

)]

M +K − 1(11)

where for the 2× 2 case:

Hk =

[1√2hkA,hkB

](12)

6. PERFORMANCE EVALUATION

6.1 Experimental resultsIn this section we characterize the performance of the

precoding schemes. The first series of experiments studieshow the achieved rates vary in respect to variations in thetransmitted power, for a single channel realization. We thenstudy the performance of BIA when the channel experiencesphase variations. We have focused on the THP and BIAprecoding schemes. While we have implemented and testedZFBF as well, we do not plot the results since the resultingperformance values in our setup are quite close to those ofTHP.

−10 −5 0 5 100

0.2

0.4

0.6

0.8

1

Phase Error (Degrees)

Perc

enta

ge o

f E

xperim

ents

Figure 3: The Precision of the Phase Synchronization.AirSync achieves phase synchronization within a few degreesof the source signal.

Synchronization Accuracy. In this particular experi-ment we have placed the two transmitters and the two re-ceivers at random locations. We placed a third RF front-endon the secondary sender and configured it in receive mode.

47

Figure 4: BIA Testbed. When usng Blind InterferenceAlignment each receiver switches between two antennamodes.

The secondary transmitter samples its own synthesized sig-nal over a wired feedback loop and compares it with the maintransmitter’s signal. The synchronization circuit measuresand records the phase differences between these two signals.Since we use the primary transmission as a reference, in thisexperiment we do not broadcast the signal synthesized bythe secondary transmitter in order to protect the primarytransmission from unintended interference. We note thatthe use of a third RF front-end is not needed in the generalcase.

We have modified the synchronization circuit to producea signal that is not only phase synchronous with that of theprimary transmitter but has the exact same phase when ob-served from the secondary transmitter. To achieve this, thecircuit estimates the phase rotation that is induced betweenthe DAC of the secondary transmitter and the ADC throughwhich the synthesized signal is resampled. It then compen-sates for this rotation by subtracting this value from theinitial phase estimate. It is worth noting that this rotationcorresponds to the propagation delay through the feedbackcircuit and is constant for different packet transmissions, asdetermined through measurements.

Figure 3 illustrates the CDF of the synchronization errorbetween the secondary transmitter and the primary trans-mitter. The error is measured on a frame-to-frame basisusing the feedback circuit. In decimal degree values, thestandard deviation is 2.37 degrees. The 95th percentile ofthe synchronization error is at most 4.5 degrees.

Achievable rates. We have used the testbed topologyillustrated in Figure 4 throughout our experiments, placingthe receivers in arbitrary locations in a closed environment.In order to compare the performance of THP and BIA to theone of a typical TDMA system, we introduced a third trans-mission scheme, in which instead of multiuser precoding wetransmit to one user at a time from the closest access point.In this scheme, transmissions to different users happen ina time-shared manner, just like in 802.11. As opposed to

User 1 User 2

Figure 5: The scattering diagram for Tomlinson-HarashimaPrecoding.

User 1 Symbol Stream 1 User 1 Symbol Stream 2

User 2 Symbol Stream 1 User 2 Symbol Stream 2

Figure 6: The scattering diagram for Blind InterferenceAlignment.

802.11, we assume that different access points do not collidewhen doing channel access, i.e. they perform perfect down-link scheduling. We investigate the sum rates achievableduring downlink transmission. The unit of measure is thenumber of bits per second per Hertz (bps/Hz) transferred byeach scheme, where the comparison was done looking onlyat the portion of the bandwidth used for data transmission(i.e. we considered only the data carriers and ignored theoverhead of null carriers, pilots and cyclic prefix). Since theOFDM framing for all three schemes is identical and similarto the one of 802.11, we obtain a fair comparison of theirthroughputs.

We have varied the transmitters’ signal powers in a pro-portional way, trying to obtain a typical range of SNRs atthe receivers. The receive-side SNR values span the typicalhigh range encountered in WiFi signal transmission, from 15dB to 30 dB. The received SNR values (or carrier to noise ra-tios) in our figures were estimated using non-precoded andnon-synchronized isotropic broadcasts, measuring the rawreceived power and comparing it to the receiver noise. Thesame levels of total transmit power were used in the pre-coded synchronous transmissions.

We evaluate the SINR (Signal to Noise plus InterferencePower Ratio) values of the different symbols streams de-coded by the receivers. Determining the symbol SINR valuesrequires more effort in our scenario than in classic point-to-point transmission. Since our system is susceptible to powerleakage from one stream to another, we would like to con-

48

15 20 25 3010

15

20

25

30

Receiver Carrier to Noise Ratio (dB)

Sym

bo

l S

INR

(d

B)

THP

BIA

TDMA

(a) Carrier Energy vs. Symbol SINR

15 20 2510

−8

10−6

10−4

10−2

100


Bit E

rro

r R

ate

THP

BIA

TDMA

(b) Bit Error Rate

15 20 25 300

2

4

6

8


Su

m R

ate

(b

its/s

/Hz)

THP

BIA

TDMA

(c) Sum rate (16-QAM)

15 20 25 300

5

10

15


Su

m R

ate

(b

its/s

/Hz)

THP

BIA

TDMA

(d) Sum rate (Gaussian codes)

15 20 25 300

0.5

1

1.5

2


Multip

lexin

g G

ain

Over

TD

MA

THP

BIA

(e) Multiplexing gain (16-QAM)

15 20 25 300

0.5

1

1.5

2


Multip

lexin

g G

ain

Over

TD

MA

THP

BIA

(f) Multiplexing gain (Gaussian codes)

Figure 7: Experimental Results

tinuously transmit over all channels in order to assess theimpact of interference.

To this end we sampled each symbol stream using sym-bols chosen from a relatively sparse QAM-16 constellation.We measured the variance of the constellation points on thereceiver side in order to determine the sum of the noise andinterference powers. The amplitude of the constellation re-flects the received signal power. At the high SNR valuespresent in our system, the clusters of constellation pointsare spaced sufficiently to allow for an accurate mapping ofthe received symbols to constellation points. In order toassess the effects of interference produced by streams thatfollow other encodings, we have, in some experiments, fixeda QAM-16 constellation on one symbol stream while em-ploying symbols chosen according to a Gaussian or uniformdistribution on the other stream. Our results have shownthat at the low interference levels measured, none of thestatistics collected shows considerable variance dependingon the type of interference.

Figure 7a presents the SINR values for symbols receivedwhen using each of the three precoding schemes. Figure 7billustrates the inferred symbol error rates for the QAM-16constellation transmitted. It can be easily seen that theTHP and BIA curves closely follow the TDMA curve, withonly a few dB difference.

Figure 7c presents the sum rate achievable by the threedifferent schemes (THP, BIA and plain TDMA) for differ-ent levels of the total transmit power, when employing acapacity achieving code on top of the transmitted QAM-16constellation. Figure 7e presents the relative gains of THPand BIA over TDMA. It can be easily seen that each schemequickly saturates at the maximum rate of 4 bits/DoF. SinceTHP and BIA provide extra degrees of freedom, they achievetheir theoretical multiplexing gain over TDMA.

We would like to know how the quality of the resulting

symbol streams affects the achievable rates. To this end wehave estimated the rates achievable when using capacity-achieving codes instead of the QAM-16 modulation. Figure7d presents the resulting sum rates and Figure 7f presentsthe multiplexing gains. THP achieves an average increase insum rate of 85%. While this may seem shy of the theoreticalachievable multiplexing gain of 2, we must remember thatTHP allocates power among two degrees of freedom, whileTDMA allocates its whole transmitted power to a singletransmitter. The second reason for this discrepancy is theshaping loss present in the rate calculation in the case ofTHP, which was indicated in Equation 10.

The average gain for BIA is 22%. Again, the transmittedpower is distributed between the two transmitters. Addi-tionally, as mentioned in Section 5, BIA suffers from noiseenhancement, which affects the received symbols.

In the case of a distributed MIMO system, we would ex-pect that phase synchronization error could lead to randomrotations of the received soft symbols. We investigated thiseffect by comparing the variance of soft symbols correspond-ing to constellation points of different amplitudes. We wouldexpect that due to random rotations, the variance of theouter constellation points would be higher. However, ourmeasurements could not identify such an effect for any ofthe transmission schemes.

Since BIA does not provide the transmitter with channelstate information, to allow it to guess an appropriate trans-mission rate, it is interesting to find out by how much thereceived symbol quality is affected by small variations in thepositioning of the antennas. Such an effect is analogous tofast fading, where small phase changes affect the channelamplitude at different frequencies. We have conducted anexperiment in which we have varied the transmitter antennapositions within one wavelength of their initial position andmeasured the channel SINR for the two user symbols. The

49

10 15 20 250

0.2

0.4

0.6

0.8

1

Received SNR (dB)

Po

rtio

n o

f M

ea

su

rem

en

ts L

ess o

r E

qu

al

User 1

User 2

Figure 8: The cumulative distribution function of receivedSNRs under the Blind Interference Alignment Scheme.

CDFs of the resulting SINR distributions are shown in Fig-ure 8. The high variance of the distribution has profoundimplications on the design of a coding and medium accessscheme for BIA, as will be discussed in Section 7. The higherSNR present in one of the CDFs can be easily explained bythe fact that the two symbols are transmitted by antennasplaced on different transmitters. The placement of the usersrelative to the corresponding transmitter determines eachsymbol’s average power.

6.2 Simulation resultsTo further evaluate the different schemes that were im-

plemented on our testbed, we have also performed off-linesimulations to compare them with respect to informationtheoretic benchmarks. We collected a large set of channelmatrices measured at the receivers of our testbed, based onthe downlink pilot symbols, and we calculated the achieva-ble sum rate of the various precoding schemes assuming idealGaussian random coding, and averaging over the ensembleof measured channel matrices. Specifically, we compare thesum rates between the capacity achieving theoretic DPC,THP and ZFBF, BIA and finally TDMA. In Fig. 9 we cansee the sum rates plotted against the SNR (in dBs) at thetransmitter for K = 4 users and M = 4 antennas.

For the full CSIT schemes, we observe that both THP andZFBF approach the optimal theoretic capacity and show aconstant gap of 2.5 − 5 dB with respect to the system sumcapacity (achieved by DPC), at sufficiently large SNR. No-tice that THP suffers from the shaping loss even in the caseof ideal coding, while ZFBF with ideal coding does not. Thismakes THP outperform ZFBF only in very high SNR. How-ever, in a real system implementation, coded modulationschemes based on QAM constellations would be used alsowith ZFBF, and these suffer from the same shaping lossof THP. Therefore, in an actual implementation, the perfor-mance of ZFBF would decrease also by roughly 0.5 bit/s/Hzper stream. Thus, the effective gain of THP over ZFBF in anactual implementation can be significant, even at moderateSNR. This is consistent with the testbed experiments shownin Section 6.1, where symbols from a 16QAM constellationwere used.

Next, we compare the two schemes that require no CSIT.First, notice that in order for BIA to achieve the theoreticDoFs, we have to be in the high SNR region. It is thereforeexpected that BIA’s sumrate will suffer from the noise en-hancement until we reach a high SNR. However, we again

0 5 10 15 20 25 300

5

10

15

20

25

30

35

SNR (dB)

Sum

Rate

(bits/s

/Hz)

DPCZFBFTHPBIA TDMA

Figure 9: Simulation Results. Sum rate for four users andfour access points.

remark that these conclusions driven by the use of idealGaussian coding may be misleading. For example, whenthe system is constrained to use constellations of fixed size,as observed before, the advantage of BIA over TDMA canbe significant. For example, in the results of Section 6.1, wesee that BIA achieves almost a rate 4/3 times larger thanTDMA.

7. MEDIUM ACCESSOur performance evaluation has shown that multiuser trans-

mission at high data rates is possible and that we can ap-proach the theoretical multiplexing gains in a real-world sce-nario. In the following we discuss issues pertaining to theMAC layer design for the full CSIT and no CSIT cases.

MAC design for full CSIT schemes. As opposedto standard 802.11, distributed multiuser MIMO transmis-sion requires a high degree of coordination between the APs.Part of that coordination, necessary for obtaining framealignment and phase synchronization, has been detailed inSection 3. The precoding schemes of Sections 4 and 5 re-quire joint precoding of the packets to be transmitted to thedifferent users. It results that the system design of a dis-tributed MIMO system is necessarily more centralized thanthe current enterprise WLAN design.

We envision a system that, at the MAC layer, resem-bles more current cellular deployments than the standardCSMA/CA-based large wireless deployment. The downlinkand uplink phases of the system should be completely sepa-rated in a Time Division Duplexing fashion. In the downlinkphase, there will be almost no random behavior, since down-link transmissions are centrally planned by the server andthen executed by the APs, see [3] for a very recent studyof this matter. With full CSIT, the clients can be selectedand jointly precoded on the basis of the knowledge of thesystem channel matrix H (see Equation (2)). In particular,the coding rate allocated to each data stream can be dy-namically adapted depending on the effective SINR of thestream. This is in line with what is currently done in today’sWLAN standards [20], that make use of adaptive coding andmodulation based on a family of coded-modulation schemes.

MAC design for schemes with no CSIT.When CSITis not available, the AP adapts its transmission rate (i.e.,its coded-modulation scheme) on the basis of some aver-age signal quality level, reported from the clients through aChannel Quality Indicator (CQI) feedback message, or sim-

50

ply obtained by measuring the quality of the uplink pack-ets. The CQI depends on the average SINR, rather thanon the instantaneous realization of the channel matrix H,and allows to adapt to coarse and slow signal strength vari-ations (e.g., due to distance and walls attenuation). Weassume that for practical reasons, obtaining a quality mea-sure in a timely manner is not possible. For example, inthe case of a moving receiver, the symbol SNR figures forBIA vary widely due to phase changes (see Section 6). Itfollows that the channel quality is, as far as the transmit-ter is concerned, a random variable with quite high varianceand that the rates supported may be very different fromone packet transmission to the next. We may choose one oftwo strategies in order to deal with the rate uncertainty: aconservative rate adaptation approach that chooses a smallrate for every transmission or an incremental redundancyscheme.

In incremental redundancy schemes, transmission occursin blocks. An encoder produces several consecutive encodedversions of a block which are sent over the channel to the de-coder. Each successive pass increases the amount of mutualinformation transfered. At some point the decoder accumu-lates sufficient information to decode the block and verifiesits correctness through a checksum check. At this pointthe decoder informs the transmitter, through an acknowl-edgment, that it can proceed to the following informationblock.

Rateless codes have been proposed as efficient buildingblocks for incremental redundancy systems. Raptor codes[26], for example, utilize a belief propagation-based decoderand implement the process described above. However, Rap-tor codes, along with a large class of rateless codes, must betuned depending on the channel SNR figure in order to ef-fectively approach capacity. More recent approaches do notsuffer from this shortcoming. Strider [17] is able to oper-ate in the presence of unknown amounts of interference andwhen paired with a capacity achieving code comes very closeto the actual channel rate. Spinal codes operate with a verydense constellation suitable to many channel conditions [21].The spinal code decoder makes use of the channel SNR fig-ure, however this quantity can be estimated efficiently [8]for each block transfer and used appropriately in symbollikelihood estimation.

8. DISCUSSION AND FUTURE WORKIn this work we have implemented for the first time ZFBF,

THP, and BIA in a distributed MIMO setting. All schemeswere shown to achieve almost all of the multiplexing gainpredicted by theory in our testbed, consisting of two accesspoints and two clients. In the rest of this section we discussscalability and implementation overhead issues.

Scalability. For full CSIT schemes, the main issues af-fecting the SINR of distributed multiuser MIMO are theaccuracy of channel estimation and the ability to maintainaccurate phase synchronization between the APs. Both is-sues are not affected by the number of clients, but dependon the number and geometry (distances) of the jointly co-ordinated APs. In future work, we seek to understand howmany APs can be jointly coordinated through the synchro-nization scheme that we have designed. In general, we mayenvisage that “clusters” of jointly processed APs can operateaccording to a distributed multiuser MIMO scheme.

Implementation Overhead. All schemes discussed in

this paper, namely ZFBF, THP, BIA, as well as TDMA,require downlink pilots, similarly to 802.11. The main dif-ference when it comes to overhead is the need for CSIT.Specifically, ZFBF and THP need to estimate the channelfrom the uplink in special slots, and rely on reciprocity toget an estimate for downlink. In contrast, BIA and TDMAdo not have this need. An additional benefit of BIA is thatit requires a single RF chain and very low power, makingit ideal for small devices like smartphones. That said, theperformance gain from CSI aware schemes is quite signifi-cant, as shown in Section 6. When it comes to implementingthe MAC, all schemes require scheduling to materialize highrates. It is part of future work to implement in our testbedMAC protocols that utilize efficient schedulers, as well as toimplement several incremental redundancy schemes.

9. ACKNOWLEDGMENTSThe authors would like to thank the reviewers and our

shepherd, Xinbing Wang, for their assistance in improvingthe manuscript of this paper.

This work was supported by the Ming Hsieh Institute,the Army Research Laboratory (ARL) Collaborative Tech-nology Alliance (CTA) number W911NF-09-2-0053, MarieCurie grants 256416 and 274523, Cisco Systems under aCRC grant and NSF grant CIF-0917343.

10. REFERENCES

[1] S. Alamouti. A simple transmit diversity technique forwireless communications. IEEE J. Sel. Areas

Commun., 16(8):1451–1458, 1998.

[2] E. Aryafar, N. Anand, T. Salonidis, and E. W.Knightly. Design and experimental evaluation ofmulti-user beamforming in wireless LANs. In ACM

MobiCom, Chicago, IL, 2010.

[3] H. Balan, A. Michaloliakos, R. Rogalin, G. Caire, andK. Psounis. Efficient MAC for distributed multiusermimo systems. In Information Theory and

Applications Workshop, La Jolla, CA, Feb. 2012.

[4] A. Bennatan, D. Burshtein, G. Caire, and S. Shamai.Superposition coding for side-information channels.Information Theory, IEEE Transactions on,52(5):1872 – 1889, may 2006.

[5] F. Boccardi, F. Tosato, and G. Caire. Precodingschemes for the mimo-gbc. In Communications, 2006

International Zurich Seminar on, pages 10 –13, 0-02006.

[6] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran.Multiuser MIMO achievable rates with downlinktraining and channel state feedback. IEEE Trans. Inf.

Theory, 56(6):2845–2866, 2010.

[7] G. Caire and S. Shamai. On the achievable throughputof a multiantenna gaussian broadcast channel. IEEETrans. Inf. Theory, 49(7):1691 – 1706, Jul. 2003.

[8] B. Chen, Z. Zhou, Y. Zhao, and H. Yu. Efficient errorestimating coding: feasibility and applications. InACM SIGCOMM, pages 3–14, New Delhi, India, 2010.

[9] J. I. Choi, M. Jain, K. Srinivasan, P. Levis, andS. Katti. Achieving single channel, full duplex wirelesscommunication. In IEEE MobiCom, Chicago, IL, 2010.

[10] M. Costa. Writing on dirty paper (corresp.). IEEETrans. Inf. Theory, 29(3):439–441, May 1983.

51

[11] M. Duarte, C. Dick, and A. Sabharwal.Experiment-driven characterization of full-duplexwireless systems. CoRR, abs/1107.1276, 2011.

[12] U. Erez, S. Shamai, and R. Zamir. Capacity andlattice strategies for canceling known interference.Information Theory, IEEE Transactions on,51(11):3820 – 3833, nov. 2005.

[13] U. Erez and S. ten Brink. A close-to-capacity dirtypaper coding scheme. Information Theory, IEEE

Transactions on, 51(10):3417 –3432, oct. 2005.

[14] J. Forney, G.D. and M. Eyuboglu. Combinedequalization and coding using precoding.Communications Magazine, IEEE, 29(12):25 –34, dec.1991.

[15] S. Gollakota, S. D. Perli, and D. Katabi. Interferencealignment and cancellation. In ACM SIGCOMM,Barcelona, Spain, 2009.

[16] T. Gou, C. Wang, and S. Jafar. Aiming perfectly inthe dark - blind interference alignment throughstaggered antenna switching. In IEEE GLOBECOM,Dec. 2010.

[17] A. Gudipati and S. Katti. Strider: automatic rateadaptation and collision handling. In ACM

SIGCOMM, Toronto, Ontario, Canada, 2011.

[18] K. C.-J. Lin, S. Gollakota, and D. Katabi. Randomaccess heterogeneous mimo networks. In ACM

SIGCOMM, pages 146–157, Toronto, Ontario,Canada, 2011. ACM.

[19] X. Liu, A. Sheth, M. Kaminsky, K. Papagiannaki,S. Seshan, and P. Steenkiste. Pushing the envelope ofindoor wireless spatial reuse using directional accesspoints and clients. In IEEE MobiCom, Chicago,Illinois, USA, 2010.

[20] A. Molisch. Wireless communications. Wiley-IEEE,2005.

[21] J. Perry, H. Balakrishnan, and D. Shah. Ratelessspinal codes. In ACM HotNets, Cambridge,Massachusetts, 2011.

[22] J. Proakis and M. Salehi. Digital communications.McGraw-Hill, New York, NY, 2007.

[23] H. Rahul, H. Hassanieh, and D. Katabi. SourceSync: adistributed wireless architecture for exploiting senderdiversity. In ACM SIGCOMM, New Delhi, India, 2010.

[24] Rice University. Rice university WARP project.

[25] T. Schmidl and D. Cox. Robust frequency and timingsynchronization for ofdm. Communications, IEEE

Transactions on, 45(12):1613 –1621, dec 1997.

[26] A. Shokrollahi. Raptor codes. IEEE Trans. Inf.

Theory, 52(6):2551 –2567, Jun. 2006.

[27] K. Tan, J. Fang, Y. Zhang, S. Chen, L. Shi, J. Zhang,and Y. Zhang. Fine-grained channel access in wirelessLAN. In ACM SIGCOMM, New Delhi, India, 2010.

[28] K. Tan, H. Liu, J. Fang, W. Wang, J. Zhang,M. Chen, and G. M. Voelker. Sam: enabling practicalspatial multiple access in wireless lan. In IEEE

MobiCom, pages 49–60, Beijing, China, 2009.

[29] S. Vishwanath, N. Jindal, and A. Goldsmith. Duality,achievable rates, and sum-rate capacity of gaussianmimo broadcast channels. Information Theory, IEEE

Transactions on, 49(10):2658 – 2668, oct. 2003.

[30] C. Windpassinger, R. Fischer, T. Vencel, and

J. Huber. Precoding in multiantenna and multiusercommunications. Wireless Communications, IEEE

Transactions on, 3(4):1305 – 1316, Jul. 2004.

[31] T. Yoo and A. Goldsmith. On the optimality ofmultiantenna broadcast scheduling using zero-forcingbeamforming. IEEE J. Sel. Areas Commun., 24(3):528– 541, Mar. 2006.

52

Documents

Achieving High Data Rates in a Distributed MIMO System