
Price Discovery on Multiple Exchanges

Andreas Vig Logermann & Johan Emil Rasmussen ∗ †

January 21, 2015

Abstract

The study of price discovery is motivated by the recent trend of securities being traded at multiple venues. The literature revolves around the issue of finding an efficient way to estimate the contribution of each market to the implicit efficient price process. This article estimates the information share using the Cholesky decomposition originally proposed by Hasbrouck (1995) and seeks to compare this measure with a recent extension, the spectral decomposition of Fernandes and Scherrer (2014). The spectral decomposition approach effectively eliminates the problem of the information share measure being order-variant. The conclusion of this paper is that taking the mean of the different Cholesky permutations yields results that are broadly similar to those given by the spectral decomposition method. Empirically, the two methods are compared using three different stocks, YHOO, AAPL and INTC, and their quotes from three liquid exchanges, NASDAQ, BATS and NYSE Arca. Overall, NYSE Arca appears to be the most information dominant market, followed by NASDAQ. BATS appears to be the least information dominant market. To confirm the full-sample results, daily VECM models are estimated, yielding information shares on daily subsamples. We construct an alternative full-sample information share measure from the daily estimates, which yields somewhat different results: NASDAQ becomes less information dominant, while the information share of BATS increases, such that BATS in general becomes more informationally dominant than NASDAQ. Our results show no consistently significant correlations between daily information shares and daily realized variance, daily volumes and daily mean spreads, respectively, although some negative correlation is present. The information shares are used in a trading strategy application which exploits buy and sell signals from the observed difference between the mid quotes in the most and least information dominant market. The trading strategy consistently yields negative profits, which might indicate efficient markets.

Keywords: Price discovery, VECM, trading strategies, high frequency data, TAQ, cointegration, unit root, information share, Cholesky decomposition, spectral decomposition

Department of Economics and Business, Aarhus University

∗This article is produced in the course Empirical Market Microstructure as part of the M.Sc. in Economics at Aarhus University, autumn 2014. We want to thank Christina Scherrer for providing very valuable suggestions and discussions and for being so interested in the topic of the article.

†Study ID: 20102490, 20106193. Email: [email protected], [email protected].


1 Introduction

The degree of market fragmentation has grown during the last two decades, meaning an increase in the number of exchanges on which a single security trades. Because of this phenomenon, it becomes of interest to determine which markets are price dominant - that is, the markets which process news into their prices most rapidly and drive the reactions in other markets.

This increase in market fragmentation has contributed to markets becoming less transparent, but has also made the issue of price discovery a core subject within the field of market microstructure. Rapid technological development has further enabled faster and even automated trading with less and less human interference. These two trends have drastically changed the structure and speed of the price formation process, and an entirely new term, High Frequency Trading (HFT), which relies on obtaining information about the market microstructure and arbitraging on it, has emerged.

This article will cover the methodology of one of the main approaches for analysing price discovery (Hasbrouck, 1995), as well as certain modifications and extensions to this method. Further, we conduct an empirical study to determine on which markets price discovery occurs for three technology stocks. Finally, we investigate whether one could profit from this knowledge by developing a simple trading strategy.

The next section describes the existing literature, which will shed light on some of the discussions in the field.

2 Literature

The recent decades' increase in market fragmentation, combined with a higher availability of high-frequency intraday transaction data, has given way to much literature on the subject of price discovery. Some of the first work in the field is by Garbade and Silber (1979), who propose a taxonomy for categorizing relationships between market centers that trade identical assets. Markets are either "dominant" or "satellites", where the former lead the price discovery and the latter adjust to the dominant market. Whether the satellite markets are "pure satellites" or whether they play a role in incorporating new information into security prices is the object of interest in the study. Prices of the same security in different markets are assumed to be linked by arbitrage and short-term equilibrium conditions, such that a common implicit efficient price exists. Using a model with an autoregressive adjustment component to explain prices, they conclude that regional exchanges are satellites, but not pure satellites.

In a later study, Garbade and Silber (1983) examine the price discovery of commodity futures and spot markets. They conclude that the majority of new information is first incorporated into futures prices. It was later proposed, however, that it is difficult to generalize the models developed in these papers to take into account the diversity of microstructure effects. The articles that followed applied a method of "lead-lag" return regressions, where one return is regressed against leads and lags of the other. These studies generally find that futures markets lead the cash markets. From an econometric point of view, however, Hasbrouck (1995) suggests that the models used in these papers are generally misspecified.1

Later studies have analysed price discovery in several ways, but particularly two main measures have been adopted since the initial work by Garbade and Silber (1979, 1983). Both approaches are based on an implicit unobservable efficient price common to all markets, and both use a reduced-form cointegrated vector error correction model. The first is the information share (IS) technique by Hasbrouck (1995), which defines the information share of a specific market as the proportion of the efficient price innovation variance attributable to that market. Hasbrouck uses a one-second time resolution to measure the price discovery for equities traded on the NYSE and regional exchanges (using the best non-NYSE bids and offers). The method applies the Stock and Watson (1988) decomposition of the price vector, where the permanent component is a random walk with serially uncorrelated increments. Hasbrouck acknowledges that a problem arises when price innovations are correlated across markets, and to solve this, he uses the Cholesky decomposition of the covariance matrix from the estimated vector error correction model. This method is not order invariant, however. As a result, information shares are estimated for different permutations of the price vector, which yields upper and lower bounds for the measures. Hasbrouck finds these bounds to be almost equal. Point estimates of the information shares are then obtained by averaging over the different results. The study finds that the majority of the price innovation happens on the NYSE (the NYSE is "information dominant"), and that the information share is larger than what the actual traded volume would indicate. In a later paper, Hasbrouck (2003) investigates two different ways in which to estimate the information share of a market: one where the model is based on the full data sample, and a second approach where daily information shares are estimated and the point estimate of the information shares is obtained as the mean of these daily measures. He argues that the daily method facilitates the derivation of distributional results and might impose less structure on the data, which in turn makes misspecification less of an issue. However, he obtains similar results by the two methods.

1Hasbrouck (1995) elaborates on this critique.

Apart from Hasbrouck's, the other main measure in the literature is the permanent-transitory (PT) decomposition method by Gonzalo and Granger (1995), which focuses on the common factor component weights of the different markets in forming the efficient price innovation. This approach is thus only concerned with the error correction process and the permanent shocks that result in disequilibrium. In this study, Gonzalo and Granger propose an alternative decomposition of the price vector into permanent and transitory components, where the elements are linear combinations of the price vector alone. Under this method, the contribution of the different markets to the common factor is a function of the markets' error correction coefficients. The authors consider three applications: the relationship between consumption and GNP, dividends and stock prices, and interest rates in Canada and the US, respectively. This method has been applied by Booth, So, and Tse (1999) and Chu, Hsieh, and Tse (1999), among others.

The two methodologies have been applied to several markets and with different sampling frequencies. Four papers, deB. Harris, McInish, and Wood (2002), Baillie, Geoffrey Booth, Tse, and Zabotina (2002), Lehmann (2002) and de Jong (2002), all appear in a special issue of the Journal of Financial Markets in 2002. Independently of each other, the three latter seek to shed light on the specifics of the Gonzalo-Granger PT measure and the Hasbrouck IS measure, how they are related, and when the two approaches give similar and different results.

Baillie et al. (2002) show that the Hasbrouck and Gonzalo-Granger approaches are directly related and give similar results if no residual correlation is present between the different markets. It is also shown, however, that the models provide different results when substantial correlation exists, since the Gonzalo-Granger approach does not take into account correlation among the markets. In the IS approach, the bounds of the calculated information shares are usually found to be very far apart when much contemporaneous correlation between the markets is present. Baillie et al. (2002) suggest that Hasbrouck (1995)'s finding of almost equal upper and lower bounds is due to the contemporaneous correlation being almost insignificant at a one-second sampling frequency. Furthermore, the authors provide evidence to support the use of the average of the upper and lower bounds as an approximation of the center of the information share distribution. This yields an approximate point estimate of the information share and resolves interpretational ambiguities. The statistical significance of the bounds is also assessed using cross-sectional standard errors (Hasbrouck (1995) and Huang (2002) address these points as well).

Lehmann (2002) attempts to resolve the differences between the IS and PT approaches to price discovery. The study shows that the Gonzalo-Granger portfolio weights do not resolve the identification problem inherent in measuring the contribution of different markets to price discovery. Despite the fact that the decomposition of the efficient price in Hasbrouck (1995) is ambiguous when price changes are correlated across markets, he concludes that the Stock-Watson common trend should nonetheless be used to estimate efficient prices.

de Jong (2002) shows that the coefficients of the Gonzalo-Granger common factors are just normalized elements of the vector that defines the Stock-Watson common stochastic trend. He further concludes that the major difference between the two methods can be found in the role of the variance of the innovations, and that only the information share accounts for the variability of the innovations in each market's price. The merits of both methods are discussed: the PT approach is appropriate when the interest lies in constructing the innovations in the efficient price from the full innovation vector, while the IS measure is more concerned with the amount of variation in the prices and how much of that is explained by price changes in the different markets.

Lastly, deB. Harris et al. (2002) address the problem of non-uniqueness of the parameter estimates in the IS model, and therefore employ a time series approach which is closely related to the PT method of Gonzalo and Granger (1995). They analyze the common factor weight attributable to three exchanges for 30 DJIA stocks over the period from 1988 to 1995, and find that the NYSE common factor weights in general decline from 1988 to 1992. This is conjectured to be due to a reduction in the competitiveness of the NYSE's spreads, depth and immediacy caused by the emergence of innovative electronic communication networks (ECNs) over the same period. From 1992 to 1995, they see the NYSE common factor weights recover, as the NYSE spreads retightened during this period. Furthermore, the study considers different data collection procedures: MINSPAN, REPLACEALL and XFIRST.2

The subjects of these four papers clearly revolve around the issue of the Cholesky decomposition not being order invariant, and the Gonzalo-Granger approach not providing a satisfactory alternative. Two standard solutions to this problem have appeared in the literature. The first is to increase the sampling frequency of prices to decrease the contemporaneous correlation between price innovations across markets. The second is to average the IS measures for all possible permutations of the price vector. In support of the first solution, Grammig, Melvin, and Schlag (2005) analyse exchange rates and equity quotes for three German firms trading on the NYSE and XETRA. They find little residual correlation when sampling prices every 10 seconds, but substantial correlation when sampling at one minute. Furthermore, this study shows how to apply a dynamic simulation to estimate the vector moving average parameters from the vector error correction representation.

These initial solutions have proven unsatisfactory, however. In the last decade, several papers have attempted to solve the ordering problem in alternative ways. Frijns and Schotman (2009) develop a tick-time model for the quote-setting dynamics on Nasdaq, which avoids many of the simultaneity issues mentioned above. Within this model they define new measures of price discovery contribution, which are closely related to Hasbrouck's information shares. Yan and Zivot (2010) suggest a structural price discovery cointegration model as a resolution to these problems - that is, a structural VAR-inspired model where the sources of the shocks are identified. They suggest that price discovery is dynamic in nature, and that only a structural model will be able to provide a clear interpretation of price discovery. Using this model, the structural determinants of both Hasbrouck's information share and Gonzalo and Granger's component share are investigated. In the structural model, the two approaches are used in combination to disentangle the impacts of permanent and transitory shocks. This yields an unambiguous measure of price discovery. Putnins (2013) applies this model in an empirical study.

In Lien and Shrestha (2009), a modified information share (MIS) is computed by decomposing the innovation correlation matrix. This yields a unique measure of price discovery. The new measure is based on a different factor structure, which gets around the problem of the Cholesky factorization being order dependent. In a simulation study, the modified measure is found to outperform both the IS and the PT measure. Lien and Shrestha (2014) extend their previous study to consider the case where the price series do not have a one-to-one cointegrating relationship. That is, the authors propose a further generalization of the IS and MIS measures, called the Generalized Information Share (GIS). The equilibrium condition under the IS and MIS approaches requires equality of all series, except for a constant difference. Under the GIS measure, the only requirement is that the unit root series must be driven by a single stochastic trend, which implies that the number of cointegration relationships must equal the number of price series minus one.

2In this paper, we will adopt the REPLACEALL approach.

Fernandes and Scherrer (2014) claim that there is no guarantee that increasing the sampling frequency will solve anything empirically, while taking the average of the information shares under different permutations will only be effective when securities are listed on a small number of exchanges. When the number of trading locations grows large, the number of possible permutations increases exponentially. Furthermore, they address Lien and Shrestha (2009), and point to the fact that a decomposition of the correlation matrix is unnecessarily complex. As a result, the study suggests solving the ordering problem by basing the spectral decomposition on the covariance matrix instead. This procedure yields a single, order-invariant measure of the information share.

To examine the accuracy of the estimated information shares, a few papers in the price discovery literature have employed bootstrapping procedures appropriate for handling time series data. These include, for example, Sapp (2002) and Grammig et al. (2005).

Finally, it is of interest to examine which (if any) observable characteristics of the market structure explain the information shares of the different markets. This question is examined in Mizrach and Neely (2008), where the authors use daily estimated information shares obtained using both the Hasbrouck (1995) method and the Gonzalo and Granger (1995) method. In particular, they seek to determine whether observable liquidity measures of the markets explain the obtained information shares. Their test is based on a regression of the information share on the daily average spread, the number of trades and the realized volatility (noise trades). They find that both wider spreads and higher realized volatilities are associated with a lower information share, while the proportion of trades has a positive, but not always statistically significant, impact on the estimated information shares.

This paper tries to shed light on the differences between the Hasbrouck (1995) method of calculating the information shares using the Cholesky decomposition, and the method of Fernandes and Scherrer (2014) that uses the spectral decomposition to obtain an order invariant measure of the information shares. Furthermore, two diverging practices are compared: one, where the model is based on the entire sample (as in e.g. Grammig et al. (2005)), and a second method, where we aggregate over daily information share measures to get the overall information share (as in e.g. Hasbrouck (2003)).

3 Methodology

A walkthrough of the methodology used throughout the paper is presented in the next subsections, starting with an illustration of the general idea through a simple model. Next, we move to a model covering the more general case, which will be used in our estimations in a later section. Following this subsection, the statistical properties of the model are discussed, including an introduction of an alternative procedure for obtaining the information shares using daily estimates. Finally, we provide the steps for performing a bootstrap to examine the distribution of the obtained information share estimates.

3.1 General methodology

The price of a security in a given market is determined by news about the specific security and the interpretation of this news. The prices for the same security should tend to converge in the long run as a result of intermarket no-arbitrage, while short-run deviations are possible due to trading frictions. The study of price discovery therefore relies largely on cointegration. The prices are assumed to be cointegrated I(1) variables that share one or more common stochastic factors. In most price discovery studies, there is only one common stochastic factor, which we then refer to as the implicit efficient price common to all markets. Under the method of Hasbrouck (1995), the sources of variation in this efficient price are attributed to the different markets. Hasbrouck thus defines a market's contribution to price discovery as its information share: the fraction of the efficient price innovation variance which is attributable to that market.

In the study by Hasbrouck (1995), the information shares of two markets (NYSE and non-NYSE) are found. In this paper we adopt the method of Hasbrouck and let the analysis cover three stocks, each traded on multiple exchanges. Further, we address the shortcoming of the original Hasbrouck method, its lack of order invariance, by implementing the spectral decomposition, as originally done by Fernandes and Scherrer (2014).

3.2 A simple model

In this subsection, we briefly motivate the empirical analysis by presenting the general idea through a simplified model. The section is to a large extent inspired by the corresponding section in Hasbrouck (1995), and seeks to illustrate the idea of price discovery when securities trade in multiple markets. We will illustrate the importance of cointegration and error correction in market microstructure applications, where we observe several time series that share a common stochastic trend (which is the same as an implicit efficient price common to the markets). The model assumes that a single security trades in two distinct markets, where all price discovery occurs in the first market. The (log) price dynamics are illustrated as

p1,t = p1,t−1 + wt (1)

p2,t = p1,t−2 + εt (2)

where p1,t and p2,t are price variables (bid and offer quotes, mid quotes or transaction prices) and wt and εt are i.i.d. uncorrelated zero-mean innovations. While the first price follows a random walk, the price in the second market adjusts to the first market's price lagged two periods and also includes a random term. From this specification it is obvious that the two prices are fundamentally driven by the first (and price dominant) market. This simple model also illustrates the important concept of cointegration between the series; while the two series are non-stationary and may deviate from one another in the short run, they do not diverge without bound. Since a linear combination of the two variables is stationary, the prices are cointegrated. This can be seen from the difference of the two prices:

p1,t − p2,t = p1,t−1 + wt − (p1,t−2 + εt) = wt + wt−1 − εt (3)

which is a stationary variable. The VECM representation is used to estimate the model, where the stationary error correction term is included as a single explanatory variable:

∆p1,t = wt (4)

∆p2,t = (p1,t−1 − p2,t−1)− ∆p1,t−1 + εt (5)

Note that this is merely one of infinitely many possible representations, which all lead to the same VMA representation. The cointegration vector in this simple model is β = [1 −1]′. By writing up the model in its vector moving average representation,

∆p1,t = wt (6)

∆p2,t = ∆p1,t−2 + εt − εt−1 = wt−2 + εt − εt−1 (7)

it is clear that the number of lags with which the first market impacts the second is no more than two. Further rewriting of the model into Stock and Watson (1988)'s common trends representation reveals that the two price series share a common stochastic trend. This representation is obtained by writing up the model in its integrated form:

p1,t = p1,0 + ∑_{i=1}^{t} wi   (8)

p2,t = p1,0 + ∑_{i=1}^{t} wi + (εt − wt−1 − wt)   (9)

By this model specification, both price series can be interpreted as the sum of an initial value, a common stochastic factor and a stationary term. In this application, the random walk term represents the implicit efficient price, which is shared by the two markets. Hasbrouck (1995) defines the information share of a market as the proportion of the common factor variance which is attributable to that particular market. Clearly, in this simple model all the price discovery happens in the first market, as the common stochastic factor consists entirely of error terms from this market.
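To make the cointegration argument concrete, the following minimal numpy sketch simulates equations (1)-(2); the series length and the seed are arbitrary choices of ours. While both price levels wander freely, their difference stays bounded, exactly as equation (3) implies.

```python
import numpy as np

# Simulate the two-market model of equations (1)-(2): market 1 is a random walk,
# market 2 tracks market 1 with a two-period lag plus noise.
rng = np.random.default_rng(0)
T = 10_000
w = rng.normal(size=T)
eps = rng.normal(size=T)

p1 = np.cumsum(w)              # p1_t = p1_{t-1} + w_t
p2 = np.empty(T)
p2[:2] = p1[:2]                # arbitrary start-up values for the first two periods
p2[2:] = p1[:-2] + eps[2:]     # p2_t = p1_{t-2} + eps_t

# The difference p1 - p2 = w_t + w_{t-1} - eps_t (for t > 2) is stationary:
print(np.var(p1), np.var(p1 - p2))   # the level variance dwarfs that of the difference
```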

In the next section we present a more general setting which will form the basis of our empirical analysis.

3.3 A general model

If the investigated security trades across n different venues, one will have n different price variables, which are all closely related to the single security. Hasbrouck (1995) uses bid and offer quotes in his study, but notes that transaction prices or mid quotes are also viable alternatives. These (log) price variables are collected in the (n × 1) column vector pt:

pt = (p1,t, p2,t, p3,t, . . . , pn,t)′   (10)

Each element of the price vector is assumed to be integrated of order one (I(1)), i.e. the price series will contain a random walk component. Since the price series are integrated of order one, ∆pt will be stationary and I(0) (under the assumption of no trend etc.). Then, if a linear combination of the price variables is stationary, pt is said to be cointegrated. In that case the Granger representation theorem (Engle and Granger, 1987) shows that there exists a valid error-correction representation. The basis of the IS model is thus the estimation of the following Vector Error Correction Model (VECM),3

∆pt = αβ′pt−1 + ∑_{i=1}^{K−1} Γi∆pt−i + et   (11)

The VECM consists of two parts: the first part, αβ′pt−1, denotes the long-run (equilibrium) dynamics between the price series. Because the temporary deviations from the efficient price are stationary, α and β are both n × (n − 1) matrices with rank (n − 1). α contains the error-correction vectors (the adjustment coefficients) and zt−1 = β′pt−1 are the stationary error correction terms. β′ contains the (n − 1) cointegration vectors, which are not unique: if β′pt is stationary, then so is c′β′pt, for any nonzero (1 × (n − 1)) vector c′. In this VECM, we thus normalize the cointegration relationships in the form where the first part of β′ is an identity matrix; i.e., β′ = [In−1 : β′1], where β1 has dimension (1 × (n − 1)) (as proposed in Johansen (1995)). Full-information maximum likelihood (FIML) could also be used to find the cointegration vectors. By applying the FIML method, one does not rely on the rather arbitrary assumption of which market to use as benchmark - however, this is not crucial in this case, as all variables are present in the cointegration relationships and any price can be used as the benchmark price (Hasbrouck, 1995).

3Note that this VECM specification is not unique: there are multiple error-correction representations. Any preference for one over another must be based on economic intuition, since the different VECMs lead to the same VMA specification.

The second part of the VECM is the sum of the vector error correction coefficient matrices, ∑_{i=1}^{K−1} Γi∆pt−i, which constitutes the short-run dynamics that follow from the short-run deviations due to trading frictions. The zero-mean vector et contains the serially uncorrelated innovations with covariance matrix Ω:

Ω = E(ete′t)   (12)

Now, because ∆pt and et are stationary, the term αβ′pt−1 is also stationary.

In order to determine the appropriate lag length of the VECM, the Schwarz Information Criterion (SIC) is used.4 This criterion is preferred over, for example, the Akaike Information Criterion (AIC), because SIC is consistent. When one has a fixed number of candidate models available, a consistent information criterion will select the true model asymptotically with probability one. In this context, the smallest adequate model will be the true model. AIC is not consistent, since the probability of choosing an excessively complex model does not converge to zero as the number of observations becomes large. Put simply, the penalty for including additional parameters in the model is greater for SIC than for AIC, and thus SIC results in a more parsimonious model representation.
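As an illustration of how this estimation step could be carried out in practice, the sketch below uses the VECM routines in statsmodels. The exact function names, the deterministic flag and the result attributes are assumptions about that library and may differ across versions; the paper's own estimation need not follow this implementation.

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM, select_order

def fit_vecm(log_prices, max_lags=10):
    """Fit the reduced-form VECM of equation (11) to a (T, n) array of log quotes.

    The lag length is chosen with the consistent Schwarz/BIC criterion and the
    cointegration rank is fixed at n - 1, i.e. a single common stochastic trend.
    """
    n = log_prices.shape[1]
    k = select_order(log_prices, maxlags=max_lags, deterministic="n").bic
    res = VECM(log_prices, k_ar_diff=k, coint_rank=n - 1, deterministic="n").fit()
    # res.alpha, res.beta, res.gamma and res.sigma_u hold the estimated adjustment
    # coefficients, cointegration vectors, short-run matrices and Omega, respectively.
    return res
```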

The VECM can be written in the vector moving average (VMA) representation, under the assumption of covariance stationary price changes:

∆pt = Ψ(L)et (13)

where Ψ(L) is a polynomial in the lag operator: Ψ(L) = In + Ψ1L + Ψ2L^2 + Ψ3L^3 + . . . . To decompose the implicit efficient price variance, the model can equivalently be written in its integrated form, which is closely related to the common trends representation by Stock and Watson (1988):

pt = p0 + Ψ(1) ∑_{s=1}^{t} es + Ψ∗(L)et   (14)

where p0 is an (n × 1) vector of constants and Ψ∗(L) is a matrix polynomial in the lag operator L, with Ψ∗i = −∑_{j=i+1}^{∞} Ψj. Ψ(1) is the impact matrix, which is obtained as the sum of the moving average coefficients:

Ψ(1) = In + Ψ1 + Ψ2 + Ψ3 + . . .   (15)

Ψ(1)et provides the permanent effect of an innovation on the three prices, which can be seen by writing out the components of this vector explicitly (for n = 3):

Ψ(1)et = \begin{pmatrix} ψ11 & ψ12 & ψ13 \\ ψ21 & ψ22 & ψ23 \\ ψ31 & ψ32 & ψ33 \end{pmatrix} \begin{pmatrix} e1t \\ e2t \\ e3t \end{pmatrix}   (16)

Hence, the long-run impact of a one-unit shock to the price of market j on price i is given by ψij. Following the cointegration restrictions (β′pt is stationary, and does therefore not include the stochastic trend), it must be the case that β′Ψ(1) = 0. From the structure of β it follows that the rows of the impact matrix Ψ(1) are identical: the long-run impact is the same for all prices, as they share the same implicit efficient price. Therefore ψ11 = ψ21 = ψ31, ψ12 = ψ22 = ψ32 and ψ13 = ψ23 = ψ33. If we denote the common row of Ψ(1) by ψ, the common trends representation can be rewritten as

pt = p0 + ιψ ∑_{s=1}^{t} es + Ψ∗(L)et   (17)

where ι is an (n × 1) vector of ones. This representation effectively splits the price vector into the sum of an initial value, a permanent I(1) random-walk component and a transitory I(0) component. Hasbrouck (1995) denotes ψet as the component which is permanently impounded into the price, presumably due to new information. This term therefore represents the common factor between the prices, which is the implicit efficient price. The transitory component, Ψ∗(L)et, characterizes transient effects, which do not have permanent effects on pt. This latter term only helps forecasting transient price innovations, and is ignored in the information share approach.

4Under this approach the VECM is estimated on the full sample.

3.4 Information share

Hasbrouck defines a market's contribution to price discovery as its information share: the proportion of the variance in the common factor which is attributable to that given market. The idea is that price variance reflects the information flow - thus, a market's proportional contribution to this variance represents that market's share of the price discovery. The information share measure intuitively describes the degree to which a given market drives the reactions in other markets. The information dominant market is the market which moves first in the price adjustment process after the arrival of new information, and therefore it is the most informative market with respect to the implicit efficient price process. Information shares are therefore not necessarily related to, for example, the size of the spreads or the volumes traded on the different markets.

Following the decomposition of the implicit efficient price variance above, the total variance of the common factor innovations can readily be obtained: Var(ψet) = ψΩψ′. This represents the denominator in the information share measure.
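In practice, ψ and the denominator ψΩψ′ can be computed directly from the VECM estimates through the Granger representation of the impact matrix, Ψ(1) = β⊥[α′⊥(In − ∑_{i=1}^{K−1} Γi)β⊥]^{−1}α′⊥. A minimal sketch under this formula is given below; the function and argument names are our own, and the orthogonal complements are taken via scipy's null_space.

```python
import numpy as np
from scipy.linalg import null_space

def common_row_psi(alpha, beta, gammas, omega):
    """Common row psi of the impact matrix Psi(1) and the total common-factor variance.

    alpha, beta : (n, n-1) adjustment and cointegration matrices from the VECM
    gammas      : list of (n, n) short-run matrices Gamma_1, ..., Gamma_{K-1}
    omega       : (n, n) innovation covariance matrix
    """
    n = alpha.shape[0]
    a_perp = null_space(alpha.T)                    # orthogonal complement of alpha, (n, 1)
    b_perp = null_space(beta.T)                     # orthogonal complement of beta, (n, 1)
    gamma_bar = np.eye(n) - sum(gammas)             # I_n - sum_i Gamma_i
    psi1 = b_perp @ np.linalg.inv(a_perp.T @ gamma_bar @ b_perp) @ a_perp.T
    psi = psi1[0]     # with one shared efficient price, the rows of Psi(1) coincide
    return psi, float(psi @ omega @ psi)
```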

In the case of no contemporaneous correlation between the markets, i.e. when Ω is diagonal, ψΩψ′ consists of n terms, where each term characterizes the contribution to the fundamental price innovation from a given market. The variance of the common factor attributable to market j (the numerator in the IS measure under the assumption of a diagonal covariance matrix) is obtained as Var(ψjejt) = ψj²Ωjj. The information share of market j is thus given as market j's relative contribution to the total common factor variance:

Sj = ψj²Ωjj / (ψΩψ′)   (18)

This normalized structure ensures that all the information shares sum to unity.

The Cholesky decomposition

If contemporaneous correlation between the different markets' innovations is present, the information shares are not uniquely defined and the above equation does not hold. In other words, we need another approach for the general case where Ω is not diagonal. One method to reduce the contemporaneous correlation is to reduce the time aggregation by increasing the sampling frequency. This follows from the fact that a significant part of the contemporaneous correlation might come from time aggregation. When the correct sequence of the observations in the different markets cannot be determined because the market data resolution is too low, the price changes will appear to be simultaneous. Nonetheless, this solution will only reduce the correlation and not eliminate it completely, since even at relatively high sampling frequencies (say, one second), it is very common that more than one market reports a price update within the same interval.

Another approach consists of applying the Cholesky decomposition, which is done in Hasbrouck (1995) and several related papers. This approach effectively works around the issue of contemporaneous correlation by factorizing the covariance matrix, Ω, and can be used to establish lower and upper bounds for the information shares of the different markets. The Cholesky factorization of Ω is given by

Ω = FF′ (19)

where F is a lower triangular matrix which can be used to orthogonalize the price innovations,

et = Fzt (20)

where zt is an (n × 1) vector of random variables with E(zt) = 0 and Var(zt) = I. Then the information share can be computed as the market share of the innovation variance attributable to zj,

Sj = ([ψF]j)² / (ψΩψ′)

where [ψF]j is the j'th element of the (1 × n) matrix ψF. This IS measure is order variant, which means that the ordering of the price series in the Cholesky decomposition has a significant impact on the obtained results. That is, the information share for a particular market varies according to where the price variable for that market is placed in the price vector. Baillie et al. (2002) prove that one obtains the largest (smallest) information share for a given market when the price variable of that market is placed first (last) in the sequence, under the assumption of the cross correlation being positive. This can be seen intuitively from the structure of the lower triangular matrix F (again assuming n = 3):

F = \begin{pmatrix} f11 & 0 & 0 \\ f21 & f22 & 0 \\ f31 & f32 & f33 \end{pmatrix} = \begin{pmatrix} σ1 & 0 & 0 \\ σ2ρ21 & σ2(1 − ρ21²)^{1/2} & 0 \\ σ3ρ31 & σ3ρ32 & σ3(1 − ρ31² − ρ32²)^{1/2} \end{pmatrix}   (21)

When computing the information share for a given market and placing the market's price series first in the sequence, the measure incorporates the own contribution of the market (σ1) and the market's correlation with the other markets (σ2ρ21 and σ3ρ31). However, when placing the particular market last in the sequence, only the "pure" contribution of the given market is taken into account (σ3(1 − ρ31² − ρ32²)^{1/2}) - that is, the contribution of the variance which is uncorrelated with the other price series (Baillie et al., 2002).

By computing the information shares for the different permutations of the price vector, it is possible to derive upper and lower bounds of the information shares: the upper bound for a given market's information share is computed by placing the price variable of that market first in the price vector. Likewise, the measure obtained by placing the market's price variable last in the price vector serves as the lower bound.

Baillie et al. (2002) and Fernandes and Scherrer (2014) find that the IS bounds are often very far apart, limiting their practical use, as results are at best ambiguous. This is particularly relevant if the contemporaneous correlation between the price series is significant. We confirm these results, as our estimates also show high contemporaneous correlation and consequently wide bounds for the estimated information shares. This is the case despite using relatively high-frequency one-second resolution data. Therefore, ordering is indeed an issue in this article. This contrasts with the results in Hasbrouck (1995) and Grammig et al. (2005), who find the bounds to be quite tight. Two explanations might be apparent: Grammig et al. (2005) examine markets denominated in different currencies, and Hasbrouck's paper is older; thus trading volumes were smaller and the price discovery process arguably slower.

Though Baillie et al. (2002) find support for using the mean of the bounds to resolve the ambiguities of the non-unique information shares, Fernandes and Scherrer (2014) note that averaging over the bounds, across all the possible permutations, is not an adequate solution to the problem of finding the true IS values.
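The computation behind these bounds and the permutation mean can be summarized in a short numpy sketch; the function and variable names are ours, and ψ and Ω are the common row and innovation covariance estimated as above.

```python
import numpy as np
from itertools import permutations

def cholesky_information_shares(psi, omega):
    """Hasbrouck (1995) information shares under every ordering of the markets.

    Returns per-market lower bounds, upper bounds and the mean over all permutations.
    """
    n = len(psi)
    denom = psi @ omega @ psi                         # total common-factor variance
    all_shares = []
    for perm in permutations(range(n)):
        p = list(perm)
        F = np.linalg.cholesky(omega[np.ix_(p, p)])   # lower-triangular factor of reordered Omega
        s_perm = (psi[p] @ F) ** 2 / denom            # shares in the permuted order
        s = np.empty(n)
        s[p] = s_perm                                 # map back to the original market order
        all_shares.append(s)
    all_shares = np.array(all_shares)
    return all_shares.min(0), all_shares.max(0), all_shares.mean(0)
```

For n = 3 markets this loops over 3! = 6 orderings; the exponential growth in the number of permutations as the number of venues increases is exactly the practical objection raised by Fernandes and Scherrer (2014).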

The spectral decomposition

To accommodate the problem of order dependence in the above-mentioned information share measure, two closely related approaches have been used in the literature. Lien and Shrestha (2009) use a spectral decomposition of the correlation matrix to derive a new information share measure, which they call the Modified Information Share (MIS). Fernandes and Scherrer (2014) instead propose a more direct way of making the information share order invariant, namely a spectral decomposition of the covariance matrix. We will pursue the latter approach here.

Fernandes and Scherrer (2014) rely on the following decomposition. Let Λ represent a diagonal matrix whose diagonal elements are the eigenvalues of Ω. Further, let V be a matrix whose columns are eigenvectors of Ω. The spectral decomposition of the covariance matrix is then given by

Ω = VΛV^{−1}   (22)

Ω^{1/2} = VΛ^{1/2}V^{−1} = S   (23)

The information share can then be calculated in an order invariant way:

Sj^S = ([ψS]j)² / (ψΩψ′)   (24)
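A corresponding sketch of the order-invariant measure uses the symmetric square root of Ω obtained from its eigendecomposition (for a symmetric Ω, V^{−1} = V′); function names are ours.

```python
import numpy as np

def spectral_information_shares(psi, omega):
    """Order-invariant information shares based on Omega^{1/2} = V Lambda^{1/2} V',
    as in the spectral decomposition of Fernandes and Scherrer (2014)."""
    eigval, eigvec = np.linalg.eigh(omega)                 # Omega = V Lambda V'
    S = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T       # symmetric square root of Omega
    return (psi @ S) ** 2 / (psi @ omega @ psi)            # shares sum to one by construction
```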

3.5 Statistical properties of the model

Under the assumption that the price changes are covariance stationary, the residuals of the vector error correction model are homoskedastic and exhibit no autocorrelation. This effectively ensures that the OLS-estimated model coefficients as well as the estimated covariance matrix are asymptotically consistent (Hasbrouck, 2003). However, the distributions of the above estimates are not easily characterized, since the asymptotic sample distributions are only available when the VECM innovations are i.i.d. normal. This is not likely to be the case, due to the discreteness of prices as well as the irregular timing of quote submissions. In addition, both of the main statistics in this paper are nonlinear functions of the model parameters: the information share itself, and the impulse response functions (discussed later).

A further problem might be that the sample prices are nonstationary across different days, e.g. if a trend is observed over the sample period. This motivates estimating daily information shares, such that the model is applied to daily subsamples.5 The evolution of these estimated daily information shares over time is interesting in its own right, but they may additionally be used to construct an estimate of the full-sample information share. This is done as follows:

Let S_i^j denote the daily information share of exchange i on day j (for a given stock); that is, exchange i's contribution to the long-run common factor variance, implied by the model estimated only on day j's sample. This daily information share can, like the full-sample measures, be obtained both with the Hasbrouck (1995) method and with the spectral decomposition as in Fernandes and Scherrer (2014). Now, if N is the total number of days in the dataset, the full-sample (alternative) estimate of the information share can be computed as

S_i = (1/N) ∑_{j=1}^{N} S_i^j

If one constructs daily information shares using both the Cholesky and the spectral decomposition, this simple mean procedure will likewise yield two sets of alternative estimates for the full-sample information shares of the different markets. By comparison, these can then be used as a robustness check of the originally obtained full-sample estimates of the information shares. We will apply this procedure, as well as the original full-sample procedure, in the estimation section.

5In this case, the optimal lag length for the VECM must be found for every day in the sample, as a separate VECM is estimated every day.
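The aggregation step itself is just a simple average over days; a minimal sketch, assuming the daily estimates have already been collected into an array (names are ours):

```python
import numpy as np

def full_sample_shares_from_daily(daily_shares):
    """Alternative full-sample estimate S_i = (1/N) sum_j S_i^j.

    daily_shares : (N, n) array whose row j holds the information shares
                   estimated from day j's subsample alone.
    """
    return np.asarray(daily_shares).mean(axis=0)
```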


3.6 Bootstrapping of time series

To examine the accuracy of the obtained information share measures, one can derive the distribution of the information share estimates and construct confidence bands using the stationary bootstrap, first proposed by Politis and Romano (1994).

Normally a bootstrap requires i.i.d. observations - if the observations are not i.i.d., the bootstrap statistics will be invalid. Since time series data is often dependent, and because the data series analysed in this paper are furthermore cointegrated and I(1), this poses an even greater challenge. Therefore, to retain the dependency of the original sample in the bootstrapped sample, a block bootstrap called the stationary bootstrap, proposed by Politis and Romano (1994), is used. Li and Maddala (1997) compare different bootstrap methods for cointegration regression models, and by Monte Carlo simulation find evidence in favour of the stationary bootstrap. Related work using the stationary bootstrap method in price discovery studies includes Sapp (2002) and Grammig et al. (2005).

In this section, we describe the procedure of the stationary bootstrap algorithm, which we apply in the estimation section. The steps are as follows:

1. Estimate the VECM of the same form as described earlier using OLS6.

∆pt = αβ′pt−1 + ∑_{i=1}^{K−1} Γi∆pt−i + et   (25)

2. Calculate the information shares using the two different methods described above, for each of the price series in pt: by the mean of the different permutations under the Cholesky method, and by the spectral decomposition method (this is completely equivalent to the way the information shares are computed in the previous section).

3. Using the estimated set of residuals from the VECM, {e1, e2, e3} (where 1, 2, 3 denote the trading venues, such that e1, e2, e3 are vectors of the residuals from the three venues), perform a stationary bootstrap of the residuals:

(a) Randomly sample one set of residuals from the residuals of the three trading venues, {e1,j, e2,j, e3,j}, for j ∈ {1, 2, . . . , T}, where T is the total number of observations in our dataset. This gives us three residuals, {e∗1,1, e∗2,1, e∗3,1}.

(b) When drawing the next residuals {e∗1,2, e∗2,2, e∗3,2}, two things can happen. With probability 1 − p, the next set of residuals drawn will be the residuals immediately following those drawn in (a). With probability p, step (a) is repeated and we draw a new set of residuals, with replacement, from the set of residuals {e1,j, e2,j, e3,j}, for j ∈ {1, 2, . . . , T}.

(c) Repeat until there are T sets of residuals, which yields the (T × 3) matrix of bootstrapped residuals, {e∗1, e∗2, e∗3}.

4. The bootstrapped residuals {e∗1, e∗2, e∗3} are now obtained, such that we can construct a bootstrap sample (a pseudo time series of observations) by recursively inserting the bootstrapped residuals into the estimated VECM:

∆p∗t = αβ′pt−1 + ∑_{i=1}^{K−1} Γi∆pt−i + e∗t   (26)

The initial K− 1 ∆p∗’s are defined as:

∆p∗1 = (e∗1,1, e∗2,1, e∗3,1)   (27)

...   (28)

∆p∗K−1 = (e∗1,K−1, e∗2,K−1, e∗3,K−1)   (29)

where K − 1 denotes the optimal lag order used in the VECM.

6We have included a constant in the cointegration relationship, because the series contain a trend in levels.


5. Using the new set of ∆p∗t , reestimate the VECM and calculate the IS measures.

6. Steps 3 to 5 are repeated 250 times to obtain an estimate of the empirical distribution of the test statistics.

7. The corresponding 5% and 95% percentiles of the new IS statistics, based on the bootstrap samples ∆p∗t, are used as the lower and upper boundaries of the confidence interval around the IS estimate from step 2. Because of potential skewness in the distribution, the confidence interval might not be symmetric.

Because of the sampling structure in step 3(b), the sampled bootstrap residuals will have a block-like structure, where the block lengths are random but follow a geometric distribution. Note that in the specific case of drawing the last residuals {e1,T, e2,T, e3,T} in step 3, we would next need to draw the residuals {e1,T+1, e2,T+1, e3,T+1} with probability 1 − p. Since these residuals do not exist in our sample, the data is wrapped around, such that e1 follows eT (the sequence of residuals can be thought of as a circle). This ensures stationarity of the re-sampled residuals, as opposed to a sample generated by simple or moving block bootstrapping.

The average block length, and thus p and 1− p, is chosen with the algorithm of Politis andWhite (2004), which was later corrected in Patton, Politis, and White (2009).7
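The core resampling step (3a)-(3c) can be written compactly. The sketch below only generates the resampled residual rows, with the circular wrap-around described above; names are ours, and the block-length choice of Politis and White (2004) is taken as given through p.

```python
import numpy as np

def stationary_bootstrap_residuals(resid, p, rng=None):
    """Stationary bootstrap (Politis and Romano, 1994) of the (T, 3) VECM residuals.

    p is the probability of starting a new block, so the expected block length is 1/p.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = resid.shape[0]
    idx = np.empty(T, dtype=int)
    idx[0] = rng.integers(T)                 # step (a): a random starting observation
    for t in range(1, T):
        if rng.random() < p:                 # step (b): with probability p start a new block
            idx[t] = rng.integers(T)
        else:                                # otherwise take the next residual in the sample,
            idx[t] = (idx[t - 1] + 1) % T    # wrapping around from e_T back to e_1
    return resid[idx]
```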

4 Data

The data analysed in this paper is constructed from quote data for three technology companies: Yahoo! Inc., Apple Inc. and Intel Corporation, from now on referred to by their respective ticker symbols: YHOO, AAPL and INTC. Quote data is obtained from the Wharton Research Data Services TAQ Database, which contains intraday trades and quotes stamped at a one-second resolution for the entire set of securities listed on the New York Stock Exchange, the American Stock Exchange, the Nasdaq National Market System and SmallCap issues.8 The trading period considered is the first six months of 2013, resulting in 124 trading days (only 123 trading days for AAPL due to erroneous data).

4.1 Exchanges

The six-month dataset for the three stocks consists of quotes from thirteen different exchanges. Table 1 presents summary statistics for the three stocks. For each exchange, the market share in terms of volume is reported as both the daily average number of quotes posted and the proportion occurring at the given exchange. The table clearly shows the high degree of fragmentation for these three stocks, as no exchange has any evident quote volume dominance over the others. That being said, some markets are undoubtedly more liquid (in terms of the number of quotes posted per day) than others. Therefore, in order to secure a sufficiently high frequency for a meaningful discussion, we choose to work with three of the most liquid exchanges on which all three stocks trade: NASDAQ OMX Stock Exchange (T), BATS Exchange (Z) and NYSE Arca (P). In table 2, the average spread of each exchange is presented. Comparing the spreads across exchanges for each stock separately, it is noticeable that NASDAQ in every case has the widest average spread of the three, in spite of being the most liquid exchange in terms of quotes posted (for YHOO and INTC at least), as we saw above.

7The optimal average block length for the stationary bootstrap is estimated by b_{opt,SB} = (2G²/D_{SB})^{1/3} N^{1/3}, where N is the number of observations and the expression for G can be found in the original paper. D_{SB} can be found in Patton et al. (2009), since the expression for D_{SB} was erroneous in the original paper.

8The WRDS database contains both a TAQ dataset with a one-second resolution and a dataset with a microsecond resolution. Unfortunately, we only had access to the one-second database through our institution's subscription.


Since the first generation of Electronic Communication Networks (ECNs) appeared in the 1990s, several mergers and acquisitions have taken place in the industry, but two majors have survived in a relatively identifiable form: NYSE Arca and the NASDAQ OMX Stock Exchange. The NYSE Arca ECN is a fully electronic exchange, where more than 8,000 securities trade, including stocks, options and ETFs. The exchange is the result of a reverse merger between the NYSE Group and Archipelago Holdings in February 2006, and operates as NYSE's electronic competitor to the NASDAQ OMX Stock Exchange. NASDAQ, the first electronic stock market in the world, revolutionized the field of electronic trading in 1997 by introducing the original maker-taker pricing model, which offers to pay liquidity providers a rebate and charge liquidity removers a fee (a similar model has since been adopted by NYSE Arca and BATS). The newer BATS exchange emerged in 2005 and has since gained market share by employing a more aggressively priced maker-taker model than NASDAQ.

                                                    YHOO                AAPL                INTC
Symbol  Exchange                          Daily quotes     %    Daily quotes     %    Daily quotes     %
T       NASDAQ OMX Stock Exchange               234263  0.37           83518  0.14          282618  0.33
Z       BATS Exchange                            64609  0.10           67741  0.12          107274  0.12
X       NASDAQ OMX PSX Stock Exchange            54680  0.09            1632  0.00           59559  0.07
P       NYSE Arca                                54079  0.08          100341  0.17           84024  0.10
K       Direct Edge X Stock Exchange             53418  0.08           50279  0.09           70469  0.08
B       NASDAQ OMX BX Stock Exchange             45546  0.07           67091  0.12           69782  0.08
Y       BATS Y-Exchange                          36759  0.06           66523  0.12           55500  0.06
C       National Stock Exchange                  36489  0.06           88639  0.15           34035  0.04
J       Direct Edge A Stock Exchange             33054  0.05           44471  0.08           70698  0.08
A       NYSE MKT (AMEX)                          13042  0.02               0  0.00            9922  0.01
W       CBOE Stock Exchange                      10383  0.02            6221  0.01            9683  0.01
M       Chicago Stock Exchange                    1162  0.00               2  0.00            9739  0.01
N       New York Stock Exchange                      0  0.00            1330  0.00               0  0.00

Table 1: Summary statistics for the three stocks, January to June, 2013. The table reports the daily average number of posted quotes on all exchanges, as well as the market shares in terms of total quotes.

Average spreads
Exchange      YHOO     AAPL     INTC
NASDAQ      0.0260   0.5191   0.0397
BATS        0.0175   0.4407   0.0134
NYSE Arca   0.0126   0.2855   0.0204

Table 2: The average spread in dollars for BATS, NASDAQ and NYSE Arca for the three stocks over the period January-June, 2013.

The different exchanges operate several routing strategies for their participants. These include, for example, routing to dark pools, which can offer possible price improvements and lower access fees, or splitting routable orders and sending them to multiple markets simultaneously at either the same or different prices. This is just a small subset of the large pool of order routing strategies that these exchanges provide.

To sum up, the NASDAQ OMX Stock Exchange, the BATS Exchange and NYSE Arca are some of the world's biggest ECNs in terms of volumes traded. From now on these exchanges will be referred to as NASDAQ, BATS and NYSE Arca.

4.2 Data Cleaning

As described above, the three stocks we have chosen to analyse in this paper are characterized by being very liquid and traded across different trading venues. A sample of the YHOO data from January 3, 2013 can be seen in table 3. The sample shows the posted quotes at 09:30, just after the core trading session began. Most of the variables are self-explanatory, except "mode" and "ex": mode describes the quote condition, and ex represents the exchange on which the quote is posted. 99.99% of the quotes in the dataset are condition 12, which indicates a normal trading environment. Every quote that is not mode 12 is treated as erroneous and filtered away to avoid invalid data.

# Obs.  symbol  date      time      bid    ofr    bidsiz  ofrsiz  mode  ex
1816    YHOO    20130103  09:30:00  19.96  20.05       5       1    12   Z
1817    YHOO    20130103  09:30:00  19.95  20.09       1       1    12   B
1818    YHOO    20130103  09:30:00  19.95  20.09       1       1    12   Y
1819    YHOO    20130103  09:30:00  19.98  20.05       5       1    12   Z
1820    YHOO    20130103  09:30:00  19.98  20.05       6       1    12   Z
1821    YHOO    20130103  09:30:00  19.98  20.09       6       1    12   Z
1822    YHOO    20130103  09:30:00  19.95  20.10       1       1    12   B
1823    YHOO    20130103  09:30:00  19.98  20.10       6      12    12   Z
1824    YHOO    20130103  09:30:00  19.95  20.10       1       1    12   Y

Table 3: Quotes on the YHOO stock just after the opening at 09:30:00 on 3 January 2013. The 1,815 quotes before these were posted outside the core trading session's opening hours. Mode describes the quote condition (99.999% are mode 12). ex tells which exchange the quote is posted on.

The employed estimation technique treats all observations equally. An initial cleaning of the dataset to remove outliers and database errors is therefore essential in order to obtain correctly estimated models.

The entire dataset is cleaned according to a procedure proposed by Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009a). The original procedure is applied to data consisting of both transaction and quote data, and is therefore split into three groups of sub-steps. The first group of steps (P1-P3) is applied to all their data, whereas the second and third groups of steps (Q1-Q4 and T1-T4) are only applied to their quotation and trade data, respectively. In this paper, we work only with quotes, and therefore apply the first two groups of steps (P1-P3 and Q1-Q4), which are described as follows:

P1. Delete entries with a time stamp outside the 9:30 am - 4 pm opening window of theexchanges9.

P2. Delete entries where the bid or ask price is equal to zero.

P3. Delete entries not originating from the exchanges which are being investigated in thispaper.

Q1. Because of the one-second resolution of the data, many quotes have the same time stamp, as they are posted in the same one-second interval. All of these are aggregated into a single entry using the median bid and ask price for each exchange separately.

Q2. Quotes where the spread is negative are deleted.

Q3. Delete entries for which the spread is more than 50 times the median spread on that particular day.

Q4. Delete entries where the mid-quote is more than 10 mean absolute deviations from a rolling centred median of 50 observations (using 25 observations before and 25 after the entry under consideration, which itself is excluded).

The dataset contains many quotations outside the opening window of the exchanges. P1 ensures that only entries posted within the opening window of the exchanges are considered. For all the stocks, there are entries with bids or offers equal to zero; these are removed in P2. Finally, in P3 we only retain entries where ex ∈ {T, Z, P} - that is, all quotes not originating from the exchanges NYSE Arca (P), NASDAQ (T) and BATS (Z) are removed.

9The three exchanges considered in this paper have identical opening hours. Therefore, we do not have to take into account non-overlapping trading periods.


# Obs.  datetime             bid_t   ofr_t  sumbid_t  sumofr_t  mid_t    bid_z   ofr_z  sumbid_z  sumofr_z  mid_z    bid_p  ofr_p  sumbid_p
1       02-01-2013 09:30:00  20.230  20.25  467       2293      20.2400  20.195  20.25  62        68        20.2225  20.20  20.25  94
2       02-01-2013 09:30:01  20.230  20.25  399       484       20.2400  20.210  20.25  41        110       20.2300  20.23  20.26  33
3       02-01-2013 09:30:02  20.230  20.25  663       640       20.2400  20.210  20.25  60        58        20.2300  20.21  20.25  49
4       02-01-2013 09:30:03  20.220  20.23  395       386       20.2250  20.220  20.24  68        31        20.2300  20.21  20.24  52
5       02-01-2013 09:30:04  20.220  20.24  1570      446       20.2300  20.220  20.23  51        9         20.2250  20.22  20.24  65
6       02-01-2013 09:30:05  20.220  20.23  2813      339       20.2250  20.215  20.24  177       46        20.2275  20.21  20.24  21
7       02-01-2013 09:30:06  20.220  20.23  784       84        20.2250  20.220  20.24  37        12        20.2300  20.22  20.23  5
8       02-01-2013 09:30:07  20.220  20.23  134       12        20.2250  20.220  20.23  18        4         20.2250  20.22  20.23  6
9       02-01-2013 09:30:08  20.220  20.23  249       19        20.2250  20.220  20.24  4         4         20.2300  20.22  20.23  17
11      02-01-2013 09:30:10  20.220  20.24  1428      428       20.2300  20.230  20.24  45        30        20.2350  20.23  20.24  53

Table 4: The first ten tuples formed by the REPLACEALL procedure for YHOO. Each tuple has variables for each of the three exchanges, as seen by their suffixes: t (NASDAQ), z (BATS) and p (NYSE Arca). The variables "bid" and "ofr" represent the median bid and offer price for a particular exchange, while "sumbid" and "sumofr" describe the quantity of shares associated with the particular quotes. All "mid" variables contain the mid quote price (the midpoint between the bid and ask price).

In Q1, we consider the case when multiple quotations are posted within a single one-second time stamp for a given exchange. These data points are aggregated into a single entry consisting of that exchange's median bid and ask price for the given second. Since each quote contains bid and ask prices as well as bid and ask quantities, the quantities are also aggregated for these quotes. This is done by simply summing over the exchange's bid and ask quantities, respectively, for the given second.

The Q2 step deletes negative spread quotations for every exchange separately, since these are clearly erroneous. The rule in Q3 of deleting entries for which the spread is more than 50 times the median spread of that day is rather arbitrary according to Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009b). However, the authors' extensive experimental tests find 50 times the median to be a suitable value when using the realized kernel estimator. Q4 employs a rolling centred median, the purpose of which is the same as in Q3. The difference lies in Q4's local focus, which is applied through the rolling window. This allows for a filtering of outliers which are missed by Q3. It should be noted that we modify Q4 slightly: if 51 quotes in a row all have identical quoted prices, except for the middle value which diverges from the others by 0.01, this middle quote would be filtered. This suboptimal result is avoided by implementing an additional restriction in Q4: retain quotes for which the price change is less than 0.02. Furthermore, Q4 is implemented by constructing a rolling window for each day in the dataset, such that any natural price jumps occurring between closing and opening are not filtered away.
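A compact sketch of the Q-steps for a single exchange on a single trading day is shown below, using base R plus the zoo package for the rolling window. The object ex_day is assumed to hold the filtered quotes of one exchange on one day, sorted by time; the column names, the global mean absolute deviation and the reading of our 0.02 modification as a deviation from the rolling median are simplifying assumptions of the sketch, not a verbatim copy of our implementation.

library(zoo)

# Q1: one entry per second - median bid/ask and summed depth
agg <- aggregate(cbind(bid, ofr) ~ time, data = ex_day, FUN = median)
siz <- aggregate(cbind(bidsiz, ofrsiz) ~ time, data = ex_day, FUN = sum)
sec <- merge(agg, siz, by = "time")

# Q2: drop negative spreads
sec <- sec[sec$ofr - sec$bid >= 0, ]

# Q3: drop spreads larger than 50 times the daily median spread
spread <- sec$ofr - sec$bid
sec <- sec[spread <= 50 * median(spread), ]

# Q4: rolling centred median of the mid quote (25 observations before and
# 25 after, the observation itself excluded); drop points deviating by more
# than 10 mean absolute deviations, unless the deviation is below 0.02
mid <- (sec$bid + sec$ofr) / 2
roll_med <- rollapply(mid, width = list(c(-(25:1), 1:25)), FUN = median, fill = NA)
mad_mean <- mean(abs(mid - mean(mid)))       # mean absolute deviation (simplified: global)
keep <- is.na(roll_med) |                    # keep edge points with incomplete windows
        abs(mid - roll_med) <= 10 * mad_mean |
        abs(mid - roll_med) < 0.02           # our additional restriction
sec <- sec[keep, ]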

Tuple data formation

To deal with the issues of non-trading and non-synchronicity of the different time series originating from the exchanges, several methods can be applied. In this paper, we construct tuples of data using the REPLACEALL procedure of deB. Harris et al. (2002), which minimizes the number of periods between trades without affecting the order. More specifically, the approach creates a tuple after all markets have traded, by taking the most recent trade on each market, and then repeating the process.

Figure 1: Example of a trading sequence from deB. Harris et al. (2002)

Figure 1 depicts a trading sequence for a given security, which trades at three markets: a, b and c. Applying the REPLACEALL procedure on this sequence results in the following tuples: a1b1c1, c3b2a2, c4b4a3, c5b5a4. deB. Harris et al. (2002) find that the different tuple generating methods, MINSPAN, REPLACEALL, CHICAGOFIRST, PACFIRST and NYSEFIRST, all produce similar results, although some information shares are found to be 0.43 using one method and 0.22 using another, so the information share results are generally not that robust to the sampling method used.
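To illustrate the mechanics, a minimal sketch of the REPLACEALL logic in R is given below. It assumes a single data frame stream that stacks the per-second entries from all exchanges, sorted by time, with a column ex identifying the exchange; the function name and the returned structure are purely illustrative.

# Walk through the time-ordered stream; emit a tuple as soon as every exchange
# has posted at least once since the previous tuple, using the most recent
# entry from each exchange.
replaceall <- function(stream, exchanges = c("T", "Z", "P")) {
  latest <- setNames(vector("list", length(exchanges)), exchanges)
  fresh  <- setNames(rep(FALSE, length(exchanges)), exchanges)
  tuples <- list()
  for (i in seq_len(nrow(stream))) {
    e           <- as.character(stream$ex[i])
    latest[[e]] <- stream[i, ]            # most recent entry on exchange e
    fresh[e]    <- TRUE
    if (all(fresh)) {                     # every market has posted again
      tuples[[length(tuples) + 1]] <- latest
      fresh[] <- FALSE                    # start collecting the next tuple
    }
  }
  tuples   # list of tuples; each tuple is a list with one row per exchange
}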


Step   Step explanation                   YHOO        AAPL        INTC
       Start obs.                         79,053,563  71,098,354  107,061,299
       Only mode 12                       5,455       30,292      11,722
P1     Outside opening hours (T)          106,226     295,739     187,330
       Outside opening hours (P)          100,432     831,081     576,460
       Outside opening hours (Z)          44,489      54,859      57,388
P2     Bids and offers = 0                16,672      17,520      14,051
P3     Only exchanges T, P, Z             35,282,192  40,121,124  48,283,969
Q1     Aggregate over seconds (T)         27,443,308  8,279,241   32,868,460
       Aggregate over seconds (P)         5,349,540   9,793,538   8,140,324
       Aggregate over seconds (Z)         6,763,091   6,662,018   11,559,548
Q2     Negative spreads (T)               0           0           0
       Negative spreads (P)               0           0           0
       Negative spreads (Z)               0           0           0
Q3     Extreme spread deviations (T)      0           0           0
       Extreme spread deviations (P)      0           0           0
       Extreme spread deviations (Z)      37          20          24
Q4     Large mid-quote deviations (T)     0           2,216       0
       Large mid-quote deviations (P)     0           1,997       0
       Large mid-quote deviations (Z)     13          2,189       12
       REPLACEALL (tuple construction)    1,150,362   1,358,220   1,278,417
       Remove 06-06-2013                  0           19,047      0
       Final obs.                         2,791,746   3,629,253   4,083,594

Table 5: The cleaning procedure broken down into different steps for each stock. The first and last row show the initial and final number of observations, respectively. The intermediate rows represent the undertaken cleaning steps, where the values indicate the number of observations removed for each step.


Table 4 shows the first ten tuples of YHOO, which were constructed by the REPLACEALL method. Each tuple consists of variables for each of the three exchanges, as seen by their suffixes: t (NASDAQ), z (BATS) and p (NYSE Arca). The variables "bid" and "ofr" represent the median bid and offer price for a particular exchange, while "sumbid" and "sumofr" describe the quantity of shares associated with the particular quotes. All "mid" variables contain the mid quote price (the midpoint between the bid and ask price). sumbid and sumofr are the sums of all the quantities associated with a given bid, ofr and mid price. There is no tuple for the time point 02-01-2013 09:30:09 since one of the markets didn't trade at that particular second. Tuples for AAPL and INTC were created in the same way, and the total number of tuples was 930,582 for YHOO, 1,216,100 for AAPL and 1,361,198 for INTC.

Table 5 gives an overview of the cleaning steps conducted for the three stocks. Each row in the table reports the number of observations removed for each step in the cleaning procedure, except for the first and last row. Clearly, the steps P3, Q1 and REPLACEALL remove/aggregate a great number of observations, effectively reducing the dataset size. For the AAPL stock all observations for June 6, 2013 were removed, since they were all erroneous.10

In practice, the cleaning procedure has been developed in R due to its excellent handling of big datasets. The cleaning procedure and the tuple formation amount to around 500 lines of code, and take around fifteen minutes to execute on six months of data for a single stock.11

10According to the quotes for June 6, 2013, the share price of AAPL declined 30% just for that particular date.

11One valuable insight to keep in mind during the development of the cleaning process is to only load in parts of the dataset when constructing the tuples. The run time of the code creating the tuples is exponentially dependent on the size of the dataset. By only loading in parts of the dataset at a time, a more or less linear relationship between the run time of the code and the size of the dataset can be obtained. The code is of course available upon request.


5 Estimation

This section contains all the estimation results. To analyse price discovery, we apply Hasbrouck's method, using the Cholesky decomposition, as well as the spectral decomposition method, to obtain the information shares for the three individual stocks in the sample, all of which trade at the same three exchanges: BATS, NASDAQ and NYSE Arca. We choose to work with bid and offer quotes rather than actual transaction prices, since modeling the latter would introduce autocorrelation problems in the case of infrequent trading (Hasbrouck, 1995). More specifically, we construct our price series as the mid quotes - the hypothetical quote halfway between the bid and the ask.

Because the econometric approach relies on the individual price series being I(1), but sharing a common trend, we first test for unit roots and the number of cointegration relationships. Secondly, we estimate the vector error correction model to obtain the covariance matrix of the residuals, which will be used to find the information shares. Then, a dynamic simulation approach is employed in order to find the vector moving average representation. As a by-product, the impulse-response functions are obtained. From the coefficients of the VMA model we get the common row vector. Finally, for each of the three stocks, we calculate the information shares for the different markets, using both the Cholesky and the spectral decomposition method. We extend the analysis by also computing the daily information shares for the period under consideration - again using both methods.

5.1 Unit roots and cointegration

Unit root tests

YHOO
               ADF                                Phillips-Perron
               NASDAQ   BATS     NYSE Arca        NASDAQ   BATS     NYSE Arca
Lag/bandw.     20       18       11               32       61       26
Test stat.     -1.6136  -1.5969  -1.6389          -1.5858  -1.6073  -1.6185
p-value        0.7880   0.7946   0.7778           0.7989   0.7905   0.7861

AAPL
               ADF                                Phillips-Perron
               NASDAQ   BATS     NYSE Arca        NASDAQ   BATS     NYSE Arca
Lag/bandw.     3        3        15               43       32       142
Test stat.     -2.8098  -2.8014  -2.7710          -2.7930  -2.7901  -2.7831
p-value        0.1936   0.1967   0.2081           0.1998   0.2009   0.2035

INTC
               ADF                                Phillips-Perron
               NASDAQ   BATS     NYSE Arca        NASDAQ   BATS     NYSE Arca
Lag/bandw.     26       26       26               73       58       72
Test stat.     -2.4457  -2.4467  -2.4414          -2.4110  -2.4097  -2.4091
p-value        0.3556   0.3551   0.3579           0.3737   0.3744   0.3747

Table 6: Unit root tests for the three exchanges and the three stocks. Lag/bandwidth describes the lag length used in the ADF test to whiten the errors, and the bandwidth of the Bartlett kernel in the Phillips-Perron test. The lag length is based on the Schwarz information criterion. Both the ADF and the Phillips-Perron tests include a constant and a trend in the test equation.

Table 6 reports the unit root tests for the three stocks (YHOO, AAPL and INTC) on the three exchanges (NASDAQ, BATS and NYSE Arca). Neither the Augmented Dickey-Fuller tests nor the Phillips-Perron tests can reject the null hypothesis of a unit root. All unit root tests of the three log price series being I(2) are rejected (not reported here). The lag length was chosen by minimizing the Schwarz information criterion. That is, both tests indicate unit roots in all three log price series.
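These tests can be reproduced along the following lines with the urca package. This is a minimal sketch: the series name log_mid_nasdaq and the exact option choices (maximum lag, kernel bandwidth rule) are illustrative assumptions.

library(urca)

# ADF test with constant and trend, lag length chosen by the Schwarz (BIC) criterion
adf <- ur.df(log_mid_nasdaq, type = "trend", lags = 40, selectlags = "BIC")
summary(adf)

# Phillips-Perron test with constant and trend, Bartlett-kernel bandwidth
pp <- ur.pp(log_mid_nasdaq, type = "Z-tau", model = "trend", lags = "long")
summary(pp)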


Next, we apply the Johansen maximum eigenvalue test to find the number of cointegration relationships for each of the three stocks. We include a constant, but no deterministic trend, in the models. The results are shown in table 7. For all three stocks we reject both the null hypothesis of no cointegration (r = 0) and of at most one cointegration relationship (r <= 1). As we cannot reject the null hypothesis of r <= 2, the tests clearly indicate two cointegration relationships among the three variables. This implies that, for every stock, the three price variables share one common trend.

Johansen maximum eigenvalue test for cointegration

YHOO
H0        Alternative   Test stat.   5pct    1pct
r <= 2    r = 3         2.87         9.24    12.97
r <= 1    r = 2         23870.33     15.67   20.2
r = 0     r = 1         27721.75     22.00   26.81

AAPL
H0        Alternative   Test stat.   5pct    1pct
r <= 2    r = 3         8.65         9.24    12.97
r <= 1    r = 2         27527.41     15.67   20.2
r = 0     r = 1         32401.28     22.00   26.81

INTC
H0        Alternative   Test stat.   5pct    1pct
r <= 2    r = 3         1.87         9.24    12.97
r <= 1    r = 2         40696.97     15.67   20.2
r = 0     r = 1         45091.08     22.00   26.81

Table 7: Johansen maximum eigenvalue test for cointegration relationships.

Now, we can estimate the vector error correction model. The SIC implies a different number of optimal lags for the three stocks: we find K = 22 for INTC, K = 24 for AAPL and K = 25 for YHOO.
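A sketch of the Johansen test and the subsequent VECM estimation with the urca package is given below. Here prices is assumed to be a three-column matrix of log mid quotes ordered NASDAQ, BATS, NYSE Arca, and K is the lag choice reported above for the stock at hand; the deterministic-term option is likewise an assumption of the sketch.

library(urca)

# Johansen maximum eigenvalue test with a restricted constant, K lags in levels
jo <- ca.jo(prices, type = "eigen", ecdet = "const", K = 25, spec = "transitory")
summary(jo)

# Estimate the VECM under r = 2 cointegration relations and extract
# the normalized beta and the residual covariance matrix
vecm  <- cajorls(jo, r = 2)
beta  <- vecm$beta                      # normalized cointegration vectors
Omega <- cov(residuals(vecm$rlm))       # innovation covariance matrix
Xi    <- cov2cor(Omega)                 # contemporaneous correlations, cf. equation (30)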

The estimation of our VECM yields the following (normalized) cointegration vectors for the different stocks:12

\[
\underbrace{\begin{bmatrix} 1.000000 & 0.000000 & -1.000003 \\ 0.000000 & 1.000000 & -1.000005 \end{bmatrix}}_{\beta'_{\text{YHOO}}}
\quad
\underbrace{\begin{bmatrix} 1.000000 & 0.000000 & -0.999997 \\ 0.000000 & 1.000000 & -1.000008 \end{bmatrix}}_{\beta'_{\text{AAPL}}}
\quad
\underbrace{\begin{bmatrix} 1.000000 & 0.000000 & -1.000000 \\ 0.000000 & 1.000000 & -1.000008 \end{bmatrix}}_{\beta'_{\text{INTC}}}
\]

To save space, we will not report the coefficient estimates for the VECM. However, the estimated coefficients tended to show a particular behaviour, consistent over all three stocks: the modelled price series usually depended negatively on their own lagged values, and positively on the lagged values of the other two series. The negative autocorrelation was clearly strongest on the BATS exchange and generally weakest on NYSE Arca. To proceed, we obtain the covariance matrices of the innovations for each of the three estimated VECMs, one for each stock. By also computing the correlation matrices shown in equation (30), it is clear that there is severe contemporaneous correlation present between the different price series.

To write the cointegrated system in its VMA representation, enabling the derivation of the information shares, Hasbrouck (1995) considers the model in its vector autoregressive representation and then iterates forward on this representation to get the VMA coefficients. In this paper we apply a simple dynamic simulation related to Hasbrouck's, but merely use the VECM instead and skip the step of writing up the model as a VAR.

12These cointegration vectors are of course also order variant, and thus depend on the sequence in which the price variables are placed in the price vector. The reported vectors are obtained by placing NASDAQ first, BATS second and NYSE Arca last in the price vector.


To apply this simulation, µ, p_{t-1}, p_{t-2}, ... are set equal to zero. We then shock the first market by setting e_{1t} = 1 and e_{2t} = e_{3t} = 0 and simulate the model for t, t+1, ..., t+k, where we keep µ, e_{t+1}, e_{t+2}, ... equal to zero and where k is the cutoff period. The dynamic simulation of the VECM yields the cumulative impulse response function - therefore, the first column of Ψ(1) is equal to the cumulative effects on the three prices, k periods after applying a unit shock to the first market. The simulation is repeated for shocks to the second and the third market, respectively, to obtain the second and third column of Ψ(1). For all three stocks, k was set to 300, which is more than sufficient, as the impact in each subsequent period after lag 40 is close to zero. The three obtained Ψ(1) matrices are shown below.

\[
\underbrace{\begin{bmatrix} 0.338417 & 0.196286 & 0.585411 \\ 0.338418 & 0.196286 & 0.585412 \\ 0.338416 & 0.196285 & 0.585409 \end{bmatrix}}_{\Psi(1)_{\text{YHOO}}}
\quad
\underbrace{\begin{bmatrix} 0.492644 & 0.188664 & 0.458183 \\ 0.492650 & 0.188667 & 0.458188 \\ 0.492646 & 0.188665 & 0.458184 \end{bmatrix}}_{\Psi(1)_{\text{AAPL}}}
\quad
\underbrace{\begin{bmatrix} 0.241936 & 0.260635 & 0.555879 \\ 0.241938 & 0.260637 & 0.555883 \\ 0.241936 & 0.260635 & 0.555878 \end{bmatrix}}_{\Psi(1)_{\text{INTC}}}
\]

Since Ψ(1)e_t represents the long run impact of an innovation on each price, the common rows indicate that the long run effect is the same for each price. Because the rows are not exactly identical, we take the average of each column in the matrix to obtain the common row vector, ψ, for each stock.
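The dynamic simulation itself only requires iterating the estimated VECM forward. A stylized sketch in R is shown below; alpha (3x2), beta (3x2) and Gamma (a list of 3x3 short-run coefficient matrices) stand for the estimated VECM coefficients, and the function name is illustrative.

# Simulate the VECM forward after a one-time unit shock to one market,
# starting from p = 0 and with all later innovations set to zero.
dyn_sim <- function(alpha, beta, Gamma, shock, k = 300) {
  K  <- length(Gamma) + 1                       # lag order in levels
  p  <- matrix(0, nrow = k + K + 1, ncol = 3)   # price levels, zero pre-sample
  t0 <- K + 1                                   # period in which the shock hits
  for (t in t0:(k + K + 1)) {
    e  <- if (t == t0) shock else c(0, 0, 0)
    dp <- alpha %*% t(beta) %*% p[t - 1, ] + e  # error-correction term plus shock
    for (i in seq_along(Gamma))                 # short-run dynamics
      dp <- dp + Gamma[[i]] %*% (p[t - i, ] - p[t - i - 1, ])
    p[t, ] <- p[t - 1, ] + as.vector(dp)
  }
  p[k + K + 1, ]    # cumulative impulse response = one column of Psi(1)
}

Calling dyn_sim with shock equal to c(1, 0, 0), c(0, 1, 0) and c(0, 0, 1) and binding the three returned vectors together as columns gives Ψ(1); colMeans of that matrix gives the common row vector ψ.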

Visual inspection of the autocorrelation and partial autocorrelation functions of the VECM residuals and squared residuals reveals no significant autocorrelation. Despite this, all ARCH tests indicate the presence of ARCH effects, and the multivariate Jarque-Bera test rejects normality of the residuals. In addition, multivariate Portmanteau and Breusch-Godfrey tests for serially correlated errors were performed; both indicate serially correlated errors.13

5.2 Impulse Response Functions

From the dynamic simulation we obtain the cumulative impulse response functions as a by-product, which, for each stock, represent the effects of a shock to NYSE Arca, NASDAQ and BATS, respectively. These transitional characteristics of the estimated systems are plotted in figure 2. The graphs in the top row depict how each of the exchanges' mid quotes for YHOO are affected over time when the mid quote price on NASDAQ, BATS and NYSE Arca, respectively, moves up by one. The other rows show the graphs for AAPL and INTC. Clearly, a shock to a given market very rapidly impacts the other markets. Immediately after the initial positive shock to one of the markets, the market itself displays a partial reversal, while the two other markets move up. It is worth noting that these transition paths only illustrate the expected values and not the actual adjustments. Generally, the part of a shock's impact that remains in the long run is largest for shocks to NYSE Arca and smallest for shocks to BATS. It is further clear that all the impulse response functions converge to the respective element in the Ψ(1) matrix for the given stock. However, since we observe severe contemporaneous correlation between the series, and this impulse response analysis rests on the assumption that shocks occur only in one market at a time, it might be misleading to set all the other contemporaneous errors to zero. Instead, the observed correlation of the error terms indicates that an innovation in one of the series tends to be associated with an innovation in another series. The above analysis might therefore not provide a true illustration of the dynamic relationships between shocks and prices.

13Most of the test statistics are based on t-tests of one or more coefficients in an OLS regression. These coefficients tend to become significantly different from zero as the number of observations grows; with samples of this size the tests will therefore, for example, almost always indicate serial correlation in the residuals.


To mitigate this problem we have looked at the orthogonalized impulse response functions, which are not reported here, since there were only small differences in relative magnitude between the IRFs reported here and the orthogonalized IRFs.

[Figure 2 comprises nine panels, one per combination of stock (rows: YHOO, AAPL, INTC) and shocked market (columns: NASDAQ, BATS, NYSE Arca); each panel plots the cumulative impulse responses of the NASDAQ, BATS and NYSE Arca mid quotes over 40 periods (x-axis: periods, y-axis: cumulative impulse response functions).]

Figure 2: The impulse response functions for the three stocks.

5.3 Information shares

We calculate the information shares using both the Cholesky decomposition (Hasbrouck, 1995) and the spectral decomposition (Fernandes and Scherrer, 2014), as described in the methodology section.

When applying the Cholesky decomposition, there exist six different permutations of the price vector. We compute the information shares for each market under all possible permutations, and, for each stock, report the lower bound, the upper bound and the mean over all permutations. The results are reported in table 8. We find the lower and upper bounds from the Cholesky decomposition to be very far apart - this again indicates substantial contemporaneous correlation between the different price series, in spite of using relatively high-frequency one-second data. This result was expected, however, following from the large impact of the time aggregation step on the sample size, described earlier in the data section. While the IS bands are therefore generally very wide, the mean over the permutations can be used as a single measure - this can then be compared to the result of the spectral decomposition approach, which, as discussed earlier, yields a single, order-invariant measure of the information share. The standard errors of the IS measures are reported in parentheses (derived by bootstrapping). It can be seen that for YHOO and INTC the standard error is largest for the spectral information shares, while the opposite is true for AAPL (the same can be confirmed by eyeballing the densities of the estimates in figure 3).
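In terms of code, the Cholesky information shares reduce to a few matrix operations on ψ and the innovation covariance Ω. The sketch below computes the IS under one ordering, loops over all six permutations to obtain the lower bound, upper bound and mean, and adds an order-invariant variant based on the symmetric (eigenvalue-based) square root of Ω. The last function is only meant to illustrate the idea of a spectral factorization; it is not claimed to reproduce the exact factor used by Fernandes and Scherrer (2014).

# psi: common row vector (length 3); Omega: innovation covariance matrix (3x3)
is_cholesky <- function(psi, Omega) {
  Fchol <- t(chol(Omega))                  # lower-triangular factor, Fchol %*% t(Fchol) = Omega
  as.numeric((psi %*% Fchol)^2 / as.numeric(psi %*% Omega %*% psi))
}

# Lower/upper bound and mean over the six orderings of the price vector
is_bounds <- function(psi, Omega) {
  perms  <- list(c(1,2,3), c(1,3,2), c(2,1,3), c(2,3,1), c(3,1,2), c(3,2,1))
  shares <- t(sapply(perms, function(p) {
    s    <- numeric(3)
    s[p] <- is_cholesky(psi[p], Omega[p, p])  # map shares back to the original order
    s
  }))
  list(lower = apply(shares, 2, min),
       upper = apply(shares, 2, max),
       mean  = colMeans(shares))
}

# Order-invariant illustration: symmetric square root of Omega
is_spectral_like <- function(psi, Omega) {
  ev <- eigen(Omega, symmetric = TRUE)
  Fs <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
  as.numeric((psi %*% Fs)^2 / as.numeric(psi %*% Omega %*% psi))
}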

In several cases, the results of the Cholesky and the spectral IS methods do not agree on which market contributes most (and least) to price discovery: the price leader for AAPL (NASDAQ or NYSE Arca) and the least dominant market for YHOO (NASDAQ or BATS) cannot be unambiguously determined.


In other cases the two approaches reach the same conclusion, and it can generally be seen that NYSE Arca is the information dominant market for both YHOO and INTC, while BATS and NASDAQ contribute the least to price discovery for AAPL and INTC, respectively.

A full plot of the realized variance of the three stocks can be seen in the appendix in figure 7, where it can be seen that the realized variances of the three trading venues closely resemble each other, except that BATS sometimes has a slightly higher realized variance. The slightly higher realized variance, caused by a noisier quote price trajectory, might be part of the explanation of why BATS is the least information dominant market for YHOO and AAPL.

The result that NYSE Arca is the price leader for both YHOO and INTC contrasts with the fact that the daily average number of quotes posted on NYSE Arca is actually the lowest of the three exchanges considered (see table 1). In contrast, however, both approaches determine BATS as the least information dominant market for AAPL, which is aligned with BATS having the lowest volume, in terms of the daily number of posted quotes, of the three. These results therefore only partially fulfil our ex ante expectation that venues with the least volume would also, ceteris paribus, be the least information dominant. When comparing the estimated information shares to the average bid-ask spreads for the different exchanges in table 2, no obvious correlation seems to be present. This supports the notion that the most informative market is not necessarily the market with the tightest spreads.

Information shares from whole sample model

YHOO (opt. lag = 25)
                     NASDAQ             BATS               NYSE Arca
IS mean Cholesky     0.3278 (0.007)     0.3005 (0.0118)    0.3717 (0.0067)
IS spectral          0.2821 (0.0151)    0.2866 (0.025)     0.4313 (0.0147)
IS Cholesky lower    0.0135             0.0073             0.0455
IS Cholesky upper    0.9127             0.8642             0.9684

AAPL (opt. lag = 24)
                     NASDAQ             BATS               NYSE Arca
IS mean Cholesky     0.3773 (0.0118)    0.2500 (0.0216)    0.3727 (0.0106)
IS spectral          0.3695 (0.0093)    0.2462 (0.0167)    0.3843 (0.0102)
IS Cholesky lower    0.0493             0.0176             0.0488
IS Cholesky upper    0.9119             0.6890             0.9031

INTC (opt. lag = 22)
                     NASDAQ             BATS               NYSE Arca
IS mean Cholesky     0.3170 (0.0043)    0.3200 (0.0126)    0.3630 (0.0108)
IS spectral          0.2599 (0.0103)    0.3093 (0.0282)    0.4308 (0.0243)
IS Cholesky lower    0.0063             0.0092             0.0367
IS Cholesky upper    0.9135             0.9167             0.9751

Table 8: The estimated information shares for the three stocks. For the Cholesky method, the lower and upper bound, as well as the mean over all permutations, are reported. The single estimate from the spectral decomposition method is reported as the second row for each stock. The bootstrapped standard errors of the IS measures are reported in parentheses.

Since the above results are computed using the full sample, we obtain, for each of the three stocks, one information share measure computed by the Hasbrouck (1995) method using the Cholesky decomposition (IS mean Cholesky), and one measure computed using the spectral decomposition of the covariance matrix (IS spectral). To examine the accuracy of these results, it is of interest to derive the distribution of the information shares estimated under both methods. We produce densities of the IS estimates by bootstrapping, using the methodology of section 3.6, where 250 bootstrap replications are employed for each stock.14 Figure 3 depicts the density of the information shares. The red dots on the x-axis represent the lower boundary (5%) and the upper boundary (95%) of the confidence interval for the information share computed using the Cholesky decomposition (CIS). The blue squares on the x-axis represent the corresponding boundaries of the confidence interval for the information share computed under the spectral decomposition (SIS).15

14This is a computationally intensive task. The total of 750 bootstrap replications required approximately 30 hours to complete.
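The bootstrap densities are obtained by repeating the whole estimation on resampled data. The sketch below outlines the loop; estimate_is is a hypothetical wrapper around the VECM estimation and IS computation described above, and the block resampling shown (drawing blocks of consecutive tuples with replacement) is a simplified stand-in for the stationary bootstrap of section 3.6.

set.seed(1)
B      <- 250          # bootstrap replications per stock
block  <- 600          # block length in tuples (illustrative)
n      <- nrow(tuples)
boot_is <- matrix(NA_real_, nrow = B, ncol = 3,
                  dimnames = list(NULL, c("NASDAQ", "BATS", "NYSEArca")))

for (b in seq_len(B)) {
  # draw enough random blocks of consecutive tuples to rebuild a sample of size n
  starts <- sample.int(n - block + 1, ceiling(n / block), replace = TRUE)
  idx    <- unlist(lapply(starts, function(s) s:(s + block - 1)))[1:n]
  boot_is[b, ] <- estimate_is(tuples[idx, ])   # hypothetical: VECM -> psi, Omega -> IS
}

# pointwise 5% and 95% quantiles give the plotted confidence bands
apply(boot_is, 2, quantile, probs = c(0.05, 0.95))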


[Figure 3 comprises nine panels, one per combination of stock (rows: YHOO, AAPL, INTC) and exchange (columns: NASDAQ, BATS, NYSE Arca); each panel shows the bootstrapped densities of the Cholesky (CIS) and spectral (SIS) information share estimates (x-axis: information share, y-axis: density).]

Figure 3: Density plots of the information share estimates. The x-axis shows the information share, and the y-axis the density. The densities for each stock are obtained by bootstrapping.

All the densities are almost symmetric, and, for all stocks except AAPL, the CIS has tighter confidence bands than the SIS. While the AAPL CIS and SIS densities seem to be almost identical, there is clearly a systematic difference between the CIS and SIS for YHOO and INTC, especially for the NASDAQ and NYSE Arca exchanges. One explanation for this difference between the CIS and SIS for YHOO and INTC could be greater contemporaneous correlation than in AAPL. The correlation matrix (Ξ) for each stock is shown below, and it is obvious that there is considerably lower contemporaneous correlation between the trading venues in the AAPL stock than in YHOO and INTC (even though the contemporaneous correlation is still substantial in the AAPL stock).

\[
\underbrace{\begin{bmatrix} 1.0000 & 0.8659 & 0.9025 \\ 0.8659 & 1.0000 & 0.8795 \\ 0.9025 & 0.8795 & 1.0000 \end{bmatrix}}_{\Xi_{\text{YHOO}}}
\quad
\underbrace{\begin{bmatrix} 1.0000 & 0.7268 & 0.8476 \\ 0.7268 & 1.0000 & 0.7123 \\ 0.8476 & 0.7123 & 1.0000 \end{bmatrix}}_{\Xi_{\text{AAPL}}}
\quad
\underbrace{\begin{bmatrix} 1.0000 & 0.8999 & 0.9173 \\ 0.8999 & 1.0000 & 0.9133 \\ 0.9173 & 0.9133 & 1.0000 \end{bmatrix}}_{\Xi_{\text{INTC}}}
\tag{30}
\]

5.4 Daily information shares

To investigate the robustness of our results and spot any potential interdaily differences in the efficient price process, we have conducted the information share study for each day separately, for all 124 trading days.

15The confidence intervals are found as in step 7 of the algorithm presented in section 3.6.


As mentioned in the literature section, some of the price discovery literature which uses Hasbrouck's methodology estimates daily VECMs and aggregates the daily VECM measures up to cover the entire sample. There are pros and cons of estimating a daily VECM. The pro is that structural breaks within the daily efficient price process are less likely than during the entire sample period. Therefore, problems with non-stationarity in the full sample might be mitigated by using daily samples instead. A con is that we use more degrees of freedom to fit our model to the data; it is thus less parsimonious, since we get 124 different model specifications. One can also argue that daily samples are not enough to estimate a long run relationship, but according to our impulse response functions reported in figure 2, one minute is enough for the prices to stabilize after a shock to one market.

Figure 4 depicts the evolution of the information shares for each of the 124 trading days (123 for AAPL) for the three stocks on the three trading venues. Depicted on the left is the CIS (Cholesky Information Share), and on the right the SIS (Spectral Information Share). The bars on the plot show the mean of the daily realized variance over the three exchanges, depicted for each of the three stocks. The daily information shares were calculated by splitting the dataset up into its trading days and calculating the information shares in exactly the same manner as the full-sample information shares. A full plot of the realized variance of the three stocks can be seen in the appendix in figure 7, where it is obvious that the realized variances of the three trading venues closely follow each other, except that BATS sometimes has a slightly higher realized variance.
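Operationally, the daily estimation simply loops the full-sample procedure over trading days. A minimal sketch, again using the hypothetical wrapper estimate_is and assuming the tuple data frame carries a trading-day column date, is:

# split the tuple dataset by trading day and re-estimate the IS on each day
daily    <- split(tuples, tuples$date)
daily_is <- t(sapply(daily, estimate_is))   # one row of information shares per trading day

# the alternative full-sample measure of table 9 is the mean of the daily estimates
colMeans(daily_is)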

The trajectories of the daily information shares are rather volatile. It is apparent from the plot that the SIS is more volatile than the CIS. In figure 3 we saw that the SIS estimates in general suffered from wider confidence intervals than the CIS estimates. The SIS estimates therefore seem less robust to variations in the data. Secondly, the overall volatility in the CIS and SIS seems to be well explained by the realized variance, since INTC has the most stable daily information share measures and also the lowest realized variance.

Each daily VECM was estimated with its optimal lag length according to the Schwarz information criterion. We observed from visual inspection of the price trajectories that days with jumps and what looked like structural breaks in the efficient price process also were the days where a greater lag length for the VECM was chosen. The optimal lag lengths varied from 1 to 30, and almost half of the optimal lag lengths were found to be less than 3. YHOO had 9 days with an optimal lag length greater than 6, AAPL 1, and INTC 3. There seems to be some correlation between the optimal lag length and large movements in the daily IS. Therefore, it seems that jumps in the price process drive some of the jumps in the daily ISs. Evidence against this hypothesis is the fact that realized variance and daily IS do not seem correlated at all, i.e. extreme volatility in the daily IS estimates does not seem to be driven by volatile price movements as measured by realized volatility.

Information shares by daily estimation approach

YHOO
                     NASDAQ              BATS                NYSE Arca
IS mean (Cholesky)   0.3040 (-0.0238)    0.3382 (+0.0377)    0.3578 (-0.0139)
IS (Spectral)        0.2309 (-0.0512)    0.3656 (+0.0790)    0.4035 (-0.0278)

AAPL
                     NASDAQ              BATS                NYSE Arca
IS mean (Cholesky)   0.3673 (-0.0100)    0.2689 (+0.0189)    0.3638 (-0.0089)
IS (Spectral)        0.3437 (-0.0258)    0.2866 (+0.0404)    0.3697 (-0.0146)

INTC
                     NASDAQ              BATS                NYSE Arca
IS mean (Cholesky)   0.3035 (-0.0135)    0.3246 (+0.0046)    0.3719 (+0.0089)
IS (Spectral)        0.2272 (-0.0328)    0.3192 (+0.0099)    0.4536 (+0.0229)

Table 9: The information shares estimated under the daily estimation approach. The daily information shares are computed for each day of the entire sample, and the full-sample estimates are then generated as the sample mean of these daily estimates. Both the Cholesky and the spectral decomposition are used. The numbers in parentheses show how the measures change compared to the original full-sample measures described in an earlier section.


[Figure 4 comprises six panels: for each stock (YHOO, AAPL, INTC), the daily IS mean over permutations (Cholesky, left) and the daily IS from the spectral decomposition (right) for NASDAQ, BATS and NYSE Arca (x-axis: days, y-axis: information share), with bars showing the daily realized variance.]

Figure 4: The evolution of the information shares for each of the 124 trading days (123 for AAPL) for the three stocks on the three trading venues (both CIS and SIS). The graphs are annotated with the daily realized volatility. Note that the scale of the realized variance differs from the scale of the IS estimates.


Several papers calculate the overall information shares as means of the daily information shares. These papers include Hasbrouck (2003), Chakravarty, Gulen, and Mayhew (2004) and Mizrach and Neely (2008). The results of the aggregated IS measures are included in table 9, with the differences from the full-sample measures reported in parentheses, to check the comparability of the mean of the daily IS results with our overall IS measure. For both the daily and the overall model, NYSE Arca seems to be the information dominant market (except when looking at the Cholesky information share of AAPL). The differences between the daily and overall estimates are much bigger for the spectral decomposition than for the Cholesky decomposition. There are some major differences between the daily and the overall measures as well: NYSE Arca is almost the same across the two different measures, while BATS becomes more information dominant and NASDAQ becomes less. The differences are so big that BATS and NASDAQ actually shift places in the ranking of information leading markets, from 1: NYSE Arca (0.392), 2: NASDAQ (0.322), 3: BATS (0.285) to 1: NYSE Arca (0.387), 2: BATS (0.317), 3: NASDAQ (0.296), where the numbers reported in parentheses in the text are the average information share across all stock measures.

To make this section complete, we have included plots of the densities of the daily information shares in figure 5 and compared them to the bootstrapped densities in figure 3. It is apparent from the figure that the densities of the daily IS measures look similar to the bootstrapped information shares from the whole sample, except that the daily densities are much wider than the bootstrapped ones. As can be seen, the densities of the CIS and SIS estimates are more alike for AAPL than for YHOO and INTC, where the SIS lies to the left and to the right of the CIS for NASDAQ and NYSE Arca, respectively. It is also observed that the means of the information share measures are closer to each other for BATS than for the other exchanges. The fact that the daily densities are wider than the bootstrapped densities is an indication of interdaily structural breaks that the overall VECM model cannot adequately capture.16

[Figure 5 comprises nine panels, one per combination of stock (rows: YHOO, AAPL, INTC) and exchange (columns: NASDAQ, BATS, NYSE Arca); each panel shows the densities of the daily Cholesky and spectral IS estimates (x-axis: information share, y-axis: density).]

Figure 5: Densities of the daily information shares.

16Note that no immediate comparison can be made between the bootstrapped densities and the densities of the daily IS estimates, because they are different in nature.


5.5 Explainability of information shares

Which observable characteristics of market (micro)structure explain information shares? To investigate the explainability of the obtained information shares, a simple correlation analysis is undertaken. Three relevant variables that might explain a market's information share are: the daily spread, the daily quoted quantity and the daily realized variance. The correlation is therefore checked between the daily Cholesky information shares and the aforementioned variables. One problem with this correlation analysis is that the information shares of the three markets are restricted to sum to 1; therefore only the two information dominant markets under the Cholesky information share, NASDAQ and NYSE Arca, are included in the analysis. The analysis was conducted for both the Cholesky and the spectral decomposition IS measures, and with both Pearson and Spearman correlations. However, only the Spearman correlations for the Cholesky IS are included for brevity, since results were similar across IS and correlation measures.

An a priori expectation of the correlation could be that daily realized variance would be negatively correlated with the IS on the same exchange and positively correlated with the IS of the other exchanges. A reason for this expectation would be that more variance/noise on a market would make signals about the efficient price process more noisy. One might also think that the market with the tightest spread would be the market incorporating new information first, since it is riskier for market makers to keep a tight spread as new information arrives, thereby incentivizing them to revise quotes faster. This hypothesis would make daily mean spreads negatively correlated with the IS of the same market, and positively with the IS of the other markets. Another hypothesis is that exchanges with the largest volumes are information dominant, leading to a positive correlation between daily quoted quantities and the information share.
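The entries of table 10 are plain Spearman rank correlations of the daily series. A minimal sketch, assuming the daily IS matrix daily_is from above and a data frame daily_vars holding the daily explanatory variables (the column name rv_nasdaq is illustrative), is:

# Spearman correlation between the daily NASDAQ Cholesky IS and the daily
# realized variance on NASDAQ, with the associated significance test
cor(daily_is[, "NASDAQ"], daily_vars$rv_nasdaq, method = "spearman")
cor.test(daily_is[, "NASDAQ"], daily_vars$rv_nasdaq, method = "spearman")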

CORRELATIONS

                     Daily realized variance            Daily mean spread                  Daily quoted quantity
Daily Cholesky IS    NASDAQ   BATS     NYSE Arca        NASDAQ   BATS     NYSE Arca        NASDAQ   BATS     NYSE Arca
YHOO NASDAQ          -0.01    0.08     0                0.08     0.05     -0.02            -0.05    -0.04    -0.04
YHOO NYSE            -0.04    -0.07    -0.05            0.16*    0.03     0.05             0.07     0.01     0.02
AAPL NASDAQ          -0.18**  -0.05    -0.18**          -0.37**  -0.15*   -0.16*           -0.03    -0.24**  -0.18**
AAPL NYSE            -0.2**   -0.03    -0.21**          -0.17*   0.04     -0.26**          -0.04    -0.14*   0.02
INTC NASDAQ          -0.06    -0.02    -0.05            0        -0.02    -0.08            0.07     0.06     0.14*
INTC NYSE            -0.23**  -0.2**   -0.24**          -0.17*   -0.24**  -0.13*           -0.09    -0.03    0

Table 10: Correlation between the daily Cholesky information shares and the daily descriptive variables: realized variance, mean spread and quoted quantity. ** significant at the 5% level, * significant at the 10% level.

The results of the correlation analysis can be seen in table 10, and none of the above-mentioned hypotheses can be consistently confirmed. There is some negative correlation present between the daily Cholesky IS and realized variance, but it is the same across all exchanges. The same is true for the daily mean spreads. Since the number of calculated correlations is rather high, there could be a data snooping issue, i.e. some correlations might be significant purely by chance.

6 Application

The degree of automated and high-frequency trading has grown in recent decades. When examining the degree of price discovery, it becomes of natural interest to find out whether one could profit from this knowledge. Along these lines, we develop a simple (medium-frequency: trading within the minute) trading strategy and backtest it over the sample period for all three stocks. Our results show that the strategy yields consistently negative returns, indicating efficient markets. Further, we acknowledge that even a thorough backtest is not sufficient for testing, and a step-forward/out-of-sample test should be applied as well. However, since the backtest finds consistently declining profits after applying our strategy, we do not perform out-of-sample testing here.


The strategy

The results obtained in the estimation section indicate that price discovery occurs on all the markets, but some markets are more price dominant than others. It is this result that we will utilize in this short section.

We control for look-ahead bias by making sure that a trade signal only relies on information available up until the time of that trade. Additionally, we seek to minimize potential data-snooping bias by limiting the number of parameters to be optimized to two: the look-back length, used to compute moving averages, and the entry threshold.

The strategy is described next; a small sketch of the signal construction (steps 1-3) follows the list.

1. Find the markets with the highest and lowest information share for a given stock, respectively, and construct the difference of their mid quotes by subtracting the low IS market's mid quote from the high IS market's mid quote. When this difference is "large" and positive, the price leading market has priced the stock higher than the low IS market (measured by mid quotes), and we expect that the low IS market will adjust its prices upwards (on average). The opposite is the case when the difference is negative and large in absolute value.

2. To make sure that this difference is statistically significant, construct a rolling mean and rolling standard deviation of the difference. Now compute the Z-score for a given second by subtracting the value of the moving average (for that second and n seconds back) from the difference computed in step 1, and dividing by the rolling standard deviation (for that second and n seconds back). n, the look-back length, is one of the parameters to be optimized.

3. Now, when the Z-score rises above L standard deviations (drops below -L standard deviations), it represents a buy (sell) signal. We buy (sell) on the market which has the lowest offer (highest bid) price.17 The trade amount is equal to the amount associated with the quote price of interest, with a cap of 100 shares. If a buy (sell) signal occurs in a second where we are already long (short), we allow the algorithm to buy (sell) more shares, still keeping the limit of being at most 100 shares long (short). L, the entry condition, is the second parameter to be optimized. The different entry levels are set such that they correspond to the 2.5%, 1% and 0.1% critical values of the standard normal cumulative distribution function.

4. An exit signal occurs when we are long (short) and the Z-score goes back below L standard deviations (above -L standard deviations).18 We then exit the position at the highest bid (lowest offer) price of the information subordinate markets. If the entire position cannot be closed at this price, the algorithm will move on to the second-best price. If the position still cannot be closed in its entirety, the algorithm will continue the procedure in the next second.

5. Run the strategy on the data sample. If a position is not closed at the last period in the sample, close it at the best prices possible. If there is not enough liquidity in the last second to close the entire position, we assume that we can close the remaining amount at the least attractive price.
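A stylized sketch of the signal construction in steps 1-3 is shown below (using zoo for the rolling statistics). mid_high and mid_low denote the mid-quote series of the most and least information dominant market, and n and L are the two parameters to be optimized; position sizing, the 100-share cap and the exit logic of steps 3-5 are omitted for brevity.

library(zoo)

n <- 20       # look-back length in seconds
L <- 2.33     # entry threshold in standard deviations

diff_mid <- mid_high - mid_low                          # step 1: mid-quote difference
roll_mu  <- rollapplyr(diff_mid, width = n, FUN = mean, fill = NA)
roll_sd  <- rollapplyr(diff_mid, width = n, FUN = sd,   fill = NA)
z        <- (diff_mid - roll_mu) / roll_sd              # step 2: rolling Z-score

signal <- ifelse(z >  L,  1,                            # step 3: buy signal
          ifelse(z < -L, -1, 0))                        #         sell signal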

The results

We test the strategy for all three stocks and for different parameter values. The results are given in table 11. Across all parameter values and stocks, the algorithm provides consistently negative returns.

17Note that we allow trades on both information subordinate markets, even though we only use the least information-contributing market for finding entry and exit points.

18Note that this strategy uses identical exit and entry levels. An alternative would be to set different exit levels (with an absolute value lower than L), such that the algorithm closed the position when the difference between the mid quotes narrowed (on average meaning that the low IS market had caught up to the price leader). We backtested this strategy as well, without any improvements in cumulative profits.


The table also shows the total number of traded shares during the six month sample period. As expected, the number of traded shares falls as we increase the entry threshold. Because of the lower number of traded shares, the loss also diminishes as we increase the entry threshold.

                                      YHOO                        AAPL                        INTC
                             L        Lookback 20/40/80           Lookback 20/40/80           Lookback 20/40/80
Profits                      1.96     -26.5   -27.4   -27.2       -151.1  -168.1  -157.0      -24.9   -25.3   -25.3
(in thousands of dollars)    2.33     -21.9   -23.6   -24.1       -61.3   -75.6   -73.2       -21.4   -22.8   -23.5
                             3.09     -10.4   -15.4   -16.8       -5.9    -29.2   -31.6       -12.2   -16.5   -18.1
Total number of              1.96     4431.4  4546.6  4546.7      1643.5  1654.8  1590.8      4492.5  4536.0  4550.5
traded shares                2.33     3611.0  3904.5  3999.7      670.4   733.5   726.5       3927.5  4129.7  4167.8
(in thousands)               3.09     1804.4  2513.2  2727.2      60.0    109.1   128.9       2301.2  3097.4  3316.5

Table 11: Profits and total number of traded shares reported for all stocks and all possible parameter combinations.

In figure 6 we provide an illustration of the trading strategy for INTC, using a particular set of parameter values: L = 2.33 and a lookback period equal to 20 seconds. The figure shows that the majority of trades yield a negative profit, and the left tail of the profit distribution is definitely fatter than the right. Furthermore, the consistently decreasing cumulative profits are shown in the right part of the figure. Though this just illustrates a single case, the figures look similar across stocks and parameter values. The strategy does not seem to be able to yield a positive profit for any of the chosen stocks or parameter values.

[Figure 6 comprises two panels for INTC: the left panel shows the density of the profit per share traded (in dollars); the right panel shows the cumulative profits (in dollars) over time, January to July 2013.]

Figure 6: The figure on the left shows the distribution of profits per share traded (in dollars). The distribution's mean is obviously negative, and the left tail is also fatter than the right. The figure on the right shows the cumulative profits of the strategy over the six month data sample (January 2013 - June 2013).

Since the results show consistently negative returns, it may be tempting to reject the strategy and declare markets efficient. However, it should be noted, first, that the estimated information shares in an earlier section are close to being equal across the different markets, and, second, that the degree of price discovery varies over the different methods. It might therefore be of interest to backtest related strategies on other stocks (or exchanges) which show a more uneven and consistent picture of where the price discovery occurs. It is also noteworthy that the strategy was tested on the cleaned price series, which, as described in the Data section, are subject to data aggregation. This means that all quotes posted within a given second are aggregated into a single quote, describing the median price and the cumulative bid or offer amount. Testing on this modified dataset may thus provide a misleading picture of the real conditions. To mitigate this problem, one would need access to higher frequency data, such as a millisecond resolution. Finally, the results of the strategy do not include transaction costs. This aspect alone would drive down profits significantly as a result of the large number of trades.

To end this section, it is worth mentioning that we test one strategy only. It may be interesting to examine other strategies related to the subject of price discovery on different stocks and exchanges.



7 Conclusion

The goal of this paper is to examine the issue of price discovery for three stocks: YHOO, AAPL and INTC. Furthermore, we want to test whether one can profit from this knowledge by utilizing a simple trading strategy. To answer these questions, we adopt Hasbrouck (1995)'s information share methodology with the Cholesky decomposition, as well as the spectral decomposition by Fernandes and Scherrer (2014), to circumvent the problem of contemporaneous correlation between the price series. The empirical study is carried out using 1-second resolution TAQ data, which was aggregated using the REPLACEALL method and cleaned in various ways to remove the few outliers present in the dataset.

Generally, we find that the results for YHOO and INTC resemble each other closely. That is, for these stocks, NYSE Arca seems to be the most information dominant market, followed by NASDAQ. This is not immediately surprising, since NYSE Arca has significantly tighter average spreads than the other exchanges. However, NASDAQ has two to three times as many quotes as the other exchanges. Therefore, we confirm that information shares are not necessarily related to the width of spreads or the liquidity of a market. Furthermore, we find the confidence intervals for the Cholesky information shares to be much tighter than the confidence intervals obtained under the spectral information shares, but only for YHOO and INTC.

AAPL is special in the sense that the information shares for NYSE Arca and NASDAQ are almost identical for both measures, and special because the spectral information shares have tighter confidence intervals than the ones constructed from the Cholesky decomposition. When comparing the Cholesky and spectral density estimates, we see that the two methods yield results for AAPL which are significantly more aligned than the two methods' results for YHOO and INTC, respectively.

The impulse response functions illustrate the obtained information shares; it is clear that shocks to the least information dominant market naturally have the smallest impact on the efficient price.

To check the robustness of our results, daily VECMs and corresponding daily IS measures are calculated. Daily and overall model estimates yield somewhat different results, which might be an indication that the full-sample VECM is not able to capture the structure of the sample adequately. Under this alternative method, NASDAQ generally becomes much less information dominant, while BATS becomes more information dominant. The effect is so profound that BATS actually becomes more information dominant than NASDAQ.

To explain the information shares, a correlation analysis is conducted. The correlation analysis cannot conclude anything consistently, but there seems to be some negative correlation between daily information shares and daily realized variances, as well as some negative correlation between daily information shares and daily mean spreads.

Finally, the obtained information shares are used in a trading strategy application which exploits buy and sell signals from the difference between the mid quotes of the most and least information dominant market. The trading strategy consistently yields negative profits, which might indicate efficient markets. However, it might be interesting to test further strategies related to price discovery, potentially using other markets and stocks which show a more uneven picture of where the price discovery occurs.


8 References

Baillie, R. T., G. Geoffrey Booth, Y. Tse, and T. Zabotina (2002): "Price discovery and common factor models," Journal of Financial Markets, 5, 309–321.

Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2009a): "Realized kernels in practice: trades and quotes," Econometrics Journal, 12, C1–C32.

——— (2009b): "Realized kernels in practice: trades and quotes," Econometrics Journal, 12, C1–C32.

Booth, G. G., R. W. So, and Y. Tse (1999): "Price discovery in the German equity index derivatives markets," Journal of Futures Markets, 19, 619–643.

Chakravarty, S., H. Gulen, and S. Mayhew (2004): "Informed Trading in Stock and Option Markets," Journal of Finance, 59, 1235–1257.

Chu, Q. C., W.-L. G. Hsieh, and Y. Tse (1999): "Price discovery on the S&P 500 index markets: An analysis of spot index, index futures, and SPDRs," International Review of Financial Analysis, 8, 21–34.

de Jong, F. (2002): "Measures of contributions to price discovery: a comparison," Journal of Financial Markets, 5, 323–327.

deB. Harris, F. H., T. H. McInish, and R. A. Wood (2002): "Security price adjustment across exchanges: an investigation of common factor components for Dow stocks," Journal of Financial Markets, 5, 277–308.

Engle, R. F. and C. W. J. Granger (1987): "Co-integration and Error Correction: Representation, Estimation, and Testing," Econometrica, 55, 251–276.

Fernandes, M. and C. Scherrer (2014): "Price discovery in dual-class shares across multiple markets," Working paper, Institut for Økonomi, Aarhus Universitet.

Frijns, B. and P. Schotman (2009): "Price discovery in tick time," Journal of Empirical Finance, 16, 759–776.

Garbade, K. D. and W. L. Silber (1979): "Dominant and Satellite Markets: A Study of Dually-Traded Securities," The Review of Economics and Statistics, 61, 455–460.

——— (1983): "Price Movements and Price Discovery in Futures and Cash Markets," The Review of Economics and Statistics, 65, 289–297.

Gonzalo, J. and C. W. J. Granger (1995): "Estimation of Common Long-Memory Components in Cointegrated Systems," Journal of Business & Economic Statistics, 13, 27–35.

Grammig, J., M. Melvin, and C. Schlag (2005): "Internationally cross-listed stock prices during overlapping trading hours: price discovery and exchange rate effects," Journal of Empirical Finance, 12, 139–164.

Hasbrouck, J. (1995): "One Security, Many Markets: Determining the Contributions to Price Discovery," Journal of Finance, 50, 1175–1199.

——— (2003): "Intraday Price Formation in U.S. Equity Index Markets," Journal of Finance, 58, 2375–2400.

Huang, R. D. (2002): "The Quality of ECN and Nasdaq Market Maker Quotes," The Journal of Finance, 57, 1285–1319.

Johansen, S. (1995): Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Oxford University Press.

Lehmann, B. N. (2002): "Some desiderata for the measurement of price discovery across markets," Journal of Financial Markets, 5, 259–276.

Li, H. and G. S. Maddala (1997): "Bootstrapping cointegrating regressions," Journal of Econometrics, 80, 297–318.

Lien, D. and K. Shrestha (2009): "A new information share measure," Journal of Futures Markets, 29, 377–395.

——— (2014): "Price Discovery in Interrelated Markets," Journal of Futures Markets, 34, 203–219.

Mizrach, B. and C. J. Neely (2008): "Information shares in the US Treasury market," Journal of Banking & Finance, 32, 1221–1233.

Patton, A., D. N. Politis, and H. White (2009): "Correction to 'Automatic Block-Length Selection for the Dependent Bootstrap' by D. Politis and H. White," Econometric Reviews, 28, 372–375.

Politis, D. N. and J. P. Romano (1994): "The Stationary Bootstrap," Journal of the American Statistical Association, 89, 1303–1313.

Politis, D. N. and H. White (2004): "Automatic Block-Length Selection for the Dependent Bootstrap," Econometric Reviews, 23, 53–70.

Putnins, T. J. (2013): "What do price discovery metrics really measure?" Journal of Empirical Finance, 23, 68–83.

Sapp, S. G. (2002): "Price Leadership in the Spot Foreign Exchange Market," Journal of Financial and Quantitative Analysis, 37, 425–448.

Stock, J. and M. Watson (1988): "Testing for Common Trends," Journal of the American Statistical Association, 83, 1097–1107.

Yan, B. and E. Zivot (2010): "A structural analysis of price discovery measures," Journal of Financial Markets, 13, 1–19.


9 Appendix

[Figure 7 comprises three panels (YHOO, AAPL, INTC), each plotting the daily realized variance on NASDAQ, BATS and NYSE Arca over the trading days of the sample (x-axis: days, y-axis: daily realized variance).]

Figure 7: Realized daily variance using 2-minute observations for the three stocks on the three different trading venues: NASDAQ, BATS, and NYSE Arca.
