Rismark Technical Paper

Calculating High Frequency

Australian Residential

Property Price Indices

December 2011

Dr. Matthew Hardman

Head of Research

Rismark International

INTRODUCTION

1 STRUCTURE OF THE INDICES
  1.1 BUSINESS REQUIREMENTS
  1.2 INDEX COMPOSITION
    1.2.1 SSDs
    1.2.2 Composite Indices
    1.2.3 Reasons for Fixing the Weights

2 CALCULATION OF THE INDICES
  2.1 SUMMARY
  2.2 ADJACENT PERIOD HEDONIC INDEX CALCULATION
    2.2.1 Format of Calculation
    2.2.2 Index Regions
  2.3 SMOOTHING INDICES
  2.4 THE ADJACENT PERIOD HEDONIC INDEX AS NUMERAIRE
    2.4.1 Continuity Issues
    2.4.2 Monthly Part of Series
    2.4.3 Daily Part of Series
    2.4.4 Most Recent Part of Series
    2.4.5 Benchmarking Historical Sales
  2.5 PRICING OF INDIVIDUAL PROPERTIES
    2.5.1 Hedonic Pricing Function
    2.5.2 Imputation of Hedonic Variables
    2.5.3 Neighbours Calculation
  2.6 CAPITAL GAIN INDEX CALCULATION

3 CALCULATION OF RENTAL YIELDS
  3.1 SUMMARY
  3.2 RENTAL YIELD CALCULATION

Copyright Rismark International 2011 Level 13, 50 Margaret Street, Sydney NSW 2000

Introduction

In December 2011, RP Data-Rismark began publishing daily Australian capital city residential property indices. These are house, unit and all dwellings indices for Sydney, Melbourne, Brisbane (including Gold Coast), Adelaide, Perth and Australia, the latter being a stock weighted average of these 5 capital cities. Thus, there are a total of 18 daily index series.

Additionally, indices using the same method will be calculated for the smaller capitals of Hobart, Canberra and Darwin, as well as an 8 city “All Australia” stock weighted average.

The purpose of such high frequency indices is to act as a set of reference assets for settling property index contracts, either over the counter or exchange traded.

This paper describes in detail the calculation of these indices via a hedonic imputation method. In particular, we discuss necessary modifications to the general method to meet the requirements of daily frequency and tradability: an index must represent a self-financing portfolio in an idealized, frictionless market.

Imputation based hedonic indices fit a regression based model to observed property sale prices, attributes and locations and a time variable in order to estimate the value of all properties in the market in each period. The crucial idea is that the index is constructed as the value of a market portfolio by valuing each property in that portfolio at each time point.

Imputation models may be viewed as a case of broader index number methods, which measure the weighted average price change of a basket of goods from period to period. Well known examples are Laspeyres and Paasche indices, which have been applied to house prices by Hill and Melser (2007, 2008) [HM1], [HM2]. These authors conclude that hedonic imputation methods are superior to median, repeat sales and time dummy methods.

Section 1 describes the suite of indices produced, as well as the stock weighted average method for obtaining composite indices such as all dwellings from house and unit indices or all Australia from the capital city indices. Additionally, we discuss the rationale behind the index construction.

Section 2 describes in detail the index calculation: the construction of an adjacent period hedonic index as a numeraire, the imputation of prices for all properties in the market from the sale times, prices and characteristics of those properties which do sell and the construction of the daily indices from the imputed prices of the set of properties representing the market.

Section 3 describes the method of calculating imputed rental yields on our indices, so that contracts such as index linked notes may pay a yield to the holder as well as a capital gain.

1 Structure of the Indices

1.1 Business Requirements

The indices were developed in response to the following business requirements:

To be a reference asset for the settlement of exchange traded property index contracts.

To be able to be calculated with daily frequency.

To be divisible into capital gains and rental components. That is, the daily reported index values must track the capital value of a self-financing, diversified market portfolio of residential property. The daily rental figure must equate to the rental yield on that day’s market portfolio.

To calculate Australian “composite” indices.

To calculate distinct house, unit and all dwellings indices for each of the major cities and the Australian composites.

The intention of synthetic contracts is to represent and mirror as closely as possible returns on the physical asset class, but in an idealized market where the asset class is infinitely liquid and there are no transaction costs (such as government stamp duty and agents’ selling fees).

An investor would ideally wish to construct a portfolio exactly representing a given market segment. Income from the portfolio would then represent an overall yield from the market segment. The investor would also wish to rebalance their portfolio each period to track changes in market composition.

Thus, returns on the underlying index should equate to returns generated by the above strategy: forming a portfolio, holding it for one period, booking the income yield and capital gain or loss, then rebalancing the portfolio.

Tradable property indices should therefore be capital gain indices, and index linked notes should pay:

1. The value of the capital gain index at the expiry date.

2. A periodic income equal to a rental yield on the properties composing the index.

Each of the city indices will represent capital value in the sense that:

1. They will be based at the mean property value in $000 of the market segment which they represent

2. Daily returns on each index will equal daily returns on a portfolio of the properties composing the market segment represented.

It is clear that the method of constructing daily indices applies as well to other frequencies, such as weekly or monthly.

1.2 Index Composition

1.2.1 SSDs

Daily indices have been calculated for Sydney, Melbourne, Brisbane (including Gold Coast), Adelaide, Perth, Hobart, Canberra and Darwin.

The Australian Securities Exchange (ASX) has requested daily indices for the first 5 cities be provided to act as reference instruments for the settlement of ASX traded Property Index Contracts (PIC) in 2012. ASX traded contracts for Hobart, Canberra and Darwin may or may not be later added to the market.

The population of properties in each city is defined as a union of Statistical Subdivisions (SSD). The term SSD is defined by the Australian Bureau of Statistics. Definitions, names and maps may be found in [ASGC]. Each SSD is a connected region which typically comprises around 15 to 25 suburbs.

For example, Sydney has 14 SSDs, Melbourne 16, Brisbane 12, Gold Coast 4, Adelaide 4 and Perth 5, all of roughly equal population within each city.

The geographic regions represented by the daily city indices comprise the corresponding SSDs for that city, according to the ABS classification, with the following two variations:

1. The Brisbane indices represent the combined Brisbane and Gold Coast regions.

2. SSDs 30515 (Beaudesert Shire) and 30720 (Gold Coast Inland), while respectively part of the Brisbane and Gold Coast areas in the ABS classification, are not considered metropolitan areas for, and not included in the calculation of, the indices.

The Hobart index covers SSD 60505 (Greater Hobart).

The Darwin index covers SSDs 70505 (Darwin City) and 70510 (Palmerston).

The Canberra index covers all 7 Canberra SSDs. 5 of these 7 SSDs are comprised of a dozen or so Canberra suburbs, plus a significant, semi-rural region. A refinement of the Canberra index would involve excising particular suburbs or postcodes deemed rural from these SSDs.

1.2.2 Composite Indices

For each of the above 8 cities, plus the Australian composites described below, there are 3 indices: Houses, Units and All Dwellings.

The All Dwellings indices are a stock weighted average of the corresponding House and Unit indices.

Additionally, we calculate “All Australia” composite indices. The daily Australian series for ASX trading are stock weighted averages of the corresponding 5 daily city indices and are calculated chiefly for trading purposes. We also extend this to stock weighted averages of the corresponding 8 city indices described in the previous section.

The dwelling stock figures are fixed for 5 year periods and are taken from Australian census data. Thus, the current numbers are from the 2006 census. There is no intention to announce alterations to the weights until after the 2011 census results are published.

The dwelling stock numbers are as follows:

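Stock Numbers - BRI includes Gold Coast

| Type | SYD | MEL | BRI | ADE | PER | HOB | DAR | CAN |
|------|-----|-----|-----|-----|-----|-----|-----|-----|
| H | 1,074,057 | 1,083,034 | 675,501 | 383,383 | 480,409 | 68,177 | 22,911 | 104,801 |
| U | 339,783 | 193,572 | 104,449 | 45,701 | 45,062 | 7,928 | 6,777 | 11,791 |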

For example, if the Sydney House index is 500 and the Sydney Units index is 400, the Sydney All Dwellings index is

SYDD = (1074057 * 500 + 339783 * 400) / (1074057 + 339783) = 475.97
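The same stock weighted average can be written as a short function. This is only an illustrative sketch: the function name `stock_weighted_index` is hypothetical, and the index values 500 and 400 are the example figures used above, not published index levels.

```python
def stock_weighted_index(index_values, stock_numbers):
    """Stock weighted average of component indices.

    index_values  -- component index levels (e.g. House and Unit indices)
    stock_numbers -- fixed dwelling stock weights for the same components
    """
    total_stock = sum(stock_numbers)
    weighted_sum = sum(v * s for v, s in zip(index_values, stock_numbers))
    return weighted_sum / total_stock


# Sydney example from the text: House index 500, Unit index 400,
# weighted by the 2006 census stock numbers for Sydney.
sydney_all_dwellings = stock_weighted_index([500.0, 400.0], [1_074_057, 339_783])
print(round(sydney_all_dwellings, 2))
```

The same function applies to the Australian composites by passing the city indices and the corresponding city stock numbers.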

1.2.3 Reasons for Fixing the Weights

In reality dwelling stock numbers change every time a property is built or demolished. The ABS releases monthly building approvals figures and it is possible to estimate with reasonable accuracy total dwelling stock numbers at the end of each month and thus update the weighting of the composite indices.

This may be a satisfactory procedure for a set of research indices; however, traders prefer a set of fixed weights. Any changes in those weights are required to be known well in advance of implementation. Any open contracts on a composite index during a change in weightings can be adjusted as they would be for equity derivatives in response to a jump in the stock price after a stock split or dividend payment.

A major factor in keeping the weights constant is to encourage liquidity by facilitating arbitrage opportunities between the individual and composite indices as temporary market asymmetries arise.

2 Calculation of the Indices

2.1 Summary

The indices will represent capital value: daily returns on each index will equal the best estimate of 1 day capital gains on the portfolio of properties represented by the index.

The index calculation is in four parts (see Section 2.2 for details):

1. Price every property in the given category.

2. Form a portfolio of properties which represents the index from the category.

3. Calculate the change in value of that portfolio over the subsequent period, i.e. 1 day.

4. Multiply the previous index value by the relative change in portfolio value.

The calculation to price every property is in five parts:

1. Inflate or deflate all observed historical log sale prices to a given fixed date via addition of the increment in the adjacent period hedonic (log) index for the property type (house/unit) and statistical subdivision (SSD) in which the property is located. See Section 2.2 for details of the adjacent period hedonic index calculation method.

2. Transform the benchmarked sale price data by a fixed, monotonically increasing function.

3. Fit the transformed data to a set of explanatory variables via a generalized additive model to obtain a valuation function for each property type and SSD.

4. For all properties in the current population, apply the appropriate valuation function to obtain transformed price estimates.

5. Apply the inverse of the transformation function in step 2 to the transformed price estimates in step 4 to obtain the actual property price estimates.

See Section 2.5 for details of steps 2 – 5 above.

2.2 Adjacent Period Hedonic Index Calculation

2.2.1 Format of Calculation

As discussed above, an adjacent period hedonic index is calculated prior to valuing every property in the portfolio. This index acts as a numeraire in that historical sales occurring at various times are benchmarked via this numeraire to equivalent values at a common time prior to fitting the valuation model (see §2.4).

A hedonic variable is an observable attribute of a good such that variation in the value of the attribute is explanatory of some of the variation in the price of the good. In the case of residential property prices, examples of hedonic attributes are suburb, land size and number of bedrooms.

A hedonic property index is calculated from observed sale prices, taking into account the hedonic attributes of the properties which sold during each observation period. Since we are using our hedonic index in this case to benchmark log sale price observations prior to fitting a hedonic valuation model, our index will be in units of log price.

If we have a hedonic index series $I_0, I_1, \ldots, I_t$, the relative value $I_{t_2} - I_{t_1}$ between any two periods is, in our method, the least squares error estimate of the mean difference in log value over the time interval $t_1 \to t_2$ of the properties in the population, conditional on observing the prices and hedonic attributes of those properties which actually sold during the period of construction of the index.

Given hedonic variables $X_1, \ldots, X_m$, a hedonic formula is applied to property sales $P_i$ in a set of periods $T_0, \ldots, T_n$:

$$\log P_i = c_0 + \sum_{j=1}^{m} c_j f_j(x_j) + \sum_{j=1}^{\nu} s_j S_j + \sum_{k=1}^{n} \lambda_k T_k + \varepsilon_i \tag{2.1}$$

where:

The $f_j$ are transformations of the hedonic variables. For example, $X_1$ might be land size and $f_1(x) = \log x$.

The $c_j$, $s_j$ and $\lambda_k$ are numerical coefficients to be estimated.

The $S_j$ are dummy variables with $S_j = 1$ if property $i$ is in suburb $j = 1, \ldots, \nu$.

$T_k$ is a dummy variable with $T_k = 1$ if the sale occurred in period $k$ and $T_k = 0$ otherwise.

$\varepsilon_i$ is the (zero mean) residual error term for property $i$.

The above hedonic model thus gives the least squares estimate of the log price of a property at the given time, conditional on the information available and controlling for its most statistically significant, objectively observable price determining attributes.

It is the coefficients $\lambda_k$ which constitute the index. Note that each coefficient has its own estimation error, so we model each $\lambda_k$ as normally distributed with mean $\hat{\lambda}_k$ (the actual estimate) and standard error $\sigma_k$.

To estimate an index over the period $t = 0, 1, \ldots, n$, we group the $n + 1$ periods into consecutive pairs $\{T_0, T_1\}, \{T_1, T_2\}, \ldots, \{T_{n-1}, T_n\}$ and use equation (2.1) with one time dummy and one coefficient $\lambda$ to calculate independent estimates of the coefficients $c_j, s_j$, the transformations $f_j$ and the independent estimates $\hat{\lambda}_k, \sigma_k$.
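To make the two period case concrete, the following is a minimal sketch of equation (2.1) with a single time dummy, fitted by ordinary least squares on synthetic data. Everything here is an assumption for illustration: the sales are invented, there is one hedonic variable (log land size) and one suburb (so no suburb dummies), and the helper names `solve` and `ols` are hypothetical. It is not the production estimation code.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic sales (land size in m2, period, price). Period 1 prices are
# generated 0.02 higher in log terms, so the fit should recover lambda = 0.02.
sales = []
for land in (400.0, 550.0, 700.0, 900.0):
    for period in (0, 1):
        log_price = 10.0 + 0.5 * math.log(land) + 0.02 * period
        sales.append((land, period, math.exp(log_price)))

# Design matrix columns: intercept c_0, f_1(x) = log(land size), time dummy T_1.
rows = [[1.0, math.log(land), float(period)] for land, period, _ in sales]
y = [math.log(price) for _, _, price in sales]
c0, c1, lam = ols(rows, y)
print(round(lam, 6))
```

Repeating this fit on each consecutive pair of periods gives one independent estimate of $\lambda$ per pair, which is what allows the other coefficients to vary from period to period.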

The index (log) values are calculated inductively from a pre-defined base value $I_0$:

$$I_k = I_{k-1} + E[\lambda_k] = I_{k-1} + \hat{\lambda}_k \tag{2.2}$$
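The inductive construction in (2.2) is a running sum of the estimated increments starting from the base value. A minimal sketch, with a hypothetical function name and made-up lambda estimates:

```python
from itertools import accumulate


def chain_log_index(base_value, lambda_estimates):
    """Return [I_0, I_1, ..., I_n] where I_k = I_{k-1} + lambda_k.

    base_value       -- the pre-defined base (log) index value I_0
    lambda_estimates -- estimated increments lambda_1, ..., lambda_n
    """
    return list(accumulate(lambda_estimates, initial=base_value))


# Illustrative values only: three adjacent period estimates.
log_index = chain_log_index(6.0, [0.010, -0.005, 0.020])
print(log_index)
```

Because the index is in log units, the difference between any two entries is the log return over that interval.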

The coefficients $c_j$, $s_j$ and the transformations $f_j$ vary each period, allowing for changing marginal contributions to price of each hedonic attribute over time.

The adjacent period hedonic index thus obtained represents log returns on a capital gains index. The average return on the hedonic property index over a period $[t_0, t]$ is therefore an estimate of the average return on a diversified property portfolio over $[t_0, t]$.

Various functional forms of the land and floor size transformations have been tested, with the best, i.e. the ones which minimise the standard deviation of the residual error $\varepsilon_i$, found to be $f(x) = \log x$. Additionally, where the variables previous sale price or a mean or median of sale prices of neighbours are used as hedonic inputs, these are transformed to log values for the same reason.

The set of objectively observable attributes $X_1, \ldots, X_m$ in the adjacent period hedonic index are:

Suburb

Land size (m2) if property type = house

Street classification (highway, main road, cul de sac etc.)

Property type (e.g. free standing house, semi etc. for houses)

Number of bedrooms

Number of bathrooms

Number of car spaces

Pool (Y/N)

Waterfront (Y/N)

Air Conditioning (Y/N)

View (Y/N)

Notice that each of the hedonic variables after land size has a discrete domain, with many binary. The remaining transformations are determined independently for each period by using dummy variables for the range of values of the corresponding hedonic variables.

Such discrete hedonic variables are fit as categorical variables, so the transformations of these variables are determined as tabulated discrete valued functions during the model fit.
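As an illustration of fitting a discrete variable as a categorical one, the sketch below expands a variable such as number of bedrooms into dummy columns, one per observed level. The function name `dummy_encode` is hypothetical, and dropping the lowest level as a baseline (so the columns are not collinear with the intercept $c_0$) is a common convention assumed here rather than something stated in the paper.

```python
def dummy_encode(values):
    """Expand a discrete variable into 0/1 dummy columns.

    Returns (levels, rows): `levels` lists the non-baseline levels and
    `rows` holds one list of dummies per observation. The smallest level
    is used as the baseline and gets no column (an assumed convention).
    """
    levels = sorted(set(values))[1:]
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows


# Illustrative data: bedroom counts for four sold properties.
bedroom_levels, bedroom_dummies = dummy_encode([2, 3, 3, 4])
print(bedroom_levels)
print(bedroom_dummies)
```

The fitted coefficient on each dummy column is then the tabulated value of the transformation at that level, relative to the baseline.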

2.2.2 Index Regions

Distinct adjacent period hedonic indices are calculated for each property type (house / unit) and SSD (see §1.2.1), with the exception that some SSDs are amalgamated due to small numbers of sales and resulting index instability. These are:

Houses: {80505, 80510, 80540}, {80515, 80520, 80525, 80535}

Units: {20535, 20540}, {20560, 20580}, {30520, 30540}, {30525, 30530}, {70505, 70510}, {80505, 80510, 80540}, {80515, 80520, 80525, 80535}

2.3 Smoothing Indices

Often economic time series and indices show an unrealistic amount of period to period variation. A significant proportion of the volatility of such series comes from compositional biases and / or parameter estimation errors, sometimes more than due to actual market movements.

Time series in general can be represented as the sum of seasonal, noise and local mean components. The local mean is often referred to as the “signal”. It is the mean of the index in a neighbourhood of the current time. The random increments in the noise component need not be independent, however it is often a first step in the analysis of these time series to assume they are.

Specifically, we represent the series as

$$X(t) = \bar{X}(t) + S(t - ks) + \varepsilon(t) \tag{2.3}$$

where

$\varepsilon(t)$ is a zero mean error term with $E[\varepsilon(t)\,\varepsilon(u)] = 0$ for $t \neq u$.

$S : [0, s] \to \mathbb{R}$ is a periodic, seasonal function and $ks \le t < (k + 1)s$.

$\bar{X}(t) = E\left[X(t) - S(t - ks) \mid \Im_t\right]$ is the local mean or signal.

The decomposition index = signal + seasonal + noise is commonly achieved via a state space model [PA]. It is a more effective method for extracting the underlying signal than simply calculating a moving average over the seasonal period.
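The paper does not spell out the state space form used, so the following is only a minimal sketch under a strong simplifying assumption: a local level model with no seasonal term, filtered with a scalar Kalman filter. The function name and the variance parameters `q` and `r` are hypothetical tuning values, not figures from the index methodology.

```python
def local_level_filter(series, q=0.0001, r=0.001):
    """Kalman filter for the local level model (assumed form):

        y_t  = mu_t + eps_t,      Var(eps_t) = r   (observation noise)
        mu_t = mu_{t-1} + eta_t,  Var(eta_t) = q   (signal innovation)

    Returns the filtered estimates of the signal mu_t.
    """
    level = series[0]   # initial signal estimate
    variance = r        # initial uncertainty of that estimate
    filtered = []
    for observation in series:
        variance += q                            # predict step
        gain = variance / (variance + r)         # Kalman gain, between 0 and 1
        level += gain * (observation - level)    # move towards the observation
        variance *= (1.0 - gain)                 # update uncertainty
        filtered.append(level)
    return filtered


# Illustrative noisy log index values (made up).
noisy = [6.00, 6.05, 5.98, 6.07, 6.01, 6.09, 6.03, 6.11]
signal = local_level_filter(noisy)
print([round(v, 4) for v in signal])
```

A larger ratio `q / r` makes the filtered series follow the raw index more closely; a smaller ratio smooths more heavily. A full treatment would add a seasonal state for $S$ and use a smoother rather than a one-sided filter.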

In the case of the property indices and economic time series used in our model, the noise components are usually significantly coloured ie. show strong temporal dependence. We are aware that this contradicts the assumptions in (2.3), however our purpose is to provide a noise-reduced version of the adjacent period hedonic index in §2.2 to act as a benchmark or numeraire for individual property prices (see §2.4). Thus, our numeraire may tolerate some deviation from serial independence of the errors in pursuit of practicality.

2.4 The Adjacent Period Hedonic Index as Numeraire

2.4.1 Continuity Issues

In §2.5, we will be fitting an individual property pricing model to historical sales information. These sales occur at different times and therefore must be benchmarked (inflated or deflated) to a common time. We achieve this via adding the increment in the adjacent period hedonic (log price) index. We recalculate the benchmark index each day as new sales data arrives.
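The benchmarking step amounts to adding a difference of log index values to each log sale price. A minimal sketch, in which the function name, the day numbering and the index values are all hypothetical:

```python
import math


def benchmark_log_price(sale_price, sale_day, target_day, log_index):
    """Inflate or deflate a sale to the target day.

    log_index maps a day number to the benchmark (log price) index value.
    The benchmarked log price is the observed log price plus the increment
    in the log index between the sale day and the target day.
    """
    increment = log_index[target_day] - log_index[sale_day]
    return math.log(sale_price) + increment


# Illustrative values: the log index rose by 0.05 between day 0 and day 100.
example_index = {0: 6.00, 100: 6.05}
benchmarked = benchmark_log_price(500_000.0, 0, 100, example_index)
print(round(math.exp(benchmarked), 2))
```

A sale after the target day is deflated by the same formula, since the increment is then negative when the index has risen.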

Specifically, each day we will calculate raw and smoothed adjacent period hedonic indices, with values for each day $t$. These will be used to create a benchmark index $B(t)$ for the valuation of individual properties in §2.5.

Suppose that on day $T$ we generate the benchmarking index $B_T(t)$.

This is actually a function of 2 variables: $t, T$. We want it to behave as if it is daily snapshots of a continuous function in both.

That is, both $B_T(t + 1) - B_T(t)$ and $B_{T+1}(t) - B_T(t)$ should not be too large.

We also want to eliminate periodicity and jumps due to the index calculation method.

For example, if we calculate a monthly hedonic benchmark index, then interpolate to daily values, we will add a new data point on a set day each month. This will cause a noticeable jump in the index, which is clearly not market driven.

Suppose instead we calculate the monthly benchmark index backward from the current date. This index will then cycle every month, producing a periodic effect which is again clearly not market driven.

The solution employed for the ASX indices is to construct the benchmark indices inductively from daily increments, which are themselves averages of increments $\lambda_k$ via (2.1) over pairs of adjacent 28 day periods, i.e. $n = 1$ in (2.1).

We define each “month” as $M = 28$ days. This is so market driven weekly effects such as those from weekend sales are not obscured. For brevity, write $U_t$ for the monthly period $[t - M + 1, t]$ and suppose we are currently at time $T$.

The idea is that we can form pairs of adjacent periods, with each set of $2M$ days partitioned into a pair of $M$ days each: $\{U_M, U_{2M}\}, \{U_{M+1}, U_{2M+1}\}, \ldots, \{U_{T-M}, U_T\}$. For each $t = 2M, 2M + 1, \ldots, T$, we then calculate $\lambda$ via (2.1) and a 2 period model.

Define $\lambda_t$ as the lambda obtained from this calculation on the periods $\{U_{t-M}, U_t\}$. Although these are “monthly” returns, each day is the intersection of $M$ monthly periods, so that we can estimate daily $\lambda$ values as the average of the corresponding monthly ones. This will be explained fully in Section 2.4.3 below.
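The construction of the overlapping period pairs can be sketched as follows. Days are numbered from 1, and the function names `period` and `adjacent_pairs` are hypothetical; the sketch only enumerates the day ranges and does not fit any regression.

```python
M = 28  # length of a "month" in days, as defined in the text


def period(t, m=M):
    """Days in the monthly period U_t = [t - m + 1, t]."""
    return range(t - m + 1, t + 1)


def adjacent_pairs(current_day, m=M):
    """All pairs {U_{t-m}, U_t} for t = 2m, 2m + 1, ..., current_day."""
    return [(period(t - m, m), period(t, m)) for t in range(2 * m, current_day + 1)]


# Illustrative: with T = 60 there are five pairs, for t = 56, ..., 60.
pairs = adjacent_pairs(60)
first_earlier, first_later = pairs[0]
print(len(pairs), first_earlier[0], first_earlier[-1], first_later[0], first_later[-1])
```

Each pair covers $2M$ consecutive days, and successive pairs are shifted by a single day, which is why every day ends up inside $M$ different second periods.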

If we calculate daily increments as averages of overlapping monthly periods for the entire history, we will have a runtime problem. However, it is not necessary to calculate daily increments for the entire period; merely for the final part after a fixed time point. Monthly indices can be used prior to this, since their time points do not change and any change in the index values is due to old sales data entering the database, which can happen, but is very small in volume compared to recent sales.

2.4.2 Monthly Part of Series

Pick a fixed time point $t^* = n^* M < T$. This time point must be fixed in order to avoid periodicity in the index. In practice, the gap $T - t^*$ should probably be at least 1 year; only because of the delayed arrival process of the sales data, otherwise it could be a few months.

Form the periods $U_M, U_{2M}, \ldots, U_{n^* M}$. These will give us the monthly hedonic index.

Recall that we define $\lambda_t$ as the lambda obtained from the 2 period hedonic regression in (2.1) on the periods $\{U_{t-M}, U_t\}$.

Calculate a set of monthly benchmark values $B_T(M), B_T(2M), B_T(3M), \ldots, B_T(n^* M)$ from the set of monthly lambdas via

$$\lambda_{(k+1)M} = B_T\big((k + 1)M\big) - B_T(kM) \tag{2.4}$$

The base value $B_T(M)$ is independently calculated, as the mean of log sales in $[1, M]$.

The monthly series is then smoothed as per §2.3, then cubic spline interpolated to a daily index.

2.4.3 Daily Part of Series

Now we’ll calculate the daily incremented part of the series:

Form the period $U_t$ for each day $t = t^* - M + 1, \ldots, T$.

For each pair $\{U_{t-M}, U_t\}$, $t > t^*$, calculate the average price growth $\lambda_t$ via (2.1).

We have generated a set of monthly lambdas, each 1 day apart.

Each day, calculate sets of monthly lambdas from overlapping adjacent periods, each 1 day apart. The daily lambdas will then be averages of all the monthly lambdas for periods containing the given day. That is, these intersecting periods will all be the 2nd period in an adjacent period calculation.

Day $t$ is the intersection of the monthly periods: $t = U_t \cap \ldots \cap U_{t+M-1}$.

The daily lambda for day $t$ is then

$$\Lambda_t = \frac{1}{M^2} \sum_{k=t}^{t+M-1} \lambda_k \tag{2.5}$$

We then compute each daily index value inductively post $t = t^*$ as

$$B(t) = B(t - 1) + \Lambda_t \tag{2.6}$$

Since we’re calculating using all pairs of adjacent periods each day, we don’t get any periodicity: the value of $\Lambda_t$ will not have a monthly periodic component.
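The daily step can be sketched as below. The sketch assumes a particular reading of (2.5): the daily lambda is the mean of the $M$ monthly lambdas whose second period contains day $t$, divided by $M$ again to convert a 28 day return into a per day increment, i.e. a factor of $1/M^2$ on the sum. That reading, the function names and the lambda values are assumptions for illustration.

```python
def daily_increment(monthly_lambda, t, m=28):
    """Daily lambda for day t under the assumed reading of (2.5).

    monthly_lambda maps a day k to the lambda estimated on {U_{k-m}, U_k}.
    Day t lies in the second period U_k for k = t, ..., t + m - 1.
    """
    window = [monthly_lambda[k] for k in range(t, t + m)]
    return sum(window) / (m * m)


def extend_benchmark(start_value, monthly_lambda, first_day, last_day, m=28):
    """Apply (2.6): B(t) = B(t - 1) + daily lambda, for first_day..last_day."""
    values = {first_day - 1: start_value}
    for t in range(first_day, last_day + 1):
        values[t] = values[t - 1] + daily_increment(monthly_lambda, t, m)
    return values


# Illustrative: a constant 28 day log return of 0.028 is 0.001 per day,
# so ten daily steps should raise the benchmark by 0.01.
flat_lambdas = {k: 0.028 for k in range(100, 138)}
benchmark = extend_benchmark(6.0, flat_lambdas, 100, 109)
print(round(benchmark[109], 6))
```

With constant monthly lambdas the daily benchmark grows linearly, which is a quick sanity check that the scaling in the assumed form of (2.5) is consistent with (2.6).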

Further, there is no longer any reason why we need to stick to 28 days: we could use fixed periods of any length, although multiples of 7 days would not obscure genuine weekly market effects. 28 days is the period chosen in our method.

We stop calculating the daily part of the index at $T - 140$, for reasons which are explained in the next section.

2.4.4 Most Recent Part of Series

Since there is a delay of a few days to a few months between a property sale (exchange of contracts) and the reporting of the sale to the database, a time point prior to the current date $T$ is reached after which there are insufficient sales data to calculate statistically significant values of $\lambda_t$ via the 2 adjacent period method.

We counter this problem by pooling the most recent 168 days of sales data into a single period and fitting a modified version of (2.1):

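$$\log P_i = c_0 + \sum_{j=1}^{m} c_j f_j(x_j) + \sum_{j=1}^{\nu} s_j S_j + \sum_{k=1}^{5} \lambda'_k T_k + \varepsilon_i \tag{2.7}$$

where each sale $P_i$ occurs during the time interval $(T - 168, T]$.

Because of the pooling, $\lambda'_k$ represents the change in log values from the base date $T - 168$ to month $k$ of the 6 month period. To obtain the monthly increments $\lambda_k$ to match with the form of equation (2.2), we subtract and also make a dampening adjustment:

$$
\begin{aligned}
\lambda_{N-4} &= \lambda'_1 \, \min\left(1, \sigma_0^2 / \sigma_1^2\right) \\
\lambda_{N-3} &= (\lambda'_2 - \lambda'_1) \, \min\left(1, \sigma_0^2 / \sigma_2^2\right) \\
\lambda_{N-2} &= (\lambda'_3 - \lambda'_2) \, \min\left(1, \sigma_0^2 / \sigma_3^2\right) \\
\lambda_{N-1} &= (\lambda'_4 - \lambda'_3) \, \min\left(1, \sigma_0^2 / \sigma_4^2\right) \\
\lambda_{N} &= (\lambda'_5 - \lambda'_4) \, \min\left(1, \sigma_0^2 / \sigma_5^2\right)
\end{aligned}
\tag{2.8}
$$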

where $\lambda_N$ is the estimated index increment from $T - 28$ to $T$, $\sigma_k$, $k > 1$, is the standard error of the estimate of $\lambda'_k - \lambda'_{k-1}$ and $\sigma_0 = 0.02$ is a baseline standard error obtained from estimating lambda values on large data sets.
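The subtraction and dampening in (2.8) can be sketched as a single loop. The function name and the input values are hypothetical, and the sketch assumes that $\sigma_1$ is the standard error of $\lambda'_1$ itself (equivalently, that $\lambda'_0 = 0$ at the base date); only $\sigma_0 = 0.02$ comes from the text.

```python
def dampened_increments(pooled_lambdas, standard_errors, sigma0=0.02):
    """Monthly increments from the pooled regression, dampened as in (2.8).

    pooled_lambdas  -- cumulative estimates lambda'_1, ..., lambda'_5
    standard_errors -- sigma_1, ..., sigma_5 for the successive differences
    sigma0          -- baseline standard error (0.02 in the text)
    """
    increments = []
    previous = 0.0  # assumed: lambda'_0 = 0 at the base date
    for pooled, sigma in zip(pooled_lambdas, standard_errors):
        weight = min(1.0, sigma0 ** 2 / sigma ** 2)  # 1 when sigma <= sigma0
        increments.append((pooled - previous) * weight)
        previous = pooled
    return increments


# Illustrative inputs: the third and fifth differences are poorly estimated.
example = dampened_increments(
    [0.010, 0.015, 0.030, 0.030, 0.050],
    [0.01, 0.02, 0.04, 0.02, 0.08],
)
print([round(v, 5) for v in example])
```

An increment estimated with a standard error at or below the baseline passes through unchanged, while one with twice the baseline standard error is scaled by one quarter.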

Additionally, we pool more SSDs to obtain larger data sets and ensure statistically significant results. The SSD pooling for the final period regression is:

Houses: {20575, 20585}, {30501, 30503}, {30520, 30540, 30545}, {30525, 30530, 30550}, {50505, 50515}, {80505, 80510, 80540}, {80515, 80520, 80525, 80535}

Units: {10530, 10545, 10553}, {20520, 20535, 20540}, {20525, 20530}, {20545, 20565}, {20560, 20580}, {20575, 20585}, {30501, 30503, 30509, 30511}, {30507, 30520, 30540, 30545}, {30525, 30530, 30550}, {50505, 50515}, {50510, 50520, 50525}, {70505, 70510}, {80505, 80510, 80540}, {80515, 80520, 80525, 80535}

The pooling beyond that listed in §2.2.2 is solely due to the small data sets of recent sales. The loss of accuracy from using less localised indices is far smaller than, and far preferable to, the unrealistic volatility caused by small sample sizes.

The downside of having the constants $c_i$ in (2.7) fixed over the entire last 6 months of the index calculation (thus for example the marginal value of an additional bedroom or bathroom is fixed) is more than offset by the ability to calculate statistically significant index increments over the most recent period.

The reason for the dampening of the lambda estimates in (2.8) is that they are usually estimated on small data sets and so can return unrealistically large estimates (albeit with large standard errors). If we were to set $\lambda = 0$ for a standard error above an acceptable threshold, we would find unacceptable jumps $B_T(t) - B_{T-1}(t)$ as new data arrived on day $T$ and the standard error dropped inside the threshold.

Note that the effect of the dampening formula is to make the $\lambda_k$ very close to zero if there are almost no sales data and gradually increase the values of the $\lambda_k$ until the standard error is sufficiently small that we are confident in the estimate. This scheme avoids unrealistic, data driven volatility in the current part of the benchmark index which would then translate into volatility in the daily indices themselves.

Since we have the value of $B_T(T-140)$ from the daily calculation, the values obtained in (2.8) give us the values $B_T(T-112)$, $B_T(T-84)$, $B_T(T-56)$, $B_T(T-28)$, $B_T(T)$.

The final benchmark index over the period $[T-140, T]$ is obtained by a cubic spline interpolation of the values at $T-140, T-112, \ldots, T$.
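The dampening in (2.8) and the accumulation of the resulting increments can be sketched as below. This is a minimal illustration, not the production calculation: the function names are hypothetical, the benchmark is assumed to be a log index (so increments add, consistent with its use in (2.9)), and the final cubic spline interpolation between the knots is not shown.

```python
def dampened_increments(lam_prime, sigma, sigma0=0.02):
    """Apply (2.8): difference the pooled estimates and shrink noisy ones.

    lam_prime -- pooled estimates [lambda'_1, ..., lambda'_5] from (2.7)
    sigma     -- standard errors [sigma_1, ..., sigma_5]
    sigma0    -- baseline standard error (0.02 in the text)
    """
    increments = []
    previous = 0.0  # the base period has lambda' = 0 by construction
    for lam, s in zip(lam_prime, sigma):
        damp = min(1.0, sigma0 ** 2 / s ** 2)  # factor in [0, 1]
        increments.append((lam - previous) * damp)
        previous = lam
    return increments


def benchmark_knots(b_start, increments):
    """Cumulate increments from B_T(T-140) to get B_T at T-112, ..., T."""
    knots = [b_start]
    for inc in increments:
        knots.append(knots[-1] + inc)
    return knots


# Hypothetical inputs: the third and fifth estimates have large standard errors.
inc = dampened_increments([0.01, 0.03, 0.02, 0.05, 0.06],
                          [0.01, 0.02, 0.04, 0.02, 0.20])
knots = benchmark_knots(1.0, inc)
```

In this example the fifth raw increment of 0.01 has a standard error ten times the baseline, so it is scaled by 0.01 and contributes almost nothing until more sales arrive.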


2.4.5 Benchmarking Historical Sales

Given the above, we may view the benchmark index $B_T(t)$ on any calculation day $T$ as comprising:

1. A smoothed monthly hedonic index over the fixed period $t \in [0, t^*]$, cubic spline interpolated to daily values.

2. A daily index over the period $t \in (t^*, T-140]$, calculated via (2.5) and (2.6).

3. An index over the final, fixed period of $t \in (T-140, T]$, calculated via §2.4.4.

If the current time is $T$, given a property sale price $P_i$ at time $t_i$, the benchmarked log sale price is

$$\tilde{p}_i(T) = \log P_i + B_T(T) - B_T(t_i) \qquad (2.9)$$
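As a small worked illustration of (2.9), the sketch below rolls a historical sale forward to the calculation day. The function and argument names are hypothetical, and the benchmark values are made-up numbers on a log scale.

```python
import math


def benchmarked_log_price(sale_price, b_now, b_at_sale):
    """Equation (2.9): benchmark a historical sale to calculation day T.

    sale_price -- observed sale price P_i
    b_now      -- B_T(T), benchmark index value on the calculation day
    b_at_sale  -- B_T(t_i), benchmark index value on the sale date
    """
    return math.log(sale_price) + b_now - b_at_sale


# Hypothetical sale: $500,000 when the benchmark stood at 0.10; it is now 0.30.
p_tilde = benchmarked_log_price(500_000, b_now=0.30, b_at_sale=0.10)
implied_price_today = math.exp(p_tilde)  # 500,000 * exp(0.20)
```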

2.5 Pricing of Individual Properties

2.5.1 Hedonic Pricing Function

As discussed in §2.1, the pricing of each property in the population is a 5 step process. Recall that a different model is fit for each property type (house/unit) and each region (same general mathematical structure for all models, but different parameter values).

We will refer to the method described in this section as the hedonic imputation method.

Thus in what follows, we will assume all properties are of the same type (house/unit) and in the same geographic region:

Each observed property sale at price $P_i$ and time $t_i$ will have hedonic attribute values $\tilde{x}_i$, where $\tilde{x}$ is an evaluation of $\tilde{X} = (X_1, \ldots, X_n)$. Note that some elements of $\tilde{x}_i$ may be NULL if there is missing data, for example if the number of bathrooms is not known.

Suppose we are currently at time $T$.

1. Benchmark all observed log sale prices by the relative benchmark index value (2.9), as discussed in §2.4.

2. Fit a generalized additive model (GAM) $\Psi : (X_1, \ldots, X_n) \mapsto \tilde{p}$ which maps observed hedonic attribute data to the benchmarked log prices.

The set of hedonic observables in the GAM are those in the adjacent period hedonic index plus:

Fraction of year of observed sale (to cater for stripping the seasonality from the benchmark indices in §2.4). For example, if the sale occurred on 17/05/2004, this variable has value 138/366 = 0.3770.


Log of the most recent previous sale price of the property, benchmarked by the relative hedonic index.

Log sale prices of up to 30 nearest neighbours, benchmarked by the relative hedonic index.

Latitude and longitude of property.

The neighbours’ sale prices enter the model as a single value, which is a weighted average as discussed in the following section.

Note that not all properties have records of previous sales, so this hedonic variable can be NULL. Properties with no previous sale still enter the model by nesting this variable inside a dummy expressing the existence of a previous sale.

The fraction of year input $\varphi$ is fit to a smooth (i.e. $C^2$) function $h(\varphi)$ via a basis of cubic splines. Note that $h(\varphi)$ in effect measures the relative phase of the sale date and the current date. For a model fit on 14 Aug, the value of $h(\varphi)$ for a sale occurring on 17 May would be approximately ¼ period out of phase with the value of $h(\varphi)$ obtained for the same sale in the model fit on 14 Nov.
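The fraction-of-year variable can be computed directly from the sale date. The sketch below reproduces the 17/05/2004 example given earlier; the function name is hypothetical, and it assumes the day count is divided by the actual number of days in the sale year (366 in a leap year), as in that example.

```python
import calendar
from datetime import date


def fraction_of_year(sale_date):
    """Day of year divided by the number of days in that year."""
    days_in_year = 366 if calendar.isleap(sale_date.year) else 365
    return sale_date.timetuple().tm_yday / days_in_year


phi = fraction_of_year(date(2004, 5, 17))  # day 138 of a 366 day year

# The two model fit dates in the text are about a quarter of a year apart.
gap = fraction_of_year(date(2011, 11, 14)) - fraction_of_year(date(2011, 8, 14))
```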

Conceptually, the GAM is then:

$$\begin{aligned}
\tilde{p}_i &= \sum_{j=1}^{m} \left( c_{j0} S_j + \sum_{k=1}^{n} c_{jk} S_j f_k(x_k) \right) + h(\varphi) + g(u_i, v_i) + \varepsilon_i \\
&= \hat{p}_i + \varepsilon_i
\end{aligned} \qquad (2.10)$$

where:

The $f_k$ are transformations of the hedonic variables.

The $c_{jk}$ are numerical coefficients.

The $S_j$ are dummy variables with $S_j = 1$ if property $i$ is in suburb $j$.

$h(\varphi)$ is the phase function for the time of year.

$g(u_i, v_i)$ is a 2-D cubic spline function of the latitude and longitude $(u_i, v_i)$.

$\varepsilon_i = \tilde{p}_i - \hat{p}_i$ is the (zero mean) residual error term, which we assume is normally distributed $N(0, \varsigma_i)$.

Given a particular property with observed hedonic attributes $(x_1, \ldots, x_n, \varphi, u, v)$, obtain the current log price estimate from the GAM via (2.10). If the target property has some attributes $x_i = \text{NULL}$, these are imputed as described in §2.5.2.


3. The current price estimate is then

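$$\begin{aligned}
\hat{P}_i(T) &= E\left[\exp\{\tilde{p}_i\}\right] \\
&= \exp\left\{\hat{p}_i(T) + \varsigma_i^2(T)/2\right\} \\
&= Q_i(T) \exp\left\{\varsigma_i^2(T)/2\right\}
\end{aligned} \qquad (2.11)$$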
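The step from a log price estimate to a price estimate in (2.11) uses the mean of a lognormal variable. A minimal sketch, with hypothetical names and made-up inputs:

```python
import math


def price_estimate(p_hat, resid_sd):
    """Equation (2.11): expected price given a fitted log price.

    p_hat    -- fitted log price from the GAM (2.10)
    resid_sd -- standard deviation of the residual for this property
    """
    q = math.exp(p_hat)                        # Q_i(T)
    return q * math.exp(resid_sd ** 2 / 2.0)   # convexity adjustment


# Hypothetical property: fitted log price of log(500,000), residual s.d. 0.2.
naive = price_estimate(math.log(500_000), 0.0)
adjusted = price_estimate(math.log(500_000), 0.2)
```

With a residual standard deviation of 0.2 the adjustment factor is exp(0.02), roughly a 2% uplift over the naive exponentiated value.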

Steps 4 and 5 above are applied to all properties in the relevant population to obtain the full list of property values each day.

2.5.2 Imputation of Hedonic Variables

It is often the case that historical sales data have missing values for some hedonic attributes. For example, the record may have a NULL value for the number of bedrooms or number of car spaces.

Where multiple sales records of the same property exist, with a particular hedonic attribute having a NULL value in some records, but a genuine value in others, we use a forward / back filling procedure to replace the NULL values. The rule is:

Forward fill first: If a NULL value is preceded (i.e. has an earlier sale date) by a non-NULL value $x$, set all NULL values to $x$ until the next non-NULL value is reached.

Back fill next: If the earliest sale record for that property has a NULL attribute, then all NULL values are set to the earliest non-NULL value.

Examples:

i. Suppose there are 4 sales records for a given property, with the #bedrooms attribute values 3, NULL, NULL, 4. Since forward filling takes precedence, the values are imputed to 3, 3, 3, 4.

ii. If the records are 3, NULL, NULL, NULL, the values are imputed to 3, 3, 3, 3.

iii. If the records are NULL, NULL, NULL, 4, the values are imputed to 4, 4, 4, 4.

iv. If the records are 3, NULL, 4, NULL, the values are imputed to 3, 3, 4, 4.
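The forward / back filling rule can be sketched as follows. The function name is hypothetical, NULL is represented by Python's `None`, and the input is assumed to be one attribute's values for a single property, ordered by sale date.

```python
def fill_attribute(values):
    """Forward fill, then back fill, one attribute across a property's sales.

    values -- attribute values ordered from earliest to latest sale,
              with None standing in for NULL.
    """
    filled = list(values)

    # Forward fill: carry the last known value onto later NULLs.
    last_known = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last_known
        else:
            last_known = value

    # Back fill: any NULLs left are leading ones; use the earliest known value.
    earliest_known = next((v for v in filled if v is not None), None)
    return [earliest_known if v is None else v for v in filled]


examples = [
    fill_attribute([3, None, None, 4]),     # example i
    fill_attribute([3, None, None, None]),  # example ii
    fill_attribute([None, None, None, 4]),  # example iii
    fill_attribute([3, None, 4, None]),     # example iv
]
```

If every record is NULL the list is returned unchanged, which matches the case where the rules cannot be applied.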

In the case where an attribute value is NULL for all sales records of a property, we cannot apply the above rules and so the attribute values remain NULL. When fitting the hedonic model (2.10), if NULL values are encountered for the attributes land size or #bedrooms, the sales record is dropped from the data.

If #bathrooms = NULL, the value defaults to 1.

If #car spaces = NULL, the value defaults to 0.

All NULL binary attributes are set to 0.

If NULL, property code is set to “free standing house” for a house and “low rise unit” for units. Street code is set to “suburban street”.
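The default rules used when fitting the model could be applied to a single sales record as in the sketch below. The field names (`land_size`, `bedrooms`, `bathrooms`, `car_spaces`, `property_code`, `street_code`) and the dictionary representation are assumptions made for illustration only.

```python
REQUIRED_FIELDS = ("land_size", "bedrooms")
SIMPLE_DEFAULTS = {"bathrooms": 1, "car_spaces": 0,
                   "street_code": "suburban street"}
PROPERTY_CODE_DEFAULT = {"house": "free standing house",
                         "unit": "low rise unit"}


def prepare_record(record, property_type, binary_fields=()):
    """Return a cleaned copy of a sales record, or None if it is dropped.

    record        -- dict of attribute name to value, None for NULL
    property_type -- "house" or "unit"
    binary_fields -- names of the 0/1 attributes in the record
    """
    # Records missing land size or bedrooms are dropped from the fit.
    if any(record.get(name) is None for name in REQUIRED_FIELDS):
        return None

    cleaned = dict(record)
    for name, default in SIMPLE_DEFAULTS.items():
        if cleaned.get(name) is None:
            cleaned[name] = default
    for name in binary_fields:
        if cleaned.get(name) is None:
            cleaned[name] = 0
    if cleaned.get("property_code") is None:
        cleaned["property_code"] = PROPERTY_CODE_DEFAULT[property_type]
    return cleaned


raw = {"land_size": 450, "bedrooms": 3, "bathrooms": None,
       "car_spaces": None, "property_code": None, "street_code": None,
       "pool": None}
cleaned = prepare_record(raw, "house", binary_fields=("pool",))
```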


When the hedonic model is then evaluated on the current portfolio of properties, the above imputation rules are applied, with the addition that hedonic attributes which are NULL for all sales records for a given property are imputed from suburb modes for all attributes except land size, where suburb median is used.

2.5.3 Neighbours Calculation

Nearest neighbours are taken to be those properties closest in distance with the same number of bedrooms. Recall that all records where the number of bedrooms in the observed sale property is not known, and cannot be imputed by the forward / back filling rule, have been dropped from the data set.

For each target property (i.e. the property to be valued), we choose a list of at most 31 sales of neighbours and find their benchmarked log sale prices via (2.9). Our hedonic input will be a single, positive number which is a weighted average of the adjusted, benchmarked neighbours’ log sale prices.

The adjustments are to cater for differing property sizes and types. For example, suppose the target property is a semi on 200 m² of land. If the next door neighbour is a free standing house on 300 m² which sold 4 months ago, the price information has greater explanatory power if it is adjusted for the differences in land size and property type.

Additionally, we give more weight to more recent neighbouring sales and those whose adjusted, benchmarked prices are closer to the median (of the list of neighbours).

Adjust the Neighbours’ Prices

Separate models (2.10) are fit by property type and region. Models fit at different times will lead to different coefficients. For example, the coefficient of log(landsize) will vary with each daily model fit. For our neighbours’ prices adjustment, we use a fixed coefficient value for each property type and region. These are averages of those obtained from a set of historical model fits.

Let $l$ be the coefficient of $L = \log(\text{land size})$ and $c_1, \ldots, c_k$ be the property code coefficients.

Denote the target property by the subscript 0 and the neighbour by the subscript 1.

Initially, we will receive the neighbour’s benchmarked log price $\tilde{N}_1$.

For each neighbour, we make the adjustment

$$\log \tilde{N}_1^* = \log \tilde{N}_1 + l \, (L_0 - L_1) + c_0 - c_1 \qquad (2.12)$$
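A sketch of the adjustment (2.12) for the semi versus free standing house example is given below. It works directly with the benchmarked log price. The coefficient values are invented for illustration (the actual fixed coefficients are not given in this paper), and the function name is hypothetical.

```python
import math


def adjust_neighbour_log_price(nbr_log_price, target_log_land, nbr_log_land,
                               land_coef, target_code_coef, nbr_code_coef):
    """Adjust a neighbour's benchmarked log price towards the target property.

    land_coef        -- fixed coefficient l of log(land size)
    target_code_coef -- property code coefficient c_0 of the target
    nbr_code_coef    -- property code coefficient c_1 of the neighbour
    """
    return (nbr_log_price
            + land_coef * (target_log_land - nbr_log_land)
            + target_code_coef - nbr_code_coef)


# Target: semi on 200 m2. Neighbour: free standing house on 300 m2.
# Assumed coefficients: l = 0.3, semi = -0.05, free standing house = 0.0.
nbr_log_price = math.log(600_000)
adjusted = adjust_neighbour_log_price(
    nbr_log_price, math.log(200), math.log(300),
    land_coef=0.3, target_code_coef=-0.05, nbr_code_coef=0.0)
```

Under these assumed coefficients the neighbour's price is adjusted downward, since the target has less land and a property type with a lower coefficient.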

Weight for the Mean of Neighbours Formula

There are two weights which are used:

1. $w_a$, a function of the age $A$ of the neighbour’s sale in years.

2. $w_r$, a function of the rank $R$ of the adjusted, benchmarked neighbour’s log sale price.


In general, our final neighbours input is

$$X_{nbr} = \frac{\sum w_a w_r \tilde{N}^*}{\sum w_a w_r} \qquad (2.13)$$

where $\tilde{N}^*$ is defined in (2.12).

Assuming there are n neighbours, we have:

$$w_a = 1 - \max\left(\min(A, 10) - 1,\; 0\right) / 9 \qquad (2.14)$$

so that there is a linear decrease in weighting for age from 1 to 10 years.

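We exclude the top and bottom 3 neighbours (if there are the full 31). We use a triangular weighting for the others ($\lfloor x \rfloor$ is the floor of $x$):

$$w_r = \begin{cases}
0 & \text{if } R \le \lfloor n/10 \rfloor \\
R & \text{if } \lfloor n/10 \rfloor < R \le \lfloor n/2 \rfloor \\
n + 1 - R & \text{if } \lfloor n/2 \rfloor < R \le n - \lfloor n/10 \rfloor \\
0 & \text{if } R > n - \lfloor n/10 \rfloor
\end{cases} \qquad (2.15)$$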
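The weights (2.14) and (2.15) and the weighted mean (2.13) can be sketched as below. The function names are hypothetical, and the sketch assumes that rank 1 is the lowest adjusted price and that the inputs are already adjusted, benchmarked log prices.

```python
def age_weight(age_years):
    """Equation (2.14): 1 up to one year, falling linearly to 0 at ten years."""
    return 1.0 - max(min(age_years, 10.0) - 1.0, 0.0) / 9.0


def rank_weight(rank, n):
    """Equation (2.15): triangular weight peaking near the median rank."""
    cut = n // 10                      # floor(n / 10) ranks trimmed per tail
    if rank <= cut or rank > n - cut:
        return 0
    if rank <= n // 2:
        return rank
    return n + 1 - rank


def neighbours_input(neighbours):
    """Equation (2.13) for a list of (adjusted_log_price, age_years) pairs.

    Returns None when every neighbour has zero weight.
    """
    n = len(neighbours)
    ordered = sorted(neighbours, key=lambda nb: nb[0])
    numerator = denominator = 0.0
    for rank, (log_price, age) in enumerate(ordered, start=1):
        weight = age_weight(age) * rank_weight(rank, n)
        numerator += weight * log_price
        denominator += weight
    return numerator / denominator if denominator > 0 else None


weights_31 = [rank_weight(r, 31) for r in range(1, 32)]
x_nbr = neighbours_input([(12.0, 0.5), (13.0, 0.5), (14.0, 0.5)])
```

For the full 31 neighbours the rank weights are zero for ranks 1 to 3 and 29 to 31, and rise from 4 to a peak of 16 at the median rank before falling back to 4.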

2.6 Capital Gain Index Calculation

We will refer to the index calculated in this section as the hedonic imputation index. For each property type (house/unit) and city, the method of calculation is:

On the initial (base) day $T = 0$:

1. Price every property in the given category by the hedonic imputation method as described in §2.5.

2. Form a “market portfolio” consisting of all properties between the 5th and 95th percentiles. The upper and lower tails are removed for statistical reasons: to ensure that the behaviour of outliers does not influence the dynamics of the index away from the dynamics of the actual market.

3. Sum the value of all the properties in the market portfolio. Call this value $M_0$.

4. If there are $N$ properties in the market portfolio, the initial value of the index is the mean portfolio value divided by 1000: $I_0 = M_0 / (1000 N)$.

On each subsequent day $T > 0$:

1. Price every property in the given category by the hedonic imputation method (see §2.5).

2. Form a market portfolio as in Step 2 above.


3. Find the intersection $\Pi_T$ of today’s and yesterday’s market portfolios, with the additional condition that a property cannot be in $\Pi_T$ if the value of any of its attributes has changed from the previous day. See §2.2.1 for the list of attributes.

4. Find the total current value $M_T$ of all the properties in $\Pi_T$. That is, we want to value the properties in $\Pi_T$ at the current time $T$ as per the method of §2.5.

5. Use the previous day’s valuations to calculate $M^*_{T-1}$, yesterday’s value of $\Pi_T$. Note that $M^*_{T-1} \ne M_{T-1}$, the latter being yesterday’s value of yesterday’s portfolio $\Pi_{T-1}$.

6. The change in value of the portfolio $\Pi_T$ from the previous day is $M_T / M^*_{T-1}$. The new index value is then $I_T = \left(M_T / M^*_{T-1}\right) I_{T-1}$.

Note that the ASX index daily return for day T is calculated as

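$$\begin{aligned}
r(T) &= \frac{\sum_i \hat{P}_i(T)}{\sum_i \hat{P}_i(T-1)} - 1 \\
&= \frac{\sum_i Q_i(T) \exp\left\{\varsigma_i^2(T)/2\right\}}{\sum_i Q_i(T-1) \exp\left\{\varsigma_i^2(T-1)/2\right\}} - 1
\end{aligned} \qquad (2.16)$$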

Note the $\varsigma_i(T)$ are the standard deviations of the residuals $\varepsilon_i$ in (2.10), so that if we don’t expect the valuation imputation model to systematically get worse or better over time, the quantity $\varsigma(T) / \log P(T)$ should be on average constant.

Moreover, we expect that long term, $\log P(T)$ trends linearly upward with $T$ at approximately the mean capital log return rate $g$ (approximately the nominal GDP growth rate).

It follows from (2.16) that the convexity adjustment $\exp\left\{\varsigma_i^2(T)/2\right\}$ cannot be ignored.

Note on Step 3 above: A property is dropped from the valuation portfolio $\Pi_T$ only on the day the value of its attribute vector changes. The reason is that including it on such a day is equivalent to a capital inflow or outflow, which contradicts the index as tracking the value of a self financing portfolio.

As an example, suppose new information about a property arrives on day $T$ which shows it has 4 bedrooms, not 3 as previously thought, so that our information about the number of bedrooms is …, 3, 3, 3, 4, 4, 4, … The property is in the portfolios $\Pi_{T-1}$ and $\Pi_{T+1}$, but not in $\Pi_T$, because the output of this property’s valuation function will jump on day $T$, as the change in information is equivalent to the building of an extra bedroom, i.e. a capital injection to the portfolio.
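The base day and daily chaining steps of this section can be sketched as below. The names and data layout (dictionaries keyed by a property id) are hypothetical, and the percentile trim is implemented by rank, which is only one of several possible percentile conventions.

```python
def market_portfolio(values, trim_percent=5):
    """Ids of properties between the 5th and 95th percentiles by value."""
    ranked = sorted(values, key=values.get)
    k = len(ranked) * trim_percent // 100  # properties trimmed from each tail
    return set(ranked[k:len(ranked) - k])


def initial_index(values):
    """Base day: mean market portfolio value divided by 1000."""
    portfolio = market_portfolio(values)
    m0 = sum(values[pid] for pid in portfolio)
    return m0 / (1000 * len(portfolio))


def update_index(prev_index, values_today, values_yesterday,
                 attrs_today, attrs_yesterday):
    """Steps 1 to 6: chain the index over the common, unchanged properties."""
    common = {
        pid
        for pid in market_portfolio(values_today) & market_portfolio(values_yesterday)
        if attrs_today[pid] == attrs_yesterday[pid]  # drop on day of change
    }
    if not common:
        raise ValueError("no common properties to chain the index on")
    m_today = sum(values_today[pid] for pid in common)      # M_T
    m_star = sum(values_yesterday[pid] for pid in common)   # M*_{T-1}
    return prev_index * m_today / m_star


# Hypothetical three property market; property "c" gains a bedroom today.
v_yest = {"a": 100_000.0, "b": 200_000.0, "c": 300_000.0}
v_today = {"a": 110_000.0, "b": 210_000.0, "c": 400_000.0}
attrs_yest = {"a": (3,), "b": (2,), "c": (3,)}
attrs_today = {"a": (3,), "b": (2,), "c": (4,)}

i0 = initial_index(v_yest)
i1 = update_index(i0, v_today, v_yest, attrs_today, attrs_yest)
```

Property "c" is excluded from the chaining on the day its attributes change, so its jump in value does not enter the index return.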


3 Calculation of Rental Yields

3.1 Summary

Rental yields are calculated daily. The yield is intended to represent the rental income for the given day as a proportion of total property market value if all properties in the population were rented without any market dilution effects. Clearly only a minority of properties are actually rented at any given time and the majority of those were rented during a prior period. The figures in the table are from the 2006 Census:

Houses

City        % Owned   % Mortgaged   % Rented
Sydney      40.2%     41.0%         18.8%
Melbourne   40.4%     42.4%         17.2%
Brisbane    34.0%     42.4%         23.6%
Adelaide    39.9%     42.7%         17.4%
Perth       34.6%     45.0%         20.4%

Units

City        % Owned   % Mortgaged   % Rented
Sydney      18.1%     21.8%         60.1%
Melbourne   18.6%     19.0%         62.4%
Brisbane    15.3%     16.3%         68.4%
Adelaide    21.1%     17.9%         60.9%
Perth       20.2%     20.5%         59.4%

Consequently, a hedonic rental formula, essentially a rental version of the property valuation formula in §2.5, is derived from observed rents and the attributes of the properties which are rented. This formula is then applied to impute rental income for properties not being rented (because they are owner occupied or vacant). The rental yield for each index is the sum of the imputed rental incomes for every property in the population divided by the sum of the imputed property values derived by the method of §2.5.

The rental income is then that day’s rental yield multiplied by the day’s index value.

The actual rental income for each index contract is the average of the past 7 days’ rent figures above, less a constant discount to the gross yield applied by the ASX to mirror physical portfolio returns.

The rents are averaged over the most recent 7 days to provide stability of income.


3.2 Rental Yield Calculation

We first describe the method of imputing rents for all properties in the population. As with valuation of all properties in Section 2, a different rental model is fit for each property type (house/unit) and each SSD (same general mathematical structure for all models, but different parameter values). Thus in what follows, we will assume all properties are of the same type (house/unit) and in the same SSD.

We take all rental listings (most recent listing price) occurring in the 6 months prior to the calculation date. The reason for the 6 months is that properties advertised for rent during this time period can reasonably be assumed to be currently rented at the same rate. The most recently advertised rent is used where confirmation of the actual rent is not available, which is usually the case. All rents are converted to an annual figure, counting 365 days in a year.

Suppose there are $N$ properties in the population, for which we have value estimates $P_1, \ldots, P_N$ by the method of §2.5. Suppose we have observed rental information $R_1, \ldots, R_M$ for $M < N$ of these properties. We follow (2.10) and (2.11), using the same hedonic variables listed in §2.2 and the method of §2.5 to fit a hedonic imputation model for the rental income from any property, given its hedonic attributes (without needing to benchmark the rents by step 1):

$$\hat{R} = E\left[\exp\left\{\Psi_{rent}(x_1, \ldots, x_n)\right\}\right] \qquad (3.1)$$

where $(x_1, \ldots, x_n)$ is the vector of observed hedonic attributes.

We thus obtain a full set of $N$ rental estimates $\hat{R}_1, \ldots, \hat{R}_N$ for all properties in the population. The annualized portfolio yield for the given day is then

$$y_t = \sum_{i=1}^{N} \hat{R}_i \Big/ \sum_{i=1}^{N} P_i \qquad (3.2)$$

Day $t$'s rental income per index contract is then

$$rent_t = \left\{ \left( y_{t-6} I_{t-6} + y_{t-5} I_{t-5} + \ldots + y_t I_t \right) / 7 - c I_t \right\} / 365 \qquad (3.3)$$

where $I_t$ is that day’s index value and $c$ is a constant discount to the gross yield applied by the ASX to mirror physical portfolio returns.
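The yield (3.2) and the daily rental income (3.3) can be sketched as below. The function names are hypothetical and all the numbers are invented; in particular the discount value is only a placeholder, since the constant applied by the ASX is not stated here.

```python
def portfolio_yield(annual_rents, values):
    """Equation (3.2): total imputed annual rent over total imputed value."""
    return sum(annual_rents) / sum(values)


def daily_rent(yields, index_values, discount):
    """Equation (3.3): one day's rental income per index contract.

    yields, index_values -- the last 7 days, oldest first, ending at day t
    discount             -- constant discount c to the gross yield
    """
    if len(yields) != 7 or len(index_values) != 7:
        raise ValueError("exactly 7 days of yields and index values required")
    average_rent = sum(y * i for y, i in zip(yields, index_values)) / 7.0
    return (average_rent - discount * index_values[-1]) / 365.0


# Hypothetical inputs: two properties, a flat 4% yield, index at 500.
y = portfolio_yield([20_000.0, 28_000.0], [500_000.0, 700_000.0])
rent = daily_rent([0.04] * 7, [500.0] * 7, discount=0.01)
```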


Bibliography

[ASGC] Australian Bureau of Statistics, (2006) Statistical Geography Volume 1 – Australian Standard Geographical Classification (ASGC).

[HM1] Hill R., Melser D., (2007) Comparing House Prices Across Regions and Time: An Hedonic Approach, UNSW School of Economics Discussion Paper 2007-33. http://wwwdocs.fce.unsw.edu.au/economics/Research/WorkingPapers/2007_33.pdf

[HM2] Hill R., Melser D., (2008) Hedonic Imputation and the Price Index Problem: An Application to Housing, Economic Inquiry 46(4), 593-609.

[PA] Peng J-Y, Aston J., State Space Models Manual MATLAB implementation. Available