Methods for detecting errors in VAT Turnover data

Phil Lewis
Processing, Editing and Imputation branch
Business Statistics Methods-Survey Methodology
E-mail: [email protected]

Outline of talk

• Detecting suspicious patterns

• Methods for detecting unit errors

• Consider 5 methods

• Comparing methods

• Results

• Conclusion and recommendations

Detecting suspicious patterns

• One of the problems with VAT Turnover data is that it is often not possible to re-contact businesses to get an idea of their true Turnover figure.

• It is often possible to identify errors in VAT Turnover data by considering the pattern of reported Turnover over a period.

Hoogland (2010) identifies the following suspicious patterns in quarterly Turnover:

i. Zero Turnover in three quarters, positive Turnover in the other quarter

ii. Zero Turnover in one quarter, positive Turnover in the other three quarters

iii. Same Turnover in all four quarters

iv. Same Turnover for three quarters, a different (positive) Turnover value in the other quarter

v. Negative Turnover in any of the quarters
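
The five patterns can be checked mechanically for each business. The following is a minimal sketch, assuming four quarterly values per business; the function name is hypothetical, and requiring the repeated value in pattern iv to be non-zero (so that it does not simply duplicate pattern i) is an interpretation, not something the slide states.

```python
def suspicious_patterns(quarters):
    """Return the labels (i to v) of the patterns matched by four quarterly values."""
    if len(quarters) != 4:
        raise ValueError("expected exactly four quarterly Turnover values")
    zeros = sum(1 for q in quarters if q == 0)
    positives = sum(1 for q in quarters if q > 0)
    found = []
    if zeros == 3 and positives == 1:
        found.append("i")
    if zeros == 1 and positives == 3:
        found.append("ii")
    if len(set(quarters)) == 1:
        found.append("iii")
    # Pattern iv: a non-zero value occurs three times (assumption, see lead-in)
    # and the remaining quarter holds a different, positive value.
    for value in set(quarters):
        if value != 0 and quarters.count(value) == 3:
            other = next(q for q in quarters if q != value)
            if other > 0:
                found.append("iv")
    if any(q < 0 for q in quarters):
        found.append("v")
    return found
```

A business can match more than one pattern, so the function returns a list rather than a single label.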

Methods for detecting unit errors in reported VAT Turnover

If

$$A < \frac{\text{current VAT Turnover}}{\text{previous VAT Turnover}} < B$$

then assume the current VAT Turnover has been reported in thousands of pounds and multiply by 1000 to get a figure in pounds.
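
A minimal sketch of the unit-error rule follows. The slide does not give values for A and B; the defaults below (a band around 1/1000) are illustrative assumptions only, and the function name is hypothetical.

```python
def correct_unit_error(current, previous, a=0.0005, b=0.002):
    """Return (turnover_in_pounds, corrected_flag) for one business.

    a and b are assumed bounds on current / previous; they are not
    taken from the source and would need tuning on real VAT data.
    """
    if previous <= 0:
        # No usable previous value, so the ratio test cannot be applied.
        return current, False
    ratio = current / previous
    if a < ratio < b:
        # Current value looks like it was reported in thousands of pounds.
        return current * 1000, True
    return current, False
```

For example, a current value of 1,200 against a previous value of 1,150,000 gives a ratio of about 0.001, so the value would be scaled to 1,200,000.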

1 – Quartile distances in industry Turnover

• Based on a method described in Hoogland and Van Haren (2007) to identify unusually large or small Turnover by locating extreme values in the distribution of VAT Turnover within a particular industry and size class.

• Turnover is treated as suspicious if

Turnover > Q3 + [C × (Q3 – Median)]

or

Turnover < Q1 – [C × (Median – Q1)]

where Q1 and Q3 are the lower and upper quartiles of VAT Turnover in the industry and size class.

• C may be given different values for different industry and size classes.
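
The quartile distance test can be sketched as below for a single industry and size class. The function name is hypothetical, and the choice of the "inclusive" quartile definition in Python's `statistics.quantiles` is an assumption; the slides do not say how quartiles were computed.

```python
from statistics import quantiles


def quartile_distance_flags(turnovers, c):
    """Flag each Turnover value in one class as suspicious (True) or not."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = quantiles(turnovers, n=4, method="inclusive")
    upper = q3 + c * (q3 - median)
    lower = q1 - c * (median - q1)
    return [t > upper or t < lower for t in turnovers]
```

Because the bounds use Q3 – Median and Median – Q1 separately, the test adapts to skewed distributions, which is typical of Turnover within a class.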

2 – Period on period ratios

• Method 2 comes from De Jong (2003) and involves calculating period on period ratios for each business based on the contribution that business’s Turnover makes to its class.

• For each business calculate:

$$\text{Score} = \frac{\text{VAT Turnover}}{\text{Median VAT Turnover in class}}$$

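• Then calculate

$$\text{TestRatio} = \begin{cases} \text{Score}_t / \text{Score}_{t-1} & \text{if } \text{Score}_t > \text{Score}_{t-1} \\ \text{Score}_{t-1} / \text{Score}_t & \text{otherwise,} \end{cases}$$

where $\text{Score}_t$ is the value of Score in period t.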
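
A minimal sketch of the Score and TestRatio calculation for one class follows. The function names are hypothetical, Turnover is assumed to be strictly positive in both periods, and flagging a business when TestRatio exceeds a threshold is an assumption about how the ratio is used.

```python
from statistics import median


def class_scores(turnover):
    """Map business id -> Turnover divided by the class median Turnover."""
    med = median(turnover.values())
    return {business: value / med for business, value in turnover.items()}


def period_ratio(score_t, score_prev):
    """Ratio of the larger Score to the smaller, so the result is always >= 1."""
    if score_t > score_prev:
        return score_t / score_prev
    return score_prev / score_t


def flag_period_ratios(current, previous, threshold):
    """Return ids of businesses present in both periods whose ratio exceeds threshold."""
    cur, prev = class_scores(current), class_scores(previous)
    common = cur.keys() & prev.keys()
    return sorted(b for b in common if period_ratio(cur[b], prev[b]) > threshold)
```

Taking the larger Score over the smaller makes the test symmetric, so a sharp fall is treated the same way as a sharp rise.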

3 – Comparison with reporting history for the business

• The method is described in slightly different forms in Hoogland and Van Haren (2007), Lorenz (2010) and Röstel (2010).

• Note that this method only identifies suspiciously large Turnover.

• If Turnover > £100 million

and Turnover > 10 × mean Turnover for the business in the past 24 months,

then treat as suspicious.
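
The rule can be sketched as follows, assuming the reporting history is available as a list of Turnover values in pounds covering the past 24 months. The function and parameter names are hypothetical; the two thresholds are those shown on the slide.

```python
from statistics import mean


def suspicious_vs_history(turnover, history, floor=100_000_000, multiple=10):
    """True if turnover is both very large and far above the business's own history."""
    if not history:
        # Without any history the comparison cannot be made.
        return False
    return turnover > floor and turnover > multiple * mean(history)
```

Both conditions must hold, so a large business with consistently large Turnover is not flagged.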

4 – Quartile differences combined with measure of influence

• Refinement to method 1, inspired by Hoogland et al. (2009).

• Calculate the influence as the proportion of VAT Turnover the business contributes to the total VAT Turnover in the industry and size class.

• Combine detection of suspicious values using quartile differences with the influence.

• Identify unusual Turnover values using the quartile distances measure described in method 1.

• Reminder: method 1 treats Turnover as suspicious if

Turnover > Q3 + [C × (Q3 – Median)]

or

Turnover < Q1 – [C × (Median – Q1)]

• Then for each business calculate

$$\text{Influence} = \frac{\text{VAT Turnover}}{\text{Total VAT Turnover in class}}$$

• This method effectively subsets businesses failing the quartile distance method, so that only the most influential are viewed as being suspicious.
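
One way to combine the two steps is sketched below: apply the quartile distance test, then keep only the most influential failures. The function name and the `top_n` cut-off are hypothetical, and the "inclusive" quartile definition is an assumption; the slides do not specify how the influence is thresholded.

```python
from statistics import quantiles


def influential_suspects(turnovers, c, top_n):
    """Return up to top_n business ids failing the quartile test, most influential first.

    turnovers maps business id -> VAT Turnover within one industry and size class.
    """
    q1, median, q3 = quantiles(turnovers.values(), n=4, method="inclusive")
    upper = q3 + c * (q3 - median)
    lower = q1 - c * (median - q1)
    total = sum(turnovers.values())
    # Influence = share of the class total contributed by the business.
    influence = {
        business: value / total
        for business, value in turnovers.items()
        if value > upper or value < lower
    }
    ranked = sorted(influence, key=influence.get, reverse=True)
    return ranked[:top_n]
```

Within a single class, ranking by influence is the same as ranking by Turnover, since every business shares the same class total.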

5 – Hidiroglou-Berthelot method

• Compare to previous period’s value:

Form the ratio

$$r = \frac{\text{current VAT Turnover}}{\text{previous VAT Turnover}}$$

Transform the ratio, where m is the median of the ratios r:

• if r < m then t = (r - m) / r

• otherwise t = (r - m) / m

Define

$$E = t \times \max\{\text{current VAT Turnover},\ \text{previous VAT Turnover}\}^{V}$$

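Then calculate

$$d_{Q1} = \max\{Q2 - Q1,\ A \times Q2\}$$

$$d_{Q3} = \max\{Q3 - Q2,\ A \times Q2\}$$

where Q1, Q2 and Q3 are the quartiles of E.

Suspicious businesses are then identified as those for which

$$E < Q2 - C \times d_{Q1}$$

or

$$E > Q2 + C \times d_{Q3}$$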
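
The steps of the Hidiroglou-Berthelot rule can be sketched end to end as below. This is an illustration under stated assumptions, not the implementation used for the results: the function name is hypothetical, Turnover is assumed positive in both periods, the default A = 0.05 is an assumed value, the "inclusive" quartile definition is a choice, and the absolute value around A × Q2 follows the usual published form of the rule rather than the slide text.

```python
from statistics import median, quantiles


def hb_flags(pairs, c, a=0.05, v=1.0):
    """Flag (current, previous) Turnover pairs using the Hidiroglou-Berthelot rule."""
    ratios = [current / previous for current, previous in pairs]
    m = median(ratios)
    effects = []
    for (current, previous), r in zip(pairs, ratios):
        # Symmetric transformation of the ratio around the median m.
        t = (r - m) / r if r < m else (r - m) / m
        # Scale by size so that larger businesses carry more weight when v > 0.
        effects.append(t * max(current, previous) ** v)
    q1, q2, q3 = quantiles(effects, n=4, method="inclusive")
    # The A term stops the bounds collapsing when the quartiles are very close.
    d_q1 = max(q2 - q1, abs(a * q2))
    d_q3 = max(q3 - q2, abs(a * q2))
    return [e < q2 - c * d_q1 or e > q2 + c * d_q3 for e in effects]
```

Larger values of C widen the acceptance interval and so flag fewer businesses.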

• A key difference between survey and administrative data is that with administrative data it is often not possible to re-contact the business and ask them to confirm any suspicious values.

• Evaluation of detection methods is not straightforward and cannot usually be definitive.

Comparing methods

• Diagnostics include the proportion of businesses identified as suspicious within each industry and size class and the average size (employment) and VAT Turnover of suspicious businesses compared with the rest of the class.
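
These diagnostics can be computed per class once a detection method has flagged businesses. A minimal sketch, assuming each business is a dictionary with hypothetical keys `turnover`, `employment` and `suspicious`:

```python
from statistics import mean


def class_diagnostics(records):
    """Summarise flagged versus unflagged businesses in one industry and size class."""
    flagged = [r for r in records if r["suspicious"]]
    rest = [r for r in records if not r["suspicious"]]

    def average(rows, key):
        # None signals that the group is empty, so no average exists.
        return mean(r[key] for r in rows) if rows else None

    return {
        "proportion_suspicious": len(flagged) / len(records),
        "mean_employment_suspicious": average(flagged, "employment"),
        "mean_employment_rest": average(rest, "employment"),
        "mean_turnover_suspicious": average(flagged, "turnover"),
        "mean_turnover_rest": average(rest, "turnover"),
    }
```

Comparing the same summary across methods shows whether a method tends to flag large businesses, small businesses, or a mixture.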

Results of testing detection methods with VAT data

• If businesses with larger Turnover values are of more importance:

method 4 (Quartile differences & influence) and

method 5 (Hidiroglou-Berthelot)

offer the flexibility to give higher weight to those businesses.

• If good quality historic data are available, then:

method 2 (Period on period ratios)

and

method 3 (Comparison with history)

are likely to give good results.

• Method 1 (Quartile differences)

and the related

method 4 (Quartile differences & influence)

should be effective in identifying extreme values when only the current period data are available.

Results of testing detection methods with VAT data

Estimated false hits

Conclusion and recommendations

• Each of these methods uses parameters which can be fine-tuned to identify an appropriate number of suspicious businesses.

• The effective values of these parameters are likely to differ between data sources. Therefore, rather than prescribe specific values, it is recommended that the parameters are set through analysis of the effect of the method on the VAT data under consideration.

Before applying any detection methods

• Suspicious patterns. It is recommended that VAT data are checked for these patterns before implementing any other error detection method.

• Unit errors: relatively easy to identify and correct. It is recommended that an automatic method is developed to detect and correct any unit errors in VAT Turnover data, before applying any other rules.

• The final recommendation is that in developing methods for detecting errors in VAT Turnover data, it is always useful to understand the data source and the possible errors that may be found in it.

• In many cases, it will be necessary to liaise with the data providers to get this information.

References:

• De Jong, A. "Impect: Recent developments in harmonized processing and selective editing", Proceedings of UNECE Work Session on Statistical Data Editing, Madrid, October 2003: Web.

• Hidiroglou, M. A. and Berthelot, J.-M. "Statistical Editing and Imputation for Periodic Business Surveys", Survey Methodology, June 1986, Vol. 12, No. 1, pp 73-83: Journal.

• Hoogland, J. "Editing strategies for VAT data", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web.

• Hoogland, J. and Van Haren, G. "Editing and integrating VAT and SBS data", Proceedings of the third International Conference on Establishment Surveys (ICES-III), Montreal, June 2007: CD ROM.

• Hoogland, J., Van Bemmel, K. and De Wolf, P-P. "Detection of potential influential errors in VAT turnover data used for short term statistics", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.

• Lorenz, R. "The integrated system of editing administrative data for STS in Germany", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web.

• Seyb, A., Stewart, J., Chiang, G., Tinkler, I., Kupferman, L., Cox, V. and Allan, D. "Automated editing and imputation system for administrative financial data in New Zealand", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.

Extra information

• For method 2, we used a threshold of 25 as a compromise between the monthly and quarterly data.

• For method 3, we used the thresholds described in Hoogland and Van Haren (2007).

• For method 5, the Hidiroglou-Berthelot rule, we used a value of V = 1 to give extra weight to businesses with larger Turnover, as this has been shown to work well with business data in the past. The value of C for this method was 250.

• Method 1 used a value of C = 10 in the quartile method to give the same proportion of failures.

• For method 4 we chose a value of C = 8 in the quartile method and then prioritised the businesses failing that method by VAT Turnover to give a similar proportion of failures as methods 1 and 5.