Data Validation for the 2006 Census of Agriculture
Charlie Arcaro, Statistics Canada
ICES III - 20 June, 2007


Outline

Introduction

What is Data Validation?

CEAG and Validation Process

New Strategies for 2006

Conclusions


Introduction

Census of Agriculture (CEAG)
  Provides a quinquennial (five-yearly) snapshot of Canadian agriculture

Piggybacks on the Census of Population

Collects inventories of farm commodities/financial information

Approximately 229K farms & 327K operators in CEAG 2006

Used in redesigning agricultural surveys
  Basic source of sampling frames
  Intercensal corrections
  Updating the Farm Register


What is Data Validation?

Major part of data quality evaluation for Census of Agriculture

Costliest part of the CEAG process post-collection

Analyze and change CEAG data (Sep 2006 to Feb 2007)

Data analyzed using both micro- and macro-level tools
  Aggregate data at various geographic levels
  Quality evaluation of small area data

Produce reports, make presentations to Certification Committee and recommendations for publication


CEAG Process

Collection -> Data Scanning -> Editing, Matching, Follow-up -> Imputation -> Data Validation -> Output & Dissemination


What’s Involved?

Senior Validators prepare validation plan based upon knowledge and expectations

Number of farms & totals in 2006, Structural changes since 2001

Validation tools on Central Processing System (CPS)
  Comparison/Match/Distribution tables, Top Contributors, Impact of Processing

Compare with reference sources

Validators responsible for many variables
  Validation done by variable, not by questionnaire


Goals for CEAG Validation 2006

More efficient use of resources

Improve data quality at finer geographic levels
  Greater scrutiny expected in 2006


Validation Tools

Comparison Tables

Compare data for 2001 and 2006 and % of change
  Total values, # farms reporting, average reporting value
  Four geographical levels

Look for changes that make the data “questionable”
  Use survey data or other sources to justify changes
  More likely at lower levels of geography
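A comparison table of this kind could be sketched as follows. This is only an illustration, not the CPS implementation: the record layout (dicts with an "area" key), the rule that a farm "reports" a variable when its value is above zero, and the 50% flagging threshold are all assumptions made for the example.

```python
from collections import defaultdict


def comparison_table(records_2001, records_2006, variable):
    """Build one comparison row per geographic area for a single variable.

    Each record is a dict with a hypothetical "area" key plus the variable
    name. A farm counts as reporting when its value is greater than zero.
    """
    def summarize(records):
        totals = defaultdict(float)
        counts = defaultdict(int)
        for rec in records:
            value = rec.get(variable, 0)
            if value > 0:
                totals[rec["area"]] += value
                counts[rec["area"]] += 1
        return totals, counts

    tot01, n01 = summarize(records_2001)
    tot06, n06 = summarize(records_2006)

    rows = {}
    for area in sorted(set(tot01) | set(tot06)):
        old, new = tot01.get(area, 0.0), tot06.get(area, 0.0)
        farms06 = n06.get(area, 0)
        rows[area] = {
            "total_2001": old,
            "total_2006": new,
            "farms_2001": n01.get(area, 0),
            "farms_2006": farms06,
            "avg_2006": new / farms06 if farms06 else None,
            # Percent change is undefined when there was no 2001 total.
            "pct_change": 100.0 * (new - old) / old if old else None,
        }
    return rows


# Made-up records for two areas, for illustration only.
farms_2001 = [
    {"area": "A", "cattle": 100},
    {"area": "A", "cattle": 50},
    {"area": "B", "cattle": 80},
]
farms_2006 = [
    {"area": "A", "cattle": 120},
    {"area": "A", "cattle": 0},
    {"area": "B", "cattle": 200},
]

table = comparison_table(farms_2001, farms_2006, "cattle")

# Hypothetical rule: flag areas whose total moved by more than 50%.
questionable = [
    area for area, row in table.items()
    if row["pct_change"] is not None and abs(row["pct_change"]) > 50
]
print(questionable)
```

Flagged areas would then be checked against survey data or other sources before any change is made.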


Validation Tools (cont.)

Impact of Processing Tables
  Assess the impact made to data during imputation and validation (Provincial level only)

Reports available for all variables

No change for 2006


Validation Tools (cont.)

Top Contributor Tables
  Farm records with highest value for certain geographic areas

What to look for
  Imputed values, jumps in consecutive values
  Compare associated variables with main variable

Bottom Contributors
  Greenhouse variables - supposed to report in ft² or m²

Locate capture and response errors


Validation Tools

Top Contributor Tables

In 2001… Top 100 farms in each province (default)

In 2006… Top 100 farms or top 80% contributors in each province
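The 2006 selection rule might look something like the sketch below. The slide does not say how the two criteria combine, so the example assumes that selection stops at whichever limit is reached first (100 records, or 80% of the provincial total). The function name and the input layout are hypothetical.

```python
def top_contributors(values, max_records=100, coverage=0.80):
    """Return farm ids to review, largest value first.

    `values` maps a farm id to its reported value for one variable in one
    province. Records are taken in descending order until `max_records` have
    been selected or the selected records account for `coverage` of the
    total, whichever comes first (an assumed reading of the rule).
    """
    total = sum(values.values())
    selected, running = [], 0.0
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    for farm_id, value in ranked:
        if len(selected) >= max_records:
            break
        if total > 0 and running >= coverage * total:
            break
        selected.append(farm_id)
        running += value
    return selected


# Made-up values: two farms account for 85% of the total of 1,000.
province = {"f1": 500, "f2": 350, "f3": 100, "f4": 30, "f5": 20}

by_coverage = top_contributors(province)              # stops on coverage
by_count = top_contributors(province, max_records=1)  # stops on record count
print(by_coverage, by_count)
```

Under this reading, a variable dominated by a few large farms yields a much shorter review list than a fixed top 100.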


Validation Tools

Match Tables
  Compare CEAG data to selected Referential Sources (RS)

Three reports are generated
  In RS not in CEAG, in CEAG not in RS, in both but with significant differences

In 2001…
  Top 100 from each of the 3 match reports (default)

In 2006…
  Using Call Management System (CMS) cutoff algorithm (2% units, 50% cumulative total, 1% individual total)
  Missing CEAG farms > 30% CCS estimate
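The three match reports can be illustrated with simple set operations. This is a sketch under stated assumptions: farms are keyed by a shared identifier, and a "significant" difference is taken to be a relative gap above a hypothetical 25% tolerance. The CMS cutoff algorithm itself is not reproduced here, since the slide gives only its parameters.

```python
def match_reports(ceag, reference, rel_tolerance=0.25):
    """Split farm ids into the three match reports.

    `ceag` and `reference` map a farm id to the value of one variable.
    `rel_tolerance` is a hypothetical threshold for a "significant"
    difference, measured relative to the reference value.
    """
    in_rs_not_ceag = sorted(set(reference) - set(ceag))
    in_ceag_not_rs = sorted(set(ceag) - set(reference))

    differing = []
    for farm_id in sorted(set(ceag) & set(reference)):
        c, r = ceag[farm_id], reference[farm_id]
        if r == 0:
            # No relative gap can be computed; any non-zero value differs.
            significant = c != 0
        else:
            significant = abs(c - r) / abs(r) > rel_tolerance
        if significant:
            differing.append(farm_id)

    return in_rs_not_ceag, in_ceag_not_rs, differing


# Made-up values for illustration only.
ceag_values = {"a": 100, "b": 40, "c": 10}
rs_values = {"a": 105, "b": 80, "d": 55}

missing_from_ceag, missing_from_rs, large_diffs = match_reports(
    ceag_values, rs_values
)
print(missing_from_ceag, missing_from_rs, large_diffs)
```

A cutoff rule such as the one listed for 2006 would then be applied to each report to decide which records are worth a validator's time.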


Validation Tools

Distribution Tables
  Distribution of variables (geographic classification, category, etc.)

Similar to Comparison Tables. Three types:

Operator Tables
  All operators at detailed geography level
  # of operators, age, sex, job injuries, etc.

Livestock Tables
  Counts and farm sizes for various livestock and poultry (Province level)

Other Tables
  Computer usage, land management
  Data on tick box variables


Results for CEAG 2006

Total Cattle (TCATTL) – 109,920 Farms

                   # Validated   % Farms       Total   % Total
New Methods              1,551     1.41%   3,346,211    20.15%
Previous Methods         2,928     2.66%   3,992,312    20.73%

Alfalfa (ALFALFA) – 88,064 Farms

                   # Validated   % Farms       Total   % Total
New Methods              1,916     2.18%   1,220,928     9.62%
Previous Methods         2,562     2.91%   1,187,804     9.36%

Total Pigs (TOPIGS) – 11,506 Farms

                   # Validated   % Farms       Total   % Total
New Methods              1,466    12.74%   7,018,275    45.71%
Previous Methods         2,808    24.40%   7,255,827    47.25%
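The "% Farms" column is the number of validated records divided by the number of farms reporting the variable. The short check below recomputes it from the figures above and also derives the relative drop in records validated under the new methods. That last figure does not appear on the slides and is shown only as arithmetic on the published counts.

```python
# Counts copied from the result tables above.
results = {
    "TCATTL": {"farms": 109_920, "new": 1_551, "previous": 2_928},
    "ALFALFA": {"farms": 88_064, "new": 1_916, "previous": 2_562},
    "TOPIGS": {"farms": 11_506, "new": 1_466, "previous": 2_808},
}


def pct_farms(validated, farms):
    """Share of reporting farms that were validated, in percent."""
    return round(100.0 * validated / farms, 2)


def workload_reduction(new, previous):
    """Relative drop in records validated, in percent (derived figure)."""
    return round(100.0 * (previous - new) / previous, 1)


summary = {}
for name, row in results.items():
    summary[name] = (
        pct_farms(row["new"], row["farms"]),
        pct_farms(row["previous"], row["farms"]),
        workload_reduction(row["new"], row["previous"]),
    )
    print(name, summary[name])
```

For cattle and pigs the new methods review a little under half as many records as before, while the share of the total covered moves by less than two percentage points in all three variables.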


Conclusions

How much validation is enough?

More focused and structured approach to the validation process

Reduced work without quality compromise


Contact Details

If you have any further questions / Si vous avez plus de questions

Charlie Arcaro
Phone (613) [email protected]

Visit our website / Visitez notre site: www.statcan.ca