Using Administrative Data to Enhance Longitudinal Research Lorraine Dearden Director ADMIN Institute of Education Email: [email protected] NILS Research Forum Belfast 22 October 2010

Introduction

  • In the current economic climate, using and linking administrative data is very important for policy analysis
  • Scope for well-funded longitudinal surveys is going to be put under pressure
  • Also, for countries like NI, sample sizes in survey data are not always satisfactory
  • NILS is a very welcome addition for researchers
  • Indeed, colleagues at ADMIN are using it to look at issues to do with health and migration
  • But it is limited in the range of issues it can be used for, and could be significantly enhanced with other administrative data

Why so important to make better use of administrative data?

  • Administrative data has already been collected for administrative purposes, so the money has been spent
  • But the potential it gives those interested in giving sound policy advice is immense if used correctly
  • Allows one to potentially follow multiple cohorts over time (longitudinal data), which is something survey data can rarely do
  • Sample size issues generally disappear, which is very important when doing within-country analysis

So why hasn't it happened?

  • Fears over data protection...
  • But this is always an issue with any individual-level data, and instances of researchers using data inappropriately are virtually unheard of
  • The individual-level data is highly disclosive, but researchers never look at or report anything that is disclosive
  • But it is essential that this information is in their data at the individual level
  • Major issues around disclosure and data protection have centred on the agencies holding the administrative data

So how far have we got on this?

  • Have various LS, with Scotland the most advanced in terms of linkage (including linkage to schools data)
  • Serious discussions in government about whether Censuses could be replaced by linking administrative data
  • So politicians and policy makers are talking about it
  • Certain departments in Whitehall have started linking administrative data sets for internal use (ONS), whereas others have linked data for research projects carried out for them (e.g. DWP), and yet others for general research purposes (DfE and BIS)

Another important development

  • There is increasing linkage of survey data to administrative data where consent has been obtained from the individuals in the survey
  • Longitudinal Survey of Young People in England (linked to NPD data)
  • MCS (and ALSPAC) linked to hospital registration data and NPD data, and now have permission to link to Hospital Episodes Data, economic data held by DWP and HMRC (for both parents), as well as NPD data for all siblings of the CM
  • ELSA has linked to health and economic data, and NCDS and MCS are about to do this as well
  • Innovation Panel of Understanding Society will do this in a few years, with the hope of rolling it out to the full sample

Why is this important?

New linked admin/longitudinal data has the potential to:

  • Give a better understanding of the implications of missing covariates in administrative data, which is crucial if we are going to rely more on administrative data linkage
  • Give a better understanding of the implications of attrition and non-response in survey data
  • Allow us to understand the implications and extent of recall bias in surveys
  • Reduce the costs of longitudinal survey data

So what administrative data is there?

  • Some, like data on school children, is country specific
  • Other data, like HESA (Higher Education), DWP and HMRC data, covers all of Great Britain
  • Now going to talk a bit about what is out there in terms of administrative data...

New longitudinal HE admin data

  • Linked individual-level administrative data
      • School (NPD), FE (ILR/NISVQ) and HE (HESA) records
  • Data on participants AND non-participants in HE
  • Four cohorts:
      • In Year 11 in 2001-02, 2002-03, 2003-04 and 2004-05
      • Potential age 18/19 HE entry in 2004-05, 2005-06, 2006-07 and 2007-08 (or age 19/20 entry in 2005-06, 2006-07 and 2007-08)
  • State and private school students

Data: socio-economic background

  • Free school meals status from PLASC
  • IMD quintiles based on home postcode (age 16)
  • Gender, MOB and school ID available for all
  • Ethnicity, EAL, SEN from PLASC
      • Missing for private school kids
  • Neighbourhood measure of parental education based on 2001 Census
      • Based on home postcode for state school analysis
      • Based on school postcode when private school kids are included

Data: prior attainment

  • State school:
      • Average point score at Key Stage 2, 3, 4 and 5 (plus indicators of reaching expected level at Key Stage 4 and 5)
  • Private school:
      • Key Stage 4 and 5 results only

Integrated administrative data set

  • School data
      • Census of school children with individual characteristics of all pupils, e.g. gender, ethnicity
      • Prior achievement from age 11 through to 18
  • Individual Learner Record
      • FE college attended
      • Participation and qualifications achieved
  • Higher Education data
      • Detailed information on degree subject, institution and degree class awarded for all those participating in HE

Destinations of Leavers from Higher Education survey (DLHE)

  • Early DLHE Survey (surveys graduates 6 months out of university) gives only a preliminary snapshot of graduate success
  • In 2006, HESA carried out a follow-up to the Early DLHE Survey: the Longitudinal DLHE, 3 years after graduation
  • Contains full details of HE plus wages/occupation 3 years after graduation

Longitudinal DLHE

  • Can tell us the early value of degrees
      • By subject
      • By institution
      • Possibly by subject and institution (subject to sample size)
  • Data essentially owned by universities, so would need their permission to do this

What data is included within NPD?

  • Key Stage 1 Results. Keys: PupilID, Academic Year, Lea/Estab
  • Key Stage 2 Results. Keys: PupilID, Academic Year, Lea/Estab
  • Key Stage 3 Results. Keys: PupilID, Academic Year, Lea/Estab
  • Key Stage 4 Candidate. Keys: PupilID, Academic Year, Lea/Estab
  • Key Stage 5 Candidate. Keys: PupilID, Academic Year, Lea/Estab
  • Foundation Stage Profile. Keys: PupilID, Academic Year, Lea/Estab
  • Schools census (formerly PLASC). Keys: PupilID, Academic Year, Lea/Estab, Pupil postcode
  • Key Stage 4 Results
  • Key Stage 4 Indicators
  • Key Stage 5 Indicators
  • Key Stage 5 Results
  • Individual Learner Record - Aims. Keys: PupilID, Academic Year, Lea/Estab
  • Year 7 Progress Test Results. Keys: PupilID, Academic Year, Lea/Estab
  • Core Pupil. Keys: PupilID, Academic Year, Lea/Estab, Pupil postcode
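Because the NPD tables share the keys PupilID, Academic Year and Lea/Estab, they can be linked record by record. The following is only a minimal sketch of that idea: the two miniature extracts, their values and the `ks2_points` and `fsm` fields are invented for illustration and are not the real NPD layout.

```python
# Hypothetical miniature extracts. The key names follow the slide above;
# all values and the "ks2_points" / "fsm" fields are invented.
KEYS = ("PupilID", "Academic Year", "Lea/Estab")

ks2_results = [
    {"PupilID": 1, "Academic Year": "2001-02", "Lea/Estab": "9991234", "ks2_points": 27.0},
    {"PupilID": 2, "Academic Year": "2001-02", "Lea/Estab": "9991234", "ks2_points": 21.0},
]
school_census = [
    {"PupilID": 1, "Academic Year": "2001-02", "Lea/Estab": "9991234", "fsm": False},
    {"PupilID": 3, "Academic Year": "2001-02", "Lea/Estab": "9995678", "fsm": True},
]


def link_on_keys(left, right, keys=KEYS):
    """Left join: keep every row of `left` and attach fields from the
    row of `right` that has the same key values, if there is one."""
    index = {tuple(row[k] for k in keys): row for row in right}
    linked = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        merged = dict(row)
        # Record whether a partner row was found, so unmatched pupils stay visible.
        merged["matched"] = match is not None
        if match is not None:
            merged.update({k: v for k, v in match.items() if k not in keys})
        linked.append(merged)
    return linked


linked = link_on_keys(ks2_results, school_census)
for row in linked:
    print(row)
```

Keeping a `matched` flag rather than silently dropping unlinked rows makes it possible to check how many pupils fail to link before any analysis is run.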

Main fixed pupil characteristics from School Census

  • Main indicators:
      • Sex of child
      • Age (month of birth is standard release)
      • Ethnic group
      • English as an additional language
  • Are they time-invariant?
      • We might collect several measures of each, e.g. one from each of the KS4, KS2 and KS1 sweeps, and also up to nine years of Pupil Census reports from schools
      • We think of these characteristics as fairly time-invariant, yet they vary for a tiny minority of children
      • You can place greatest weight on the most recent report, or alternatively on the modal report of the characteristic

Time-variant pupil characteristics

  • FSM eligible
  • SEN
  • Postcode, LLSOA, IDACI rank
  • Connexions, gifted and talented (variable school recording of this)
  • Mode of travel (new)
  • Part-time, boarder

Obtaining geo-classifications for home addresses

  • Standard release:
      • DCSF will release a lower level super output area (LLSOA) to indicate where the child lives
      • LLSOA: geographical area with a minimum population of 1,000, nested within census ward boundaries
  • Secure release:
      • DCSF will release the child's home postcode to researchers who make a case for it and can show the data will be held securely
      • Home postcode: geographical area with an average of 11 households, giving a relatively precise (within 100m) geo-location
      • WILL NOT release if you just want to attach geo-data to the postcode (they will do this for you)
      • WILL NOT release if you just want to calculate home-school distances, find the nearest school etc. (they will do this for you)

Access to NPD data

  • Most researchers can access this data
  • Have to outline their research question and the data they need, make a case for any special additional variables that are thought to be disclosive (e.g. date of birth, postcode) and provide evidence that data will be held securely (never on a laptop or desktop etc.)
  • Data is transferred via an encrypted electronic transfer
  • If you want to use the data for a new research project, need to approach DfE again before using it

NI Schools Data

  • Have similar data, though results data not so detailed
  • Basic outcomes at KS2, KS4 and KS5
  • Census data comparable and in some cases richer
  • But have potential to link this to HESA data and the graduate destinations survey as well

Access to linked HESA/NPD data

  • This access occurs through BIS, who have done the linkage
  • Again need to outline research question and make case for data
  • Again transfer is via electronic encrypted transfer (FTP site), and host organisation has to demonstrate it has secure facilities where data will be kept
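The choice raised earlier for fixed pupil characteristics, weighting the most recent report or the modal report, can be sketched as follows. This is an illustration only: the report history, the use of the starting calendar year as the sort key, and the tie-breaking rule are all assumptions, not the procedure used in the research.

```python
from collections import Counter


def most_recent_report(reports):
    """reports: list of (year, value) pairs; value is None when missing.
    Returns the latest non-missing value, or None if there is none."""
    valid = [(year, value) for year, value in reports if value is not None]
    if not valid:
        return None
    return max(valid, key=lambda r: r[0])[1]


def modal_report(reports):
    """Returns the most frequently reported non-missing value.
    Assumed tie-break: among equally frequent values, prefer the one
    that was reported most recently."""
    valid = sorted(
        ((year, value) for year, value in reports if value is not None),
        key=lambda r: r[0],
    )
    if not valid:
        return None
    counts = Counter(value for _, value in valid)
    top = max(counts.values())
    for _, value in reversed(valid):
        if counts[value] == top:
            return value


# Invented history of ethnic-group reports for one pupil.
history = [
    (2002, "White British"),
    (2003, "White British"),
    (2004, "Other"),
    (2005, "White British"),
    (2006, None),
    (2007, "Other"),
]

print(most_recent_report(history))  # Other
print(modal_report(history))        # White British
```

The two rules disagree for this invented pupil, which is exactly the situation in which the choice of rule matters.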

DWP and HMRC data: WPLS

  • The DWP has linked all DWP benefit and programme participants to HMRC employment and earnings data (from P14 returns) since 1998
  • This is called the WPLS (Work and Pensions Longitudinal Study)
  • Permission to link this to the FRS, NCDS, MCS and ELSA surveys as well (consent obtained from individuals in these surveys)
  • A summary of its uses can be found here: http://statistics.dwp.gov.uk/asd/longitudinal_study/WPLS_Uses.pdf

WPLS

  • Researchers have had access to this data when carrying out work/evaluations for DWP
  • What the data does not include is HMRC records for individuals who have not been on a DWP programme or benefit, so not as good as it could be...
  • But surveys that have sought permission to link to DWP and HMRC data can link to this additional HMRC data (e.g. FRS, ELSA, NCDS and MCS)
  • Collecting data on benefit receipt is typically difficult to do in surveys, so this linkage is extremely valuable and saves survey time and costs
  • This data covers the whole of Great Britain, not just England

HMRC NIC data

  • HMRC has records on individual NI contributions since NI was introduced in 1948
  • Originally only a 1% sample was held electronically, but now all of these records are held electronically by HMRC
  • The English Longitudinal Study of Ageing (ELSA) has linked all individuals in its survey who gave consent for linkage to this NIC data, which means they have earnings and employment histories for their sample from 1948
  • Until recent changes in NI for those above the UEL, earnings above the UEL are not known, but this is a reasonably small proportion for most time periods and is no longer an issue
  • This data is going to be linked to NCDS and MCS (where consent rates were in excess of 80%)

Other data

  • GP registration data (NILS at forefront here)
  • Hospital Episodes Data
  • Home Office data on crimes (have individual level information)
  • Birth, marriages and death registration data (NILS again at forefront here)

How has this linked ADMIN data been used by researchers?

  • Going to shamelessly focus on some of the work I have done with this data
  • Not always successful, as I will demonstrate, and this linked administrative data is not always up to the research task
  • But has great potential to answer lots of policy relevant questions

Widening participation in HE

  • Joint work with Chowdry, Crawford, Goodman and Vignoles
  • Shows that prior school attainment is the main reason for the large gap between rich and poor in:
      • HE participation
      • Participation in a high status university
  • Suggests HE funding reforms are not the best tool for addressing social mobility/access issues. Focus instead must be on improving school attainment amongst poor children
  • Uses linked school, FE and HE administrative data to assess the schooling roots of the large SEP gap

[Figure: Widening participation in HE]

Month of birth effects

  • Joint work with Crawford and Meghir
  • Children born in September start school aged 5, whereas those born in August are almost a year younger
  • Does this impact on longer term educational outcomes?
  • Used the same linked data to look at this question
  • Found being born in August has a prolonged impact on educational outcomes and even reduces the probability of entering HE

[Figure: Raw differences (proportion getting expected level)]

Summary of findings

  • August-born children experience significantly poorer education outcomes than September-born children
  • Almost entirely due to differences in the age at which they sit the tests
  • Starting school earlier/having more terms of school is marginally better for August-born children at younger ages

Ethnic parity in JCP services in UK?

  • Joint work with Crawford, Mesnard, Shaw and Sianesi at IFS
  • Ethnic parity:
      • No difference on average between an Ethnic Minority client and an otherwise identical White client entering the same JCP office and accessing the same programme/benefit
  • Our aim:
      • Get as close as possible to an otherwise identical White client and see what difference remains
      • Calculate results for a range of JCP benefits and programmes
  • Different methods: sensitivity of results to methods used

Programmes and Benefits

Incapacity benefit (IB): paid to individuals who are assessed as being incapable of work and who meet certain National Insurance contributions conditions.

Income support (IS): a benefit for individuals on low income; usually claimants are lone parents, sick or disabled, or carers.

Jobseeker's Allowance (JSA): a benefit paid to individuals of working age who are unemployed, or who work fewer than 16 hours per week and are looking for full-time work.

New Deal for Lone Parents (NDLP): a voluntary programme whose aim is to encourage lone parents to improve their work prospects and help them into work.

New Deal for individuals aged 25 plus (ND25plus): a programme to help unemployed individuals aged 25 and over to find and keep a job. Participation is compulsory for individuals who have been claiming JSA for at least 18 of the previous 21 months.

New Deal for Young People (NDYP): similar to ND25plus except that it is targeted on individuals aged 18-24. Participation is compulsory for those who have been claiming JSA for at least six months.

Controlling for selection

  • Control for differences in observed characteristics between ethnic groups that may affect outcomes
  • Data:
      • Detailed labour market histories
      • Individual background characteristics
  • Methods:
      • Primarily propensity score matching (PSM)
      • Also regression-based methods and conditional difference in differences (DID)
  • Previous LM history may have been affected by discrimination, but nothing we can do about this

Sampling frame

  • Sample selected on inflow into programme
      • Addresses differential selection off programme
  • Inflow window is 2003, allowing:
      • 3-year pre-inflow labour market history
      • 1-year follow-up

[Timeline: previous labour market history from Jan 2000; inflow window 2003; outcomes to Dec 2004]

Outcomes of interest

  • Two dimensions of labour market status
      • In employment (15+ days in the month)
      • On benefit (15+ days in the month)
  • Benefit definition includes:
      • IS, IB, JSA, New Deal options, Basic Skills and Work-Based Learning for Adults
  • Measured monthly

Data

  • Primarily Work and Pensions Longitudinal Study (WPLS)
      • Benefit and employment spells for anyone on a DWP benefit since mid-1999
      • Also contains limited demographics including sex, DOB, ethnicity and postcode
  • Also used National Benefit Database (NBD) and census information

X variables

  • Employment and benefit history
  • Past participation in voluntary programmes
  • Past participation in Basic Skills
  • Individual characteristics
      • Gender, age, month of inflow
      • Proxies for education and wealth (from census)
  • Local area characteristics (region, travel-to-work-area unemployment)
  • Other programme-related information

What did we find?

  • For most programmes and benefits (with the exception of IS and IB), Minorities and Whites are simply too different for satisfactory estimates to be calculated, and results are sensitive to the methodology used
  • MASSIVE COMMON SUPPORT PROBLEMS
  • This calls into question previous results based on simple regression techniques, which may hide the fact that observationally different ethnic groups are being compared by parametric extrapolation
  • In some cases, depending on the method used (e.g. NDLP), we could find significant ethnic penalties in employment (raw and DID), no ethnic penalty (regression methods) and a significant ethnic premium (PSM)

[Figure: IB: raw labour market status]
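The monthly outcome definition used in this study (in employment, or on benefit, for 15+ days in the month) can be derived from spell data along the following lines. This is a sketch under stated assumptions: the spell dates are invented, and treating both the start and end date of a spell as covered days is an assumption rather than a documented WPLS convention.

```python
import calendar
from datetime import date


def days_in_month_covered(spells, year, month):
    """Count the distinct days of the given month covered by any spell.
    spells: list of (start_date, end_date) pairs, both ends inclusive."""
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    covered = set()
    for start, end in spells:
        # Clip the spell to the month; an empty range adds nothing.
        lo, hi = max(start, first), min(end, last)
        day = lo.toordinal()
        while day <= hi.toordinal():
            covered.add(day)  # a set avoids double counting overlapping spells
            day += 1
    return len(covered)


def monthly_status(spells, year, month, threshold=15):
    """True if the spells cover at least `threshold` days of the month."""
    return days_in_month_covered(spells, year, month) >= threshold


# Invented employment spells for one person.
employment_spells = [
    (date(2004, 1, 20), date(2004, 2, 10)),
    (date(2004, 2, 25), date(2004, 3, 31)),
]

for month in (1, 2, 3):
    print(month, monthly_status(employment_spells, 2004, month))
```

In this invented example January has 12 covered days (not employed), February has 10 + 5 = 15 (employed) and March is fully covered (employed).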

[Figure: IB: overall employment result]
Reliability of matching: CS(0), UC(28) (i.e. reliable according to our criteria)

[Figure: IB: overall benefit result]
Reliability of matching: CS(0), UC(28) (i.e. reliable according to our criteria)

Need other methods to do this properly

  • Using administrative data to analyse this question is very problematic
  • Problem is due to the fact that the Ethnic Minority and White clients accessing the same JCP office are very different in the UK, with the exception of IS and IB recipients
  • Might not be a problem in other countries, but could be...
  • Not a problem with ADMIN data as such; it just can't be used for this question
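The common support problem noted in the findings can be illustrated with a simple diagnostic: the share of one group whose propensity scores fall outside the range of scores observed in the other group. This is a sketch only. The scores are invented, and the min-max rule is one common textbook convention, not necessarily the criterion behind the CS and UC figures reported in the slides.

```python
def off_support_share(group_scores, comparison_scores):
    """Share of `group_scores` lying outside the [min, max] range of
    `comparison_scores` (the min-max common support rule)."""
    lo, hi = min(comparison_scores), max(comparison_scores)
    off = [p for p in group_scores if p < lo or p > hi]
    return len(off) / len(group_scores)


# Invented propensity scores (estimated probability of being in the
# Ethnic Minority group given the X variables).
minority_scores = [0.35, 0.60, 0.82, 0.91, 0.95]
white_scores = [0.05, 0.10, 0.30, 0.55, 0.80]

share = off_support_share(minority_scores, white_scores)
print(share)
```

Here three of the five invented minority scores lie above the highest white score, so no comparable white client exists for them and any estimate for those cases would rest on extrapolation.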