Student Centered ODS ETL Processing



Insert

Search for rows not previously in the database within a snapshot type for a specific subject and year
• Check for duplicates
• Identify test
• Look up metadata
• Create Unique Test Event identifiers
• Load data
• Copy Previous Year Test details


How the effective date is determined before / after noon

In these examples, a load that starts before noon is effective the same day, and a load that starts after noon is effective the next day.

Source Data (≠: no matching row in the ODS yet)

PADMID         SnapshotFg  UpdtDt                   EndDt  SMFConfig_ID
20179458T1008  F           2008-11-21 09:21:36.117  NULL   134

Event, when the current date and time is 11/21/2008 10:15AM when load starts

ID      UnqTstEvent_ID  EffDt                    EndDt  SMFConfig_ID
567917  438502          2008-11-21 00:00:00.000  NULL   134

Event, when the current date and time is 11/21/2008 2:30PM when load starts

ID      UnqTstEvent_ID  EffDt                    EndDt  SMFConfig_ID
567917  438502          2008-11-22 00:00:00.000  NULL   134

In some rare cases, a row to be inserted has already been end dated. In that case it is added to the database and is effective for one day.

ID      UnqTstEvent_ID  EffDt                    EndDt                    SMFConfig_ID
567917  438502          2008-11-22 00:00:00.000  2008-11-22 23:59:59.000  134
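The before / after noon rule can be sketched in a few lines of Python. This is only an illustration inferred from the example rows on the slides, not the ETL's actual code: the function names are hypothetical, and treating a load that starts at exactly 12:00 as "after noon" is an assumption.

```python
from datetime import datetime, time, timedelta


def effective_date(load_start: datetime) -> datetime:
    """Midnight of the load day before noon, midnight of the next day otherwise."""
    day = load_start.date()
    # Assumption: a load starting exactly at 12:00 counts as "after noon".
    if load_start.time() >= time(12, 0):
        day += timedelta(days=1)
    return datetime.combine(day, time.min)


def prior_row_end_date(load_start: datetime) -> datetime:
    """End date for the row being replaced: one second before the new EffDt."""
    return effective_date(load_start) - timedelta(seconds=1)


def one_day_window(load_start: datetime) -> tuple:
    """EffDt and EndDt for a row that arrives already end dated."""
    eff = effective_date(load_start)
    return eff, eff + timedelta(days=1) - timedelta(seconds=1)


print(effective_date(datetime(2008, 11, 21, 10, 15)))  # 2008-11-21 00:00:00
print(effective_date(datetime(2008, 11, 21, 14, 30)))  # 2008-11-22 00:00:00
```

The same helper covers the update and soft delete steps, since the slides state that their end dates are determined the same way.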


Update

Search for changes to rows already in the ODS from Staging within a snapshot type for a specific subject and year that occurred since the last run of the update ETL
• No need to check for duplicates
• Identify test
• Look up metadata
• Load data
  – Determine how to apply the update
      • Normally, end date the prior version of the row and insert the new row
      • On some occasions, rows that had been previously end dated may be reintroduced to the ODS
      • Sometimes the only action is to end date the current row
• Copy Previous Year Test details


How the effective date is determined before / after noon

Source Data (=: matches a row already in the ODS)

PADMID         SnapshotFg  UpdtDt                   EndDt  SMFConfig_ID
20179458T1008  F           2008-11-21 09:21:36.117  NULL   134

Event, when the current date and time is 12/15/2008 10:15AM when load starts

                     ID      UnqTstEvent_ID  EffDt                    EndDt                    SMFConfig_ID  FNm
In ODS               567917  438502          2008-11-21 00:00:00.000  2008-12-14 23:59:59.000  134           Bob
Change from Staging  627423  438502          2008-12-15 00:00:00.000  NULL                     134           Robert

Event, when the current date and time is 12/15/2008 2:30PM when load starts

                     ID      UnqTstEvent_ID  EffDt                    EndDt                    SMFConfig_ID  FNm
In ODS               567917  438502          2008-11-21 00:00:00.000  2008-12-15 23:59:59.000  134           Bob
Change from Staging  627423  438502          2008-12-16 00:00:00.000  NULL                     134           Robert


Update Logic

• Read updated row from Staging
• Read current record from ODS
• If row in ODS is end dated
  – Add new row to ODS
• If row in ODS is not end dated
  – Compare columns Staging <> ODS
  – If the columns are different
      • End date ODS row
      • Add Staging row to ODS
  – If the columns are the same and the end date in the ODS row is not set and the end date from the Staging record is set
      • End date ODS row
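The decision tree above can be written as a small function. The sketch below is a hedged illustration rather than the ETL's implementation: the function name, the action labels, and the representation of a row as a dictionary of compared columns plus a separate end date are all assumptions.

```python
from typing import Optional


def plan_update(staging_cols: dict, staging_end_dt: Optional[str],
                ods_cols: dict, ods_end_dt: Optional[str]) -> list:
    """Return the ordered actions the update step would take for one row."""
    if ods_end_dt is not None:
        # The ODS row is already end dated: the Staging row is reintroduced.
        return ["add_staging_row"]
    if staging_cols != ods_cols:
        # Columns differ: close the current version, then add the new one.
        return ["end_date_ods_row", "add_staging_row"]
    if staging_end_dt is not None:
        # Same columns, but Staging says the row has ended.
        return ["end_date_ods_row"]
    # Nothing changed: no action (hypothetical default, not stated on the slide).
    return []


print(plan_update({"FNm": "Robert"}, None, {"FNm": "Bob"}, None))
# ['end_date_ods_row', 'add_staging_row']
```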


Delete (soft)

• For the subject / year table, search for rows in the ODS that are no longer in Staging
• The examinee table is not checked for end dated rows; only the subject / year table is used to determine if a delete is needed per Assessment
• A cascaded update is performed, end dating the entire test event and all related rows from the following tables:
  – Event
  – EventInst
  – EventInd
  – BnchLvl
  – Score
  – RaterScore
  – PaperPencilData
  – CmptrBasedData
  – EventClsRm
• The end date is determined the same way as in the insert and update ETL
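A cascaded soft delete of this kind might look like the following Python sketch, using the standard sqlite3 module as a stand-in database. The table names come from the slide; that every table carries UnqTstEvent_ID and EndDt columns, and the function name itself, are assumptions made for illustration.

```python
import sqlite3

# Table names are from the slide. The UnqTstEvent_ID and EndDt columns on
# every table are an assumption of this sketch.
CASCADE_TABLES = [
    "Event", "EventInst", "EventInd", "BnchLvl", "Score",
    "RaterScore", "PaperPencilData", "CmptrBasedData", "EventClsRm",
]


def soft_delete(conn: sqlite3.Connection, unq_tst_event_id: int, end_dt: str) -> int:
    """End date every current row of one test event; return the rows touched."""
    touched = 0
    for table in CASCADE_TABLES:
        cur = conn.execute(
            f"UPDATE {table} SET EndDt = ? "
            "WHERE UnqTstEvent_ID = ? AND EndDt IS NULL",
            (end_dt, unq_tst_event_id),
        )
        touched += cur.rowcount
    conn.commit()
    return touched
```

Rows that are already end dated are left alone, so running the step twice for the same test event changes nothing the second time.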


The effects of source tables on Student Centered tables

Source: Examinee, Subject / Year Table
ODS: Event, EventInst, EventInd, BnchLvl, Score, RaterScore, PaperPencilData, CmptrBasedData, EventClsRm

Not all tables are loaded for all tests
• Score, Benchmark Level and Rater Score are not loaded for Virtual tests
• Presently only Writing has Rater Scores (possibly ELPA in the future)
• Paper / Pencil tests load to Paper Pencil Data and Event Class Room
• Computer based tests load to Computer Based Data
• Writing has only total Benchmark Level scores while other subjects have Benchmark Level scores at the category (aka strand) level

Some tables are only loaded if there are values present
• Only non-blank scores are loaded to the Benchmark Level table
• Only non-null scores are loaded to the Score and Rater Score tables
• Only non-null institution identifiers are loaded to the Event Institution table
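The table-level rules above can be summarized in a short function. This is a sketch under stated assumptions: the function name, the "paper" / "computer" delivery labels, and the idea that Event, EventInd and EventInst are candidates for every test are not stated on the slides. The value-level rules (non-blank and non-null checks) are not modeled here.

```python
def tables_for_test(subject: str, delivery: str, is_virtual: bool) -> set:
    """Return the ODS tables a test event may load to.

    delivery is "paper" or "computer" (hypothetical labels).
    """
    # Assumption: these three tables are candidates for every test event.
    tables = {"Event", "EventInd", "EventInst"}
    if delivery == "paper":
        tables |= {"PaperPencilData", "EventClsRm"}
    elif delivery == "computer":
        tables.add("CmptrBasedData")
    if not is_virtual:
        # Score, Benchmark Level and Rater Score are skipped for Virtual tests.
        tables |= {"Score", "BnchLvl"}
        if subject == "Writing":
            # Presently only Writing has Rater Scores.
            tables.add("RaterScore")
    return tables


print(sorted(tables_for_test("Writing", "paper", False)))
```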


Example of applying updates

Insert (Source: Examinee and Subject / Year Table). Row is inserted:
• Event            Eff 5/1/2008  End Null
• BnchLvl          Eff 5/1/2008  End Null
• EventClsRm       Eff 5/1/2008  End Null
• PaperPencilData  Eff 5/1/2008  End Null
• EventInd         Eff 5/1/2008  End Null
• EventInst        Eff 5/1/2008  End Null
• Score            Eff 5/1/2008  End Null
• RaterScore       N/A
• CmptrBasedData   N/A

Update to Subject / Year table (Source: Subject / Year Table). Changes to an Institution and an indicator:
• Update: end date current rows
  – EventInst  Eff 5/1/2008  End 5/5/2008
  – EventInd   Eff 5/1/2008  End 5/5/2008
• Insert: new current rows with a null end date
  – EventInst  Eff 5/6/2008  End Null
  – EventInd   Eff 5/6/2008  End Null

Update to Subject / Year table (Source: Subject / Year Table). Changes to student demographic data:
• Update: end date current row
  – Event  Eff 5/1/2008  End 5/8/2008
• Insert: new current row with a null end date
  – Event  Eff 5/9/2008  End Null


Maintenance

• Occasionally there may be a need to make corrections or to reload portions of the ODS
• After a few years there may also be a need to remove some lower level of detail from the ODS
• Whenever maintenance is needed, clients will be told what changes are coming, along with some suggestions on how to deal with those changes


How to store the Extracted ODS Data

• It is recommended that the extract layout be used as a guide for the staging database
• A data model is available in this format for your use
• The model is in Power Designer 12.5 and available in HTML format for review


Considerations for your local ODS

• The ID column from each table is sufficient for a primary key
• Columns ending in _ID are foreign keys from other tables and should be indexed
• The SMFConfig_ID can be used for vertical partitioning of the Core Content data
• Additional indexes are in the data model and you may want to tune and add more based on your needs


Loading data to your local ODS

• If you follow the suggested database design, the process of loading data is simple
  – On your schedule you will receive data
      • Rows that match by ID are updated
      • Rows not found by matching on ID are inserted
• The Security tables are replaced in full on each load
• That's all there is to keeping your local ODS up to date.
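The match-by-ID load can be sketched as an update followed by an insert when nothing matched. The example below uses Python's sqlite3 module and a cut-down Event table with only the columns shown on the slides; the function name and the row format (one dictionary per extracted row) are assumptions, and a real load would cover every column in the extract layout.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, rows: list) -> None:
    """Update rows that match by ID; insert the rows that do not."""
    for row in rows:
        cur = conn.execute(
            "UPDATE Event SET UnqTstEvent_ID = ?, EffDt = ?, EndDt = ? WHERE ID = ?",
            (row["UnqTstEvent_ID"], row["EffDt"], row["EndDt"], row["ID"]),
        )
        if cur.rowcount == 0:
            # No row with this ID yet, so it is new to the local ODS.
            conn.execute(
                "INSERT INTO Event (ID, UnqTstEvent_ID, EffDt, EndDt) VALUES (?, ?, ?, ?)",
                (row["ID"], row["UnqTstEvent_ID"], row["EffDt"], row["EndDt"]),
            )
    conn.commit()
```

Because an end dated row keeps its ID, an incremental extract that carries the closed version of a row and its replacement updates the first and inserts the second.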


How to retrieve data from your local ODS

• Generally there are two types of queries
  – Current data
  – Data as of a date
• For current data, select where the end date is null

    select *
    from Event
    where UnqTstEvent_ID = 500
      and EndDt is null

• For data as of a date, a query such as this will work

    select *
    from Event
    where EffDt <= '2008-11-19 10:57:00.000'
      and (EndDt is null or EndDt >= '2008-11-19 10:57:00.000')
      and UnqTstEvent_ID = 500
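As a worked example, the two query types can be run against the Bob / Robert rows from the update example earlier in the deck. The sketch uses Python's sqlite3 module with dates stored as text in a fixed format so that string comparison matches date order; the cut-down table, the function names, and the pairing of Bob with the older row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Event (ID INTEGER PRIMARY KEY, UnqTstEvent_ID INTEGER, "
    "EffDt TEXT, EndDt TEXT, FNm TEXT)"
)
conn.executemany("INSERT INTO Event VALUES (?, ?, ?, ?, ?)", [
    (567917, 438502, "2008-11-21 00:00:00.000", "2008-12-14 23:59:59.000", "Bob"),
    (627423, 438502, "2008-12-15 00:00:00.000", None, "Robert"),
])


def current_rows(conn, unq_tst_event_id):
    """Current data: the row whose end date is null."""
    return conn.execute(
        "select ID, FNm from Event where UnqTstEvent_ID = ? and EndDt is null",
        (unq_tst_event_id,),
    ).fetchall()


def rows_as_of(conn, unq_tst_event_id, as_of):
    """Data as of a date: the row whose effective period contains as_of."""
    return conn.execute(
        "select ID, FNm from Event "
        "where EffDt <= ? and (EndDt is null or EndDt >= ?) and UnqTstEvent_ID = ?",
        (as_of, as_of, unq_tst_event_id),
    ).fetchall()


print(current_rows(conn, 438502))                           # [(627423, 'Robert')]
print(rows_as_of(conn, 438502, "2008-12-01 12:00:00.000"))  # [(567917, 'Bob')]
```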


Formal and Inferred Registration System (FIRS)

• FIRS makes it possible for ODE to give the proper data for each of the students served by your clients
• Spring and Fall Membership, all state assessments and any student transfers through OSTX since 2004-2005 are used by FIRS to form a chronology of which institutions a student was related to and when
• This information is used by the extract process to provide the most complete data possible
• In the coming year the new Consolidated ADM data collection will begin providing information to this process as well
• The tables FIRS and DistUnqTstEvent provided in your extract are taken from this system


Extract Process

• Regions must provide a list of districts in order to receive the extract
• The ODE Helpdesk will set up the relationships between the Region and their districts
• The first extract will be a full extract
• The next extract will change to an incremental extract automatically
• An incremental extract includes changes for continuing students plus full extracts for students new to the client districts
• Each time an extract is performed, the last extract date in the Region's configuration is set
• The next extract will contain all changes since the last extract date


Extract Process (continued)

• Data is extracted into the formats specified by the StudentCenteredExportFormat.xls
• One file is produced per table described in the format
• The files are in CSV format with text delimited by quotes
• Files will only be produced if there are rows qualifying for the extract
• The file Manifest.txt contains a list of the files extracted, each with its count of rows and when it was produced
• The CSV files and the Manifest.txt are compressed into a .zip file and placed on ODE's secure FTP site for pickup by the Region


File Transfer and Scheduling

• When the districts the Region serves are communicated to the ODE Helpdesk, ODE will also make sure security is set up for connecting to the secure FTP server
• Instructions for connection will be provided
• The ODE Helpdesk can schedule which days the Region will receive files
• The same scheduling system is already in use for extracting data from Student Centered Staging


Requested extracts

• At times it may be necessary for the Region to receive a full extract to repopulate your local ODS, or the Region may have missed some extracts produced previously
  – Full extracts can be scheduled by the ODE Helpdesk
      • The number that can be done in one day is limited
      • Full extracts may have to wait until the weekend
      • Full extracts will not be provided on a regular basis
  – For missed extracts, the ODE Helpdesk can set the date as of which the extract will pull information and provide a larger incremental extract
      • The date as of which the extract will pull information is cleared after the run of the extract
      • Processing will return to normal automatically


Limiting Access

• Each extract will provide a new copy of the security files (FIRS and DistUnqTstEvent)
• The FIRS file contains the district related to the student with the end date provided for review if needed
• The DistUnqTstEvent relates the district to the specific test events available to that district
• By simply joining through this table when providing access to your clients you can restrict access to just the information they are allowed to access

    Declare @DistInstID int
    Set @DistInstID = 2082

    select e.*
    from Event e
    join DistUnqTstEvent d
      on e.UnqTstEvent_ID = d.UnqTstEvent_ID
     and d.DistInstID = @DistInstID


Data Model

• Follows the same definitions provided by the StudentCenteredExportFormat.xls spreadsheet
• Organized around the document as well, with color coding for the same major subject areas as the spreadsheet:
  – Core Content Data
  – Score Data
  – File Processing Control Data
  – Metadata
  – Security
  – plus Possible changes
• The model shows how the tables are related to each other and provides useful information about the data
• The model is in Power Designer 12.5; an HTML version that also contains the table create statements is available for review


Pulling it all together

• The SMFConfig table is essentially a link to the subject and year that the data belongs with
• Most of the metadata is contained in a Table of Tables / Master Codes scheme which houses virtual tables and related code values
• Each row in the Table of Tables represents a virtual table
• Rows in the Master Codes table relate back to the Table of Tables and represent the values stored in each virtual table
• The Ctgry table is used to indicate the score reporting category (aka strand) related to score data
• The Ctgry table also contains entries for total scores
• Review samples


Student Centered ODS

Thanks for coming!