Upload
angelica-austin
View
221
Download
0
Tags:
Embed Size (px)
Citation preview
Program ReviewSupport Tool
Nathan PellegrinResearch Analyst
Goals
Background and purpose of the tool
Demonstration
Cal-PASS update
OLAP Development at Cal-PASS
OLAP Success Story: SSPIRE Cube
Future development
The Program Review Support Tool
• Funded by Hewlett.• Currently being tested by several colleges.• All data is from MIS.• Caveat: figures may not match what is found in locally
produced reports due to differences in master data sources and formulae used to derive figures.
• Like a “data smorgasbord” and includes – student demographics– course grades– TOP code course hierarchy
… the menu will be expanding !
Purpose• Not the product of a mandate or requirement
from the Chancellors office. • Not intended to take the place of local tools.• Not intended to drive evaluation activities.• Intended for use by colleges as an optional
FREE tool in their program review process.• Obtain feedback from users to scale and
improve our data model and OLAP infrastructure.
Cal-PASS Statistics
Over 300,000,000 records
Up to 15 years of data in some regions
Over 7,000 schools, colleges and university members
Over 150 research studies conducted in the last two years
Sixty-six Professional Learning Councils (1,200+ faculty)
Universities (23)CSU .CSU .•Channel Islands •Dominguez Hills •Fresno •Long Beach •Los Angeles •Monterey Bay•Pomona •Sacramento •San Bernardino •San Marcos•Stanislaus •San Francisco•San Jose•Sonoma
UC UC ..•Davis •Merced •Riverside •San Diego •Santa Barbara•Santa Cruz
PRIVATE PRIVATE . .•Otis College of Art and Design •National University •University of the Pacific
Changing the Paradigm:OLAP Applications
• OLAP = On-Line Analytical Processing• Like Excel pivot tables, except Excel handles only two
dimensional data.• Stores pre-computed aggregations of data with B-Tree
indexing for delivering fast retrieval times and fast calculation.• Enables users to perform analysis of data quickly with drag-
and-drop manipulation of variables and dynamic visualization.• Web-based for easy access – all processing is performed on
the server so it does not tie up your work station (zero footprint).
• Big time savings!• Ideal for the action research paradigm and design research.
User Interface - Dundas
OLAP Cube - SSAS
Database(s) – SQL Server
MDX
SQL
3 Layers of the Application
Development Process of the OLAP project is a technical collaboration between IT and Research …
• Server Architecture/O.R. – Alex Zakharenkov (IT)• Submission Processes/User Interface – Nick Wade (IT)• Data Model/ETL – Nathan Pellegrin (Research)• Design/Feedback of OLAP cubes - All IT and Research
Staff, including Terrence Willett and Mary Kay Patton
Development times
• Development of initial Dim Model started in July 2008 … incremental additions/changes congealed into a (basic) model by February, 2009.
• Initial development of Program Review, including feedback and changes ≈ 8 weeks.
• Dim model ETL execution ≈ 15 hours.• Processing of OLAP cube ≈ 20 min./300K rows .• Initial deployment of UI ≈ 3 weeks. Several
changes since then. • UI required tweaks to OLAP cube design.
Development Tools
• .NET• SQL Server
– storage– Integration Services– Analysis Services
• BIDS• Dundas
LA
Object Object RepositorRepositor
yy
UCUC
CCCCCCCCOMISOMIS
K-12K-12
UC
CSU
Private
CSU
UniversitiesUniversities
Cal-PASS Cal-PASS SubmissiSubmissi
onon
ETL
0607,01612590000001,0000179441,U,,12,ABARCA,CARLOS,09091988,M,500,,,,,,,,,,,,,,,,,,,01,15,1,,,,,,N,010,Y,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,275,0,0,4,24,0,0,0,0,0,0,,10032006,X,,,,,,,,,,,,,,,,,,,,,,,N,Y,,,8,,,,,,,,,,0607,01612590000001,0000154281,9107510861,,11,BLACK,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,,,,00,13,1,,,,,,N,000,N,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,302,1,8,6,35,3,15,5,28,4,33,01,10032006,N,340,5,71,12,67,18,90,0,0,7,47,2.5,,,,,,,,,,,N,N,U,72,80,,,,,,,,,,0607,01612590000001,0000159553,U,,11,BOWIE,EARLISHA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,,,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0,0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,,0607,01612590000001,0000161233,U,,
9107510861,,11,BLACK,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,,,,00,13,1,,,,,,N,000,N,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,302,1,8,6,35,3,15,5,28,4,33,01,10032006,N,340,5,71,12,67,18,90,0,0,7,47,2.5,,,,,,,,,,,N,N,U,72,80,,,,,,,,,,0607,01612590000001,0000159553,U,,11,BOWIE,EARLISHA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,,,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0,0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,,0607,01612590000001,0000161233,U,,
CUSTOMFILES
•Analytical•Integrated K-12/CC/Univ•Time-dependent•2NF (Redundant CK)•Optimized Indexing
ETL
DimensionDimensional Modelal Model
•Semistructured data•Format/value Validation
•Storage•Application Integration•Key-value pairs (KVP) design
Cal-Cal-PASS PASS Data Data FlowFlow
What does a dimensional data model do for Cal-PASS?What does a dimensional data model do for Cal-PASS?
UNIFY: Data from across segments is integrated into a unified dataset.STANDARDIZE: Table and field names, data types and value coding systems are standardized to be the same for all segments.
SIMPLIFY: The number of tables and fields used to store the data is reduced. Granularity of tables are at the units of analysis. Table relationships reflect analytical relationships between entities.
BOOST PRODUCTIVITY: The simpler, cleaner data model makes it easier to develop cubes with re-usable components, generalized for all segments. Currently, analytical data processing must be developed separately for each segment. Using a dimensional model, only one pathway needs to be developed that applies to all segments.
IMPROVE DATA QUALITY: Merging data brings data quality issues to light so they can be noted and/or resolved. Establishing primary and foreign key relationships enforces referential integrity. Multiple student identifiers are unified to produce a single “metakey” Missing course CBEDS classifications imputed using machine learning.
REDUCE RISK: Without it, in order to produce one metric for all segments separate analytical data processing pathways are required for each segment, which means more maintenance and increased risk of inconsistent results. Using a dimensional model the analytical computations and services are centralized.
Organization
Organization
StudentStudent CourseCourse
TermTerm
Course Taxonomy
Course Taxonomy
Student StatusStudent Status
Course Outcome
Course Outcome
AwardAward
Cal-PASS Unified Cal-PASS Unified Dimensional Data ModelDimensional Data Model
(Selected Tables)(Selected Tables)
= Fact Table
= Dimension Table
= Foreign Key Relationship
Dimensional
Model Tables
Dimensional
Model Tables
The Ideal: Centralization of analytical query processing
Each statistic can emerge at multiple presentation points, but there is only one logical control point.
Views and Stored Proc’s
OLAP
Presentation & User Engagement
User-defined cohorts; model outputs
Student identifiers from each source system are mapped to a new identifier through transitive closure of all connected values (using a modified version of the Floyd–Warshall algorithm).
Local district student id
CSIS SSID
Name + gender + DOB
CCCCO SID (SB00)
n1d
1
d2
n2
o1
c1
d3
m1
Each edge represents a record linking two values of different identifiers in submitted student records.
OLAP Success Story: SSPIRE Cube
• Funded by Irvine Foundation.• Currently used by nine colleges.• Incorporates MIS data with data submitted by colleges
(custom files).• Tracks cohorts of students.• Demonstrate using Merced college (thank you Dr. Duran!)
Program Review Support Tool
This is only the beginning…
• Provide access to K-12 districts and Universities• Inter-segmental OLAP Cubes• Link non-academic outcomes (Employment
Development Department, Child Welfare Data System)
“Success at Every Level”
Education Data and Information Act of 2008SB 1298
1. convene a high-level working group to decide the best the governance structure for the comprehensive education data system;
2. directs the State Chief Information Office (CIO), in consultation with educators and education policymakers, to prepare a strategy plan outlining a clear path for technical implementation; and
3. requires the various education segments to begin using a common student identifier, so that once a governance structure and technical architecture are in place we can begin linking records from pre-k through the university with relative ease and speed.
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
23
CDE Data Systems
High level cross-agency systems map of key collections WORKING DRAFT
NOT EXHAUSTIVE
CASAS TOPS Pro
SACS
Assessments
CALPADS**
CASEMIS
Migrant
ConApps
AYP/API
Early Childcare
CALTIDES**
CPEC
CALPASS
UC CSS
EDD
CSU ERS CCC COMIS
CCTC CASE
Other CDE systems/ units including CDS, Charter schools
Other CDE units including Homeless, CALSAFE, Title 3,Private Schools etc.
Non CDE Data Systems
Prisons, Census
From Franchise tax, benefits system etc.
National Student Clearinghouse
Data sharing through local agencies
Direct data sharing*
Planned/ potential
* Does not imply direct data linkages. Only state system linkages shown
** CALPADS is envisioned to replace much of the CBEDS, Language Census, Student National Origin Report and select Consolidated Application data
*** CALTIDES is envisioned to collect data primarily from CALPADS and Commission on Teacher Credentialing’s CCTC’s Credential Automation System Enterprise CASE system
Source: Interviews with respective agencies, RAND, team analysis
In development
Existing
CDPH
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
24
High level system profiles of key CDE collections (1/2)System name Description
CALPADS California Longitudinal Pupil Achievement Data System. System (under development) for tracking K12 students longitudinally, that will replace CBEDS collections
Key identifier Data categories Granularity Data sharing
SSID Student demographic, program participation, grade level, enrollment, course enrollment and completion, discipline, and statewide assessment
Student Planned include- Assessments, API/AYP, Migrant, ConApps,
CALTIDES California Longitudinal Teacher Integrated Data Education System. Iintegrated data system for teacher data based on unique SEID
SEID Teacher credentials, authorizations, teacher participation program, alternative routes, participation in Beginning Teacher Support and intern program, SEID, Salary
Student Planned include- CALPADS, CCTC CASE
CASEMIS California Special Education Management Information System. Integrated data system for special education students on students, services and provider programs
SSID Attendance/Enrollment, Disciplinary, Education Agency, Mobility, Special Education, Staffing Data, Student Demographic, Other (services, age, gender, race/ethnicity)
Student, School district, School, county, region
None at state level
Assessments California High School Exit Exam CAHSEE, Standardized Testing and Reporting STAR and CELDT
SSID Attendance/Enrollment, Education Agency, Food and Nutrition, Parent Data, Special Education, Student Demographic
Student, School District, School, County
CASAS, Migrant, AYP/API, CALPADS (planned)
API/ AYP Accountability related information based on California's Public Schools Accountability Act of 1999 as well as No Child Left Behind Act of 2001
CDS code AYP/API score by student characteristics School Assessments
DRAFT
Source: Respective CDE departments
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
25
High level system profiles of key CDE collections (2/2)System name Description
Migrant Student enrollments in migrant education programs. Includes migrant education forms and a directory of offices providing services
Key identifier Data categories Granularity Data sharing
Migrant ID, COE number, CDS code
Student demographics, educational programs, counseling, health and support services, emergency health, clothing, food, transportation
Student Assessments, CALPADS (planned)
SACS Standardized Account Code Structure. Offers LEAs with a means of reporting financial information
CDS code For every general ledger accounting transaction- information on funds, resources, project year, goal, function, and object. Includes information on Attendance/Enrollment, Education Agency, Fiscal, Transportation
School, District CDS, Charter schools
ConAPPS Consolidated applications. Includes information on categorical programs e.g., Title I, II, V etc.
CDS code Student demographic, Title I, III, V, Part A, Immigrant, LEP, funding model, charter status, Gradespan, participants
School, District, County
CALPADS (planned)
Early Childcare Systems
CD-801A,B, CDMIS, Special Education Desired Result System SEDRS, and CD 9600
SSID Child demographics, IEP flag, family identification/case number, household name, type of program, DRDP Desired Result Development Profile, Early Childhood Environment Rating Scale ECERS
Student None
CASAS TOPSPro
Comprehensive Adult Student Assessment Systems. System for tracking Students in Adult Education Programs
ADA ID, SSID, CASAS no
Student demographics, Agency, instruction level and program, assessment scores, date of entry, reason for exit, class number, attainable goal within program year
Student Assessments
DRAFT
Source: Respective CDE departments
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
26
High level system profiles of non-CDE collectionsSystem name Description
CPEC California Post Secondary Education Commission. Data system for Higher Ed- post secondary systems
Key identifier Data categories Granularity Data sharing
Student ID based of SSN
Demographic, IEP, grade level, program, Graduation rate, teacher, institution
Student CDE, CSU, UC,CCC, prison, census
UC CSS Corporate Student System provides information on student enrollment and performance for University of California campuses
SSN Student demographic, income, financial aid, education history, assessment
Student CDE, CCC, CALPASS
DRAFT
CCC COMIS California Community Colleges Management Information System. COMIS data is used to prepare reports for Federal and State reports including Integrated Postsecondary Education Data System (IPEDS) and to track student outcomes
SSN, Student ID student demographic, income, financial aid, education history, assessment, teacher, institution
Student CALPASS, CPEC, CSU, EDD, National Student Clearinghouse
CSU ERS Enrollment RecordingSystem is used by Cal State totrack student retention and graduation to support regular term reports, IPEDS, and state budget requests
SSN Student demographic, financial aid, education history, assessment
Student CPEC, CALPASS, CCC
CDPH California Department of Public Health. System use to track
CDPH ID Case ID and demographics, clinical and diagnostic data
Case None
EDD Employment Development Database ID based of SSN Wages, payroll taxes, unemployment tracking, job matching, job training
Employee Franchise tax, benefits system, CCC
Source: Respective agencies, RAND
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
HAPPY DATA! Data Data
Thank you! Have fun ….