6TH INTERNATIONAL CONGRESS ON MEASUREMENT AND
EVALUATION IN EDUCATION AND PSYCHOLOGY
September 5-8, 2018
Prizren, KOSOVO
Responsibility for the contents of this book rests with the authors of the papers.
September, 2018
INTRODUCTION
Paper presentations, conferences, and panel discussions were held at the 6th International
Congress on Measurement and Evaluation in Education and Psychology at Prizren University in Kosovo.
The Congress was first hosted by Ankara University in 2008. This year it took on an international
character, since it was organized abroad for the first time, in Kosovo.
Among the participants of the Congress were Ministry of Education representatives from
Turkey, academics working at various universities in Turkey, testing and assessment
professionals, and graduate students. In total, 122 researchers participated in the Congress.
The opening of the Congress, held in cooperation with Prizren University of Kosovo, was attended
by a large protocol delegation consisting of the Kosovo Turkish Democratic Party MP, the Mayor of
Prizren, the Turkish Consulate in Prizren, the Rector of Prizren University, the Dean and the Vice
Dean of the Faculty of Education, the representative of the Kosovo Yunus Emre Institute, and the
representative of the Kosovo Military Representative Committee. The members of the protocol stated
that they were happy to welcome guests from Turkey and emphasized the importance of education and
cooperation. The keynote speakers of the Congress were the Hasan Kalyoncu University Vice Rector,
the Dean of the Faculty of Education, and the Chairman of the Board of Directors of EPODDER
(Association of Educational and Psychological Assessment and Evaluation).
A total of 12 faculty members working in 7 different countries served on the scientific
committee of the congress, which was organized by Prizren University, Hasan Kalyoncu University,
and EPODDER. Two invited speakers attended the congress. Prof. Dr. Terry Ackerman gave a lecture
titled "Using Unidimensional Models to Represent a Two Dimensional Latent Space", and Jimmy De La
Torre gave a lecture titled "Strategies for Addressing High Dimensional Cognitive Diagnostic
Assessment Problems".
A total of 105 papers on large-scale tests were presented in 28 different sessions. Most
of them focused on the preparation, scoring, administration, and evaluation of tests. These studies
drew on international large-scale tests such as PISA and TIMSS, as well as national
large-scale tests such as TEOG and SBS. The presentations were given in two languages, English
and Turkish. In a panel, the problems experienced in the transition between grades in Turkey,
and possible solutions to them, were discussed.
We express our gratitude to Pegem Academy, Nobel Publishing House, the Yunus
Emre Institute, TIKA, and the Ministry of National Education, Evaluation and Examination Services.
We also thank the members of the Congress Organizing Committee and the members of the
Scientific Committee, who contributed to the organization of our congress.
Co-Presidents of the Congress Organizing Committee
Prof. Şener Büyüköztürk Prof. Selahattin Gelbal
CONGRESS HONORARY MEMBERS
Prof. Dr. Tamer Yılmaz, Hasan Kalyoncu University
Prof. Dr. Ram Vataj, Prizren University
CO-CHAIRS OF THE CONGRESS ORGANIZATION BOARD
Prof. Dr. Şener Büyüköztürk, Hasan Kalyoncu University
Prof. Dr. Selahattin Gelbal, Hacettepe University
CONGRESS ORGANIZATION BOARD
Assoc. Prof. Ismet Temaj, Prizren University
Assistant Prof. Shemsi Morina, Prizren University
Assoc. Prof. İsmail Karakaya, Gazi University
Assoc. Prof. Alper Köse, Bolu Abant Izzet Baysal University
Assistant Prof. Ufuk Akbaş, Hasan Kalyoncu University
Assistant Prof. Ersoy Karabay, Hasan Kalyoncu University
Assistant Prof. Soner Yıldırım, Prizren University
Meral Alkan, PhD, Ministry of National Education
Serdan Kervan, PhD, Prizren University
SCIENCE COMMITTEE*
Ali Baykal, Bahcesehir University
Cindy M. Walker, Duquesne University
Hülya Kelecioğlu, Hacettepe University
Ismet Temaj, Prizren University
Jimmy De La Torre, The University of Hong Kong
Kadriye Ercikan, University of British Columbia
Mehtap Çakan, Gazi University
Ron Hambleton, University of Massachusetts Amherst
Selahattin Gelbal, Hacettepe University
Şener Büyüköztürk, Hasan Kalyoncu University
Terry Ackerman, The ACT
Zekeriya Nartgün, Abant İzzet Baysal University
*Sorted in alphabetical order.
SECRETARY
Ayfer Sayın, PhD, Gazi University
Research Assistant Merve Yıldırım Seheryeli, Hasan Kalyoncu University
WEB DESIGN AND DEVELOPMENT
Software Developer Turul Trpan, Hasan Kalyoncu University
REVIEWERS/REFEREES*
Adnan Erkuş, PhD.
Ahmet Salih Şimşek, PhD.
Ahmet Turhan, PhD.
Ahmet Yıldırım, PhD.
Asiye Şengül Avşar, PhD.
Ayfer Sayın, PhD.
Aylin Albayrak Sarı, PhD.
Ayşegül Altun, PhD.
Bahar Şahin Sarkın, PhD.
Bayram Bıçak, PhD.
Bilge Gök Bekci, PhD.
Burak Aydın, PhD.
Burhanettin Özdemir, PhD.
C. Deha Doğan, PhD.
Cem Oktay Güzeller, PhD.
Çiğdem Akın Arıkan, PhD.
Deniz Tuğçe Özmen, PhD.
Derya Çakıcı Eser, PhD.
Derya Çobanoğlu Aktan, PhD.
Devrim Özdemir Alıcı, PhD.
Dilara Bakan Kalaycıoğlu, PhD.
Durmuş Özbaşı, PhD.
Duygu Anıl, PhD.
Duygu Gizem Saral Ertoprak, PhD.
Duygu Koçak, PhD.
Elif Bengi Ünsal Özberk, PhD.
Emine Burcu Pehlivan Tunç, PhD.
Emine Önen, PhD.
Emrah Gül, PhD.
Emre Toprak, PhD.
Eren Can Aybek, PhD.
Eren Halil Özberk, PhD.
Ersoy Karabay, PhD.
Esin Tezbaşaran, PhD.
Esin Yılmaz Koğar, PhD.
Esra Eminoğlu Özmercan, PhD.
Ezel Tavşancıl, PhD.
Ezgi Mor Dirlik, PhD.
Fatih Kezer, PhD.
Fazilet Taşdemir, PhD.
Fethi Toker, PhD.
Funda Nalbantoğlu Yılmaz, PhD.
Gizem Uyumaz, PhD.
Güçlü Şekercioğlu, PhD.
Glin rak Uzun, PhD.
Gülden Kaya Uyanık, PhD.
Gülşen Taşdelen Teker, PhD.
Hakan Atılgan, PhD.
Hakan Koğar, PhD.
Hakan Yavuz Atar, PhD.
Halil Yurdugül, PhD.
Hamide Deniz Gülleroğlu, PhD.
Hatice Kumandaş, PhD.
Hülya Kelecioğlu, PhD.
Hülya Yürekli, PhD.
İ. Alper Köse, PhD.
İlhan Akhun, PhD.
İlker Kalender, PhD.
İsmail Karakaya, PhD.
Kadriye Belgin Demirus, PhD.
Kübra Atalay Kabasakal, PhD.
Levent Yakar, PhD.
Lokman Akbay, PhD.
Mehtap Çakan, PhD.
Melek Gülşah Şahin, PhD.
Meltem Acar Güvendir, PhD.
Meral Alkan, PhD.
Murat Akyıldız, PhD.
Murat Doğan Şahin, PhD.
Münevver Başman, PhD.
Nagihan Öztürk Boztunç, PhD.
Neşe Güler, PhD.
Neşe Öztürk Gübeş, PhD.
Nilüfer Kahraman, PhD.
Nizamettin Koç, PhD.
Nükhet Demirtaşlı, PhD.
Okan Bulut, PhD.
Ömer Kutlu, PhD.
Ömür Kaya Kalkan, PhD.
Önder Sünbül, PhD.
Özge Bal Altıntaş, PhD.
Özge Bıkmaz Bilgen, PhD.
Ragıp Terzi, PhD.
Recep Serkan Arık, PhD.
Sakine Göçer Şahin, PhD.
Satılmış Tekindal, PhD.
Seçil Ömür Sünbül, PhD.
Sedat Şen, PhD.
Seher Yalçın, PhD.
Selahattin Gelbal, PhD.
Sema Sulak, PhD.
Sevda Çetin, PhD.
Şener Büyüköztürk, PhD.
Şeref Tan, PhD.
Şeyma Yüksel, PhD.
Tülin Acar, PhD.
Türker Toker, PhD.
Ufuk Akbaş, PhD.
Ümit Çelen, PhD.
Yaşar Baykul, PhD.
Yavuz Akpınar, PhD.
Yeşim Özer Özkan, PhD.
Yurdagül Günal, PhD.
Zekeriya Nartgün, PhD.
*Sorted in alphabetical order.
CONTENTS
INTRODUCTION ........................................................................................................................................ ii
CONGRESS HONORARY MEMBERS .................................................................................................... iii
CO-CHAIRS OF THE CONGRESS ORGANIZATION BOARD ............................................................ iii
CONGRESS ORGANIZATION BOARD .................................................................................................. iii
SCIENCE COMMITTEE* .......................................................................................................................... iii
SECRETARY .............................................................................................................................................. iii
WEB DESIGN AND DEVELOPMENT .................................................................................................... iv
REVIEWERS/REFEREES* ....................................................................................................................... iv
CONTENTS ............................................................................................................................................... vii
SUMMARY ..................................................................................................................................................1
AN ANALYSIS ON THE MODEL FIT INDICES IN A SCALE CONSTRUCTION PROCESS .............2
KOMPOZSYON PUANLAMADA PUANLAYICILAR ARASI GVENRLK BELRLEME
YNTEMLERNN KARILATIRILMASI .............................................................................................7
INVESTIGATION of SUBGROUP INVARIANCE OF EQUATING FUNCTIONS .................................8
THE COMPARISON OF ITEM AND ABILITY ESTIMATIONS CALCULATED FROM THE
PARAMETRIC AND NON-PARAMETRIC ITEM RESPONSE THEORY ACCORDING TO THE
SEVERAL FACTORS ................................................................................................................................ 12
A COMPARISON OF KERNEL EQUATING METHODS UNDER NEAT DESIGN ............................ 16
MADDE TEPK KURAMI SMLASYONLARININ YAPI GEERL VE GVENRLK
AISINDAN NCELENMES ................................................................................................................... 17
ANALYSING TEACHERS VIEWS ON THE PROBABLE NEGATIVE EFFECTS OF HIGH STAKE
TESTS THROUGH PAIRED COMPARISONS BASED ON RASCH ANALYSIS ................................ 22
DORULAYICI FAKTR ANALZ VE OK DZEYL MODELLEME ZERNE BR
KARILATIRMA: DENZL L RNE ............................................................................................. 29
APPLICABILITY OF THE THEORETICAL EXAM FOR CANDIDATES OF DRIVING LICENCE
TRAINEES AS THE COMPUTER ADAPTIVE TEST* ........................................................................... 36
THE IMPACT OF COMPLEX MULTIPLE CHOICE ITEMS ON ITEM DIFFICULTY ........................ 41
COMPARING PERFORMANCE OF DIFFERENT EQUATING METHODS in PRESENCE and
ABSENCE of DIF ITEMS in ANCHOR TEST .......................................................................................... 42
DINA MODEL EREVESNDE DIF BELRLENMES ........................................................................ 43
CNSYETN PROBLEML NTERNET KULLANIMI ZERNDEK ETKS: BR META-ANALZ
ALIMASI ............................................................................................................................................... 46
TRK RENCLERN BLG VE LETM TEKNOLOJLERNE ERM VE KULLANIM
DZEYLERNN PISA FEN BAARISINA ETKS ............................................................................... 47
PISA FEN MADDELERNN FeTeMM ETMNN ZELLKLER AISINDAN NCELENMES
..................................................................................................................................................................... 62
COMPARING THE PERFORMANCE OF DATA MINING METHODS IN CLASSIFYING
SUCCESSFUL STUDENTS WITH SCIENTIFIC LITERACY IN PISA 2015* ...................................... 68
AKADEMK YILMAZLIK LENN TRKEYE UYARLAMA ALIMASI ............................. 76
TEST EQUATING WITH BAYESIAN NONPARAMETRIC MODEL ................................................... 77
BLMSEL AKIL YRTME TESTNN TRKEYE UYARLANMASI VE PSKOMETRK
ZELLKLERNN BELRLENMES ...................................................................................................... 82
DERECEL PUANLANMI LEKLER N PUANLAYICILARIN DEERLENDRLMESNDE
YAPISAL ETLK YAKLAIMININ KULLANILMASI ...................................................................... 83
MADDE TAKIMINDA YER ALAN MADDELERDE YEREL BAIMLILIIN MADDE TAKIMI
TEPK MODEL LE NCELENMES ...................................................................................................... 86
MATEMATK RETM KAYGISI LE GVENRLK VE GEERLK ALIMASI ............. 91
AN ANALYSIS OF ENGLISH TEST IN 2016 YEAR UNDERGRADUATE PLACEMENT EXAM IN
TERMS OF DIFFERENTIAL ITEM FUNCTIONING ............................................................................. 99
AN INVESTIGATION OF FOREIGN LANGUAGE PRODUCTION OF CABIN CREW CANDIDATES
DURING THEIR EMPLOYMENT PROCESS ........................................................................................ 104
RENCLERN SBS-MATEMATK BAARILARINI ETKLEYEN DEKENLERN VE OKUL
KATMA DEERNN HYERARK LNEER MODELLEME ANALZ YOLUYLA
BELRLENMES* .................................................................................................................................... 107
K KATEGORL PUANLANAN MADDELERDE MADDE TEPK KURAMINA DAYALI
PARAMETRE KESTRM: BILOG, Mplus, ltm KARILATIRMASI ............................................... 113
RETMENLERIN KAYNATIRMA UYGULAMALARINDAKI LME VE DEERLENDIRME
YETERLIKLERI VE PROBLEMLERI ................................................................................................... 114
TREND ANALYSIS OF TURKEY IN INTERNATIONAL EDUCATION SURVEYS: USING TIMSS
AND PISA RESULTS .............................................................................................................................. 118
TEK UYGULAMAYA DAYALI SINIFLAMA TUTARLILII VE SINIFLAMA DORULUU
YNTEMLERNN KARILATIRILMASI ......................................................................................... 119
GRSEL METN TEMELL YENLK MADDE FORMATI N TASARLANAN TEST
SKALASININ MADDE AYIRT EDCLNN NCELENMES ........................................................ 125
BYK RNEKLEMLERDE KAYIP VER LE BA ETME YNTEMLERNN LME
DEMEZLNE ETKS AISINDAN KARILATIRILMASI .................................................... 130
KOMPLEKS OKTAN SEMEL SORULARIN SEENEK SAYILARININ SORUNUN
PSKOMETRK ZELLKLERNE ETKS .......................................................................................... 138
OKLU DORUSAL REGRESYON VE GENELLETRLM TOPLAMSAL MODELLERN
AIKLANAN VARYANS SONULARININ KARILATIRILMASI ............................................... 139
SEM TEORS TEMELL KARYER DEERLER LE: GEERLK VE GVENRLK
ALIMALARI ........................................................................................................................................ 140
THE ILLUSTRATION OF VALIDITY ARGUMENTS WITH DIFFERENT CONFIRMATORY
FACTOR ANALYSIS MODELS ............................................................................................................. 147
PISA 2003 VE 2012 MATEMATK OKURYAZARLII PUANLARININ LT GEERL:
BEKLENT TABLOLARI VE UYUM ANALZ ................................................................................... 151
ITEM ORDER EFFECT ON STUDENT PERFORMANCE IN LARGE-SCALE TESTING ................ 154
A COMPARISON OF SOFTWARE PACKAGES AVAILABLE FOR DINA MODEL ESTIMATION
................................................................................................................................................................... 158
AN EVALUATION OF 4PL IRT AND DINA MODELS FOR ESTIMATING PSEUDO-GUESSING
AND SLIPPING PARAMETERS ............................................................................................................ 159
A NON-DIAGNOSTIC ASSESSMENT FOR DIAGNOSTIC PURPOSES: Q-MATRIX VALIDATION
AND ITEM-BASED MODEL FIT EVALUATION FOR THE TIMSS 2011 ASSESSMENT .............. 163
POZTF DUYGU DURUM LENN PSKOMETRK ZELLKLERNN
DERECELENDRLM TEPK MODEL LE NCELENMES .......................................................... 164
BREYSELLETRLM BLGSAYARLI TEST VE BREYSELLETRLM OK AAMALI
TESTLERN PERFORMANSLARININ NCELENMES ...................................................................... 171
TIMSSTE DNGDE YER ALAN MADDELER N MADDE KAYMASININ NCELENMES
................................................................................................................................................................... 172
NVERSTE RENCLERNN BAARILARININ DEERLENDRLMESNDE E-PORTFOLYO
KULLANIMINA LKN GRLERN BELRLENMES ............................................................... 178
BAYESIAN VE NONBAYESIAN KESTRM YNTEMLERNE DAYALI SINIFLAMA
DORULUU NDEKSLERNN FARKLI RNEKLEM KOULLARINDA NCELENMES ........ 184
A COMPARISON OF MIXED EFFECTS LOGISTIC REGRESSION AND LOGISTIC REGRESSION
................................................................................................................................................................... 185
21. YZYIL ETMNDE NE IKAN RENC YETERLKLER, SINIF LME VE
DEERLENDRME ANLAYIININ DNM VE GELECE* ................................................. 188
DUYUSAL ZELLKLER, ETM KAYNAKLARI VE OKUL KLMNN FEN BAARISINA
ETKS: PISA 2015 .................................................................................................................................. 193
INVESTIGATION THE EFFECTS OF ITEM AND STUDENT RELATED VARIABLES ON TURKISH
STUDENTS MATHEMATICS PERFORMANCE* .............................................................................. 197
NANOTEKNOLOJ FARKINDALIK LENN TRK KLTRNDEK GEERLK ve
GVENRLK ALIMASI .................................................................................................................... 204
ULUSLARARASI RENC DEERLENDRME PROGRAMI (PISA-2015) FEN OKURYAZARLII
TEST MADDELERNN KLTRLERARASI LME DEMEZLNN NCELENMES ...... 213
OKBOYUTLU AAMALI TEPK KURAMI ALTINDA RETLEN TEK BML VE TEK BML
OLMAYAN VER GRUPLARINDA OLABLRLK ORANI YNTEM I. TP HATA VE G
ALIMASI ............................................................................................................................................. 221
MEB TARAFINDAN RETMENLERE NERLEN ETM TEMALI FLMLERN ERDKLER
LME VE DEERLENDRME ELERNE GRE NCELENMES ............................................. 226
INVESTIGATING DIFFERENTIAL ITEM FUNCTIONING OF THE ANKARA UNIVERSITY
EXAMINATION FOR FOREIGN STUDENTS BY RECURSIVE PARTITIONING ANALYSIS IN THE
RASCH MODEL ...................................................................................................................................... 234
YEREL BAIMSIZLIK VARSAYIMININ HLAL DURUMUNDA PARAMETRK VE
PARAMETRK OLMAYAN MADDE TEPK KURAMI MODELLERNDEN KESTRLEN MADDE
PARAMETRELERNN KARILATIRILMASI .................................................................................. 239
COMPARISON OF TEACHER PERFORMANCE BY ANSWERS FROM DIFFERENT COUNTRY
STUDENTS .............................................................................................................................................. 244
COMPARISON OF THE CLASSIFICATION PERFORMANCE OF COMPUTER ADAPTIVE
TESTING AND MULTI-STAGE COMPUTER ADAPTIVE TESTING ................................................ 248
RTK SINIF ANALZ MPLUS UYGULAMASI: PSKOLOJK DAYANIKLILIK ZERNE BR
RNEK ..................................................................................................................................................... 251
FEN Z YETERL YAPISININ OK GRUPLU DFA YNTEM LE LME DEMEZLNN
NCELENMES ........................................................................................................................................ 257
GRME ENGELL RENCLERE YNELK ULUSAL SINAV UYGULAMALARININ RENC
VE RETMENLER TARAFINDAN DEERLENDRLMES .......................................................... 263
GZL DEKEN ORTALAMA YAPILARININ DEMEZLNN ORTALAMA VE
KOVARYANS YAPILARI ANALZ YAKLAIMIYLA TEST EDLMES ....................................... 276
DAILIM ARPIKLIININ OMEGA NDEKSNN PERFORMANSI ZERNDEK ETKLERNN
ETL KOULLAR ALTINDA NCELENMES ............................................................................... 284
M4 STATSTNN CEVAPLAMA BENZERL BELRLEME PERFORMANSININ
NCELENMES ........................................................................................................................................ 287
RENCLERN BAZI ZELLKLERNE GRE SAYISAL VE SZEL YETENEKLER LE
OLUTURULAN LME MODELNN DEMEZLNN KOVARYANS YAPILARI ANALZ
YAKLAIMIYLA NCELENMES ......................................................................................................... 293
OK BOYUTLU TESTLERDE DEEN KME FONKSYONUNUN BELRLENMES ............... 300
KARAR AALARININ SINIFLANDIRMA PERFORMANSLARININ BAIMSIZ
DEKEN TRNE GRE KARILATIRILMASI ......................................................................... 301
ORTAOKUL RETM PROGRAMLARINDA YER ALAN KAZANIMLARIN BLSEL
DZEYLERE DAILIMININ NCELENMES ..................................................................................... 306
KAYIP VER ATAMA YNTEMLERNN CRONBACH ALFA VE TABAKALI ALFA GVENRLK
KATSAYILARI ZERNDEK ETKSNN NCELENMES ............................................................... 314
PISA PROBLEM ZME BECERLERNN PISADAK MATEMATK, OKUMA VE FEN
BAARISINA ETKSNN NCELENMES ........................................................................................... 325
MADDE TEPK KURAMINA DAYALI TEST ETLEME YNTEMLERNN
KARILATIRILMASI: PISA 2012 FEN TEST RNE ................................................................... 327
OK BOYUTLU TEST YAPILARINDA GVENIRLIIN NCELENMESI ...................................... 328
KARMA DAILIMLARDA MADDE TEPK KURAMI MODELLER LE MADDE PARAMETRE
KAYMASININ NCELENMES .............................................................................................................. 333
CHANGES IN TRANSITION SYSTEM FROM BASIC EDUCATION TO SECONDARY
EDUCATION: A HISTORICAL, DESCRIPTIVE AND CAUSAL VIEW ............................................ 336
ETM ARATIRMALARINDA ELM PUANI ELETRME YNTEMNN KULLANILMASI
................................................................................................................................................................... 340
EVALUATION OF HUMAN RESOURCES MANAGEMENT CERTIFICATE PROGRAM .............. 345
DEEN MADDE FONKSYONUNUN MN VE MD ORTAK TESTE ETKS ........................... 350
ZEKA LEKLERNDE KLASK TEST VE MADDE TEPK KURAMINA GRE PUANLAMA
YNTEMLERNN KARILATIRILMASI (KBIT-2 RNE) ......................................................... 351
TEST ETLEMEDE MADDE TEPK KURAMINA DAYALI YETENEK PARAMETRESNE
YNELK LEK DNTRME YNTEMLERNN KARILATIRILMASI ............................ 354
STANDARD SETTING IN EFL WRITING ASSESSMENT WITH MANY-FACET RASCH
MEASUREMENT MODEL ..................................................................................................................... 355
MADDE/MODL SEME YNTEMLERININ KARILATIRILMASI: BIREYSELLETIRILMI
BILGISAYARLI TEST VE BIREYSELLETIRILMI BILGISAYARLI OK AAMALI TEST
UYGULAMASI ........................................................................................................................................ 361
BR ELDRC ANALZ YNTEM NERS .................................................................................... 362
TIMSS 2015 FEN TEST MADDELERNN DL VE KLTRE GRE DEEN MADDE
FONKSYONU AISINDAN NCELENMES ...................................................................................... 370
INVESTIGATION INVARIANCE OF THE DINA MODEL PARAMETERS WITH REAL DATA ... 380
ANKOR TEST DESENNDE BLSEL TANI MODEL RTK YETENEK SINIFLARI LE TEST
ETLEME ALIMASI ......................................................................................................................... 384
YAPILANDIRILMI VE OKTAN SEMEL MADDELERDE PARAMETRE DEMEZLNN
NCELENMES: BLSEL TANI MODELLERNDE MADDE FORMATININ ETKS ................... 389
EXAMINATION OF CDM ATTRIBUTE PREVALENCE FOR HIGHER-ORDER THINKING SKILLS
ITEMS ....................................................................................................................................................... 394
AKADEMK PERFORMANS LMLERNDE MADDE FORMATININ BLSEL VE ST-
BLSEL ZLEME BECERS ZERNE ETKS ................................................................................ 399
OKUL, EV VE RENC ZELLKLERNN 4. SINIF RENCLERNN TIMSS MATEMATK
BLSEL BECERLERNE ETKS ....................................................................................................... 403
PISA 2015 FEN OKURYAZARLII MADDELERNN DEEN MADDE FONKSYONLARI
AISINDAN NCELENMES ................................................................................................................. 404
RENCLERN TIMSS 2015 MATEMATK YETERLK DZEYLERN ETKLEYEN
DEKENLERN OKLU UYUM ANALZ LE BELRLENMES ................................................. 411
NIVERSITE RENCILERININ YAZMA BECERILERININ LLMESINE YNELIK
BOYLAMSAL BIR LME ARACI GELITIRILMESI VE DEERLENDIRILMESI ...................... 420
SZEL YETENEK TESTINDE KISA CEVAPLI VE OKTAN SEMELI MADDE TRNDEN ELDE
EDILEN SONULARIN KARILATIRILMASI ................................................................................. 424
TRKYE LME ARALARI DZNNDE YER ALAN AIMLAYICI FAKTR ANALZ
ALIMALARININ PARALEL ANALZ SONULARI LE KARILATIRILMASI ...................... 426
DENK OLMAYAN GRUPLARDA KERNEL ETLEME YNTEMNDEN ELDE EDLEN
SONULARININ KARILATIRILMASI ............................................................................................ 427
DMF KAYNAKLARININ BELRLENMESNDE RTK SINIF DMF YAKLAIMININ
KULLANILMASI: PISA-2015 TRKYE RNE .............................................................................. 429
FARKLI FORMATLARDA VE ORTAMLARDA UYGULANAN LKERT TP LEK LE METRK
LEN PSKOMETRK ZELLKLERNN NCELENMES ........................................................ 434
DEIEN MADDE FONKSIYONUN MADDE AYIKLAMAYA GRE NCELENMESI ................. 435
USE OF MIXED ITEM RESPONSE THEORY IN RATING SCALES ................................................. 439
BLSEL TANI MODELLERNDE Q-MATRSN FARKLI BELRLENMESNN SINIFLAMA
DORULUU VE TUTARLILII AISINDAN KARILATIRILMASI .......................................... 443
SUMMARIES
Paper ID: 51
AN ANALYSIS ON THE MODEL FIT INDICES IN A SCALE
CONSTRUCTION PROCESS
Esin Tezbaşaran
İstanbul University-Cerrahpaşa, Hasan Ali Yücel Education Faculty
Abstract: Many research studies evaluate models by applying cutoff values to model fit indices without
any statistical approach. This study analyzes how the model fit indices change when the number of
factors and the assignment of observed variables to the factors are changed, using the same observed
variables and the same sample size in a scale construction process. The research was carried out using real
data from 1013 participants. The data were collected with the Attitude Scale Towards Learning. The model
specified according to the results of exploratory factor analysis and the randomly remodeled constructions
were examined through the model fit indices. The results revealed that, whatever the number of factors, the
divergence among factors affects the change in the model fit indices. Moreover, the remodeled scale
constructions, which are non-theoretical and non-analytical, do not show particularly poor model fit.
Keywords: Model fit indices, divergent validity
INTRODUCTION
Goodness-of-fit assessment refers to how well a model reproduces the data generating process
(Maydeu-Olivares, 2017). Several fit and misfit indices are commonly used for structural equation
models (SEM) in the literature. In most traditional and typical SEM applications, the operationalized
theories assume one of three forms (Mueller & Hancock, 2008): a measured variable path analysis
(MVPA), a confirmatory factor analysis (CFA), or a latent variable path analysis (LVPA). The general
purpose of statistical modeling is to provide an efficient and convenient way of describing the latent
structure underlying a set of observed variables (Byrne, 1994). Therefore, the model structure should be
assessed in terms of the efficiency and convenience of the description it provides.
In data-model fit assessment, the fit between the observed data and the hypothesized model is ideally
operationalized as an evaluation of the degree of discrepancy between the true population covariance matrix
and the one implied by the model's structural and nonstructural parameters. There are three broad categories
of fit indices (Mueller & Hancock, 2008): absolute fit indices (the standardized root mean square residual
(SRMR), the chi-square test, and the goodness-of-fit index (GFI)), parsimonious indices (the root mean
square error of approximation (RMSEA) with its associated confidence interval, the Akaike information
criterion (AIC), and the adjusted goodness-of-fit index (AGFI)), and incremental indices (the comparative
fit index (CFI), the normed fit index (NFI), and the non-normed fit index (NNFI)). To assess the degree of
data-model misfit, various fit indices can be obtained and then compared against established cutoff criteria
(e.g., those empirically derived by Hu & Bentler, 1999, as cited in Mueller & Hancock, 2008). Mueller &
Hancock (2008) summarize these criteria in a table (Table 1).
Table 1. Target values for selected fit indices to retain a model by class
Index Class
Incremental        Absolute           Parsimonious
NFI  ≥ 0.90
NNFI ≥ 0.95        GFI  ≥ 0.90        AGFI  ≥ 0.90
CFI  ≥ 0.95        SRMR ≤ 0.08        RMSEA ≤ 0.06
Joint Criteria
NNFI, CFI ≥ 0.96 and SRMR ≤ 0.09
SRMR ≤ 0.09 and RMSEA ≤ 0.06
Reference: Mueller & Hancock 2008
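The cutoffs in Table 1 can be applied mechanically, which is the practice this paper examines. The following is a minimal Python sketch, not part of the original study. It assumes the usual reading of the table: NFI, NNFI, CFI, GFI, and AGFI should be at or above their targets, and SRMR and RMSEA at or below theirs. The function names and the example values are hypothetical, and "NNFI, CFI" in the joint criteria is read as "either index".

```python
# Cutoffs transcribed from Table 1 (Mueller & Hancock, 2008).
# "min": the index should be at least the cutoff; "max": at most the cutoff.
# The direction of each comparison is an assumption based on common usage.
CUTOFFS = {
    "NFI": ("min", 0.90),
    "NNFI": ("min", 0.95),
    "CFI": ("min", 0.95),
    "GFI": ("min", 0.90),
    "AGFI": ("min", 0.90),
    "SRMR": ("max", 0.08),
    "RMSEA": ("max", 0.06),
}


def check_fit(indices):
    """Return {index name: True/False} for every supplied index with a cutoff."""
    result = {}
    for name, value in indices.items():
        if name not in CUTOFFS:
            continue  # indices without a Table 1 cutoff are ignored
        direction, cutoff = CUTOFFS[name]
        result[name] = value >= cutoff if direction == "min" else value <= cutoff
    return result


def meets_joint_criteria(indices):
    """Joint criteria: SRMR <= 0.09 together with either
    (NNFI or CFI >= 0.96) or RMSEA <= 0.06. Missing indices count as failing."""
    srmr_ok = indices.get("SRMR", float("inf")) <= 0.09
    incremental_ok = max(indices.get("NNFI", 0.0), indices.get("CFI", 0.0)) >= 0.96
    rmsea_ok = indices.get("RMSEA", float("inf")) <= 0.06
    return srmr_ok and (incremental_ok or rmsea_ok)


# Hypothetical values for illustration only, not results from the paper.
example = {"CFI": 0.97, "SRMR": 0.05, "RMSEA": 0.07, "GFI": 0.88}
print(check_fit(example))
print(meets_joint_criteria(example))
```

A check of this kind reports only whether each index clears its threshold; it says nothing about why a model fits, which is the limitation the study goes on to probe with randomly constructed models.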
An absolute fit index directly evaluates how well an a priori model reproduces the sample data (Hu &
Bentler, 1998). It does not depend on a comparison with another model, such as a saturated model, or
with the observed data (Ullman, 2001). Parsimonious indices evaluate the overall discrepancy between the
observed and implied covariance matrices while taking into account a model's complexity; fit improves as more
parameters are added to the model, as long as those parameters make a useful contribution (Mueller
& Hancock, 2008). Incremental fit indices, also called comparative fit indices, assess fit relative to a
baseline model. According to Bentler and Bonett (as cited in Hu & Bentler, 1998), the most typically used
baseline model is a null model in which all the observed variables are allowed to have variances but are
uncorrelated with each other.
As this overview shows, fit indices can be used to evaluate a model from different points of view. Different
indices reflect different aspects of model fit (Hooper, Coughlan & Mullen, 2008). Kline (2005) recommends a
minimal set of indices (the model chi-square, RMSEA with its 90% confidence interval, CFI, and SRMR) that
should be reported and interpreted when presenting the results of an SEM analysis.
In addition to the points above, the estimation method and sample size have an important effect on
the fit indices. If the assumptions of normality and of independence among factors and errors seem
plausible, maximum likelihood (ML) and generalized least squares estimation techniques are better choices
(Ullman, 2001). Moreover, the role of sample size in evaluating a fit index should be taken into account
so that the discrepancy between the model and the data generating mechanism can be detected with adequate
power (Maydeu-Olivares, 2017).
This paper examines how the model fit indices change, and how those changes can be explained, when the
number of factors and the assignment of observed variables to the factors are varied while the same
observed variables are used in a scale construction process. The evaluation of model fit against the
cutoff criteria of the indices is also considered.
METHODOLOGY
This study is basic research, since its aim is to draw inferences about the cutoff criteria of the most
frequently used model fit indices by comparing differently constructed models for scale
development. Basic research is designed to increase scientific understanding of phenomena (Best &
Kahn, 2006) and to add to general knowledge with little or no concern for the immediate application of
the knowledge produced (Bogdan & Biklen, 2007).
The data were collected from 1013 individuals who voluntarily participated in the study.
The trial version of the 44-item Attitude Scale Towards Learning, developed by Fenercioğlu (2003),
was used as the data collection tool.
As the first step of the data analysis, exploratory factor analysis (EFA) was used to determine the scale construct. Four different scale constructs were then analyzed by CFA to obtain their model fit indices. The first construct is the one obtained from the EFA. For the second construct, the items and the number of factors revealed by the EFA were kept, but the items were assigned to the factors at random. For the third and fourth constructs, the same items were assigned at random to only two factors, once for each construct. In this way it can be seen how the model fit indices change as the model structure changes, and how they evaluate different constructs built from the same observed variables.
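The random assignment step described above can be sketched as follows. This is only an illustration in Python, not the procedure actually used by the authors: the function name, the item labels, the seeds and the round-robin dealing are assumptions, and the count of 22 items is the number retained after data cleaning as reported in the findings.

```python
import random

def assign_items_randomly(items, n_factors, seed=None):
    """Shuffle the items and deal them out to n_factors factors as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Deal items round-robin so that factor sizes differ by at most one item.
    return {f"F{k + 1}": sorted(shuffled[k::n_factors]) for k in range(n_factors)}

# Hypothetical labels for the 22 retained items.
items = [f"item{i:02d}" for i in range(1, 23)]

five_factor_random = assign_items_randomly(items, 5, seed=1)   # second construct
two_factor_random_a = assign_items_randomly(items, 2, seed=2)  # third construct
two_factor_random_b = assign_items_randomly(items, 2, seed=3)  # fourth construct
```

Each resulting dictionary maps a factor label to its items and could then be turned into a CFA model specification in whatever SEM software is used.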
FINDINGS
First, the data were cleaned before the factor analysis. Missing value analysis using Little's MCAR test shows that the data are missing completely at random (χ²(3018) = 3033.5, p > 0.05). Missing values were replaced with values computed by the expectation-maximization method. Four items did not meet the normality assumption and 18 items had no statistically significant correlation with the other items. These items were removed from the scale. The remaining 22 items were analyzed using principal component analysis (PCA) with oblique rotation to obtain the factorial construct of the scale; the criterion level for factor loadings was 0.30. The sampling adequacy for the analysis is acceptable (KMO = 0.886). Bartlett's test of sphericity shows that the correlations between items were sufficiently large for PCA (χ²(231) = 4426.88, p < 0.00001). The analysis shows that five components (the term "factor" is used instead of "component" in the CFA) had eigenvalues over Kaiser's criterion of 1 and in combination explained 46.88% of the variance. The Cronbach's alpha value is 0.73 for this scale.
For the second scale construct, the same 22 items were used and assigned at random to the five factors. Two further constructs were then built from these items, each with two factors holding an equal number of items assigned at random. Four scale constructs were therefore used in this study: one specified by the EFA and the others specified at random. CFA was then conducted to determine the adequacy of the models. Maximum likelihood (ML) was used to estimate the population parameters, because ML can be robust when the observed variables do not meet the multivariate normality assumption (Joreskog & Yang, 1996).
As can be seen in Table 2, the best values of all fit indices belong to the construct obtained from the EFA, as expected. When the two five-factor constructs are compared, the meaningful difference lies in the correlations between factors. All 10 correlation coefficients are smaller in the construct obtained from the EFA than in the construct with items assigned to the five factors at random (Figure 1).
Figure 1. Path diagrams of (a) the construct obtained from the EFA and (b) the construct with items randomly assigned to the five factors
The evident difference between the two randomly formed two-factor constructs is the level of correlation between the factors. The model fit indices of the construct with the lower correlation between the factors were better than those of the construct with the higher correlation. Moreover, the model fit indices of this construct are quite close to those of the randomly arranged five-factor construct (Table 2).
Table 2. The model fit indices of different scale constructs
Number of factors         Corr. between factors  NFI   NNFI  CFI   GFI   SRMR   χ² (df)        RMSEA (CI90)          AGFI
5 (EFA-oblique rotation)  Figure 1-a             0.95  0.96  0.97  0.95  0.044  556.74 (197)   0.042 (0.038, 0.047)  0.94
5 (randomly assigned)     Figure 1-b             0.90  0.91  0.92  0.91  0.055  1093.13 (199)  0.067 (0.063, 0.071)  0.89
2 (randomly assigned)     0.96                   0.89  0.90  0.91  0.90  0.056  1271.51 (208)  0.071 (0.067, 0.075)  0.88
2 (randomly assigned)     0.87                   0.90  0.91  0.92  0.90  0.055  1200.05 (208)  0.069 (0.065, 0.072)  0.88
As for the evaluation of the constructs in terms of fit, the construct obtained from the EFA showed acceptable fit according to all indices. The constructs with items randomly assigned to the factors were at an acceptable level only according to the absolute fit indices and were not acceptable according to the incremental and parsimonious indices.
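As a worked check on the table, the RMSEA point estimates can be recomputed from the reported χ², degrees of freedom and sample size. The sketch below assumes the common formula RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))); some programs divide by N instead of N − 1, and the paper does not state which variant its software used. The function name and dictionary labels are illustrative.

```python
import math

def rmsea(chi_square, df, n):
    """RMSEA point estimate from the model chi-square, its degrees of freedom and sample size."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

N = 1013  # sample size reported in the methodology

# (chi-square, df) pairs taken from Table 2.
models = {
    "5 factors (EFA)": (556.74, 197),
    "5 factors (random)": (1093.13, 199),
    "2 factors (random, r = 0.96)": (1271.51, 208),
    "2 factors (random, r = 0.87)": (1200.05, 208),
}

for label, (chi_square, df) in models.items():
    print(f"{label}: RMSEA = {rmsea(chi_square, df, N):.3f}")
```

Under this assumption the four values round to 0.042, 0.067, 0.071 and 0.069, in line with the RMSEA column of Table 2.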
CONCLUSIONS
This research offers two basic results. The first is that model fit index results for a causal model should be evaluated with great care. Particularly when a hypothesis about the fit of a theoretical model is tested, the fit can be accepted on the basis of the cutoff values of the indices even though the model is not adequate.
In this research, the construct obtained from the EFA, a mathematical procedure designed to identify factors, was expected to fit better than the randomly designed constructs. The description of a construct provides information about the expected relationships among variables; factor analysis helps to determine whether this pattern of relationships does indeed exist (Murphy & Davidshofer, 2001). However, the randomly designed constructs run the risk of being judged as fitting acceptably if more tolerant criteria are applied. Traditional goodness-of-fit indices are also unduly influenced by the good fit of the measurement portions of the model and can yield values in the .90s even when the structural relations among the latent variables of the model are seriously misspecified (Mulaik, James, Alstine, Bennett, Lind, & Stilwell, 1989).
All indices come with cutoff values that researchers use to evaluate whether their models fit. However, using fixed cutoff criteria to determine model fit is problematic: there is no obvious external criterion variable, and there are many different fit indices, each with one or more proposed cutoff values (Barrett, 2006; Maydeu-Olivares, 2017). Maydeu-Olivares (2017) identifies three problems with current practice. First, the indices are pre-statistical, which means that there is no description of the parameter being estimated and no proof that the index is an asymptotically unbiased estimate of the parameter of interest; furthermore, the sampling variability of the statistic is ignored. Second, partly as a result of the approach being pre-statistical, no agreement is possible on which index to use or which cutoff value to apply. Third, and most importantly, inferences on the model parameters are made as if the model were correct when in fact it is misspecified. Most indices show somewhat inadequate sensitivity to misspecification, and many are also sensitive to sample size (e.g., chi-square, SRMR), number of factors (e.g., Mc), number of items (e.g., RMSEA), or some combination of these model conditions (Meade, 2008).
The second result of this research is that, with the number of factors held constant, the values of the model fit indices (except GFI) are affected negatively when the correlation between factors increases. Low correlation between factors is important discriminant evidence for construct validity. A validity coefficient showing little relationship between test scores and other variables with which scores on the test being construct-validated should not theoretically be correlated provides discriminant evidence of construct validity (Cohen, Montague, Nathanson & Swerdlik, 1988). According to the findings of this research, such divergent evidence shifts the values of the model fit indices in favor of the model.
Studies that examine divergence between factors and convergence of factor loadings within models can be recommended. In addition, there is a need to determine common cutoff values for model fit indices together with their effect sizes.
REFERENCES
Barrett, P. (2006). Structural equation modelling: Adjudging model fit. Personality and Individual Differences, 42, 815-824. doi: 10.1016/j.paid.2006.09.018
Best, J.W. & Kahn, J.V. (2006). Research in education. New York: Pearson Education, Inc.
Bogdan, R. C., & Biklen, S. K. (2007). Qualitative research for education. New York: Pearson Education,
Inc.
Byrne, B.M. (1994). Structural equation modeling with EQS and EQS/Windows. California: Sage
Publication, Inc.
Cohen, R.J., Montague, P., Nathanson L.S., & Swerdlik, M.E. (1988). Psychological testing. Mountain
View: Mayfield Publishing Company.
Fenercioğlu, E. (2003). Likert tipi ölçeklere madde seçmede kullanılan Pearson momentler çarpımı korelasyon tekniği ile çok serili korelasyon tekniğinin incelenmesi (Master's thesis). Hacettepe Üniversitesi, Sosyal Bilimler Enstitüsü, Ankara, Turkey.
Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modelling: Guidelines for determining model fit. The Electronic Journal of Business Research Methods, 6(1), 53-60. Retrieved from www.ejbrm.com
Hu, L., & Bentler, P.M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3(4), 424-453.
Joreskog, K.G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57-88). New Jersey: Lawrence Erlbaum Associates.
Kline, R.B. (2005). Principles and practice of structural equation modeling. New York: The Guilford Press.
Maydeu-Olivares, A. (2017). Assessing the size of model misfit in structural equation models.
Psychometrika, 82(3), 533-558. doi: 10.1007/s11336-016-9552-7
Meade, A. W. (2008, April). Power of AFIs to Detect CFA Model Misfit. Paper presented at the twenty-
third Annual Conference of the Society for Industrial and Organizational Psychology, San
Francisco, USA. Retrieved from http://www4.ncsu.edu/~awmeade/Links/Papers/AFI
(SIOP08).pdf
Mueller, R.O., & Hancock, G.R. (2008). Best practices in structural equation modelling. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 488-510). doi: 10.4135/9781412995627.d38
Mulaik, S.A., James, L.R., Alstine, J.V., Bennett, N., Lind, S., & Stilwell, C.D. (1989). Evaluation of
goodness-of-fit indices for structural equation models. Psychological Bulletin, 105(3), 430-445.
Murphy, K.R., & Davidshofer, C.O. (2001). Psychological testing principles and applications. New Jersey:
Prentice Hall Inc.
Ullman, J. B. (2001). Structural equation modeling. In B.G. Tabachnick & L.S. Fidell (Eds.), Using multivariate statistics (pp. 653-771). Boston: Allyn & Bacon.
Paper ID: 54
COMPARISON OF METHODS FOR DETERMINING INTER-RATER RELIABILITY IN COMPOSITION SCORING
Neşe Öztürk Gübeş1, Mustafa Kılınç2, Ercan Yıldız3, Osman Aydın4, Gül Aslan5, Şeyma Özcan6
1,2Mehmet Akif Ersoy Üniversitesi, 3,4,5,6Milli Eğitim Bakanlığı
Abstract: The purpose of this study is to examine inter-rater reliability using different methods. Compositions written by 19 sixth-grade primary school students were scored independently by four teachers (raters) using a holistic scoring rubric. To determine inter-rater reliability, percent agreement, Fleiss' kappa, Kendall's coefficient of concordance W, the mean Spearman rank-order correlation coefficient, Krippendorff's alpha and generalizability theory were employed. The findings showed that the exact agreement among the scores given by the raters was 21.1%, while agreement within +/-1 point was 89.5%. Fleiss' kappa indicated a statistically significant but low level of agreement among the raters; Kendall's W likewise indicated a statistically significant but moderate relationship among the scores given. This finding was supported by the mean Spearman rank-order correlation coefficient calculated among the raters' scores. Similarly, Krippendorff's alpha indicated weak agreement among the raters. Generalizability theory was another method used to determine inter-rater reliability in this study, and the generalizability coefficient was calculated as 0.59. Examination of the variance components showed that the variance of the rater main effect was the smallest component, so it was concluded that the raters scored similarly to one another. However, the residual variance estimated for the student-rater interaction was the largest variance component. It was concluded that this may be due to raters differing unsystematically across students, or to the students' composition scores being affected by factors not included in the study.
Keywords: reliability, inter-rater reliability, composition scoring
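The two agreement percentages reported in the abstract can be illustrated with a short sketch. It assumes that "agreement" means all four raters' scores for a student lie within a given tolerance of each other (0 points for exact agreement, 1 point for the +/-1 band); the abstract does not spell out the definition, and the ratings below are hypothetical, not the study's data.

```python
def percent_agreement(ratings, tolerance=0):
    """Percentage of students whose scores from all raters lie within `tolerance` points."""
    agreeing = sum(1 for scores in ratings if max(scores) - min(scores) <= tolerance)
    return 100.0 * agreeing / len(ratings)

# Hypothetical holistic scores: one row per student, one column per rater.
ratings = [
    [3, 3, 3, 3],
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 3, 2, 2],
    [5, 5, 5, 5],
]

exact = percent_agreement(ratings)                   # all raters give the same score
adjacent = percent_agreement(ratings, tolerance=1)   # scores differ by at most one point
```

Under this definition, the reported 21.1% and 89.5% would correspond to 4 and 17 of the 19 students, respectively.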
Paper ID: 55
INVESTIGATION OF SUBGROUP INVARIANCE OF EQUATING FUNCTIONS
Kübra Atalay Kabasakal1, Nermin Kıbrıslıoğlu Uysal1, Sakine Göçer Şahin2
1Hacettepe University, 2University of Wisconsin-Madison
Abstract: The purpose of this study is to examine the invariance of equating functions and the possible reasons behind a lack of invariance. To this end, responses to the mathematics test from the TIMSS 2015 assessment were used. The USA and Turkey samples, consisting of 16,295 students, were used in line with the purpose of the research. Gender and country were considered as subgroups. There are 14 booklets in the TIMSS administration, and the booklets are linked to each other by common items. In this study the 14 booklets were equated with the equipercentile equating method in each subgroup, and the equated scores were compared with respect to REMSD. Moreover, DIF between subgroups, distributional properties, and test difficulty and discrimination differences between subgroups within each booklet were evaluated. The results indicated that the test difficulty and skewness differences between subgroups and the ratio of DIF items were significantly related to REMSD values.
Keywords: Subgroup invariance, equating functions, equipercentile equating, TIMSS 2015.
INTRODUCTION
Test equating is a method for making scores comparable when they are obtained from different tests that measure the same construct. Although there are different methods to equate or link test scores, they are all based on certain prerequisites (Dorans & Holland, 2000). Subgroup invariance of equating functions is one of the most important prerequisites of the equating process (Dorans, 2004). Lack of invariance of the equated scores means that students who have the same score on one scale but belong to different subpopulations have different expected scores on the corresponding equated scale, which directly affects the validity of the equated scores. Hence, the purpose of this study is to examine equating invariance between subgroups and to evaluate the possible reasons behind a lack of invariance. TIMSS 2015 mathematics scores were equated with the equipercentile equating method under a nonequivalent groups common-item design. The research questions are given below.
Research Questions
(1) Do the equating functions obtained to equate mathematics scores in TIMSS 2015 ensure subgroup invariance?
i. Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance between gender subgroups in the Turkey sample?
ii. Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance between gender subgroups in the USA sample?
iii. Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance between country subgroups in the Turkey and USA samples?
(2) Are the equating errors obtained in equating mathematics scores between subgroups in the TIMSS 2015 administration related to DIF, distributional properties, and test difficulty and discrimination differences between subgroups?
i. Are the equating errors obtained between gender subgroups in the Turkey sample related to DIF, distributional properties, and test difficulty and discrimination differences between subgroups?
ii. Are the equating errors obtained between gender subgroups in the USA sample related to DIF, distributional properties, and test difficulty and discrimination differences between subgroups?
iii. Are the equating errors obtained between country subgroups in the USA and Turkey samples related to DIF, distributional properties, and test difficulty and discrimination differences between subgroups?
METHODOLOGY
Within the scope of the study, the invariance of the equating functions used to equate the scores of country and gender subgroups was evaluated, and the possible reasons for a lack of invariance were examined within the TIMSS 2015 administration. Hence, this is a descriptive study that aims to evaluate the validity of equated scores in the TIMSS 2015 administration.
Sample
Turkey and USA subgroups were taken into consideration. Hence, the sample of the study consists of 16,295 eighth-grade students who took part in the TIMSS 2015 administration in Turkey and the USA.
Data Analysis
In order to achieve its predefined goals, the TIMSS administration requires more items than a student can answer in one session. Hence, there are 14 item clusters in each domain. Each cluster consists of approximately 12-18 items covering the number, algebra, geometry, and data and probability content domains. These clusters are distributed across 14 booklets in such a way that each booklet consists of two clusters and the booklets are balanced with respect to content, number of items and item format. The booklets are linked by common clusters so that a final score can be obtained for each student.
In the data analysis, each booklet was equated with the following one through their common clusters. The analysis was conducted in two steps. First, equating invariance between subgroups was evaluated using the equipercentile equating method. Then the possible reasons behind a lack of invariance were examined. Two kinds of subgroups were defined: country and gender. The country subgroups are the Turkey and USA samples. The gender subgroups are the gender groups within each country's sample. Equating invariance was evaluated in three phases. First, the scores of the whole group were equated. Second, the scores of the subgroups were equated. Last, the root expected mean square difference (REMSD) values were calculated. A REMSD value higher than 10% was interpreted as indicating that equating invariance is not ensured (Dorans, 2004; Dorans & Holland, 2000). After the REMSD values were calculated, the possible reasons behind a lack of equating invariance were evaluated: DIF and differences in distributional properties, test difficulty and discrimination indexes between the subgroups within each booklet were examined. Data analysis was conducted in R.
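The REMSD computation in the last phase can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study, and it assumes a common formulation following Dorans and Holland (2000): the squared differences between each subgroup's equating function and the whole-group function are averaged over the subgroup's score distribution, weighted by subgroup size, and the square root is divided by the whole-group standard deviation on the target booklet. The function name, argument names and toy numbers are hypothetical.

```python
import math

def remsd(total_equated, subgroup_equated, subgroup_freqs, subgroup_weights, sd_target):
    """Root expected mean square difference between subgroup and whole-group equatings.

    total_equated[i]        : whole-group equated score at raw score i
    subgroup_equated[g][i]  : equated score at raw score i from subgroup g's function
    subgroup_freqs[g][i]    : number of examinees in subgroup g with raw score i
    subgroup_weights[g]     : proportion of the whole group in subgroup g (sums to 1)
    sd_target               : whole-group standard deviation on the target booklet
    """
    expected_sq_diff = 0.0
    for group, equated in subgroup_equated.items():
        freqs = subgroup_freqs[group]
        n_group = sum(freqs)
        # Mean squared difference over the subgroup's own raw-score distribution.
        msd = sum(
            f / n_group * (e_sub - e_all) ** 2
            for f, e_sub, e_all in zip(freqs, equated, total_equated)
        )
        expected_sq_diff += subgroup_weights[group] * msd
    return math.sqrt(expected_sq_diff) / sd_target

# Toy example: four raw-score points, two equally sized subgroups (hypothetical values).
total_equated = [0.0, 1.0, 2.0, 3.0]
subgroup_equated = {"girls": [0.2, 1.2, 2.2, 3.2], "boys": [-0.2, 0.8, 1.8, 2.8]}
subgroup_freqs = {"girls": [5, 10, 10, 5], "boys": [5, 10, 10, 5]}
subgroup_weights = {"girls": 0.5, "boys": 0.5}

value = remsd(total_equated, subgroup_equated, subgroup_freqs, subgroup_weights, sd_target=4.0)
invariance_ensured = value <= 0.10  # the 10% criterion used in the study
```

In the toy example every subgroup conversion differs from the whole-group conversion by 0.2 score points, so the result is 0.2 divided by the standard deviation of 4.0, that is, 0.05.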
FINDINGS
The findings are reported for each research question. The results addressing the first research question are summarized in Table 1.
Table 1. REMSD values for each equating relation within subgroups
Booklets    TR-USA  TR-Gender  USA-Gender
B1 - B2     0.10    0.05       0.07
B2 - B3     0.11    0.15       0.06
B3 - B4     0.12    0.12       0.07
B4 - B5     0.09    0.12       0.08
B5 - B6     0.06    0.08       0.11
B6 - B7     0.11    0.06       0.10
B7 - B8     0.12    0.06       0.04
B8 - B9     0.07    0.19       0.06
B9 - B10    0.11    0.13       0.03
B10 - B11   0.04    0.10       0.11
B11 - B12   0.10    0.16       0.17
B12 - B13   0.08    0.06       0.06
B13 - B14   0.08    0.12       0.07
B14 - B1    0.11    0.11       0.15
As shown in Table 1, the REMSD values are higher than 0.10 for seven equating functions between
countries, for eight equating functions between gender subgroups in Turkey and for four equating functions
between gender subgroups in USA. These results imply that equating invariance is not ensured for some
booklets between the subgroups.
To answer the second research question, DIF, distributional properties, and test difficulty and discrimination index differences between subgroups were examined. The numbers of DIF items in the common tests and the whole tests, by favored subgroup, are summarized in Table 2.
Table 2. Number of DIF items in the common and whole tests within each subgroup
            TR-USA              TR-Gender           USA-Gender
            Common    Whole     Common    Whole     Common    Whole
Booklets    TR   USA  TR   USA  F    M    F    M    F    M    F    M
B1 - B2     4    3    7    12   1    2    3    7    2    2    5    6
B2 - B3     5    3    8    13   1    2    6    6    2    2    6    6
B3 - B4     4    2    6    12   4    2    8    6    2    2    5    5
B4 - B5     3    5    8    10   3    2    8    6    1    1    3    4
B5 - B6     3    5    8    11   1    2    7    8    0    1    2    6
B6 - B7     5    4    9    14   3    4    6    7    1    4    2    6
B7 - B8     6    5    11   17   2    1    9    8    1    1    4    6
B8 - B9     6    5    11   16   4    3    6    5    2    1    3    3
B9 - B10    4    4    8    12   0    1    6    7    0    1    2    3
B10 - B11   2    2    4    10   2    3    3    8    0    1    2    4
B11 - B12   4    5    9    11   1    4    5    9    2    2    3    3
B12 - B13   5    4    9    12   2    2    4    7    1    0    3    4
B13 - B14   3    6    9    13   1    1    4    6    0    2    2    4
B14 - B1    5    6    11   12   1    3    3    6    1    2    5    6
As shown in Table 2, there are many DIF items in each equating case between subgroups. For the country subgroups, the numbers of items favoring Turkey and the USA are fairly balanced in the common tests, but more items favor the USA in the entire tests. For the gender subgroups, somewhat more items favor male students than female students, although the numbers of DIF items favoring males and females are fairly balanced in both the common and the entire tests for Turkey as well as the USA.
The skewness, kurtosis, test difficulty and test discrimination index difference of booklets between
subgroups are given in Table 3.
Table 3. Differences between skewness, kurtosis, test difficulty and discrimination indexes
            TR-USA                      TR-Gender                   USA-Gender
Booklets    Skew   Kurt   Disc   Diff   Skew   Kurt   Disc   Diff   Skew   Kurt   Disc   Diff
B1 - B2     0.287  0.715  0.096  0.619  0.301  0.695  0.734  0.039  0.048  0.21   0.035  0.01
B2 - B3     1.12   0.981  0.038  0.943  0.514  0.899  1.053  0.154  0.143  0.077  0.053  0.002
B3 - B4     1.169  1.054  0.262  0.792  0.314  1.077  1.034  0.043  0.066  0.01   0.013  0.003
B4 - B5     0.641  0.838  0.316  0.522  0.134  0.816  0.852  0.036  0.132  0.107  0.043  0.001
B5 - B6     0.28   0.731  0.24   0.491  0.55   0.579  0.888  0.309  0.167  0.206  0.009  0.022
B6 - B7     0.347  0.653  0.111  0.542  0.125  0.665  0.64   0.025  0.013  0.017  0.011  0.024
B7 - B8     0.204  0.65   0.089  0.561  0.157  0.633  0.659  0.026  0.061  0.233  0.036  0.013
B8 - B9     1.022  0.924  0.018  0.906  0.535  0.827  1.003  0.176  0.059  0.137  0.047  0.014
B9 - B10    1.022  0.968  0.167  0.801  0.196  1.019  0.936  0.083  0.08   0.041  0.001  0.001
B10 - B11   0.46   0.797  0.286  0.511  0.113  0.822  0.77   0.052  0.104  0.007  0.002  0.005
B11 - B12   0.339  0.665  0.111  0.554  0.764  0.492  0.858  0.366  0.178  0.095  0.003  0.044
B12 - B13   0.507  0.789  0.31   0.479  0.227  0.78   0.79   0.01   0.125  0.137  0.054  0.021
B13 - B14   0.294  0.689  0.189  0.5    0.194  0.75   0.638  0.112  0.032  0.031  0.038  0.025
B14 - B1    0.356  0.698  0.04   0.658  0.126  0.782  0.617  0.165  0.159  0.01   0.002  0.05
Note: Skew=Skewness; Kurt=Kurtosis; Disc=Discrimination index; Diff= Difficulty
To decide which features of the tests are significantly related to equating error, Spearman's rho correlations were calculated between the REMSD values and the DIF ratio and the differences in skewness, kurtosis, test difficulty and test discrimination index for each group. The results indicated a significant relationship between REMSD and the DIF ratio in the whole test (ρ = .55, p = .041) and the skewness difference (ρ = .56, p = .038) in the Turkey-USA samples. In the Turkey sample between gender subgroups, REMSD values are significantly related to the skewness difference (ρ = .66, p = .010), while in the USA sample between gender subgroups, REMSD values are significantly related to the test difficulty difference (ρ = .54, p = .045).
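The rank correlation used here can be sketched in a few lines. The example below computes Spearman's rho as the Pearson correlation of average ranks, which handles tied values; it does not compute p-values, and the two input lists are hypothetical illustrations, not the study's REMSD values or skewness differences.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Hypothetical values for six equating relations.
remsd_values = [0.05, 0.15, 0.12, 0.08, 0.06, 0.19]
skewness_diff = [0.10, 0.50, 0.20, 0.30, 0.15, 0.60]
rho = spearman_rho(remsd_values, skewness_diff)
```

With 14 equating relations per comparison, as in this study, each correlation rests on 14 pairs of values, so the significance tests have limited power.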
CONCLUSIONS
The results of the study indicated that equating invariance is not ensured for some booklets. These results imply that some students receive lower equated scores because of the subgroup they belong to (Huggins, 2014). Hence, evaluating the scores within certain subgroups may lead to misinterpretations, as the equated scores are not invariant.
The results also imply that REMSD values are related to different test features in different groups. Country subgroup invariance in the Turkey and USA samples is related to the DIF ratio, whereas gender subgroup invariance is not, in either Turkey or the USA. The ratio of DIF items is higher in the country subgroups than in the gender subgroups. This may be interpreted as showing that, from an equating invariance perspective, the equipercentile equating method is robust to DIF up to a certain ratio. The skewness difference between subgroups is also significantly related to REMSD for the gender subgroups in Turkey and for the country subgroups, but not for the gender subgroups in the USA. A possible reason is that the skewness differences between gender subgroups in the USA are too small.
The results of the study are limited to the equipercentile equating method and the defined subgroups. Future studies may focus on how different equating methods behave for particular subgroups. Moreover, simulation studies may be conducted to gain a broader perspective on the possible reasons for a lack of invariance.
REFERENCES
Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281-306.
Dorans, N. J. (2004). Using subpopulation invariance to assess test score equity. Journal of Educational Measurement, 41, 43-68.
Huggins, A. C. (2014). The effect of differential item functioning in anchor items on population invariance of equating. Educational and Psychological Measurement, 74(4), 627-658.
Paper ID: 58
THE COMPARISON OF ITEM AND ABILITY ESTIMATIONS
CALCULATED FROM THE PARAMETRIC AND NON-PARAMETRIC
ITEM RESPONSE THEORY ACCORDING TO THE SEVERAL FACTORS
Ezgi Mor Dirlik1, Nizamettin Koç2
1Kastamonu University, 2Ankara University (Retired)
Abstract: This study aimed to compare the item and ability parameters estimated under two approaches to item response theory, parametric and non-parametric, across the factors of test length, sample size and the psychometric quality of the items. The study was designed as basic research, and the data were obtained from the Trends in International Mathematics and Science Study (TIMSS) 2011 administration. From the main data set, 16 different data sets were formed according to the research questions and the conditions of the factors. The results showed that all item parameters estimated with the parametric and non-parametric models from data sets with different numbers of items and sample sizes were significantly and highly correlated. The abilities estimated by the non-parametric models were likewise significantly and highly correlated across all sample size and test length conditions. In sum, both approaches yielded highly and significantly correlated item and ability parameters. This finding shows that if the assumptions of parametric item response theory models are not met at the required level, the non-parametric alternatives can be used under the conditions studied in this research.
Keywords: Parametric Item Response Theory, Non-Parametric Item Response Theory, Mokken Scaling, Sample Size,
Test Length
SUMMARY
Introduction
Measuring abstract constructs has been one of the most difficult tasks since the beginning of the history of the social sciences. There have been many attempts to define and measure properties such as success, motivation, intelligence, attitudes and personality. Because of the complex structure of these constructs, it is accepted that such measurements always contain a fair amount of error. Measurement theories have been developed to minimize this error. The first and best known is Classical Test Theory (CTT), which has been used in many measurement applications thanks to its simple structure. Its assumptions can be met easily by almost any kind of data set, which is the main reason for its popularity. Despite this popularity, CTT has some important drawbacks that limit the precision of measurement. CTT yields only one value for the error, the standard error of measurement, although the amount of error is not the same for all people in a sample. It is also very difficult to adapt a test to the level of the test taker under this theory. Item Response Theory (IRT) was developed to address these drawbacks. Also known as latent trait theory, it provides deeper knowledge not only about test takers' performance but also about the test items and the constructs. It has made it possible to create adaptive tests and to equate tests in order to compare students' performance. However, the data sets must meet some requirements to obtain these advantages: large sample size, normal distribution, local independence and unidimensionality. In classroom assessment especially, it is generally not possible to reach large samples of 1000 or 2000 or to administer as many items as in high-stakes tests. For these reasons, the advantages of IRT cannot be used for small samples that are not normally distributed. Because these assumptions are difficult to meet, new approaches have emerged that are based on IRT but differ in their requirements. One of these is Nonparametric Item Response Theory, which lies between CTT and parametric IRT. Like CTT it can be applied to short tests and small samples, but like IRT it can yield invariant item and ability parameters. Nonparametric item response theory is a relatively new approach and has been used frequently in the health sciences. Because of the novelty of the field, there are few studies in the literature, and most of them focus on theoretical perspectives. This study aims to compare the parameters calculated from parametric and nonparametric IRT.
Method
This study aimed to compare the item and ability parameters estimated under two approaches to item response theory, parametric and non-parametric, across the factors of test length, sample size and the psychometric quality of the items. The study was designed as basic research, and the data were obtained from the Trends in International Mathematics and Science Study (TIMSS) 2011 administration. The eighth-grade mathematics booklet that had been found to be the most unidimensional was used, and the data from the first 20 countries were gathered. The final data set comprised 7254 students from different countries. From the main data set, 16 different data sets were formed according to the research questions and the conditions of the factors. For the sample size factor, three data sets of 500, 1000 and 3000 students were prepared, and the sample size conditions were crossed with the test length conditions of five, 15 and 25 items. From these data sets, item and ability parameters were estimated using compatible parametric and non-parametric item response models, and the results were compared using the relevant correlation coefficients. For the last factor, the psychometric quality of the items, four different data sets were formed, and ability parameters were estimated from them and compared. Lastly, the reliability of the data sets was investigated using test information functions for the parametric item response theory models and the Cronbach's alpha, Lambda-2, LCRC and MS coefficients for the non-parametric item response theory models.
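Two of the reliability coefficients named above, Cronbach's alpha and Guttman's Lambda-2, can be computed directly from the item covariance matrix. The sketch below is a plain-Python illustration with hypothetical dichotomous responses; it is not the software used in the study, and the LCRC and MS coefficients are not covered. The function names are illustrative.

```python
import math

def covariance_matrix(scores):
    """Sample covariance matrix of item scores (rows are persons, columns are items)."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in scores) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]

def cronbach_alpha(cov):
    """alpha = k/(k-1) * (1 - sum of item variances / total score variance)."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)
    item_var = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_var / total_var)

def guttman_lambda2(cov):
    """lambda2 = (sum of off-diagonal covariances + sqrt(k/(k-1) * sum of their squares)) / total variance."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)
    item_var = sum(cov[i][i] for i in range(k))
    off_diag_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    return (total_var - item_var + math.sqrt(k / (k - 1) * off_diag_sq)) / total_var

# Hypothetical responses of six persons to five dichotomous items.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cov = covariance_matrix(scores)
alpha = cronbach_alpha(cov)
lambda2 = guttman_lambda2(cov)
```

Lambda-2 is never smaller than alpha for the same data, which is one reason it is often reported alongside alpha in nonparametric IRT analyses.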
Results
The results showed that all item parameters estimated with the parametric and non-parametric models from data sets with different numbers of items and sample sizes were significantly and highly correlated. For the ability parameters, however, the estimates from the parametric models were not consistent, especially when the data sets consisted of five or 15 items, whereas the abilities estimated from the 25-item data sets were significantly and highly correlated. The abilities estimated by the non-parametric models were significantly and highly correlated across all sample size and test length conditions. Additionally, when the correlations between the ability parameters estimated by the parametric and non-parametric models were examined, the parametric models produced results consistent with the non-parametric models only in the conditions with 25 items or a sample size larger than 3000.
Discussion and Conclusion
Another factor whose effect on ability parameters was examined in this study was the level of the item
discrimination and item difficulty parameters. For this factor, the abilities estimated from the four
data sets grouped by low and high item difficulty and item discrimination were highly and significantly
correlated. Accordingly, both approaches yielded highly and significantly correlated item and ability
parameters. This finding shows that when the assumptions of parametric item response theory models are
not met at the required level, the non-parametric alternative models can be used under the conditions
studied in this research.
REFERENCES
Baker, F. B. and Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd Edition).
New York: Marcel Dekker.
Christensen, K. and Kreiner, S. (2010). Monte Carlo tests of the Rasch model based on scalability
coefficients. British Journal of Mathematical and Statistical Psychology, 63, 101-111.
Dyehouse, M. A. (2009). A comparison of model-data fit for parametric and nonparametric item response
theory models using ordinal level ratings. (Doctoral dissertation). Available from ProQuest
Dissertations and Theses database.
Embretson, S. E. and Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Hambleton, R. K. and Swaminathan, H. (1985). Item response theory: Principles and applications. Boston:
Kluwer.
Junker, B. (2000). Some topics in nonparametric and parametric IRT, with some thoughts about the future.
Carnegie Mellon University: Department of Statistics.
Koğar, H. (2014). Madde tepki kuramının farklı uygulamalarından elde edilen parametrelerin ve model
uyumlarının örneklem büyüklüğü ve test uzunluğu açısından karşılaştırılması. Yayımlanmamış
doktora tezi. Hacettepe Üniversitesi Eğitim Bilimleri Enstitüsü.
Kuijpers, R. E., Van der Ark, L. A., Croon, M. A. and Sijtsma, K. (2016). Bias in point estimates and
standard errors of Mokken's scalability coefficients. Applied Psychological Measurement, 40 (5),
331-345.
Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability.
Psychological Monographs, 61 (4).
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of "scale analysis"
and factor analysis. Psychological Bulletin, 45 (6), 507-529.
Meijer, R. R., Sijtsma, K. and Smid, N. G. (1990). Theoretical and empirical comparison of the Mokken
and the Rasch approach to IRT. Applied Psychological Measurement, 14, 283-298.
Meijer, R. R., Sijtsma, K., and Molenaar, I. W. (1995). Reliability estimation for single dichotomous items
based on Mokken's IRT model. Applied Psychological Measurement, 19 (4), 323-335.
Meijer, R. R. and Baneke, J. J. (2004). Analyzing psychopathology items: A case for nonparametric item
response theory modeling. Psychological Methods, 9 (3), 354-368.
Meijer, R. R., Egberink, I. J. L., Emons, W. H. M. and Sijtsma, K. (2008). Detection and validation of
unscalable item score patterns using item response theory: An illustration with Harter's Self-
Perception Profile for Children. Journal of Personality Assessment, 90 (3), 227-238.
Meijer, R., Tendeiro, J., and Wanders, R. (2014). The use of nonparametric item response theory to explore
data quality. In S. P. Reise & D. Revicki (Eds.), Handbook of item response theory modeling:
Applications to typical performance assessment (pp. 85-110). New York: Routledge.
Mislevy, R. J. & Stocking, M. L. (1989). A consumer's guide to LOGIST and BILOG. Applied
Psychological Measurement, 13, 57-75.
Mokken, R. J. and Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item
responses. Applied Psychological Measurement, 6 (4), 417-430.
Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research.
New York, Berlin: Walter de Gruyter, Mouton.
Molenaar, I. W. and Sijtsma, K. (1984). Internal consistency and reliability in Mokken's nonparametric item
response model. Tijdschrift voor Onderwijsresearch, 9 (5), 257-268.
Molenaar, I. W. (1991). A weighted Loevinger H coefficient extending Mokken scaling to multicategory
items. Kwantitatieve Methoden, 37, 97-117.
Molenaar, I. W. (1997). Nonparametric models for polytomous responses. In W. J. van der Linden and
R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 369-380). New York: Springer.
Molenaar, I. W. (2001). Thirty years of nonparametric item response theory. Applied Psychological
Measurement, 25 (3), 295-299.
Sijtsma, K. and Junker, B. W. (2006). Item response theory: Past performance, present developments,
and future expectations. Behaviormetrika, 1, 75-102.
Sijtsma, K. and Meijer, R. R. (2007). Nonparametric item response theory. In C. R. Rao & S. Sinharay (Eds.),
Handbook of statistics 26: Psychometrics (pp. 719-746). Amsterdam: Elsevier.
Sijtsma, K. and Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Newbury
Park, CA: Sage.
Straat, J. H., Van der Ark, L. A. and Sijtsma, K. (2014). Minimum sample size requirements for Mokken
scale analysis. Educational and Psychological Measurement, 74, 809-822.
Şengül Avşar, A. (2015). Çok Kategorili Puanlanan Maddelerin Psikometrik Özelliklerinin Farklı Test
Koşullarında Parametrik Olmayan Madde Tepki Kuramı Modellerine Göre İncelenmesi.
Yayımlanmamış doktora tezi. Ankara Üniversitesi Eğitim Bilimleri Enstitüsü.
Van Schuur, W. H. (2011). Ordinal item response theory: Mokken scale analysis. Los Angeles: Sage
Publications.
Paper ID: 59
A COMPARISON OF KERNEL EQUATING METHODS UNDER NEAT
DESIGN
Çiğdem Akın Arıkan
Ordu Üniversitesi [email protected]
Abstract: In this study, the equated scores from Kernel equating (KE) methods, namely the chained and
post-stratification equipercentile and linear equating methods, were compared under the NEAT design. TIMSS science
data were used. The study sample consisted of 865 eighth-grade examinees who were given booklets 1 and 14
during the TIMSS administration in Turkey. There were 39 items in booklet 1 and 38 items in booklet 14. First,
descriptive statistics were calculated, and the two booklets were then equated under the NEAT design with the
Kernel chained, Kernel post-stratification equipercentile, and linear equating methods. Second, the equating methods
were evaluated against the DTM, PRE, SEE, SEED, and RMSD criteria. The results of the equipercentile and linear
equating methods were consistent with each other, except in the high range of the score scale.
Finally, the PRE values showed that the KE equipercentile equating methods better match the discrete target
distribution Y, and the distribution of the SEED revealed that the KE equipercentile and linear methods do not
differ significantly from each other according to the DTM.
Keywords: Equating, Kernel equating, DTM, SEE, SEED, RMSD.
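To make the PRE criterion concrete, the sketch below computes the percent relative error in the p-th moment, PRE(p) = 100 * (mu_p(e_Y(X)) - mu_p(Y)) / mu_p(Y), as it is usually defined in the kernel equating literature. The score distributions are hypothetical, and the equating function used for the demonstration is a plain single-group linear equating, not the KE methods under the NEAT design compared in the study.

```python
import numpy as np

def percent_relative_error(p, equated_x, x_probs, y_scores, y_probs):
    """PRE in the p-th raw moment between equated X scores and target Y scores."""
    equated_x = np.asarray(equated_x, dtype=float)
    y_scores = np.asarray(y_scores, dtype=float)
    moment_equated = np.sum(equated_x ** p * np.asarray(x_probs, dtype=float))
    moment_target = np.sum(y_scores ** p * np.asarray(y_probs, dtype=float))
    return 100.0 * (moment_equated - moment_target) / moment_target

# Hypothetical discrete score distributions (not the TIMSS data).
x_scores = np.arange(5.0)
x_probs = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
y_scores = np.arange(5.0)
y_probs = np.array([0.05, 0.15, 0.30, 0.30, 0.20])

# Simple linear equating: match the mean and standard deviation of Y.
mean_x = np.sum(x_scores * x_probs)
sd_x = np.sqrt(np.sum((x_scores - mean_x) ** 2 * x_probs))
mean_y = np.sum(y_scores * y_probs)
sd_y = np.sqrt(np.sum((y_scores - mean_y) ** 2 * y_probs))
linear_equated = mean_y + (sd_y / sd_x) * (x_scores - mean_x)

# PRE for the first five moments.
pre_values = [
    percent_relative_error(p, linear_equated, x_probs, y_scores, y_probs)
    for p in range(1, 6)
]
print([round(v, 4) for v in pre_values])
```

Because a linear function reproduces the target mean and variance, its PRE is zero for the first two moments and any mismatch appears only from the third moment on; an equipercentile method is expected to keep the higher moments closer to the target as well.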
Paper ID: 64
AN EXAMINATION OF ITEM RESPONSE THEORY SIMULATIONS IN TERMS OF CONSTRUCT VALIDITY AND
RELIABILITY
Nuri Doğan1, Abdullah Faruk Kılıç2, İbrahim Uysal3
1Hacettepe Üniversitesi, 2Adıyaman Üniversitesi, 3Abant İzzet Baysal Üniversitesi
[email protected], [email protected], [email protected]
Abstract: The aim of this study is to examine, in terms of validity and reliability, the results that different packages
and programs produce in simulation studies based on Item Response Theory (IRT). To this end, data sets obtained from
the R packages mirt, psych, irtoys, and catIrt and from WinGen, a program frequently used in IRT-based simulation
studies, were compared in terms of the KR20 reliability coefficient, the explained variance ratio and mean factor
loading obtained from exploratory factor analysis (EFA), and the dimensionality of the data sets.
The study was conducted as a Monte Carlo simulation study.
The simulation conditions were the number of items (20, 30, and 40), the IRT model (Rasch, 2-parameter logistic
model [2PLM], and 3-parameter logistic model [3PLM]), the data generation program (WinGen 3.1 and the R packages irtoys,
mirt, psych, and catIrt), and the sample size (500, 1000, and 2000), giving a total of 135 simulation
conditions with 1000 replications per condition. The results showed that the data sets generated under all IRT
models were generally unidimensional; that the explained variance ratio remained below 0.30 for the Rasch model;
that for the 2PLM the programs had an explained variance above 0.30 in all conditions except one;
that for the 3PLM the explained variance ratio was generally below 0.30; and that the mean factor loadings were
above 0.46 for all models and programs. The reliability coefficients were in the range of
0.75-0.93 in all conditions. According to the findings, the R packages can be said to give better results for IRT
data generation in terms of explained variance ratio, mean factor loading, and reliability coefficient
in most of the simulation conditions. It is therefore recommended that R packages be preferred in
IRT-based simulation studies.
Keywords: Simulation, Item Response Theory, Exploratory Factor Analysis, explained variance, reliability.
INTRODUCTION
Simulation studies are frequently used in education and psychology. Feinberg and Rubright
(2016) report that 60% of the studies published in Applied Psychological Measurement and the Journal of
Educational Measurement between 2008 and 2012 were simulation studies.
Simulation studies help researchers find answers regarding data analysis, statistical power, and the
methods to be used for obtaining accurate results from empirical studies
(Hallgren, 2013). It can therefore be said that they are used frequently in research.
Item Response Theory (IRT) is one of the theories frequently used in education and psychology. Simulation
studies on IRT are therefore common. Many programs and programming languages are available for conducting
Monte Carlo simulation studies on IRT. Programs such as IRTPRO (Cai,
Thissen, and du Toit, 2011), flexMIRT (Cai, 2017), BMIRT (Yao, 2003), and Mplus (Muthén and Muthén,
2012) are commercial, whereas programs such as WinGen (Han, 2007) and software languages such as
R (R Core Team, 2017) are free.
Researchers frequently conduct Monte Carlo simulation studies with these programs and make
recommendations based on their findings. However, the data sets produced by these programs are often not
examined in terms of IRT assumptions and psychometric properties. The present
study is concerned with the construct validity and reliability of data sets generated under IRT with the WinGen 3.1
program and four different R packages, chosen because they are free and are used in many IRT studies.
In this way, the limits of the findings obtained from such studies can be drawn more
clearly.
The study sought an answer to the question "What are the construct validity and reliability of the data sets
obtained from the programs frequently used in IRT-based Monte Carlo simulation studies?" and, in relation to
this problem, to the following sub-problems:
1. What is the dimensionality of the data sets obtained from WinGen 3.1, irtoys, mirt, psych, and catIrt, which are used in Monte Carlo simulations?
2. What is the explained variance ratio of the data sets obtained from WinGen 3.1, irtoys, mirt, psych, and catIrt, which are used in Monte Carlo simulations?
3. What is the mean factor loading of the data sets obtained from WinGen 3.1, irtoys, mirt, psych, and catIrt, which are used in Monte Carlo simulations?
4. What are the reliability coefficients of the data sets obtained from WinGen 3.1, irtoys, mirt, psych, and catIrt, which are used in Monte Carlo simulations?
METHOD
Because the study examines the construct validity and reliability of the data sets generated in
IRT-based Monte Carlo simulation studies, it was itself conducted as a Monte Carlo simulation study.
Simulation studies are frequently used in education and psychology (Feinberg and
Rubright, 2016; Harwell, Stone, Hsu, and Kirisci, 1996) and allow possible scenarios to be
evaluated (Gilbert, 1999).
The simulation conditions were the number of items (20, 30, and 40), the IRT model (Rasch, 2-parameter
logistic model [2PLM], and 3-parameter logistic model [3PLM]), the data generation program (WinGen
3.1 (Han, 2007) and the R packages irtoys (Partchev, 2016), mirt (Chalmers, 2012), psych (Revelle, 2018),
and catIrt (Nydick, 2015)), and the sample size (500, 1000, and 2000). This gave a total of
3x3x5x3=135 simulation conditions, with 1000 replications per cell.
The replications produced 135000 data sets. The study was based on unidimensional Item Response Theory,
so all data sets were generated to be unidimensional. In addition, only dichotomously scored data
were examined; polytomous data were left outside the scope of this study.
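The fully crossed design described above can be enumerated as follows. This is a minimal sketch whose variable names are illustrative; it only reproduces the counting (3x3x5x3 cells, 1000 replications per cell), not the data generation itself.

```python
from itertools import product

# Factor levels as stated in the text.
n_items = (20, 30, 40)
models = ("Rasch", "2PLM", "3PLM")
generators = ("WinGen 3.1", "irtoys", "mirt", "psych", "catIrt")
sample_sizes = (500, 1000, 2000)
REPLICATIONS = 1000

# Every combination of the four factors is one simulation condition (cell).
conditions = list(product(n_items, models, generators, sample_sizes))
total_data_sets = len(conditions) * REPLICATIONS

print(len(conditions), total_data_sets)
```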
In IRT-based simulation studies, the distributions of the item parameters differ from study to study.
This study used the item parameter distributions used by Bulut and Sünbül (2017).
Accordingly, for the 3PLM, the a parameter was generated from a lognormal distribution with a mean of 0.3 and a
standard deviation of 0.2, the b parameter from a normal distribution with a mean of 0 and a standard deviation of 1,
and the c parameter from a beta distribution with shape parameters a=20 and b=90. For the 2PLM, the c parameter
was set to 0. For the Rasch model, the a parameter was set to 1 and the c parameter to 0. The ability
parameter was generated from a normal distribution with a mean of 0 and a standard deviation of 1.
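The data generation step described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the study itself used WinGen 3.1 and the R packages irtoys, mirt, psych, and catIrt; the function name `simulate_responses` is hypothetical; the lognormal mean 0.3 and standard deviation 0.2 are assumed to be on the log scale; and no scaling constant (D = 1.7) is applied in the logistic function.

```python
import numpy as np

def simulate_responses(n_persons, n_items, model="3PLM", seed=0):
    """Generate a dichotomous response matrix under the Rasch, 2PLM, or 3PLM."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=n_persons)      # ability ~ N(0, 1)
    b = rng.normal(0.0, 1.0, size=n_items)            # difficulty ~ N(0, 1)
    if model == "Rasch":
        a = np.ones(n_items)                          # discrimination fixed at 1
    else:
        # Assumption: 0.3 and 0.2 are the mean and SD on the log scale.
        a = rng.lognormal(mean=0.3, sigma=0.2, size=n_items)
    if model == "3PLM":
        c = rng.beta(20, 90, size=n_items)            # guessing ~ Beta(20, 90)
    else:
        c = np.zeros(n_items)                         # no guessing
    # Probability of a correct response for every person-item pair.
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta[:, None] - b)))
    # Score 1 when a uniform draw falls below the response probability.
    return (rng.random((n_persons, n_items)) < p).astype(int)

data = simulate_responses(n_persons=500, n_items=20, model="3PLM", seed=1)
print(data.shape, data.mean().round(3))
```

Different generators may differ in exactly such choices (parameterization of the lognormal, use of D, random number streams), which is one reason their output can differ even under nominally identical conditions.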
Minimum average partial (MAP) analysis was conducted to determine the dimensionality of the data sets.
Exploratory factor analysis (EFA) was conducted for the explained variance ratio and the mean factor
loading. For this, the minimum residual (minres) factor extraction method was used, and the analyses were
run on the tetrachoric correlation matrix. Because all data sets were generated to be unidimensional,
the mean factor loading was determined by averaging the loadings of all items on the first factor.
This was done for each of the 1000 replications in each condition, and the mean value over the 1000
replications was reported. The KR20 reliability coefficient was calculated to determine the reliability
coefficients; the KR20 values of the 1000 data sets generated for each condition
were averaged. All analyses were carried out in R with the psych package (Revelle, 2018).
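The KR20 step can be illustrated with a short Python sketch. The study computed KR20 with the psych package in R; the functions below are a hypothetical re-implementation of the textbook formula KR20 = k/(k-1) * (1 - sum(p_i * q_i) / var(total score)), using population variances (dividing by n) throughout, and the toy data are made up for the example.

```python
import numpy as np

def kr20(responses):
    """KR20 reliability for an (examinees x items) matrix of 0/1 scores."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    p = x.mean(axis=0)                        # proportion correct per item
    total_var = x.sum(axis=1).var(ddof=0)     # variance of total scores
    return (k / (k - 1)) * (1.0 - np.sum(p * (1.0 - p)) / total_var)

def mean_kr20(data_sets):
    """Average KR20 over the replications of one simulation condition."""
    return float(np.mean([kr20(d) for d in data_sets]))

# Hypothetical toy replications (not data from the study).
rep1 = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1]])
rep2 = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [1, 1, 1]])
condition_kr20 = mean_kr20([rep1, rep2])
print(round(condition_kr20, 3))
```

In the design described here, the list passed to `mean_kr20` would hold the 1000 data sets generated for one condition.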
FINDINGS
The findings of the study are presented in Table 1 and interpreted according to the sub-problems.
Table 1. Explained Variance Ratio, Mean Factor Loading, and Reliability of the Data Sets Obtained from
the Programs
EVR = explained variance ratio (AVO); MFL = mean factor loading (OFY); Rel. = reliability (KR20).
Items  Sample  Program  Rasch Model            2PLM                   3PLM
                        EVR    MFL    Rel.     EVR    MFL    Rel.     EVR    MFL    Rel.
20     500     mirt     0.261  0.508  0.788    0.405  0.629  0.862    0.304  0.544  0.815
20     500     psych    0.265  0.512  0.790    0.403  0.627  0.850    0.286  0.524  0.792
20     500     WinGen   0.246  0.493  0.765    0.299  0.542  0.805    0.267  0.503  0.753
20     500     irtoys   0.263  0.509  0.788    0.399  0.624  0.847    0.289  0.528  0.794
20     500     catIrt   0.263  0.510  0.789    0.399  0.624  0.847    0.289  0.528  0.794
20     1000    mirt     0.261  0.510  0.790    0.400  0.626  0.862    0.293  0.535  0.803
20     1000    psych    0.265  0.513  0.793    0.378  0.608  0.843    0.232  0.470  0.753
20     1000    WinGen   0.246  0.494  0.766    0.350  0.588  0.837    0.281  0.519  0.770
20     1000    irtoys   0.261  0.510  0.790    0.399  0.625  0.853    0.233  0.470  0.754
20     1000    catIrt   0.261  0.510  0.790    0.398  0.625  0.853    0.232  0.469  0.753
20     2000    mirt     0.243  0.492  0.770    0.374  0.608  0.848    0.290  0.533  0.805
20     2000    psych    0.261  0.510  0.785    0.364  0.601  0.829    0.243  0.487  0.771
20     2000    WinGen   0.255  0.504  0.775    0.343  0.582  0.834    0.272  0.512  0.766
20     2000    irtoys   0.243  0.492  0.769    0.366  0.602  0.830    0.249  0.493  0.777
20     2000    catIrt   0.243  0.492  0.769    0.366  0.602  0.830    0.248  0.492  0.776
30     500     mirt     0.263  0.509  0.842    0.393  0.622  0.900    0.252  0.495  0.835
30     500     psych    0.261  0.507  0.841    0.369  0.602  0.884    0.241  0.474  0.813
30     500     WinGen   0.245  0.492  0.834    0.388  0.617  0.893    0.251  0.493  0.827
30     500     irtoys   0.264  0.511  0.844    0.387  0.617  0.891    0.247  0.478  0.817
30     500     catIrt   0.265  0.511  0.844    0.388  0.618  0.891    0.246  0.477  0.816
30     1000    mirt     0.269  0.517  0.848    0.367  0.602  0.891    0.258  0.490  0.829
30     1000    psych    0.260  0.508  0.842    0.364  0.600  0.881    0.232  0.469  0.803
30     1000    WinGen   0.251  0.499  0.839    0.387  0.616  0.893    0.255  0.498  0.831
30     1000    irtoys   0.270  0.518  0.848    0.364  0.599  0.881    0.228  0.465  0.799
cat