

6TH INTERNATIONAL CONGRESS ON MEASUREMENT AND

EVALUATION IN EDUCATION AND PSYCHOLOGY

September 5-8, 2018

Prizren, KOSOVO

Responsibility for the contents of this book rests with the authors of the papers.

September, 2018


INTRODUCTION

Paper presentations, conferences, and panel discussions were held at the 6th International Congress on Measurement and Evaluation in Education and Psychology at Prizren University in Kosovo. The Congress was first hosted by Ankara University in 2008. It gained an international character this year, since this is the first time the Congress has been organized abroad, in Kosovo.

Among the participants of the Congress were Ministry of Education representatives from Turkey, academics working at different universities in Turkey, testing and assessment professionals, and graduate students. A total of 122 researchers participated in the Congress.

In cooperation with Prizren University of Kosovo, the opening of the Congress included a large protocol delegation consisting of a Kosovo Turkish Democratic Party MP, the Mayor of Prizren, the Turkish Consulate in Prizren, the Rector of Prizren University, the Dean and the Vice Dean of the Faculty of Education, the representative of the Kosovo Yunus Emre Institute, and the representative of the Kosovo Military Representative Committee. The members of the protocol stated that they were happy to welcome guests from Turkey and emphasized the importance of education and cooperation. The keynote speakers of the Congress were the Hasan Kalyoncu University Vice Rector, the Dean of the Faculty of Education, and the Chairman of the Board of Directors of EPODDER (Association of Educational and Psychological Assessment and Evaluation).

A total of 12 faculty members working in 7 different countries served on the scientific committee of the Congress, which was organized by Prizren University, Hasan Kalyoncu University, and EPODDER. Two invited speakers attended the Congress. The Congress included the lectures "Using Unidimensional Models to Represent a Two Dimensional Latent Space" by Prof. Dr. Terry Ackerman and "Strategies for Addressing High Dimensional Cognitive Diagnostic Assessment Problems" by Jimmy De La Torre.

A total of 105 papers about large-scale tests were presented in 28 different sessions. Most of them focused on the preparation, scoring, application, and evaluation of tests. These studies were based on international large-scale tests such as PISA and TIMSS, as well as national large-scale tests such as TEOG and SBS. The presentations were made in two languages, English and Turkish. In a panel, the problems experienced in the transition between grades in Turkey, and their solutions, were discussed.

We want to express our gratitude to Pegem Academy, Nobel Publishing House, the Yunus Emre Institute, TIKA, and the Ministry of National Education, Evaluation and Examination Services. We also thank the members of the Congress Organizing Committee and the members of the Scientific Committee who contributed to the organization of our Congress.

Co-Presidents of the Congress Organizing Committee

Prof. Şener Büyüköztürk          Prof. Selahattin Gelbal


CONGRESS HONORARY MEMBERS

Prof. Dr. Tamer Yılmaz, Hasan Kalyoncu University

Prof. Dr. Ramë Vataj, Prizren University

CO-CHAIRS OF THE CONGRESS ORGANIZATION BOARD

Prof. Dr. Şener Büyüköztürk, Hasan Kalyoncu University

Prof. Dr. Selahattin Gelbal, Hacettepe University

CONGRESS ORGANIZATION BOARD

Assoc. Prof. Ismet Temaj, Prizren University

Assistant Prof. Shemsi Morina, Prizren University

Assoc. Prof. İsmail Karakaya, Gazi University

Assoc. Prof. Alper Köse, Bolu Abant Izzet Baysal University

Assistant Prof. Ufuk Akbaş, Hasan Kalyoncu University

Assistant Prof. Ersoy Karabay, Hasan Kalyoncu University

Assistant Prof. Soner Yıldırım, Prizren University

Meral Alkan, PhD, Ministry of National Education

Serdan Kervan, PhD, Prizren University

SCIENCE COMMITTEE*

Ali Baykal, Bahcesehir University

Cindy M. Walker, Duquesne University

Hülya Kelecioğlu, Hacettepe University

Ismet Temaj, Prizren University

Jimmy De La Torre, The University of Hong Kong

Kadriye Ercikan, University of British Columbia

Mehtap Çakan, Gazi University

Ron Hambleton, University of Massachusetts Amherst

Selahattin Gelbal, Hacettepe University

Şener Büyüköztürk, Hasan Kalyoncu University

Terry Ackerman, The ACT

Zekeriya Nartgün, Abant İzzet Baysal University

*Sorted in alphabetical order.

SECRETARY

Ayfer Sayın, PhD, Gazi University

Research Assistant Merve Yıldırım Seheryeli, Hasan Kalyoncu University


WEB DESIGN AND DEVELOPMENT

Software Developer Turul Trpan, Hasan Kalyoncu University

REVIEWERS/REFEREES*

Adnan Erkuş, PhD.

Ahmet Salih Şimşek, PhD.

Ahmet Turhan, PhD.

Ahmet Yıldırım, PhD.

Asiye Şengül Avşar, PhD.

Ayfer Sayın, PhD.

Aylin Albayrak Sarı, PhD.

Ayşegül Altun, PhD.

Bahar Şahin Sarkın, PhD.

Bayram Bıçak, PhD.

Bilge Gök Bekci, PhD.

Burak Aydın, PhD.

Burhanettin Özdemir, PhD.

C. Deha Doğan, PhD.

Cem Oktay Güzeller, PhD.

Çiğdem Akın Arıkan, PhD.

Deniz Tuğçe Özmen, PhD.

Derya Çakıcı Eser, PhD.

Derya Çobanoğlu Aktan, PhD.

Devrim Özdemir Alıcı, PhD.

Dilara Bakan Kalaycıoğlu, PhD.

Durmuş Özbaşı, PhD.

Duygu Anıl, PhD.

Duygu Gizem Saral Ertoprak, PhD.

Duygu Koçak, PhD.

Elif Bengi Ünsal Özberk, PhD.

Emine Burcu Pehlivan Tunç, PhD.

Emine Önen, PhD.

Emrah Gül, PhD.

Emre Toprak, PhD.

Eren Can Aybek, PhD.

Eren Halil Özberk, PhD.

Ersoy Karabay, PhD.

Esin Tezbaşaran, PhD.

Esin Yılmaz Koğar, PhD.

Esra Eminoğlu Özmercan, PhD.

Ezel Tavşancıl, PhD.

Ezgi Mor Dirlik, PhD.

Fatih Kezer, PhD.

Fazilet Taşdemir, PhD.

Fethi Toker, PhD.

Funda Nalbantoğlu Yılmaz, PhD.


Gizem Uyumaz, PhD.

Güçlü Şekercioğlu, PhD.

Glin rak Uzun, PhD.

Gülden Kaya Uyanık, PhD.

Gülşen Taşdelen Teker, PhD.

Hakan Atılgan, PhD.

Hakan Koğar, PhD.

Hakan Yavuz Atar, PhD.

Halil Yurdugül, PhD.

Hamide Deniz Gülleroğlu, PhD.

Hatice Kumandaş, PhD.

Hülya Kelecioğlu, PhD.

Hülya Yürekli, PhD.

İ. Alper Köse, PhD.

İlhan Akhun, PhD.

İlker Kalender, PhD.

İsmail Karakaya, PhD.

Kadriye Belgin Demirus, PhD.

Kübra Atalay Kabasakal, PhD.

Levent Yakar, PhD.

Lokman Akbay, PhD.

Mehtap Çakan, PhD.

Melek Gülşah Şahin, PhD.

Meltem Acar Güvendir, PhD.

Meral Alkan, PhD.

Murat Akyıldız, PhD.

Murat Doğan Şahin, PhD.

Münevver Başman, PhD.

Nagihan Öztürk Boztunç, PhD.

Neşe Güler, PhD.

Neşe Öztürk Gübeş, PhD.

Nilüfer Kahraman, PhD.

Nizamettin Koç, PhD.

Nükhet Demirtaşlı, PhD.

Okan Bulut, PhD.

Ömer Kutlu, PhD.

Ömür Kaya Kalkan, PhD.

Önder Sünbül, PhD.

Özge Bal Altıntaş, PhD.

Özge Bıkmaz Bilgen, PhD.

Ragıp Terzi, PhD.

Recep Serkan Arık, PhD.

Sakine Göçer Şahin, PhD.

Satılmış Tekindal, PhD.

Seçil Ömür Sünbül, PhD.

Sedat Şen, PhD.

Seher Yalçın, PhD.

Selahattin Gelbal, PhD.

Sema Sulak, PhD.

Sevda Çetin, PhD.


Şener Büyüköztürk, PhD.

Şeref Tan, PhD.

Şeyma Yüksel, PhD.

Tülin Acar, PhD.

Türker Toker, PhD.

Ufuk Akbaş, PhD.

Ümit Çelen, PhD.

Yaşar Baykul, PhD.

Yavuz Akpınar, PhD.

Yeşim Özer Özkan, PhD.

Yurdagül Günal, PhD.

Zekeriya Nartgün, PhD.

*Sorted in alphabetical order.


CONTENTS

INTRODUCTION

CONGRESS HONORARY MEMBERS

CO-CHAIRS OF THE CONGRESS ORGANIZATION BOARD

CONGRESS ORGANIZATION BOARD

SCIENCE COMMITTEE

SECRETARY

WEB DESIGN AND DEVELOPMENT

REVIEWERS/REFEREES

CONTENTS

SUMMARIES

AN ANALYSIS ON THE MODEL FIT INDICES IN A SCALE CONSTRUCTION PROCESS

KOMPOZİSYON PUANLAMADA PUANLAYICILAR ARASI GÜVENİRLİK BELİRLEME YÖNTEMLERİNİN KARŞILAŞTIRILMASI

INVESTIGATION of SUBGROUP INVARIANCE OF EQUATING FUNCTIONS

THE COMPARISON OF ITEM AND ABILITY ESTIMATIONS CALCULATED FROM THE PARAMETRIC AND NON-PARAMETRIC ITEM RESPONSE THEORY ACCORDING TO THE SEVERAL FACTORS

A COMPARISON OF KERNEL EQUATING METHODS UNDER NEAT DESIGN

MADDE TEPKİ KURAMI SİMÜLASYONLARININ YAPI GEÇERLİĞİ VE GÜVENİRLİK AÇISINDAN İNCELENMESİ

ANALYSING TEACHERS' VIEWS ON THE PROBABLE NEGATIVE EFFECTS OF HIGH STAKE TESTS THROUGH PAIRED COMPARISONS BASED ON RASCH ANALYSIS

DOĞRULAYICI FAKTÖR ANALİZİ VE ÇOK DÜZEYLİ MODELLEME ÜZERİNE BİR KARŞILAŞTIRMA: DENİZLİ İLİ ÖRNEĞİ

APPLICABILITY OF THE THEORETICAL EXAM FOR CANDIDATES OF DRIVING LICENCE TRAINEES AS THE COMPUTER ADAPTIVE TEST*

THE IMPACT OF COMPLEX MULTIPLE CHOICE ITEMS ON ITEM DIFFICULTY

COMPARING PERFORMANCE OF DIFFERENT EQUATING METHODS in PRESENCE and ABSENCE of DIF ITEMS in ANCHOR TEST

DINA MODEL ÇERÇEVESİNDE DIF BELİRLENMESİ

CİNSİYETİN PROBLEMLİ İNTERNET KULLANIMI ÜZERİNDEKİ ETKİSİ: BİR META-ANALİZ ÇALIŞMASI

TÜRK ÖĞRENCİLERİN BİLGİ VE İLETİŞİM TEKNOLOJİLERİNE ERİŞİM VE KULLANIM DÜZEYLERİNİN PISA FEN BAŞARISINA ETKİSİ

PISA FEN MADDELERİNİN FeTeMM EĞİTİMİNİN ÖZELLİKLERİ AÇISINDAN İNCELENMESİ

COMPARING THE PERFORMANCE OF DATA MINING METHODS IN CLASSIFYING SUCCESSFUL STUDENTS WITH SCIENTIFIC LITERACY IN PISA 2015*


AKADEMİK YILMAZLIK ÖLÇEĞİNİN TÜRKÇEYE UYARLAMA ÇALIŞMASI

TEST EQUATING WITH BAYESIAN NONPARAMETRIC MODEL

BİLİMSEL AKIL YÜRÜTME TESTİNİN TÜRKÇEYE UYARLANMASI VE PSİKOMETRİK ÖZELLİKLERİNİN BELİRLENMESİ

DERECELİ PUANLANMIŞ ÖLÇEKLER İÇİN PUANLAYICILARIN DEĞERLENDİRİLMESİNDE YAPISAL EŞİTLİK YAKLAŞIMININ KULLANILMASI

MADDE TAKIMINDA YER ALAN MADDELERDE YEREL BAĞIMLILIĞIN MADDE TAKIMI TEPKİ MODELİ İLE İNCELENMESİ

MATEMATİK ÖĞRETİMİ KAYGISI ÖLÇEĞİ GÜVENİRLİK VE GEÇERLİK ÇALIŞMASI

AN ANALYSIS OF ENGLISH TEST IN 2016 YEAR UNDERGRADUATE PLACEMENT EXAM IN TERMS OF DIFFERENTIAL ITEM FUNCTIONING

AN INVESTIGATION OF FOREIGN LANGUAGE PRODUCTION OF CABIN CREW CANDIDATES DURING THEIR EMPLOYMENT PROCESS

ÖĞRENCİLERİN SBS-MATEMATİK BAŞARILARINI ETKİLEYEN DEĞİŞKENLERİN VE OKUL KATMA DEĞERİNİN HİYERARŞİK LİNEER MODELLEME ANALİZİ YOLUYLA BELİRLENMESİ*

İKİ KATEGORİLİ PUANLANAN MADDELERDE MADDE TEPKİ KURAMINA DAYALI PARAMETRE KESTİRİMİ: BILOG, Mplus, ltm KARŞILAŞTIRMASI

ÖĞRETMENLERİN KAYNAŞTIRMA UYGULAMALARINDAKİ ÖLÇME VE DEĞERLENDİRME YETERLİKLERİ VE PROBLEMLERİ

TREND ANALYSIS OF TURKEY IN INTERNATIONAL EDUCATION SURVEYS: USING TIMSS AND PISA RESULTS

TEK UYGULAMAYA DAYALI SINIFLAMA TUTARLILIĞI VE SINIFLAMA DOĞRULUĞU YÖNTEMLERİNİN KARŞILAŞTIRILMASI

GÖRSEL METİN TEMELLİ YENİLİKÇİ MADDE FORMATI İÇİN TASARLANAN TEST SKALASININ MADDE AYIRT EDİCİLİĞİNİN İNCELENMESİ

BÜYÜK ÖRNEKLEMLERDE KAYIP VERİ İLE BAŞ ETME YÖNTEMLERİNİN ÖLÇME DEĞİŞMEZLİĞİNE ETKİSİ AÇISINDAN KARŞILAŞTIRILMASI

KOMPLEKS ÇOKTAN SEÇMELİ SORULARIN SEÇENEK SAYILARININ SORUNUN PSİKOMETRİK ÖZELLİKLERİNE ETKİSİ

ÇOKLU DOĞRUSAL REGRESYON VE GENELLEŞTİRİLMİŞ TOPLAMSAL MODELLERİN AÇIKLANAN VARYANS SONUÇLARININ KARŞILAŞTIRILMASI

SEÇİM TEORİSİ TEMELLİ KARİYER DEĞERLERİ ÖLÇEĞİ: GEÇERLİK VE GÜVENİRLİK ÇALIŞMALARI

THE ILLUSTRATION OF VALIDITY ARGUMENTS WITH DIFFERENT CONFIRMATORY FACTOR ANALYSIS MODELS

PISA 2003 VE 2012 MATEMATİK OKURYAZARLIĞI PUANLARININ ÖLÇÜT GEÇERLİĞİ: BEKLENTİ TABLOLARI VE UYUM ANALİZİ

ITEM ORDER EFFECT ON STUDENT PERFORMANCE IN LARGE-SCALE TESTING

A COMPARISON OF SOFTWARE PACKAGES AVAILABLE FOR DINA MODEL ESTIMATION


AN EVALUATION OF 4PL IRT AND DINA MODELS FOR ESTIMATING PSEUDO-GUESSING AND SLIPPING PARAMETERS

A NON-DIAGNOSTIC ASSESSMENT FOR DIAGNOSTIC PURPOSES: Q-MATRIX VALIDATION AND ITEM-BASED MODEL FIT EVALUATION FOR THE TIMSS 2011 ASSESSMENT

POZİTİF DUYGU DURUM ÖLÇEĞİNİN PSİKOMETRİK ÖZELLİKLERİNİN DERECELENDİRİLMİŞ TEPKİ MODELİ İLE İNCELENMESİ

BİREYSELLEŞTİRİLMİŞ BİLGİSAYARLI TEST VE BİREYSELLEŞTİRİLMİŞ ÇOK AŞAMALI TESTLERİN PERFORMANSLARININ İNCELENMESİ

TIMSS'TE DÖNGÜDE YER ALAN MADDELER İÇİN MADDE KAYMASININ İNCELENMESİ

ÜNİVERSİTE ÖĞRENCİLERİNİN BAŞARILARININ DEĞERLENDİRİLMESİNDE E-PORTFOLYO KULLANIMINA İLİŞKİN GÖRÜŞLERİN BELİRLENMESİ

BAYESIAN VE NONBAYESIAN KESTİRİM YÖNTEMLERİNE DAYALI SINIFLAMA DOĞRULUĞU İNDEKSLERİNİN FARKLI ÖRNEKLEM KOŞULLARINDA İNCELENMESİ

A COMPARISON OF MIXED EFFECTS LOGISTIC REGRESSION AND LOGISTIC REGRESSION

21. YÜZYIL EĞİTİMİNDE ÖNE ÇIKAN ÖĞRENCİ YETERLİKLERİ, SINIF ÖLÇME VE DEĞERLENDİRME ANLAYIŞININ DÖNÜŞÜMÜ VE GELECEĞİ*

DUYUŞSAL ÖZELLİKLER, EĞİTİM KAYNAKLARI VE OKUL İKLİMİNİN FEN BAŞARISINA ETKİSİ: PISA 2015

INVESTIGATION THE EFFECTS OF ITEM AND STUDENT RELATED VARIABLES ON TURKISH STUDENTS' MATHEMATICS PERFORMANCE*

NANOTEKNOLOJİ FARKINDALIK ÖLÇEĞİNİN TÜRK KÜLTÜRÜNDEKİ GEÇERLİK ve GÜVENİRLİK ÇALIŞMASI

ULUSLARARASI ÖĞRENCİ DEĞERLENDİRME PROGRAMI (PISA-2015) FEN OKURYAZARLIĞI TEST MADDELERİNİN KÜLTÜRLERARASI ÖLÇME DEĞİŞMEZLİĞİNİN İNCELENMESİ

ÇOKBOYUTLU AŞAMALI TEPKİ KURAMI ALTINDA ÜRETİLEN TEK BİÇİMLİ VE TEK BİÇİMLİ OLMAYAN VERİ GRUPLARINDA OLABİLİRLİK ORANI YÖNTEMİ I. TİP HATA VE GÜÇ ÇALIŞMASI

MEB TARAFINDAN ÖĞRETMENLERE ÖNERİLEN EĞİTİM TEMALI FİLMLERİN İÇERDİKLERİ ÖLÇME VE DEĞERLENDİRME ÖĞELERİNE GÖRE İNCELENMESİ

INVESTIGATING DIFFERENTIAL ITEM FUNCTIONING OF THE ANKARA UNIVERSITY EXAMINATION FOR FOREIGN STUDENTS BY RECURSIVE PARTITIONING ANALYSIS IN THE RASCH MODEL

YEREL BAĞIMSIZLIK VARSAYIMININ İHLALİ DURUMUNDA PARAMETRİK VE PARAMETRİK OLMAYAN MADDE TEPKİ KURAMI MODELLERİNDEN KESTİRİLEN MADDE PARAMETRELERİNİN KARŞILAŞTIRILMASI

COMPARISON OF TEACHER PERFORMANCE BY ANSWERS FROM DIFFERENT COUNTRY STUDENTS

COMPARISON OF THE CLASSIFICATION PERFORMANCE OF COMPUTER ADAPTIVE TESTING AND MULTI-STAGE COMPUTER ADAPTIVE TESTING


ÖRTÜK SINIF ANALİZİ MPLUS UYGULAMASI: PSİKOLOJİK DAYANIKLILIK ÜZERİNE BİR ÖRNEK

FEN ÖZ YETERLİĞİ YAPISININ ÇOK GRUPLU DFA YÖNTEMİ İLE ÖLÇME DEĞİŞMEZLİĞİNİN İNCELENMESİ

GÖRME ENGELLİ ÖĞRENCİLERE YÖNELİK ULUSAL SINAV UYGULAMALARININ ÖĞRENCİ VE ÖĞRETMENLER TARAFINDAN DEĞERLENDİRİLMESİ

GİZİL DEĞİŞKEN ORTALAMA YAPILARININ DEĞİŞMEZLİĞİNİN ORTALAMA VE KOVARYANS YAPILARI ANALİZİ YAKLAŞIMIYLA TEST EDİLMESİ

DAĞILIM ÇARPIKLIĞININ OMEGA İNDEKSİNİN PERFORMANSI ÜZERİNDEKİ ETKİLERİNİN ÇEŞİTLİ KOŞULLAR ALTINDA İNCELENMESİ

M4 İSTATİSTİĞİNİN CEVAPLAMA BENZERLİĞİ BELİRLEME PERFORMANSININ İNCELENMESİ

ÖĞRENCİLERİN BAZI ÖZELLİKLERİNE GÖRE SAYISAL VE SÖZEL YETENEKLER İLE OLUŞTURULAN ÖLÇME MODELİNİN DEĞİŞMEZLİĞİNİN KOVARYANS YAPILARI ANALİZİ YAKLAŞIMIYLA İNCELENMESİ

ÇOK BOYUTLU TESTLERDE DEĞİŞEN KÜME FONKSİYONUNUN BELİRLENMESİ

KARAR AĞAÇLARININ SINIFLANDIRMA PERFORMANSLARININ BAĞIMSIZ DEĞİŞKEN TÜRÜNE GÖRE KARŞILAŞTIRILMASI

ORTAOKUL ÖĞRETİM PROGRAMLARINDA YER ALAN KAZANIMLARIN BİLİŞSEL DÜZEYLERE DAĞILIMININ İNCELENMESİ

KAYIP VERİ ATAMA YÖNTEMLERİNİN CRONBACH ALFA VE TABAKALI ALFA GÜVENİRLİK KATSAYILARI ÜZERİNDEKİ ETKİSİNİN İNCELENMESİ

PISA PROBLEM ÇÖZME BECERİLERİNİN PISA'DAKİ MATEMATİK, OKUMA VE FEN BAŞARISINA ETKİSİNİN İNCELENMESİ

MADDE TEPKİ KURAMINA DAYALI TEST EŞİTLEME YÖNTEMLERİNİN KARŞILAŞTIRILMASI: PISA 2012 FEN TESTİ ÖRNEĞİ

ÇOK BOYUTLU TEST YAPILARINDA GÜVENİRLİĞİN İNCELENMESİ

KARMA DAĞILIMLARDA MADDE TEPKİ KURAMI MODELLERİ İLE MADDE PARAMETRE KAYMASININ İNCELENMESİ

CHANGES IN TRANSITION SYSTEM FROM BASIC EDUCATION TO SECONDARY EDUCATION: A HISTORICAL, DESCRIPTIVE AND CAUSAL VIEW

EĞİTİM ARAŞTIRMALARINDA EĞİLİM PUANI EŞLEŞTİRME YÖNTEMİNİN KULLANILMASI

EVALUATION OF HUMAN RESOURCES MANAGEMENT CERTIFICATE PROGRAM

DEĞİŞEN MADDE FONKSİYONUNUN MİNİ VE MİDİ ORTAK TESTE ETKİSİ

ZEKA ÖLÇEKLERİNDE KLASİK TEST VE MADDE TEPKİ KURAMINA GÖRE PUANLAMA YÖNTEMLERİNİN KARŞILAŞTIRILMASI (KBIT-2 ÖRNEĞİ)

TEST EŞİTLEMEDE MADDE TEPKİ KURAMINA DAYALI YETENEK PARAMETRESİNE YÖNELİK ÖLÇEK DÖNÜŞTÜRME YÖNTEMLERİNİN KARŞILAŞTIRILMASI

STANDARD SETTING IN EFL WRITING ASSESSMENT WITH MANY-FACET RASCH MEASUREMENT MODEL


MADDE/MODÜL SEÇME YÖNTEMLERİNİN KARŞILAŞTIRILMASI: BİREYSELLEŞTİRİLMİŞ BİLGİSAYARLI TEST VE BİREYSELLEŞTİRİLMİŞ BİLGİSAYARLI ÇOK AŞAMALI TEST UYGULAMASI

BR ELDRC ANALZ YNTEM NERS .................................................................................... 362

TIMSS 2015 FEN TESTİ MADDELERİNİN DİL VE KÜLTÜRE GÖRE DEĞİŞEN MADDE FONKSİYONU AÇISINDAN İNCELENMESİ

INVESTIGATION INVARIANCE OF THE DINA MODEL PARAMETERS WITH REAL DATA

ANKOR TEST DESENİNDE BİLİŞSEL TANI MODELİ ÖRTÜK YETENEK SINIFLARI İLE TEST EŞİTLEME ÇALIŞMASI

YAPILANDIRILMIŞ VE ÇOKTAN SEÇMELİ MADDELERDE PARAMETRE DEĞİŞMEZLİĞİNİN İNCELENMESİ: BİLİŞSEL TANI MODELLERİNDE MADDE FORMATININ ETKİSİ

EXAMINATION OF CDM ATTRIBUTE PREVALENCE FOR HIGHER-ORDER THINKING SKILLS ITEMS

AKADEMİK PERFORMANS ÖLÇÜMLERİNDE MADDE FORMATININ BİLİŞSEL VE ÜST-BİLİŞSEL İZLEME BECERİSİ ÜZERİNE ETKİSİ

OKUL, EV VE ÖĞRENCİ ÖZELLİKLERİNİN 4. SINIF ÖĞRENCİLERİNİN TIMSS MATEMATİK BİLİŞSEL BECERİLERİNE ETKİSİ

PISA 2015 FEN OKURYAZARLIĞI MADDELERİNİN DEĞİŞEN MADDE FONKSİYONLARI AÇISINDAN İNCELENMESİ

ÖĞRENCİLERİN TIMSS 2015 MATEMATİK YETERLİK DÜZEYLERİNİ ETKİLEYEN DEĞİŞKENLERİN ÇOKLU UYUM ANALİZİ İLE BELİRLENMESİ

ÜNİVERSİTE ÖĞRENCİLERİNİN YAZMA BECERİLERİNİN ÖLÇÜLMESİNE YÖNELİK BOYLAMSAL BİR ÖLÇME ARACI GELİŞTİRİLMESİ VE DEĞERLENDİRİLMESİ

SÖZEL YETENEK TESTİNDE KISA CEVAPLI VE ÇOKTAN SEÇMELİ MADDE TÜRÜNDEN ELDE EDİLEN SONUÇLARIN KARŞILAŞTIRILMASI

TÜRKİYE ÖLÇME ARAÇLARI DİZİNİNDE YER ALAN AÇIMLAYICI FAKTÖR ANALİZİ ÇALIŞMALARININ PARALEL ANALİZ SONUÇLARI İLE KARŞILAŞTIRILMASI

DENK OLMAYAN GRUPLARDA KERNEL EŞİTLEME YÖNTEMİNDEN ELDE EDİLEN SONUÇLARININ KARŞILAŞTIRILMASI

DMF KAYNAKLARININ BELİRLENMESİNDE ÖRTÜK SINIF DMF YAKLAŞIMININ KULLANILMASI: PISA-2015 TÜRKİYE ÖRNEĞİ

FARKLI FORMATLARDA VE ORTAMLARDA UYGULANAN LİKERT TİPİ ÖLÇEK İLE METRİK ÖLÇEĞİN PSİKOMETRİK ÖZELLİKLERİNİN İNCELENMESİ

DEĞİŞEN MADDE FONKSİYONUN MADDE AYIKLAMAYA GÖRE İNCELENMESİ

USE OF MIXED ITEM RESPONSE THEORY IN RATING SCALES

BİLİŞSEL TANI MODELLERİNDE Q-MATRİSİN FARKLI BELİRLENMESİNİN SINIFLAMA DOĞRULUĞU VE TUTARLILIĞI AÇISINDAN KARŞILAŞTIRILMASI


SUMMARIES


Paper ID: 51

AN ANALYSIS ON THE MODEL FIT INDICES IN A SCALE

CONSTRUCTION PROCESS

Esin Tezbaşaran

İstanbul University-Cerrahpaşa Hasan Ali Yücel Education Faculty

[email protected]

Abstract: Many research studies evaluate models using cutoff values of model fit indices without any statistical approach. This study analyzes how the model fit indices change when the number of factors is changed and when the observed variables are reassigned to the factors, using the same observed variables and the same sample size in a scale construction process. The research was carried out using real data from 1013 participants. The data were collected with the Attitude Scale Towards Learning. The model specified according to the results of exploratory factor analysis and the randomly remodeled constructions were examined through the model fit indices. The results revealed that, whatever the number of factors, the divergence among factors affects the change in the model fit indices. Moreover, the remodeled scale constructions, which are neither theoretical nor analytical, do not appear to fit so poorly.

Keywords: Model fit indices, divergent validity

INTRODUCTION

Goodness-of-fit assessment refers to how well a model reproduces the data-generating process (Maydeu-Olivares, 2017). Several fit and misfit indices are commonly used for structural equation models (SEM) in the literature. In most traditional and typical SEM applications, the operationalized theories assume one of three forms (Mueller & Hancock, 2008): a measured variable path analysis (MVPA), a confirmatory factor analysis (CFA), or a latent variable path analysis (LVPA). The general purpose of statistical modeling is to provide an efficient and convenient way of describing the latent structure underlying a set of observed variables (Byrne, 1994). Therefore, the model structure should be assessed in terms of the efficiency and convenience with which the statistical model describes that structure.

Data-model fit assessment, that is, the fit between the observed data and the hypothesized model, is ideally operationalized as an evaluation of the degree of discrepancy between the true population covariance matrix and the one implied by the model's structural and nonstructural parameters. There are three broad categories of fit indices (Mueller & Hancock, 2008): absolute fit indices (the standardized root mean square residual (SRMR), the chi-square test, and the goodness-of-fit index (GFI)); parsimonious indices (the root mean square error of approximation (RMSEA) with its associated confidence interval, the Akaike information criterion (AIC), and the adjusted goodness-of-fit index (AGFI)); and incremental indices (the comparative fit index (CFI), the normed fit index (NFI), and the non-normed fit index (NNFI)). To assess the degree of data-model misfit, various fit indices can be obtained and then compared against established cutoff criteria (e.g., those empirically derived by Hu & Bentler, 1999, as cited in Mueller & Hancock, 2008). Mueller & Hancock (2008) summarize these criteria in a table (Table 1).

Table 1. Target values for selected fit indices to retain a model, by index class

Index class      Index    Target value
Incremental      NFI      ≥ 0.90
Incremental      NNFI     ≥ 0.95
Incremental      CFI      ≥ 0.95
Absolute         GFI      ≥ 0.90
Absolute         SRMR     ≤ 0.08
Parsimonious     AGFI     ≥ 0.90
Parsimonious     RMSEA    ≤ 0.06

Joint criteria
NNFI, CFI ≥ 0.96 and SRMR ≤ 0.09
SRMR ≤ 0.09 and RMSEA ≤ 0.06

Reference: Mueller & Hancock (2008)
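As a reading aid, the targets in Table 1 can be expressed as a small lookup. The Python sketch below is illustrative only: the function name `check_fit` and the example values are hypothetical and are not results from this study, and the direction of each comparison (at least the target for NFI, NNFI, CFI, GFI and AGFI; at most the target for SRMR and RMSEA) is assumed from how these indices are conventionally read.

```python
# Targets as listed in Table 1 (Mueller & Hancock, 2008).
# Assumption: higher is better for NFI, NNFI, CFI, GFI, AGFI,
# and lower is better for SRMR and RMSEA.
CUTOFFS = {
    "NFI": (">=", 0.90),
    "NNFI": (">=", 0.95),
    "CFI": (">=", 0.95),
    "GFI": (">=", 0.90),
    "AGFI": (">=", 0.90),
    "SRMR": ("<=", 0.08),
    "RMSEA": ("<=", 0.06),
}


def check_fit(indices):
    """Return {index name: True/False} for each index that has a target."""
    result = {}
    for name, value in indices.items():
        if name not in CUTOFFS:
            continue  # indices without a listed target are skipped
        direction, target = CUTOFFS[name]
        if direction == ">=":
            result[name] = value >= target
        else:
            result[name] = value <= target
    return result


# Hypothetical values, for illustration only.
example = {"CFI": 0.93, "SRMR": 0.05, "RMSEA": 0.07}
print(check_fit(example))
```

With these made-up values only SRMR would meet its target, which shows how the same model can look acceptable on one index and not on another.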

An absolute-fit index directly evaluates how well an a priori model reproduces the sample data (Hu & Bentler, 1998). It does not depend on a comparison with another model, such as a saturated model, or with the observed data (Ullman, 2001). Parsimonious indices evaluate the overall discrepancy between the observed and implied covariance matrices while taking into account a model's complexity; fit improves as more parameters are added to the model, as long as those parameters make a useful contribution (Mueller & Hancock, 2008). Incremental fit indices are also called comparative fit indices. These indices compare the target model with a baseline model; as Bentler and Bonett describe (as cited in Hu & Bentler, 1998), the most typically used baseline is a null model in which all the observed variables are allowed to have variances but are uncorrelated with each other.
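The paper does not give formulas for these indices. For orientation, the sketch below uses commonly cited textbook definitions of one parsimonious index (RMSEA) and two incremental indices (NFI and CFI), computed from the chi-square statistics of the target model and of the null baseline model. The function names and the numeric inputs are hypothetical, and software packages differ in details such as using N or N - 1 in the RMSEA denominator, so this is an illustration under stated assumptions rather than the computation used in this study.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming the (n - 1) denominator convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nfi(chi2_model, chi2_null):
    """Normed fit index: proportional drop in chi-square from the null model."""
    return (chi2_null - chi2_model) / chi2_null


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index based on noncentrality (chi-square minus df)."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    if d_null == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_null


# Hypothetical chi-square values; only the sample size (1013) comes from the text.
print(round(rmsea(300.0, 100, 1013), 3))
print(round(nfi(300.0, 2100.0), 3))
print(round(cfi(300.0, 100, 2100.0, 120), 3))
```

The incremental indices need the null model's chi-square as an input, which is why they are described as comparisons against a baseline, whereas RMSEA uses only the target model's chi-square, degrees of freedom, and sample size.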

As these descriptions show, fit indices can be used to evaluate a model from different points of view. Different indices reflect different aspects of model fit (Hooper, Coughlan & Mullen, 2008). Kline (2005) recommends a minimal set of indices (the model chi-square, RMSEA with its 90% confidence interval, CFI, and SRMR) that should be reported and interpreted when presenting the results of an SEM analysis.

In addition to the above, the estimation method and the sample size have an important effect on the fit indices. If the assumptions regarding normality and independence among factors and errors seem plausible, maximum likelihood (ML) and generalized least squares estimation techniques are better choices (Ullman, 2001). Moreover, the role of sample size in evaluating a fit index should be taken into account in order to detect the discrepancy between the model and the data-generating mechanism with adequate power (Maydeu-Olivares, 2017).

This paper aims to examine how the model fit indices change, and how those changes can be explained, when the number of factors is changed and the observed variables are reassigned to the factors, using the same observed variables in a scale construction process. The fit of the models is also evaluated against the cutoff criteria of the indices.

METHODOLOGY

This study is basic research, since its aim is to draw inferences about the cutoff criteria of the most frequently used model fit indices by comparing differently constructed models for scale development. Basic research is designed to increase scientific understanding of phenomena (Best & Kahn, 2006) and to add to general knowledge with little or no concern for the immediate application of the knowledge produced (Bogdan & Biklen, 2007).

The data were collected from 1013 individuals who voluntarily participated in the study. The trial version of the 44-item Attitude Scale Towards Learning, developed by Fenercioğlu (2003), was used as the data collection tool.

As the first step of the data analysis, exploratory factor analysis (EFA) was used to determine the scale construct. Four different scale constructs were then analyzed by confirmatory factor analysis (CFA) to obtain their model fit indices. The first construct was the one obtained from the EFA. For the second construct, the items and the number of factors revealed by the EFA were retained, but the items were assigned to the factors at random. For the third and fourth constructs, the same items were randomly assigned, on two separate occasions, to only two factors. In this way, it is possible to see how the values of the model fit indices change as the model structure changes, and how the indices evaluate different constructs built from the same observed variables.
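The random assignment step can be illustrated with a short Python sketch. The paper does not describe how the randomization was carried out, so the shuffle-and-deal scheme, the function name `assign_items`, the item labels and the seeds below are all assumptions made for illustration only.

```python
import random

def assign_items(items, n_factors, seed):
    """Shuffle the items and deal them into n_factors groups of near-equal size.

    Hypothetical helper: the original study does not report its procedure.
    """
    rng = random.Random(seed)      # fixed seed so the assignment is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Deal the shuffled items round-robin into the factors.
    return [shuffled[i::n_factors] for i in range(n_factors)]

# Hypothetical labels for the 22 items retained after the EFA
items = [f"item{i}" for i in range(1, 23)]

five_random = assign_items(items, 5, seed=1)    # second construct
two_random_a = assign_items(items, 2, seed=2)   # third construct
two_random_b = assign_items(items, 2, seed=3)   # fourth construct

for name, groups in [("5 factors", five_random), ("2 factors", two_random_a)]:
    print(name, [len(g) for g in groups])
```

Each item appears in exactly one factor, and the two-factor constructs receive 11 items per factor, matching the equal split described in the Findings.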

FINDINGS

First, the data were cleaned before the factor analysis. Missing value analysis using Little's MCAR test showed that the data were missing completely at random (χ²(3018) = 3033.5, p > 0.05), and missing values were replaced with values computed by the expectation-maximization method. Four items did not meet the normality assumption and 18 items had no statistically significant correlation with the other items; these items were removed from the scale. The remaining 22 items were analyzed using principal component analysis (PCA) with oblique rotation to obtain the factorial construct of the scale, with a criterion level of 0.30 for factor loadings. The sampling adequacy for the analysis was acceptable (KMO = 0.886). Bartlett's test of sphericity showed that the correlations between items were sufficiently large for PCA (χ²(231) = 4426.88, p < 0.00001). Five components (the term "factor" is used instead of "component" in the CFA) had eigenvalues over Kaiser's criterion of 1 and together explained 46.88% of the variance. The Cronbach's alpha value for this scale is 0.73.

For the second scale construct, the same 22 items were used and were assigned randomly to the five factors. Two further constructs were then formed from these items, each with two factors containing an equal number of randomly assigned items. Thus four scale constructs were used in this study: one specified by the EFA and the other three formed at random. CFA was then conducted to determine the adequacy of the models. Maximum likelihood (ML) was used to estimate the population parameters, because ML can be robust when the observed variables do not satisfy the multivariate normality assumption (Joreskog & Yang, 1996).
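As a plausibility check on the CFA specifications, the degrees of freedom of a simple-structure model can be counted by hand. The Python sketch below assumes one loading and one error variance per item, factor variances fixed to 1 and all factor correlations free; this parameterization is an assumption, since the paper does not report the exact specification, and `cfa_df` is a hypothetical helper name.

```python
def cfa_df(n_items, n_factors):
    """Degrees of freedom of a simple-structure CFA with correlated factors.

    Assumed parameterization (not stated in the paper): one loading and one
    error variance per item, factor variances fixed to 1, all factor
    correlations freely estimated.
    """
    moments = n_items * (n_items + 1) // 2                 # unique variances and covariances
    free = 2 * n_items + n_factors * (n_factors - 1) // 2  # loadings + errors + correlations
    return moments - free

print(cfa_df(22, 5))  # 199
print(cfa_df(22, 2))  # 208
```

Under these assumptions the count gives 199 for the five-factor and 208 for the two-factor models, the values reported in Table 2 for the randomly assigned constructs. The EFA-based model is reported with 197 degrees of freedom, which would correspond to two additional free parameters; the paper does not say which.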

As can be seen in Table 2, the best values of all fit indices belong to the construct obtained from the EFA, as expected. When the two five-factor constructs are compared, the meaningful difference lies in the correlations between factors: all 10 correlation coefficients are smaller in the EFA-based construct than in the construct whose items were assigned to the five factors at random (Figure 1).

Figure 1. Path diagrams of the construct obtained from EFA (a) and the construct with items randomly assigned to the five factors (b)

The evident difference between the two randomly formed two-factor constructs is the level of correlation between the factors. The construct with the lower correlation between factors had better model fit indices than the construct with the higher correlation. Moreover, the fit indices of this construct are quite close to those of the randomly arranged five-factor construct (Table 2).

Table 2. The model fit indices of different scale constructs

Number of factors          Corr. between factors  NFI   NNFI  CFI   GFI   SRMR   χ² (df)        RMSEA (CI90)          AGFI
5 (EFA-oblique rotation)   Figure 1-a             0.95  0.96  0.97  0.95  0.044  556.74 (197)   0.042 (0.038, 0.047)  0.94
5 (randomly assigned)      Figure 1-b             0.90  0.91  0.92  0.91  0.055  1093.13 (199)  0.067 (0.063, 0.071)  0.89
2 (randomly assigned)      0.96                   0.89  0.90  0.91  0.90  0.056  1271.51 (208)  0.071 (0.067, 0.075)  0.88
2 (randomly assigned)      0.87                   0.90  0.91  0.92  0.90  0.055  1200.05 (208)  0.069 (0.065, 0.072)  0.88

In terms of fit, the construct obtained from the EFA was acceptable according to all indices. The constructs with randomly assigned items were at an acceptable level only according to the absolute fit indices and were not acceptable according to the incremental and parsimonious indices.
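The RMSEA point estimates in Table 2 can be approximately recovered from the reported chi-square values, degrees of freedom and sample size. The sketch below uses the common formula sqrt(max(χ² - df, 0) / (df (N - 1))); whether the authors' software divides by N or N - 1 is not stated, so the N - 1 convention is an assumption, and the row labels are shorthand introduced here.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate using the (n - 1) convention (an assumption)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

N = 1013  # sample size reported in the Methodology

# (chi-square, df) pairs copied from Table 2
table2 = {
    "5 factors (EFA)": (556.74, 197),
    "5 factors (random)": (1093.13, 199),
    "2 factors (r = 0.96)": (1271.51, 208),
    "2 factors (r = 0.87)": (1200.05, 208),
}

for label, (chi2, df) in table2.items():
    print(f"{label}: RMSEA = {rmsea(chi2, df, N):.3f}")
```

With N = 1013 the four values round to 0.042, 0.067, 0.071 and 0.069, in line with the RMSEA column of Table 2.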

CONCLUSIONS

This research offers two basic results. The first is that model fit index results for a causal model should be evaluated with great care. Particularly when testing a hypothesis about the fit of a theoretical model, the fit may be judged acceptable against the cutoff values of the indices even though the model is not adequate.

In this research, the construct obtained from EFA, a mathematical procedure designed to identify factors, was expected to show better fit values than the randomly designed constructs. The description of a construct provides information about the expected relationships among variables; factor analysis helps to determine whether this pattern of relationships does indeed exist (Murphy & Davidshofer, 2001). However, the randomly designed constructs run the risk of being judged to fit acceptably if more tolerant criteria are applied. Traditional goodness-of-fit indices also are unduly influenced by the good fits in the measurement portions of the model and can yield values in the .90s even when the structural relations among the latent variables of the model are seriously misspecified (Mulaik, James, Alstine, Bennett, Lind, & Stilwell, 1989).

Every index offers researchers cutoff values for evaluating whether their models fit. However, using fixed cutoff criteria to determine model fit is problematic: there is no obvious external criterion variable, and there are many different fit indices, each with one or more proposed cutoff values (Barrett, 2006; Maydeu-Olivares, 2017). Maydeu-Olivares (2017) identifies three problems with current practice. First, the indices are pre-statistical: there is no description of the parameter being estimated and no proof that the index is an asymptotically unbiased estimate of the parameter of interest; furthermore, the sampling variability of the statistic is ignored. Second, partly because the approach is pre-statistical, no agreement is possible on which index to use or which cutoff value to apply. Third, and most importantly, inferences on the model parameters are made as if the model were correct when in fact it is misspecified. Most indices show somewhat inadequate sensitivity to misspecification, and many are also sensitive to sample size (e.g., chi-square, SRMR), number of factors (e.g., Mc), number of items (e.g., RMSEA), or some combination of these model conditions (Meade, 2008).

The other result of this research is that, independently of the number of factors, the values of the model fit indices (except GFI) are negatively affected as the correlation between factors increases. Low correlation between factors is important discriminant evidence of construct validity. A validity coefficient showing little relationship between test scores and other variables with which scores on the test being construct-validated should not theoretically be correlated provides discriminant evidence of construct validity (Cohen, Montague, Nathanson & Swerdlik, 1988). According to the findings of this research, discriminant evidence shifts the values of the model fit indices in favor of the model.

Studies examining divergence between factors and convergence of factor loadings within models are recommended. In addition, there is a need to determine common cutoff values for model fit indices together with their effect sizes.

REFERENCES

Barrett, P. (2006). Structural equation modelling: Adjudging model fit. Personality and Individual Differences, 42, 815-824. doi: 10.1016/j.paid.2006.09.018

Best, J.W. & Kahn, J.V. (2006). Research in education. New York: Pearson Education, Inc.

Bogdan, R. C., & Biklen, S. K. (2007). Qualitative research for education. New York: Pearson Education,

Inc.

Byrne, B.M. (1994). Structural equation modeling with EQS and EQS/Windows. California: Sage

Publication, Inc.


Cohen, R.J., Montague, P., Nathanson L.S., & Swerdlik, M.E. (1988). Psychological testing. Mountain

View: Mayfield Publishing Company.

Fenercioğlu, E. (2003). Likert tipi ölçeklere madde seçmede kullanılan Pearson momentler çarpımı korelasyon tekniği ile çok serili korelasyon tekniğinin incelenmesi (Master's thesis). Hacettepe Üniversitesi, Sosyal Bilimler Enstitüsü, Ankara, Turkey.

Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modelling: Guidelines for determining model fit. The Electronic Journal of Business Research Methods, 6(1), 53-60. Retrieved from www.ejbrm.com

Hu, L., & Bentler, P.M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3(4), 424-453.

Joreskog, K.G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G.A. Marcoulides and R.E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57-88). New Jersey: Lawrence Erlbaum Associates.

Kline, R.B. (2005). Principles and practice of structural equation modeling. New York: The Guilford Press.

Maydeu-Olivares, A. (2017). Assessing the size of model misfit in structural equation models.

Psychometrika, 82(3), 533-558. doi: 10.1007/s11336-016-9552-7

Meade, A. W. (2008, April). Power of AFIs to Detect CFA Model Misfit. Paper presented at the twenty-

third Annual Conference of the Society for Industrial and Organizational Psychology, San

Francisco, USA. Retrieved from http://www4.ncsu.edu/~awmeade/Links/Papers/AFI

(SIOP08).pdf

Mueller, R.O., & Hancock, G.R. (2008). Best practices in structural equation modelling. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 488-510). doi: 10.4135/9781412995627.d38

Mulaik, S.A., James, L.R., Alstine, J.V., Bennett, N., Lind, S., & Stilwell, C.D. (1989). Evaluation of

goodness-of-fit indices for structural equation models. Psychological Bulletin, 105(3), 430-445.

Murphy, K.R., & Davidshofer, C.O. (2001). Psychological testing principles and applications. New Jersey:

Prentice Hall Inc.

Ullman, J. B. (2001). Structural equation modeling. In B.G. Tabachnick and L.S. Fidell (Eds.), Using multivariate statistics (pp. 653-771). Boston: Allyn & Bacon.

Paper ID: 54

COMPARISON OF METHODS FOR DETERMINING INTER-RATER RELIABILITY IN COMPOSITION SCORING

Neşe Öztürk Gübeş1, Mustafa Kılınç2, Ercan Yıldız3, Osman Aydın4, Gül Aslan5, Şeyma Özcan6
1,2Mehmet Akif Ersoy University, 3,4,5,6Ministry of National Education

[email protected]

Abstract: The purpose of this study is to examine inter-rater reliability using different methods. Compositions written by 19 sixth-grade primary school students were scored independently by four teachers (raters) using a holistic scoring rubric. To determine inter-rater reliability, percent agreement, Fleiss' kappa, Kendall's coefficient of concordance W, the mean Spearman rank-order correlation coefficient, Krippendorff's alpha and generalizability theory were employed. The findings showed that exact agreement among the raters' scores was 21.1%, whereas agreement within +/-1 point was 89.5%. Fleiss' kappa indicated statistically significant but low agreement among the raters, and Kendall's W indicated a statistically significant but moderate relationship among the scores given. This finding was also supported by the mean Spearman rank-order correlation computed among the raters' scores. Similarly, Krippendorff's alpha indicated weak agreement among the raters. Generalizability theory was another method used to determine inter-rater reliability in this study, and the generalizability coefficient was calculated as 0.59. Examination of the variance components showed that the variance of the rater main effect was the smallest component, which led to the conclusion that the raters scored similarly to one another. However, the residual variance estimated for the student-rater interaction was the largest variance component. It was concluded that this may be because the raters differed unsystematically across students, or because the students' composition scores were affected by factors not included in the study.

Keywords: reliability, inter-rater reliability, composition scoring


Paper ID: 55

INVESTIGATION OF SUBGROUP INVARIANCE OF EQUATING FUNCTIONS

Kübra Atalay Kabasakal1, Nermin Kıbrıslıoğlu Uysal1, Sakine Göçer Şahin2
1Hacettepe University, 2University of Wisconsin-Madison

[email protected], [email protected], [email protected]

Abstract: The purpose of this study is to examine the subgroup invariance of equating functions and the possible reasons for a lack of invariance. To this end, responses to the mathematics test of the TIMSS 2015 assessment were used. The sample consisted of 16,295 students from the USA and Turkey. Gender and country were treated as subgroups. There are 14 booklets in the TIMSS administration, and the booklets are linked to one another through common items. In this study, the 14 booklets were equated with the equipercentile equating method in each subgroup, and the equated scores were compared using REMSD. In addition, DIF between subgroups and between-subgroup differences in distributional properties, test difficulty and discrimination were evaluated within each booklet. The results indicated that the between-subgroup differences in test difficulty and skewness and the ratio of DIF items were significantly related to REMSD values.

Keywords: Subgroup invariance, equating functions, equipercentile equating, TIMSS 2015.

INTRODUCTION

Test equating is a method for making comparable the scores obtained from different tests that measure the same construct. Although there are different methods for equating or linking test scores, they all rest on certain prerequisites (Dorans & Holland, 2000). Subgroup invariance of equating functions is one of the most important prerequisites of the equating process (Dorans, 2004). A lack of invariance means that students who have the same score on one scale but belong to different subpopulations have different expected scores on the corresponding equated scale, which directly affects the validity of the equated scores. Hence, the purpose of this study is to examine equating invariance between subgroups and to evaluate possible reasons for a lack of invariance. TIMSS 2015 mathematics scores were equated using the equipercentile equating method under the nonequivalent groups common-item design. The research questions are given below.

Research Questions

(1) Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance?

i. Do the equating functions ensure subgroup invariance between gender subgroups in the Turkey sample?

ii. Do the equating functions ensure subgroup invariance between gender subgroups in the USA sample?

iii. Do the equating functions ensure subgroup invariance between country subgroups in the Turkey and USA samples?

(2) Are the equating errors obtained between subgroups when equating mathematics scores in the TIMSS 2015 administration related to DIF and to between-subgroup differences in distributional properties, test difficulty and discrimination?

i. Are the equating errors obtained between gender subgroups in the Turkey sample related to DIF and to between-subgroup differences in distributional properties, test difficulty and discrimination?

ii. Are the equating errors obtained between gender subgroups in the USA sample related to DIF and to between-subgroup differences in distributional properties, test difficulty and discrimination?

iii. Are the equating errors obtained between country subgroups in the USA and Turkey samples related to DIF and to between-subgroup differences in distributional properties, test difficulty and discrimination?

METHODOLOGY

Within the scope of the study, the invariance of the equating functions used to equate the scores of country and gender subgroups was evaluated, and possible reasons for a lack of invariance were examined, for the TIMSS 2015 administration. Hence, this is a descriptive study that aims to evaluate the validity of equated scores in the TIMSS 2015 administration.

Sample

Turkey and USA subgroups were taken into consideration. Hence, the sample of the study consists of 16,295 eighth-grade students who took part in the TIMSS 2015 administration in Turkey and the USA.

Data Analysis

To achieve its predefined goals, TIMSS requires more items than a student can answer in one session. Hence, there are 14 item clusters in each domain. Each cluster consists of approximately 12-18 items covering the number, algebra, geometry, and data and probability subject domains. These clusters are distributed across 14 booklets so that each booklet contains two clusters, and the booklets are balanced with respect to content, number of items and item format. The booklets are linked by common clusters so that a final score can be obtained for each student.

In the data analysis, each booklet was equated with the following one through their common clusters. The analysis was conducted in two steps: first, equating invariance between subgroups was evaluated using the equipercentile equating method; then, possible reasons for a lack of invariance were examined. Two kinds of subgroups were defined: country and gender. The country subgroups consist of the Turkey and USA samples, and the gender subgroups consist of the gender groups within each country's sample. Equating invariance was evaluated in three phases. First, the scores of the whole group were equated. Second, the scores of each subgroup were equated. Last, the root expected mean square difference (REMSD) values were calculated. A REMSD value higher than 10% was interpreted as indicating that equating invariance is not ensured (Dorans, 2004; Dorans & Holland, 2000). After the REMSD values were calculated, the possible reasons for a lack of invariance were evaluated: DIF and the differences in distributional properties, test difficulty and discrimination indexes between subgroups were examined within each booklet. The data analysis was conducted in R.
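For reference, REMSD can be written as the square root of the expected squared difference between the subgroup and whole-group equating functions, divided by the standard deviation of the target-form scores, following Dorans and Holland (2000). The Python sketch below illustrates that definition; it is not the authors' R code, and the function name, argument names and toy values are hypothetical.

```python
import math

def remsd(score_freq, weights, e_total, e_sub, sd_y):
    """Root expected mean square difference, after Dorans & Holland (2000).

    score_freq : relative frequency of each raw score x in the whole group
    weights    : proportion of the whole group in each subgroup
    e_total    : whole-group equated score at each x
    e_sub      : one list of equated scores per subgroup, aligned with e_total
    sd_y       : standard deviation of the target-form scores in the whole group
    """
    expected_sq_diff = 0.0
    for i, f_x in enumerate(score_freq):
        # Weighted squared distance of each subgroup function from the whole-group function
        sq_diff = sum(w * (e_j[i] - e_total[i]) ** 2 for w, e_j in zip(weights, e_sub))
        expected_sq_diff += f_x * sq_diff
    return math.sqrt(expected_sq_diff) / sd_y

# Toy example with three score points and two equally sized subgroups
value = remsd(
    score_freq=[0.25, 0.5, 0.25],
    weights=[0.5, 0.5],
    e_total=[1.0, 2.0, 3.0],
    e_sub=[[1.5, 2.5, 3.5], [0.5, 1.5, 2.5]],
    sd_y=5.0,
)
print(round(value, 2))  # 0.1
```

In the toy data every subgroup function differs from the whole-group function by half a score point, so the result is 0.5 / 5 = 0.10, exactly at the 10% criterion used in this study.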

FINDINGS

The findings are reported for each research question. The results for the first research question are summarized in Table 1.

Table 1. REMSD values for each equating relation within subgroups

Booklets    TR-USA  TR-Gender  USA-Gender
B1 - B2     0.10    0.05       0.07
B2 - B3     0.11    0.15       0.06
B3 - B4     0.12    0.12       0.07
B4 - B5     0.09    0.12       0.08
B5 - B6     0.06    0.08       0.11
B6 - B7     0.11    0.06       0.10
B7 - B8     0.12    0.06       0.04
B8 - B9     0.07    0.19       0.06
B9 - B10    0.11    0.13       0.03
B10 - B11   0.04    0.10       0.11
B11 - B12   0.10    0.16       0.17
B12 - B13   0.08    0.06       0.06
B13 - B14   0.08    0.12       0.07
B14 - B1    0.11    0.11       0.15

As shown in Table 1, the REMSD values are higher than 0.10 for seven equating functions between

countries, for eight equating functions between gender subgroups in Turkey and for four equating functions

between gender subgroups in USA. These results imply that equating invariance is not ensured for some

booklets between the subgroups.

To answer the second research question, DIF and the between-subgroup differences in distributional properties, test difficulty and discrimination indexes were examined. The numbers of DIF items in the common tests and the whole tests, by favored subgroup, are summarized in Table 2.

Table 2. Number of DIF items in the common and whole tests within each subgroup

            TR-USA              TR-Gender           USA-Gender
            Common    Whole     Common    Whole     Common    Whole
Booklets    TR   USA  TR   USA  F    M    F    M    F    M    F    M
B1 - B2     4    3    7    12   1    2    3    7    2    2    5    6
B2 - B3     5    3    8    13   1    2    6    6    2    2    6    6
B3 - B4     4    2    6    12   4    2    8    6    2    2    5    5
B4 - B5     3    5    8    10   3    2    8    6    1    1    3    4
B5 - B6     3    5    8    11   1    2    7    8    0    1    2    6
B6 - B7     5    4    9    14   3    4    6    7    1    4    2    6
B7 - B8     6    5    11   17   2    1    9    8    1    1    4    6
B8 - B9     6    5    11   16   4    3    6    5    2    1    3    3
B9 - B10    4    4    8    12   0    1    6    7    0    1    2    3
B10 - B11   2    2    4    10   2    3    3    8    0    1    2    4
B11 - B12   4    5    9    11   1    4    5    9    2    2    3    3
B12 - B13   5    4    9    12   2    2    4    7    1    0    3    4
B13 - B14   3    6    9    13   1    1    4    6    0    2    2    4
B14 - B1    5    6    11   12   1    3    3    6    1    2    5    6

Note: TR=Turkey; F=Female; M=Male; Common=common test; Whole=whole test. Each cell is the number of DIF items favoring the subgroup in the column heading.

As shown in Table 2, there are many DIF items in each equating case between subgroups. For the country subgroups, the numbers of items favoring Turkey and the USA are fairly balanced in the common tests, but more items favor the USA in the entire tests. For the gender subgroups, although somewhat more items favor male students than female students, the numbers of DIF items favoring males and females are more balanced in both the common and the entire tests, for Turkey as well as the USA.

The between-subgroup differences in skewness, kurtosis, test difficulty and test discrimination index for the booklets are given in Table 3.

Table 3. Differences between skewness, kurtosis, test difficulty and discrimination indexes

TR-USA TR-Gender USA-Gender

Booklets Skew Kurt Disc Diff Skew Kurt Disc Diff Skew Kurt Disc Diff

B1 - B2 0.287 0.715 0.096 0.619 0.301 0.695 0.734 0.039 0.048 0.21 0.035 0.01

B2 - B3 1.12 0.981 0.038 0.943 0.514 0.899 1.053 0.154 0.143 0.077 0.053 0.002

B3 -B4 1.169 1.054 0.262 0.792 0.314 1.077 1.034 0.043 0.066 0.01 0.013 0.003

B4 - B5 0.641 0.838 0.316 0.522 0.134 0.816 0.852 0.036 0.132 0.107 0.043 0.001

B5 -B6 0.28 0.731 0.24 0.491 0.55 0.579 0.888 0.309 0.167 0.206 0.009 0.022

B6 - B7 0.347 0.653 0.111 0.542 0.125 0.665 0.64 0.025 0.013 0.017 0.011 0.024

B7 - B8 0.204 0.65 0.089 0.561 0.157 0.633 0.659 0.026 0.061 0.233 0.036 0.013

B8 - B9 1.022 0.924 0.018 0.906 0.535 0.827 1.003 0.176 0.059 0.137 0.047 0.014


B9 - B10 1.022 0.968 0.167 0.801 0.196 1.019 0.936 0.083 0.08 0.041 0.001 0.001

B10 - B11 0.46 0.797 0.286 0.511 0.113 0.822 0.77 0.052 0.104 0.007 0.002 0.005

B11 - B12 0.339 0.665 0.111 0.554 0.764 0.492 0.858 0.366 0.178 0.095 0.003 0.044

B12 - B13 0.507 0.789 0.31 0.479 0.227 0.78 0.79 0.01 0.125 0.137 0.054 0.021

B13 - B14 0.294 0.689 0.189 0.5 0.194 0.75 0.638 0.112 0.032 0.031 0.038 0.025

B14 - B1 0.356 0.698 0.04 0.658 0.126 0.782 0.617 0.165 0.159 0.01 0.002 0.05

Note: Skew=Skewness; Kurt=Kurtosis; Disc=Discrimination index; Diff= Difficulty

To determine which test features are significantly related to equating error, Spearman's rho correlations were calculated between the REMSD values and the DIF ratio and the differences in skewness, kurtosis, test difficulty and test discrimination index for each group. The results indicated a significant relationship between REMSD and the DIF ratio in the whole test (ρ = .55, p = .041) and between REMSD and the skewness difference (ρ = .56, p = .038) for the Turkey-USA samples. Between gender subgroups in the Turkey sample, REMSD values are significantly related to the skewness difference (ρ = .66, p = .010), while between gender subgroups in the USA sample, REMSD values are significantly related to the test difficulty difference (ρ = .54, p = .045).
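Spearman's rho is the Pearson correlation of the rank-transformed data, with tied values given the mean of their ranks. The Python sketch below shows the computation using only the standard library; the function names are hypothetical, and the five data points are made-up illustrative values, not the values in Tables 1 and 3.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values for five booklet pairs (not taken from the tables)
remsd_values = [0.04, 0.07, 0.09, 0.11, 0.15]
skew_diff = [0.30, 0.20, 0.60, 0.50, 0.90]
print(round(spearman_rho(remsd_values, skew_diff), 2))  # 0.8
```

With only 14 booklet pairs per comparison, as in this study, each coefficient rests on a small number of ranks, so the reported p-values are best read together with the size of ρ.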

CONCLUSIONS

The results of the study indicated that equating invariance is not ensured for some booklets. These results imply that students may receive different equated scores depending on the subgroup to which they belong (Huggins, 2014). Hence, evaluating the scores within particular subgroups may lead to misinterpretation, because the equated scores are not invariant.

The results also imply that REMSD values are related to different test features in different groups. Between the country subgroups (Turkey and USA), REMSD is related to the DIF ratio, whereas it is not related to the DIF ratio between the gender subgroups in either Turkey or the USA. The ratio of DIF items is higher between country subgroups than between gender subgroups. This may be interpreted as indicating that, from the equating invariance perspective, the equipercentile equating method is robust to DIF up to a certain ratio. The skewness difference between subgroups is also significantly related to REMSD for the gender subgroups in Turkey and for the country subgroups, but not for the gender subgroups in the USA. A possible reason is that the skewness differences between gender subgroups in the USA are too small.

The results of the study are limited to the equipercentile equating method and the subgroups defined here. Future studies may focus on how different equating methods behave across subgroups. Moreover, simulation studies may be conducted to gain a broader perspective on the possible reasons for a lack of invariance.

REFERENCES

Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281-306.

Dorans, N. J. (2004). Using subpopulation invariance to assess test score equity. Journal of Educational

Measurement, 41, 43-68.

Huggins, A. C. (2014). The effect of differential item functioning in anchor items on population invariance of equating. Educational and Psychological Measurement, 74(4), 627-658.


Paper ID: 58

THE COMPARISON OF ITEM AND ABILITY ESTIMATIONS

CALCULATED FROM THE PARAMETRIC AND NON-PARAMETRIC

ITEM RESPONSE THEORY ACCORDING TO THE SEVERAL FACTORS

Ezgi Mor Dirlik1, Nizamettin Koç2

1Kastamonu University, 2Ankara University (Retired)

[email protected], [email protected]

Abstract: This study aimed to compare the item and ability parameters estimated under two approaches to Item Response Theory, parametric and non-parametric, with respect to test length, sample size and the psychometric qualities of the items. The study was designed as basic research, and the data were obtained from the Trends in International Mathematics and Science Study (TIMSS) 2011 administration. From the main data set, 16 different data sets were formed according to the research questions and the conditions of the factors. The results showed that the item parameters estimated with the parametric and non-parametric models were significantly and highly correlated in all data sets, regardless of the number of items and the sample size. The ability parameters estimated with the non-parametric models were also significantly and highly correlated across all sample size and test length conditions. In sum, both approaches yielded highly and significantly correlated item and ability parameters. This finding shows that, when the assumptions of parametric item response theory models are not met to the required degree, the non-parametric alternatives can be used under the conditions studied in this research.

Keywords: Parametric Item Response Theory, Non-Parametric Item Response Theory, Mokken Scaling, Sample Size,

Test Length

SUMMARY

Introduction

Measuring abstract constructs has been one of the most difficult tasks since the beginning of the history of the social sciences. There have been many attempts to define and measure properties such as success, motivation, intelligence, attitudes and personality. Because of the complex structure of these constructs, it is accepted that such measurements always contain a fair amount of error, and measurement theories have been developed to minimize that error. The best-known measurement theory is Classical Test Theory (CTT), which has been used in many measurement applications thanks to its simple structure. Its assumptions can be met easily by almost every kind of data set, which is the main reason for its popularity. Despite its popularity, CTT has important drawbacks that limit the precision of measurement. CTT yields only a single error value, the standard error of measurement, although the amount of error is not the same for all people in a sample. It is also difficult to adapt a test to the level of the test taker under this theory. To address these drawbacks, another measurement theory, Item Response Theory (IRT), was developed. Also known as latent trait theory, it provides deeper knowledge not only about test takers' performance but also about the test items and the constructs. It has made it possible to create adaptive tests and to equate tests in order to compare students' performance. To obtain these advantages, however, the data must meet certain requirements: large sample size, normal distribution, local independence and unidimensionality. In classroom assessment in particular, it is generally not possible to reach samples as large as 1000 or 2000 or to administer as many items as in high-stakes tests. For these reasons, the advantages of IRT are not available for small samples that are not normally distributed. Because these requirements are difficult to meet, new approaches have emerged that are based on IRT but differ in their requirements. One of them is Nonparametric Item Response Theory, which lies between CTT and Parametric Item Response Theory: like CTT, it can be applied to short tests and small samples, but, like IRT, it can yield invariant item and ability parameters. Nonparametric item response theory is a relatively new approach and has been used frequently in the health sciences. Because the field is new, there are few studies in the literature, and most of them focus on theoretical perspectives. This study aims to compare the parameters obtained from parametric and nonparametric IRT.

Method

This study aimed to compare the item and ability parameters estimated under two approaches to Item Response Theory, parametric and non-parametric, with respect to test length, sample size and the psychometric qualities of the items. The study was designed as basic research, and the data were obtained from the Trends in International Mathematics and Science Study (TIMSS) 2011 administration. The eighth-grade mathematics booklet that had been found to be the most nearly unidimensional was used, and data from the first 20 countries were gathered. The final data set consisted of 7254 students from different countries. From the main data set, 16 different data sets were formed according to the research questions and the conditions of the factors. For the sample size factor, three data sets of 500, 1000 and 3000 students were prepared, and the sample size conditions were crossed with the test length conditions of five, 15 and 25 items. From these data sets, item and ability parameters were estimated using compatible parametric and non-parametric item response models, and the results were compared using the relevant correlation coefficients. For the last factor, the psychometric quality of the items, four different data sets were formed, and ability parameters were estimated from them and compared. Lastly, the reliability of the data sets was investigated using test information functions for the parametric item response theory models and Cronbach's alpha, Lambda-2, LCRC and MS coefficients for the non-parametric models.
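Of the reliability coefficients listed, Cronbach's alpha is the simplest to compute directly from item scores. The following is a minimal Python sketch, assuming a complete respondents-by-items score matrix; the function name and the toy data are hypothetical and are not taken from the TIMSS data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a complete score matrix.

    scores: one row per respondent, one column per item.
    """
    k = len(scores[0])
    # Variance of each item and of the total score (same variance type in both,
    # so the ratio does not depend on population vs sample variance)
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: four respondents, three dichotomous items
toy = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
print(round(cronbach_alpha(toy), 2))  # 0.6
```

In the toy data each item has variance 0.25 and the total score has variance 1.25, so alpha = (3/2)(1 - 0.75/1.25) = 0.6.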

Results

The results showed that all item parameters estimated with the parametric and non-parametric models, from data sets with different numbers of items and sample sizes, were significantly and highly correlated. For the ability parameters, however, the estimates from the parametric models were not consistent, especially when the data sets consisted of 5 or 15 items, whereas the abilities estimated from the 25-item data sets were significantly and highly correlated. The abilities estimated with the non-parametric models were significantly and highly correlated under all sample size and test length conditions. Additionally, when the correlations between the ability parameters estimated with the parametric and non-parametric models were examined, the parametric models produced results consistent with the non-parametric models only when the test had 25 items or the sample size was larger than 3000.

Discussion and Conclusion

Another factor whose effect on the ability parameters was examined in this work was the level of the item discrimination and item difficulty parameters. The abilities estimated from the four data sets, grouped by low and high item difficulty and item discrimination, were highly and significantly correlated. Accordingly, both approaches yielded highly and significantly correlated item and ability parameters. This finding shows that, when the assumptions of parametric item response theory models are not met to the required level, the non-parametric alternatives can be used under the conditions studied in this research.

REFERENCES

Baker, F. B. and Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.

Christensen, K. and Kreiner, S. (2010). Monte Carlo tests of the Rasch model based on scalability coefficients. British Journal of Mathematical and Statistical Psychology, 63, 101-111.

Dyehouse, M. A. (2009). A comparison of model-data fit for parametric and nonparametric item response theory models using ordinal level ratings (Doctoral dissertation). Available from ProQuest Dissertations and Theses database.

Embretson, S. E. and Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Hambleton, R. K. and Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer.

Junker, B. (2000). Some topics in nonparametric and parametric IRT, with some thoughts about the future. Carnegie Mellon University: Department of Statistics.

Koğar, H. (2014). Madde tepki kuramının farklı uygulamalarından elde edilen parametrelerin ve model uyumlarının örneklem büyüklüğü ve test uzunluğu açısından karşılaştırılması. Unpublished doctoral dissertation. Hacettepe Üniversitesi Eğitim Bilimleri Enstitüsü.

Kuijpers, R. E., Van der Ark, L. A., Croon, M. A. and Sijtsma, K. (2016). Bias in point estimates and standard errors of Mokken's scalability coefficients. Applied Psychological Measurement, 40 (5), 331-345.

Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61 (4).

Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of "scale analysis" and factor analysis. Psychological Bulletin, 45 (6), 507-529.

Meijer, R. R., Sijtsma, K. and Smid, N. G. (1990). Theoretical and empirical comparison of the Mokken and the Rasch approach to IRT. Applied Psychological Measurement, 14, 283-298.

Meijer, R. R., Sijtsma, K. and Molenaar, I. W. (1995). Reliability estimation for single dichotomous items based on Mokken's IRT model. Applied Psychological Measurement, 19 (4), 323-335.

Meijer, R. R. and Baneke, J. J. (2004). Analyzing psychopathology items: A case for nonparametric item response theory modeling. Psychological Methods, 9 (3), 354-368.

Meijer, R. R., Egberink, I. J. L., Emons, W. H. M. and Sijtsma, K. (2008). Detection and validation of unscalable item score patterns using item response theory: An illustration with Harter's Self-Perception Profile for Children. Journal of Personality Assessment, 90 (3), 227-238.

Meijer, R., Tendeiro, J. and Wanders, R. (2014). The use of nonparametric item response theory to explore data quality. In S. P. Reise & D. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 85-110). New York: Routledge.

Mislevy, R. J. and Stocking, M. L. (1989). A consumer's guide to LOGIST and BILOG. Applied Psychological Measurement, 13, 57-75.

Mokken, R. J. and Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6 (4), 417-430.

Mokken, R. J. (1971). A theory and procedure of scale analysis with applications in political research. New York, Berlin: Walter de Gruyter, Mouton.

Molenaar, I. W. and Sijtsma, K. (1984). Internal consistency and reliability in Mokken's nonparametric item response model. Tijdschrift voor Onderwijsresearch, 9 (5), 257-268.

Molenaar, I. W. (1991). A weighted Loevinger H coefficient extending Mokken scaling to multicategory items. Kwantitatieve Methoden, 37, 97-117.

Molenaar, I. W. (1997). Nonparametric models for polytomous responses. In W. J. van der Linden and R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 369-380). New York: Springer.

Molenaar, I. W. (2001). Thirty years of nonparametric item response theory. Applied Psychological Measurement, 25 (3), 295-299.


Sijtsma, K. and Junker, B. W. (2006). Item response theory: Past performance, present developments and future expectations. Behaviormetrika, 1, 75-102.

Sijtsma, K. and Meijer, R. R. (2007). Nonparametric item response theory. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics 26: Psychometrics (pp. 719-746). Amsterdam: Elsevier.

Sijtsma, K. and Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Newbury Park, CA: Sage.

Straat, J. H., Van der Ark, L. A. and Sijtsma, K. (2014). Minimum sample size requirements for Mokken scale analysis. Educational and Psychological Measurement, 74, 809-822.

Şengül Avşar, A. (2015). Çok Kategorili Puanlanan Maddelerin Psikometrik Özelliklerinin Farklı Test Koşullarında Parametrik Olmayan Madde Tepki Kuramı Modellerine Göre İncelenmesi. Unpublished doctoral dissertation. Ankara Üniversitesi Eğitim Bilimleri Enstitüsü.

Van Schuur, W. H. (2011). Ordinal item response theory: Mokken scale analysis. Los Angeles: Sage Publications.

Paper ID: 59

A COMPARISON OF KERNEL EQUATING METHODS UNDER NEAT

DESIGN

Çiğdem Akın Arıkan

Ordu Üniversitesi [email protected]

Abstract: In this study, the equated scores produced by Kernel equating (KE) methods, namely the chained and post-stratification equipercentile and linear equating methods, were compared under the NEAT design. TIMSS Science data were used. The study sample consisted of 865 eighth-grade examinees who were given booklets 1 and 14 during the TIMSS administration in Turkey. There were 39 items in booklet 1 and 38 items in booklet 14. First, descriptive statistics were calculated, and the two booklets were then equated under the NEAT design using the Kernel chained, Kernel post-stratification equipercentile, and linear equating methods. Second, the equating methods were evaluated against the DTM, PRE, SEE, SEED and RMSD criteria. The results based on the equipercentile and linear equating methods were consistent with each other, except in the high range of the score scale. Finally, the PRE values show that the KE equipercentile equating methods better match the discrete target distribution Y, and the distribution of the SEED reveals that the KE equipercentile and linear methods do not differ significantly from each other according to the DTM.

Keywords: Equating, Kernel equating, DTM, SEE, SEED, RMSD.

Paper ID: 64

AN EXAMINATION OF ITEM RESPONSE THEORY SIMULATIONS IN TERMS OF

CONSTRUCT VALIDITY AND RELIABILITY

Nuri Doğan1, Abdullah Faruk Kılıç2, İbrahim Uysal3 1Hacettepe Üniversitesi, 2Adıyaman Üniversitesi, 3Abant İzzet Baysal Üniversitesi

[email protected], [email protected], [email protected]

Abstract: The purpose of this study is to examine, in terms of validity and reliability, the results obtained from different packages and programs in simulation studies based on Item Response Theory (IRT). To this end, data sets obtained from the R packages mirt, psych, irtoys and catIrt and from WinGen, a program frequently used in IRT-based simulation studies, were compared in terms of the KR-20 reliability coefficient and, from exploratory factor analysis (EFA), the proportion of explained variance, the mean factor loading and the dimensionality of the data sets. The research was conducted as a Monte Carlo simulation study. The simulation conditions were the number of items (20, 30 and 40), the IRT model (Rasch, two-parameter logistic model [2PLM] and three-parameter logistic model [3PLM]), the data generation program (WinGen 3.1 and the R packages irtoys, mirt, psych and catIrt) and the sample size (500, 1000 and 2000). A total of 135 simulation conditions were studied, with 1000 replications per condition. The results showed that the data sets generated under all IRT models were generally unidimensional; that the proportion of explained variance remained below 0.30 for the Rasch model; that for the 2PLM the programs had explained variance above 0.30 in all conditions except one; that for the 3PLM the proportion of explained variance was generally below 0.30; and that the mean factor loadings were above 0.46 for all models and programs. The reliability coefficients ranged from 0.75 to 0.93 across all conditions. According to the findings, the R packages can be said to give better results in IRT data generation in terms of the proportion of explained variance, the mean factor loading and the reliability coefficient in most of the simulation conditions. R packages are therefore recommended for IRT-based simulation studies.

Keywords: Simulation, Item Response Theory, exploratory factor analysis, explained variance, reliability.

INTRODUCTION

Simulation studies are frequently used in education and psychology. Feinberg and Rubright (2016) report that 60% of the studies published in Applied Psychological Measurement and the Journal of Educational Measurement between 2008 and 2012 were simulation studies. Simulation studies help researchers answer questions about data analysis, statistical power and the methods to be used to obtain accurate results from empirical studies (Hallgren, 2013). This explains why they are so common in research.

Item Response Theory (IRT) is one of the theories frequently used in education and psychology, so simulation studies on IRT are common. Many programs and programming languages are available for conducting Monte Carlo simulation studies on IRT. Programs such as IRTPRO (Cai, Thissen and du Toit, 2011), flexMIRT (Cai, 2017), BMIRT (Yao, 2003) and Mplus (Muthén and Muthén, 2012) are commercial, whereas programs such as WinGen (Han, 2007) and programming languages such as R (R Core Team, 2017) are free.

Researchers frequently conduct Monte Carlo simulation studies with these programs and make recommendations based on their findings. However, studies rarely examine the data sets these programs generate with respect to IRT assumptions and psychometric properties. The present study addresses the construct validity and reliability of data sets generated under IRT with the WinGen 3.1 program and four R packages, chosen because they are free and used in many IRT studies. In this way, the limits of the findings obtained from such studies can be drawn more clearly.

The study sought to answer the question "What are the construct validity and reliability of the data sets obtained from programs frequently used in IRT-based Monte Carlo simulation studies?" In relation to this problem, answers were sought to the following sub-problems:

1. What is the dimensionality of the data sets obtained from WinGen 3.1, irtoys, mirt, psych and catIrt, which are used in Monte Carlo simulations?

2. What is the proportion of explained variance of the data sets obtained from WinGen 3.1, irtoys, mirt, psych and catIrt, which are used in Monte Carlo simulations?

3. What is the mean factor loading of the data sets obtained from WinGen 3.1, irtoys, mirt, psych and catIrt, which are used in Monte Carlo simulations?

4. What are the reliability coefficients of the data sets obtained from WinGen 3.1, irtoys, mirt, psych and catIrt, which are used in Monte Carlo simulations?

METHOD

Because the study examines the construct validity and reliability of data sets generated in Monte Carlo simulation studies based on Item Response Theory (IRT), it was itself conducted as a Monte Carlo simulation study. Simulation studies are frequently used in education and psychology (Feinberg and Rubright, 2016; Harwell, Stone, Hsu and Kirisci, 1996) and make it possible to evaluate alternative scenarios (Gilbert, 1999).

The simulation conditions were the number of items (20, 30 and 40), the IRT model (Rasch, two-parameter logistic model [2PLM] and three-parameter logistic model [3PLM]), the data generation program (WinGen 3.1 (Han, 2007) and the R packages irtoys (Partchev, 2016), mirt (Chalmers, 2012), psych (Revelle, 2018) and catIrt (Nydick, 2015)) and the sample size (500, 1000 and 2000). A total of 3x3x5x3=135 simulation conditions were studied, with 1000 replications per cell, so the replications yielded 135000 data sets. The study is based on unidimensional Item Response Theory, and all data sets were therefore generated to be unidimensional. In addition, only dichotomously scored data were examined; polytomous data were left outside the scope of this study.
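
The fully crossed design described above can be enumerated in a few lines. The following is a minimal Python sketch, not the authors' code (the study itself used WinGen and R); the names `build_conditions`, `N_ITEMS`, `MODELS`, `PROGRAMS` and `SAMPLE_SIZES` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Factor levels as stated in the Method section.
N_ITEMS = (20, 30, 40)
MODELS = ("Rasch", "2PLM", "3PLM")
PROGRAMS = ("WinGen 3.1", "irtoys", "mirt", "psych", "catIrt")
SAMPLE_SIZES = (500, 1000, 2000)
REPLICATIONS = 1000


def build_conditions():
    """Return every cell of the fully crossed design as a dict."""
    return [
        {"n_items": k, "model": m, "program": p, "n_persons": n}
        for k, m, p, n in product(N_ITEMS, MODELS, PROGRAMS, SAMPLE_SIZES)
    ]


conditions = build_conditions()
total_datasets = len(conditions) * REPLICATIONS
print(len(conditions), total_datasets)
```

Enumerating the cells explicitly makes it easy to check that the count matches 3x3x5x3 and that the number of generated data sets equals the number of cells times the number of replications.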

In IRT-based simulation studies, the distributions of the item parameters differ from study to study. This study used the item parameter distributions used by Bulut and Sünbül (2017). Accordingly, for the 3PLM the a parameter was generated from a lognormal distribution with mean 0.3 and standard deviation 0.2, the b parameter from a normal distribution with mean 0 and standard deviation 1, and the c parameter from a beta distribution with shape parameters a=20 and b=90. For the 2PLM the c parameter was set to 0. For the Rasch model the a parameter was set to 1 and the c parameter to 0. The ability parameter was generated from a normal distribution with mean 0 and standard deviation 1.
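
As an illustration of the generation scheme in this paragraph, the sketch below draws item and ability parameters and simulates dichotomous responses using only the Python standard library. It is not the code of any of the compared programs. Two points are assumptions rather than statements from the text: the lognormal mean and standard deviation are read as log-scale parameters, and the logistic function is used without the scaling constant D = 1.7. All function names are hypothetical.

```python
import math
import random


def draw_item_parameters(n_items, model, rng):
    """Draw (a, b, c) for each item under the Rasch, 2PLM or 3PLM condition."""
    items = []
    for _ in range(n_items):
        # Assumption: 0.3 and 0.2 are the log-scale mean and SD of a.
        a = rng.lognormvariate(0.3, 0.2) if model in ("2PLM", "3PLM") else 1.0
        b = rng.gauss(0.0, 1.0)
        c = rng.betavariate(20, 90) if model == "3PLM" else 0.0
        items.append((a, b, c))
    return items


def p_correct(theta, a, b, c):
    """3PL response probability (assumes D = 1); c = 0 gives the 2PL, a = 1 and c = 0 the Rasch model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(n_persons, items, rng):
    """Return an n_persons x n_items matrix of 0/1 responses with theta ~ N(0, 1)."""
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        row = [1 if rng.random() < p_correct(theta, a, b, c) else 0
               for a, b, c in items]
        data.append(row)
    return data


rng = random.Random(2018)  # fixed seed so the sketch is reproducible
items = draw_item_parameters(20, "3PLM", rng)
responses = simulate_responses(500, items, rng)
```

One function serves all three models because the Rasch model and the 2PLM are special cases of the 3PLM, which mirrors how the text defines them by fixing a and c.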

Minimum average partial (MAP) analysis was used to determine the dimensionality of the data sets. Exploratory factor analysis (EFA) was conducted to obtain the proportion of explained variance and the mean factor loading. For this, the minimum residual (minres) factor extraction method was used and the analyses were run on the tetrachoric correlation matrix. Because all data sets were generated to be unidimensional, the mean factor loading was obtained by averaging the loadings of all items on the first factor. This was done for each of the 1000 replications in a condition, and the mean over the 1000 replications was reported. Reliability was assessed with the KR-20 coefficient: the KR-20 values of the 1000 data sets generated for each condition were averaged. All analyses were carried out in R with the psych package (Revelle, 2018).
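
The reliability step can be illustrated with the standard KR-20 formula, KR20 = k/(k-1) * (1 - sum(p_j * q_j) / var(X)), where k is the number of items, p_j the proportion correct on item j, q_j = 1 - p_j and var(X) the variance of the total scores. The sketch below is a plain Python illustration, not the psych package routine the authors used. It assumes the population (divide-by-n) variance so that var(X) is on the same footing as p_j * q_j, and `kr20` and `mean_kr20` are hypothetical names.

```python
from statistics import pvariance


def kr20(data):
    """KR-20 for a persons x items matrix of 0/1 scores."""
    n = len(data)
    k = len(data[0])
    totals = [sum(row) for row in data]
    total_var = pvariance(totals)  # assumption: population variance
    if k < 2 or total_var == 0:
        raise ValueError("KR-20 needs at least two items and varying total scores")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in data) / n  # proportion correct on item j
        sum_pq += p * (1.0 - p)
    return k / (k - 1) * (1.0 - sum_pq / total_var)


def mean_kr20(datasets):
    """Average KR-20 over the replications of one simulation condition."""
    return sum(kr20(d) for d in datasets) / len(datasets)


# Toy example: four persons, three items forming a perfect Guttman pattern.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))  # 0.75
```

In the toy example the total scores are 3, 2, 1 and 0 (variance 1.25) and the item variances sum to 0.625, so the coefficient is 1.5 * (1 - 0.5) = 0.75.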

RESULTS

The findings of the study are presented in Table 1 and interpreted according to the sub-problems.

Table 1. Proportion of Explained Variance, Mean Factor Loading and Reliability of the Data Sets Obtained from the Programs

PEV: proportion of explained variance; MFL: mean factor loading; Rel.: KR-20 reliability coefficient.

                         Rasch Model            2PLM                   3PLM
Items  Sample  Program   PEV    MFL    Rel.     PEV    MFL    Rel.     PEV    MFL    Rel.
20     500     mirt      0.261  0.508  0.788    0.405  0.629  0.862    0.304  0.544  0.815
               psych     0.265  0.512  0.790    0.403  0.627  0.850    0.286  0.524  0.792
               WinGen    0.246  0.493  0.765    0.299  0.542  0.805    0.267  0.503  0.753
               irtoys    0.263  0.509  0.788    0.399  0.624  0.847    0.289  0.528  0.794
               catIrt    0.263  0.510  0.789    0.399  0.624  0.847    0.289  0.528  0.794
20     1000    mirt      0.261  0.510  0.790    0.400  0.626  0.862    0.293  0.535  0.803
               psych     0.265  0.513  0.793    0.378  0.608  0.843    0.232  0.470  0.753
               WinGen    0.246  0.494  0.766    0.350  0.588  0.837    0.281  0.519  0.770
               irtoys    0.261  0.510  0.790    0.399  0.625  0.853    0.233  0.470  0.754
               catIrt    0.261  0.510  0.790    0.398  0.625  0.853    0.232  0.469  0.753
20     2000    mirt      0.243  0.492  0.770    0.374  0.608  0.848    0.290  0.533  0.805
               psych     0.261  0.510  0.785    0.364  0.601  0.829    0.243  0.487  0.771
               WinGen    0.255  0.504  0.775    0.343  0.582  0.834    0.272  0.512  0.766
               irtoys    0.243  0.492  0.769    0.366  0.602  0.830    0.249  0.493  0.777
               catIrt    0.243  0.492  0.769    0.366  0.602  0.830    0.248  0.492  0.776
30     500     mirt      0.263  0.509  0.842    0.393  0.622  0.900    0.252  0.495  0.835
               psych     0.261  0.507  0.841    0.369  0.602  0.884    0.241  0.474  0.813
               WinGen    0.245  0.492  0.834    0.388  0.617  0.893    0.251  0.493  0.827
               irtoys    0.264  0.511  0.844    0.387  0.617  0.891    0.247  0.478  0.817
               catIrt    0.265  0.511  0.844    0.388  0.618  0.891    0.246  0.477  0.816
30     1000    mirt      0.269  0.517  0.848    0.367  0.602  0.891    0.258  0.490  0.829
               psych     0.260  0.508  0.842    0.364  0.600  0.881    0.232  0.469  0.803
               WinGen    0.251  0.499  0.839    0.387  0.616  0.893    0.255  0.498  0.831
               irtoys    0.270  0.518  0.848    0.364  0.599  0.881    0.228  0.465  0.799

cat
