
    6TH INTERNATIONAL CONGRESS ON MEASUREMENT AND

    EVALUATION IN EDUCATION AND PSYCHOLOGY

    September 5-8, 2018

    Prizren, KOSOVO

    Responsibility for the contents of this book lies with the authors of the papers.

    September, 2018


    INTRODUCTION

    Paper sessions, conferences, and panel discussions were held at the 6th International Congress on Measurement and Evaluation in Education and Psychology at Prizren University in Kosovo. The Congress was first hosted by Ankara University in 2008. This year it gained an international character, since it was organized abroad, in Kosovo, for the first time.

    Participants of the Congress included representatives of the Ministry of National Education of Turkey, academics working at various universities in Turkey, testing and assessment professionals, and graduate students. In total, 122 researchers participated in the Congress.

    The opening of the Congress, organized in cooperation with Prizren University of Kosovo, was attended by a large protocol that included the Kosovo Turkish Democratic Party MP, the Mayor of Prizren, the Turkish Consulate in Prizren, the Rector of Prizren University, the Dean and the Vice Dean of the Faculty of Education, the representative of the Kosovo Yunus Emre Institute, and the representative of the Kosovo Military Representative Committee. Members of the protocol said they were happy to welcome guests from Turkey and emphasized the importance of education and cooperation. The keynote speakers of the Congress were the Vice Rector of Hasan Kalyoncu University, the Dean of the Faculty of Education, and the Chairman of the Board of Directors of EPODDER (Association of Educational and Psychological Assessment and Evaluation).

    A total of 12 faculty members working in 7 different countries served on the scientific committee of the congress, which was organized by Prizren University, Hasan Kalyoncu University, and EPODDER. Two invited speakers gave lectures at the congress: "Using Unidimensional Models to Represent a Two-Dimensional Latent Space" by Prof. Dr. Terry Ackerman and "Strategies for Addressing High-Dimensional Cognitive Diagnostic Assessment Problems" by Jimmy De La Torre.

    A total of 105 papers on large-scale tests were presented in 28 sessions. Most of them focused on the preparation, scoring, administration, and evaluation of tests. These studies were based on international large-scale assessments such as PISA and TIMSS, as well as on national large-scale tests such as TEOG and SBS. Presentations were given in two languages, English and Turkish. A panel discussed the problems experienced in the transition between grades in Turkey and possible solutions.

    We would like to express our gratitude to Pegem Academy, Nobel Publishing House, the Yunus Emre Institute, TIKA, and the Ministry of National Education, Evaluation and Examination Services. We also thank the members of the Congress Organizing Committee and the Scientific Committee who contributed to the organization of our congress.

    Co-Presidents of the Congress Organizing Committee

    Prof. Şener Büyüköztürk, Prof. Selahattin Gelbal


    CONGRESS HONORARY MEMBERS

    Prof. Dr. Tamer Yılmaz, Hasan Kalyoncu University

    Prof. Dr. Ramë Vataj, Prizren University

    CO-CHAIRS OF THE CONGRESS ORGANIZATION BOARD

    Prof. Dr. Şener Büyüköztürk, Hasan Kalyoncu University

    Prof. Dr. Selahattin Gelbal, Hacettepe University

    CONGRESS ORGANIZATION BOARD

    Assoc. Prof. Ismet Temaj, Prizren University

    Assistant Prof. Shemsi Morina, Prizren University

    Assoc. Prof. İsmail Karakaya, Gazi University

    Assoc. Prof. Alper Köse, Bolu Abant Izzet Baysal University

    Assistant Prof. Ufuk Akbaş, Hasan Kalyoncu University

    Assistant Prof. Ersoy Karabay, Hasan Kalyoncu University

    Assistant Prof. Soner Yıldırım, Prizren University

    Meral Alkan, PhD, Ministry of National Education

    Serdan Kervan, PhD, Prizren University

    SCIENCE COMMITTEE*

    Ali Baykal, Bahcesehir University

    Cindy M. Walker, Duquesne University

    Hülya Kelecioğlu, Hacettepe University

    Ismet Temaj, Prizren University

    Jimmy De La Torre, The University of Hong Kong

    Kadriye Ercikan, University of British Columbia

    Mehtap Çakan, Gazi University

    Ron Hambleton, University of Massachusetts Amherst

    Selahattin Gelbal, Hacettepe University

    Şener Büyüköztürk, Hasan Kalyoncu University

    Terry Ackerman, The ACT

    Zekeriya Nartgün, Abant İzzet Baysal University

    *Sorted in alphabetical order.

    SECRETARY

    Ayfer Sayın, PhD, Gazi University

    Research Assistant Merve Yıldırım Seheryeli, Hasan Kalyoncu University


    WEB DESIGN AND DEVELOPMENT

    Software Developer Turul Trpan, Hasan Kalyoncu University

    REVIEWERS/REFEREES*

    Adnan Erkuş, PhD.

    Ahmet Salih Şimşek, PhD.

    Ahmet Turhan, PhD.

    Ahmet Yıldırım, PhD.

    Asiye Şengül Avşar, PhD.

    Ayfer Sayın, PhD.

    Aylin Albayrak Sarı, PhD.

    Ayşegül Altun, PhD.

    Bahar Şahin Sarkın, PhD.

    Bayram Bak, PhD.

    Bilge Gök Bekci, PhD.

    Burak Aydın, PhD.

    Burhanettin Özdemir, PhD.

    C. Deha Doğan, PhD.

    Cem Oktay Güzeller, PhD.

    Çiğdem Akın Arıkan, PhD.

    Deniz Tuğçe Özmen, PhD.

    Derya Çakıcı Eser, PhD.

    Derya Çobanoğlu Aktan, PhD.

    Devrim Özdemir Alıcı, PhD.

    Dilara Bakan Kalaycıoğlu, PhD.

    Durmuş Özbaşı, PhD.

    Duygu Anıl, PhD.

    Duygu Gizem Saral Ertoprak, PhD.

    Duygu Koçak, PhD.

    Elif Bengi Ünsal Özberk, PhD.

    Emine Burcu Pehlivan Tunç, PhD.

    Emine Önen, PhD.

    Emrah Gül, PhD.

    Emre Toprak, PhD.

    Eren Can Aybek, PhD.

    Eren Halil Özberk, PhD.

    Ersoy Karabay, PhD.

    Esin Tezbaşaran, PhD.

    Esin Yılmaz Koğar, PhD.

    Esra Eminoğlu Özmercan, PhD.

    Ezel Tavşancıl, PhD.

    Ezgi Mor Dirlik, PhD.

    Fatih Kezer, PhD.

    Fazilet Taşdemir, PhD.

    Fethi Toker, PhD.

    Funda Nalbantoğlu Yılmaz, PhD.


    Gizem Uyumaz, PhD.

    Gül Şekercioğlu, PhD.

    Glin rak Uzun, PhD.

    Gülden Kaya Uyanık, PhD.

    Gülşen Taşdelen Teker, PhD.

    Hakan Atılgan, PhD.

    Hakan Koğar, PhD.

    Hakan Yavuz Atar, PhD.

    Halil Yurdugül, PhD.

    Hamide Deniz Gülleroğlu, PhD.

    Hatice Kumandaş, PhD.

    Hülya Kelecioğlu, PhD.

    Hülya Yürekli, PhD.

    İ. Alper Köse, PhD.

    İlhan Akhun, PhD.

    İlker Kalender, PhD.

    İsmail Karakaya, PhD.

    Kadriye Belgin Demirus, PhD.

    Kübra Atalay Kabasakal, PhD.

    Levent Yakar, PhD.

    Lokman Akbay, PhD.

    Mehtap Çakan, PhD.

    Melek Gülşah Şahin, PhD.

    Meltem Acar Güvendir, PhD.

    Meral Alkan, PhD.

    Murat Akyıldız, PhD.

    Murat Doğan Şahin, PhD.

    Münevver Başman, PhD.

    Nagihan Öztürk Boztunç, PhD.

    Neşe Güler, PhD.

    Neşe Öztürk Gübeş, PhD.

    Nilüfer Kahraman, PhD.

    Nizamettin Koç, PhD.

    Nükhet Demirtaşlı, PhD.

    Okan Bulut, PhD.

    Ömer Kutlu, PhD.

    Ömür Kaya Kalkan, PhD.

    Önder Sünbül, PhD.

    Özge Bal Altıntaş, PhD.

    Özge Bıkmaz Bilgen, PhD.

    Ragıp Terzi, PhD.

    Recep Serkan Arık, PhD.

    Sakine Göçer Şahin, PhD.

    Satılmış Tekindal, PhD.

    Seçil Ömür Sünbül, PhD.

    Sedat Şen, PhD.

    Seher Yalçın, PhD.

    Selahattin Gelbal, PhD.

    Sema Sulak, PhD.

    Sevda Çetin, PhD.


    Şener Büyüköztürk, PhD.

    Şeref Tan, PhD.

    Şeyma Yüksel, PhD.

    Tülin Acar, PhD.

    Türker Toker, PhD.

    Ufuk Akbaş, PhD.

    Ümit Çelen, PhD.

    Yaşar Baykul, PhD.

    Yavuz Akpınar, PhD.

    Yeşim Özer Özkan, PhD.

    Yurdagül Günal, PhD.

    Zekeriya Nartgün, PhD.

    *Sorted in alphabetical order.


    CONTENTS

    INTRODUCTION ... ii

    CONGRESS HONORARY MEMBERS ... iii

    CO-CHAIRS OF THE CONGRESS ORGANIZATION BOARD ... iii

    CONGRESS ORGANIZATION BOARD ... iii

    SCIENCE COMMITTEE* ... iii

    SECRETARY ... iii

    WEB DESIGN AND DEVELOPMENT ... iv

    REVIEWERS/REFEREES* ... iv

    CONTENTS ... vii

    SUMMARY ... 1

    AN ANALYSIS ON THE MODEL FIT INDICES IN A SCALE CONSTRUCTION PROCESS ... 2

    A COMPARISON OF METHODS FOR DETERMINING INTER-RATER RELIABILITY IN COMPOSITION SCORING ... 7

    INVESTIGATION OF SUBGROUP INVARIANCE OF EQUATING FUNCTIONS ... 8

    THE COMPARISON OF ITEM AND ABILITY ESTIMATIONS CALCULATED FROM THE PARAMETRIC AND NON-PARAMETRIC ITEM RESPONSE THEORY ACCORDING TO THE SEVERAL FACTORS ... 12

    A COMPARISON OF KERNEL EQUATING METHODS UNDER NEAT DESIGN ... 16

    AN INVESTIGATION OF ITEM RESPONSE THEORY SIMULATIONS IN TERMS OF CONSTRUCT VALIDITY AND RELIABILITY ... 17

    ANALYSING TEACHERS' VIEWS ON THE PROBABLE NEGATIVE EFFECTS OF HIGH STAKE TESTS THROUGH PAIRED COMPARISONS BASED ON RASCH ANALYSIS ... 22

    A COMPARISON OF CONFIRMATORY FACTOR ANALYSIS AND MULTILEVEL MODELING: THE CASE OF DENIZLI PROVINCE ... 29

    APPLICABILITY OF THE THEORETICAL EXAM FOR CANDIDATES OF DRIVING LICENCE TRAINEES AS THE COMPUTER ADAPTIVE TEST* ... 36

    THE IMPACT OF COMPLEX MULTIPLE CHOICE ITEMS ON ITEM DIFFICULTY ... 41

    COMPARING PERFORMANCE OF DIFFERENT EQUATING METHODS IN PRESENCE AND ABSENCE OF DIF ITEMS IN ANCHOR TEST ... 42

    DIF DETECTION WITHIN THE DINA MODEL FRAMEWORK ... 43

    THE EFFECT OF GENDER ON PROBLEMATIC INTERNET USE: A META-ANALYSIS ... 46

    THE EFFECT OF TURKISH STUDENTS' LEVELS OF ACCESS TO AND USE OF INFORMATION AND COMMUNICATION TECHNOLOGIES ON PISA SCIENCE ACHIEVEMENT ... 47

    AN INVESTIGATION OF PISA SCIENCE ITEMS IN TERMS OF THE CHARACTERISTICS OF STEM (FeTeMM) EDUCATION ... 62

    COMPARING THE PERFORMANCE OF DATA MINING METHODS IN CLASSIFYING SUCCESSFUL STUDENTS WITH SCIENTIFIC LITERACY IN PISA 2015* ... 68


    ADAPTATION STUDY OF THE ACADEMIC RESILIENCE SCALE FOR TURKEY ... 76

    TEST EQUATING WITH BAYESIAN NONPARAMETRIC MODEL ... 77

    ADAPTATION OF THE SCIENTIFIC REASONING TEST INTO TURKISH AND DETERMINATION OF ITS PSYCHOMETRIC PROPERTIES ... 82

    USING THE STRUCTURAL EQUATION APPROACH TO EVALUATE RATERS OF POLYTOMOUSLY SCORED SCALES ... 83

    AN INVESTIGATION OF LOCAL DEPENDENCE AMONG ITEMS IN A TESTLET WITH THE TESTLET RESPONSE MODEL ... 86

    RELIABILITY AND VALIDITY STUDY OF THE MATHEMATICS TEACHING ANXIETY SCALE ... 91

    AN ANALYSIS OF ENGLISH TEST IN 2016 YEAR UNDERGRADUATE PLACEMENT EXAM IN TERMS OF DIFFERENTIAL ITEM FUNCTIONING ... 99

    AN INVESTIGATION OF FOREIGN LANGUAGE PRODUCTION OF CABIN CREW CANDIDATES DURING THEIR EMPLOYMENT PROCESS ... 104

    DETERMINING THE VARIABLES AFFECTING STUDENTS' SBS MATHEMATICS ACHIEVEMENT AND SCHOOL VALUE-ADDED THROUGH HIERARCHICAL LINEAR MODELING* ... 107

    IRT-BASED PARAMETER ESTIMATION FOR DICHOTOMOUSLY SCORED ITEMS: A COMPARISON OF BILOG, Mplus AND ltm ... 113

    TEACHERS' MEASUREMENT AND EVALUATION COMPETENCIES AND PROBLEMS IN INCLUSIVE EDUCATION PRACTICES ... 114

    TREND ANALYSIS OF TURKEY IN INTERNATIONAL EDUCATION SURVEYS: USING TIMSS AND PISA RESULTS ... 118

    A COMPARISON OF SINGLE-ADMINISTRATION CLASSIFICATION CONSISTENCY AND CLASSIFICATION ACCURACY METHODS ... 119

    AN INVESTIGATION OF THE ITEM DISCRIMINATION OF A TEST SCALE DESIGNED FOR A VISUAL-TEXT-BASED INNOVATIVE ITEM FORMAT ... 125

    A COMPARISON OF METHODS FOR HANDLING MISSING DATA IN LARGE SAMPLES IN TERMS OF THEIR EFFECT ON MEASUREMENT INVARIANCE ... 130

    THE EFFECT OF THE NUMBER OF OPTIONS IN COMPLEX MULTIPLE-CHOICE QUESTIONS ON THEIR PSYCHOMETRIC PROPERTIES ... 138

    A COMPARISON OF THE EXPLAINED VARIANCE RESULTS OF MULTIPLE LINEAR REGRESSION AND GENERALIZED ADDITIVE MODELS ... 139

    CHOICE THEORY-BASED CAREER VALUES SCALE: VALIDITY AND RELIABILITY STUDIES ... 140

    THE ILLUSTRATION OF VALIDITY ARGUMENTS WITH DIFFERENT CONFIRMATORY FACTOR ANALYSIS MODELS ... 147

    CRITERION VALIDITY OF PISA 2003 AND 2012 MATHEMATICAL LITERACY SCORES: EXPECTANCY TABLES AND CORRESPONDENCE ANALYSIS ... 151

    ITEM ORDER EFFECT ON STUDENT PERFORMANCE IN LARGE-SCALE TESTING ... 154

    A COMPARISON OF SOFTWARE PACKAGES AVAILABLE FOR DINA MODEL ESTIMATION ... 158


    AN EVALUATION OF 4PL IRT AND DINA MODELS FOR ESTIMATING PSEUDO-GUESSING AND SLIPPING PARAMETERS ... 159

    A NON-DIAGNOSTIC ASSESSMENT FOR DIAGNOSTIC PURPOSES: Q-MATRIX VALIDATION AND ITEM-BASED MODEL FIT EVALUATION FOR THE TIMSS 2011 ASSESSMENT ... 163

    AN INVESTIGATION OF THE PSYCHOMETRIC PROPERTIES OF THE POSITIVE MOOD SCALE WITH THE GRADED RESPONSE MODEL ... 164

    AN INVESTIGATION OF THE PERFORMANCE OF COMPUTERIZED ADAPTIVE TESTS AND COMPUTERIZED ADAPTIVE MULTISTAGE TESTS ... 171

    AN INVESTIGATION OF ITEM PARAMETER DRIFT FOR ITEMS RETAINED ACROSS TIMSS CYCLES ... 172

    DETERMINING VIEWS ON THE USE OF E-PORTFOLIOS IN EVALUATING UNIVERSITY STUDENTS' ACHIEVEMENT ... 178

    AN INVESTIGATION OF CLASSIFICATION ACCURACY INDICES BASED ON BAYESIAN AND NON-BAYESIAN ESTIMATION METHODS UNDER DIFFERENT SAMPLE CONDITIONS ... 184

    A COMPARISON OF MIXED EFFECTS LOGISTIC REGRESSION AND LOGISTIC REGRESSION ... 185

    STUDENT COMPETENCIES PROMINENT IN 21ST-CENTURY EDUCATION AND THE TRANSFORMATION AND FUTURE OF CLASSROOM MEASUREMENT AND EVALUATION* ... 188

    THE EFFECT OF AFFECTIVE CHARACTERISTICS, EDUCATIONAL RESOURCES AND SCHOOL CLIMATE ON SCIENCE ACHIEVEMENT: PISA 2015 ... 193

    INVESTIGATION OF THE EFFECTS OF ITEM AND STUDENT RELATED VARIABLES ON TURKISH STUDENTS' MATHEMATICS PERFORMANCE* ... 197

    VALIDITY AND RELIABILITY STUDY OF THE NANOTECHNOLOGY AWARENESS SCALE IN TURKISH CULTURE ... 204

    AN INVESTIGATION OF THE CROSS-CULTURAL MEASUREMENT INVARIANCE OF THE PROGRAMME FOR INTERNATIONAL STUDENT ASSESSMENT (PISA-2015) SCIENTIFIC LITERACY TEST ITEMS ... 213

    A TYPE I ERROR AND POWER STUDY OF THE LIKELIHOOD RATIO METHOD IN UNIFORM AND NON-UNIFORM DATA GROUPS GENERATED UNDER MULTIDIMENSIONAL GRADED RESPONSE THEORY ... 221

    AN INVESTIGATION OF EDUCATION-THEMED FILMS RECOMMENDED TO TEACHERS BY THE MINISTRY OF NATIONAL EDUCATION (MEB) ACCORDING TO THE MEASUREMENT AND EVALUATION ELEMENTS THEY CONTAIN ... 226

    INVESTIGATING DIFFERENTIAL ITEM FUNCTIONING OF THE ANKARA UNIVERSITY EXAMINATION FOR FOREIGN STUDENTS BY RECURSIVE PARTITIONING ANALYSIS IN THE RASCH MODEL ... 234

    A COMPARISON OF ITEM PARAMETERS ESTIMATED FROM PARAMETRIC AND NONPARAMETRIC ITEM RESPONSE THEORY MODELS WHEN THE LOCAL INDEPENDENCE ASSUMPTION IS VIOLATED ... 239

    COMPARISON OF TEACHER PERFORMANCE BY ANSWERS FROM DIFFERENT COUNTRY STUDENTS ... 244

    COMPARISON OF THE CLASSIFICATION PERFORMANCE OF COMPUTER ADAPTIVE TESTING AND MULTI-STAGE COMPUTER ADAPTIVE TESTING ... 248


    LATENT CLASS ANALYSIS WITH MPLUS: AN EXAMPLE ON PSYCHOLOGICAL RESILIENCE ... 251

    AN INVESTIGATION OF THE MEASUREMENT INVARIANCE OF THE SCIENCE SELF-EFFICACY CONSTRUCT WITH THE MULTI-GROUP CFA METHOD ... 257

    EVALUATION OF NATIONAL EXAM ADMINISTRATIONS FOR VISUALLY IMPAIRED STUDENTS BY STUDENTS AND TEACHERS ... 263

    TESTING THE INVARIANCE OF LATENT MEAN STRUCTURES WITH THE MEAN AND COVARIANCE STRUCTURES ANALYSIS APPROACH ... 276

    AN INVESTIGATION OF THE EFFECTS OF DISTRIBUTION SKEWNESS ON THE PERFORMANCE OF THE OMEGA INDEX UNDER VARIOUS CONDITIONS ... 284

    AN INVESTIGATION OF THE PERFORMANCE OF THE M4 STATISTIC IN DETECTING ANSWER SIMILARITY ... 287

    AN INVESTIGATION OF THE INVARIANCE OF A MEASUREMENT MODEL BUILT ON QUANTITATIVE AND VERBAL ABILITIES ACROSS SOME STUDENT CHARACTERISTICS WITH THE COVARIANCE STRUCTURE ANALYSIS APPROACH ... 293

    DETERMINING DIFFERENTIAL BUNDLE FUNCTIONING IN MULTIDIMENSIONAL TESTS ... 300

    A COMPARISON OF THE CLASSIFICATION PERFORMANCE OF DECISION TREES BY TYPE OF INDEPENDENT VARIABLE ... 301

    AN INVESTIGATION OF THE DISTRIBUTION OF LEARNING OUTCOMES IN MIDDLE SCHOOL CURRICULA ACROSS COGNITIVE LEVELS ... 306

    AN INVESTIGATION OF THE EFFECT OF MISSING DATA IMPUTATION METHODS ON CRONBACH'S ALPHA AND STRATIFIED ALPHA RELIABILITY COEFFICIENTS ... 314

    AN INVESTIGATION OF THE EFFECT OF PISA PROBLEM-SOLVING SKILLS ON MATHEMATICS, READING AND SCIENCE ACHIEVEMENT IN PISA ... 325

    A COMPARISON OF TEST EQUATING METHODS BASED ON ITEM RESPONSE THEORY: THE CASE OF THE PISA 2012 SCIENCE TEST ... 327

    AN INVESTIGATION OF RELIABILITY IN MULTIDIMENSIONAL TEST STRUCTURES ... 328

    AN INVESTIGATION OF ITEM PARAMETER DRIFT WITH ITEM RESPONSE THEORY MODELS IN MIXTURE DISTRIBUTIONS ... 333

    CHANGES IN TRANSITION SYSTEM FROM BASIC EDUCATION TO SECONDARY EDUCATION: A HISTORICAL, DESCRIPTIVE AND CAUSAL VIEW ... 336

    THE USE OF PROPENSITY SCORE MATCHING IN EDUCATIONAL RESEARCH ... 340

    EVALUATION OF HUMAN RESOURCES MANAGEMENT CERTIFICATE PROGRAM ... 345

    THE EFFECT OF DIFFERENTIAL ITEM FUNCTIONING ON MINI AND MIDI ANCHOR TESTS ... 350

    A COMPARISON OF SCORING METHODS BASED ON CLASSICAL TEST THEORY AND ITEM RESPONSE THEORY IN INTELLIGENCE SCALES (THE CASE OF KBIT-2) ... 351

    A COMPARISON OF SCALE TRANSFORMATION METHODS FOR THE IRT-BASED ABILITY PARAMETER IN TEST EQUATING ... 354

    STANDARD SETTING IN EFL WRITING ASSESSMENT WITH MANY-FACET RASCH MEASUREMENT MODEL ... 355


    A COMPARISON OF ITEM/MODULE SELECTION METHODS: A COMPUTERIZED ADAPTIVE TEST AND COMPUTERIZED ADAPTIVE MULTISTAGE TEST APPLICATION ... 361

    A PROPOSED METHOD FOR DISTRACTOR ANALYSIS ... 362

    AN INVESTIGATION OF TIMSS 2015 SCIENCE TEST ITEMS IN TERMS OF DIFFERENTIAL ITEM FUNCTIONING BY LANGUAGE AND CULTURE ... 370

    INVESTIGATION OF THE INVARIANCE OF THE DINA MODEL PARAMETERS WITH REAL DATA ... 380

    A TEST EQUATING STUDY WITH COGNITIVE DIAGNOSTIC MODEL LATENT ABILITY CLASSES IN THE ANCHOR TEST DESIGN ... 384

    AN INVESTIGATION OF PARAMETER INVARIANCE IN CONSTRUCTED-RESPONSE AND MULTIPLE-CHOICE ITEMS: THE EFFECT OF ITEM FORMAT IN COGNITIVE DIAGNOSTIC MODELS ... 389

    EXAMINATION OF CDM ATTRIBUTE PREVALENCE FOR HIGHER-ORDER THINKING SKILLS ITEMS ... 394

    THE EFFECT OF ITEM FORMAT IN ACADEMIC PERFORMANCE MEASUREMENTS ON COGNITIVE AND METACOGNITIVE MONITORING SKILLS ... 399

    THE EFFECT OF SCHOOL, HOME AND STUDENT CHARACTERISTICS ON FOURTH-GRADE STUDENTS' TIMSS MATHEMATICS COGNITIVE SKILLS ... 403

    AN INVESTIGATION OF PISA 2015 SCIENTIFIC LITERACY ITEMS IN TERMS OF DIFFERENTIAL ITEM FUNCTIONING ... 404

    DETERMINING THE VARIABLES AFFECTING STUDENTS' TIMSS 2015 MATHEMATICS PROFICIENCY LEVELS WITH MULTIPLE CORRESPONDENCE ANALYSIS ... 411

    DEVELOPMENT AND EVALUATION OF A LONGITUDINAL INSTRUMENT FOR MEASURING UNIVERSITY STUDENTS' WRITING SKILLS ... 420

    A COMPARISON OF RESULTS OBTAINED FROM SHORT-ANSWER AND MULTIPLE-CHOICE ITEM TYPES IN A VERBAL ABILITY TEST ... 424

    A COMPARISON OF EXPLORATORY FACTOR ANALYSIS STUDIES IN THE TURKEY MEASUREMENT INSTRUMENTS INDEX WITH PARALLEL ANALYSIS RESULTS ... 426

    A COMPARISON OF RESULTS OBTAINED FROM THE KERNEL EQUATING METHOD IN NON-EQUIVALENT GROUPS ... 427

    USING THE LATENT CLASS DIF APPROACH TO DETERMINE SOURCES OF DIF: THE CASE OF PISA-2015 TURKEY ... 429

    AN INVESTIGATION OF THE PSYCHOMETRIC PROPERTIES OF A LIKERT-TYPE SCALE AND A METRIC SCALE ADMINISTERED IN DIFFERENT FORMATS AND SETTINGS ... 434

    AN INVESTIGATION OF DIFFERENTIAL ITEM FUNCTIONING WITH RESPECT TO ITEM PURIFICATION ... 435

    USE OF MIXED ITEM RESPONSE THEORY IN RATING SCALES ... 439

    A COMPARISON OF DIFFERENT Q-MATRIX SPECIFICATIONS IN COGNITIVE DIAGNOSTIC MODELS IN TERMS OF CLASSIFICATION ACCURACY AND CONSISTENCY ... 443


    SUMMARIES


    Paper ID: 51

    AN ANALYSIS ON THE MODEL FIT INDICES IN A SCALE CONSTRUCTION PROCESS

    Esin Tezbaşaran

    İstanbul University-Cerrahpaşa Hasan Ali Yücel Education Faculty


    Abstract: Many studies evaluate models against cutoff values of model fit indices without using a statistical approach. This study analyzes how model fit indices change when the number of factors and the assignment of observed variables to factors are varied, using the same observed variables and the same sample size, in a scale construction process. The research was carried out on real data from 1013 participants, collected with the Attitude Scale Towards Learning. Two kinds of structure were examined with model fit indices: the model specified from the results of exploratory factor analysis, and structures remodeled at random. The results revealed that, whatever the number of factors, the divergence among factors affects how the model fit indices change. Moreover, the remodeled scale structures, which have neither a theoretical nor an analytical basis, do not appear to fit particularly poorly.

    Keywords: Model fit indices, divergent validity

    INTRODUCTION

    Goodness-of-fit assessment refers to how well a model reproduces the data-generating process (Maydeu-Olivares, 2017). Several fit and misfit indices are commonly used for structural equation models (SEM) in the literature. In most traditional SEM applications, the operationalized theories take one of three forms (Mueller & Hancock, 2008): a measured variable path analysis (MVPA), a confirmatory factor analysis (CFA), or a latent variable path analysis (LVPA). The general purpose of statistical modeling is to provide an efficient and convenient way of describing the latent structure underlying a set of observed variables (Byrne, 1994). Therefore, a model structure should be assessed in terms of how efficiently and conveniently it describes that latent structure.

    In data-model fit assessment, the fit between the observed data and the hypothesized model is ideally operationalized as an evaluation of the degree of discrepancy between the true population covariance matrix and the matrix implied by the model's structural and nonstructural parameters. There are three broad categories of fit indices (Mueller & Hancock, 2008): absolute fit indices (the standardized root mean square residual (SRMR), the chi-square test, and the goodness-of-fit index (GFI)), parsimonious indices (the root mean square error of approximation (RMSEA) with its associated confidence interval, the Akaike information criterion (AIC), and the adjusted goodness-of-fit index (AGFI)), and incremental indices (the comparative fit index (CFI), the normed fit index (NFI), and the non-normed fit index (NNFI)). Mueller and Hancock (2008) give a table (Table 1) for assessing the degree of data-model misfit: various fit indices are obtained and then compared against established cutoff criteria (e.g., those empirically derived by Hu & Bentler, 1999, cited in Mueller & Hancock, 2008).

    Table 1. Target values for selected fit indices to retain a model, by class

    Index class      Index    Target value
    Incremental      NFI      ≥ 0.90
    Incremental      NNFI     ≥ 0.95
    Incremental      CFI      ≥ 0.95
    Absolute         GFI      ≥ 0.90
    Absolute         SRMR     ≤ 0.08
    Parsimonious     AGFI     ≥ 0.90
    Parsimonious     RMSEA    ≤ 0.06

    Joint criteria
    NNFI, CFI ≥ 0.96 and SRMR ≤ 0.09
    SRMR ≤ 0.09 and RMSEA ≤ 0.06

    Reference: Mueller & Hancock, 2008
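    To make the use of Table 1 concrete, the minimal Python sketch below checks a set of fit values against the single-index targets and the joint criteria. The direction of each target follows the usual convention: higher is better for NFI, NNFI, CFI, GFI and AGFI, and lower is better for SRMR and RMSEA. Reading the "NNFI, CFI" joint row as two separate rules, each paired with SRMR, is an assumption that follows Hu and Bentler's two-index strategy. The dictionary keys and the example fit values are hypothetical and are not results of this study.

```python
# Single-index targets from Table 1 (Mueller & Hancock, 2008).
# ("min", x): the index should be at least x; ("max", x): at most x.
SINGLE_CRITERIA = {
    "NFI": ("min", 0.90),
    "NNFI": ("min", 0.95),
    "CFI": ("min", 0.95),
    "GFI": ("min", 0.90),
    "AGFI": ("min", 0.90),
    "SRMR": ("max", 0.08),
    "RMSEA": ("max", 0.06),
}

# Joint criteria. Treating the "NNFI, CFI" row as two separate rules is an
# assumption (Hu and Bentler's two-index presentation).
JOINT_CRITERIA = {
    "NNFI >= 0.96 and SRMR <= 0.09": {"NNFI": ("min", 0.96), "SRMR": ("max", 0.09)},
    "CFI >= 0.96 and SRMR <= 0.09": {"CFI": ("min", 0.96), "SRMR": ("max", 0.09)},
    "SRMR <= 0.09 and RMSEA <= 0.06": {"SRMR": ("max", 0.09), "RMSEA": ("max", 0.06)},
}


def meets(value, rule):
    """Return True if `value` satisfies a ("min"/"max", target) rule."""
    kind, target = rule
    return value >= target if kind == "min" else value <= target


def check_single(fit):
    """Check every index present in `fit` against its Table 1 target."""
    return {name: meets(fit[name], rule)
            for name, rule in SINGLE_CRITERIA.items() if name in fit}


def check_joint(fit):
    """Evaluate each joint rule whose indices are all available in `fit`."""
    results = {}
    for label, rules in JOINT_CRITERIA.items():
        if all(name in fit for name in rules):
            results[label] = all(meets(fit[name], rule) for name, rule in rules.items())
    return results


# Hypothetical fit values for a single model.
fit = {"CFI": 0.97, "NNFI": 0.955, "SRMR": 0.05, "RMSEA": 0.07}
print(check_single(fit))
print(check_joint(fit))
```

    In this hypothetical case the model passes the NNFI cutoff of 0.95 but fails the stricter joint NNFI/SRMR rule. This kind of disagreement between criteria is why the indices are best read together rather than one at a time.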

    An absolute fit index directly evaluates how well an a priori model reproduces the sample data (Hu & Bentler, 1998). It does not depend on a comparison with another model, such as a saturated model or the observed data (Ullman, 2001). Parsimonious indices evaluate the overall discrepancy between the observed and implied covariance matrices while taking a model's complexity into account: fit improves as more parameters are added to the model, as long as those parameters make a useful contribution (Mueller & Hancock, 2008). Incremental fit indices, also called comparative fit indices, compare the hypothesized model with a baseline model. As described by Bentler and Bonett (as cited in Hu & Bentler, 1998), the most typically used baseline is a null model in which all observed variables are allowed to have variances but are uncorrelated with each other.
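    The chi-square-based indices named above can be computed directly from the test statistics of the hypothesized model and the null (independence) baseline. The following Python sketch implements the standard formulas for the RMSEA point estimate, CFI, NNFI (Tucker-Lewis index) and NFI. It assumes ML chi-square statistics and uses N - 1 in the RMSEA denominator; some programs use N, so printed values can differ slightly across software. All numbers in the usage example are hypothetical and do not come from this study.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index: model noncentrality relative to the baseline's."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def nnfi(chi2_m, df_m, chi2_b, df_b):
    """Non-normed fit index (Tucker-Lewis index); it is not bounded to [0, 1]."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


def nfi(chi2_m, chi2_b):
    """Normed fit index: proportional reduction in chi-square from the baseline."""
    return (chi2_b - chi2_m) / chi2_b


# Hypothetical statistics: model chi-square/df and null-model chi-square/df.
N = 1013
CHI2_M, DF_M = 250.0, 100
CHI2_B, DF_B = 3000.0, 120

print(f"RMSEA = {rmsea(CHI2_M, DF_M, N):.4f}")
print(f"CFI   = {cfi(CHI2_M, DF_M, CHI2_B, DF_B):.4f}")
print(f"NNFI  = {nnfi(CHI2_M, DF_M, CHI2_B, DF_B):.4f}")
print(f"NFI   = {nfi(CHI2_M, CHI2_B):.4f}")
```

    CFI, NNFI and NFI all depend on the baseline chi-square, which is why they are called incremental. RMSEA uses only the hypothesized model's statistic, its degrees of freedom and N, so it rewards parsimony through df.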

    Fit indices can therefore evaluate a model from different points of view, since each index reflects a different aspect of model fit (Hooper, Coughlan & Mullen, 2008). Kline (2005) recommends a minimal set of indices that should be reported and interpreted for the results of an SEM analysis: the model chi-square, RMSEA with its 90% confidence interval, CFI, and SRMR.
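The indices discussed above are simple functions of the model chi-square (and, for the incremental indices, of the baseline chi-square). The following is a minimal sketch using common textbook definitions. It assumes the N − 1 convention for RMSEA and the null model described above as the baseline. Software packages differ in such details, so the sketch is illustrative rather than a reproduction of the program the authors used. With N = 1013, the RMSEA function reproduces the RMSEA point estimates reported in Table 2 to three decimals. The paper does not report the baseline chi-square, so any values passed to `nfi`, `nnfi` or `cfi` here would be hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate, using the (N - 1) convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nfi(chi2_m: float, chi2_b: float) -> float:
    """Normed fit index: proportional reduction of the baseline chi-square."""
    return (chi2_b - chi2_m) / chi2_b


def nnfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Non-normed fit index (Tucker-Lewis index), based on chi-square/df ratios."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index based on noncentrality estimates (chi-square minus df)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0


if __name__ == "__main__":
    # Chi-square and df values from Table 2; N = 1013 respondents.
    for chi2, df in [(556.74, 197), (1093.13, 199), (1271.51, 208), (1200.05, 208)]:
        print(f"chi2={chi2:8.2f} df={df}  RMSEA={rmsea(chi2, df, 1013):.3f}")
```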

    In addition, the estimation method and sample size have an important effect on fit indices. If the assumptions of normality and of independence among factors and errors seem plausible, maximum likelihood (ML) and generalized least squares (GLS) estimation are better choices (Ullman, 2001). Moreover, the role of sample size should be taken into account when evaluating fit indices, because it determines how powerfully the discrepancy between the model and the data-generating mechanism can be detected (Maydeu-Olivares, 2017).
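The following sketch illustrates why sample size matters. It holds a minimum discrepancy F_min fixed and varies N. Under ML estimation the test statistic is T = (N − 1)F_min (assuming the N − 1 convention), so the chi-square grows linearly with N. The RMSEA point estimate changes only through its small-sample correction. F_min is back-computed from the EFA-based model in Table 2 (χ² = 556.74, df = 197, N = 1013). All other sample sizes are hypothetical and are not results of this study.

```python
import math


def chi_square_from_fmin(f_min: float, n: int) -> float:
    """ML test statistic T = (N - 1) * F_min."""
    return (n - 1) * f_min


def rmsea_from_fmin(f_min: float, df: int, n: int) -> float:
    """RMSEA = sqrt(max(F_min / df - 1 / (N - 1), 0))."""
    return math.sqrt(max(f_min / df - 1.0 / (n - 1), 0.0))


# Hypothetical scenario: keep the EFA model's minimum discrepancy fixed and vary N.
F_MIN = 556.74 / 1012  # back-computed from chi-square = 556.74 with N = 1013
DF = 197

for n in (200, 500, 1013, 5000):
    t = chi_square_from_fmin(F_MIN, n)
    print(f"N={n:5d}  chi2={t:8.2f} (df={DF})  RMSEA={rmsea_from_fmin(F_MIN, DF, n):.3f}")
```

At N = 200 the chi-square (about 109) is below its df, so the RMSEA estimate is 0. As N grows, the chi-square keeps increasing while the RMSEA estimate approaches sqrt(F_min / df), which is about 0.053.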

    This paper examines how model fit indices change, and how these changes can be explained, when the number of factors and the assignment of observed variables to factors are varied while the same observed variables are used in a scale construction process. The fit of each model is also evaluated against the cutoff criteria of the indices.

    METHODOLOGY

    This is a basic research study, since its aim is to draw inferences about the cutoff criteria of the most frequently used model fit indices by comparing differently constructed models for scale development. Basic research is designed to increase scientific understanding of phenomena (Best & Kahn, 2006) and to add to general knowledge with little or no concern for the immediate application of the knowledge produced (Bogdan & Biklen, 2007).

    The data were collected from 1,013 individuals who participated in the study voluntarily. The trial version of the 44-item Attitude Scale Towards Learning, developed by Fenercioğlu (2003), was used as the data collection tool.

    As the first step of the data analysis, exploratory factor analysis (EFA) was used to determine the scale construct. Four different scale constructs were then analyzed by CFA to obtain their model fit indices. The first construct was taken from the EFA result. For the second construct, the items and the number of factors revealed by the EFA were used, but the items were assigned to the factors randomly. For the third and fourth constructs, the same items from the EFA were randomly assigned to only two factors, in two separate random assignments. In this way, it is possible to see how the values of the fit indices change as the model structure changes, and how the fit indices evaluate different constructs built from the same observed variables.
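The paper does not describe how the items were randomly assigned to factors. One simple procedure, assumed here purely for illustration, is to shuffle the 22 retained items and deal them round-robin to the factors. This gives near-equal factor sizes: 5, 5, 4, 4, 4 for five factors and 11 and 11 for two factors, consistent with the equal item counts stated for the two-factor constructs. The function name, item labels and seeds are hypothetical.

```python
import random


def assign_randomly(items, n_factors, seed=None):
    """Shuffle items and deal them round-robin to n_factors factors (hypothetical procedure)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    factors = [[] for _ in range(n_factors)]
    for position, item in enumerate(shuffled):
        factors[position % n_factors].append(item)
    return factors


# Hypothetical labels for the 22 items retained after data cleaning.
items = [f"item{i:02d}" for i in range(1, 23)]

five_factor = assign_randomly(items, 5, seed=1)   # second construct
two_factor_a = assign_randomly(items, 2, seed=2)  # third construct
two_factor_b = assign_randomly(items, 2, seed=3)  # fourth construct
print([len(f) for f in five_factor], [len(f) for f in two_factor_a])
```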

    FINDINGS

    Data cleaning was carried out before the factor analyses. Little's MCAR test indicated that the data were missing completely at random (χ²(3018) = 3033.5, p > .05), and missing values were replaced with values computed by the expectation-maximization (EM) method. Four items did not meet the normality assumption and 18 items had no statistically significant correlation with the other items; these items were removed from the scale. The remaining 22 items were analyzed using principal component analysis (PCA) with oblique rotation to obtain the factorial structure of the scale, with 0.30 as the criterion level for factor loadings. Sampling adequacy for the analysis was acceptable (KMO = 0.886), and Bartlett's test of sphericity indicated that the correlations between items were sufficiently large for PCA (χ²(231) = 4426.88, p < .00001). Five components (referred to as factors in the CFA) had eigenvalues above Kaiser's criterion of 1 and together explained 46.88% of the variance. Cronbach's alpha for this scale was 0.73.
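Two of the statistics reported above have simple closed forms. The sketch below uses the standard formulas:

- Cronbach's alpha: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₓ).
- Bartlett's test of sphericity: χ² = −(n − 1 − (2p + 5)/6) · ln|R|, with df = p(p − 1)/2.

With p = 22 items, the df is 22 · 21 / 2 = 231, which matches the reported χ²(231). The sketch uses only the standard library, so the log-determinant is computed by plain Gaussian elimination. It assumes R is a positive-definite correlation matrix. It is illustrative, not the software the authors used.

```python
import math
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def log_determinant(matrix):
    """Log-determinant of a positive-definite matrix via Gaussian elimination (no pivoting)."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    total = 0.0
    for i in range(n):
        pivot = a[i][i]
        total += math.log(pivot)  # pivots stay positive for a positive-definite matrix
        for r in range(i + 1, n):
            factor = a[r][i] / pivot
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return total


def bartlett_sphericity(corr, n):
    """Bartlett's test of sphericity for a p x p correlation matrix from n cases.
    Returns (chi-square, df); the p-value would come from the chi-square distribution."""
    p = len(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * log_determinant(corr)
    df = p * (p - 1) // 2
    return chi2, df
```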

    For the second scale construct, the same 22 items were randomly assigned to the five factors. Using the same items, two further scales were then constructed at random, each with two factors having an equal number of items. In this way, four scale constructs were used in this study: one specified by the EFA and three formed at random. CFA was then conducted to determine the adequacy of the models. Maximum likelihood (ML) was used to estimate the population parameters, because ML can be robust when the observed variables do not meet the multivariate normality assumption (Joreskog & Yang, 1996).

    As expected, Table 2 shows that the construct obtained from the EFA has the best values on all fit indices. When the two five-factor constructs are compared, the meaningful difference lies in the correlations between factors. All 10 interfactor correlation coefficients are smaller in the construct obtained from the EFA than in the construct with items randomly assigned to the five factors (Figure 1).

    Figure 1. Path diagrams of the construct obtained from the EFA (a) and of the construct with items randomly assigned to the five factors (b)

    The evident difference between the two randomly formed two-factor constructs is the level of correlation between the factors (0.96 and 0.87; Table 2). The construct with the lower correlation between factors showed better fit indices than the construct with the higher correlation. Moreover, the fit indices of this construct are quite close to those of the randomly arranged five-factor construct (Table 2).

    Table 2. The model fit indices of different scale constructs

    Number of factors           Corr. between factors   NFI    NNFI   CFI    GFI    SRMR    χ² (df)          RMSEA (90% CI)          AGFI
    5 (EFA, oblique rotation)   Figure 1-a              0.95   0.96   0.97   0.95   0.044   556.74 (197)     0.042 (0.038, 0.047)    0.94
    5 (randomly assigned)       Figure 1-b              0.90   0.91   0.92   0.91   0.055   1093.13 (199)    0.067 (0.063, 0.071)    0.89
    2 (randomly assigned)       0.96                    0.89   0.90   0.91   0.90   0.056   1271.51 (208)    0.071 (0.067, 0.075)    0.88
    2 (randomly assigned)       0.87                    0.90   0.91   0.92   0.90   0.055   1200.05 (208)    0.069 (0.065, 0.072)    0.88
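The degrees of freedom in Table 2 can be checked by counting distinct variances and covariances against free parameters. This minimal sketch makes several assumptions, since the text does not report the model specification:

- Each item loads on exactly one factor.
- Factor variances are fixed to 1.
- All factor covariances are free.
- There are no correlated errors.

With 22 items there are 22 · 23 / 2 = 253 distinct moments. The count reproduces df = 199 for the randomly assigned five-factor model and df = 208 for the two-factor models. The EFA-based model's df = 197 would then correspond to two additional free parameters, which the text does not describe.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int, extra_free: int = 0) -> int:
    """df = distinct (co)variances minus free parameters for a simple-structure CFA."""
    moments = n_items * (n_items + 1) // 2          # unique variances and covariances
    loadings = n_items                              # each item loads on one factor
    error_variances = n_items                       # one unique variance per item
    factor_covs = n_factors * (n_factors - 1) // 2  # factor variances fixed to 1
    return moments - (loadings + error_variances + factor_covs + extra_free)


print(cfa_degrees_of_freedom(22, 5))  # randomly assigned five-factor model
print(cfa_degrees_of_freedom(22, 2))  # two-factor models
```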

    In terms of fit evaluation, the construct obtained from the EFA showed acceptable fit according to all indices. The constructs with items randomly assigned to factors reached an acceptable level only according to the absolute fit indices and were not acceptable according to the incremental and parsimonious indices.
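The evaluation above can be reproduced mechanically by comparing each row of Table 2 with the class-wise cutoffs of Table 1. The sketch reads NFI, NNFI, CFI, GFI and AGFI as lower bounds and SRMR and RMSEA as upper bounds. The dictionary keys are labels chosen here, and the two two-factor rows are identified by their interfactor correlations as read from Table 2.

```python
# Cutoffs from Table 1 (Mueller & Hancock, 2008), grouped by index class.
CUTOFFS = {
    "incremental": {"NFI": (">=", 0.90), "NNFI": (">=", 0.95), "CFI": (">=", 0.95)},
    "absolute": {"GFI": (">=", 0.90), "SRMR": ("<=", 0.08)},
    "parsimonious": {"AGFI": (">=", 0.90), "RMSEA": ("<=", 0.06)},
}

# Point estimates from Table 2.
MODELS = {
    "5 factors (EFA)": {"NFI": 0.95, "NNFI": 0.96, "CFI": 0.97, "GFI": 0.95,
                        "SRMR": 0.044, "RMSEA": 0.042, "AGFI": 0.94},
    "5 factors (random)": {"NFI": 0.90, "NNFI": 0.91, "CFI": 0.92, "GFI": 0.91,
                           "SRMR": 0.055, "RMSEA": 0.067, "AGFI": 0.89},
    "2 factors (r = 0.96)": {"NFI": 0.89, "NNFI": 0.90, "CFI": 0.91, "GFI": 0.90,
                             "SRMR": 0.056, "RMSEA": 0.071, "AGFI": 0.88},
    "2 factors (r = 0.87)": {"NFI": 0.90, "NNFI": 0.91, "CFI": 0.92, "GFI": 0.90,
                             "SRMR": 0.055, "RMSEA": 0.069, "AGFI": 0.88},
}


def passes(value, rule):
    """True if value meets a (direction, cutoff) rule."""
    op, cut = rule
    return value >= cut if op == ">=" else value <= cut


def evaluate(indices):
    """For each index class, report whether every index in that class meets its cutoff."""
    return {cls: all(passes(indices[name], rule) for name, rule in rules.items())
            for cls, rules in CUTOFFS.items()}


for label, indices in MODELS.items():
    print(label, evaluate(indices))
```

For each randomly assigned construct, only the absolute class passes. NFI reaches 0.90 in two of them, but NNFI and CFI fall short of 0.95, and AGFI and RMSEA miss their cutoffs. This matches the statement above.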

    CONCLUSIONS

    This research offers two basic results. The first is that model fit index results for causal models must be evaluated with great care. Particularly when testing hypotheses about the fit of a theoretical model, fit can be accepted on the basis of the indices' cutoff values even though the model is not adequate.

    In this research, the construct obtained from EFA, a mathematical procedure designed to identify factors, was expected to show better fit than the randomly designed constructs. The description of a construct provides information about the expected relationships among variables; factor analysis helps to determine whether this pattern of relationships does indeed exist (Murphy & Davidshofer, 2001). However, the fit indices of the randomly designed constructs carry the risk of indicating acceptable fit if more tolerant criteria are applied. Traditional goodness-of-fit indices are also unduly influenced by good fit in the measurement portions of the model and can yield values in the .90s even when the structural relations among the latent variables of the model are seriously misspecified (Mulaik, James, Alstine, Bennett, Lind, & Stilwell, 1989).

    All indices offer cutoff values that researchers can use to evaluate whether their models fit. However, using fixed cutoff criteria to determine model fit is problematic. There is no obvious external criterion variable, and there are many different fit indices, each with one or more proposed cutoff values (Barrett, 2006; Maydeu-Olivares, 2017). Maydeu-Olivares (2017) identifies three problems with current practice:

    - First, the approach is pre-statistical. There is no description of the parameter being estimated, nor proof that the index is an asymptotically unbiased estimate of the parameter of interest. Furthermore, the sampling variability of the statistic is ignored.
    - Second, partly because the approach is pre-statistical, no agreement is possible about which index or which cutoff value to use.
    - Third, and most importantly, inferences about the model parameters are made as if the model were correct when, in fact, it is misspecified.

    Most indices also showed somewhat inadequate sensitivity to misspecification. Many were additionally sensitive to sample size (e.g., chi-square, SRMR), the number of factors (e.g., Mc), the number of items (e.g., RMSEA), or some combination of these model conditions (Meade, 2008).

    The second result of this research is that, holding the number of factors constant, the values of the model fit indices (except GFI) worsen as the correlation between factors increases. Low correlation between factors is important discriminant evidence for construct validity. A validity coefficient showing little relationship between test scores and other variables with which scores on the test being construct-validated should not theoretically be correlated provides discriminant evidence of construct validity (Cohen, Montague, Nathanson & Swerdlik, 1988). According to the findings of this research, such discriminant (divergent) evidence shifts the values of model fit indices in favor of the model.

    Studies examining the divergence between factors and the convergence of factor loadings in the models are recommended. In addition, common cutoff values for model fit indices, together with their effect sizes, need to be determined.

    REFERENCES

    Barrett, P. (2006). Structural equation modelling: Adjudging model fit. Personality and Individual Differences, 42, 815-824. doi: 10.1016/j.paid.2006.09.018

    Best, J. W., & Kahn, J. V. (2006). Research in education. New York: Pearson Education, Inc.

    Bogdan, R. C., & Biklen, S. K. (2007). Qualitative research for education. New York: Pearson Education, Inc.

    Byrne, B. M. (1994). Structural equation modeling with EQS and EQS/Windows. California: Sage Publications, Inc.

    Cohen, R. J., Montague, P., Nathanson, L. S., & Swerdlik, M. E. (1988). Psychological testing. Mountain View: Mayfield Publishing Company.

    Fenercioğlu, E. (2003). Likert tipi ölçeklere madde seçmede kullanılan Pearson momentler çarpım korelasyon tekniği ile çok serili korelasyon tekniğinin incelenmesi [An examination of the Pearson product-moment correlation technique and the polyserial correlation technique used in item selection for Likert-type scales] (Master's thesis). Hacettepe Üniversitesi, Sosyal Bilimler Enstitüsü, Ankara, Turkey.

    Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modelling: Guidelines for determining model fit. The Electronic Journal of Business Research Methods, 6(1), 53-60. Retrieved from www.ejbrm.com

    Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3(4), 424-453.

    Joreskog, K. G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57-88). New Jersey: Lawrence Erlbaum Associates.

    Kline, R. B. (2005). Principles and practice of structural equation modeling. New York: The Guilford Press.

    Maydeu-Olivares, A. (2017). Assessing the size of model misfit in structural equation models. Psychometrika, 82(3), 533-558. doi: 10.1007/s11336-016-9552-7

    Meade, A. W. (2008, April). Power of AFIs to detect CFA model misfit. Paper presented at the twenty-third Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, USA. Retrieved from http://www4.ncsu.edu/~awmeade/Links/Papers/AFI(SIOP08).pdf

    Mueller, R. O., & Hancock, G. R. (2008). Best practices in structural equation modelling. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 488-510). doi: 10.4135/9781412995627.d38

    Mulaik, S. A., James, L. R., Alstine, J. V., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105(3), 430-445.

    Murphy, K. R., & Davidshofer, C. O. (2001). Psychological testing: Principles and applications. New Jersey: Prentice Hall Inc.

    Ullman, J. B. (2001). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell (Eds.), Using multivariate statistics (pp. 653-771). Boston: Allyn & Bacon.

    Paper ID: 54

    COMPARISON OF METHODS FOR DETERMINING INTER-RATER RELIABILITY IN COMPOSITION SCORING

    Neşe Öztürk Gübeş1, Mustafa Kılınç2, Ercan Yıldız3, Osman Aydın4, Gül Aslan5, Şeyma Özcan6 1,2Mehmet Akif Ersoy University, 3,4,5,6Ministry of National Education

    [email protected]

    Abstract: The aim of this study is to examine inter-rater reliability using different methods. Compositions written by 19 sixth-grade primary school students were scored independently by four teachers (raters) using a holistic scoring rubric. To determine inter-rater reliability, the following were employed:

    - percentage of agreement
    - Fleiss' kappa
    - Kendall's coefficient of concordance W
    - the mean Spearman rank-order correlation coefficient
    - Krippendorff's alpha
    - generalizability theory

    The findings showed that the percentage of exact agreement among the raters' scores was 21.1%, whereas the percentage of agreement within ±1 point was 89.5%. Fleiss' kappa indicated statistically significant but low agreement among the raters. Kendall's W again indicated a statistically significant but moderate relationship among the scores given by the raters, a finding also supported by the mean Spearman rank-order correlation coefficient computed between the raters' scores. Similarly, Krippendorff's alpha indicated weak agreement among the raters. The last method used to determine inter-rater reliability was generalizability theory, and the generalizability coefficient was calculated as 0.59. Examination of the variance components showed that the variance of the rater main effect was the smallest component, from which it was concluded that the raters scored similarly to one another. However, the residual variance estimated for the student-by-rater interaction was the largest variance component. It was concluded that this may be because the raters varied unsystematically across students, or because the students' composition scores were also affected by factors not included in the study.

    Keywords: reliability, inter-rater reliability, composition scoring
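The simplest agreement indices named in this abstract can be computed directly from a students-by-raters score table. This minimal sketch makes two assumptions:

- Scores are integers on the holistic rubric, and every student is scored by the same number of raters.
- "Agreement within ±1 point" means that the highest and lowest of a student's scores differ by at most one point. The abstract does not define this term.

Under that reading, the reported 21.1% and 89.5% correspond to 4 and 17 of the 19 compositions. The example ratings in the tests are made up and are not the study's data.

```python
from collections import Counter


def agreement_rates(ratings, tolerance=1):
    """ratings: one list of rater scores per student.
    Returns (exact, adjacent): the share of students on whom all raters give the same
    score, and the share whose highest and lowest scores differ by at most `tolerance`."""
    n = len(ratings)
    exact = sum(max(r) == min(r) for r in ratings) / n
    adjacent = sum(max(r) - min(r) <= tolerance for r in ratings) / n
    return exact, adjacent


def fleiss_kappa(ratings):
    """Fleiss' kappa, treating scores as nominal categories; every student has the same
    number of raters. Undefined (division by zero) if all ratings fall in one category."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    observed = 0.0
    for r in ratings:
        counts = Counter(r)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this student.
        observed += sum(c * (c - 1) for c in counts.values()) / (n_raters * (n_raters - 1))
    observed /= n_subjects
    # Chance agreement from the overall category proportions.
    expected = sum((t / (n_subjects * n_raters)) ** 2 for t in category_totals.values())
    return (observed - expected) / (1.0 - expected)
```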


    Paper ID: 55

    INVESTIGATION OF SUBGROUP INVARIANCE OF EQUATING FUNCTIONS

    Kübra Atalay Kabasakal1, Nermin Kıbrıslıoğlu Uysal1, Sakine Göçer Şahin2 1Hacettepe University, 2University of Wisconsin-Madison

    [email protected], [email protected], [email protected]

    Abstract: The purpose of this study is to examine the invariance of equating functions and the possible reasons behind a lack of invariance. To accomplish this, responses to the mathematics test of the TIMSS 2015 administration were used. In line with the purpose of the research, the USA and Turkey samples, comprising 16,295 students in total, were used. Gender and country were considered as subgroups in this study. There are 14 booklets in the TIMSS administration, and the booklets are linked to each other through common items. Within the scope of the study, the 14 booklets were equated with the equipercentile equating method in each subgroup, and the equated scores were compared with respect to REMSD. Moreover, DIF, distributional properties, and differences in test difficulty and discrimination between subgroups within each booklet were evaluated. The results indicated that the differences between subgroups in test difficulty and skewness, as well as the ratio of DIF items, were significantly related to REMSD values.

    Keywords: Subgroup invariance, equating functions, equipercentile equating, TIMSS 2015.

    INTRODUCTION

    Test equating is a method for making comparable the scores obtained from different tests that measure the same construct. Although there are different methods to equate or link test scores, they all rest on certain prerequisites (Dorans & Holland, 2000). Subgroup invariance of equating functions is one of the most important prerequisites of the equating process (Dorans, 2004). A lack of invariance means that students who have the same score on one form but belong to different subpopulations have different expected scores on the corresponding equated scale, which directly affects the validity of the equated scores. Hence, the purpose of this study is to examine equating invariance between subgroups and to evaluate possible reasons behind a lack of invariance. TIMSS 2015 mathematics scores were equated with the equipercentile equating method under a nonequivalent groups with common items design. The research questions are given below.
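The core equipercentile idea is to map each score on form X to the form Y score with the same percentile rank. The following is a minimal sketch of that step for two score distributions on the integer scales 0..K. It uses the usual discrete percentile-rank convention (half of the frequency at a score counts as "below") and no presmoothing. The study's nonequivalent groups with common items design requires an additional step, such as chained or frequency-estimation equipercentile equating through the common cluster, which is not shown. Function names are illustrative.

```python
def percentile_rank(freqs, x):
    """Percentile rank of integer score x: 100 * [F(x - 1) + f(x) / 2]."""
    total = sum(freqs)
    below = sum(freqs[:x]) / total
    return 100.0 * (below + 0.5 * freqs[x] / total)


def percentile_point(freqs, p):
    """Inverse of percentile_rank: the (continuized) score with percentile rank p."""
    total = sum(freqs)
    cumulative = 0.0
    for y, f in enumerate(freqs):
        previous = cumulative
        cumulative += f / total
        if 100.0 * cumulative > p:
            return (p / 100.0 - previous) / (cumulative - previous) + (y - 0.5)
    return len(freqs) - 0.5  # p at the very top of the scale


def equipercentile(freqs_x, freqs_y):
    """Equate every score on form X to the scale of form Y."""
    return [percentile_point(freqs_y, percentile_rank(freqs_x, x))
            for x in range(len(freqs_x))]
```

When the two distributions are identical, the function returns the identity mapping. This is a quick sanity check that the percentile-rank and inverse functions agree.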

    Research Questions

    (1) Do the equating functions obtained to equate mathematics scores in TIMSS 2015 ensure subgroup invariance?

    i. Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance between gender subgroups in the Turkey sample?

    ii. Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance between gender subgroups in the USA sample?

    iii. Do the equating functions obtained to equate mathematics scores in the TIMSS 2015 administration ensure subgroup invariance between country subgroups in the Turkey and USA samples?

    (2) Are the equating errors obtained to equate mathematics scores in the TIMSS 2015 administration related to DIF, distributional properties, and differences in test difficulty and discrimination between subgroups?

    i. Are the equating errors obtained between gender subgroups in the Turkey sample related to DIF, distributional properties, and differences in test difficulty and discrimination between subgroups?

    ii. Are the equating errors obtained between gender subgroups in the USA sample related to DIF, distributional properties, and differences in test difficulty and discrimination between subgroups?

    iii. Are the equating errors obtained between country subgroups in the USA and Turkey samples related to DIF, distributional properties, and differences in test difficulty and discrimination between subgroups?

    METHODOLOGY

    Within the scope of the study, the invariance of the equating functions used to equate the scores of country and gender subgroups was evaluated, and possible reasons for a lack of invariance were examined within the TIMSS 2015 administration. Hence, this is a descriptive study that aims to evaluate the validity of equated scores in the TIMSS 2015 administration.

    Sample

    Turkey and USA subgroups were taken into consideration. Hence, the sample of the study consists of 16,295 eighth-grade students who took part in the TIMSS 2015 administration in Turkey and the USA.

    Data Analysis

    To achieve its predefined goals, TIMSS requires more items than a single student can answer in one session. Hence, there are 14 item clusters in each domain. Each cluster consists of approximately 12-18 items from the number, algebra, geometry, and data and probability content domains. These clusters are distributed across 14 booklets so that each booklet contains two clusters, and the booklets are balanced with respect to content, number of items, and item format. The booklets are linked through common clusters so that a final score can be obtained for each student.

    In the data analysis, each booklet was equated to the following one through their common cluster. Data analysis was conducted in two steps. First, equating invariance between subgroups was evaluated using the equipercentile equating method. Then, possible reasons behind a lack of invariance were examined. Two subgroup variables were defined: country and gender. The country group consists of the Turkey and USA samples; the gender group consists of the gender subgroups within each country's sample. Equating invariance was evaluated in three phases. First, the whole group's scores were equated. Second, each subgroup's scores were equated. Last, the REMSD values were calculated; a REMSD value higher than 10% (0.10) was interpreted as indicating that equating invariance is not ensured (Dorans, 2004; Dorans & Holland, 2000). After the REMSD values were calculated, possible reasons behind a lack of equating invariance were evaluated by examining DIF, differences in distributional properties, and test difficulty and discrimination indices for each subgroup within each booklet. Data analysis was conducted in R.
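REMSD (root expected mean square difference) summarizes how far the subgroup equating functions deviate from the total-group function, averaged over the score distribution and expressed in standard deviation units of the target form. The sketch below follows the definition as we understand it from Dorans and Holland (2000):

REMSD = sqrt(Σⱼ wⱼ Σₓ P(x) [eⱼ(x) − e(x)]²) / σ_Y,

where e is the total-group equating function, eⱼ the function for subgroup j, wⱼ the subgroup weights, P(x) the total-group score distribution, and σ_Y the standard deviation of Y in the total group. Inputs are assumed to be already computed, for example with an equipercentile routine; the function name and argument layout are illustrative. The 0.10 threshold is the one used in this study.

```python
import math


def remsd(score_probs, equated_total, equated_by_group, weights, sd_y):
    """REMSD for a discrete score scale.

    score_probs      -- P(X = x) in the total population
    equated_total    -- e(x), the total-group equating function at each score x
    equated_by_group -- list of e_j(x), one list per subgroup
    weights          -- subgroup weights w_j (summing to 1)
    sd_y             -- standard deviation of Y in the total population
    """
    mean_sq = 0.0
    for w, e_j in zip(weights, equated_by_group):
        mean_sq += w * sum(p * (ej - e) ** 2
                           for p, ej, e in zip(score_probs, e_j, equated_total))
    return math.sqrt(mean_sq) / sd_y
```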

    FINDINGS

    The findings are reported for each research question. The results addressing the first research question are summarized in Table 1.

    Table 1. REMSD values for each equating relation within subgroups

    Booklets     TR-USA   TR-Gender   USA-Gender
    B1 - B2      0.10     0.05        0.07
    B2 - B3      0.11     0.15        0.06
    B3 - B4      0.12     0.12        0.07
    B4 - B5      0.09     0.12        0.08
    B5 - B6      0.06     0.08        0.11
    B6 - B7      0.11     0.06        0.10
    B7 - B8      0.12     0.06        0.04
    B8 - B9      0.07     0.19        0.06
    B9 - B10     0.11     0.13        0.03
    B10 - B11    0.04     0.10        0.11
    B11 - B12    0.10     0.16        0.17
    B12 - B13    0.08     0.06        0.06
    B13 - B14    0.08     0.12        0.07
    B14 - B1     0.11     0.11        0.15

    As shown in Table 1, the REMSD values are higher than 0.10 for seven equating functions between

    countries, for eight equating functions between gender subgroups in Turkey and for four equating functions

    between gender subgroups in USA. These results imply that equating invariance is not ensured for some

    booklets between the subgroups.

    To answer the second research question, DIF, distributional properties, and differences in test difficulty and discrimination indices between subgroups were examined. The numbers of DIF items in the common and whole tests, by favored subgroup, are summarized in Table 2.

    Table 2. Number of DIF items in the common and whole tests within each subgroup

                TR-USA              TR-Gender           USA-Gender
                Common    Whole     Common    Whole     Common    Whole
    Booklets    TR   USA  TR   USA  F    M    F    M    F    M    F    M
    B1 - B2     4    3    7    12   1    2    3    7    2    2    5    6
    B2 - B3     5    3    8    13   1    2    6    6    2    2    6    6
    B3 - B4     4    2    6    12   4    2    8    6    2    2    5    5
    B4 - B5     3    5    8    10   3    2    8    6    1    1    3    4
    B5 - B6     3    5    8    11   1    2    7    8    0    1    2    6
    B6 - B7     5    4    9    14   3    4    6    7    1    4    2    6
    B7 - B8     6    5    11   17   2    1    9    8    1    1    4    6
    B8 - B9     6    5    11   16   4    3    6    5    2    1    3    3
    B9 - B10    4    4    8    12   0    1    6    7    0    1    2    3
    B10 - B11   2    2    4    10   2    3    3    8    0    1    2    4
    B11 - B12   4    5    9    11   1    4    5    9    2    2    3    3
    B12 - B13   5    4    9    12   2    2    4    7    1    0    3    4
    B13 - B14   3    6    9    13   1    1    4    6    0    2    2    4
    B14 - B1    5    6    11   12   1    3    3    6    1    2    5    6

    Note: TR/USA = items favoring the Turkey/USA sample; F/M = items favoring female/male students.

    As shown in Table 2, there are many DIF items in each equating case between subgroups. For the country subgroups, the numbers of items favoring Turkey and the USA are fairly balanced in the common tests, but more items favor the USA in the whole tests. For the gender subgroups, although somewhat more items favor male students than female students, the numbers of DIF items favoring males and females are more balanced in both the common and whole tests, in Turkey as well as in the USA.

    The differences between subgroups in the skewness, kurtosis, test difficulty, and test discrimination index of each booklet pair are given in Table 3.

    Table 3. Differences between skewness, kurtosis, test difficulty and discrimination indexes

    TR-USA TR-Gender USA-Gender

    Booklets Skew Kurt Disc Diff Skew Kurt Disc Diff Skew Kurt Disc Diff

    B1 - B2 0.287 0.715 0.096 0.619 0.301 0.695 0.734 0.039 0.048 0.21 0.035 0.01

    B2 - B3 1.12 0.981 0.038 0.943 0.514 0.899 1.053 0.154 0.143 0.077 0.053 0.002

    B3 -B4 1.169 1.054 0.262 0.792 0.314 1.077 1.034 0.043 0.066 0.01 0.013 0.003

    B4 - B5 0.641 0.838 0.316 0.522 0.134 0.816 0.852 0.036 0.132 0.107 0.043 0.001

    B5 -B6 0.28 0.731 0.24 0.491 0.55 0.579 0.888 0.309 0.167 0.206 0.009 0.022

    B6 - B7 0.347 0.653 0.111 0.542 0.125 0.665 0.64 0.025 0.013 0.017 0.011 0.024

    B7 - B8 0.204 0.65 0.089 0.561 0.157 0.633 0.659 0.026 0.061 0.233 0.036 0.013

    B8 - B9 1.022 0.924 0.018 0.906 0.535 0.827 1.003 0.176 0.059 0.137 0.047 0.014


    B9 - B10 1.022 0.968 0.167 0.801 0.196 1.019 0.936 0.083 0.08 0.041 0.001 0.001

    B10 - B11 0.46 0.797 0.286 0.511 0.113 0.822 0.77 0.052 0.104 0.007 0.002 0.005

    B11 - B12 0.339 0.665 0.111 0.554 0.764 0.492 0.858 0.366 0.178 0.095 0.003 0.044

    B12 - B13 0.507 0.789 0.31 0.479 0.227 0.78 0.79 0.01 0.125 0.137 0.054 0.021

    B13 - B14 0.294 0.689 0.189 0.5 0.194 0.75 0.638 0.112 0.032 0.031 0.038 0.025

    B14 - B1 0.356 0.698 0.04 0.658 0.126 0.782 0.617 0.165 0.159 0.01 0.002 0.05

    Note: Skew=Skewness; Kurt=Kurtosis; Disc=Discrimination index; Diff= Difficulty

    To determine which test features are significantly related to equating error, Spearman's rho correlations were calculated between the REMSD values and the DIF ratio and the differences in skewness, kurtosis, test difficulty, and test discrimination index for each group. In the Turkey-USA samples, the results indicated significant relationships between REMSD and both the DIF ratio in the whole test (ρ = .55, p = .041) and the skewness difference (ρ = .56, p = .038). Between gender subgroups in the Turkey sample, REMSD values were significantly related to the skewness difference (ρ = .66, p = .010), whereas between gender subgroups in the USA sample, REMSD values were significantly related to the test difficulty difference (ρ = .54, p = .045).
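Spearman's rho is the Pearson correlation computed on ranks, with tied values receiving the average of the ranks they occupy. Ties matter here because several REMSD values in Table 1 are equal. The following is a minimal, standard-library sketch. It computes the coefficient only; the p-values reported above would additionally require a significance test (for example, a t approximation or an exact permutation distribution), which is not shown.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))
```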

    CONCLUSIONS

    The results of the study indicated that equating invariance is not ensured for some booklets. This implies that students may obtain lower equated scores merely because they belong to a particular subgroup (Huggins, 2014). Hence, evaluating scores within certain subgroups may lead to misinterpretations, because the equated scores are not invariant.

    The results also imply that REMSD values are related to different test features in different groups. In the Turkey and USA samples, invariance between country subgroups is related to the DIF ratio, whereas invariance between gender subgroups is not, in either Turkey or the USA. The ratio of DIF items is higher for country subgroups than for gender subgroups. This may suggest that, from the perspective of equating invariance, the equipercentile equating method is robust to DIF up to a certain ratio of DIF items. The skewness difference between subgroups is also a significant variable related to REMSD for gender subgroups in Turkey and for country subgroups, but not for gender subgroups in the USA. A possible reason is that the skewness differences between gender subgroups in the USA are very small.

    The results of the study are limited to the equipercentile equating method and the defined subgroups. Future studies may focus on how different equating methods behave for certain subgroups. Moreover, simulation studies may be conducted to gain a broader perspective on the possible reasons for a lack of invariance.

    REFERENCES

    Dorans, N. J. (2004). Using subpopulation invariance to assess test score equity. Journal of Educational Measurement, 41, 43-68.

    Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37, 281-306.

    Huggins, A. C. (2014). The effect of differential item functioning in anchor items on population invariance of equating. Educational and Psychological Measurement, 74(4), 627-658.


    Paper ID: 58

    THE COMPARISON OF ITEM AND ABILITY ESTIMATIONS

    CALCULATED FROM THE PARAMETRIC AND NON-PARAMETRIC

    ITEM RESPONSE THEORY ACCORDING TO THE SEVERAL FACTORS

    Ezgi Mor Dirlik1, Nizamettin Koç2

    1Kastamonu University, 2Ankara University (Retired)

    [email protected], [email protected]

    Abstract: This study aimed to compare the item and ability parameters estimated under two different approaches of Item Response Theory, parametric and non-parametric, according to the factors of test length, sample size, and item psychometric quality. The study was designed as basic research, and its data were obtained from the Trends in International Mathematics and Science Study (TIMSS) 2011 administration. From the main data set, 16 different data sets were formed according to the research questions and the conditions of the factors. The results showed that all of the item parameters estimated under the parametric and non-parametric models, from data sets with different numbers of items and sample sizes, were significantly and highly correlated. The ability parameters estimated by the two approaches were likewise significantly and highly correlated across all sample-size and test-length conditions. In sum, both approaches yielded highly and significantly correlated item and ability parameters. This finding suggests that, when the assumptions of parametric item response theory models are not adequately met, the non-parametric alternative models can be used under the conditions studied in this research.

    Keywords: Parametric Item Response Theory, Non-Parametric Item Response Theory, Mokken Scaling, Sample Size,

    Test Length

    SUMMARY

    Introduction

    Measuring abstract constructs has been one of the most difficult problems since the beginning of the social sciences. There have been many attempts to define and measure properties such as achievement, motivation, intelligence, attitudes and personality. Because of the complex structure of these constructs, it is accepted that such measurements always contain a fair amount of error. Measurement theories have been developed to minimize this error. The first and best-known measurement theory is Classical Test Theory (CTT), which has been used in many measurement applications thanks to its simple structure. Its assumptions can be met easily by almost every kind of data set, and this is the main reason for its popularity. Despite its popularity, CTT has important drawbacks that limit the precision of measurement. CTT yields only a single error value, the standard error of measurement, although the amount of error is not the same for every person in a sample. It is also difficult to adapt a test to the examinee's level under this theory. To address these drawbacks, another measurement theory, Item Response Theory (IRT), was developed. Also known as latent trait theory, IRT provides deeper information not only about examinees' performance but also about the test items and the constructs. It has made it possible to build adaptive tests and to equate tests so that students' performance can be compared. However, to obtain these advantages, the data sets must meet certain requirements: a large sample size, normal distribution, local independence and unidimensionality. Especially in classroom assessment, it is generally not possible to reach large samples such as 1000 or 2000 examinees or to administer as many items as in high-stakes tests. For these reasons, the advantages of IRT cannot be exploited for small samples that are not normally distributed. Because these assumptions are hard to satisfy, new approaches have emerged that are based on IRT but differ in their requirements. One of these new approaches


    is Nonparametric Item Response Theory, which lies between CTT and Parametric Item Response Theory. Like CTT, it can be applied to short tests and small samples, but like IRT it yields invariant item and ability parameters. Nonparametric item response theory is a relatively new approach and has been used frequently in the health sciences. Because the field is new, the literature contains only a limited number of studies, most of which focus on theoretical perspectives. This study aims to compare the parameters estimated under parametric and nonparametric IRT.

    Method

    This study aimed to compare the item and ability parameters estimated under two approaches to Item Response Theory, parametric and non-parametric, according to the factors of test length, sample size and item psychometric quality. The study was designed as basic research, and its data were obtained from the Trends in International Mathematics and Science Study (TIMSS) 2011 administration. The eighth-grade mathematics booklet found to be the most nearly unidimensional was used, and data from the first 20 countries were gathered. The final data set consisted of 7254 students from different countries. From the main data set, 16 different data sets were formed according to the research questions and the factor conditions. For the sample size factor, three data sets of 500, 1000 and 3000 students were prepared, and the sample size conditions were crossed with the test length conditions of five, 15 and 25 items. From these data sets, item and ability parameters were estimated with corresponding parametric and non-parametric item response models, and the results were compared using the relevant correlation coefficients. For the last factor, item psychometric quality, four data sets were composed, and ability parameters were estimated from them and compared. Lastly, the reliability of the data sets was investigated using test information functions for the parametric item response theory models and by calculating the Cronbach's alpha, Lambda2, LCRC and MS coefficients for the non-parametric item response theory models.
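Two of the non-parametric reliability coefficients named above have closed forms that can be computed directly from a persons × items score matrix. The Python/NumPy sketch below is an illustration, not the software used in the study: it implements Cronbach's alpha and Guttman's Lambda2 from the item covariance matrix. The LCRC and MS coefficients require additional modelling and are not reproduced here; the toy data are hypothetical.

```python
import numpy as np

def _item_covariance(scores):
    """Sample covariance matrix of the items (columns) of a persons x items matrix."""
    X = np.asarray(scores, dtype=float)
    if X.ndim != 2 or X.shape[1] < 2:
        raise ValueError("expected a persons x items matrix with at least two items")
    return np.cov(X, rowvar=False)

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    C = _item_covariance(scores)
    k = C.shape[0]
    total_var = C.sum()  # variance of the total score
    return float(k / (k - 1) * (1.0 - np.trace(C) / total_var))

def guttman_lambda2(scores):
    """lambda2 = lambda1 + sqrt(k/(k-1) * sum of squared off-diagonal covariances) / total variance."""
    C = _item_covariance(scores)
    k = C.shape[0]
    total_var = C.sum()
    lambda1 = 1.0 - np.trace(C) / total_var
    off_diag_sq = (C ** 2).sum() - (np.diag(C) ** 2).sum()
    return float(lambda1 + np.sqrt(k / (k - 1) * off_diag_sq) / total_var)

# Toy 0/1 data: 6 persons x 2 items.
toy = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]]
print(cronbach_alpha(toy), guttman_lambda2(toy))
```

For two items with positive covariance the two coefficients coincide; in general Lambda2 is never smaller than alpha.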

    Results

    The results showed that all item parameters estimated with the parametric and non-parametric models from data sets with different numbers of items and sample sizes were significantly and highly correlated. For ability parameters, however, the estimates obtained under the parametric models were not consistent, especially for the data sets with five and 15 items, whereas the abilities estimated from the 25-item data sets were significantly and highly correlated. For the non-parametric models, the estimated ability parameters were significantly and highly correlated under all sample size and test length conditions. Additionally, when the correlations between the abilities estimated by the parametric and non-parametric models were examined, the parametric models agreed with the non-parametric models only when the test had 25 items or the sample size was 3000 or larger.
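The kind of parametric versus non-parametric ability comparison described above can be sketched as follows. The sketch assumes, as is common in Mokken scale analysis, that the non-parametric ability estimate is the unweighted sum score, and it uses expected a posteriori (EAP) estimates under a 2PL model with known, simulated item parameters as the parametric counterpart. The sample size, test length, parameter distributions and quadrature grid are hypothetical choices; the study's actual estimation software and settings are not reproduced.

```python
import numpy as np

def simulate_2pl(n_persons, n_items, rng):
    """Simulate 0/1 responses under a 2PL model with hypothetical parameter distributions."""
    theta = rng.normal(0.0, 1.0, n_persons)
    a = rng.uniform(0.8, 2.0, n_items)
    b = rng.normal(0.0, 1.0, n_items)
    prob = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    responses = (rng.random(prob.shape) < prob).astype(float)
    return responses, a, b

def eap_2pl(responses, a, b, n_nodes=81):
    """EAP ability estimates with a standard normal prior on a fixed quadrature grid."""
    grid = np.linspace(-4.0, 4.0, n_nodes)
    prior = np.exp(-0.5 * grid ** 2)  # unnormalised N(0, 1) density
    prob = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))  # nodes x items
    # Log-likelihood of each response pattern at each node: persons x nodes.
    loglik = responses @ np.log(prob).T + (1.0 - responses) @ np.log(1.0 - prob).T
    loglik -= loglik.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(loglik) * prior
    return (post @ grid) / post.sum(axis=1)

rng = np.random.default_rng(59)
X, a, b = simulate_2pl(1000, 25, rng)
sum_scores = X.sum(axis=1)       # non-parametric ordering of examinees
theta_eap = eap_2pl(X, a, b)     # parametric ability estimates
r = np.corrcoef(sum_scores, theta_eap)[0, 1]
print(f"Pearson r(sum score, EAP theta) = {r:.3f}")
```

Rank-based coefficients could be substituted for the Pearson correlation used here; which coefficients the study used is stated only as "relevant correlation coefficients".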

    Discussion and Conclusion

    Another factor whose effect on ability parameters was examined in this study was the level of the item discrimination and item difficulty parameters. Under this factor, the abilities estimated from the four data sets grouped by low and high item difficulty and item discrimination were highly and significantly correlated. Accordingly, both approaches produced highly and significantly correlated item and ability parameters. This finding suggests that when the assumptions of parametric item response theory models are not adequately met, the non-parametric alternatives can be used under the conditions studied in this research.


    Paper ID: 59

    A COMPARISON OF KERNEL EQUATING METHODS UNDER NEAT DESIGN

    Çiğdem Akın Arıkan

    Ordu University [email protected]

    Abstract: In this study, the equated scores obtained from the Kernel equating (KE) methods, namely chained and post-stratification equipercentile and linear equating, were compared under the NEAT design. TIMSS Science data were used. The study sample consisted of 865 eighth-grade examinees who were given booklets 1 and 14 during the TIMSS administration in Turkey. Booklet 1 contained 39 items and booklet 14 contained 38 items. First, descriptive statistics were calculated; then the two booklets were equated under the NEAT design using Kernel chained and Kernel post-stratification equipercentile and linear equating methods. Second, the equating methods were evaluated against the criteria DTM, PRE, SEE, SEED and RMSD. The results of the equipercentile and linear equating methods were consistent with each other, except in the high range of the score scale. Finally, the PRE values show that the KE equipercentile methods better match the discrete target distribution Y, and the distribution of SEED reveals that the KE equipercentile and linear methods are not significantly different from each other according to the DTM.

    Keywords: Equating, Kernel equating, DTM, SEE, SEED, RMSD.
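Of the evaluation criteria listed above, the percent relative error (PRE) has a simple closed form in the kernel equating framework: it compares the p-th moment of the equated X scores with the p-th moment of the target Y distribution, PRE(p) = 100 · (μ_p(e_Y(X)) − μ_p(Y)) / μ_p(Y). The Python sketch below computes PRE for the first few moments from already-equated scores and score probabilities. It does not perform the kernel equating itself, and the toy inputs are hypothetical.

```python
import numpy as np

def percent_relative_error(equated_x, r, y_scores, s, n_moments=10):
    """PRE(p) for p = 1..n_moments.

    equated_x: e_Y(x_j), the equated value of each X score point
    r:         probability of each X score point in the target population
    y_scores:  the Y score points y_k
    s:         probability of each Y score point in the target population
    """
    equated_x = np.asarray(equated_x, dtype=float)
    r = np.asarray(r, dtype=float)
    y_scores = np.asarray(y_scores, dtype=float)
    s = np.asarray(s, dtype=float)
    if not (np.isclose(r.sum(), 1.0) and np.isclose(s.sum(), 1.0)):
        raise ValueError("r and s must each sum to 1")
    pre = np.empty(n_moments)
    for p in range(1, n_moments + 1):
        mu_equated = np.sum(equated_x ** p * r)  # p-th moment of equated scores
        mu_target = np.sum(y_scores ** p * s)    # p-th moment of Y
        pre[p - 1] = 100.0 * (mu_equated - mu_target) / mu_target
    return pre

# Toy example: Y scores 0..2 with equal probabilities, and an "equating" that
# places every X score point one point above the corresponding Y score point.
y = np.array([0.0, 1.0, 2.0])
probs = np.full(3, 1.0 / 3.0)
pre = percent_relative_error(y + 1.0, probs, y, probs, n_moments=3)
print(np.round(pre, 2))
```

Values closer to zero indicate that the equated score distribution reproduces the target distribution's moments more closely.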


    Paper ID: 64

    EXAMINATION OF ITEM RESPONSE THEORY SIMULATIONS IN TERMS OF CONSTRUCT VALIDITY AND RELIABILITY

    Nuri Doğan¹, Abdullah Faruk Kılıç², İbrahim Uysal³

    ¹Hacettepe University, ²Adıyaman University, ³Abant İzzet Baysal University

    [email protected], [email protected], [email protected]

    Abstract: The aim of this study is to examine, in terms of validity and reliability, the results that simulation studies based on Item Response Theory (IRT) obtain from different packages and programs. To this end, data sets generated with the R packages mirt, psych, irtoys and catIrt and with WinGen, a program frequently used in IRT-based simulation studies, were compared in terms of the KR20 reliability coefficient and of the explained variance ratio, the average factor loading and the dimensionality obtained from exploratory factor analysis (EFA). The study was conducted as a Monte Carlo simulation. The simulation conditions were the number of items (20, 30 and 40), the IRT model (Rasch, two-parameter logistic model [2PLM] and three-parameter logistic model [3PLM]), the data generation program (WinGen 3.1 and the R packages irtoys, mirt, psych and catIrt) and the sample size (500, 1000 and 2000), giving 135 simulation conditions in total, with 1000 replications for each condition. The results showed that the data sets generated under all IRT models were generally unidimensional; that the explained variance ratio stayed below 0.30 for the Rasch model; that for the 2PLM all programs yielded explained variance above 0.30 in all but one condition; that for the 3PLM the explained variance ratio was generally below 0.30; and that the average factor loadings exceeded 0.46 for all models and programs. The reliability coefficients ranged from 0.75 to 0.93 across all conditions. Based on these findings, the R packages can be said to yield better results in terms of explained variance ratio, average factor loading and reliability coefficient in most of the simulation conditions for IRT-based data generation. It is therefore recommended that R packages be preferred in IRT-based simulation studies.

    Keywords: Simulation, Item Response Theory, Exploratory Factor Analysis, explained variance, reliability.

    INTRODUCTION

    Simulation studies are widely used in education and psychology. Feinberg and Rubright (2016) report that 60% of the studies published in Applied Psychological Measurement and the Journal of Educational Measurement between 2008 and 2012 were simulation studies. Simulation studies help researchers find answers about data analysis, statistical power, and the methods to be used to obtain accurate results from empirical studies (Hallgren, 2013). This is why they are used so frequently in research.

    Item Response Theory (IRT) is one of the theories most frequently used in education and psychology, and simulation studies on IRT are accordingly common. Many programs and programming languages are available for conducting IRT-based Monte Carlo simulation studies. Programs such as IRTPRO (Cai, Thissen and du Toit, 2011), flexMIRT (Cai, 2017), BMIRT (Yao, 2003) and Mplus (Muthén and Muthén, 2012) are commercial, whereas programs such as WinGen (Han, 2007) and programming languages such as R (R Core Team, 2017) are free.

    Researchers frequently conduct Monte Carlo simulation studies with these programs and make recommendations based on their findings. However, the data sets produced by these programs are rarely examined with respect to IRT assumptions and psychometric properties. Because they are free and used in many IRT studies, the present study focuses on the construct validity and reliability of IRT-based data sets generated with the WinGen 3.1 program and four R packages. In this way, the limits of the findings obtained from such studies can be drawn more clearly.


    The study sought to answer the question "What are the construct validity and reliability of data sets generated by programs frequently used in IRT-based Monte Carlo simulation studies?" and, in relation to this problem, the following sub-questions:

    1. What is the dimensionality of the data sets generated by WinGen 3.1, irtoys, mirt, psych and catIrt, as used in Monte Carlo simulations?

    2. What is the explained variance ratio of the data sets generated by WinGen 3.1, irtoys, mirt, psych and catIrt, as used in Monte Carlo simulations?

    3. What is the average factor loading of the data sets generated by WinGen 3.1, irtoys, mirt, psych and catIrt, as used in Monte Carlo simulations?

    4. What are the reliability coefficients of the data sets generated by WinGen 3.1, irtoys, mirt, psych and catIrt, as used in Monte Carlo simulations?

    METHOD

    Because the study examines the construct validity and reliability of data sets generated in Monte Carlo simulation studies based on Item Response Theory (IRT), it was itself conducted as a Monte Carlo simulation study. Simulation studies are widely used in education and psychology (Feinberg and Rubright, 2016; Harwell, Stone, Hsu and Kirisci, 1996) and allow possible scenarios to be evaluated (Gilbert, 1999).

    The simulation conditions were the number of items (20, 30 and 40), the IRT model (Rasch, two-parameter logistic model [2PLM] and three-parameter logistic model [3PLM]), the data generation program (WinGen 3.1 (Han, 2007) and the R packages irtoys (Partchev, 2016), mirt (Chalmers, 2012), psych (Revelle, 2018) and catIrt (Nydick, 2015)) and the sample size (500, 1000 and 2000), giving 3 × 3 × 5 × 3 = 135 simulation conditions, with 1000 replications per cell. The replications produced 135,000 data sets. The study was based on unidimensional Item Response Theory, so all data sets were generated to be unidimensional. In addition, only dichotomously scored data were examined; polytomous data were outside the scope of this study.
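The fully crossed design described above can be enumerated directly, which also confirms the condition and data set counts. The Python sketch below is only a bookkeeping illustration; the variable names are hypothetical.

```python
from itertools import product

# Factor levels as stated in the method section.
N_ITEMS = (20, 30, 40)
MODELS = ("Rasch", "2PLM", "3PLM")
PROGRAMS = ("WinGen 3.1", "irtoys", "mirt", "psych", "catIrt")
SAMPLE_SIZES = (500, 1000, 2000)
REPLICATIONS = 1000

# Every combination of the four factors is one simulation condition (cell).
conditions = list(product(N_ITEMS, MODELS, PROGRAMS, SAMPLE_SIZES))
total_data_sets = len(conditions) * REPLICATIONS
print(len(conditions), total_data_sets)  # 135 135000
```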

    In IRT-based simulation studies, the distributions of the item parameters differ from study to study. This study used the item parameter distributions employed by Bulut and Sünbül (2017). For the 3PLM, the a parameter was generated from a lognormal distribution with mean 0.3 and standard deviation 0.2, the b parameter from a normal distribution with mean 0 and standard deviation 1, and the c parameter from a beta distribution with shape parameters a = 20 and b = 90. For the 2PLM, the c parameter was set to 0; for the Rasch model, the a parameter was set to 1 and the c parameter to 0. The ability parameter was generated from a normal distribution with mean 0 and standard deviation 1.
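The generating distributions above can be sketched in Python with NumPy. This is not the code of WinGen or of any of the R packages compared in the study. It assumes that the lognormal mean 0.3 and standard deviation 0.2 refer to the log scale (the convention of R's rlnorm and NumPy's lognormal), and that the logistic model is used without the 1.7 scaling constant. The function names are hypothetical.

```python
import numpy as np

def generate_item_parameters(n_items, model, rng):
    """Draw (a, b, c) for n_items under "Rasch", "2PLM" or "3PLM"."""
    if model == "3PLM":
        a = rng.lognormal(mean=0.3, sigma=0.2, size=n_items)  # log-scale mean and SD (assumed)
        c = rng.beta(20, 90, size=n_items)
    elif model == "2PLM":
        a = rng.lognormal(mean=0.3, sigma=0.2, size=n_items)
        c = np.zeros(n_items)
    elif model == "Rasch":
        a = np.ones(n_items)
        c = np.zeros(n_items)
    else:
        raise ValueError(f"unknown model: {model}")
    b = rng.normal(0.0, 1.0, size=n_items)
    return a, b, c

def simulate_responses(theta, a, b, c, rng):
    """Dichotomous responses: P = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(np.int8)

rng = np.random.default_rng(2018)
theta = rng.normal(0.0, 1.0, size=500)  # abilities ~ N(0, 1)
a, b, c = generate_item_parameters(20, "3PLM", rng)
X = simulate_responses(theta, a, b, c, rng)
print(X.shape, round(float(X.mean()), 3))
```

One call of `simulate_responses` corresponds to one replication of one cell; looping it 1000 times per condition would mirror the replication scheme described above.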

    The dimensionality of the data sets was determined with Minimum Average Partial (MAP) analysis. Exploratory factor analysis (EFA) was conducted to obtain the explained variance ratio and the average factor loading; the minimum residual (minres) extraction method was used, and the analyses were run on the tetrachoric correlation matrix. Because all data sets were generated as unidimensional, the average factor loading was computed as the mean of all items' loadings on the first factor. This was done for each of the 1000 replications in each condition, and the mean over the 1000 replications is reported. Reliability was assessed with the KR20 coefficient, averaged over the 1000 data sets generated for each condition. All analyses were carried out in R with the psych package (Revelle, 2018).
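The KR20 step can be illustrated with a short Python sketch: KR20 = k/(k − 1) · (1 − Σ p_i q_i / σ²_X), where p_i is the proportion of examinees answering item i correctly, q_i = 1 − p_i, and σ²_X is the variance of the total scores. The study computed this with the psych package in R; the NumPy version below is only an illustration. It averages KR20 over replications of Rasch-generated data with hypothetical settings and does not reproduce the MAP or minres factor analysis steps.

```python
import numpy as np

def kr20(responses):
    """KR20 for a persons x items 0/1 matrix."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                 # item proportions correct
    total_var = X.sum(axis=1).var()    # ddof=0, consistent with p * q
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return float(k / (k - 1) * (1.0 - np.sum(p * (1.0 - p)) / total_var))

def mean_kr20_over_replications(n_persons, n_items, n_reps, seed=0):
    """Average KR20 over replications of Rasch data (theta, b ~ N(0, 1))."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_reps):
        theta = rng.normal(0.0, 1.0, n_persons)
        b = rng.normal(0.0, 1.0, n_items)
        prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))
        X = (rng.random(prob.shape) < prob).astype(int)
        values.append(kr20(X))
    return float(np.mean(values))

print(round(mean_kr20_over_replications(500, 20, 20, seed=1), 3))
```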

    RESULTS

    The findings of the study are presented in Table 1 and interpreted according to the sub-questions.

    Table 1. Explained Variance Ratio, Average Factor Loading and Reliability of the Data Sets Generated by the Programs

    EVR = explained variance ratio; AFL = average factor loading; Rel. = reliability (KR20).

                            Rasch Model            2PLM                   3PLM
    Items Sample Program    EVR    AFL    Rel.     EVR    AFL    Rel.     EVR    AFL    Rel.

    20 500 mirt 0.261 0.508 0.788 0.405 0.629 0.862 0.304 0.544 0.815

    psych 0.265 0.512 0.790 0.403 0.627 0.850 0.286 0.524 0.792

    WinGen 0.246 0.493 0.765 0.299 0.542 0.805 0.267 0.503 0.753

    irtoys 0.263 0.509 0.788 0.399 0.624 0.847 0.289 0.528 0.794

    catIrt 0.263 0.510 0.789 0.399 0.624 0.847 0.289 0.528 0.794

    20 1000 mirt 0.261 0.510 0.790 0.400 0.626 0.862 0.293 0.535 0.803

    psych 0.265 0.513 0.793 0.378 0.608 0.843 0.232 0.470 0.753

    WinGen 0.246 0.494 0.766 0.350 0.588 0.837 0.281 0.519 0.770

    irtoys 0.261 0.510 0.790 0.399 0.625 0.853 0.233 0.470 0.754

    catIrt 0.261 0.510 0.790 0.398 0.625 0.853 0.232 0.469 0.753

    20 2000 mirt 0.243 0.492 0.770 0.374 0.608 0.848 0.290 0.533 0.805

    psych 0.261 0.510 0.785 0.364 0.601 0.829 0.243 0.487 0.771

    WinGen 0.255 0.504 0.775 0.343 0.582 0.834 0.272 0.512 0.766

    irtoys 0.243 0.492 0.769 0.366 0.602 0.830 0.249 0.493 0.777

    catIrt 0.243 0.492 0.769 0.366 0.602 0.830 0.248 0.492 0.776

    30 500 mirt 0.263 0.509 0.842 0.393 0.622 0.900 0.252 0.495 0.835

    psych 0.261 0.507 0.841 0.369 0.602 0.884 0.241 0.474 0.813

    WinGen 0.245 0.492 0.834 0.388 0.617 0.893 0.251 0.493 0.827

    irtoys 0.264 0.511 0.844 0.387 0.617 0.891 0.247 0.478 0.817

    catIrt 0.265 0.511 0.844 0.388 0.618 0.891 0.246 0.477 0.816

    30 1000 mirt 0.269 0.517 0.848 0.367 0.602 0.891 0.258 0.490 0.829

    psych 0.260 0.508 0.842 0.364 0.600 0.881 0.232 0.469 0.803

    WinGen 0.251 0.499 0.839 0.387 0.616 0.893 0.255 0.498 0.831

    irtoys 0.270 0.518 0.848 0.364 0.599 0.881 0.228 0.465 0.799

    cat