Investigating the Language of Engineering Education
by
Chirag Variawa
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Mechanical and Industrial Engineering University of Toronto
© Copyright by Chirag Variawa 2014
Abstract
A significant part of professional communication development in engineering is the ability to
learn and understand technical vocabulary. Mastering such vocabulary is often a desired
learning outcome of engineering education. In promoting this goal, this research investigates
the development of a tool that creates wordlists of characteristic discipline-specific
vocabulary for a given course. These wordlists explicitly highlight requisite vocabulary
learning, and when used as a teaching aid, can promote greater accessibility in the learning
environment.
Literature, including work in higher education, diversity, and language learning, suggests that
designing accessible learning environments can increase the quality of instruction and
learning for all students. Studying the student/instructor interface using the framework of
Universal Instructional Design identified vocabulary learning as an invisible barrier in
engineering education. A preliminary investigation of this barrier suggested that students
have difficulty assessing their understanding of technical vocabulary. Subsequently,
computing word frequency on engineering course material was investigated as an approach
for characterizing this barrier. However, it was concluded that a more nuanced method was
necessary.
This research program was built on previous work in the fields of linguistics and computer
science, and led to the design of an algorithm. The developed algorithm is based on a
statistical technique called Term Frequency-Inverse Document Frequency (TF-IDF). Comparator sets
of documents are used to hierarchically identify characteristic terms in a target document,
such as course materials from a previous term of study. The approach draws on a
standardized artifact of the engineering learning environment as its dataset: a repository of
2,254 engineering final exams from the University of Toronto, used to process the target material.
After producing wordlists for ten courses, with the goal of highlighting characteristic
discipline-specific terms, the effectiveness of the approach was evaluated by comparing the
computed results to the judgment of subject-matter experts. The overall data show a good
correlation between the program and the subject-matter experts. The results indicated a
balance between accuracy and feasibility, and suggested that this approach could mimic
subject-matter expertise to create a list of discipline-specific vocabulary from course materials.
Acknowledgments
This research was made possible by the invaluable counsel from Prof. Susan McCahan, Prof. Mark Chignell, Prof. Michael Grüninger, Prof. Eunice Jang, Prof. Greg Jamieson, and Prof. Clifton Johnston.
Special thanks to the participants of the studies for their time and experience.
This thesis is dedicated to the following:
My family – Mr. Ravindra Variawa, Mrs. Kavita Variawa, and my brother, Mr. Kunal Variawa.
Professor Susan McCahan.
Friends and colleagues.
TABLE OF CONTENTS
1 Introduction ....................................................................................................................... 1
1.1 The Problem ............................................................................................................... 2
1.1.1 Learning Barriers ................................................................................................ 5
1.1.2 Vocabulary Learning .......................................................................................... 7
1.1.3 Types of Vocabulary ........................................................................................... 8
1.2 Framing the Research ................................................................................................. 9
1.2.1 Research Questions ............................................................................................. 9
1.2.2 Theoretical Framework ..................................................................................... 11
1.2.3 Research Strategy .............................................................................................. 11
1.3 Roadmap of the Thesis ............................................................................................. 10
2 Literature Review............................................................................................................ 16
2.1 Universal Instructional Design ................................................................................. 17
2.1.1 Framework and Principles ................................................................................ 19
2.1.2 Criticism of Universal Instructional Design ..................................................... 23
2.1.3 The Implications of UID on the Study of Vocabulary in Engineering Education .......... 25
2.2 Language Instruction ................................................................................................ 26
2.3 Automated Indexing ................................................................................................. 30
3 Student Self-Assessment Study ...................................................................................... 35
3.1 The Dataset ............................................................................................................... 36
3.2 Overview of the Study .............................................................................................. 37
3.2.1 Methodology ..................................................................................................... 39
3.2.2 Outcomes .......................................................................................................... 41
3.3 Discussion of Study .................................................................................................. 44
4 Frequency Analysis Study .............................................................................................. 47
4.1 Overview of the Study .............................................................................................. 47
4.2 Discussion of Study .................................................................................................. 49
5 Automated Indexing and Evaluation .............................................................................. 53
5.1 Artifacts of Study ..................................................................................................... 54
5.2 TF-IDF Algorithm and Modification ....................................................................... 54
5.2.1 Modification of the TF-IDF Algorithm ............................................................ 57
5.3 Computational Approach ......................................................................................... 61
5.3.1 Software Development ...................................................................................... 62
5.3.2 Results using the Modified TF-IDF Algorithm ................................................ 66
5.4 Evaluation Study ...................................................................................................... 73
5.5 Results of the Automated Indexing Study ................................................................ 79
5.5.1 Courses Selected for this Study ........................................................................ 80
5.5.2 Sample Dataset for One Trial ........................................................................... 81
5.5.3 Summary of Quantitative Results for All Courses ........................................... 84
5.5.4 Statistical Analysis ............................................................................................ 89
5.5.5 Special Case: the statistical effect of using a design-heavy exam .................... 91
5.6 Discussion of Study .................................................................................................. 98
5.6.1 Correlation ........................................................................................................ 98
5.6.2 Implication of Results on Empirical Process .................................................... 99
5.6.3 Insight on Potential Impact on Teaching and Learning .................................. 100
6 Discussion ..................................................................................................................... 101
6.1 Recognition of Engineering Vocabulary as an Accessibility Barrier .................... 101
6.2 Creation of an Approach to Identify Characteristic Discipline-specific Vocabulary in Engineering Education .................................................................................................. 103
6.2.1 The Role of Technology in Vocabulary Characterization .............................. 103
6.2.2 Empirical Contribution ................................................................................... 103
6.3 Implications of the Approach on Teaching and Learning ...................................... 105
6.3.1 Converging Perspectives from the Literature ................................................. 105
6.3.2 The Development of Teaching Aids ............................................................... 105
6.3.3 Producing a Research-based Artifact of the Application of UID ................... 106
7 Conclusions ................................................................................................................... 111
7.1 Research Contributions .......................................................................................... 111
1. Contribution to Theory ....................................................................................... 111
2. Contribution to the Design of Learning Environments ...................................... 111
3. Important Findings ............................................................................................. 112
4. Contributions with respect to recommendations for future practice .................. 112
7.2 Limitations ............................................................................................................. 112
1. Feasibility vs. Accuracy ..................................................................................... 112
2. Difficulties with respect to Measurement ........................................................... 113
3. Single-word processing ...................................................................................... 114
4. Human Intervention ............................................................................................ 114
5. New words inclusion .......................................................................................... 115
7.3 Implications for Further Research .......................................................................... 116
7.4 Final Word .............................................................................................................. 116
List of Tables

Table 1 - Principles of UD and UID .......... 20
Table 2 - Ten words and statistical significance as described using ANOVA .......... 42
Table 3 - Sample wordlist from a second year Materials Science Engineering course .......... 67
Table 4 - Course exams used for the evaluation study .......... 81
Table 5 - Sample trial wordlist from a first-year electrical fundamentals course .......... 82
Table 6 - Symmetric Measures .......... 90
Table 7 - Chi-Square Tests .......... 90
Table 8 - Case Processing Summary .......... 91
Table 9 - Reliability Statistics .......... 91
Table 10 - Hypothesis Test Summary .......... 91
Table 11 - Symmetric Measures (APS111 omitted) .......... 92
Table 12 - Chi-Square Tests (APS111 omitted) .......... 93
Table 13 - Case Processing (APS111 omitted) .......... 93
Table 14 - Reliability Statistics (APS111 omitted) .......... 93
Table 15 - Independent Correlations .......... 94
Table 16 - Inter-rater Correlation for APS111 exam .......... 95
Table 17 - Effect of comparator sets .......... 97
Table 18 - Implications of the research using the framework of Universal Instructional Design .......... 106
List of Figures

Figure 1 - Individualized versus systemic approaches to increasing accessibility .......... 4
Figure 2 - Adapted Johari Window and invisible/visible learning barriers .......... 6
Figure 3 - Language types in engineering education .......... 9
Figure 4 - Comparison of the OU and PU scores for sample words .......... 42
Figure 5 - Sample data from the Frequency Study .......... 49
Figure 6 - Major components of the computational approach .......... 62
Figure 7 - TF-IDF scores for a sample course .......... 69
Figure 8 - Three regions on the plotted TF-IDF scores .......... 70
Figure 9 - Comparing TF-IDF plots of three courses .......... 71
Figure 10 - Comparing TF-IDF plots of different years .......... 71
Figure 11 - Major components of the evaluation study .......... 74
Figure 12 - Relationship between quintile and participant-assigned scores .......... 85
Figure 13 - Count of participant scores for all exams grouped by exam .......... 88
Figure 14 - Count of TFIDF binned-quintile scores for each exam .......... 88
List of Appendices

Appendix A1 - ASEE2010 - Literature Review Paper .......... 129
Appendix A2 - IJEE - Student Self Assessment .......... 144
Appendix A3 - ASEE2011 - Frequency Analysis .......... 161
Appendix A4 - ASEE2012 - Automated Indexing Scoping .......... 174
Appendix A5 - ASEE2013 - Automated Indexing Modified Algorithm .......... 184
Appendix A6 - CEEA2013 - Automated Indexing CHE230 Sample Analysis .......... 198
Appendix A7 - ASEE2014 - Automated Indexing Evaluation .......... 200
Appendix A8 - CEEA2013 - Automated Indexing Language Learning .......... 213
Appendix A9 - CEEA2013 - Automated Indexing Oral Presentations .......... 216
Appendix B1 - Input Conditioning Walkthrough .......... 219
Appendix B2 - Input Conditioning Software .......... 223
Appendix C1 - Coding the Modified TFIDF Walkthrough .......... 229
Appendix C2 - Coding the Modified TFIDF Software .......... 235
Appendix D1 - Recruitment Email .......... 240
Appendix D2 - Informed Consent .......... 242
Appendix D3 - Instructions and Scale for Evaluation Study .......... 245
Appendix D4 - SAMPLE wordlist for CIV280 .......... 247
Appendix D5 - COMPLETE wordlist for CIV280 .......... 249
Appendix E1 - Course-by-course Correlations .......... 260
Appendix E2 - Inter-rater Correlation, and dataset for APS111 course .......... 264
1 INTRODUCTION

The work discussed in this dissertation is motivated by the idea that changing the
design of the learning environment can increase accessibility for a broad range of learners,
leading to a more inclusive classroom. By increasing accessibility to instructional material,
instructors can provide a higher-quality learning experience for students and potentially
reduce the need for individual accommodation to some extent. Proactively planning for
students with diverse characteristics and transforming the learning environment to increase
inclusivity enables individuals with differences to be accommodated without additional effort
within that system. This inclusive and systemic perspective on educational improvement is
based on a theoretical framework called Universal Instructional Design. The application of
this framework to engineering education, specifically, the vocabulary used in this
environment, is yet untested. This research tests a specific application using this framework
to increase accessibility to course material.
This research investigates the language used in engineering education. Using a multi-
disciplinary approach based in industrial engineering, the communication interface between
students and instructors is examined to identify and reduce learning barriers associated with
inaccessible vocabulary. Three successive research studies are used to: identify vocabulary
that can act as a learning barrier; develop and refine an automated approach to reduce that
barrier; and evaluate the efficacy of that automated approach.
Some of the vocabulary used in engineering education is domain-specific. This type
of vocabulary refers to words that are technical and specialized to the engineering profession.
The learning of this kind of vocabulary is often an outcome of the educational experience. As a
student proceeds through engineering education, the corpus of language common to both the
instructor and student ought to converge as the student masters the course content. By
maximizing transparency in identifying these corpora, students are better equipped to
develop a robust professional vocabulary while learning the course content. Additionally,
once instructors have identified the core technical vocabulary required for their course, they
can integrate this list into the instructional material knowing that students are aware of the
authentic language they need to master.
The goal of this study is to develop a tool that can be used to increase transparency in
communication in engineering education. Specifically, this tool should be able to replicate
human subject-matter expertise in characterizing discipline-specific vocabulary used in
instructional material. The novel contribution of this research to the existing literature is the
design and evaluation of an automated process that can characterize technical vocabulary
using engineering assessment instruments as its dataset.
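The modified algorithm itself is developed in Chapter 5. As background, the standard TF-IDF weighting it builds on can be sketched as follows; this is a minimal illustration over a hypothetical toy corpus, not the implementation used in the thesis:

```python
import math
from collections import Counter

def tfidf_rank(target_doc, comparator_docs):
    """Rank the words of target_doc by term frequency times inverse
    document frequency over a set of comparator documents.  Higher
    scores suggest terms more characteristic of the target."""
    tf = Counter(target_doc)
    n_docs = len(comparator_docs)
    scores = {}
    for word, count in tf.items():
        # Number of comparator documents containing the word.
        df = sum(1 for doc in comparator_docs if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / len(target_doc)) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: the target "exam" versus two comparator exams.
target = "laminar flow in a laminar boundary layer".split()
others = ["the model of the circuit".split(),
          "stress and strain in a beam model".split()]
ranked = tfidf_rank(target, others)
```

Words common across the comparator set (such as "in" and "a") receive low scores, while the discipline-specific "laminar" rises to the top of the ranking; this behaviour is what a wordlist tool of the kind described here relies on.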
1.1 THE PROBLEM

Student differences should be taken into account in the classroom. The existing focus
is often on the individuals with differences and the development of strategies for those
individuals to cope with disabling environments [1-3]. However, it may be that the
environment itself is contributing to disability [4]. As such, changing this environment may
lead to a more accessible and inclusive space [1-8]. One case of a disabling environment is
when a wheelchair user encounters a set of stairs that blocks their path. Another case is when
a learner is unintentionally alienated by the context of the instructional material used to
create an authentic learning experience. A third case is the unintentional use of unfamiliar
vocabulary to communicate ideas in a conversation. In the first case, it is clear to see that the
individual is disabled by their physical environment. In the second case, it may become
difficult for an instructor to identify this barrier to accessibility if unaided by the student.
The third case may be completely invisible to the instructor. Individuals in the third case
may feel discouraged to seek clarification of ideas, especially if they feel judged for their
lack of understanding of the conversation. All of these are examples of situations in which
environmental factors can contribute to a lack of access.
If an environment is created such that it accommodates people with a wide range of
abilities, right from the initial design stage, then many disabling situations can be minimized
or designed out, whether or not they are identified a priori [5, 9-11]. The area of disability
studies refers to this as the Social Model of Disability. When disability is seen as coming
from the environment, systems, and attitudes, instead of from the individuals themselves,
then it becomes increasingly possible to create a framework and tools with which
accessibility can be increased [2, 5, 12, 13], and this is visually depicted in Figure 1 below.
This figure represents a shift away from individualized learning strategies to cope with
disabling learning environments. It instead focuses on identifying and mitigating
characteristics of the learning environment as a whole, addressing learning barriers and, in
turn, inaccessible education.
The left side of Figure 1 shows individual accommodations, each specific to a
particular user. This shows multiple strategies all with the same goal of increasing
accessibility. On the right is a broad-based systemic accommodation, available to everybody
in the set, with each user making use of it according to their needs. This represents a strategy
that is flexible and can be used as needed to increase accessibility for all users.
Figure 1 - From left to right: an individualized approach versus a systemic approach designed to increase accessibility in a classroom
In a diverse and inclusive learning environment, instructors see students having a
varied set of characteristics, which is viewed as a normal situation [2, 14-16]. These may
include physical and sensory disabilities, as well as learning disabilities, mental health
challenges, chronic illnesses, and psychological factors including attention deficit disorders
[2, 12-14, 17]. If the instructor adapts the learning material so that accessible technology,
active and participatory methods of teaching, online tools, and flexible instruction are added,
then students can experience a more inclusive learning experience [7, 14, 16, 18-21]. This
allows these students to interact with the material and be more comfortable with their
learning environment, potentially leading to a more effective or higher-quality experience
[14, 18]. As a result, the classroom can become enabling for all students, not only for those
who have an identified disability, and this population can include non-traditional students,
second language learners, and a larger proportion of a diverse student body [22, 23].
1.1.1 Learning Barriers

Existing literature on classroom inclusivity was reviewed for a conference publication,
“Design of the Learning Environment for Inclusivity: a Review of the Literature”, which
is reprinted in Appendix A.1.
Learning barriers are roughly classified in two dimensions: physical or non-physical,
and visible or invisible. Physical barriers are features of spaces or buildings that make it
difficult or impossible for people to access, often due to anthropometrical or mobility-related
limitations. Non-physical barriers are obstacles that discriminate individuals based on other
features, like understanding of information or not supporting assistive devices. Visible
barriers are obstacles to access that people can clearly identify in an environment. Invisible
barriers are obstacles that people cannot easily identify, clearly understand, accurately
predict, or reliably characterize and discern from the environment.
This research focuses on non-physical invisible learning barriers in the classroom.
These barriers are characterized as impediments that can reduce accessibility to learning and
reduce student performance. Examples of such barriers may include cultural aspects of the
classroom that impede effective two-way communication, unclear classroom rules, ill-
defined learning outcomes, and so on. As instructors try to make the learning environment
more inclusive and accessible for all students, learning barriers reduce the amount of
instructional material that is teachable and learnable.
Figure 2 - An adapted Johari Window that suggests decreasing invisible barriers as an approach to maximize what is teachable and learnable in a classroom.
One way of visualizing the invisible/visible dimension of barriers in the context of
teaching and learning is by referring to a schematic called a Johari Window. The adapted
Johari Window in Figure 2 can be used to help frame the research by representing the
interface between an instructor and students. It also shows a relationship between teaching,
learning, and learning barriers. The top-left quadrant represents a case where course material
is teachable by an instructor, and learnable by students, if the interface is void of invisible
barriers. The top-right quadrant represents a situation where students face an invisible
barrier, and this reduces learning. The bottom-right quadrant shows a situation where
invisible barriers are present for both the instructor and the students, and this means that
course material can neither be taught nor learned. The bottom-left quadrant represents a case
where an instructor faces an invisible barrier, reducing what course material can be taught in
the classroom.
The arrows show that modifying the visibility of barriers affects what is teachable and
learnable. The dotted line represents what a reduction in invisible barriers for the
instructor and students would look like. Decreasing invisible barriers at the interface increases
what can be taught by instructors, and also increases what students can learn. This window
suggests that it is possible to engineer an approach to increase learning by changing the
visibility of barriers. Specifically, this window is useful to help guide an approach that can
increase learning of a particular aspect of engineering education, vocabulary learning.
1.1.2 Vocabulary Learning

Understanding technical vocabulary is a component of the engineering learning
environment. Technical jargon can lead to more accurate and precise communication
especially in professional education, such as engineering. As instructors facilitate learning in
the classroom, teaching vocabulary can be simultaneous with the other instruction being
presented. Specifically, the use of technical vocabulary is pervasive as it aids in creating
authentic contexts, develops deeper meaning, and can be employed for assessment purposes.
The problem is that technical vocabulary is not necessarily knowledge that students bring
with them as they enter an engineering learning environment. Students often have to learn a
new corpus of vocabulary – or add to an existing corpus – to develop a more robust
professional repertoire of terminology. In developing this vocabulary, students may face
invisible non-physical learning barriers.
Vocabulary learning in engineering education may be an invisible non-physical
learning barrier for students. Specifically, a priori language differs from person to person,
and in a diversified student body it becomes increasingly challenging to manage language
learning. Due to existing student characteristics, some students may be more adept at
learning this vocabulary than others, whereas some may face learning barriers without even
knowing it. This may be particularly evident if the required learning is not identified a priori.
Additionally, students may have difficulty self-assessing their mastery of vocabulary used in
engineering education and this reduces the efficacy of learning technical vocabulary.
Individualized accommodation in learning discipline-specific jargon becomes increasingly
difficult to administer with a growing student body. Systemic design change that enables
language learning in engineering education becomes increasingly appropriate. As a result,
this can increase accessibility to course material, in turn increasing inclusivity in the
classroom.
1.1.3 Types of Vocabulary

There are different types of vocabulary present in an engineering learning
environment. While some of this vocabulary may be technical, there is other vocabulary that
is not. This research treats the characterization of these corpora in a simplified way,
represented in the diagram shown below. Figure 3 shows the corpus of vocabulary used
in engineering education categorized into three types. The outer circle represents all of the
language used in the learning environment. This would include both non-technical and
technical words. Examples include: “pilates”, “model”, and “laminar”. Contained within
that larger circle is a subset that contains engineering-specific vocabulary. Examples from
this corpus include, “model”, and “laminar”. Within the engineering corpus, there is a subset
of vocabulary that is discipline-specific. An example from this corpus could include the
word “laminar”. Instructors use a combination of words across all of these subsets because
understanding technical language is often a learning outcome of courses in engineering
education.
Figure 3 - Language used in engineering education has an engineering-specific subset, which in turn has a discipline-specific subset
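The nesting described above can be expressed with simple set membership, using the example words from this section; the three corpora here are a hypothetical three-word illustration of Figure 3, not real course data:

```python
# Hypothetical illustration of the three nested corpora from Figure 3.
all_language = {"pilates", "model", "laminar"}  # all language heard in class
engineering = {"model", "laminar"}              # engineering-specific subset
discipline = {"laminar"}                        # discipline-specific subset

# The corpora are nested, as the concentric circles in Figure 3 depict.
assert discipline <= engineering <= all_language

def classify(word):
    """Return the innermost corpus a word belongs to."""
    if word in discipline:
        return "discipline-specific"
    if word in engineering:
        return "engineering-specific"
    if word in all_language:
        return "non-technical"
    return "not observed"
```

In this framing, "laminar" is discipline-specific, "model" is engineering-specific but shared across disciplines, and "pilates" is non-technical language that nevertheless appears in the learning environment.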
1.2 FRAMING THE RESEARCH

The research investigates the language of engineering education, with a specific focus
on identifying and decreasing invisible learning barriers due to vocabulary presently used in
the classroom. The overall goal is to make the discipline-specific vocabulary used in
engineering education visible to both the instructor and the students. There are three main
research questions that this research attempts to answer.
1.2.1 Research Questions

The research attempts to address the following research questions:
1. Do language-related learning barriers exist in engineering education?
2. If language-related learning barriers exist, can they be characterized?
3. Can a strategy be found or developed to assist in the identification and
characterization of these learning barriers, transforming them from invisible to
visible?
The first research question is used to help gauge, scope, and focus the research.
Investigating language in engineering education can be a broad area, and so this question is
used to gauge the potential for study while defining the specific area of research if one exists.
This question is answered using responses acquired from 40 undergraduate
students. Specifically, participants from a representative sample of the engineering learning
population were recruited to take part in a specially designed study. The study aimed to find
whether the participants encountered learning barriers when given language-samples found in
engineering.
The second research question is an extension of the first, and is used to group
common aspects of language together in an attempt to further define the problem area. The
goal of this question is to define what language-related learning barriers look like. Since
language is broad and dynamic, this question attempts to build on commonalities between
problematic vocabularies to characterize the learning barriers.
The third research question is two-part, and forms the bulk of the research.
Specifically, this question probes whether an engineered tool can be developed to replicate
subject-matter expertise in identifying and characterizing inaccessible vocabulary in
engineering education. This question is integral to the development of a strategy that can be
used to mitigate learning barriers due to language. The objective is to replicate human
expertise to a statistically significant degree. As such, the expected outcome is likely to
involve a tradeoff between accuracy and feasibility, given the massive dataset of language
that exists in the area of engineering education.
1.2.2 Theoretical Framework

The theoretical framework used to ground this research program is Universal
Instructional Design (UID), explained in greater detail in Section 2.1. UID is a set of
guiding principles that seek to maximize accessibility to education for the greatest number
of learners possible. The intent of the research program is aligned with this well-cited
framework from the academic literature. The framework itself is cross-disciplinary and is
used in several domains to guide the development of more accessible environments. In this
research program, however, the theoretical framework guides the development of a strategy
used to increase accessibility to vocabulary in engineering education.
1.2.3 Research Strategy

Three major research studies structure the research program. The first
study investigates how prevalent non-physical, invisible learning barriers are in the context of
vocabulary on existing course material in engineering education. This study involved forty
undergraduate students and a set of words commonly found on existing artifacts of the
engineering learning environment. The outcomes of this study help to scope the investigation
of language to a more focused area of research.
The second study is a preliminary investigation of an algorithm-based mitigation
strategy whose goal is to help instructors disclose requisite technical jargon
automatically. This study builds on the outcomes of the first to present a cross-disciplinary
approach in identifying vocabulary that would otherwise be seen as an invisible learning
barrier.
The third study introduces a novel process, based on a modified version of an existing
computational keyword-search algorithm, to automatically generate wordlists of
characteristic discipline-specific vocabulary based on a dataset of all existing engineering
final exams at the University of Toronto. This process is coded into a computer program that
attempts to replicate human subject-matter expertise in characterizing vocabulary. The
efficacy of this program is then evaluated with eleven faculty members, each presented
with a sample wordlist from their course. These participants then evaluate whether, and to
what degree, each of those sample words is discipline-specific, and these findings are
compared to the output from the modified algorithm and computational process developed
previously. The outcome is a measure of correlation between the software and the instructor,
and is used to gauge the efficacy of the novel software program. These studies work together
to build and evaluate a strategy that can identify and potentially mitigate a learning barrier in
engineering education.
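The core of this third study, ranking words by TF-IDF against a comparator set, can be sketched briefly. The version below is a minimal Python illustration, not the thesis's implementation (which is written in Visual Basic .NET and reprinted in Appendix C.2); the function name, smoothing choices, and sample inputs are assumptions for illustration only.

```python
from collections import Counter
import math

def tfidf_wordlist(course_doc, comparator_docs, top_n=10):
    """Rank words in a course document by TF-IDF against a comparator set.

    TF is a word's relative frequency in the course document; IDF penalizes
    words that also appear in many comparator documents, so generic
    vocabulary scores low and discipline-specific terms score high.
    The course document itself is counted in the document totals.
    """
    course_words = course_doc.lower().split()
    tf = Counter(course_words)
    n_docs = 1 + len(comparator_docs)
    comparator_sets = [set(d.lower().split()) for d in comparator_docs]
    scores = {}
    for word, count in tf.items():
        df = 1 + sum(word in s for s in comparator_sets)
        scores[word] = (count / len(course_words)) * math.log(n_docs / df)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A word that occurs often in the target course document but in few comparator documents receives a high score, approximating the expert judgment that it is discipline-specific.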
The outcomes of the research build on the existing literature and combine approaches
from different domains to produce a theoretical strategy to reduce learning barriers. The
methodology employed in this study is cross-disciplinary and intended to serve as a starting
point for future research in the area of language analysis in engineering education.
1.3 ROADMAP OF THE THESIS

This dissertation contains three major research studies, all aspects of which have been
published in conference proceedings and peer-reviewed literature, and serve to guide the
development of a strategy that can increase accessibility in engineering education. After a
literature review section that describes prior art in the relevant cross-disciplinary fields, this
dissertation outlines the main findings from the three components of the research.
Chapter 3 presents the first phase of the research, which is to characterize learning
barriers due to language in engineering education. This area is supplemented by literature
written by the author and reprinted in Appendix A.2. This is the first of three major studies
which are conducted to develop a strategy to increase accessibility in engineering education.
Chapter 4 presents the second phase of the research, which builds on the first study,
and investigates frequency analysis as an approach to characterize language. This area is
supplemented by literature written by the author and reprinted in Appendix A.3.
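Frequency analysis of course material can be sketched in a few lines: tokenize a document and rank words by raw count. The sketch below is a minimal Python illustration (the thesis's own tooling was written in Visual Basic .NET; the function name and sample text are hypothetical).

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Tokenize course material and rank words by raw frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)
```

Raw counts are quickly dominated by common function words such as "the", which illustrates why a more nuanced ranking is pursued in the third study.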
Chapter 5 presents the third phase of the research, which builds on the first and
second studies, and comprises the bulk of the research program. This section discusses the
design of a strategy that is used to identify and characterize inaccessible language found on
existing standardized artifacts of the engineering learning environment. In addition, this
section also investigates the effectiveness of the strategy developed with respect to how well
it can replicate human subject-matter expertise. This information is used to evaluate the
approach in the context of engineering education, and is used to inform the results section
which follows. The design of the strategy is discussed in depth in Section 5.3.
This chapter is complemented by literature, program-specific information, and
evaluation-specific material in the appendices.
• Literature published is reprinted in Appendix A. The researcher is first-author.
o Investigating the application of the TF-IDF equation, and experimenting
with different algorithms: Appendix A.4.
o Piloting the modified algorithm on a chemical engineering course:
Appendix A.5.
o Applying the modified algorithm based on the TF-IDF equation:
Appendix A.6.
o Evaluating the efficacy of the computational approach: Appendix A.7.
o Extending the computational approach to subtechnical vocabulary
learning: Appendix A.8.
o Extending the computational approach to measure language proficiency
in oral presentations: Appendix A.9.
• Supplemental information about the input preparation and software programming
is presented in Appendix B.
o A walkthrough of the input conditioning process is presented in
Appendix B.1.
The Visual Basic .NET program code for the input conditioning
process is located in Appendix B.2.
o A walkthrough of the TF-IDF computational method is presented in
Appendix C.1.
The Visual Basic .NET program code developed for the
computational method is located in Appendix C.2.
• Supplemental information about the evaluation study is located in Appendix D.
o Participant recruitment material is presented in Appendix D.1.
o Informed Consent documentation used in the study is located in
Appendix D.2.
o A complete wordlist produced by the computational approach, containing
all words from an exam, is presented in Appendix D.3.
Section 5.4 presents summarized results. This section contains quantitative
outcomes from the strategy employed to characterize vocabulary, as well as statistical
correlations which are used to measure efficacy. These results are supplemented by course-
by-course measurements, reprinted in Appendix E.
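One common way to compute such a correlation between instructor ratings and algorithm scores is Spearman's rank statistic. The sketch below is an illustration with hypothetical ratings; it is not a claim about which statistic the thesis actually uses, which is reported in Section 5.4.

```python
def spearman_rho(x, y):
    """Spearman rank correlation for two lists without tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for position, index in enumerate(order):
            r[index] = position + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Exact closed form when there are no ties.
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical example: instructor ratings of discipline-specificity for
# five sampled words versus the algorithm's scores for the same words.
instructor = [5, 4, 3, 2, 1]
algorithm = [0.91, 0.72, 0.55, 0.30, 0.12]
```

Identical rank orderings yield a coefficient of 1.0; fully reversed orderings yield -1.0.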
Chapter 6 discusses the three research studies and their outcomes in the context of
one another, the literature, and more globally. This section highlights key relationships
between the outcomes of each of these studies, and how those are applied to addressing the
research questions presented in Section 1.2.1.
Chapter 7 presents conclusions and future work, and summarizes the contributions of
the research. The conclusion also discusses limitations of the current work, improvement
strategies, and an outlook on potential future work for researchers in this area. This section
also provides an overview of tangential work that was performed to supplement the doctoral
research performed for this dissertation.
2 LITERATURE REVIEW

There are several different bodies of literature that are conceptually related to this
research. Specifically, this literature is collected from the areas of universal instructional
design, second language learning, and automated indexing.
This section builds on a peer-reviewed conference publication written by the researcher
at an earlier stage of the research program (reprinted in Appendix A.1, “Design of the
Learning Environment for Inclusivity: A Review of the Literature”). That publication, a state
of the art review of the literature, discusses diversity in learning environments, retention in
technical education, disability studies, learning barriers, and literature relevant to this
dissertation. In that paper, the researcher organizes approaches to mitigate learning barriers
into two categories: those which accommodate individual students and those which change the
learning system. The first category includes one-to-one mentoring and personalized
instruction. The second category includes learning strategies for large groups of students and
changes in the design of the learning environment. In general, there are advantages and
disadvantages to both of these types of approaches and there is no single “optimal” method to
address all learning barriers or all learners in the classroom.
Literature about the design of accessible spaces is relevant to the dissertation, and is
used to contextualize the study of accessible learning environments in the field of
engineering education. Literature about the design of public spaces contains discussion
about Universal Design (UD), one of the schools of thought in the literature about
accessibility. A permutation of UD is Universal Instructional Design (UID), which is the
application of design principles to optimize accessibility in educational environments. These
concepts are discussed in the context of relevant criticisms, and are then examined in the
context of this research study about language in engineering education.
2.1 UNIVERSAL INSTRUCTIONAL DESIGN

The foundations of accessible learning environments stem from areas like Universal
Design (UD). Universal Design is an approach to making public spaces as accessible as
possible to the greatest number of people [24-28]. UD is an extension of barrier-free
design, in that it attempts to reduce barriers while also promoting design decisions that
increase accessibility for diverse populations. Over several years, as the UD philosophy was
integrated into the design of physical spaces, it became apparent that the environmental
changes needed to accommodate people with disabilities were also benefiting people who did
not identify as having a disability. Recognizing that many accessibility features
could be commonly integrated, UD became a tool to make designs more cost-effective, more
attractive, and more usable by larger and more diverse populations. According to several
authors, universal design can be used as a strategy to counter the presence of invisible
barriers [10, 11, 27-33]. UD is a set of design principles that promote interest and awareness
in proactively identifying and reducing accessibility barriers before they become an obstacle
[32].
Universal design has gained public interest and has prompted significant updates to
laws and regulations. Examples include the Americans with Disabilities Act (ADA) [34-36]
and the Accessibility for Ontarians with Disabilities Act (AODA) [37]. In particular,
standards of accessibility have become integral to the design of public spaces to maximize
usability and functionality for users who may be physically or mentally disabled. Examples
of such accessibility standards include: the number of required accessible parking spaces,
curb-cuts on sidewalks, dwelling-related construction features, and so on. From the fields of
design, architecture, and construction, it appears that making such changes to existing
structures for accommodation often occurs at a higher financial cost than incorporating these
changes into the design from the start [9, 22, 35, 38, 39]. As designers create more
accessible public spaces for people with physical disabilities, there is also much work being
done to make spaces accessible for people with non-physical disabilities: the work in this
area is focused on perception of physical spaces and using multi-modal sensory inputs to
understand one’s surroundings [40-42]. Additionally, the ADA is beginning to increase
accessibility to online environments with design considerations that impact information
transfer, e-business, and also aspects of cyber security [10, 26, 41, 43-47].
The Accessibility for Ontarians with Disabilities Act (AODA) is a more local
application of accessibility legislation that directly affects public spaces and services
including universities [37]. The AODA’s requirements for educational institutions, as
interpreted by the Council of Ontario Universities (COU), emphasize the importance of increasing
accessibility in the classroom [37, 48]. One of COU’s most publicized mandates is to
connect students to resources on campus which work to raise awareness for, and reduce,
mental health-related issues. More recently, the COU and AODA are interested in deploying
online resources to reduce barriers to accessibility. In the context of this research,
understanding language barriers can become a component of COU’s strategy and may
eventually be used to decrease learning barriers for university environments. As such,
institutions are obligated to understand and mitigate learning barriers, and the research being
performed is a step towards achieving this goal.
Universal design has been applied to the area of teaching and learning to maximize
accessibility in learning environments. When UD is applied to education, it can be referred
to as Universal Instructional Design (UID), Universal Design in Learning (UDL), or
Universal Design of Education (UDE), all of which are used interchangeably [49-52]. In the
context of this research study, the phrase Universal Instructional Design will be used. UID
applies the principles of Universal Design to teaching and learning. UID is not just about
accessibility for persons with a disability, but like UD, it is about considering the potential
needs of all learners when designing and delivering instruction. Through that process, one
can identify and eliminate barriers to teaching and learning [53]. This can improve access to
learning for students of all backgrounds and learner characteristics, while maintaining
academic integrity and minimizing the need for special accommodations.
2.1.1 Framework and Principles
Universal Instructional Design is a set of principles that form a practical framework
with the goal of improving learning opportunities. It represents a set of initiatives, principles,
guidelines, and projects that promote and work toward inclusive and equitable access to
learning [14, 22, 23, 27, 54]. In particular, it is a process that involves considering the
potential needs of all learners when designing and delivering instruction or creating
institutional policy and systems.
The goal of UID is to maximize accessibility for all students, including students with
disabilities and differences, in educational environments [23, 53]. In addition to the number
of students who face physical barriers, an increasing number of students are identifying other
types of learning barriers in classrooms. This challenges the existing approach of teaching in
diverse environments [55]. Currently, some of the challenges associated with removing or
mitigating learning barriers are managed by individual one-on-one learning support [33, 53,
55]. As the learning population increases and diversifies, this places a significant strain
on existing resources which in turn necessitates considering systemic changes [55]. UID
attempts to increase accessibility for as many students as possible. Hence, the UID approach
is in contrast to providing accommodations for a specific student: it is a systemic approach
used to increase accessibility in the classroom [55].
UD is a design philosophy for physical spaces and UID is an approach used in
instructional environments. The principles intrinsic to both can be compared in the context
of this research program. Both the UD and UID schools of thought are grounded in seven
principles which serve to codify design methodologies (see Table 1). This table shows the
principles of UD to the left [25-28, 31, 32, 56-59], and the corresponding principles of UID
on the right [22, 23, 49, 53, 60], as significantly condensed from the literature.
Table 1 - The principles of UD and UID

Principle from Universal Design           Principle from Universal Instructional Design
1. Equitable Use                          Class climate
2. Flexibility in use                     Interaction
3. Simple and intuitive to use            Physical environments and products
4. Perceptible information                Delivery methods
5. Tolerance for error                    Information resources and technology
6. Low physical effort                    Feedback, Assessment
7. Size and space for approach and use    Accommodation
The principles of UD and UID were developed and subsequently refined by several
authors in the field. Ronald Mace is credited as being the founder of Universal Design [57,
59, 61], and founded the Center for Universal Design [33]. Mace and others [25-28, 31, 32,
56-59] place high value on the importance of diversity and inclusiveness, and promote
maximizing accessibility to an engineered outcome to the greatest extent possible [33, 59].
Similarly, centres for Universal Instructional Design at several universities developed and
subsequently refined the principles of accessibility as a codified aspect of the curriculum and
learning environment. Initially, these principles were guidelines formed by instructors using
tools that allow for creativity in teaching methods, alternative means of presentation, and
choices for effective assessment in the classroom [23, 51, 52]. Over time, several authors in
the field of instructional design coalesced the ideas that eventually led to the creation of UID
principles, based on research about accessibility in education [22, 23, 49, 53, 60]. Both
design philosophies are roughly comparable to one another and can be discussed in the
context of the research being performed.
Principle #1 removes value judgment from the design. It suggests providing access in
a way that values diversity as a normal part of the environment. It also suggests that
instructors adopt practices that respect both diversity and inclusiveness, and make this an
explicit objective of their teaching experiences.
Principle #2 places an emphasis on the capacity of the design to accommodate a wide
range of individual preferences and abilities. An example of this would be the design of a
warning signal to be both audible and visible. It also encourages regular and effective
interactions between all members of the learning community including all students and the
instructors, and emphasizes the importance of clear and accessible communication. For
example, this principle implies that instructors should be explicit about communicating
learning outcomes and how they will be measured, as well as maximizing use of accessible
language for all people in the learning environment.
Principle #3 promotes designs that are easy to understand, regardless of the user’s
experience, knowledge, and language skills. It suggests using instruction that is appropriate
for the level of learner, not necessarily simplifying the content.
Principle #4 suggests that a design should communicate information effectively
regardless of ambient conditions or the user’s sensory abilities. It also suggests using
multiple modes to deliver content, including in-person, online, collaboration, and
independent learning.
Principle #5 suggests that the design should minimize hazards and the adverse
consequences of accidental or unintended actions. In particular, the designer should begin to
predict potential misuses and design to reduce the probability of any associated risks. It also
discourages instructors from using course materials and resources that unintentionally bypass
learning. One example of good practice would be to discourage students from closely
following a template for an open-ended design report, as this may constrain students from
investigating deeper learning in some cases.
Principle #6 suggests that the design should be usable with minimum fatigue. For
example, ergonomics principles can be incorporated in the design of products and processes. As
applied to the classroom, this means that instruction should minimize unnecessary repetition that
does not encourage new learning or strengthening of what is already learned.
Principle #7 suggests that appropriate physical size and space is provided for
approach, reach, manipulation, and use regardless of the user’s body size, posture, or
mobility. For example, consider the design of a laboratory environment where a user has adequate
uncluttered experimental space. This encourages proactively planning for student needs that
are not met by the non-physical aspects of instructional design. Consulting campus resources
such as note-takers, providing course materials in alternate formats, and arranging for other
accommodations for students with known disabilities are all examples of how this principle
is incorporated in the classroom setting.
The principles above show how UD and UID are codified approaches that can be
used to increase accessibility in both physical spaces and the learning environment. These
guidelines all work to improve accessibility in products and environments by specifying
general goals that an engineer and designer ought to work towards [27, 28, 31-33, 57, 59].
These are generally applicable to most engineering designs and they can be applied to other
disciplines as well, including education, as shown by the principles of UID.
2.1.2 Criticism of Universal Instructional Design

Though UID appears to be a promising approach, there are criticisms of this
methodology discussed in existing work [14, 29, 49, 53, 55, 60]. It is suggested that UID
falls short of increasing accessibility for everyone when a specific student requires
individualized attention or accommodation. Here, the systemic approach of UID assumes
that all students are being served by increasing accessibility within a course. However, this
neglects to take into account the repercussions or effects of the holistic approach on an
individual for whom this may not be enough. UID is a step towards making these
environments more accessible, but personalized individual attention would still be required
for some people [55].
Another criticism is that UID simplifies existing learning material. Here, the use of
accessible language and multi-modal instruction is seen as a factor that makes learning
“easy” instead of being rigorous. This criticism is based on the logic that existing material is
difficult and that the energy spent in understanding is part of the learning process. UID,
however, suggests that learning material is presented at a level appropriate for the audience,
and not to pointlessly complicate or simplify the instructional content or its delivery [23,
55]. This may help to promote deeper learning than previously thought because the students
are potentially developing a clearer and more intentional sense of what they are learning.
Further criticism of the UID approach is that it is quite resource-intensive to
implement [2, 16, 35, 62]. In particular, the cost of changing pedagogical approaches and
physical spaces is higher when completed on a large scale than when performed on an
individual basis. Here, the argument is that UID is becoming more necessary because more
students are requiring such additional assistance, and instead of having repeated
accommodations for a growing population, it would be more favourable to increase
accessibility for all students [53, 55]. The additional benefit of this is that accessibility is
increased for students who may not have asked for special accommodation, and as such is
useful to a greater population than initially intended.
This criticism applies to the research study as well. Here, the proposed research
accepts the fact that not all students will benefit by increasing clarity and transparency of
learning outcomes. Some students may still require additional accommodation and
individual learning strategies to cope with specific learning barriers. Reasons for their
unique learning barriers may vary, and as such a system-level approach to increasing
accessibility across the board may leave out areas where specific students still need support.
For example, increasing accessibility to course material by using multi-modal instruction
may make it easier for the majority of students to engage with the material, but it still leaves
out some students who require additional one-on-one remedial support.
The second area of criticism misses the point, however, because the intent is not to
simplify the vocabulary used in engineering education, but rather to highlight terms that
students ought to learn in order to develop a robust professional vocabulary. The use of a
UID strategy to make invisible learning barriers visible does not eliminate inaccessible
words, but rather empowers students to learn these words proactively.
The third criticism is also addressed, and this is due to the increasing need to identify
and characterize learning barriers in education that affect students. In particular, legislation
such as the AODA places a high priority on accessible learning environments, and this
motivates the development of strategies to increase accessibility for all students. Though
there are criticisms in deploying the UID system-focused method to investigate invisible
language-related learning barriers in engineering education, there is still significant reason to
pursue this research project through to completion.
2.1.3 The Implications of UID on the Study of Vocabulary in Engineering Education
Universal Instructional Design is used as the conceptual framework for this research
that investigates language in engineering education. The systems-focused approach of
UID serves to increase accessibility for a broad and diverse population without the need to
specifically test the accessibility requirements of each individual within that environment
[55, 63, 64]. This serves the study well because the student population is constantly
changing [55]. In addition, the intention is to study and develop instructional tools that can
work regardless of the institution, and a broad scope would benefit transferability in this
regard. Also, UID serves as a framework for developing a credible research strategy that
focuses on using an engineered approach to design the learning environment for accessibility.
Instead of focusing specifically on each user, the approach targets larger situational factors
that affect teaching and learning [53, 63]. Additionally, this encourages inclusivity in the
learning environment by creating an atmosphere where all learners are treated equally rather
than singling out individuals who require accommodation. This may increase learning
because all learners feel like they are part of a community where everyone has equal access
to course instruction and learning [65]. The theoretical implications of UID to the research
on language in engineering education focus on: designing a research study that maximizes
accessibility for a large and diverse user group; increasing teaching and learning in the
classroom; and promoting the development of an inclusive learning environment.
2.2 LANGUAGE INSTRUCTION
Existing literature in the area of language instruction relevant to this research can be
classified quite broadly into first-language acquisition and second-language learning.
Literature in the area of first-language learning often focuses on the mechanics and the
understanding of the structure and meaning of vocabulary as well as the use of this
vocabulary in synthesizing new meaning [66-68]. There is a substantial amount of work in
this area and it includes material from early childhood development and related studies
[66-69]. Existing research focuses on phonology, morphology, syntax, semantics, and the
development of vocabulary [68, 70, 71]. Though language can be vocalized or presented in
more physical form, the capacity of learning language is based on a syntactic principle called
recursion [70, 72]. There is also work that describes how language is interpreted when
being learned, and aspects that characterize this development of meaning.
A characteristic aspect of language acquisition is the ability to make connections that
appear arbitrary. For example, there is nothing about the word “mouse” that connects to the
meaning of this word. Additionally, the combination of multiple symbols and meanings to
produce novel meanings is also an area of research in this field of language instruction [73].
Hockett defines this as "productivity", a critical element of
human first-language acquisition: the ability to use an unlimited number of words that are
constantly changing and developing new meaning [74].
acquisition emphasizes the ability to use a seemingly unlimited range of vocabulary tokens
and actively produce new meaning [66, 67]. In addition to this work, new and emerging
research speaks to language learning from a biological perspective.
More recent research in the specific area of first-language acquisition also focuses on
how language capacities are developed by young children and whether there is a biological
component to first-language acquisition [68, 69]. In particular, the literature suggests that
first-language acquisition may be partially based on how the human brain functions innately
versus the language environment in which a person is raised [68-70]. These discussions lead
to questions about what happens when multiple languages are learned.
Second-language learning (SLL) refers to any language learned in addition to a
person’s first language, and is not restricted to the order in which these subsequent languages
are learned. In particular, the field of SLL is related to applied linguistics and is connected to
fields like psychology, cognitive psychology, and education [75, 76]. Two quintessential
papers in this field are S. P. Corder's "The Significance of Learners' Errors" and L. Selinker's
“Interlanguage”, both of which are pioneering and highly-cited in this area [77, 78]. These
two papers are used as a basis for much other work in this field. Corder’s work examines the
types and causes of language errors and suggests that they can occur not because of
similarities or differences between a learner's first and second language, but rather because of
faulty inferences about the rules of the new language. In the context of the dissertation, this
means that errors in new language learning (like technical vocabulary learning) are affected
more by clarifying rules of that new language rather than trying to establish a connection
between existing knowledge and the new vocabulary. Selinker’s highly-cited work builds on
initial findings by Weinreich, and suggests that language can, in addition to other things,
become “fossilized” between low and high proficiency [79-81]. In the context of this
dissertation, it means that as students learn technical language they can stop at a “hybrid
point” between not knowing the meaning of a word, and fully understanding the meaning of
a word. Regular and intended use of new language helps to improve proficiency. It also
reduces the stagnation that can occur if words are not used natively during communication.
The implication is that if students are made aware, ahead of time, of the technical vocabulary they ought to be familiar with, instructors then have a starting point for using that vocabulary naturally in teaching. Furthermore, according to theory, this reduces the fossilization of language, in turn promoting more robust vocabulary development and potentially more effective communicators.
At least two major perspectives have emerged since these works and are based
loosely on universal grammar [82-84]. These perspectives include skill acquisition theory
and connectionism. They maintain that learning another language is rooted in how a person
can incorporate the symbolic representation of ideas into their existing knowledge, and
connect those new symbols to the ability to communicate with others [76, 85]. As these
topics emerged, debate has increased on exactly how language is learned. One difficulty is that the multidisciplinarity of this field causes experts from each
discipline to favour theories they identify with rather than a larger unifying
understanding of second-language acquisition [85]. However, experts agree that the stages of
second-language acquisition are as follows: preproduction, early production, intermediate
fluency, and advanced fluency [86, 87]. This means that learners gradually increase their vocabulary knowledge: initially using imitation to form short sentences, then developing simple conversational sentences, and finally mastering vocabulary through repetition and the creative combination of words to express ideas.
A large differentiating factor between first- and second-language learning is that the latter is
influenced by the languages that the learner already knows. The literature describes the
interaction between languages as language transfer, and this can also be influenced by
external non-communication-related factors as well [88-90]. According to Krashen’s theory,
the sequencing found in traditional classrooms to learn new languages may be detrimental to
language development because learners often use a universal grammar model [90-92]. This
is further seen in the literature in the theory of comprehensible output and studies of
bilingualism, where learning additional languages using a sentence-based protocol is less
likely to produce proficient speakers than learning with smaller pieces of that language [93].
Long’s interaction hypothesis suggests that second-language acquisition is
particularly strong when there is “normal” communication in that second-language by at least
one speaker [94, 95]. These studies imply that it may be more beneficial to have
students learn additional languages using pieces of language rather than always using them to
develop sentences and longer linguistic artifacts. In addition, it suggests that the instructor
ought to continue speaking in the “new language” to strengthen vocabulary development in
the student population. Further, these findings suggest that teaching students words in a new
language can preferentially help them develop an understanding of concepts in that language,
and then those students can gradually increase their proficiency over time [88-91, 93-95].
The theoretical implication is that learning additional languages, such as the
technical-speak used in engineering disciplines, can be made stronger by emphasizing and
encouraging learning of key words, as the instructor continues to use them normally.
Further, the process of understanding language may be just as effective, if not more effective,
when the student learns key pieces of a new language within an immersive context of
introductory learning.
2.3 AUTOMATED INDEXING

The literature discussed in this section builds on the literature reviews in academic
papers reprinted in Appendix A.4-A.9.
Language is an ever-changing form of communication that has elements particular to
the domain in which it is used [96, 97]. In the context of engineering education, this means
that the vocabulary can change based on a number of factors including: time, discipline (field
of engineering), intended audience, instructor, technology, situational factors, and so forth.
Other elements of language include structural properties, linguistic roots, presentation, and
rhetorical aspects [98]. For the purpose of this research, the scope of the investigation will be
limited to the vocabulary itself to help bound this very broad area of study. Research in the
area of vocabulary analysis is also quite broad and includes literature about the formation and
use of words, the contextual properties of different types of words, and also about how words
change over time [96-101].
Of the many types of vocabulary studies performed, there are a few that are most
directly applicable to the study of vocabulary in the context of engineering education. In
particular, literature in the area of indexing is pertinent to the analysis of vocabulary and to the development of an instructional aid based on principles of universal design and language instruction.
This particular focus became more evident as the research progressed: the start of the
research study did not focus directly on indexing approaches because it was only one of
many approaches under consideration. As the research continued, it became clearer that a
research strategy that included indexing as a vocabulary characterizing tool was of great
importance. As such, the papers provided in Appendix A.4-A.9 include more literature
background about the area of indexing than the publications produced at the beginning of the
study. Indexing appears to provide a computational technique with which to analyze
vocabulary sets that are diagnostic to the disciplines within the field [102, 103], and this is
central to the research being performed.
Indexing is used to characterize information for ease of access. Traditionally,
indexing was used by professions such as library studies to classify books and other
instruments of information [102]. Indexing is also widely used in the broader scope of
communication, to condense volumes of knowledge into meaningful bits of information
[104]. This emphasis on succinctness has an effect that can make processing of multiple bits
of information manageable, especially with limited resources.
The word “index” has multiple definitions across different domains, ranging from
business to mathematics and others [105]. As used in this study, indexing refers to the
collection and characterization of information within documents – specifically, vocabulary.
In the broad field of information studies, indexing is considered one of the foundational
elements of research in this area [106-109]. With changing technology, research in this field
has become increasingly based on the technology of the time – from manually indexing and
archiving work to, more recently, search-engine programming. Indexing is critical to the
characterization of document text [97-99, 110].
In the fields of linguistics and the philosophy of language and education, indexicality
bridges language and learning [111-113]. In the process of characterizing an idea, C.S.
Peirce suggests that the subject becomes a meaningful symbol through the use of accurate
vocabulary [114, 115]. Noted scholar E. Ochs suggests that indexing language can be an
extension of linguistic anthropology, where the inclusion of gender-related mechanics of
language influences broader societal studies [116, 117]. Indexing is also conceptually linked
to semantics, since it allows researchers to better-understand the area of communication and
its use to convey meaning [111].
Research in the area of indexing includes studies about deixis and deictic terms,
which have meanings that vary with contextual elements (examples include “now”,
“here”, and “I”). Authors have also performed extensive studies on the pragmatic aspects of
indexing [108, 118]. Specifically, the Peirce Trichotomy of Signs discusses sign-relations
and the bases being indexed [111, 114, 115]. Using more comprehensive linguistics-based
approaches, models, and research, classifying language can be performed in terms of tokens,
symbols, and arguments, as well as sign-to-object relationships [119-121]. This research also
discusses referential and non-referential indexicality, orders of reference, deference and
interlocutor studies, and extensions to other ontological areas of research [120-126]. Though
there are several highly-cited authors in this field, much of the work in these areas can be
traced back to researchers and philosophers that include: Y. Bar-Hillel, R. Lingens, J. Locke,
U. Eco and J. Lotman [127-131]. Their combined works include research on pragmatism,
semiotics, semantics, literary theory, and cultural and social discourse. Many of these areas of
research are linked to indexing studies and characterizing vocabulary, which are, in turn,
central components of language, learning, and social development.
Automated indexing is the use of assistive means to characterize pieces of
information within datasets. In the context of this research study, automated indexing refers
to using a computer program to characterize vocabulary used in engineering education. With
advances in technology, computers are becoming more capable of processing larger datasets
and utilizing more complex algorithms in a reasonable run time to identify key terms in
documents [132, 133].
A pioneer in the field of automated indexing and information studies is Gerard Salton.
Salton’s work is critical to this research study for a number of reasons, the most important of
which is his contribution to the development of algorithms that characterize words [132, 134,
135]. Specifically, he was involved in building a vector space model for information
retrieval - and subsequently the Term Frequency-Inverse Document Frequency equation - in
the field of computational linguistics. The algorithm which employs this equation is
modified for use in the current research. In Salton’s model, both documents and queries are
represented as vectors of term counts, and the similarity between a document and a query is
given by the cosine of the angle between the query vector and the document vector [134-140]. Though the
application of this is intended for retrieving information from a large dataset, it can also be
used for automated indexing and summarization of vocabulary.
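Salton’s weighting scheme can be illustrated concretely. The following is a minimal sketch, not the modified algorithm developed in this research: it computes TF-IDF weights for a toy set of tokenized documents and scores document similarity by the cosine of the angle between term vectors.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Weight each document's term counts by TF-IDF.

    documents: list of token lists. Returns one {term: weight} dict
    per document, where weight = tf * log(N / df).
    """
    n_docs = len(documents)
    counts = [Counter(doc) for doc in documents]
    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this formulation, a query would simply be treated as another short document whose vector is compared against each document vector, and terms appearing in every document receive a weight of zero.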
Other authors in this field have contributed to research in the area of automated
indexing, and have evaluated the effectiveness of different algorithms. Some of these
approaches include: Kullback-Leibler divergence, latent semantic analysis and indexing,
singular value decomposition, multiword, correspondence analysis, and latent Dirichlet
allocation [141-146]. These approaches are based on computationally decomposing
relationships between queries and datasets, in an attempt to improve automated information
retrieval and indexing.
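One of the simpler members of this family can be shown directly. The sketch below computes a Kullback-Leibler divergence between the term-probability distributions of two documents; it is a generic textbook formulation with ad hoc smoothing, not an implementation of any of the cited systems.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) between two term-probability distributions.

    p, q: dicts mapping term -> probability (each summing to ~1).
    A small epsilon guards against terms of p that are absent from q.
    """
    return sum(
        prob * math.log(prob / (q.get(term, 0.0) + eps))
        for term, prob in p.items()
        if prob > 0
    )
```

A divergence near zero indicates that two documents use vocabulary in nearly identical proportions; larger values indicate increasingly dissimilar term usage.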
An advantage of using automated indexing over unassisted indexing is that it uses the
processing power of a computer to characterize language as it evolves over time [147, 148].
In particular, advances in computer technology have made it possible to rapidly mine large
sets of vocabulary data for characteristic vocabulary using an algorithm-based computational
strategy. The increased speed and precision of this computation enables researchers to
investigate language quantitatively, to help optimize language learning to promote a more
accessible learning environment. As a result, research in the area of automated indexing is
an important component in the investigation of language in engineering education.
3 STUDENT SELF-ASSESSMENT STUDY

This dissertation discusses three research studies performed sequentially in the
investigation of language in engineering education. These studies are: ‘student self-
assessment’ (Chapter 3), ‘word frequency analysis’ (Chapter 4), and ‘automated indexing
and evaluation’ (Chapter 5). The first study, student self-assessment, was used to scope the
investigation of engineering language, and gauge the pervasiveness of vocabulary-related
learning barriers currently present in the classroom. The second study, word frequency
analysis, was used to investigate word-frequency as a technique that can characterize text
across datasets. This study also helped identify research in computer science and tested its
applicability to engineering education. The third study, ‘automated indexing and evaluation’,
was about the design, development, and testing of a novel approach to characterizing
document text in engineering education. All three of these studies have been published (see
Appendix A for reprints).
The first study, which is the focus of this chapter, shows the research on student self-
assessment of vocabulary proficiency. This study was conducted early in the research
program and was published in the International Journal of Engineering Education, a reprint
of which is in Appendix A.2. This chapter is heavily based on that publication, and draws
from the material contained therein.
The purpose of the student self-assessment study was to gauge whether language-
related learning barriers and single-word understanding are existing problems in engineering
education, and if these problems could be characterized. This and subsequent studies relied
on a common dataset that contained examples of language used in engineering education. As
such, identifying an artifact of the engineering learning environment that is available from as
many courses as possible was essential.
3.1 THE DATASET

Investigating language in engineering education is a broad field of research. One reason for its breadth is the abundance of language in a learning environment. Language permeates the instructor/student interface through many modes and mediums. From
traditional “lecturing” environments to more textbook-based approaches, language is a key
component of learning in the classroom. Navigating the dataset of language in engineering
required a clearly defined scope, because of the sheer quantity of data available. In order to
define this scope, it was important to select an artifact of the engineering learning
environment that was common across many courses and readily available. It also had to be
an artifact that accurately captures the language used in engineering education.
Engineering courses have many documents associated with them: textbooks,
handouts, assessments (e.g. assignment instructions and tests), syllabi, and so on. These
documents are not necessarily representative of the entire course; instead they are a snapshot of a particular aspect or learning outcome. Syllabi are common and can provide a broader understanding of a particular course, but they are not necessarily indicative of teaching materials. Additionally, the discipline-specific vocabulary in them might be sparse. In
particular, they are usually employed to address course content at a meta-discourse level.
One of the only written documents beyond the syllabus that is common to the majority of
engineering courses is a final examination. These are documents that provide a summative
encapsulation of course content, in a medium that is closely-supervised and often carefully
constructed. The main reasons for using final exams as the artifact for study in this research
program are that they are comprehensive, written carefully, are relatively standardized in
length, are intended to be interpreted without additional assistance, and that they represent a
substantial available dataset of language in engineering education.
Final examinations at the University of Toronto were chosen for several reasons.
First, the database of final exams is readily available and in electronic format at the Faculty
of Applied Science and Engineering at the University of Toronto. At this institution final
exams from previous years are available on a publicly-accessible website so that students can
use them for study purposes. The electronic-format of these exams allows for text-analysis
using software programs that can identify strings of characters as words, making it feasible to
perform research on large quantities of written language. Second, the rules for administering
the exams indicate that students are not able to access assistance during an exam, which
means that they must rely on their a priori vocabulary to make sense of the questions and the
vocabulary therein. And as a critical assessment in a course, the exam should be testing the
student’s understanding of the course concepts comprehensively rather than the student’s
vocabulary (unless vocabulary knowledge is a defined learning outcome). Finally, every
undergraduate-level final exam is the same duration, 2.5 hours, which allows for some
common basis of comparison.
3.2 OVERVIEW OF THE STUDY

The first of three sequential studies, the “self-assessment study”, investigated the
prevalence of language-related learning barriers in engineering education, and was published
in the academic journal paper reprinted in Appendix A.2. There were a number of key
outcomes from this study that formed the foundation of our understanding moving forward
into the primary research.
Language used in engineering course materials can potentially be a barrier to learning
and inclusivity because students may perceive the meanings of words differently: this
variance could be attributed to cultural, technical preparation, and linguistic differences
among learners. This perception of vocabulary changes with education, experience, and
other factors. In an engineering classroom, however, there needs to be a convergent
understanding of language so that the course content can be interpreted accurately [149].
Basic TOEFL exams and English proficiency tests are not calibrated to gauge the cultural or
engineering-specific technical components of language used [150]. Furthermore, if students
cannot accurately assess their existing understanding of words, then it becomes increasingly
difficult to build a converging corpus of language used to communicate in the classroom
[149, 150].
Vocabulary used in final exams plays an important role in accurately assessing
student performance. If instructors use vocabulary that is not understood by the student,
then the assessment changes from testing course concepts to testing understanding of
vocabulary. As a consequence, the validity of engineering examinations may be
compromised when non-technical vocabulary that is never explicitly defined is tested in addition to
course learning objectives. Specifically, using inaccessible vocabulary on final exams would
mean that the assessment no longer exclusively tests what it purports to test: students’
mastery of course concepts. Instead, the exam is now also testing whether students
understand this additional vocabulary used to contextualize course concepts.
The existing strategy used in a large first-year design course at the University of
Toronto is to use a word list, which is provided to each student prior to each test in this
course. This word list contains all of the infrequently used words (i.e. words such as “and”,
“the”, “are”, etc. are not included) that appear on that particular test. This word list is then
padded with some additional vocabulary and then alphabetized so that the questions on the
exam are not apparent from the words present on the list. The intent is to give the students an
opportunity to gauge their own level of understanding of the test vocabulary beforehand, and
if required, consult information sources to correct any gaps ahead of time. This strategy
allows the instructors to contextualize questions and use accurate, authentic vocabulary,
including engineering terminology. However, this approach is predicated on the assumption
that given a list of words, students can correctly assess their level of understanding of these
words [64].
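The word-list strategy described above can be sketched as a short procedure. In the sketch below, the stopword list and the padding words are illustrative assumptions rather than the actual lists used in the course.

```python
import re

# A few common function words to exclude; a real course would use a
# much fuller stopword list (this short set is illustrative only).
STOPWORDS = {"and", "the", "are", "a", "an", "of", "to", "in", "is", "for"}

def build_word_list(exam_text, padding_words):
    """Build an alphabetized vocabulary list for distribution before a test.

    Content words extracted from the exam are merged with extra "padding"
    words so that the exam questions cannot be inferred from the list.
    """
    tokens = re.findall(r"[a-zA-Z\-]+", exam_text.lower())
    content_words = {t for t in tokens if t not in STOPWORDS}
    return sorted(content_words | {w.lower() for w in padding_words})
```

For example, `build_word_list("Estimate the tolerance of the feasible design.", ["bungalow", "propagate"])` returns the alphabetized list `['bungalow', 'design', 'estimate', 'feasible', 'propagate', 'tolerance']`, in which the padded decoys are indistinguishable from the exam words.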
These reasons motivated the study of student self-assessment of vocabulary
proficiency, as it is important to gauge what students think they know versus what they
actually know. Understanding this gap, if one exists, would help gauge the severity of
invisible learning barriers due to language in engineering education. A significant difference
between perceived understanding and actual understanding would indicate the presence of an
invisible learning barrier, and this would provide scope for the research program.
This study was used to gauge if students can accurately self-assess their
understanding of vocabulary on exams. In particular, this study tested whether vocabulary
understanding is a ‘visible’ or ‘invisible’ learning barrier from the learner’s point of view.
Better understanding this learning barrier, if it were to exist, would provide useful data to
help further investigate language in engineering education.
3.2.1 Methodology

To carry out this study, an ethics protocol was established and approved by the Board
of Ethics at the University of Toronto. Then, posters and other signage were used to recruit
forty undergraduate engineering students of diverse (self-reported) cultural backgrounds and
proficiencies in English (including native speakers).
The study tested participants’ understanding of ten words that might appear on an
undergraduate engineering exam. These words were chosen from a dataset of final
engineering exams across all disciplines and years, with a focus on selecting words that were
very different from one another. For example, the words “car” and “truck” would be highly
similar, but the words “car” and “ratatouille” are different. The criteria used to select the 10 words for this study were to first generate a list of 30 words, each used at least once in an existing engineering exam, and then to remove the 20 words that appeared most similar to others on the list, until only 10 dissimilar terms remained. These ten words were given to each participant in alphabetized order, and are shown in the left column of Table 2.
The participants’ task was to rate, quantitatively and qualitatively, their perceived
understanding (PU) of each of the ten words. Each participant was asked to assign a
numerical value from 1-5 for their self-assessed understanding for each word using a scale.
If the student believed that they were very proficient in understanding that word, a high PU
score was assigned (“5”). If the student believed that they were not as proficient in
understanding that word, a low PU score was assigned (“1”). Supplementary information
about this scale is presented in Appendix A.2. Each participant was also asked to provide
synonyms and/or definition(s) to each word, to provide “evidence” for substantiating their
numerical understanding score.
The second task was to develop an observed understanding (OU) score that would
identify what students actually know. For this step, the experimenter used a reference source
to characterize the qualitative responses received by each student. Specifically, each
student’s explanations and synonyms for each word were compared to the Oxford Dictionary
of English, an authoritative standard for word definitions. If the student-provided synonyms
and explanations were sufficiently close to the one provided in the dictionary, then a high OU
score (“5”) was given to that word. If not, then a low OU score (“1”) was assigned along a 5-
point scale.
3.2.2 Outcomes

This study produced 800 data points in total (400 OU and 400 PU scores), which were then analyzed using analysis of variance (ANOVA). The students’ perceived understanding of each word was compared quantitatively with their observed understanding. Some of these results and analyses of these data are shown
in Figure 4 and Table 2, with additional details in the journal article reprinted in Appendix
A.2.
The ten words and the statistical significance (ANOVA) are presented in Table 2.
This table also shows the means and standard deviations for the OU and PU scores, as well as
the difference between the two, and the t-test results for each word. The results indicate that
“bonnet”, “bungalow”, and to some degree “Jell-O” are self-assessed accurately. These
words do not appear to be an invisible learning barrier because they are accurately assessed.
These terms also had minimal variability. In addition, the data shows that though students
may be unfamiliar with these terms, they recognize this lack of familiarity; it is a visible
learning barrier to them. These results also suggest that the other words are not assessed
accurately, and this is shown by the values of the t-tests.
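The t(39) values reported in Table 2 are paired comparisons of each student’s PU and OU scores for a word. A minimal sketch of that computation, on hypothetical score lists, follows; the study itself used standard statistical software, so this only shows the form of the paired t statistic.

```python
import math

def paired_t(pu_scores, ou_scores):
    """Paired t-test on per-student (PU - OU) differences.

    Returns (t, df). With n = 40 students, df = 39, matching the
    t(39) values reported in Table 2.
    """
    diffs = [p - o for p, o in zip(pu_scores, ou_scores)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

Large positive t values correspond to words whose perceived understanding systematically exceeds observed understanding, i.e. candidate invisible learning barriers.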
Table 2 - The ten words with their PU and OU scores and statistical significance (ANOVA and paired t-tests)

Word        PU mean (SD)    OU mean (SD)    PU-OU mean (SD)    OU/PU    t-test
Bonnet      2.10 (1.582)    1.83 (1.810)    0.275 (1.062)      0.87     t(39)=1.64, p=.109
Bungalow    3.25 (1.565)    3.08 (1.940)    0.175 (1.338)      0.95     t(39)=0.83, p=.413
Fax         4.03 (0.800)    3.83 (0.747)    0.200 (0.883)      0.95     t(39)=1.43, p=.160
Feasible    3.93 (0.797)    2.55 (1.011)    1.375 (1.125)      0.65     t(39)=7.73, p=.000
Field       4.13 (0.686)    2.88 (0.822)    1.250 (1.056)      0.70     t(39)=7.49, p=.000
Jell-O      3.73 (1.219)    3.48 (1.585)    0.250 (0.742)      0.93     t(39)=2.13, p=.040
Mold        3.03 (1.310)    2.63 (1.462)    0.400 (1.105)      0.87     t(39)=2.29, p=.028
Propagate   2.88 (1.285)    1.98 (1.544)    0.900 (1.336)      0.69     t(39)=4.26, p=.000
Succinct    1.95 (1.974)    1.48 (1.853)    0.475 (1.132)      0.76     t(39)=2.65, p=.011
Tolerance   3.90 (0.672)    1.83 (1.174)    2.075 (1.385)      0.47     t(39)=9.48, p=.000
Figure 4 - Comparison of the OU and PU scores for two sample words, Bungalow (left) and Tolerance (right). The vertical axis represents the summed total of each score. Reproduced from the publication reprinted in Appendix A.2.
Figure 4 illustrates an example where a technical term can be an invisible barrier to
learning. The left side of this figure shows a bar chart of OU and PU scores for the word,
“bungalow”. The right side of this figure shows a bar chart for the word, “tolerance”. When
these two sample words were compared, the difference between OU and PU scores for
“bungalow” was less than “tolerance”. This shows that the word “tolerance” could be
inaccurately self-assessed by the students, when compared to the word “bungalow”. Though
this is a snapshot of the words used in engineering education, it shows that some words can
potentially have variations between OU and PU assessments. Overall, data collected from
the study showed that word mastery is non-uniform, and that PU is almost always ranked
higher than OU.
An accurate self-assessment would mean that the OU and PU scores would be identical
(PU-OU=0). However, the data showed that for this study, students correctly self-assessed
their understanding only 34.5% of the time, overrated their understanding 52.8% of the time,
and underrated their understanding 12.8% of the time. Additionally, there were noticeable
differences between words. Words that had an OU/PU ratio close to 1, as seen in Table 2,
included words like, “bungalow”, “Jell-O” and “fax”, all of which were present on at least
one engineering exam in the dataset. Moreover, the OU/PU ratio illustrated that these 40
participants were less likely to correctly self-assess their understanding of more technical
words such as “tolerance” and “propagate”. This is important because, although words like
“bonnet” had a low overall PU and OU, students were apparently aware of their lack of
understanding, which made this word a visible learning barrier for them. Students’ perceptions of their understanding of technical words, however, were often overrated, indicating that this was an invisible learning barrier.
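The correct/overrated/underrated percentages above can be reproduced mechanically from paired scores. The sketch below assumes that an exact PU/OU match counts as a correct self-assessment, which is a simplification of the scoring used in the study.

```python
def self_assessment_rates(pu_scores, ou_scores):
    """Fractions of responses where perceived understanding (PU)
    matches, exceeds, or falls below observed understanding (OU)."""
    n = len(pu_scores)
    pairs = list(zip(pu_scores, ou_scores))
    correct = sum(1 for p, o in pairs if p == o) / n
    over = sum(1 for p, o in pairs if p > o) / n
    under = sum(1 for p, o in pairs if p < o) / n
    return correct, over, under
```

Applied to the full set of 400 word ratings, this classification yields the 34.5% / 52.8% / 12.8% split reported above.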
From this study it was determined that undergraduate students do face vocabulary-
related visible and invisible learning barriers in engineering education. In particular, words
used on existing engineering exams may be incorrectly understood by students, and
sometimes students are unaware of their lack of understanding. Words that appeared to be
technical (like “tolerance”), were less likely to be identified as unknown to the student.
Students are therefore less likely to seek assistance for these types of words, because they
think they know what they mean. This describes a “blind spot” where students are falsely
assuming their mastery of technical vocabulary, creating an invisible barrier to learning. In
contrast, words that appeared more cultural (like “bungalow”) were more accurately self-assessed and are therefore a less significant learning barrier. As described in Section 1.2, the
approach employed was to increase visibility of invisible learning barriers, and as such,
increasing the visibility of learning barriers associated with technical vocabulary. Therefore,
it is particularly important to focus on the development of technical vocabulary as part of the
learning experience, as this data shows that it is a valid starting point for further research in
engineering education.
3.3 DISCUSSION OF STUDY

The outcomes of this study showed that vocabulary characterization in engineering
education is a topic area that needs to be investigated further. The analyses of the data
suggested that since students are unable to accurately self-assess their understanding of
vocabulary used in engineering education, there is a need to focus on developing robust
language skills. Here, the goal is to make terminology in the student’s blind spot, that is, areas of inaccurate self-assessment, more visible. Specifically, this study helped scope the
investigation of language in engineering education to vocabulary that appears to be
“technical”, as this is an area where students cannot accurately see their deficiencies in
learning.
In the larger scope of the doctoral research, this study narrowed the focus to a
specific type of vocabulary used in engineering education: technical vocabulary. The study
reinforced the need to develop mitigation strategies to identify and eliminate invisible
learning barriers, and connected that to technical vocabulary commonly used in engineering
education. The resulting journal paper also situated the research in the international scholarly
community and set up an expectation to follow up with the development of strategies to
reduce and eliminate learning barriers due to inaccessible vocabulary in this area.
In the context of investigating language in engineering education, this study suggested
a focal point where instructors can concentrate instructional efforts to improve learning in
their classroom. By drawing attention to invisible learning barriers, instructors can educate
students to master course concepts more thoroughly. As described in Section 1.1.1, through the adapted Johari Window concept, instructors can optimize teaching and learning by increasing the visibility of learning barriers in education. Now that the study had identified one specific invisible learning barrier due to language – technical vocabulary – the researcher could investigate ways to make that barrier more visible to students and instructors.
The student self-assessment study also provides some insight into the methodology
that could be employed to further investigate the language of engineering education. This
study suggested that we investigate the development of an objective approach in
characterizing vocabulary, so that invisible language barriers in engineering education can be
made clear. In general, this study helped to scope the larger problem of investigating
language in engineering education to a set of vocabulary that needs focused attention, while
informing a potentially more-reliable approach for further investigation. Specifically, this
study helps inform a strategy that focuses on an identified invisible vocabulary-related
learning barrier prevalent in engineering education, and that is to characterize the vocabulary
present on engineering exams.
4 FREQUENCY ANALYSIS STUDY

The frequency analysis study was conducted to investigate an experimenter-independent and computationally-based approach to characterizing language in engineering
education. Chapter 4 is based on this study, which was published in the proceedings of the
American Society for Engineering Education (ASEE), and is reprinted in Appendix A.3.
Specifically, the purpose of this study is to gauge if word frequency and computational
linguistics can be applied to further research in the context of investigating language in
engineering education. Additionally, this study further informs the development of an
automated tool that instructors can use as an identification strategy for invisible language
barriers in this field.
4.1 OVERVIEW OF THE STUDY

Investigating frequently and infrequently used vocabulary by means of word-frequency
analysis informed a research direction for characterizing language in engineering
education. This approach provided some insight on the issue of inaccessible vocabulary used
in engineering education, and a potential application of an automated approach in identifying
this learning barrier.
As stated in the publication, the question was whether word familiarity could be
correlated with word frequency on engineering exams. The expectation is that familiar
words are used more frequently, and are part of natural “everyday” language. The first step
in this investigation was analyzing the frequency of words on engineering exams [151].
The methodology used the same standardized database of documents used throughout
the research (i.e. final exams), and calculated word frequency for each of those documents.
The output was a ranked word-frequency list for each input document. The results were
compared and contrasted with theory from the literature.
As stated in the publication, some analysis techniques in the area of vocabulary
frequency-analysis are presented by C.J. van Rijsbergen, who described the use of Zipf's Law
to understand the statistical distribution of words in language. Zipf's law states that the most
frequent word in an article of text appears roughly twice as frequently as the second most
frequent word, three times as frequently as the third most frequent word, etc. Thus, the
expected result of the frequency analysis, if natural “everyday” language was used, is a
hyperbolic curve with a narrow range of frequently-appearing words and broad range of
infrequently occurring words [151].
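The rank-frequency relationship predicted by Zipf's law can be illustrated with a short Python sketch. This is illustrative only, not the software used in the study, and the sample sentence is a toy example:

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, count) pairs for each unique word, most frequent first."""
    counts = Counter(text.lower().split())
    return list(enumerate(sorted(counts.values(), reverse=True), start=1))

# Under Zipf's law, the count at rank r is roughly count(1)/r, so plotting
# the ranked counts of natural-language text yields a hyperbolic curve.
sample = "the cat sat on the mat and the dog sat on the rug"
print(rank_frequency(sample)[:3])  # [(1, 4), (2, 2), (3, 2)]
```

Plotting count against rank for each exam, as in Figure 5, shows how closely its language follows this hyperbolic shape.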
The frequency study analyzed undergraduate final exams: a closely-supervised
assessment common in engineering education with a substantial volume of vocabulary to be
used as a database. Nine undergraduate first-year final exams, from courses mostly taught by
engineering faculty, were used. These exams were:
1. Calculus-I
2. Calculus-II
3. Linear Algebra
4. Physical Chemistry
5. Engineering Strategies and Practice I (Engineering Design and Technical
Communication)
6. Introduction to Materials Science
7. Fundamentals of Computer Programming
8. Electronics Fundamentals
9. Mechanics (Statics)
After acquiring electronic copies of the exams, the text contained in each document
was extracted and processed through word-frequency software. This procedure was
completed by manually copying all of the text in each document into a licensed program
called "Hermetic Word Frequency Counter Advance v.12.45". This produces data which
can be transferred to Microsoft® Excel spreadsheets for analysis; data from two of the nine
courses are shown in Figure 5, with the full dataset presented in Appendix A.3 [151].
Figure 5 - Shows sample data from the Frequency Study, and is reprinted from the paper in Appendix A.3. This shows that language from a Mechanics (Statics) course, above, more closely follows a natural language frequency distribution than the language from an Engineering Design and Communication course, shown below.
4.2 DISCUSSION OF STUDY

The analysis of the data provided some insights into the characteristics of the
vocabulary used on engineering exams. The analysis focused on three areas:
1. the distribution of words on exams;
2. the relationship between the vocabulary used on particular types of exams and natural
language;
3. the relationship between the results and previous literature to understand how a proxy
system for familiarity might be developed.
Investigating the distribution of words on exams shows that words which people might
assume are very familiar, such as "name", "clear", or "length", are not particularly
frequent (nor consistently infrequent). In addition, the data shows that
mathematics exams generally have fewer unique words than other exams. It was found that
the relationship between word-frequency and word-rank is not linear but rather hyperbolic,
as seen in Figure 5. This follows Zipf's law, an empirical law from
computational linguistics that characterizes the frequency of word use in natural language
[152]. In applying existing theory to this study, the data suggests that exams from traditional
“fact and principle”1 engineering courses that are heavily math-based contain a word-
frequency distribution less typical of natural ‘everyday’ language than design courses, for
instance. This data makes sense – exams from design courses may tend to have a greater
amount of contextual information and writing, which is closer to natural language. Calculus
courses, in contrast, may have writing that is less characteristic of natural ‘everyday’
language on assessments. By seeing this distinction replicated in a quantitative format,
namely through frequency analysis of final engineering exams, it suggests a research
direction for further characterizing vocabulary in engineering documents.
1 “Fact and principle courses” is a term coined by L. Dee Fink, to describe “traditional” engineering courses like thermodynamics, electrical fundamentals, etc.
The frequency analysis of words in the context of engineering education is a valid but
largely untested area of research in computational linguistics. In the context of van
Rijsbergen's work [153], it is clear that more refined quantitative measures are needed
to better characterize vocabulary in documents [151].
Using frequency analysis alone is not enough to discern between different vocabulary types,
other than to gauge whether the document is written using natural language or not.
In order to characterize the kind of vocabulary contained in these documents, more
advanced approaches need to be used – and frequency analysis can be used as a foundation
upon which these approaches can be built [151]. Further discussion suggested that
contextual information is a key element to being able to discern and characterize vocabulary
in documents. Specifically, word frequency alone is not an accurate approach to understand
key words or degree of familiarity, but coupling word frequency with contextual information
about the document and words being analyzed can help better distinguish characteristic
language. The research from this study suggests that comparing individual exams,
differentiated based on discipline, may yield a quantitative understanding of vocabulary.
This comparison may identify how a given exam or group of exams compares to the general
characteristics of vocabulary used in these types of materials.
In general, this study concluded that computational frequency analysis of single-
words is an approach to understanding vocabulary in engineering exams, but that a more
nuanced approach is necessary. Based on the data produced using a simple word-frequency
approach, infrequently and frequently used words appeared equally likely to belong to
natural 'everyday' language. Building on the previous study, this shows that
frequency-analysis alone is not enough to identify this specific kind of invisible learning barrier in
engineering education. Moreover, this study informs a research direction that builds on
frequency-analysis using a more advanced and context-aware quantitative computational
approach.
At the conclusion of this particular study, it was found that a more nuanced approach
was needed to accurately characterize the vocabulary on engineering exams. Additionally,
the outcomes of this study suggested the inclusion of contextual data to more accurately
characterize vocabulary on exams. This experience was critical to the design of the research
study discussed in Section 5.3.1.
5 AUTOMATED INDEXING AND EVALUATION

Theory and literature from the area of information retrieval and automated indexing
(see Section 2.3) informed a methodology that builds on the research studies discussed in
Section 3 and Section 4. The outcomes of the first study - the student self-assessment study
- suggested that technical words can be invisible barriers to accessibility, and need to be
explicitly identified in an instructional environment to promote more accurate vocabulary
learning. This led to the second study – the frequency analysis of words study. The second
study added to the previous knowledge by investigating a computational approach that
compared documents based on frequency analysis, to see if it can be used to quantitatively
characterize words in engineering education. Though the outcomes were not able to discern
technical vocabulary from other types of vocabulary, it was an important starting point to
begin designing a more robust computational approach to characterizing vocabulary.
The two previous studies, and theory from the literature on automated indexing (see
Section 2.3), suggest a methodology that is based on a more advanced approach than just
frequency analysis. Ideally, the goal is to design and evaluate an automated approach that
can mimic subject-matter expertise in identifying discipline-specific vocabulary in
engineering documents, yet remain flexible enough to account for changing language over
time and across contexts. This methodology is a computational approach that identifies
keywords on input documents based on more advanced mathematical word-frequency
calculations by comparing to contextually-relevant datasets. This is the basis for the third
study, the automated indexing and evaluation study.
The automated indexing and evaluation study is organized chronologically. Section
5.1 specifies the input dataset used for this study. Section 5.2 discusses the theory used to
characterize vocabulary on engineering exams. This study has two components: the
computational approach (Section 5.3), and the evaluation of the computational approach
(Section 5.4). The results of this study are presented in Section 5.5, and discussed in the
context of the literature in Section 5.6.
5.1 ARTIFACTS OF STUDY

The input selected for the automated indexing approach is a standardized artifact of
the undergraduate engineering learning environment: written final examinations. This
dataset is the same as that used in the second study, the Frequency Analysis Study (see
Chapter 4 and Appendix B.1-B.2). As mentioned in Section 3.1, final examinations at the
Faculty of Applied Science and Engineering at the University of Toronto are available in
electronic format, with the text being readable by a computer string-recognition algorithm.
Using the PDF-to-TXT program from the second study, described in Section 4.1, 2254
exams were converted to a text-only format. The researcher created a text-cleaning
software program that eliminated special characters and nonsensical terms (words
containing digits, etc.), which is described in Appendix B.1-B.2. The total word count
of these documents is approximately 22.5 million words.
5.2 TF-IDF ALGORITHM AND MODIFICATION

The approach used in this investigation is the Term Frequency Inverse Document
Frequency (TF-IDF) algorithm, which mathematically determines key terms on a document,
using a comparator set of documents. This algorithm is discussed in Section 2.3 in the
context of the existing literature, and in papers by the author (the proceedings of CEEA and
ASEE in Appendix A.4-A.7). The TF-IDF equation is written as follows:
TFIDF = TF × IDF

where

TF = (# of occurrences of the word) / (total # of words), in a single target document

and

IDF = log( (# of documents) / (# of documents containing the word) ), in a set of comparator documents
The TF counts the number of occurrences of a particular word, and divides that
number by the total number of words in the target document, which is a simple measure of
frequency. The IDF is a measure of how characteristic a particular term is within a set of
comparator documents. It is calculated by dividing the total number of documents by the
number of documents in the set which contain at least one instance of that term, and then
takes the logarithm of this fraction. The choice of logarithm base only scales every score
by the same constant factor, so it does not affect the relative ranking of words.
The TF-IDF formula multiplies the TF and the IDF together and attaches the resulting
score to each unique word in the target document. A high TF-IDF score means that the word
is characteristic to the target document, whereas a low TF-IDF score means that the word is
not characteristic to that target document. For example, based on Zipf's Law and the data
shown in the previous chapter, the most frequently occurring words in a document tend to be
function words like "the" and "a". These words are likely to appear just as frequently in the
comparator set; as a result, the TF-IDF scores for these words tend to be low.
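The calculation just described can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the researcher's Visual Basic .NET implementation; the function name and documents are hypothetical:

```python
import math

def tf_idf(word, target_words, comparator_docs):
    """TF: frequency of the word in the target document.
    IDF: log of (number of comparator documents / number containing the word)."""
    tf = target_words.count(word) / len(target_words)
    containing = sum(1 for doc in comparator_docs if word in doc)
    idf = math.log(len(comparator_docs) / containing)  # assumes containing > 0
    return tf * idf

target = "the creep rate of the alloy".split()
comparators = [set("the exam has three questions".split()),
               set("creep deformation of the lattice".split())]
print(tf_idf("the", target, comparators))    # 0.0: appears in every comparator document
print(tf_idf("creep", target, comparators))  # positive: characteristic of the target
```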
The comparator set of documents used for the IDF score can be selected based on:
year of the exam, discipline, instructors, etc. The choice of comparator set changes the TF-
IDF score. Consequently, one benefit of this method is that new exams added to the dataset
affect the TF-IDF scores of existing words. This helps address the issue associated with
evolving language by reducing the stagnation of a vocabulary list that can occur with a
constantly-aging dataset.
The TF-IDF value is dependent on the degree to which the TF and IDF terms are high
or low. The TF is high when a term is frequently used in a target document. The TF is low
when the term is infrequently used in a target document. The IDF, by contrast, is high when
a word is infrequently used in a comparator set, and low when a word is frequently used in
the comparator set. Summarizing, the TF-IDF score is:
1. highest when a word is frequently used in a target document, but infrequently
used in the comparator set. This means that the word is characteristic to the
target document;
2. middling when a word occasionally appears in the target document and also
occasionally in the comparator set;
3. lowest when a word is commonly used in the comparator set regardless of its
frequency in the document set. This means that the word is not characteristic
to the target document.
A vector space model is one perspective that can help interpret the mechanics of the
TF-IDF statistic. Multiple documents in a collection can be viewed as a compilation of
vectors in a vector space – each term has one axis [154]. This means that information about
the order of words in a document is lost; words are treated independently of where they may
occur in a sentence. This is referred to as a “bag of words” model in the literature [155-157].
This is in contrast to a Boolean model, which simply records whether or not a term
appears. In TF-IDF, these sets of vectors are compared to one another using both
magnitude and angle [140, 158]. In the frequency-analysis study (see Chapter 4), this vector
space model perspective can be used to say that only the magnitude difference between each
word vector was calculated. That study did not provide much information with respect to
comparing individual words, because it does not incorporate a sense of context (direction)
but rather just magnitude. In using TF-IDF, the vector space perspective incorporates angles
between each of these individual word-vectors. This angle is developed by weighting the
term frequency of each word by the inverse document frequency of that word, based on a comparator
set of documents. By extension, this means that the TF-IDF statistic gives more information
about two vectors, compared to the previous study which just compared magnitude. The
additional contextual piece of information, the user-defined comparator set of documents, is a
feature which helps to better define the characteristics of words. Using this perspective, the
TF-IDF algorithm distinguishes vocabulary within documents and thus should theoretically
provide more discriminating information about characterizing vocabulary on documents than
by using frequency alone.
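In this vector space perspective, the angle between two bag-of-words vectors is conventionally measured with cosine similarity. The following Python sketch is illustrative only and is not part of the thesis software; the word weights are hypothetical TF-IDF-style scores:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two bag-of-words vectors, represented as
    dicts mapping word -> weight; word order is ignored, as in bag-of-words."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = lambda x: math.sqrt(sum(w * w for w in x.values()))
    return dot / (norm(u) * norm(v))

a = {"creep": 0.5, "grain": 0.2}
b = {"creep": 0.4, "lattice": 0.3}
print(round(cosine(a, b), 3))  # shared "creep" weight gives a similarity near 0.743
```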
The TF-IDF approach was experimentally coded and tested for a sample dataset.
However, it was found that the basic TF-IDF algorithm did not characterize
vocabulary on engineering exams as well as desired. To remedy this, the algorithm was
modified to provide more resolution, which could be used to better distinguish the
vocabulary on the input documents.
5.2.1 Modification of the TF-IDF Algorithm

The modification of the existing TF-IDF algorithm uses different comparator sets to
extract and amplify the characteristic vocabulary of comparator sets within a domain. The
novel approach calculates the TF-IDF score using exams in the same discipline as one
comparator set and contrasts that with the TF-IDF score produced using all engineering
exams as another comparator set. The difference between these scores for each word
increases the resolution, and possibly accuracy, in finding discipline-specific vocabulary.
The procedure is:
1. Compare each word in the target document to all documents in engineering, minus
those that are in the same discipline, and generate a TF-IDF score.
2. Compare each word in the target document to all documents within the same
discipline as that input document, and generate another TF-IDF score. This should
distinguish terms that are characteristic to that course.
3. Subtract the two scores for the word, and then repeat for all words in the target
document.
This method generates three wordlists – one from each of the above contexts and the
difference. The first wordlist should theoretically highlight terms that are characteristic of
the discipline. This is because the target is compared to documents external to its discipline
but still within the same domain, engineering. The second wordlist should theoretically
highlight terms that are characteristic to the target document. This is because the target
document is compared to documents within the same discipline. Subtracting the two terms
should yield a third term that theoretically highlights characteristic discipline-specific
vocabulary on the target document.
This produces a list where words that are both course-specific and discipline-specific
are given a high score, whereas all other types of words are given a lower score. This
modified algorithm of the TF-IDF equation can be expressed as:
TFIDF_mod = TFIDF_1 − TFIDF_2

TFIDF_mod = TF × (IDF_1 − IDF_2)

where

IDF_x = log( (# of documents) / (# of documents containing the word) ), in comparator set x

and x denotes the context: subscripts 1 and 2 represent context #1 and context #2
respectively. TF is the same for both terms because the target document is the same, so:

IDF_mod = IDF_1 − IDF_2

Incorporating the document counts expands the equation into:

IDF_mod = log( D_E / D_E,W ) − log( D_D / D_D,W )

IDF_mod = log( (D_E × D_D,W) / (D_E,W × D_D) )

And this produces the equation for the modified TF-IDF algorithm:

TFIDF_mod = TF × log( (D_E × D_D,W) / (D_E,W × D_D) )

where

D_E = # of documents in engineering, minus the discipline
D_E,W = # of documents in engineering, minus the discipline, containing the target word
D_D = # of documents in the discipline
D_D,W = # of documents in the discipline containing the target word
This is discussed in a paper by the author (reprinted in Appendix A.5). Based on the
proposed modification, the expected performance of the resulting algorithm is as follows:
1. D_E × D_D,W is large when there are many documents in the discipline containing the
target word. This makes IDF_mod larger, which amplifies the TFIDF_mod value. In
particular, this means that the word frequently occurs in the discipline but infrequently
in the rest of engineering, which implies it is likely discipline-specific.

2. Conversely, D_E,W × D_D is large when there are many documents in engineering
containing the target word, but not necessarily in the target discipline; in this case
IDF_mod gets smaller, which reduces the TFIDF_mod value. This means that the word
occurs frequently in engineering but is not unique to the discipline, which implies it
may not be discipline-specific.
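The two-context behaviour above can be sketched in Python. This is an illustrative fragment only (the study's implementation is the Visual Basic .NET program in Appendix C.2); the add-one smoothing that guards against words absent from a comparator set is an assumption of this sketch, not part of the thesis formulation:

```python
import math

def tfidf_mod(word, target_words, discipline_docs, other_eng_docs):
    """Modified TF-IDF: score against engineering exams outside the discipline
    (context 1) minus score against exams within the discipline (context 2)."""
    tf = target_words.count(word) / len(target_words)

    def idf(docs):
        containing = sum(1 for doc in docs if word in doc)
        return math.log(len(docs) / (containing + 1))  # +1 smoothing (assumption)

    # TF x (IDF1 - IDF2): TF is shared because the target document is the same
    return tf * (idf(other_eng_docs) - idf(discipline_docs))

score = tfidf_mod("dislocation",
                  ["dislocation", "creep", "dislocation"],
                  [{"dislocation", "creep"}, {"dislocation", "grain"}],
                  [{"circuit", "voltage"}, {"matrix", "vector"}])
print(score > 0)  # True: common in the discipline, rare elsewhere in engineering
```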
Although the input words are being treated with a “bag of words” model, the
calculations incorporate two different contexts, in the form of comparator sets of documents,
to help characterize the vocabulary. This is one step closer towards developing a future
strategy that incorporates word meaning into the automated approach in characterizing
discipline-specific vocabulary on an engineering exam.
By introducing a modification that amplifies the range of potential TF-IDF scores, the
resulting list appears to identify characteristic discipline-specific vocabulary on engineering
exams. Due to the iterative application of the TF-IDF equation across different comparator
sets, the following behaviours should occur. First, when there is a word that has a high term
frequency in a document, while occurring frequently in the discipline but not in all of
engineering, then the modified approach would boost the score of that word. Second, if that
word does not occur frequently in the discipline but is common to engineering, then this
approach should lessen its score. Therefore, the amplifying effect of the iterative approach
preferentially affects words that are characteristic of the target document and discipline.
5.3 COMPUTATIONAL APPROACH

To test whether the modified TF-IDF approach can be used to identify
characteristic discipline-specific technical vocabulary on engineering exams, the researcher:
1) Coded the modified TF-IDF algorithm into a software program that produces
ranked word lists from target documents (engineering final exams)
2) Tested the quality of the generated lists using subject-matter experts.
The first process is heavily computer programming-based. The first major
programming component is to develop input conditioning software that prepares the exams
for calculation. The second major programming component is to develop the software that
processes each input document using the modified TF-IDF algorithm. These components
will be discussed in Section 5.3.1, with a more detailed discussion in Appendix B.1 and C.1,
and the code itself replicated in Appendix B.2 and C.2.
The second process is based on interviewing human subject-matter experts, and is
presented in Section 5.4. This aspect of the research evaluates the efficacy of the
computational approach in replicating human expertise in identifying characteristic
discipline-specific vocabulary. In this step, eleven faculty members scored words, and these
scores were correlated to the outputs of the first step to measure efficacy of the modified TF-
IDF algorithm.
5.3.1 Software Development

This section describes the first major component of this study – developing the
software necessary to prepare the input documents for electronic analysis, and to perform the
TF-IDF calculations. A graphical representation of the methodology is shown in Figure 6.
[Figure 6: a flowchart with four stages, from top to bottom. PROCESSING: preparing and cleaning raw text from engineering exams (Adobe Acrobat X). ORDERING: ranking characteristic keywords using the TF-IDF computational method, comparing an exam to all exams in Engineering (TFIDF1) and to all exams in the same discipline (TFIDF2) (Visual Basic .NET). DIFFERENTIATING: extracting the discipline-specific language used in that engineering course (TFIDF1 − TFIDF2). POST-PROCESSING: eliminating duplicates and sorting in decreasing order of TF-IDF score (MS Excel). Each stage produces a wordlist.]

Figure 6 - Shows the major components of the computational approach in chronological order, starting at the top. This is reprinted from the paper in Appendix A.6.
5.3.1.1 Document Preparation

A total of 2254 unique electronically-available final exams were used in
this study. The exams were created between 1999 and 2011 (inclusive). The exams span all
of the engineering disciplines at this institution, including: aerospace, biomedical, chemical,
computer, electrical, industrial, materials, mechanical, and mining engineering. The exams
cover a breadth of engineering disciplines and years of study, from first-year to senior-year
courses. Discussion about why this artifact of the engineering learning environment was
chosen is given in Section 3.1.
The input documents came in a variety of formats, so they needed to be converted
into machine-readable form. Specifically, the exams were in one of the following file
formats: .jpg, .bmp, .PDF, .doc, .docx, and .txt. Each input file needed to be standardized
into one common format for usability. To accomplish this, each of the input files was first
converted into a widely-used document container format called “Portable Document Format”
(PDF). This file format is commonly used to represent documents in a manner independent
of application software, hardware, or operating system. The PDF format also encapsulates
document properties including different fonts, graphics, and the information needed to
display them. Originally a proprietary standard introduced by Adobe® Systems Inc., this file
format is widely-used today to ensure that documents are accurately replicated for the reader
as intended by the original author(s). The majority of input documents used for this study
were already in PDF format.
Since the documents collected were over the span of 12 years, which coincide with
the diffusion of the PDF format as an industry standard, there were improvements in the later
years’ PDF documents that were not necessarily present in the documents created in the early
2000’s. Specifically, newer PDF documents contain embedded metadata that can include
text-only information, security features, better rendering, etc. All files in the dataset were
converted into the most-recent version of the PDF-standard to ease the processing. Files not
already in PDF format were converted using a freeware package called PDFCreator, which is
an online-accessible PDF conversion tool. All of the files were then arranged into folders
corresponding to discipline and course name.
Once all of the documents were in PDF format, the researcher developed a program
that extracts only the meaningful text from each file and places that extracted text into
another file. The majority of the input files did not have computer-selectable text. The
researcher employed the Optical Character Recognition (OCR) tool of Adobe Acrobat X® to
render the PDF files with a text-based layer above each word in the file. This tool matches
each glyph in the input document to the most likely character in the ASCII table, creating
a file that has user-selectable text.
This process, however, has some faults. Though the vast majority of characters
chosen by the OCR tool are accurate, errors do occur: for example, the word "Indiana" may
be overlaid with "Ind1ana" if the scan quality is poor. This produced some nonsensical
terms in the dataset. These, and other
types of problematic character groupings were removed by a filter created by the researcher
and discussed in Appendix B.1, the code for which is reprinted in Appendix B.2.
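A minimal sketch of such a filter is shown below. The actual filter is the program reprinted in Appendix B.2; the regular expression here is an assumption of this sketch, keeping only purely alphabetic tokens:

```python
import re

def clean_tokens(text):
    """Keep only alphabetic tokens, dropping OCR artifacts such as words
    containing digits ("Ind1ana") or stray special characters."""
    return [t for t in text.split()
            if re.fullmatch(r"[A-Za-z][A-Za-z\-']*", t)]

print(clean_tokens("Ind1ana grain bound@ry creep 3x"))  # ['grain', 'creep']
```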
This program was created after multiple iterations using the object-oriented
Microsoft® VB.NET programming language, and can be tuned further in the future.
The program also extracts the text from each PDF input file, creates a unique text
file (.txt) with the same name as the input document, and inserts the extracted
text into this new file. Therefore, each PDF-input file has a corresponding text-only file, and
these were used as the input for the TF-IDF program.
5.3.1.2 Coding for the TF-IDF Calculations

After input conditioning, the modified TF-IDF algorithm was applied to the data
using a researcher-developed software program written in Visual Basic.NET. This program
assigns a score to each word in a user-specified input text file (i.e. target document). This
score, as discussed in Section 5.2 and specifically in Section 5.2.1, is a measure of how
characteristic each word is to a target document, when compared to others in user-defined
comparator sets.
Each word in a target document is assigned to multiple dynamically-allocated
matrices and double-length integers, and then given a score which is
computed using the modified TF-IDF algorithm discussed in Section 5.2. This process
utilizes two large comparator sets of documents: engineering exams of the same discipline,
and engineering exams of dissimilar disciplines. The program developed is discussed in
greater detail in Appendix C.1, with the code reprinted in Appendix C.2.
5.3.1.3 Post-Processing of Output

The output from each run of the TF-IDF program is written to a file called
“output.txt”. This file contains all of the words from the input exam (target document), as
well as their corresponding TF-IDF scores. The main component of the post-processing is to
import this data into spreadsheet software, specifically Microsoft Excel®. This allows the
researcher to automatically remove duplicates, sort, and perform mathematical calculations
which cannot be performed directly in the text-file. This process is discussed in Appendix
C.1.
The Excel program is used to automatically remove redundant pairs of cells from the
full list. Using the "remove duplicates" command, any word-score pair that occurs more
than once is reduced to a single instance. This leaves only unique entries: each word
used in the input document appears exactly once in the wordlist, along with its
corresponding TF-IDF score.
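The same deduplicate-and-sort step can be expressed programmatically. The sketch below is illustrative only, since the study performed these steps in Excel; duplicate pairs are assumed to carry identical scores, as stated above:

```python
def postprocess(pairs):
    """Remove duplicate (word, score) pairs and sort by decreasing score,
    mirroring the "remove duplicates" and sort steps performed in Excel."""
    unique = dict(pairs)  # identical duplicates collapse onto one entry
    return sorted(unique.items(), key=lambda ws: ws[1], reverse=True)

raw = [("dislocation", 0.046749), ("grain", 0.015939), ("dislocation", 0.046749)]
print(postprocess(raw))  # [('dislocation', 0.046749), ('grain', 0.015939)]
```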
5.3.2 Results using the Modified TF-IDF Algorithm

Initially, the researcher compared the "traditional" unmodified TF-IDF method to the
modified method. The researcher also began to explore the kinds of words that appeared to
preferentially have higher TF-IDF scores (see Appendix A.4). Later work investigated the
modelling of the TF-IDF scores graphically, to compare disciplines, courses, and years of
study to one another (see Appendix A.5-A.6). Additionally, some of the later work was a
collaborative effort with communications instructors at the University of Toronto, who
situated the application of this modified TF-IDF approach in classroom learning, and oral
presentations (see Appendix A.8-A.9). Six peer-reviewed conference papers were produced
from these data, including three presented at the Canadian Engineering Education
Association (CEEA) annual conference, and three presented at the American Society of
Engineering Education (ASEE) annual conference. These papers are reprinted in Appendix
A.4-A.9.
A condensed example of a third-year Materials Science Engineering wordlist
generated by the code is shown in Table 3 below.
Table 3 - Shows a sample wordlist from a third-year Materials Science Engineering course
Rank  Word           Modified TF-IDF Score
1     dislocation    0.046749
2     dislocations   0.016992
3     cry            0.016379
4     grain          0.015939
5     crystal        0.014845
6     stress         0.013639
7     material       0.011965
8     strength       0.010907
9     deformation    0.008955
10    creep          0.008446
11    partials       0.008165
12    ofll           0.007426
13    intermetallic  0.007198
14    subgrain       0.007193
15    tensile        0.007181
16    metallic       0.006853
17    gb             0.006749
18    hardening      0.006659
19    boundaries     0.006414
20    hallpetch      0.006259
21    crss           0.00569
22    composite      0.005598
23    strengthening  0.005518
24    elastic        0.005376
25    lattice        0.005137
…
200   fact           0.000435
…
350   able           -0.000104
…
450   equals         -0.001426
The sample wordlist in Table 3 shows what a typical output from one instance of the
computational program looks like. The first column shows word rank in decreasing order of
TF-IDF score. The second column lists the words found on that exam. The third column shows
the TF-IDF score produced using the modified TF-IDF algorithm discussed in Section 5.2.1.
This sample wordlist shows the top 25 ranked words, and then snapshots at 200, 350, and
450 words, which are representative of the entire wordlist. These data are reproduced from the
ASEE publication located in Appendix A.6. The list shows the prevalence of discipline-
specific vocabulary near the top of the wordlist. This kind of output is typical of other
courses in other disciplines and years as well. Other key observations from Table 3 include
the tendency of words near the top of the list to follow an exponential decline of modified
TF-IDF score as the rank increases, and how the modified TF-IDF score changes from
positive to negative as the rank increases. This shows us that there are terms on the target
document that are characteristic of the discipline, namely ones with high scores. As these
scores near zero, we can interpret this as words that are just as frequent in the discipline as
the larger dataset. Words with a negative score are more frequent in the larger dataset than in
the discipline. As expected, words with a low score do not appear to characterize the input
document nor do they appear to be discipline-specific. This is consistent among all of the
wordlists created using the computational approach. More thorough discussion of these
wordlists is given in the publications reprinted in Appendix A.4-A.7.
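This sign-based reading of the scores can be summarized in code. The following is a sketch, where the near-zero tolerance is a hypothetical cut-off rather than a value taken from the thesis:

```python
def interpret_score(score, tol=1e-4):
    """Read a modified TF-IDF score as described above: clearly positive
    scores mark characteristic, discipline-specific terms; scores near zero
    mark words that are as frequent in the discipline as in the larger
    dataset; and negative scores mark words that are more frequent in the
    larger dataset than in the discipline."""
    if score > tol:
        return "discipline-specific"
    if score < -tol:
        return "more frequent in larger dataset"
    return "uncharacteristic"
```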
The data produced by the computational approach can also be plotted to graphically
depict the behaviour of TF-IDF scores. A visual representation of the complete wordlist for
the same third-year Materials Science Engineering course, Fracture and Failure of Materials,
is seen in Figure 7. The horizontal axis represents word rank (rank 1 has the highest TF-IDF
score), while the vertical axis represents the TF-IDF score computed using the method
described in Section 5.1.
Figure 7 - Shows the TF-IDF scores for a sample course, Fracture and Failure of Engineering Materials. The horizontal axis represents rank of the word along the wordlist, while the vertical axis represents modified TF-IDF score for each word. This is reproduced from the publication reprinted in Appendix A.6.
The data appear to exhibit a declining slope with increasing rank. The rate of change
of the slope declines until a plateau region appears, as seen in the sample case above, Figure
7, between word #60 and just prior to word #400. This long horizontal section represents
words that are just as frequently occurring in the discipline as they are in all of engineering.
These words are labelled “uncharacteristic” terms because of how similarly prevalent they
are regardless of document. Figure 8 below shows an approximate distinction between these
“regions” on the graph. This observation is discussed in the publications in Appendix A.5-
A.6.
Figure 8 - Shows approximately the three regions on the graph that correspond to areas where the TF-IDF score is (1) amplified, (2) not amplified nor suppressed, and (3) suppressed.
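The three regions sketched in Figure 8 can also be separated programmatically. This is a hedged sketch in which the plateau tolerance is an illustrative threshold, not one taken from the thesis:

```python
def split_regions(ranked_scores, tol=1e-3):
    """Partition a wordlist's scores (already ranked in decreasing order)
    into the three regions of Figure 8: (1) amplified, clearly positive;
    (2) neither amplified nor suppressed, near zero; and (3) suppressed,
    clearly negative."""
    amplified = [s for s in ranked_scores if s > tol]
    plateau = [s for s in ranked_scores if -tol <= s <= tol]
    suppressed = [s for s in ranked_scores if s < -tol]
    return amplified, plateau, suppressed
```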
The observed behaviour of TF-IDF scores remains largely consistent among all of the
courses that were processed through the modified algorithm. This is true across different
courses, disciplines and years of study. The differences, however, include where the graph
“drops off” at the end, and this changes based on the course because there are a different
number of words used in each course. Specifically, the number of words controls how many
datapoints there are, and the higher the number of words, the farther the graph extends. The
following figures, Figure 9 and Figure 10, show datasets compared across different
disciplines, and also the same discipline across different years. These figures are
representative of all datasets across this study, and can be used to show similarities and
differences among vocabulary use across courses.
Figure 9 - Shows the Materials Science Engineering (MSE) exam when compared to exams from Biomedical Engineering (BME) and Aerospace Engineering (AERO)
Figure 10 - Shows a comparison across different years of Materials Science Engineering exams
The data, as seen in Figure 9 and Figure 10 above, show that the TF-IDF scores
appear to follow similar behaviour as the word rank increases. This is discussed in greater
detail in the publications reprinted in Appendix A.5-A.6. It appears that the first hundred
words on the wordlists produced for each course are generally characteristic of the course
being examined, and the discipline. Furthermore, the plotted data show that this observation
is consistent across different disciplines, courses, and years of study. This means that the
modified TF-IDF algorithm results appear to be repeatable regardless of the exam, just as
long as the comparator sets are appropriate for the exam chosen. When these data are viewed
in the context of the words themselves, as in Table 3 for example, it appears that words that
have higher TF-IDF scores are discipline-specific. This observation is uniform across
different datasets as well.
We have made an assumption, based on theory and reasoning, that the words which
have a high modified TF-IDF score are characteristic of the discipline of the document being
processed. However, there are several different courses, disciplines, and years of exams that
are contained within this large dataset of final exams. The researcher himself is not an expert
in identifying characteristic vocabulary across such a rich dataset. It is therefore important to test
this assumption with individuals who are best-suited to gauge the validity and reliability of
these wordlists in capturing discipline-specific vocabulary. This leads to the second part of
this study, where subject-matter experts are presented with words that appear on final exams
for their course, and asked to rate how discipline-specific those words are. These ratings
are then compared to the ratings produced by the computational approach. If there is
measurable and statistically significant agreement, then it becomes valid to suggest that the
computational approach can replicate subject-matter expertise in identifying and
characterizing discipline-specific vocabulary.
5.4 EVALUATION STUDY

The goal of this study is to evaluate whether the computational method is capable of
identifying discipline-specific vocabulary on existing engineering final exams. In this study,
subject-matter experts were tasked with identifying discipline-specific vocabulary from the
same set of sample documents used in the computational study. By observing a correlation
between these two approaches, we can measure if and how well the computational approach
works.
Faculty members who set the exam are the most comprehensive subject-matter
experts for this dataset. These individuals are the ones best suited to assess whether an
external aid, like the software program, is able to accurately identify the discipline-specific
vocabulary used in the exam. The participants in this study are current faculty members of
the Faculty of Applied Science and Engineering at the University of Toronto.
A diagram of the study is shown in Figure 11. This shows the three major
components of the study – pre-processing, processing, and post-processing – with a brief
description of the main steps. This study will be published as a paper for the 121st American
Society for Engineering Education Annual Conference, to be presented at the Engineering
Research and Methods division in June 2014. A pre-print is located in Appendix A.7.
[Figure 11 flowchart: (1) recruiting participants and preparing wordlists - ethics review, recruitment, assigning quintiles, drawing the 100-word sample, and scheduling; (2) training, calibration, and administering the study - explaining the study, calibrating participants on the 5-point scale, and administering the survey; (3) analyses - compiling participant scores into Excel spreadsheets and comparing them to TF-IDF scores to gauge the efficacy of the computational approach.]
Figure 11 - Shows the major components of the study that evaluates the efficacy of the computational approach using subject-matter experts. This is reproduced from the paper in Appendix A.7.
This section will outline the major steps of the methodology, which have been
discussed previously in the paper reprinted in Appendix A.7. A wordlist was created by the
researcher for each of the 10 exams by:
1. Processing a complete full-length wordlist using the computational method,
2. Ranking the words in decreasing order of TF-IDF score,
3. Splitting the list into five “equally-sized” bins called quintiles, and
4. Selecting words from each bin until a 100-word sample wordlist was created.
The word selection was deliberately biased: more words were drawn from the
higher TF-IDF bins than from the lower ones. The top bin contributed the 30 highest-ranked
words, and the subsequent bins contributed 30, 20, 10, and 10 words, respectively.
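The binning and biased sampling steps above can be sketched as follows. The function names are hypothetical, the input is assumed to be a wordlist already ranked in decreasing order of TF-IDF score, and which words are taken within each bin is illustrative:

```python
# Bias toward the high-TF-IDF quintiles: 30/30/20/10/10 from quintiles 5..1.
SAMPLE_COUNTS = {5: 30, 4: 30, 3: 20, 2: 10, 1: 10}

def split_into_quintiles(ranked_words):
    """Split a ranked wordlist into five roughly equal bins; quintile 5
    holds the highest-scoring words, quintile 1 the lowest."""
    n = len(ranked_words)
    size = n // 5
    bins = {}
    for q in range(5, 0, -1):
        start = (5 - q) * size
        end = start + size if q > 1 else n  # last bin absorbs any remainder
        bins[q] = ranked_words[start:end]
    return bins

def sample_100_words(bins):
    """Draw the biased 100-word sample, then alphabetize it so the survey
    list carries no hint of the underlying TF-IDF ranking."""
    sample = []
    for q, count in SAMPLE_COUNTS.items():
        sample.extend(bins[q][:count])  # within-bin choice is illustrative
    return sorted(sample)
```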
Separately, 11 participants (i.e. faculty members) were recruited. Each participant
was trained to use a 5-point ranking scale. This scale is used to quantitatively measure the
degree to which the expert thinks a word is discipline-specific. Unknown to the participant,
this scale is intended to map to the five bins which were used to split up the full wordlist. A
word in quintile #1 should have a participant-ranked score of #1, and so on for each quintile.
The alphabetized 100-word sample wordlist was then given to the participant, and they ranked
each word using the 5-point scale. The participant returned the completed wordlist to the
researcher, and the researcher checked for statistical correlation between quintile and
assigned-score using a statistics package called IBM SPSS v21.
The steps presented are discussed in greater detail below, and in the paper reprinted in
Appendix A.7.
Step 1: Ethics Review - An ethics protocol was submitted to and approved by the Board
of Ethics at the University of Toronto.
Step 2: Recruitment - Eleven participants were recruited by email. These people
are current faculty members at the Faculty of Applied Science and Engineering at the
University of Toronto, who have taught at least one course that has an exam contained in the
repository of exams available to the researcher. A sample recruitment document is presented
in Appendix D.1.
Step 3: Input Selection - Ten exams developed by the recruited faculty were
identified. These represent a breadth of engineering disciplines and years of study. The goal
was to maximize the diversity of the exams used for the study. One of these exams was from
a design course. Two of the faculty members already recruited had taught that course and
were subject-matter experts in this area.
Step 4: Input Preparation - Eleven 100-word sample wordlists were created.
Specifically, the complete wordlist from each participant’s course was first generated and
ranked in decreasing order of TF-IDF score. This process is discussed in Section 5.3.1.
Then, each wordlist was split into five partitions, called quintiles. Each quintile
contains roughly the same number of words. The top quintile, #5, contains words that have
the highest TF-IDF scores, whereas the lowest quintile, #1, has a roughly equal number of
words with the lowest TF-IDF scores. The next step was to downsample this full list of
words to 100-words for the survey. Words from each quintile were selected until 100 words
were chosen in total. Since the goal is to gauge whether high-scoring words contain domain-specific
jargon, words from the higher quintiles (i.e. higher TF-IDF scores) were selected more
often than words from the lower quintiles. Specifically, the words for the 100-word sample
survey were selected as follows: 30 words from top (#5) quintile, 30 words from second-
highest (#4) quintile, 20 words from middle (#3) quintile, 10 words from each of the lowest
(#2, and #1) quintiles. The resulting wordlist was alphabetized to rearrange words
irrespective of TF-IDF score, and stripped of TF-IDF scores.
Step 5: Interviewing, Calibration - Each participant was scheduled for a 1-hour
meeting. At the meeting, the participant was provided with an “Informed Consent”
document, as seen in Appendix D.2. This required form was explained by the researcher,
and signed by each participant in the study. A copy of this form was given to the participant.
The study was briefly explained to the participant. This exercise reaffirmed the goal and
purpose of the research, and emphasized the importance of providing thoughtful input. The
experimenter used a semi-scripted approach to discuss the importance of using accessible
language in the classroom, and the critical need of subject-matter expertise to evaluate the
effectiveness of an automated tool to help do so. The participant was told that they would be
provided with a randomized list of 100 words, extracted from final exams of courses they
have instructed in the past. The participants were also told that these exams were gathered
from a publicly available online repository of existing final exams, intended to be used for
study and research purposes.
A brief calibration exercise preceded data collection. The participant was given a
print-out of the scale, and the researcher read roughly five words aloud. Some of these words
were pertinent to the course and discipline, and some were not. For example, words like
“magnetism” would be pertinent to an Electrical Engineering course, whereas words like,
“green” and “walk”, would be more general. The participant briefly discussed how they
would score these words, and after they were confident in using the scale, the study
progressed.
Step 6: Interviewing, Instructions - The participant was presented with the 100
alphabetized words in a two-column spreadsheet layout, with the first column having one
word per cell element, and the second column blank. The participant was asked to enter a
numerical score into the blank element next to each word. The scale is provided in
Appendix D.3. The number assigned is a representation of how discipline-specific that word
is, and how critical it is for a student to understand that word in the discipline, according to
the participant. If a word is critical to the course, discipline, or both, then the participant
would rate that word highly according to the 5-point scale provided to them. If the word is
neither critical to the course nor the discipline, then the word would be scored a low value. A
sample survey is available in Appendix D.4. The corresponding full-length wordlist from
which that was created is shown in Appendix D.5.
Each participant was given as much time as they needed, but all completed the survey
within 30 minutes. The experimenter did not leave the room for this part of the study even
though he suggested it as an option – none of the participants said they felt affected by the
presence of the experimenter in the room. There was no conversation as the participant
completed the survey, even though the experimenter offered to provide clarification at any
time.
Step 7: Interviewing, Debriefing - After completing the study, the participant was
thanked for their time and debriefed. After the survey was collected by the experimenter, the
participant was given a hardcopy of the complete wordlist for their course, ranked in
decreasing order of TF-IDF score. In addition to this wordlist, each participant was offered a
hardcopy of a short academic paper explaining the TF-IDF program development process
(written by the experimenter), and reprinted in Appendix A.5. The purpose for doing so was
to provide additional background about the study, and to encourage participants to provide
additional feedback about the study itself. To date, no further feedback has been received
beyond positive comments about the purpose and execution of the study.
Step 8: Tabulation of Data - After the data were collected and the participants were
debriefed, the data were manually entered into computer spreadsheets by the researcher.
There are four columns in the resulting spreadsheet for each exam – a column containing
each of the 100 words, a TF-IDF score column, a quintile-ranking column, and a participant-
score column. A sample spreadsheet is shown in Section 5.5.2, as Table 5.
Step 9: Analysis of Data - The data were then analyzed to compare the participant-
assigned scores and the TF-IDF scores. The goal is to understand if any correlations exist
between the subject-matter expert’s ranking of words and the modified TF-IDF
computational approach developed by the experimenter. The data were analyzed by
observing if there are significant differences between the participants and the computational
approach. Using the 5-point scale that each participant used to score each word, these
participant responses were compared to the corresponding quintile bins, which grouped
words of similar TF-IDF scores. To perform the analyses, the experimenter used
spreadsheet software (Microsoft Excel® 2013) and a software-based statistics package (IBM
SPSS version 21). The results are discussed in the following section.
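The analysis itself was run in Excel and SPSS. As a rough, software-agnostic sketch of the two headline statistics (not the author's code), Pearson's R and Spearman's correlation between quintile bins and participant scores can be computed directly:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def rank_with_ties(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's correlation is the Pearson correlation of the ranks."""
    return pearson_r(rank_with_ties(x), rank_with_ties(y))
```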
5.5 RESULTS OF THE AUTOMATED INDEXING STUDY

The results presented in this section supplement the conference papers reprinted in
Appendix A.3-A.7. The collection of exams selected as the dataset for this part of the study
is presented in Section 5.5.1. A sample of the data collected from one trial is discussed in
Section 5.5.2. Data collected from all trials are presented in Section
5.5.3. The statistical analyses including correlations and data reliability measures are
presented in Section 5.5.4. These statistical analyses include Pearson’s R, Spearman’s
Correlation, Pearson’s Chi-Square, and Cronbach’s Alpha. Though some of these measures
might be considered redundant, the multi-disciplinary nature of this study warrants reporting
statistical methods appropriate to all of the disciplines that intersect this investigation. One
special case is used to highlight the limitations of the program, and the data resulting from
this evaluation are presented in Section 5.5.5.
In addition to the results presented here, more detailed data are included in Appendix
E. These include individual case-by-case statistical analyses of each of the courses used in
this study. All of these results are then discussed in the context of the other studies and
literature in Chapter 6, with an explicit set of novel scholarly contributions presented in
Chapter 7.
5.5.1 Courses Selected for this Study

The researcher has developed complete wordlists for over 30 different courses in
engineering, but only a subset of 10 courses were used in this part of the study, as presented
in Table 4 below. The 10 exams were chosen based on breadth and year of study within
engineering. The intent was to choose courses so that all engineering disciplines at the
University of Toronto would have representation, from a sample across first-year (freshman)
to fourth-year (senior) undergraduate studies. The left column shows the course name and
course code, the second column shows the discipline that exam belongs to, the third column
shows the comparator set used to process the exam, and the fourth column identifies the year
of study in which this exam was administered.
Table 4 - Shows the course exams used for the evaluation study
Course Name and Code | Discipline | Comparator Set | Year of Study
Engineering Strategies and Practice I, APS111 (x2 participants) | Common across Engineering | APS: Applied Practical Science | 1
Advanced Reactor Design, CHE412 | Chemical Engineering | CHE: Dept. of Chemistry and Chemical Engineering | 3 and 4
Construction Management, CIV280 | Civil Engineering | CIV: Dept. of Civil Engineering | 2 and 3
Electrical Fundamentals, ECE110 | Common across Engineering | ECE: Dept. of Electrical and Computer Engineering | 1
Electric and Magnetic Fields, ECE221 | Electrical and Computer Engineering | ECE: Dept. of Electrical and Computer Engineering | 2
Introduction to Psychological Science for Engineers, MIE242 | Industrial Engineering | MIE: Dept. of Mechanical and Industrial Engineering | 2
Operations Research I, MIE262 | Industrial Engineering | MIE: Dept. of Mechanical and Industrial Engineering | 2 and 3
Design and Analysis of Software Systems, MIE350 | Industrial Engineering | MIE: Dept. of Mechanical and Industrial Engineering | 3 and 4
Introduction to Materials Science, MSE101 | Common across Engineering | MSE: Dept. of Materials Science and Engineering | 1
Mechanics, CIV100 | Common across Engineering | CIV: Dept. of Civil Engineering | 1
5.5.2 Sample Dataset for One Trial

Sample results for one trial of the study are shown in Table 5 below. This table is
reproduced from a conference paper written by the researcher and reprinted in Appendix
A.7. The table shows the results for a first-year electrical fundamentals course, and the
correlation between the TF-IDF binned-quintile scores and the human-participant scores.
The first column, on the left, shows the rank of each word in decreasing order of TF-IDF
score. The second column shows the word, while the third column shows the TF-IDF score.
The fourth column from the left shows the score out-of-five assigned by the human subject-
matter expert, and the last column shows the quintile-bin into which the word was sorted
based on TF-IDF score. The rows are colour-coded to match the quintile number. Two
correlations (explained in more detail in Section 5.6.1) are presented at the top-right corner
of the table. The correlation at the top, in yellow, is between the participant-assigned score
column and the quintile-bin column. The one immediately below shows the correlation
between those same columns, but only for the words in quintiles ‘1’ and ‘5’.
Table 5 - Shows a sample trial wordlist from a first-year electrical fundamentals course.

100-word CORRELATION (using full 5-pt scale): 0.7165
100-word CORRELATION (using only extremes of 5-pt scale): 0.9272

RANK (/100) | WORD | TF-IDF SCORE | PARTICIPANT-ASSIGNED SCORE (/5) | QUINTILE-RANK (/5)
1 | circuit | 0.033323128 | 5 | 5
2 | voltage | 0.015487884 | 5 | 5
3 | electric | 0.014911103 | 5 | 5
4 | capacitor | 0.009280436 | 5 | 5
5 | resistor | 0.00906219 | 5 | 5
40 | result | 0.000262347 | 3 | 4
41 | motor | 0.000260432 | 3 | 4
42 | discontinuous | 0.000254686 | 3 | 4
43 | tesla | 0.000239045 | 5 | 4
44 | deactivated | 0.000227847 | 3 | 4
70 | associated | 0.000121868 | 3 | 3
5.5.2.1 Observations from the Trial

There are several items to note in Table 5. First, the words near the top of the list
appear to be more discipline-specific than the words near the bottom of the list. This
observation is validated by the human subject-matter expert – the faculty member who taught
this course.
The data show that the correlation is dependent on whether the full 5-point scale is
used or if only the ‘1’ and ‘5’ quintile bins are used. If the full scale is used for the
correlation, it is 0.717. However, if only the 1 and 5 quintiles are examined, the correlation
is much higher, at 0.927. A perfect agreement between the computational approach and
human subject-matter expert would mean that the correlation is ‘1’. The observation about
using the extremes of the scale to achieve a higher correlation is typical for all eleven trials.
(Table 5, continued)
71 | respectively | 0.000121452 | 1 | 3
72 | half | 0.00011827 | 1 | 3
73 | results | 0.000117417 | 3 | 3
74 | losses | 0.000112727 | 4 | 3
81 | cannot | 2.31533E-05 | 1 | 2
82 | indicate | 2.30839E-05 | 3 | 2
83 | generated | 2.05447E-05 | 3 | 2
84 | difficulty | 2.03236E-05 | 1 | 2
85 | right | 1.88357E-05 | 1 | 2
91 | inside | -9.57969E-05 | 1 | 1
92 | variety | -0.000101296 | 1 | 1
93 | of | -0.000115615 | 1 | 1
94 | at | -0.000124816 | 1 | 1
95 | place | -0.000125485 | 1 | 1

The data suggest that there is a good correlation between the computational approach
and the human participant for the case shown above. So, in the context of the sample course
chosen above, it appears that the computational method is very likely to accurately identify
discipline-specific vocabulary on this particular target document.
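The two correlations reported in Table 5 differ only in which rows they include. The filtering step can be sketched as follows (with illustrative data in the usage below, not the trial above):

```python
def pearson_r(x, y):
    """Pearson correlation, used here to compare the two variants."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def extremes_only(quintiles, scores):
    """Keep only the word pairs whose quintile bin is 1 or 5, mirroring the
    second correlation reported in Table 5."""
    kept = [(q, s) for q, s in zip(quintiles, scores) if q in (1, 5)]
    return [q for q, _ in kept], [s for _, s in kept]
```

Running `pearson_r` on the unfiltered lists gives the full-scale figure; running it on the filtered lists gives the extremes-only figure.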
5.5.3 Summary of Quantitative Results for All Courses
5.5.3.1 Summary of Participant Scores

Figure 12 below shows the data collected from the participants as plotted on a bar
chart. This chart summarizes and presents information about which participant scores were
assigned to which quintile, across all courses. The horizontal axis along the top shows the
computed quintile score for the words. The horizontal axis along the bottom shows the
participant scores for the words. The vertical axis represents the total number of times that
quintile/participant-score combination was recorded.
The data clearly show that the high-quintile words are frequently ranked a ‘5’ by the
experts. Furthermore, a quintile score of ‘1’ is frequently scored a ‘1’ by the subject-matter
experts. The experts also frequently ranked the words in intermediate quintiles – ‘2’, ‘3’, and
‘4’ – with a score of ‘1’. This means the experts displayed a tendency to rank words in
quintile ‘5’ very highly (i.e. ‘5’), but scored essentially all of the other words very low (i.e.
‘1’), i.e. the participants employed a bimodal scoring strategy. They under-utilized scores of
‘2’, ‘3’, and ‘4’, and overused a score of ‘1’. This might suggest a bias that over-emphasizes
the low score of ‘1’ over intermediate scores.
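The counts behind Figure 12, and the bimodal-scoring observation, can both be derived from a simple cross-tabulation. This is a sketch using invented data, not the study's:

```python
from collections import Counter

def crosstab(quintiles, participant_scores):
    """Count how often each (quintile, participant-score) combination was
    recorded, i.e. the bar heights plotted in Figure 12."""
    return Counter(zip(quintiles, participant_scores))

def extreme_fraction(participant_scores):
    """Fraction of scores at the extremes ('1' or '5'); a high value
    reflects the bimodal scoring strategy noted above."""
    hits = sum(1 for s in participant_scores if s in (1, 5))
    return hits / len(participant_scores)
```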
Figure 12 - Shows the relationship between quintile and participant-assigned scores. The top horizontal axis is the quintile score, and the bottom horizontal axis is the participant-assigned score. The vertical axis represents the count for the number of times that combination was made.
5.5.3.2 Summary of Participant Scores versus Computational Scores

The data for each of the 11 trials are shown in Figure 13 and Figure 14. Both of
these figures show the distribution of scores across all exams, including APS111 which was
used twice. Both figures are structured the same, but whereas Figure 13 shows the subject-
matter expert scores, Figure 14 shows the TF-IDF binned-quintile scores. The horizontal axis
shows the course code of the exam. Each of the coloured bars represents the participant score
(Figure 13) or the quintile score (Figure 14). The colours of these bars represent whether
the score is a ‘1’ (blue), ‘2’ (green), ‘3’ (beige), ‘4’ (purple), or ‘5’ (yellow). The height of
these bars, the vertical axis, is the count of the number of times that score was assigned.
Figure 13 shows that the blue bars are almost always the highest bar for each course.
This again shows that participants are over-utilizing a score of ‘1’, and under-utilizing the
intermediate numbers.
A more in-depth look at Figure 13 shows that MIE262 and MIE350 have a
considerable number of words that were ranked a ‘1’ by their subject-matter expert. These
courses are “Operations Research 1” and “Design and Analysis of Software Systems”, and
subsequently also have among the lowest correlation values between the computational
approach and subject-matter expert (see Appendix E). While the first course has a large
design component, the second uses very specialized sets of symbols in that exam. The
former case is investigated in more detail using a proxy first-year design course case study
presented in Section 5.5.5, while the effects of the latter can be explained by poor input
conditioning of words used in the exam. This input conditioning step, as discussed in
Appendix B.1, is incapable of accurately processing exams with symbols and special
characters, and this likely leads to an inaccurate wordlist, resulting in the high count of ‘1’
scores.
Figure 13 also shows that the second highest count occurs when the participant-
scores are a ‘4’ or ‘5’. This suggests that participants are actively finding discipline-specific
vocabulary on their wordlists. This also corresponds to Figure 14, where ‘4’ and ‘5’ scores
also have a high count. A visual comparison shows that there is agreement between the
computational scores and the human subject-matter expert scores for these cases, and is a
visual representation of correlation (discussed quantitatively in Section 5.5.4).
Figure 13 - Shows the count of participant scores for all exams grouped by exam
Figure 14 - Shows the count of TF-IDF binned-quintile scores for each exam
The appearance of a double-sized APS111 dataset in Figure 13 and Figure 14 is
because that course was scored by two subject-matter experts. When the data for this course
are compared to the other courses, a high count of ‘1’ scores in Figure 13 suggests that both
experts have found minimal discipline-specific vocabulary in that course. Figure 14,
however, would suggest that the majority of words should be ranked a ‘4’ or ‘5’. This
difference is further investigated in Section 5.5.5.
5.5.4 Statistical Analysis

Statistics are used to measure the correlation between the calculated quintile scores
and the scores assigned by the human subject-matter experts. Tables 6-10 below present
the results of several statistical methods used to measure this correlation.
The symmetric measures, Table 6, show the Pearson’s R value as 0.570, and the
Spearman Correlation as 0.578, which are both significant. These clearly show that there is
strong agreement between the computational approach and the scores assigned by the
experts. These statistics are widely-used in the literature to describe the quality of
experimental results, and a correlation of 0.570 is considered an acceptable degree of
correlation.
Pearson’s Chi-Square statistic, Table 7, also shows a significant result for the 1100
cases, Table 8. This measure signifies that it is extremely unlikely that participants selected
a random number for each word. It shows us that the participants used
considerable judgment in assigning scores to words and that the scores they assigned are
unlikely to be due to chance.
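Pearson's Chi-Square is computed from the contingency table of quintile versus participant score. The following is a minimal sketch of the statistic itself; the counts in the usage test are illustrative, not the thesis data:

```python
def chi_square(table):
    """Pearson's chi-square statistic for a contingency table given as a
    list of rows. Expected counts come from the row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```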
Cronbach’s Alpha (Table 9) and the Kruskal-Wallis test (Table 10) are used for
additional statistical validation. The Cronbach’s Alpha statistic assesses internal consistency,
and returns a value of 0.714, which is an acceptable result. The independent-samples
Kruskal-Wallis Test, at a 95% confidence interval, also returns a consistency result that is
congruent with Cronbach’s Alpha, clearly demonstrating that the results are not because of
chance.
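Cronbach's Alpha for the two "items" here (the quintile bin and the participant score) follows the standard formula. This is a sketch, not the SPSS implementation:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item).
    With two items, as in this study, this mirrors the two-item
    reliability reported in Table 9."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    k = len(items)
    item_var = sum(variance(item) for item in items)
    totals = [sum(vals) for vals in zip(*items)]
    return k / (k - 1) * (1 - item_var / variance(totals))
```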
By using all of these measures, the data quantitatively demonstrate a clear and
convincing correlation between the computational approach and the human subject-matter
experts. The test statistics presented are widely-used in related fields to test data and
measure similarities. When applied to this research, these statistics show that the data are
valid and experimentally consistent, supporting the deduction that the computational approach works to
characterize discipline-specific vocabulary on engineering exams.
Table 6 - Symmetric Measures

 | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Interval by Interval, Pearson's R | .570 | .019 | 22.965 | .000(c)
Ordinal by Ordinal, Spearman Correlation | .578 | .021 | 23.465 | .000(c)
N of Valid Cases | 1100 | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Table 7 - Chi-Square Tests

 | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 482.834(a) | 16 | .000
Likelihood Ratio | 501.727 | 16 | .000
Linear-by-Linear Association | 356.597 | 1 | .000
N of Valid Cases | 1100 | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.10.
Table 8 - Case Processing Summary

                     N     %
Cases  Valid         1100  100.0
       Excluded(a)   0     .0
       Total         1100  100.0

a. Listwise deletion based on all variables in the procedure.
Table 9 - Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.714               .726                                           2
Table 10 - Hypothesis Test Summary
5.5.5 Special Case: the statistical effect of using a design-heavy exam

The quantitative and numerical data for all of the exams, along with their course-
specific correlations (Pearson’s R, etc.), are shown in Appendix E. These data suggest that
some courses, in particular ones that have high technical content, have much higher
correlations between subject-matter experts and the computational method. Conversely,
courses that have a comparatively high design-content or are intended to cut across
disciplines appear to have a lower correlation between participant scores and quintiles from
the computational method.
In order to better characterize this observation, the researcher investigated a design-
heavy course using two different subject-matter experts – both of whom have taught this
course recently. This special case was used to check if the issue of low correlations was due
to:
1. the target document representing a design course (see Section 5.5.5.1);
2. the participants scoring the study (see Section 5.5.5.2); or,
3. the comparator set being used (see Section 5.5.5.3).
The researcher first compared the statistical correlation across all courses studied
(Pearson’s R = 0.570), and again a second time, but now excluding the APS111 design-heavy
course (Pearson’s R = 0.587). The implications of removing APS111 from the dataset are
shown quantitatively using Pearson’s R, Spearman’s Correlation, Pearson’s Chi-Square, and
Cronbach’s Alpha in Tables 11-14 below. When comparing to Tables 6-10, we see that the
Spearman’s Correlation increases from 0.578 to 0.598, Pearson’s Chi-Square remains
significant, and Cronbach’s Alpha increases from 0.714 to 0.728. This shows that almost all
statistics improve when the design-heavy course is removed from the dataset.
Table 11 - Symmetric Measures (APS111 omitted)

                                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Interval by Interval   Pearson's R           .587    .020                   21.731         .000(c)
Ordinal by Ordinal     Spearman Correlation  .598    .023                   22.357         .000(c)
N of Valid Cases       900

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Table 12 - Chi-Square Tests (APS111 omitted)

                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             435.246(a)  16   .000
Likelihood Ratio               453.649     16   .000
Linear-by-Linear Association   309.833     1    .000
N of Valid Cases               900

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 6.70.
Table 13 - Case Processing (APS111 omitted)

                     N    %
Cases  Valid         900  100.0
       Excluded(a)   0    .0
       Total         900  100.0

a. Listwise deletion based on all variables in the procedure.
Table 14 - Reliability Statistics (APS111 omitted)

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.728               .740                                           2
5.5.5.1 Course-specific Correlations

The increased correlation and reliability statistics suggest that the computational
approach is less effective at processing design-heavy exams than traditional “fact and
principle” courses (e.g., electrical fundamentals). Table 15 below shows a condensed form of
the course-by-course correlations presented in Appendix E.1. The left column shows the
name of the course, the middle column states whether that course is design-based, and the
right column shows the Pearson correlation between the participant score and the quintile
score. This table shows that design courses tend to have lower correlations
than non-design courses.
Table 15 - Shows that design courses appear to have lower correlations than non-design courses. Extracted from data presented in Appendix E.1
Course Code               Design-based Course? (Yes/No)   Pearson Correlation (participant score vs. quintile)
APS111 – Participant #1   Yes                             0.511
APS111 – Participant #2   Yes                             0.503
CHE412                    Yes                             0.642
MIE262                    Yes                             0.452
MIE350                    Yes                             0.319
CIV100                    No                              0.625
CIV280                    No                              0.621
ECE110                    No                              0.717
ECE221                    No                              0.749
MIE242                    No                              0.717
MSE101                    No                              0.785
5.5.5.2 Inter-rater Correlation

To test whether the choice of participant affects the scores assigned to words, the researcher
measured the correlation between two participants scoring the same exam using the same
comparator set. Results from this analysis are shown in Table 16 below. The table
presents the APS111 exam with the word shown in the first column on the left, the score
assigned by participant #1 in the second column, and the score assigned by participant #2 in
the third column. This is a 10-word snapshot of the larger list, presented in Appendix E.2.
Pearson’s R correlation between participant #1 and participant #2 is 0.68, suggesting that the
low scores are not due to differences in people scoring the exam.
Table 16 - Shows a sample of the scores for the APS111 exam, as assigned by two different participants. The correlation for the full list is 0.68.

WORD            Score Assigned by Participant #1   Score Assigned by Participant #2
accessibility   4                                  2
administered    2                                  1
allocation      2                                  1
allowed         4                                  1
alternatives    5                                  5
aurora          1                                  1
automobile      2                                  1
biogas          4                                  3
brainstorming   5                                  5
caffeinated     2                                  1
5.5.5.3 Effect of Comparator Set on TF-IDF values

Quintiles may not be an accurate measure of how discipline-specific a word ought to
be. The TF-IDF value, however, may be a better indicator of this. The analysis performed in
this section observes the misalignment between TF-IDF value and quintile score. It also
checks if this misalignment can be characterized by the comparator set being used to
compute the TF-IDF scores.
Quintiles are used to group TF-IDF scores together. The highest TF-IDF values on a
wordlist are assigned a higher quintile, and the lower TF-IDF values are assigned lower
quintiles. There may be instances where the highest TF-IDF value of one exam is much
lower than the highest TF-IDF value of another exam. However, the quintiles for top scores
on both exams will be the same. This causes a problem when participants score the exam.
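The quintile grouping described above can be sketched as follows. This is a hypothetical helper, not the author’s implementation; it illustrates why the mapping is relative to each exam’s own wordlist, which is the source of the cross-exam misalignment.

```python
def assign_quintiles(tfidf_scores):
    """Map each word's TF-IDF score to a quintile: 1 for the lowest fifth
    of this wordlist, 5 for the highest.  Because ranks are relative to a
    single exam's wordlist, the top words of two different exams both
    receive a '5' even when their absolute TF-IDF values differ greatly."""
    ranked = sorted(tfidf_scores, key=tfidf_scores.get)  # ascending by score
    n = len(ranked)
    return {word: (i * 5) // n + 1 for i, word in enumerate(ranked)}

# Ten words with strictly increasing scores: two land in each quintile
scores = {"w%d" % i: i / 100 for i in range(10)}
quintiles = assign_quintiles(scores)
assert quintiles["w0"] == 1 and quintiles["w9"] == 5
```

Note that the function discards the absolute magnitude of the scores entirely, so a top-quintile word on a low-scoring exam and a top-quintile word on a high-scoring exam look identical to a participant.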
For the case of the APS111 design course, both subject-matter experts tended to
assign low scores to most words. This happened even though the majority of words
were ranked high (‘4’ and ‘5’) according to the quintiles. This may indicate the presence of a
misalignment between the TF-IDF values and the quintiles assigned to each word. One way
of checking this misalignment is by processing two exams through the same comparator set
and observing the difference, if any, that this makes to the TF-IDF value.
The comparator sets being used are APSC (applied practical science, which is a
collection of exams not specific to any discipline), and ALLENGR (all exams in
engineering). The data produced from this procedure are two wordlists having different
words and TF-IDF scores, as presented in Table 17 below. The table shows sample words
from the top quintile for two courses, ECE110 (electrical fundamentals), and APS111
(Engineering Strategies and Practice I). The ECE110 course has been processed once
‘normally’ using its existing comparator set (Electrical Engineering, EE), and again using the
APSC comparator set. The first and second columns show the ECE110 course, with words in
decreasing order of TF-IDF score using the EE and APSC comparator sets, respectively. The
right column shows the APS111 course, computed using the APSC comparator set, with
words in decreasing order of TF-IDF score. This table is used to see the kinds of words that
appear near the top of each of the lists, and their corresponding TF-IDF scores.
The data in Table 17 show that the electrical fundamentals course does not appear
to have discipline-specific words when using the APSC comparator set. Instead, the words at
the top of the list appear to be ‘general’ instead of discipline-specific to electrical
engineering. This is different than the wordlist produced using the Electrical Engineering
(EE) comparator set, as shown in the first column. Specifically, the APSC comparator set
appears to have eliminated the presence of discipline-specific words for the ECE110 course.
Additionally, the ECE110 wordlist has slightly lower TF-IDF scores (magnitude of 10^-3)
using the APSC comparator set compared to using the EE comparator set (magnitude ranges
from 10^-2 to 10^-3, see Table 5). The low TF-IDF scores of the ECE110 exam are also similar
to the low TF-IDF scores on the APS111 exam. It is plausible that the low overall TF-IDF
scores for both exams might be a result of using the APSC comparator set of documents
instead of using a more discipline-specific comparator set.
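As a hedged illustration of how the choice of comparator set drives these scores, a textbook TF-IDF computation might look like the sketch below. This is not the modified algorithm of Section 5.2.1; the function name and input layout are hypothetical.

```python
import math
from collections import Counter

def tfidf_wordlist(target_words, comparator_docs):
    """Score each word of a target exam against a comparator set.

    Textbook TF-IDF for illustration only; the thesis uses a modified form.
    target_words: list of tokens from the target exam.
    comparator_docs: list of sets of tokens, one per comparator document.
    """
    tf = Counter(target_words)
    n_docs = len(comparator_docs)
    scores = {}
    for word, count in tf.items():
        # Document frequency: how many comparator documents contain the word
        df = sum(1 for doc in comparator_docs if word in doc)
        idf = math.log((n_docs + 1) / (df + 1))  # smoothed to avoid div-by-zero
        scores[word] = (count / len(target_words)) * idf
    return scores

# A word absent from every comparator document outscores one present in all of them
docs = [{"the", "circuit"}, {"the", "voltage"}, {"the", "design"}]
scores = tfidf_wordlist(["the", "inductor", "inductor"], docs)
assert scores["inductor"] > scores["the"]
```

The sketch makes the dependence explicit: the same target exam produces different wordlists, and different absolute score magnitudes, under different comparator sets, which is the behaviour observed for ECE110 above.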
Table 17 – Shows the top quintile when the ECE110 course wordlist is developed using the EE and APSC comparator sets of documents, and when compared to the APS111 course.

ECE110 Using EE             ECE110 Using APSC            APS111 Using APSC
Comparator Set              Comparator Set               Comparator Set
circuit     0.033323        fundamentals  0.003368       aps              0.006177
voltage     0.015487        sourc         0.003010       dichloromethane  0.006044
electric    0.014911        source        0.002984       ethic            0.005329
capacitor   0.009280        question      0.002224       stake            0.00452
resistor    0.009062        part          0.002107       identify         0.004097
current     0.007712        rint          0.001989       ethics           0.00402
switch      0.005743        print         0.001845       human            0.003983
power       0.005706        vo            0.001539       life             0.003967
magnetic    0.005631        field         0.001397       residuals        0.003941
inductor    0.004922        printed       0.001372       lemessurier      0.003846
Neither the APS111 exam nor the APSC comparator set belongs to a specialized
engineering discipline. Instead, they fall under a category of undergraduate engineering
exams that is “general engineering”. It appears that the modified TF-IDF algorithm had
difficulty in detecting characteristic vocabulary on APS111 using this comparator set.
Additionally, this approach produced a wordlist that was unable to characterize the
vocabulary accurately, and this was reflected in the low scores assigned by both subject-
matter experts.
5.6 DISCUSSION OF STUDY

This section builds on the discussion presented in the 2013 and 2014 ASEE and
CEEA papers reprinted in Appendix A.4-A.7, and is specific to the studies presented in
Section 5.3 and Section 5.4. This section uses the results presented in Section 5.5 and
discusses the findings in more detail.
5.6.1 Correlation

The results from Section 5.5 show that the computed scores and the expert-assigned
scores are correlated. Correlation indicates that there is agreement between multiple sets of
data. Though there are different measures of correlation, e.g. Pearson’s R and Spearman’s
Correlation, the idea is to use this statistic to quantitatively identify similarities and
differences among groups of data. This section examines cases where the correlation was
high and where it was low.
The goal of this study is to have the highest correlation possible, because this
indicates that the computational approach is successful in identifying characteristic
discipline-specific vocabulary on exams. As shown in Table 6, the overall correlation
between human subject-matter experts and the computational approach is 0.570.
5.6.1.1 Design Courses

Some courses have a higher correlation between computed score and expert-assigned
scores than others do. Results showing course-specific correlations are located in
Appendix E.1 and Table 15. The breadth of courses enables us to see the effects of
individual exams on the overall correlation between the computer-assigned scores and
human-assigned scores. One such effect is that exams that are design-based appear to have
lower correlations than traditional “fact and principle” courses, like electrical fundamentals.
The implications of the comparator set on TF-IDF score can be quantitatively
explained using the modified algorithm presented in Section 5.2.1. If design-based courses
have less common vocabulary among each other, then the T_{D,W} value is small.
This impacts the TF-IDF score by reducing the effect of the IDF part of the equation. By
extension, this means that comparator sets sharing common vocabulary are essential to
increasing the accuracy of the modified TF-IDF algorithm used to characterize discipline-
specific vocabulary.
5.6.1.2 Scoring

Data about inter-rater correlation and participant scores can be investigated together
to better evaluate the computational approach. The inter-rater correlation using the Pearson
R statistic is 0.68 (as shown in Table 16). This suggests that participants agree with one
another. In addition, the counts of ‘1’ and ‘5’ scores are much higher than the counts for
intermediate scores (see Figure 13). Also, the correlation between participant-score and
quintile is higher when only the quintile scores of ‘5’ and ‘1’ are considered (as shown in
Section 5.5.2). This may suggest that the subject-matter experts had a tendency of being
dichotomous when evaluating whether the computational approach works or not.
5.6.2 Implication of Results on Empirical Process

This study substantiated the importance of calibrating the subject-matter experts.
Initially, the first participant was tasked with scoring the words in the study using an
electronic version of the 100-word sample survey. The robustness of the 5-point scale
calibration strategy was tested using this medium and it was found that the full resolution of
this scale was not consistently used when scoring the words. As a result, the experimenter
adjusted the strategy to focus more attention on the resolution of the scale, in turn promoting
a greater spread of responses. The improved strategy included changing the study from
being electronic-based to in-person based. This helped control the contact time that each
participant had with the surveys, while promoting two-way communication to clarify
instructions as necessary. This strategy may have contributed to making the data reliable
(as seen in Section 5.5.4). The results from each of the exams show that the reliability of
using the scale appears approximately consistent even though there are multiple participants
each scoring different exams. Extending this to more exams would likely produce similar
results, as the 10-exam dataset used is a representative sample of the existing engineering
curriculum at this institution.
5.6.3 Insight on Potential Impact on Teaching and Learning

All participants commented positively on the usefulness of providing an explicitly-
defined vocabulary list to their students, and the practical benefits of designing a tool to help
in this regard. Some participants were more inclined to use these lists in their courses than
others. Instructors of design courses suggested that the word lists produced for their courses
did not appear to be ideally suited to capture discipline-specific vocabulary, and as such may
be less likely to be used as a teaching aid. The instructors of more traditional “fact and
principle” engineering courses, however, reacted more positively to these lists, commenting
on the high accuracy with which their list captured discipline-specific language.
In general, this part of the research suggests that a novel approach based on a
modified TF-IDF keyword-search algorithm can be used to identify characteristic vocabulary
on most engineering exams, and is likely to be used by most instructors as a teaching tool for
traditional “fact and principle” engineering courses.
6 DISCUSSION

This chapter discusses the outcomes and implications of all three studies (Chapters 3,
4, and 5) and presents three major contributions resulting from this work (Sections 6.1, 6.2,
and 6.3). These contributions are framed using the three research questions articulated in
Chapter 1, and discussed in the context of the literature presented in Chapter 1 and
Chapter 2. This chapter also builds on the existing discussion presented in the papers
written by the author and reprinted in Appendix A.1-A.9.
Three research questions, articulated in Chapter 1, are used to frame the discussion
of the results and the major contributions. These research questions are:
1. Do language-related learning barriers exist in engineering education?
2. If language-related learning barriers exist, can they be characterized?
3. Can an effective strategy be found or developed to assist in the identification and
characterization of these learning barriers?
A combination of literature analysis and empirical study were used to answer these
questions. The first research question was largely addressed by the first study (see Chapter
3), whereas the second and third questions were addressed by the studies presented in
Chapters 4 and 5. The answers to these questions are presented in the form of contributions.
6.1 RECOGNITION OF ENGINEERING VOCABULARY AS AN ACCESSIBILITY BARRIER

The outcome of the study presented in Chapter 3 answers the first research question.
Literature from the areas of disability studies, higher education, and second-language
learning all contributed to the development of a research program to address this question
(see Section 1.1, 2.1, 2.2, and Appendix A.1, A.2).
Final exams are a representative artifact of the engineering learning environment.
They need to be clearly written so that all students can understand and respond to questions
as accurately and precisely as possible. Using foreign, fossilized, and misunderstood
vocabulary on such assessments may decrease clarity in communication, especially if this
vocabulary is not an explicit learning outcome of the course [55, 63, 98].
Research shows that inaccessible vocabulary is present on engineering final exams,
and that this language falls into a student’s ‘blind-spot’ (see Chapter 1 for discussion of
the Johari Window and blind-spot). The research study presented in Chapter 3 and published
in the journal paper reprinted in Appendix A.2 shows evidence to support this claim.
Specifically, the research shows that multiple undergraduate students of diverse backgrounds
and post-secondary educational levels are unable to accurately self-assess their mastery of
vocabulary on summative assessments. This suggests that students do not understand
vocabulary as well as they ought to, and are sometimes unable to accurately gauge their
learning. Therefore, vocabulary learning in engineering education falls into a student’s blind
spot, and can reduce their capacity to learn and master appropriate technical vocabulary. If
students are unable to see this obstacle to understanding, as discussed in the literature
reprinted in Appendix A.1, then this demonstrates the presence of an invisible learning
barrier. Hence, the research finds that engineering vocabulary is an invisible barrier to
accessibility in engineering education.
6.2 CREATION OF AN APPROACH TO IDENTIFY CHARACTERISTIC DISCIPLINE-SPECIFIC VOCABULARY IN ENGINEERING EDUCATION

6.2.1 The Role of Technology in Vocabulary Characterization

Technology can be used as a tool to analyze and measure different aspects of
language. This research shows an approach where frequency analysis of vocabulary is tested
to help characterize differences in language use. According to authors in the field of
keyword generation, software tools are becoming more integral to language analysis due to
the computational advantages of accuracy, precision, and timeliness [113, 116, 123-126, 132,
137, 139]. With more advanced hardware, for example, computers are increasingly able to
compute fine-grained differences in language use. The literature states that although
frequency analysis and the TF-IDF vocabulary characterization approach measure different
aspects of language, both can be employed using a computational strategy [102, 124-126,
132].
6.2.2 Empirical Contribution

In the context of this research study, the successful design of an approach to identify
characteristic discipline-specific vocabulary depends on this approach being feasible and
accurate. Feasibility refers to the capacity for processing data quickly, with minimal effort,
and reliably. Accuracy refers to the proximity of the results to a true value [105], where in
this case it would be a vocabulary list carefully developed by a subject-matter expert. For
this dissertation, the computational process designed and deployed was able to systematically
score words using a modified algorithm based on the TF-IDF equation, and this was
reproducible across datasets as well as correlated to subject-matter expertise.
The computational process systematically scored words on a large dataset within a
reasonable amount of time. In this study, the process computed TF-IDF scores by cross-
calculating several million words across multiple comparator sets of documents. It is a
feasible process because it takes minimal user input and can now take less than an hour to
produce a wordlist. Comparing this to a manual approach, a person would take much longer
to perform the same task. Overall, the computational strategy demonstrates increased
efficiency over a “manual” approach for processing quantitative aspects of vocabulary, e.g.
word frequency, in multiple documents across large datasets.
Employing a computational vocabulary analysis approach over a manual human-
based strategy may reduce accuracy. An individual may characterize vocabulary based on
prior experience or subject-matter expertise, and this is not computable by the program.
Language is a field that is affected by human-interpretation. So, there are aspects of
language that are not accounted for by the computational approach. For example, the student
self-assessment study of Chapter 3 is sufficiently broad to also examine detailed
characteristics of inaccessible language. Additional accuracy can be achieved by
investigating: the order of words, presentation of text, graphical elements, and so on. Similar
arguments can be made for the depth of other studies in this dissertation as well. However,
the focus of this dissertation is to identify and describe an approach that characterizes text on
documents, and is feasible given a reasonable trade-off with accuracy.
As shown in the results (Section 5.5), the data show an acceptable correlation between
the outcomes of the computational approach and the human subject-matter experts. This
correlation, together with the respective reliability measures, demonstrates an appropriate
balance between feasibility and accuracy. The data show that it is possible to reasonably mimic
subject-matter expertise to a significant degree with a technology-based computational
strategy to characterize vocabulary in engineering education. With respect to the research
question, this dissertation shows that vocabulary barriers in engineering education can begin
to be addressed using a computational approach.
6.3 IMPLICATIONS OF THE APPROACH ON TEACHING AND LEARNING
6.3.1 Converging Perspectives from the Literature

Characterizing vocabulary in engineering education using an assistive tool has
implications for reducing learning barriers in the classroom. By looking at the overlap of
literature in several areas, the effect of this tool becomes increasingly clear. Literature on
good educational practice shows that clearly defined learning outcomes can lead to
more usable and higher-quality learning [23, 53, 55, 159]. In parallel, literature in the area of
accessible design shows that clear instructions lead to greater usability of a product or service
[54, 61, 160]. In areas related to public policy, frameworks for accessibility indicate that
increasing accessibility may also increase inclusivity [49, 56, 161].
6.3.2 The Development of Teaching Aids

The results of this research can be used to develop teaching aids that can reduce a
specific learning barrier in engineering education. According to the research performed, it
appears that the learning barrier associated with technical vocabulary can be reduced by
explicitly making this vocabulary visible to students. This research can help instructors
develop wordlists that they can distribute to their students to actively promote the
development of a robust professional vocabulary. These wordlists can explicitly show the
student the vocabulary that they need to master, thereby increasing the visibility of the
learning barrier, and helping students become aware of what they need to know.
Using the concept of the blind-spot from the Johari Window, the outcomes of the
research demonstrate an approach that can decrease an invisible learning barrier to increase
what is teachable and learnable. If students are given the requisite special vocabulary for a
course (invisible made visible), then they are better equipped to learn that vocabulary and be
assessed on their grasp of it (increasing teachability and learnability of vocabulary).
6.3.3 Producing a Research-based Artifact of the Application of UID

In this research, Universal Instructional Design (UID) is used to motivate the
development of a tool to increase accessibility in the engineering learning environment [23,
49, 53, 159]. The outcomes of this research satisfy many of the principles outlined in UID.
Using UID for this investigation contributes empirical evidence to the existing framework in
the context of engineering education. Table 18 shows an overview of how the research
outcomes address each of the principles of universal instructional design.
Table 18 - Shows the implications of the research using the framework of Universal Instructional Design

Universal Instructional Design Principle   Addresses (+) / No Change (NC)   Comment
Class Climate                              +                                Supports differences in communication and provides a way to converge corpora of technical vocabulary
Interaction                                +                                Encourages effective interaction between students and instructors around the use of technical jargon
Delivery Methods                           +                                Promotes multi-modal learning by not prescribing methods in which to learn the vocabulary identified
Information Resources and Technology       +                                Encourages instructors to produce and use engaging, accessible course material
Assessment                                 +                                Instructor and student have clearer understanding of mastery being assessed
Feedback                                   +                                Design explicitly defines mastery of vocabulary
Accommodation                              + (Systemic)                     Based on universal access, not individualized accommodation strategy
Physical Environments and Products         NC                               Design does not affect physical characteristics of the learning environment
Table 18 shows the implications of the research on the engineering learning
environment. The largest impact that this research could have on an existing engineering
classroom is that it helps recognize that all students have a diverse corpus of vocabulary.
This recognition is a step forward in respecting student differences in the classroom, and
potentially has the effect of making students feel more comfortable and included while being
different. Specifically, this research approach embraces diversity while promoting technical
vocabulary development.
Increasing clarity of requisite vocabulary has the effect of promoting clearer
communication between instructor and student, as well as between students. By extension,
increasing use of accurate technical vocabulary in one discipline may promote refinement of
one’s existing corpus of engineering language. This could, in addition, promote vocabulary
learning across disciplines due to overlapping terms. Therefore, clarifying requisite
vocabulary could lead to greater and higher-quality interaction in the classroom.
Clarifying requisite technical vocabulary and learning outcomes can also increase
flexibility in understanding course concepts. The instructor can now use advanced technical
terminology naturally during teaching. As an element of course material, students can seek
clarification from textbooks and other learning aids as well.
This technical vocabulary can be used to create engaging and accessible course
material and learning resources, delivered in an inclusive manner, based on authentic
contexts. Specifically, the language being used by the instructor will be more consistent with
the language that students are expected to know, resulting in increased quality of instruction.
In the context of STEM learning, specifically engineering, this vocabulary knowledge and
learning is key to the technical nature of the profession [97-99, 149, 150]. Also, instead of
guessing at whether students are correctly identifying and learning discipline-specific
vocabulary themselves, having this vocabulary explicitly-identified places the onus on the
student to learn. As such, the instructor can use these terms in the classroom knowing that
the students are aware that they will be asked to demonstrate their mastery of using them on
an assessment. This leads to another area where this research affects the learning
environment: student assessment.
Since the computational approach generates wordlists of characteristic discipline-
specific vocabulary, these wordlists can be employed for assessing student performance. In
particular, the instructor can construct authentic contexts using this advanced vocabulary while
knowing that students will have mastered it prior to the assessment being administered.
This improves the robustness of an engineering assessment instrument as the test is now
assessing what it purports to assess, and that is mastery of course concepts.
6.3.3.1 Systemic Accommodation Strategy

As this research is based on the foundations of Universal Instructional Design, the
tool is designed to increase accessibility for the greatest number of individuals and is not to
be considered an individual accommodation strategy. By generating a discipline-specific
wordlist that is the same for all students in the classroom, all students can take advantage of
explicitly-identified technical terms they need to know. This does not take the specific
previous understanding of each student into account, but rather serves to create a common
baseline that all students are expected to see.
The developed approach does not focus on accommodating individual-specific
learning. The same wordlist can be given to all students in that class, and all students are
expected to be familiar with the words on that list. By contrast, an individual strategy would
give personalized lists to each student and this would require a thorough understanding of
each learner’s characteristics and knowledge base. Individualized wordlists may be a
resource-intensive exercise especially if the class size is large. Therefore, developing an
institutional systemic strategy for promoting technical vocabulary learning by explicitly-
identifying learning outcomes is perhaps a more favourable approach in this context.
The research outcomes do not appear to change the physical characteristics
of the learning environment. The only physical outcome of this research study would likely
be the wordlists themselves, which can be distributed in an electronic form. This may
increase accessibility for students using technology for accommodation (e.g. screen readers)
to more easily access the course vocabulary.
6.3.3.2 Implications of a Research-based Artifact of UID

The literature suggests a framework for accessible instruction that supports second
language learning using an automated indexing-based software tool. The multi-disciplinary
research strategy employed here identifies learning barriers, tests an approach, and modifies
this approach to incorporate multiple comparator sets; the approach is then evaluated in the
context of engineering education by experts.
The outcomes of the research build on the existing literature and combine approaches
from different domains to produce a strategy for increasing the visibility of learning
barriers. By identifying and making visible the learning barriers associated with inaccessible
language, instructors explicitly identify some learning outcomes for their students.
The wordlists generated from this research process enable students to more accurately
identify and understand characteristic discipline-specific vocabulary. By using this systemic
approach, students have the opportunity to better understand the course material.
With respect to the research question, the results clearly show that a computational
strategy can be designed, prototyped, and successfully validated. This strategy increases the
visibility of learning barriers in the classroom by identifying and characterizing discipline-
specific vocabulary, and can be used to deploy teaching aids that increase accessibility to
engineering education.
7 CONCLUSIONS

This dissertation contributes to the improvement of learning environment design by
developing a process for characterizing one particular learning barrier present in engineering
education. The research discusses the design and evaluation of a computational approach
used to identify characteristic discipline-specific vocabulary on engineering final exams.
Contributions include recognizing engineering vocabulary as an accessibility barrier; creating
and validating an approach to characterize that barrier; creating course-specific learning
materials that can be directly used; and successfully applying the theory of universal
instructional design (UID) to the development process.
7.1 RESEARCH CONTRIBUTIONS
1. Contribution to Theory

Developed an application of UID theory; the application addresses inaccessible
vocabulary identification, learning, and evaluation in engineering education. This
helps to strengthen the framework by providing a successful case study.
2. Contribution to the Design of Learning Environments

Designed and validated a modified algorithm based on the Term Frequency-Inverse
Document Frequency (TF-IDF) equation to detect technical vocabulary.
Built a prototype software program to generate wordlists of characteristic discipline-
specific terms, usable as teaching and learning aids. This:
1. Increases visibility of vocabulary-related learning barriers;
2. Is usable by instructors to explicitly clarify learning outcomes; and,
3. Promotes the development of a robust discipline-specific vocabulary.
3. Important Findings

Identification of three areas for future review:
o measuring detailed characteristics of inaccessible vocabulary,
o investigating vocabulary on alternate instructional instruments, and
o expansion of software capabilities.
Undergraduate engineering students often over-rate their level of understanding of
technical vocabulary.
The language of design-heavy exams differs from the language used
on more traditional “fact and principle” engineering exams.
Carefully selecting comparator sets for use with the TF-IDF algorithm changes the
value of the scores, in turn changing the arrangement of words on the wordlist.
4. Contributions with respect to Recommendations for Future Practice

A software framework that can be refined to include additional functionality to
investigate language in engineering education, and improve technical language
acquisition.
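The scoring idea underlying the second contribution can be sketched in a few lines. The sketch below is illustrative only: the function name, tokenization, and IDF smoothing are assumptions for exposition, not the dissertation's modified algorithm, which additionally incorporates multiple comparator sets.

```python
import math
from collections import Counter

def tfidf_wordlist(course_doc, comparator_docs):
    """Rank words in a course document by TF-IDF against a comparator set.

    Words frequent in the course document but rare across the comparator
    documents score highest -- a proxy for characteristic
    discipline-specific vocabulary.
    """
    tokens = course_doc.lower().split()
    tf = Counter(tokens)
    n_docs = len(comparator_docs)
    comparator_vocab = [set(d.lower().split()) for d in comparator_docs]

    scores = {}
    for word, count in tf.items():
        # Document frequency: how many comparator documents contain the word
        df = sum(1 for vocab in comparator_vocab if word in vocab)
        idf = math.log((1 + n_docs) / (1 + df))  # smoothed inverse document frequency
        scores[word] = (count / len(tokens)) * idf
    return sorted(scores, key=scores.get, reverse=True)

course = "entropy enthalpy entropy gibbs energy the the process"
comparators = ["the cat sat on the mat", "energy of the process the"]
wordlist = tfidf_wordlist(course, comparators)  # "entropy" ranks first, "the" last
```

Because common function words appear in every comparator document, their IDF collapses toward zero, leaving the discipline-specific terms at the top of the list.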
7.2 LIMITATIONS

There are five main limitations of the research described in this dissertation: feasibility
versus accuracy, difficulties with respect to measurement, single-word processing,
human intervention, and inclusion of new words.
1. Feasibility vs. Accuracy

The research conducted for this dissertation attempted to navigate a tradeoff between
accuracy and feasibility. A highly accurate wordlist can be produced by a human subject-
matter expert, given enough time and resources. The subject-matter expert would have to
carefully consider each word for inclusion, and sort through a large quantity of vocabulary.
This process may need to be repeated with each course, as the evolution of language may
require new vocabulary to be introduced and old vocabulary to be retired. The computational
approach greatly reduces the strain on the instructor to produce the wordlist, and can produce
a wordlist in a reasonable amount of time. However, because it relies on mechanical
computation rather than on the expertise of the instructor, the computational method
can be less accurate. As such, whereas the subject-matter expert favours accuracy, the
computational approach favours feasibility. This is discussed further in Section 6.2.2.
2. Difficulties with respect to Measurement

Measurement-related challenges included establishing criteria for measurement and
isolating problems with respect to input conditioning. Understanding and measuring
implications of learning barriers is difficult because of the variability in assessing meaning
due to student differences. For example, assigning an Observed Understanding score to
students (see Chapter 3 and Appendix A.2) is a subjective measure and depends on the
researcher’s ability to map a length-constrained student response to a standard provided by a
dictionary. Furthermore, the transfer of words from different data files into a text-only
format introduces foreign artifacts due to the optical character recognition algorithm. Each
artifact is unique, and so developing a filter to clean the text file needs to incorporate as many
potential permutations as possible.
These limitations were partially addressed by using an integrative approach (e.g.,
multiple participants, multiple comparator sets) and by adjusting the input-conditioning
algorithms to condition the input vocabulary more accurately (e.g., removing words
containing irregular ASCII characters).
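As a hedged illustration of this kind of input conditioning, the filter below keeps only tokens composed of ordinary alphabetic ASCII characters (optionally hyphenated). The rule and the example artifact tokens are assumptions for illustration; the dissertation's actual conditioning algorithms may use different criteria.

```python
import re

# Hypothetical conditioning rule: a token is kept only if it is made of
# ASCII letters, optionally joined by hyphens. This discards OCR artifacts
# such as "str~in" (stray symbol) or "ﬁbre" (non-ASCII ligature).
TOKEN_RE = re.compile(r"^[A-Za-z]+(?:-[A-Za-z]+)*$")

def condition_tokens(tokens):
    """Remove tokens containing irregular or non-ASCII characters."""
    return [t for t in tokens if TOKEN_RE.match(t)]

raw = ["stress", "str~in", "ﬁbre", "load-bearing", "3D", "beam"]
clean = condition_tokens(raw)  # ["stress", "load-bearing", "beam"]
```

A rule this strict also drops legitimate tokens containing digits (e.g., "3D"), which is the same feasibility-versus-accuracy tradeoff discussed above.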
3. Single-word Processing

Prevailing concerns about single-word understanding centre around the generation of
meaning from neighboring words, the structure of language, polysemous vocabulary, etc.
This dissertation instead focuses on whether words can be correctly understood if shown in a
“bag of words” (BOW) model. The assumption is that words in a BOW model are less
accessible in isolation than when used as a phrase or sentence, because additional meaning
can be derived from structural placement. This simplification does not allow for a more accurate
understanding of the prevalence of the learning barrier. However, this is somewhat
addressed by the computational method because all words are equivalently considered as
single-words. Since all structural information for all words is lost uniformly, it is assumed
that all words have an equivalent chance of being inaccessible. Using the BOW model errs
on the side of caution by assuming that words are most inaccessible when seen in isolation.
The program does not address the situation of two or more neighboring words
combining to produce a technical concept. An example of this is the phrase “fibre bundle”,
where the individual words may not be discipline-specific, but the word group is. The
program is currently not able to recognize when a set of words that appear together represent
a single concept.
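One way a future version of the program might begin to detect such word groups is simple bigram counting: adjacent word pairs that recur throughout a document are candidate multi-word concepts. This is an exploratory sketch, not part of the dissertation's software; the threshold is an arbitrary assumption.

```python
from collections import Counter

def frequent_bigrams(tokens, min_count=2):
    """Count adjacent word pairs; pairs recurring at least min_count
    times may mark multi-word technical concepts such as 'fibre bundle',
    where neither word alone is discipline-specific."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return [" ".join(p) for p, c in pairs.items() if c >= min_count]

text = "a fibre bundle is a structure the fibre bundle projection"
bigrams = frequent_bigrams(text.split())  # ["fibre bundle"]
```

A production version would need to score candidate pairs against comparator sets, as is done for single words, rather than rely on a raw frequency threshold.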
4. Human Intervention

Reducing the full wordlists for each course into a 100-word survey given to each
participant may have introduced bias. Though the larger wordlist was binned into quintiles,
and the number of words chosen from each quintile for each exam was the same, the
researcher was responsible for selecting the words from each quintile for transfer into the
survey. Due to this selection, the researcher may have inadvertently inserted or eliminated
words due to subjectivity.
The 100 words could have been chosen randomly, but a random draw might have
included non-word artifacts rather than actual words. The automated filtering process
used to prepare the wordlists reduces this pollution considerably, and an instructor using
a wordlist could easily delete the few remaining “junk” terms before providing it to
students. Given this, it seemed unproductive to allow “junk” terms into the 100-word list
presented to the subject-matter experts, so the researcher selected the words from each
quintile to go into the survey.
In this study, the researcher had prior knowledge of Materials Science Engineering,
and therefore the word selection for MSE101 may reflect more inherent subject-matter
expertise than the selections for the other courses. Ideally, the subject-matter experts
would have been given the complete wordlist produced by the computational method;
however, these wordlists are several hundred words long, and pruning them into a
smaller list was necessary for the evaluation component of the study.
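A bias-reducing alternative to manual selection, sketched under the assumption that the full list is already TF-IDF-ranked, would be to bin the ranked list into quintiles and draw from each quintile uniformly at random, with a fixed seed so the survey is reproducible. The function name and parameters here are illustrative.

```python
import random

def sample_by_quintile(ranked_words, per_quintile=20, seed=0):
    """Split a ranked wordlist into five equal bins and draw the same
    number of words at random from each bin, mirroring the study's
    20-words-per-quintile survey design without researcher selection."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    n = len(ranked_words)
    survey = []
    for q in range(5):
        lo, hi = q * n // 5, (q + 1) * n // 5
        bin_words = ranked_words[lo:hi]
        survey.extend(rng.sample(bin_words, min(per_quintile, len(bin_words))))
    return survey

ranked = [f"word{i}" for i in range(500)]
survey = sample_by_quintile(ranked)  # 100 words, 20 drawn from each quintile
```

Random sampling would, however, reintroduce the "junk"-term problem discussed below, so some light manual cleaning would still be needed.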
5. Inclusion of New Words

The manual addition of new words to the repository of exams is currently the only
way to account for evolution of language over time. The specific limitation is that the word
must be added to the repository in the context of an exam for it to be calculated with the
same degree of consideration as the existing words. Specifically, there is currently no
method for incorporating a single word into a repository. This is a consequence of the TF-
IDF algorithm utilizing comparator sets and the prevalence of that word within that
comparator set to produce a valid characterization of the vocabulary. This computational
method, however, relies on a relatively up-to-date set of comparator documents. If the
comparator sets become stale because new documents are not added to the dataset, or old
documents are not retired from it, the tool is apt to become increasingly inaccurate.
7.3 IMPLICATIONS FOR FURTHER RESEARCH

Future research should increase the accuracy with which discipline-specific technical
vocabulary is characterized. The goal would be first to understand the detailed characteristics of
learning barriers due to language, including meaning disambiguation using structural and
graphical elements, and then to apply these findings to improving the TF-IDF processing strategy. One
area of further study could be to characterize differences between the TF-IDF scoring and the
participant-assigned scoring, perhaps using additional interviewing of subject-matter experts.
Another research area could be to develop a strategy to mathematically model the TF-
IDF curves shown in Section 5.3.2 so that instructors can roughly predict the length of word
list that should be created for their course. This would help inform the development of a
strategy to understand the accessibility of a word without performing a full computation
using the complete dataset, reducing computational load. It could also automate the process
of selecting a range of words used for a vocabulary list.
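A very rough stand-in for such curve modeling, offered only as an assumption-laden sketch, is to pick the list length at which the ranked TF-IDF scores first decay below a fraction of the top score; a fitted decay model could then predict this index without recomputing over the full dataset.

```python
import math

def suggest_cutoff(scores, fraction=0.1):
    """Suggest a wordlist length: the index where the ranked TF-IDF score
    first falls below a fraction of the top score. A crude proxy for
    modeling the full score-decay curve."""
    threshold = fraction * scores[0]
    for i, s in enumerate(scores):
        if s < threshold:
            return i
    return len(scores)

# Synthetic exponentially decaying scores, standing in for a ranked list
decaying = [math.exp(-0.05 * i) for i in range(200)]
length = suggest_cutoff(decaying)
```

Fitting the decay rate from a small prefix of the list would let an instructor estimate this cutoff cheaply, which is the point of the proposed modeling work.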
Another area of additional research is to use this multi-disciplinary approach to create a
software program that can identify authentic sentences within the repository that are most
characteristic of a user-inputted query word. This would assist instructors in developing
wordlists that have a sentence-based example. This may add additional meaning to the
discipline-specific word identified by the current research, and begin to address limitations
associated with word meaning ambiguity.
Further research could also be performed to understand other types of barriers in
engineering education, and to apply multi-disciplinary perspectives to identify and mitigate
those barriers. Though the scope of investigating language in engineering education is broad,
the potential for investigating other artifacts of the learning environment for other types of
learning barriers also exists, and should be studied. The application of Universal
Instructional Design to these materials may work to identify, characterize, and decrease
learning barriers in these contexts as well. Two exploratory short papers examine
potential future applications of this research; they are reprinted in Appendices A.8 and
A.9.
7.4 FINAL WORD

The research investigation of language in engineering education has provided a
perspective with which to begin characterizing learning barriers in this field. Using a
multi-disciplinary approach, the research shows that engineering education can be made
more accessible by increasing the visibility of learning barriers through the creation of
course-specific vocabulary lists.
117
References
[1] United Nations Dept of International Economic and Social Affairs, Disability: Situation, Strategies and Policies. New York: United Nations, 1986.
[2] T. Campbell, Disability Studies : Emerging Insights and Perspectives. Leeds, England: Disability Press, 2008.
[3] S. D. Edwards, Disability: Definitions, Value and Identity. Oxford: Radcliffe, 2005.
[4] M. Oliver, Social Work: Disabled People and Disabling Environments. London: Kingsley, 1991.
[5] J. Swain, Disabling Barriers, Enabling Environments. London: SAGE, 2004.
[6] P. Jarvis and S. Parker, Human Learning: An Holistic Approach. New York: Routledge, 2005.
[7] G. Reid, Effective Learning. New York, NY: Continuum International Pub. Group, 2009.
[8] K. J. M. Underwood, Teacher and Parent Beliefs about Barriers to Learning for Students with Disabilities: An Analysis of Theory and Practice. |c2006.: 2006.
[9] K. Bird and A. Mathis, Design for Accessibility: A Cultural Administrator's Handbook. [Washington, D.C.]: MetLife Foundation, 2003.
[10] A. Grant, Designing for Accessibility. London: RIBA Publishing, 2012.
[11] R. J. Sorenson, Design for Accessibility. New York: McGraw-Hill, 1979.
[12] C. Barnes and G. Mercer, The Social Model of Disability : Europe and the Majority World. Leeds: Disability Press, 2005.
[13] L. Davis, The Disability Studies Reader. Abingdon, Oxon: Routledge, 2006.
[14] P. Blakely and A. H. Tomlin, Adult Education: Issues and Developments. New York: Nova Science Publishers, 2008.
[15] H. L. Hodgkinson, Higher Education: Diversity is our Middle Name. Washington, D.C.: National Institute of Independent Colleges and Universities, 1986.
[16] T. Loreman, Inclusive Education: A Practical Guide to Supporting Diversity in the Classroom. London.: RoutledgeFalmer, 2005.
[17] K. A. Joseph, Implementing the Social Model of Disability: Theory and Research. Leeds: Disability Press, 2004.
118
[18] Anonymous How People Learn: Brain, Mind, Experience, and School. Washington, D.C.: National Academy Press, 1999.
[19] L. Campbell, Mindful Learning: 101 Proven Strategies for Student and Teacher Success. Thousand Oaks, Calif.: Corwin Press, 2003.
[20] C. S. Claxton, Learning Styles: Their Impact on Teaching and Administration. Washington: American Association for Higher Education, 1978.
[21] L. C. Sarasin, Learning Style Perspectives: Impact in the Classroom. Madison, WI.: Atwood Publishing, 1999.
[22] F. Bowe, Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey, 2000.
[23] S. E. Burgstahler and R. C. Cory, Universal Design in Higher Education : From Principles to Practice. Cambridge: Harvard Education Press, 2008.
[24] Anonymous "Universal design," Engineers Australia, vol. 73, pp. 28, -07-01, 2001.
[25] W. F. E. Preiser, Universal Design Handbook. New York: McGraw-Hill, 2001.
[26] J. L. Nasar and J. Evans-Cowley, Universal Design and Visitability: From Accessability to Zoning. Columbus, Ohio,: [The John Glenn School of Public Affairs?]|c2007., 2007.
[27] A. Colburn, "Universal design," The Science Teacher, vol. 77, pp. 8, 03; 2014/1, 2010.
[28] H. M. Hebdon, "Universal Design," The Exceptional Parent, vol. 37, pp. 70, May 2007. 2007.
[29] C. Koch, "Marketing Universal Design," Qualified Remodeler, vol. 38, pp. 12, May 2012. 2012.
[30] H. Lawford-Smith, "Non-Ideal Accessibility," Ethic Theory Moral Prac, vol. 16, pp. 653-669, 2013.
[31] W. Lidwell, Universal Principles of Design: 125 Ways to Enhance Usability, Influence Perception, Increase Appeal, make Better Design Decisions, and Teach through Design. Beverly, Mass.: Rockport, 2010.
[32] Mary Brown Malouf, "Universal Design," The Salt Lake Tribune, pp. D.1-D1, Sep 22, 2003, 2003.
[33] E. Steinfeld, Universal Design: Creating Inclusive Environments. Hoboken, N.J.: John Wiley & Sons, 2012.
119
[34] Anonymous "Americans With Disabilities Act - Federal Agency Decisions," .
[35] L. Gostin and H. Beyer, Implementing the Americans with Disabilities Act. Cambridge, Mass.: Blackwell Publishers, 1996.
[36] M. C. Jasper, The Americans with Disabilities Act. Dobbs Ferry, N.Y.: Oceana Publications, 1998.
[37] Anonymous "Compliance manual Accessibility Standards for Customer Service, Ontario Regulation 429/07 : Accessibility for Ontarians with Disabilities Act, 2005 (AODA)," .
[38] 1. Medline, "Does accessibility of services lead to uncontrolled costs?" Employee Benefit Plan Rev., vol. 32, pp. 95, -03-01, 1978.
[39] National Research Council (U.S.)., Cost of Meeting Accessibility Requirements for Over-the-Road Buses. [Washington, D.C.]: Transportation Research Board, 2000.
[40] J. P. Conway, "Workplace discrimination and learning disability: The national EEOC ADA research project," ProQuest Dissertations and Theses, 2009.
[41] L. Snyder, J. Carmichael, L. Blackwell, J. Cleveland and G. Thornton, "Perceptions of Discrimination and Justice Among Employees with Disabilities," Employ Respons Rights J, vol. 22, pp. 5-19, 2010.
[42] U. Vu, "Physical disability going down, mental disability going up," Canadian HR Reporter, vol. 17, pp. 6, Mar 22, 2004, 2004.
[43] M. Berry, "Businesses must adapt to accommodate disabilities," Personnel Today, pp. 9, Sep 21, 2004, 2004.
[44] D. Carr, "Constructing disability in online worlds: conceptualising disability in online research," London Review of Education, vol. 8, pp. 51-61, March 2010, 2010.
[45] T. L. Childers and C. Kaufman-Scarborough, "Expanding opportunities for online shoppers with disabilities," Journal of Business Research, vol. 62, pp. 572-578, 200905, 2009.
[46] E. Ellcessor, "Access Ability: Policies, Practices, and Representations of Disability Online," ProQuest Dissertations and Theses, 2012.
[47] G. H. Pike, "Disability access and the Internet," Information Today, vol. 20, pp. 19, Feb 2003, 2003.
[48] L. E. Pinto, Curriculum Reform in Ontario: 'Common Sense' Policy Processes and Democratic Possibilities. Toronto, ON: University of Toronto Press, 2012.
120
[49] C. Bernacchio and M. Mullen, "Universal design for learning," Psychiatr. Rehabil. J., vol. 31, pp. 167-169, 2007.
[50] C. Curry, L. Cohen and N. Lightbody, "Universal Design in Science Learning," The Science Teacher, vol. 73, pp. 32-37, Mar 2006, 2006.
[51] D. Glass, A. Meyer and D. H. Rose, "Universal Design for Learning and the Arts," Harvard Educational Review, vol. 83, pp. 98-119,266,270,272, Spring 2013, 2013.
[52] M. King-Sears, "Universal Design for Learning: Technology and Pedagogy," Learning Disability Quarterly, vol. 32, pp. 199-201, Fall, 2009.
[53] S. Scott, J. Mcguire and S. Shaw, "Universal Design for Instruction," Remedial and Special Education, vol. 24, pp. 369-379, 2003.
[54] M. F. Story, "Maximizing Usability: The Principles of Universal Design," Assistive Technology, vol. 10, pp. 4-12, 1998, 1998.
[55] C. Variawa and S. McCahan, "Design of the learning environment for inclusivity: A review of the literature," in ASEE Annual Conference and Exposition, Conference Proceedings, 2010, .
[56] S. Brown, "Universal Design and me," Inside MS, vol. 25, pp. 43-44, Aug/Sep 2007. 2007.
[57] v. Bronswijk, "Ronald L. Mace FAIA (1941-1998), inventor of universal design," Gerontechnology, vol. 4, 2006.
[58] E. Steinfeld, Universal Design: Creating Inclusive Environments. Hoboken, N.J.: John Wiley & Sons, 2012.
[59] D. Zhang, "Research on Landscape Environmental Design of Universal Design," Applied Mechanics and Materials, vol. 71-78, pp. 4756, Jul 2011, 2011.
[60] D. Rose and A. Meyer, "Universal Design for Learning," Journal of Special Education Technology, vol. 15, pp. 67-70, 2000.
[61] R. L. Mace, "Universal Design in Housing," Assistive Technology, vol. 10, pp. 21-28, 1998, 1998.
[62] D. C. Ralston and J. Ho, Philosophical Reflections on Disability. New York: Springer Verlag, 2010.
[63] C. Variawa and S. McCahan, "Computational method for identifying inaccessible vocabulary in engineering educational materials," in ASEE Annual Conference and Exposition, Conference ProceedingsAnonymous 2012, .
121
[64] C. Variawa and S. Mccahan, "Identifying language as a learning barrier in engineering," The International Journal of Engineering Education, vol. 28, pp. 183-191, 2012.
[65] D. Handle, "Universal Instructional Design and World Languages," Equity & Excellence in Education, vol. 37, pp. 161-166, June 2004, 2004.
[66] P. Fletcher and M. Garman, Language Acquisition: Studies in First Language Development. New York: Cambridge University Press, .
[67] E. V. Clark, First Language Acquisition. New York: Cambridge University Press, 2009.
[68] M. Cruz-Ferreira, "First Language Acquisition and Teaching," AILA Review, vol. 24, pp. 78-87, 2011.
[69] C. Painter, Learning through Language in Early Childhood. New York: Cassell, 1999.
[70] C. Maienborn, K. v. Heusinger and P. Portner, Semantics: An International Handbook of Natural Language Meaning. New York: De Gruyter Mouton, 2011.
[71] J. Malrieu, Evaluative Semantics. Routledge, .
[72] H. v. d. Hulst, Recursion and Human Language. New York]: De Gruyter Mouton, 2010.
[73] A. Carstairs-McCarthy, An Introduction to English Morphology : Words and their Structure. Edinburgh: Edinburgh University Press, 2002.
[74] R. J. Teutsch and D. W. Jamieson, "Hockett on Effective Computability," Foundations of Language, vol. 11, pp. 287-293, Mar., 1974.
[75] K. De Bot, Second Language Acquisition: An Advanced Resource Book. New York: Routledge, 2005.
[76] S. M. Gass, Second Language Acquisition: An Introductory Course. New York: Routledge/Taylor and Francis Group, 2008.
[77] L. Selinker, "Interlanguage," International Review of Applied Linguistics in Language Teaching, IRAL, vol. 10, pp. 209, 1972.
[78] S. P. Corder and S. P. Corder, "The Significance of Learners Errors," International Review of Applied Linguistics in Language Teaching, IRAL, vol. 5, pp. 161-170, -01-01, 1967.
[79] L. Cummings. The Pragmatics Encyclopedia 2010.
[80] T. Riney, "Rediscovering interlanguage," System, vol. 22, pp. 119-122, 1994.
122
[81] U. Weinreich, Languages in Contact.: Findings and Problems. The Hague, Mouton: 1974.
[82] W. Hinzen, "The philosophical significance of Universal Grammar," Language Sciences, vol. 34, pp. 635-649, 201209, 2012.
[83] S. Naidu, "Connectionism," Distance Education, vol. 33, pp. 291-294, Nov 2012, 2012.
[84] C. A. Perfetti, "The Universal Grammar of Reading," Scientific Studies of Reading, vol. 7, pp. 3-24, 01Jan2003, 2003.
[85] R. Mitchell, Second Language Learning Theories. New York: Routledge, 2013.
[86] M. Sharwood Smith and J. Truscott, "Stages or Continua in Second Language Acquisition: A MOGUL Solution," Applied Linguistics, vol. 26, pp. 219-240, 2005.
[87] F. Mansouri, Second Language Acquisition Research : Theory-Construction and Testing. Newcastle-upon-Tyne: Cambridge Scholars, 2007.
[88] S. M. Gass and L. Selinker, Language Transfer in Language Learning. Philadelphia: J. Benjamins Pub. Co., 1992.
[89] S. Jarvis and S. A. Crossley, Approaching Language Transfer through Text Classification : Explorations in the Detection-Based Approach. Buffalo: Multilingual Matters, 2012.
[90] A. Y. Durgunoglu, W. E. Nagy and B. J. Hancin-Bhatt, "Cross-Language Transfer of Phonological Awareness," J. Educ. Psychol., vol. 85, pp. 453-465, 1993.
[91] S. D. Krashen, Language Acquisition and Language Education : Extensions and Applications. London: Prentice Hall International, 1989.
[92] D. A. Bilash Watkin, "An instructional model for facilitating second language acquisition integrating the Suzuki philosophy of learning and Krashen's Natural Approach," ProQuest Dissertations and Theses, 1996.
[93] B. C. Ng, Bilingualism : An Advanced Resource Book. New York: Routledge, 2007.
[94] R. Ellis, Study of Second Language Acquisition. Oxford: Oxford University Press, 2008.
[95] J. Hall and Joan Kelly Hall, "Classroom interaction and language learning Classroom interaction and language learning," Ilha do Desterro, pp. 165-187, -04-30, 2008.
[96] M. Mernik, Formal and Practical Aspects of Domain-Specific Languages: Recent Developments. Hershey, PA: Information Science Reference, 2013.
123
[97] Christiansen, Morten H.,Kirby, Simon. Language Evolution 2003.
[98] R. L. Trask, Language: The Basics. New York: Routledge, 1999.
[99] J. Fisiak, Historical Semantics, Historical Word Formation. New York: Mouton Publishers, 1985.
[100] P. Stekauer and R. Lieber, Handbook of Word-Formation. Dordrecht, The Netherlands: Springer, 2005.
[101] W. E. Nagy, P. A. Herman and R. C. Anderson, "Learning Words from Context," Reading Research Quarterly, vol. 20, pp. 233-253, Winter, 1985.
[102] P. De Keyser, Indexing: From Thesauri to the Semantic Web. Oxford: Chandos, 2009.
[103] S. Mishra, "Automated media indexing," Broadcast Engineering, vol. 45, pp. 12, Aug 2003, 2003.
[104] J. N. Olsgaard and J. E. Evans, "Improving keyword indexing," J. Am. Soc. Inf. Sci., vol. 32, pp. 71-72, 1981.
[105] Anonymous Oxford Dictionary of English. New York: Oxford University Press, 2003.
[106] T. C. Craven, String Indexing. Toronto: Academic Press, 1986.
[107] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science (1986-1998), vol. 41, pp. 391, Sep 1990, 1990.
[108] T. B. Hahn, Subject Indexing : An Introductory Guide. Washington, D.C.: Special Libraries Association, 1991.
[109] P. Rafferty, Indexing Multimedia and Creative Works : The Problems of Meaning and Interpretation. Burlington, VT: Ashgate, 2005.
[110] M. W. Berry, Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia: Society for Industrial and Applied Mathematics, Jan. 1999, .
[111] M. J. Cresswell, Semantic Indexicality. Boston: Kluwer Academic Publishers, 1996.
[112] A. P. Palma, "Indexicality," ProQuest Dissertations and Theses, 1989.
[113] B. Sherman, "Indexicality," ProQuest Dissertations and Theses, 2008.
[114] J. Brent, Charles Sanders Peirce: A Life. Bloomington: Indiana University Press, 1993.
124
[115] T. L. Short, Peirce's Theory of Signs. New York: Cambridge University Press, 2007.
[116] E. Ochs, "Experiencing language," Anthropological Theory, vol. 12, pp. 142-160, 2012.
[117] E. Ochs, Culture and Language Development: Language Acquisition and Language Socialization in a Samoan Village. New York: Cambridge University Press, 1988.
[118] G. Nunberg, "Indexicality and Deixis," Linguistics and Philosophy, vol. 16, pp. 1-43, Feb., 1993.
[119] S. Olderr, Symbolism : A Comprehensive Dictionary. Jefferson, N.C.: McFarland, 1986.
[120] U. REFFLE, "Efficiently generating correction suggestions for garbled tokens of historical language," Natural Language Engineering, vol. 17, pp. 265-282, Apr 2011, 2011.
[121] T. Wynn and F. Coolidge, "Beyond Symbolism and Language: An Introduction to Supplement 1, Working Memory," Curr. Anthropol., vol. 51, pp. S5-S16, June, 2010.
[122] L. A. Carlson and E. v. d. Zee, Functional Features in Language and Space: Insights from Perception, Categorization, and Development. New York: Oxford University Press, 2005.
[123] M. Aurnague, M. Hickmann and L. Vieu, The Categorization of Spatial Entities in Language and Cognition. Philadelphia: J. Benjamins, 2007.
[124] B. M. Amine and M. Mimoun, "WordNet based cross-language text categorization," in Computer Systems and Applications, 2007. AICCSA '07. IEEE/ACS International Conference On, 2007, pp. 848-855.
[125] L. Campbell, Language Classification: History and Method. New York: Cambridge University Press, 2008.
[126] P. Jackson, Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. Philadelphia: John Benjamins Pub., 2007.
[127] A. Kasher, Language in Focus: Foundations, Methods, and Systems : Essays in Memory of Yehoshua Bar-Hillel. Boston: D. Reidel Pub. Co., 1976.
[128] E. Andrews, Conversations with Lotman: Cultural Semiotics in Language, Literature, and Cognition. Toronto: University of Toronto Press, 2003.
[129] R. Gramigna, "The place of language among sign systems: Juri Lotman and Émile Benveniste," Sign Systems Studies, vol. 41, -12-31, 2013.
125
[130] W. R. Ott, Locke's Philosophy of Language. New York: Cambridge University Press, 2004.
[131] G. P. Radford, On Eco. United States: Thomson/Wadsworth, 2003.
[132] Y. Ledeneva and G. Sidorov, "Recent advances in computational linguistics," vol. 34, pp. 3+, 03; 2014/1. 2010.
[133] J. Kacprzyk and S. Zadrozny, "Modern data-driven decision support systems: the role of computing with words and computational linguistics," International Journal of General Systems, vol. 39, pp. 379-393, May 2010, 2010.
[134] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Reading, Mass.: Addison-Wesley, 1989.
[135] G. Salton, Dynamic Information and Library Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1975.
[136] D. Dubin, "The Most Influential Paper Gerard Salton Never Wrote," Library Trends, vol. 52, pp. 748-764, Spring 2004, 2004.
[137] G. Salton, "Automatic Text Analysis," Journal of the American Society for Information Science (Pre-1986), vol. 30, pp. 116, Mar 1979, 1979.
[138] G. Salton and C. Buckley, "Improving retrieval performance by relevance feedback," J. Am. Soc. Inf. Sci., vol. 41, pp. 288-297, 1990.
[139] G. Salton, New Approaches to Automatic Document Processing. Ithaca, N.Y.: Dept. of Computer Science, 1971.
[140] G. Salton, A Theory of Indexing. Philadelphia: Society for Industrial and Applied Mathematics, 1975.
[141] F. Béchet, R. De Mori and D. Janiszek, "Data augmentation and language model adaptation using singular value decomposition," Pattern Recog. Lett., vol. 25, pp. 15-19, 2004.
[142] P. Bissiri and S. Walker, "Converting information into probability measures with the Kullback–Leibler divergence," Ann Inst Stat Math, vol. 64, pp. 1139-1160, 2012.
[143] E. M. da Silva and R. R. Souza, "Information retrieval system using Multiwords Expressions (MWE) as descriptors," Journal of Information Systems & Technology Management, vol. 9, pp. 213+, May; 2014/1, 2012.
126
[144] R. Hu, W. Xu and F. Kuang, "An Improved Incremental Singular Value Decomposition," International Journal of Advancements in Computing Technology, vol. 4, pp. 95-102, Feb 2012, 2012.
[145] Yanmin He, Tao Gan, Wufan Chen and Houjun Wang, "Adaptive Denoising by Singular Value Decomposition," Signal Processing Letters, IEEE, vol. 18, pp. 215-218, 2011.
[146] Zhi-Yong Shen, Jun Sun and Yi-Dong Shen, "Collective latent dirichlet allocation," in Data Mining, 2008. ICDM '08. Eighth IEEE International Conference On, 2008, pp. 1019-1024.
[147] R. Fidel, "User-centered indexing," J. Am. Soc. Inf. Sci., vol. 45, pp. 572-576, 1994.
[148] L. L. Hill, "Automated support to indexing," Information Processing and Management, vol. 29, pp. 528-531, 1993.
[149] I. W. Wait and J. W. Gressel, "Relationship Between TOEFL Score and Academic Success for International Engineering Students," J Eng Educ, vol. 98, pp. 389-398, Oct 2009, 2009.
[150] G. M. Vogel, "Language & cultural challenges facing business faculty in the ever-expanding global classroom," Journal of Instructional Pedagogies, vol. 11, pp. 1-32, May 2013, 2013.
[151] C. Variawa and S. Mccahan, "Frequency analysis of terminology on engineering examinations," in American Society for Engineering Education, 2011, .
[152] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT press, 1999.
[153] C. J. Van Rijsbergen, "A non-classical logic for information retrieval," The Computer Journal, vol. 29, pp. 481-485, 1986.
[154] M. Moens, Automatic Indexing and Abstracting of Document Texts. Boston: Kluwer Academic Publishers, 2000.
[155] C. D. Manning, Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press, 1999.
[156] M. Stevenson, Word Sense Disambiguation : The Case for Combination of Knowledge Sources. Stanford, Calif.: Center for the Study of Language and Information, 2003.
[157] P. A. M. Seuren, A View of Language. New York: Oxford University Press, 2001.
[158] G. Salton, A Vector Space Model for Automatic Indexing. Ithaca, N.Y.: Dept. of Computer Science, Cornell University, 1974.
[159] R. Zeff, "Universal design across the curriculum," New Directions for Higher Education, vol. 2007, pp. 27-44, 2007.
[160] B. R. Connell, M. Jones, R. Mace, J. Mueller, A. Mullick, E. Ostroff, J. Sanford, E. Steinfeld, M. Story and G. Vanderheiden, "What is Universal Design?" The Exceptional Parent, vol. 38, p. 97, May 2008.
[161] W. McCann, "Ontario universities uniting to help faculty identify and help students struggling with mental health issues, improve accessibility for all," Council of Ontario Universities, Oct. 15, 2013.
APPENDIX A.1 – DESIGN OF THE LEARNING ENVIRONMENT FOR INCLUSIVITY: A REVIEW OF THE LITERATURE
C. Variawa and S. McCahan. "Design of the Learning Environment for Inclusivity." Proc. of 117th ASEE Annual Conference and Exposition. ASEE Paper No. AC 2010-1195. Louisville, 2010. This paper was presented at the 2010 American Society for Engineering Education Annual Conference. It reviews the literature on the subject of inclusivity with respect to learning disabilities, minority students, and gender issues. Discussed within the context of first-year post-secondary education, this work develops a framework that unites the different approaches into an up-to-date resource that is relevant for engineering education.
Design of the Learning Environment for Inclusivity: A Review of the Literature
Abstract

Retention, especially of under-represented populations through the first year of university, is an on-going concern in engineering programs. While this is a very complex issue, one of the aspects of retention being studied is the barriers to inclusion that some students feel when they enter university. There are many programs aimed at helping freshmen acclimatize to the university environment, and the issue of inclusivity is becoming more pronounced as we strive to increase and then maintain the diversity of our student population in engineering programs. There are many ways of approaching issues of student success toward a goal of improving diversity. However, the literature on this subject is highly fragmented. There is a cluster of work on students with learning disabilities, found primarily in the equity and disability literature. There is a considerable cluster of work on first-generation students and minorities and the cultural issues that these students may face when entering university. And in the engineering education literature there is some work on minority student success strategies and a substantial amount of work on improving the retention of women in engineering programs. A fraction of this literature across all of these fields considers the barriers to inclusion that students may encounter in their engineering studies and, in particular, how the design of the learning environment impacts retention. The work in the area of design for retention comes mainly from literature in the field of higher education studies. In this paper we review the research on this subject, both in the engineering education literature and in literature from other disciplines. From this review we have created a framework for understanding the different approaches that have been taken to making the learning environment more inclusive for diverse student populations.
This research identifies approaches that may be effective and transferable, and a number of open questions that should be investigated further.

Introduction

A look at current engineering classrooms shows how the demographic composition has diversified, especially in recent years. Most retention programs are aimed at freshmen because of the vulnerability of this population, so questions of inclusivity and retention are particularly applicable to freshman programs. With constant change in the learner base, coupled with increasing diversity, one begins to question how engineering education should evolve to meet the needs of the next generation of students, and how this evolution affects the students.
Learning disabilities (physical and mental), the cultural undertones of contextualization that affect minority students, and gender issues are three major areas of diversity that are affected by inclusivity in the classroom. This paper attempts to review the literature on the subject of inclusivity with respect to these issues, within the context of first-year post-secondary education, to create a practical framework that unites the different approaches into an up-to-date resource that is relevant for engineering.
The Online Ethics Center at the National Academy of Engineering1 has a collection of over 50 abstracts that address teaching to diversity in engineering. Minority retention rates in post-secondary education, for instance, are a topic that also falls into this category. The 2008 annual report by the National Action Council for Minorities in Engineering2 reviews the statistics on minority engineering students and practicing engineers. Similar statistics exist for women in engineering.3,4 The statistics clearly show that women and minorities are often under-represented in engineering, and there are programs at many universities related to recruitment and retention that attempt to address this issue.
Although many programs exist, it is unclear what makes a retention program effective. It would be inappropriate to simply assume that a specific effective program at one institution could be successfully replicated at a very different type of institution. However, it would be useful to know if there are particular types of programs or approaches that have been successfully implemented across a variety of institutions. We might be able to conclude that there are proven methods that can be adapted to a specific institution to work in a particular context. Furthermore, by looking at the literature on inclusivity across diversity (gender, minority, and learning disabilities) we can see if there are commonalities in effective approaches that can be leveraged. Applying such strategies in an engineering context also has some unique challenges that need to be addressed.
The literature that was reviewed for this project covered three major populations: women, minorities, and people with learning disabilities. While it is possible to find hundreds of citations for each of these categories, references were chosen for breadth. For this reason some of the references are review articles that draw together literature from a large number of primary sources, but virtually all of the literature focuses on one population or another, or on the learning environment in general. Our purpose here is to view this literature altogether to identify commonalities that are relevant and usable for engineering, thus creating a framework for understanding effective approaches to inclusivity that can operate across a variety of populations.

Students with Learning Disabilities

Learning Disabilities (LD) are defined as "the conditions giving rise to a difficulty in acquiring knowledge and skills, especially in comparison with the norm for one's peer group, typically because of a mental disability or cognitive disorder."5 A number of recent publications look at the prevalence of learning disabilities in the classroom. These include studies on the identification of students with visual impairment, autism,6 and auditory processing disorders.7 A review of the results from these sources indicates an increasing prevalence of children with LD, which will translate into an increase in engineering students with LD. Further, research suggests that learning disabilities have no effect on an individual's intelligence, and therefore students in this population ought to have an equal opportunity to be successful in a learning institution. The studies generally conclude that increased inclusivity in the learning environment is beneficial. The sources reviewed for this project are from the engineering, equity, and disability literature, and pertain to a wide variety of identified disabilities. A particularly comprehensive resource,
Brinckerhoff et al.8 has over 900 references that discuss approaches that identify and address LD issues. It includes an analysis of the LD population, the dynamic process of providing accommodation, as well as tools for performing future work in the field. Scotch9 has reviewed the issues of preexisting bias, the presence of dominant tendencies in the workplace, disabling environments, assumptions of incapacity, and the culture of disability policy. A more recent article by Kavale et al.10 argues that the traditional definition of a learning disability "has remained static for 40 years, creating a schism between theory and practice." In particular, the authors suggest that the definition of LD ought to be updated to a more rigorous construct of physical as well as mental disability, including emotional disturbances and environmental, cultural, or economic disadvantage. Similarly, Williams et al.11 recently published the results of their research on learning disabilities and how the sharing of information can influence particular outcomes. Their article concluded that sharing knowledge about student behaviors creates an increasingly personalized relationship with the learning population.
One route to addressing the issue of disabling learning environments is universal instructional design (UID), also called universal design in education. Bowe12 and Burgstahler et al.13 review the history of this approach and explore the principles behind it. The UID principles are aimed at changing the learning environment to reduce the barriers to learning for a broad range of students, while enhancing the environment for all students. The principles, drawn from universal design in architecture, are intuitively appealing. However, McGuire et al.14 have pointed out that this approach has not yet been rigorously tested.
Research on methods for addressing issues of LD has increased over time. Again, although there are many publications in this area, only the most recent and summative examples are discussed here. Research published in late 2009 considers the effects of computer-assisted instruction on the mathematics performance of students with learning disabilities.15,16 Although instructors were generally willing to provide additional instructional and adapted materials to assist LD students, increased class sizes and a lack of additional support structure made this approach difficult. Seo and Bryant17 analyzed 11 existing studies that compared computer-assisted instruction (CAI) to face-to-face teaching. Although there was no consensus on whether CAI is advantageous per se, the authors were able to identify several key issues which need to be addressed before CAI can be realistically compared with traditional teaching practice. They suggest that CAI should be based on a valid learning theory (i.e. based on cognitive and constructivist models rather than behavioral ones) and should incorporate critical instruction features (such as feedback). The validity of using CAI to assist LD students still needs to be studied further. The Seo and Bryant study is important because "e-learning" is gaining increased attention as a method of assisting students with learning disabilities. Another example is Todd's work, which considers several recent studies that aim to promote e-learning as a tool for assistive education.18 LoPresti et al.19 review assistive technologies currently being explored to reduce accessibility barriers and provide improved quality of life.

The literature shows that learning disabilities can affect both student success and inclusivity. In general, the literature suggests that increasing interaction between the instructor and the student is effective, and when that becomes difficult, methods such as e-learning that supplement traditional learning can be useful.
However, e-learning is not universally effective; to be effective it must be understood well by both the instructor and the student, and it must
incorporate key elements of pedagogy. The advantages and disadvantages of a student-centered approach versus changing the institutional environment as a whole are summarized in Table 1.
The literature is now clear that bright students with learning disabilities also have much to contribute to the engineering profession. Most of our current practice in terms of retaining these students is based on finding appropriate individualized accommodations, but increasingly the literature points to changing teaching practice as a means of creating inclusivity. The literature on learning disability has also begun to point to a wider variety of factors such as economic and cultural differences (e.g. Kavale et al.10) that should be accounted for in the learning environment if we intend to create inclusivity.
Minorities and First Generation Students: Cultural Issues

When students first enter university there is a period of adjustment when they must transition from the environment and learning skills they were accustomed to in high school to a new environment with new demands. This period of transition, or of feeling they are not yet successfully adjusted, can be especially acute for first-generation and minority students. First-generation students are those who are the first in their family to go to college. Admission decisions are generally based on grades, extracurricular activities, capacity to communicate in the language of instruction, etc. However, these attributes do not necessarily measure how easily a student will fit into the learning environment, especially if the new learning environment and culture of the institution are very different from what they have experienced before. Nor do we want to exclude students who come from diverse backgrounds because they may have difficulty adjusting. This would have significant negative consequences for the institution, the learning environment, and the engineering profession.
Traditionally there has been an over-representation (relative to the general population) of white men in engineering in North America. This is a simplistic statement because it ignores hidden diversity. However, many aspects of current learning environments in engineering implicitly assume this simplistic homogeneity. As a result, students from diverse backgrounds may have difficulty adjusting to the institutional environment. This may be felt both inside and outside the classroom. We will focus here on the learning environment, where cultural differences can result in unnecessary barriers to learning, for example, in making meaning of the contextualization used in engineering applications. Eventually this can affect student success and retention because it leads to a disconnect between the learner and the material, which can compromise grades and lead to a sense of alienation.
The cluster of work in this area is extensive, and is spread over many disciplines. For this reason, recent work, and that most closely related to inclusivity in the first-year engineering classroom, will be examined preferentially.
In a recent article, Tapia20 argues that diversity requires attention to the student and institutional commitment. He gives examples of exemplary programs at various "top-tier" universities that support inclusive environments for minority students, and contends that a supportive institutional environment benefits everyone. Malone and Barabino21 considered such environments as they examined the role of environment in identity formation. They also performed a comprehensive
analysis of narrations of race in science, technology, engineering, and math (STEM) settings. Their work identifies themes of invisibility and lack of recognition, exclusivity, racialization, and issues of integration of identity. In general, their work pulls together research from various sources, including existing literature and primary research studies.
Understanding the relationship between racial difference and minority inequality is complex. Trytten et al.,22 for example, contend that racial inequality can exist in spite of over-representation. They point to the example of Asian American students in engineering in North America. Specifically, they argue that over-representation "does not remove the racially-based stereotyping and discrimination in our society," and hence minority status. In their work, they describe five approaches for making engineering institutions more equitable, including: creating a support system for all minority groups; educating faculty and students about stereotyping; and remaining vigilant for possible issues, including instances of discrimination not reported to the institution. Generally, they claim that minority students may require additional support to facilitate inclusivity, whether they are members of an over-represented or under-represented minority. This article exemplifies a message that is repeated in other sources: that while students from a particular background may face similar obstacles, we need to be careful not to stereotype, but instead to consider how diversity, both visible and invisible, can result in a disconnect between the learner and the learning environment. There are a variety of valuable recent articles in this field that are directly applicable to first-year engineering and are suggested for further reading.23,24,25
In terms of creating a framework for addressing the needs of culturally-diverse students, we have identified several underlying trends in the literature. First, minority students (cultural, racial, etc.) are subject to unique barriers to learning that "traditional" engineering students do not have to face. Second, the probability of minority student success depends on the degree to which the institution is able to develop and support an inclusive environment. Further, students from over-represented minorities and those with hidden diversity may encounter some of the same barriers to accessibility. Several approaches to mitigating these learning barriers were also examined in the literature, including increased resources and counseling, recognition of achievements, and peer/faculty support groups. Effectively, these add up to a student-centered approach that decreases a sense of alienation. One of the significant current trends is an emphasis on community building to achieve a sense of inclusion. A key recommendation for the in-class engineering learning environment is that contextualization of knowledge should take into account differences in the environmental, cultural, and economic backgrounds of students. The advantages and disadvantages of a student-centered approach versus changing the institutional environment as a whole for addressing the needs of first-generation and minority students are summarized in Table 2.
Improving Retention of Women in Engineering Education

There is a huge body of literature in the field of gender differences in education, and a portion of it analyzes methods for improving the retention rate of women in engineering education. The number of women entering engineering has risen, but has not risen steadily, and has been out-paced by female representation in other professional fields. Some research suggests that recruitment into engineering is the primary issue, as opposed to retention.26 However, other
research suggests that women continue to experience a sense of exclusion in the engineering environment, which may feed back and influence decisions made by the next generation of students. This has been an on-going issue in engineering education, and the consensus is that it is a complex issue that will require a societal as well as an institutional evolution.
There are some excellent recent articles in this area that pertain to engineering education. Buchmann27 identifies areas where women lead and trail men in higher education. Essentially an up-to-date literature review of women in higher education, Buchmann also investigated the correlation between gender differences and student success rates. Leicht-Scholten et al.28 describe how the international community is fostering gender inclusivity in engineering education. And Garforth and Kerr29 analyze the issues of gender differences in science, technology, and engineering using a Foucauldian approach. This approach seeks to identify a feminine perspective by considering how women describe their interaction with the institution. They advocate incorporating this perspective into the academy instead of trying to acclimatize women into a preexisting environment. Gender disparity is also analyzed in a cluster of articles summarized in the summer 2009 National Women’s Studies Association Journal.30 The consensus is that inclusivity in science requires approaches that can be “varied and thus appeal to a wide variety of learners, and the applications would benefit all facets of society.”30 This idea echoes the learning disabilities and minority studies in STEM education literature. Du and Kolmos31 also suggest methods of improving inclusivity for women engineers, but their approach uses problem-based learning (PBL) courses. In their study, they analyze how PBL courses offer not only the usual learning benefits associated with PBL, but also increased female recruitment into areas where they are under-represented.
The relatively low percentage of women pursuing engineering degrees is also a societal issue. Studies by McCarthy32 and Chen33 suggest that negative cultural messages, restrictive role modeling, and a lack of constructive middle and high school guidance contribute to the problem. McCarthy advocates fostering inclusive attitudes and language, reframing physical project assessments to foster a less destructive approach, and, among other things, carefully marketing STEM education. In another study,34 researchers found that the perceived importance of engineering competencies is subconsciously influenced by gendered assumptions. Engineering competencies that are perceived as "feminine" are regarded as soft skills that are less valued. As a mitigation strategy, they and others35,36 suggest emphasizing the value and importance of a wide variety of competencies in engineering, and being careful not to reinforce stereotypes. They contend that to be effective, improvement strategies should be structural rather than individualistic.
In general, the literature on gender issues in engineering education shows that the current population of women in STEM education is low relative to the general population, and that the inclusion of feminine identity plays a key role in the formation of an inclusive environment. University is an essential developmental period for many students, and it is important that women see in engineering education the opportunity to develop in an environment that affords their perspectives and goals equal value. A summary of the key advantages and disadvantages of a few different approaches that have been tried in this field is shown in Table 3.
We have reviewed the literature in three clusters that pertain to specific learner populations: students with learning disabilities, minority students and cultural differences, and women. Along
with the literature on these specific populations, there is another body of literature which looks at the learning environment overall.
Design for Retention

There is a body of literature in the field of higher education studies that pertains to retention. The literature in this area can be roughly subdivided into two categories: research into the attributes that make students more likely to succeed (with the aim of helping students boost their competencies in these areas), and research into intervention strategies or environmental factors that impact success.
There is research that demonstrates that the preexisting psychological state of the student, and their social and coping skills, have an effect on retention. Solberg Nes et al.37 surveyed over 2000 students to determine the effects of dispositional and academic optimism on college student retention. The former affected retention via motivation and adjustment, whereas the latter did the same, but affected GPA as well. One area that has received much attention is Emotional Intelligence (EI) and how that impacts retention. Qualter et al.38 showed that higher EI positively influences a student’s ability to progress, while also evaluating an EI-based intervention program using recent theoretical work to ground their results. This approach is typical. Schools that use EI assessment will generally follow up with the student, i.e. offer opportunities for the student to boost their competency in areas where their EI assessment is low.
Other researchers have focused on retention programs and the characteristics of the learning environment that positively impact retention. Jones and Braxton39 offer a good current review of the extent and types of recent approaches institutions are taking to reduce college student attrition. Bai and Pan40 performed an analysis of four different types of intervention. In their study, they found that social integration programs improve retention for female students, and they identified which types of advising programs benefited first-year students. Croft et al.41 examined a program which increased support of mathematics instruction to assist in retention efforts, and showed that the institution also progressed in other areas as a result of this university-wide support strategy. McQueen42 recently reviewed various models that are currently being used in the field of retention. She argues that an internationally prevalent model currently used by institutions for student retention, Tinto's Student Integration Model, although useful in certain areas, is not particularly applicable for education. She suggests that a more contextualized, nuanced, and psychosocial approach be used in the field.
The institutional environment, including the student community, also plays a key role in retention. Oseguera and Rhee43 studied how the characteristics of the student population affected retention over a 6-year period. They found that better academically-prepared and better-resourced students can act as buffers for at-risk students. That is, the better prepared students can help retain their peers during times of failure and self-doubt.
Overall, we found through the literature search that much of the research, although carried out in other fields, is applicable to engineering education. The issues of student attributes (e.g. EI) and approaches suggested for retention programming appear to be transferable to engineering. The literature suggests that supporting the development of student coping skills, and creating an
environment that encourages mentoring and a positive sense of community and inclusion have a positive impact on retention.
Like the other clusters of work we reviewed, the body of material in this field is huge. There appear to be many possible strategies that could be implemented to positively impact retention. However, we are faced with two difficulties. First, programs or approaches need to be fit to the needs of the particular institution, and simply "lifting" a strategy from elsewhere is probably not effective. So we need to understand not just the details of the strategy, but also the principles that make it effective. Second, given limited resources we need to decide, on a practical level, which approaches will yield the most impact for the resources invested.
Discussion
This review has considered clusters of literature which all pertain to inclusivity and, by extension, retention. Within each of these clusters, the authors have examined recent literature with an emphasis on breadth. These sources include up-to-date literature surveys, statistics, and quintessential studies that examine inclusivity across diversity. Although each article takes a unique approach, there are some generalized conclusions which we can draw from this review.
Two schools of thought emerge from the literature examined; both have at their core the intent of increasing student success and retention in diverse learning environments via inclusivity. The individual-focused (IF) approach attempts to mitigate learning barriers by helping the individual student fit into the environment, while the system-focused (SF) approach attempts to change the environment to fit the broadest possible variety of students. All of the strategies and programs discussed in the literature, across all of the clusters we reviewed, can be categorized along this spectrum. Some approaches are purely IF or SF, but many are a mixture.
Tables 1, 2 and 3 summarize some of the main advantages and disadvantages of the IF and SF strategies for each cluster of literature. Table 1 shows how learning disabilities can be mitigated using IF and SF approaches. There is a tradeoff between individual accommodation or intervention and increasing the accessibility of the system overall. The goal in both approaches is to improve inclusivity. However, the SF strategy adjusts the system to make the environment more accessible to a greater number of students. This, if done effectively, will improve the learning environment for LD students, and may also create a better learning environment for others (what is known as the "curb cut" effect). It also inherently accommodates students who may have a learning disability but have not yet been assessed. The disadvantage is that even a system that is well designed for a broad set of users may not accommodate people on the far end of the spectrum in terms of needs. And there may be a perception that building accommodation into the system compromises the integrity of the education. This may not be the reality, but it can impact the effectiveness of an institutional change. The IF approach, in contrast, targets LD learners specifically and seeks to provide accommodation or teach coping skills. As discussed in sources like Williams et al.,11 the creation of a personalized relationship between the accommodation service and the student increases a sense of inclusivity while reducing barriers to learning. However, other authors in the field argue that as more and more students resort to accommodation the system becomes strained, and students may become too dependent on this service for their sense of inclusion. Increasing load on individual accommodation services
requires greater resources while only meeting the needs of a limited portion of the learning population. Hence, there are disadvantages to using IF or SF strategies exclusively.

Table 1 – Strategies for people with learning disabilities (physical/mental)

Individual-Focused (References: 8, 11, 17, 18, 19)
  Example: Accessibility volunteer who helps in note-taking, physical assistance for transportation, extended duration for assessment completion, etc. (per-case basis).
  Pros: Provides assistance to individuals who are at highest risk of not succeeding. Demonstrates a strong sense of institution-learner commitment due to the personalized response.
  Cons: May promote a sense of unequal treatment among non-assisted and assisted learners. Although the student is being assisted, they may feel more out of place because of accepting this assistance. Generally requires greater resources, as students are addressed individually.

System-Focused (References: 8, 9, 12, 13, 14)
  Example: Universal Design in Education – maximize accessibility for the greatest number of learners possible; provide an environment that is flexible, transparent, and more tolerant of user error.
  Pros: Provides an increased level of accessibility for all students, regardless of prior disability level. Increases universal access to education. May promote/supplement alternative ways of learning, resulting from greater variability of access methods.
  Cons: May leave out students at highest risk of not succeeding. There is a concern that this may compromise the integrity of education by "simplifying." Does not address barriers to individual learning specifically (addresses several barriers in a general sense, but none specific to any student).
Table 2 compares the advantages and disadvantages of the IF and SF approaches and provides some examples for first-generation, minority and culturally based student issues. The individual-focused strategy typically employs a personal tutor, coaching, or mentoring system. This approach encourages person-to-person interaction, and may greatly benefit individuals who severely lack support and have a substantial sense of isolation or exclusion. Although the IF approach promotes a kind of inclusion, it also segregates individuals from their peers. Further, one may argue that the learner may develop a dependence on this resource, and such dependency could reduce the learner’s independent motivation and self-confidence. In terms of adjusting the environment to fit the student’s needs (i.e. the SF approach), sources such as Malone et al.21 suggest that identity creation is a major factor in increasing inclusivity, and the institution can affect this by supporting initiatives that build a sense of community belonging. Further, changing the classroom environment to include applications and contextualization that take into account a diverse student population can have a
positive effect. However, similar to the shortcomings of the SF approach used for learning disabilities, this approach may not meet the needs of the highest-risk individuals.
Table 2 – Strategies for first generation and minority students, and/or to address cultural issues
Individual-Focused
e.g. Personal tutor/mentor assigned to a student (or small group); clear lines of communication between the instructor and the learning population, promoted through a human-centered approach (telephone, in-person meetings, etc.); individual-specific learning objectives
Pros: Individualized sense of inclusivity – the student feels closely associated with a ‘mentor’; increases self-confidence by providing a resource that may know the learner at a personal level
Cons: May form a dependence on the ‘mentor’ to act as an interface between self and environment; addresses very specific issues – knowledge gained may have variable applicability
References: 20, 21, 22, 23, 25

System-Focused
e.g. Restricting the use of colloquial terms on assessment materials; promoting and funding cultural/minority groups on campus whose aim is to increase understanding between the learning population and society; diversity in methods of instruction, allowing learners to use the one they are most familiar with (e.g. lecturing vs. teaching using multimedia)
Pros: Promotes an environment that increases inclusivity for all students to a greater degree; enhances instructional material by contextualizing content generally – improves transferability of knowledge/application; limits feelings of ‘alienation’ because the learner self-creates a model of effective learning (and is not dependent on a ‘mentor’ for assistance)
Cons: Learners at highest risk who need additional assistance still face their barriers to learning
References: 20, 21, 24, 25
Table 3 considers the strategies available for addressing gender issues in engineering. One example is the lack of female role models in engineering education. Using an IF approach, an institution might develop a coaching or mentoring program. The advantage of approaching inclusivity in gender issues from the IF angle is that it promotes the sense of a personal relationship between a mentor and an individual student, which fosters identity creation and increased self-confidence, among other benefits. A critique of the IF approach to gender issues is that it may promote a sense of exclusion for women because it suggests they are a foreign entity in engineering, in need of support to operate successfully in the engineering
profession. This may be a source of alienation, and may be counter-productive if not addressed by the system. The system-focused approach treats gender issues as a way to embrace differences and incorporate them into a diverse learning environment. This approach frames gender issues not as a problem of women failing to fit in, but as part of the greater problem of an exclusive environment, which also has implications for other types of diversity. A systems approach aims to address all of these issues via universal design applicable to the greatest number of users to the greatest degree possible. The difficulty is implementing such a change. There are numerous obstacles, including societal factors and institutional inertia, and one can ask whether engineering currently has the means to make this change if there are not enough women to reach a critical mass, or tipping point.
Table 3 – Strategies to deal with gender issues

Individual-Focused
e.g. Individual role models in the faculty who act as nodes for personal growth
Pros: A highly personal relationship between the individual and a ‘mentor’ may increase the sense of identity and decrease self-insecurity issues; embraces gender differences as a means to accept diversity in the classroom
Cons: May further segregate genders because of an increased sense of exclusivity between “them” and “us”
References: 26, 27, 28, 29, 33

System-Focused
e.g. Increasing enrolment rates for women in STEM education
Pros: Increases gender equality and promotes universal treatment of all learners; self-identity creation is supplemented by the system addressing all students equally; gender differences are given the same ‘importance weighting’ as others – no group receives exclusive treatment over another, system-wide
Cons: Gender issues may not be fully addressed for all persons affected – a surface-level approach to solving this problem promotes a partial understanding of the specific issue
References: 26, 28, 29, 30-36
Conclusion
Studying student success in learning environments has roots in inclusivity studies in education. Recent literature was used for this project, which aims to identify means of increasing inclusivity by addressing the needs of students with learning disabilities, minority students and those who face cultural barriers to learning, and women in STEM education. We have also included the literature on retention in the review, particularly design for retention.
The breadth of work examined here is an attempt to create a list of resources that can serve as a starting point for future work. Several approaches currently being investigated in other disciplines, such as an understanding of emotional intelligence (EI) as it pertains to retention, have the potential to be used directly in engineering, or to be adapted for use in engineering.
Much of the literature focuses on the benefits of a human-centered approach to revising the learning environment, either at the individual level or at the systemic level. The educational system could, in principle, be engineered around the users (students) to address their needs. This is a concept familiar to engineers in product or system design, and we have the opportunity to apply our expertise in this area to improve the learning environment. Increased inclusivity will ideally accommodate the increasing diversity of tomorrow’s engineering population. However, the challenges of designing intervention programs, or redesigning the learning environment, are enormous, and to date no one approach can be identified as the “standard” or best practice.
Considering the literature from a purely individual-focused or system-focused perspective is perhaps simplistic, because so many of the suggested, and tested, strategies blend the two approaches. However, we need a way of conceptualizing this vast body of research to make it meaningful and useable, and this framework helps consolidate the literature into a manageable form. In summary, the individual-focused approach addresses barriers to learning at a personal level, which works best for learners who are most at risk, and it is far easier to implement. However, it may require more resources and reach fewer students as the population diversifies. The system-focused approach, on the other hand, aims to increase inclusivity for the greatest number of students possible. So, whereas IF focuses on depth, SF focuses on breadth of learning barriers mitigated. The SF approach is harder to implement in many ways and may not meet the needs of the students who are most at risk. However, it is geared toward developing a more inclusive environment, which should be the goal of every engineering school. Overall, we should be considering both pathways to creating a more inclusive system.

Bibliography
1 "Abstracts of Studies about Diversity in Engineering and Science." Online Ethics Center for Engineering, 8/6/2009, National Academy of Engineering. <www.onlineethics.org/Topics/LegalIssues/Diversity/abstractsindex.aspx>
2 "Synergies (2008 Annual Report)." Rep. National Action Council for Minorities in Engineering. Web. <http://www.nacme.org/user/docs/NACME_AnnualReport2008.pdf>
3 Lim, V. "A Feeling of Belonging and Effectiveness Key to Women's Success." Diverse: Issues in Higher Education 26.2 (2009): 17.
4 Kukreti, A., Simonson, K., Johnson, K., and L. Evans. "A NSF-Supported S-STEM Scholarship Program for Recruitment and Retention of Underrepresented Ethnic and Women Students in Engineering." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
5 Oxford University Press. Oxford English Dictionary Online, 2010.
6 Li, A. "Identification and Intervention for Students Who are Visually Impaired and Who have Autism Spectrum Disorders." Teaching Exceptional Children 41.4 (2009): 22-32.
7 Iliadou, V., Bamiou, D., Kaprinis, S., Kandylis, D., and G. Kaprinis. "Auditory Processing Disorders in Children Suspected of Learning Disabilities--a Need for Screening?" International Journal of Pediatric Otorhinolaryngology 73.7 (2009): 1029-34.
8 Brinckerhoff, L.C., McGuire, J.M., and S.F. Shaw. Postsecondary Education and Transition for Students with Learning Disabilities. Austin, TX: Pro-Ed, Inc., 2002.
9 Scotch, R.K. "Disability Policy: An Eclectic Overview." Journal of Disability Policy Studies 11.1 (2000): 6-11.
10 Kavale, K., Spaulding, L., and A. Beam. "A Time to Define: Making the Specific Learning Disability Definition Prescribe Specific Learning Disability." Learning Disability Quarterly 32.1 (2009): 39-48.
11 Williams, V., Ponting, L., Ford, K., and P. Rudge. "A Bit of Common Ground: Personalisation and the use of Shared Knowledge in Interactions between People with Learning Disabilities and their Personal Assistants." Discourse Studies 11.5 (2009): 607-24.
12 Bowe, F. Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey, 2000.
13 Universal Design in Higher Education: From Principles to Practice. Eds. S.E. Burgstahler and R.C. Cory. Cambridge: Harvard Education Press, 2008.
14 McGuire, J., Scott, S., and S. Shaw. "Universal Design and its Applications in Educational Environments." Remedial and Special Education RASE 27.3 (2006): 166.
15 Busch, T., Pederson, K., and C. Espin. "Teaching Students with Learning Disabilities: Perceptions of a First-Year Teacher." The Journal of Special Education 35.2 (2001): 92-9.
16 Schumm, S.J., Vaughn, S., Haager, D., McDowell, J., Rothlein, L., and L. Saumell. "General Education Teacher Planning: What can Students with Learning Disabilities Expect?" Exceptional Children 61 (1995): 335.
17 Seo, Y., and D.P. Bryant. "Analysis of Studies of the Effects of Computer-Assisted Instruction on the Mathematics Performance of Students with Learning Disabilities." Computers & Education 53.3 (2009): 913-28.
18 Todd, R. E-Learning for Secondary School Teachers: Inclusive Science and Math Instruction for Students with Disabilities. Berlin: Springer, 2008.
19 LoPresti, E.F., Bodine, C., and C. Lewis. "Assistive Technology for Cognition." IEEE Engineering in Medicine and Biology Magazine 27.2 (2008): 29.
20 Tapia, R. "Minority Students and Research Universities: How to Overcome the 'Mismatch'." The Chronicle of Higher Education 55.29 (2009): A72.
21 Malone, K., and G. Barabino. "Narrations of Race in STEM Research Settings: Identity Formation and its Discontents." Science Education 93.3 (2009): 485.
22 Trytten, D., Lowe, A., and S. Walden. "Racial Inequality Exists in Spite of Over-Representation: The Case of Asian American Students in Engineering Education." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
23 Crown, S., Fuentes, A., Tarawneh, C., Freeman, R., and H. Mahdi. "Student Academic Advisement: Innovative Tools for Improving Minority Student Attraction, Retention, and Graduation." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
24 Gomez, T. "Integrating Engineering, Modeling and Computation into the Biology Classroom: Development of a Multi-Disciplinary High School Neuroscience Curricula." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
25 Lambright, J., Johnson, W., and C. Coates. "Attracting Minorities to Engineering Careers: Addressing the Challenges from K-12 to Post Secondary Education." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
26 de Cohen, C., and N. Deterding. "Widening the Net: National Estimates of Gender Disparities in Engineering." Journal of Engineering Education 98.3 (2009): 211-226.
27 Buchmann, C. "Gender Inequalities in the Transition to College." Teachers College Record 111.10 (2009): 2320.
28 Leicht-Scholten, C., Weheliye, A., and A. Wolffram. "Institutionalisation of Gender and Diversity Management in Engineering Education." European Journal of Engineering Education 34.5 (2009): 447.
29 Garforth, L., and A. Kerr. "Women and Science: What's the Problem?" Social Politics 16.3 (2009): 379.
30 Norton, C., and D. Wygal. "Inclusive Science: Articulating Theory, Practice, and Action." National Women's Studies Association Journal 21.2 (2009): vii.
31 Du, X., and A. Kolmos. "Increasing the diversity of engineering education – a gender analysis in a PBL context." European Journal of Engineering Education 34.5 (2009).
32 McCarthy, R. "Beyond Smash and Crash: Gender-Friendly Tech Ed." The Technology Teacher 69.2 (2009): 16-21.
33 Chen, X. Students Who Study Science, Technology, Engineering, and Mathematics (STEM) in Postsecondary Education. Stats in Brief. NCES 2009-161. National Center for Education Statistics, 2009. <http://nces.ed.gov/help/orderinfo.asp>
34 Male, S., Bush, M., and K. Murray. "Think engineer, think male?" European Journal of Engineering Education 34.5 (2009).
35 Cronin, C., and A. Roger. "Theorizing progress: women in science, engineering, and technology in higher education." Journal of Research in Science Teaching 36.6 (2009): 637-661.
36 Fox, M.F., Sonnert, G., and I. Nikiforova. "Successful programs for undergraduate women in science and engineering: Adapting versus adopting the institutional environment." Research in Higher Education 50.4 (2009): 333-353.
37 Solberg Nes, L., Evans, D.R., and S.C. Segerstrom. "Optimism and College Retention: Mediation by Motivation, Performance, and Adjustment." Journal of Applied Social Psychology 39.8 (2009): 1887-912.
38 Qualter, P., Whiteley, H., Morley, A., and H. Dudiak. "The Role of Emotional Intelligence in the Decision to Persist with Academic Studies in HE." Research in Post-Compulsory Education 14.3 (2009): 219.
39 Jones, W.A., and J.M. Braxton. "Cataloging and Comparing Institutional Efforts to Increase Student Retention Rates." Journal of College Student Retention 11.1 (2009-2010): 123-139.
40 Haiyan, B., and W. Pan. "A Multilevel Approach to Assessing the Interaction Effects on College Student Retention." Journal of College Student Retention 11.2 (2009): 287-301.
41 Croft, A.C., M.C. Harrison, and C.L. Robinson. "Recruitment and Retention of Students - an Integrated and Holistic Vision of Mathematics Support." International Journal of Mathematical Education in Science and Technology 40.1 (2009): 109-25.
42 McQueen, H. "Integration and Regulation Matters in Educational Transition: A Theoretical Critique of Retention and Attrition Models." British Journal of Educational Studies 57.1 (2009): 70-88.
43 Oseguera, L., and B.S. Rhee. "The Influence of Institutional Retention Climates on Student Persistence to Degree Completion: A Multilevel Approach." Research in Higher Education 50.6 (2009): 546.
APPENDIX A.2 – IDENTIFYING LANGUAGE AS A LEARNING BARRIER IN ENGINEERING
C. Variawa and S. McCahan. “Identifying Language as a Learning Barrier in Engineering.” International Journal of Engineering Education. Vol. 28:1, pp. 183-191, 2012. This journal paper is published in the International Journal of Engineering Education.

Language used in engineering course materials may be a barrier to accurate assessment because students perceive the meanings of words differently. Universal Design in Education (UDE) has emerged as a strategy for making course material more accessible, but remains largely untested in this area. This study investigates whether students can accurately self-assess their understanding of vocabulary, i.e. whether this is a ‘visible’ or ‘invisible’ deficit from the student’s point of view, using a limited sample of ten words found on engineering exams. This is a preliminary investigation toward testing the efficacy of a UDE-based mitigation strategy, and finds that students often inaccurately self-assess their understanding of language used on engineering examinations.
1.0 – Introduction/Background
Often when we think of accessibility issues in higher education what comes to mind are physical barriers to
facility access for students or staff. However, increasingly we are aware of, and trying to address, more subtle
obstacles that may create unnecessary challenges that impact student success. These include creating appropriate
support systems for students with learning disabilities, and other “invisible” disabilities. More recently, we have
begun to recognize barriers that become perceptible as the student population diversifies. As engineers, we are
ideally situated to address this as a design problem because we recognize that designing for a broader set of users
has the potential to improve the design of a system for everyone.
The principles of universal design were first articulated in the 1960s and 1970s by Ron Mace and others
in the field of architecture [1, 2]. Fundamentally the goal of universal design in architecture is to design a building
or space that is intentionally accessible to the broadest range of people possible. Conceptually this means taking
accessibility into account from the beginning of the design process rather than as an afterthought. The accessible
design movement played a role in the development of legislation [3, 4]. As a result, many of our university and
college environments are now more physically accessible.
In the past two decades the principles of universal design have found their way into a number of other
fields, notably engineering. The principles of universal design in engineering have given rise to the development of
accessible transit systems, accessible information technology systems, and ergonomically designed household
products. The use of universal design in information technology is now pervasive: televisions come with built-in
closed captioning systems; text messaging is a standard feature on cell phones; screen magnifiers and readers are
readily available; and ATMs have Braille lettering on the buttons. Universal design features are now built
into many systems, allowing the user to create a customized environment that fits their needs (e.g. Web 2.0). Design
engineers have moved from a mentality of human-centered design to interaction design and, most recently, to
experience design. In doing so, the concept of creating a system that is barrier-free and intuitive for a diverse set of
users has become a central theme in the engineering design process.
Universal design has now begun to permeate education, first at the K-12 level and more recently at the
post-secondary education level [5]. If we look at a course, a curriculum, or an institution as a designed system, then
the principles of universal design should help guide us toward a more accessible learning environment design for a
more diverse user group. There have been a number of authors who have re-interpreted the principles of universal
design to make them applicable to educational systems [5 - 7]. The universal design framework applies the principle
of “learner centered” not just to an individual class, but to the design of the whole learning environment at every
level. McGuire, Scott, and Shaw suggest that universal design in education (UDE) is a “paradigm shift” that
promotes uniformity of academic goals and standards by designing accessibility into a course, curriculum, and
institution, rather than making exceptions for individual students who do not fit our preconceived idea of what is
“typical” [6]. They point out that individualized accommodation will still be necessary for some students.
However, pervasive use of exceptions may undermine the integrity of a course, whereas designing accessibility into
a course opens up learning opportunities for a broad range of students. However, they have also noted that UDE
remains a largely untested strategy that requires further testing and validation. Pliner and Johnson discuss UDE in
relation to social justice and transforming social relationships which can be negatively affected by invisible barriers
to inclusivity [7]. Their work suggests that implementing UDE pedagogy creates a more “inclusive” environment
which can decrease the barriers to learning that all individuals may have to some extent (i.e. the so-called “curb cut”
effect).
A recent review of the literature shows that there is serious concern about barriers to success for students,
in both engineering and other fields, and a wide variety of approaches have been employed to try to mitigate barriers
for at-risk students [8]. UDE offers one possible approach and a framework for interpreting the impact of mitigation
tactics. It will serve as a useful context for considering the results of this study. However, we should also bear in
mind that UDE is not the only possible approach and other ways of thinking about these issues should be utilized.
2 - Purpose
A look at today’s educational institutions shows a dramatic increase in the cultural diversity of the student
population, and institutions have not fully evolved to account for this diversity. One example is the use of colloquial
language or culturally specific references on assignments and other learning materials. The student’s inability to
understand a question on an assignment, for example, can create a misalignment between the results of the
assessment and the learning objectives of the course. In essence, it compromises the validity of the assessment
because it may test colloquial vocabulary to some extent rather than just the engineering concepts. While virtually
all assessment instruments have this issue, in engineering it can be a particular problem because it is pedagogically
preferable to situate problems in an authentic context and use terminology that is authentic to practice in the
profession. This inaccessibility can cause a bias in the assessment, favouring those individuals who have a
particular background.
In engineering assessments, students may find questions difficult to answer if they are not familiar with the
non-course-related terminology used. In the case of an assignment, the student can get help understanding the
question. However, in a closely-supervised exam situation, which is often time-limited, it is usually not possible to
get assistance. In our experience, words such as ‘blob’ or ‘kettle’ are not specific to the engineering course material
being taught, yet they present a problem for some students when they appear on tests. Students knowing the
meaning of the word will have less difficulty in understanding the question, and ought to be able to answer it
correctly, as intended. Students not having any exposure to the word beforehand, but having sound knowledge of
the course material and the English language otherwise, will not be able to understand exactly what is being asked.
This concern is balanced by a need to ensure that students graduate with a vocabulary that allows them to operate
effectively in the profession. A broad vocabulary is a professional asset. So ideally, we would want students to
acquire a robust vocabulary but this is generally not specified in our learning objectives, and not explicitly taught or
assessed.
This vocabulary problem perennially arose in our large first year design course. Although we tried to write
tests using clear, non-culturally specific language, we continued to experience problems. We did not want to “dumb
down” the language because we felt it is important to use accurate and authentic terminology. Therefore, we
took steps to mitigate the problem using the principles of UDE. We now develop a word list, which is posted prior
to each test in this course. This word list contains all of the infrequently used words (i.e. we leave out words such as
“and”, “the”, “are”, etc.) that appear on that particular test. We put some extra words on the list that do not appear
on the test but which are words we think are useful for an engineer to know. The word list is in alphabetic order so
the questions on the exam are not apparent from the list. The intent is to give the students an opportunity to gauge
their own level of understanding of the test vocabulary beforehand, and if required, consult information sources to
correct any gaps ahead of time. This strategy allows us to contextualize questions and use accurate, authentic
engineering terminology. However, this practical and simple approach to dealing with the problem is predicated on
several key assumptions, only a subset of which is investigated by the study. Some of the broader assumptions
include:
1. That the use of common, but infrequently used, words and terms may compromise the validity of the
assessment for some students. We are relatively certain this is true, based on experience, but research data
on the frequency and degree of the problem is not available. In addition, there is currently no existing data
about language on engineering examinations.
2. That given a list of words, students can correctly assess their level of understanding of these words. To
make good use of the word list this must be true, but we have no research that supports this assumption.
This study attempts to generate some data to test this assumption.
3. That students can independently learn the meaning of the words and phrases on this word list effectively.
Again, to make good use of the word list this must be true, but we have no research that supports this
assumption.
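The word-list preparation described earlier (strip very common words, optionally add extra useful vocabulary, and alphabetize so the exam questions are not apparent) can be sketched in a few lines of Python. This is an illustration only: the stoplist below is a small invented set, and `build_word_list` is a hypothetical helper name, not the course's actual tooling.

```python
import re

# Illustrative stoplist; the actual set of words excluded by the
# course staff is not specified in the text.
COMMON_WORDS = {"and", "the", "are", "a", "an", "of", "to", "in", "is",
                "for", "on", "with", "that", "this", "be", "as", "it"}

def build_word_list(exam_text, extra_words=()):
    """Return an alphabetized list of infrequently used words from an
    exam, optionally padded with extra vocabulary worth knowing."""
    tokens = re.findall(r"[a-zA-Z']+", exam_text.lower())
    candidates = {t for t in tokens if t not in COMMON_WORDS}
    candidates.update(w.lower() for w in extra_words)
    # Alphabetic order hides which question each word came from.
    return sorted(candidates)

print(build_word_list("The kettle and the blob are on the power bar.",
                      extra_words=["succinct"]))
# → ['bar', 'blob', 'kettle', 'power', 'succinct']
```

Sorting the deduplicated set is what makes the posted list safe to publish before the test: students can check their understanding of each word without learning anything about the questions themselves.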
This study begins to address the second assumption. The primary objective of the broader study is to
analyze how well students can self-assess their understanding of problematic words that could appear on engineering
assignments or tests, i.e. to identify whether or not infrequently used words are an invisible or visible barrier for
students. This is the beginning of a long-term study to describe the accessibility issues that arise from language on
engineering learning materials, and develop tools for addressing this issue (if we find it exists). The specific
element of this larger study that we are examining here is whether students can correctly assess their understanding
of non-course-related words used in engineering examinations. While it is relatively easy to measure how the
addition of a ramp in place of stairs makes a building more accessible for many types of users, it is more challenging
to test how a change aimed at reducing language barriers in an engineering course could result in improved learning
for a variety of people. However, applying the principles of UDE has the potential to not only result in
improvements for people who would otherwise be “at-risk” but also improve the quality of the learning environment
for a broad range of students (i.e. the so-called “curb cut effect”). Other concurrent concerns are maintaining the
integrity of the learning objectives and the economic feasibility of changes to the system.
Within this type of potential barrier the authors chose to focus their attention on three categories of
colloquial language prevalent in engineering examinations, namely:
1. non-course-specific technical terminology,
2. culturally-based words, and
3. linguistically-difficult terminology.
These categories are very rough, and there is overlap between them, but we developed these approximate
groupings based on examination of the word lists we had prepared for our first year design course exams over a
number of years. It is worthwhile noting that while the authors did not find any pertinent literature suggesting such
groupings, literature in areas such as composition studies and linguistics may inform the development of such rough
categorization by viewing them from unique perspectives that take into account language development.
The first category contains words commonly used in North American society that have reference to
technology. Examples of such words or short-phrases include: mouse pad, power bar, remote control, and ear buds.
The second category, culturally-based words, includes words that are used only regionally or within a specific
culture. For example, a “typical” North American would be familiar with the “hood” of an automobile, whereas a
“typical” Western European would refer to it as the “bonnet”. Further examples of such words and phrases include:
loonie, Jell-O®, an efficiency apartment, and flapjack. There are, of course, words that fall into both the technical
terminology and culturally-based word categories. An example is “coordinates” (i.e. email address) which is used in
some regions of the world but is not common in the U.S. This is both cultural and technical. However, for this
preliminary study we assigned words to only one of the three categories for simplicity.
The third category that we have used for the classification is ‘linguistically-difficult’ terminology.
Essentially, words in this category fall into neither the first nor second classification, are not course-specific, yet
may cause difficulty in understanding the elements of an engineering assessment because they are outside the
everyday vocabulary of students. Examples of such words include: propagate, succinct, and happenstance.
3 - Design/Method
Our study analyzed the responses of forty undergraduate engineering students who each
completed a questionnaire containing ten words that might be found on an assignment or test. The participants
represented a diverse mix of ethnic and cultural backgrounds, a variety of native and non-native
speakers of English, and different genders; all were aged 18-22 (typical undergraduate student
age). These words were chosen by the authors because they fit fairly well into one of the three categories we are
interested in exploring. In this preliminary study no attempt was made to choose the words using a more systematic
method. After fulfilling the ethics process at our institution, the study began by training the participants: they
learned about the task they were being asked to perform, the scale they would be using, and the motivation for the
study. This was meant to establish a clear purpose to this study and motivate the participants to provide genuine
answers. Then, each participant individually rated their understanding of each of the 10 words on the questionnaire on an equal-interval scale from ‘0’ to ‘5’, with ‘0’ representing no knowledge of the word and ‘5’ representing superior understanding. This is the “perceived-understanding” (PU) score. These words represented
the three categories mentioned, and a detailed explanation of the rating scale was provided on the question sheet to
minimize ambiguity. Finally, each participant was asked to write a maximum of 5 synonyms and/or a brief
definition of each word within a textbox to provide evidence of their level of understanding. To reduce ambiguity,
the most recent Oxford English Dictionary’s (O.E.D.) definition of ‘synonym’ was written as a footnote on the
question sheet, and participants were free to ask questions at any time. The researchers then consulted the O.E.D.
for the correct definitions and synonyms for each of the ten words used in the study. The responses from each
participant were compared against these dictionary definitions by the researchers and given a score. We counted the
definition as fully correct if it matched in meaning to at least one of the definitions for the word. The closeness of
correlation between the dictionary definition and the student’s definition was assigned an integer value from 0 to 5 –
this is the “observed-understanding” (OU) score. Finally, the participant’s PU score was compared to the OU score.
4 - Results
Table 1 shows the words used in the study, each labelled with the category that best describes it. For each word, a histogram compared the sums of the OU and PU scores, and a scatter-plot showed the relationship between these scores by occurrence, with larger-diameter circles indicating a larger proportion of participants having that specific outcome. Although only 10 words
were used in this investigation, the number of participants resulted in a substantial data set requiring further methods
of analysis. Table 2 shows a summarized ANOVA for the statistical significance of the findings, in addition to
Figures 1 and 2 that examine the aggregate data of all words combined.
Figure 1 shows the frequency of the difference between OU and PU for all of the words together. Ideally,
an accurate self-assessment would mean that the OU and PU scores would be identical (OU-PU=0). The data,
however, shows that OU and PU scores were quite often different. The skew in Figure 1 demonstrates visually that
participants more often overrated their understanding of the words, and the bar charts in Table 1 and pie chart of
Figure 2 reiterate this point. We found that students correctly self-assessed their understanding 34.5% of the time
and overrated their understanding 52.8% of the time; they only under-rated their own understanding 12.8% of the
time as summarized in Figure 2.
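The aggregate breakdown above (over-rated, accurate, under-rated) can be computed directly from paired PU/OU scores. The sketch below is illustrative only; the function name and the score lists are hypothetical, not the study's data or code:

```python
from collections import Counter

def self_assessment_breakdown(pu_scores, ou_scores):
    """Classify each paired response by the sign of OU - PU and return percentages."""
    counts = Counter()
    for pu, ou in zip(pu_scores, ou_scores):
        diff = ou - pu
        if diff < 0:
            counts["over-rated"] += 1      # claimed more understanding than demonstrated
        elif diff == 0:
            counts["accurate"] += 1        # self-assessment matched observed understanding
        else:
            counts["under-rated"] += 1     # demonstrated more understanding than claimed
    total = len(pu_scores)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical scores for five responses (not the study data).
print(self_assessment_breakdown([5, 4, 3, 2, 0], [3, 4, 1, 2, 1]))
```

With real data, the three percentages reproduce the proportions reported for Figure 2.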
Examining the OU/PU ratio for each word (Table 1, left column), we see that there are noticeable
differences between words. Words such as “bungalow”, “fax”, “Jell-O”, “bonnet” and “mold” have an OU/PU ratio
relatively close to 1, suggesting that students are more likely to correctly self-assess their understanding of these
words. Conversely, the OU/PU ratio tells us that students are less likely to correctly self-assess their understanding
of words such as “tolerance”, “feasible”, “propagate” and “field”. This is important because, although words like
“bonnet” have a low overall PU and OU, students are apparently aware of their lack of understanding which makes
this type of word a visible learning barrier for them.
The data also shows that students believe they understand some of these words well; these words have
higher PU scores relative to the other words. For example, the students think they know the word “tolerance” better
than the word “bonnet”, however the observed understanding scores of these two words are quite similar. Table 1
shows that the words “field”, “fax”, and “feasible” are known to many of the participants; however, the students
substantially overrated their understanding in several of these cases.
To better understand the accuracy of self-assessment, we calculated the residual of each data point to the
line PU=OU for each word (shown in parentheses in the scatter plots in Table 1). This number is calculated by
taking the sum of absolute differences for each data point to the line PU=OU. The results show that a word like
“tolerance” is consistently misjudged since it has a high residual relative to the other words. In this study, smaller
residuals suggest a more accurate self-assessment. For example, the scatter-plot for “fax” is skewed and clustered toward the upper-right (Table 1), so it is difficult to interpret the data from the scatter-plot alone. However, it has a small residual relative to the other words, which demonstrates that students typically self-assess their understanding of this word correctly. The combination of a high average OU score and a low residual tells us that students both understand this word accurately and are aware that they know it.
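The residual and the OU/PU ratio described above reduce to a sum of absolute differences and a ratio of score sums. A minimal sketch, assuming per-word lists of paired scores (the example scores and the function name are hypothetical):

```python
def word_metrics(pu_scores, ou_scores):
    """Residual: total absolute distance of the (PU, OU) points from the line PU = OU.
    Ratio: sum of observed scores divided by sum of perceived scores."""
    residual = sum(abs(ou - pu) for pu, ou in zip(pu_scores, ou_scores))
    ratio = sum(ou_scores) / sum(pu_scores)
    return residual, ratio

# Hypothetical scores for a single word (not the study data).
residual, ratio = word_metrics([4, 5, 3, 4], [3, 4, 3, 2])
print(residual, ratio)  # smaller residual suggests more accurate self-assessment
```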
In contrast, the sum of scores plot in Table 1 for the word “succinct” suggests that it was not a well-known
word to most participants. Interestingly, it also has a lower residual than most of the other words. This is because 38% of the participants had OU and PU scores that were both zero, and those points lie exactly on the line PU=OU. So, although the lower residual suggests an accurate self-assessment (close agreement between OU and PU), the unclustered distribution of its scatter-plot suggests it is a particularly inaccessible word for most students. Additionally, this case illustrates that the residual,
OU/PU ratio, sum of scores graph and scatter-plot should be considered together to formulate a more complete
understanding.
We also investigated the statistical significance by performing an ANOVA on the mean of the OU score
and PU score for each word, as seen in Table 2. The results show that “bonnet”, “bungalow” and “fax” are self-assessed accurately, while the other words are not; “Jell-O” just misses the threshold of p=.05. Table 2 shows the means and standard deviations for the OU and PU scores, as well as the difference between the two, and the t-test results for each word. It is interesting to note that “Jell-O” is excluded from the list of accurately self-assessed words by this method, even though the scatter-plot in Table 1 might indicate otherwise. It is clear that the cultural terms we tested had the least variability in OU/PU ratio, and the highest
OU/PU values. In addition, the distribution of scores on the scatter-plots for the cultural terms shows that many
students are unfamiliar with these terms, but they recognize this lack of familiarity; it is visible to them. These
results appear to indicate that students more accurately assess their understanding of cultural words, or at least this
small subset of words.
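The per-word comparison in Table 2 amounts to a paired t-test on the PU-OU differences, which can be sketched with the standard library alone (the ratings below are illustrative, not the study data, and the function name is our own):

```python
import math
from statistics import mean, stdev

def paired_t(pu_scores, ou_scores):
    """t statistic for the paired differences PU - OU, with df = n - 1."""
    diffs = [pu - ou for pu, ou in zip(pu_scores, ou_scores)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical ratings for one word from four participants (not the study data).
print(round(paired_t([4, 3, 5, 2], [3, 3, 4, 2]), 3))  # prints 1.732
```

Applied to the real 40-participant samples, this statistic with 39 degrees of freedom yields the t(39) values reported in Table 2.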
The results for linguistically-difficult and technical words are more complex. The OU/PU ratio and
residual values for linguistically-difficult words are relatively consistent. The OU/PU values, for example, fall in a
relatively narrow range from 0.65 to 0.76 which is lower than the values obtained for the cultural words. This
indicates that students are consistently unaware of their misunderstanding of these words. The technical words, by
comparison, show far less consistency. There seems to be no clear trend for the technical terms: some, such as “fax” and “mold”, are very well understood and accurately self-assessed, while “tolerance” showed a surprisingly low degree of understanding that was also poorly self-assessed. Further investigation involving more words, ideally evaluated in context, will be needed to fully characterize the issue, particularly for technical non-course-specific terminology.
In this study, we found that students are typically better at self-assessing their understanding of cultural
words and had difficulty assessing their understanding of linguistically difficult words. This suggests that cultural
and perhaps even technical words are more often visible barriers to accessibility, while non-course-related
linguistically difficult words may more often represent invisible barriers. That is, students may not seek clarification
of a linguistically difficult word because they incorrectly believe they have a sufficient understanding of the word.
This type of invisible barrier has an analogy in misperceptions of basic physics concepts (which have been studied
extensively, e.g. the force concept inventory), or other pre-existing misconceptions, which need to be taken into
account to make instruction effective. These conclusions are limited, however, by the words that were used in this
study. A more extensive investigation, particularly examining the understanding of words in context, would be
needed to fully elucidate this issue.
It is important to note that the scope of this study limits the generalizability of the data. Specifically, we
cannot confidently predict whether misunderstanding a specific term can inhibit overall learning and the student’s
ability to succeed on assessment measures. Although making such claims might sound intuitive, this data is limited
and there is little additional data in the literature to support such a claim. Further research needs to be performed.
5 - Discussion
We can draw some preliminary conclusions from this study that should be tested further. From our
observations in the classroom, we find that language can be a barrier to accurate assessment of learning for some
students. This study examines, in the very limited form of ten words, whether these barriers are visible or invisible to students. Although it is just a small element in the larger investigation of inaccessible language, it provides preliminary data about how students perceive their understanding of ten words found in engineering exams. We found that all of the words tested were unfamiliar to some degree: no term had an average
observed understanding score above four. As expected the findings illustrate that students do not understand
colloquial language identically. We also found that these students did not accurately self-assess their understanding
of such words consistently. Perceived-understanding scores were consistently higher than the observed-understanding scores. This shows that these students tended to over-rate their understanding of colloquial words, and this appears to be especially true for linguistically-difficult words. This consistent over-rating is an example of a learning barrier that students are unaware of; it is an “invisible” barrier to learning. This information can help us
create techniques that assist in vocabulary clarification to reduce these learning barriers.
The existing literature on accessibility is extensive and spans several disciplines, including equity, disability, gender, and higher education studies, among others [9]. This literature helps to explain why language is
integral to an inclusive learning environment [8]. Specifically, the fact that learning barriers exist in the language
of engineering course materials may be one reason why students (especially first-year students) find it challenging to
adjust to an environment that appears to be culturally foreign [10]. The finding that cultural language is a visible
barrier might be why students often attribute this alienation mainly to cultural acclimatization. We may be
underestimating the role of invisible language barriers, such as the use of linguistically-difficult words. Specifically,
our findings suggest that it would be worthwhile to investigate further the impact of these invisible language barriers
on inclusivity.
Some work in the field of composition studies appears to link vocabulary and related issues to educational
discourse, and may inform a promising approach to such further investigation. Specifically, Bartholomae’s seminal
work has led to further exploration of how language can create a barrier to learning [11, 12]. For instance, learning
how to write like an “expert” may create barriers if the student lacks confidence in their current writing style, and further research shows how individualized approaches to language and vocabulary in the classroom may conflict with what is considered “correct” in that field. Though such work is integral to understanding language in academia, the present study has a limited scope: students’ self-efficacy in accurately assessing their understanding of ten words on engineering examinations, and whether students can use this information to gauge their understanding of those words. A deeper engagement with composition studies and related fields would be valuable, albeit outside the scope of this particular study.
While both visible and invisible learning barriers hinder student success, this study might hint that a UDE
approach such as word lists posted prior to an exam may be useful as a mitigation technique particularly for some
types of words. Since students are likely to accurately self-assess their understanding of colloquial-cultural
language, word lists of cultural terms may be an effective mitigation strategy for this particular type of learning
barrier. However, this is a very preliminary study of the situation, and a more thorough investigation can provide a
more complete picture of the issues. In addition, our results suggest that such word lists may not be as useful for
technical and linguistically-difficult words. Linguistically-difficult words, in particular, are different because they
often appear to be invisible barriers to understanding, which suggests that these words need to be identified as
unfamiliar before word lists can become an effective tool. Additionally, this mitigation tactic continues to assume
that students can independently learn the meaning of words once they are aware of their lack of understanding. The
principles of UDE provide guidance on creating a more accessible learning environment, but further study is needed
to identify how UDE can be used when the barriers to accessibility are invisible to the student.
This study is just a first step in elucidating the issues that arise with the contextualization of problems in
engineering learning materials. We need to better describe the vocabulary that is presenting difficulty for our
students, and then find methods for dealing with these barriers. One way of possibly alleviating language issues is to
develop tools (e.g. software) that explicitly identify inaccessible language for both the instructors and students. This
would allow the participants in the learning environment to personally choose how to mitigate the potential barriers.
Our future work will also consider learning barriers in engineering more broadly: Taber’s typology of learning impediments can potentially be a starting point for this research [13]. Ideally, confronting these issues using a UDE-based approach increases accessibility for everyone, not just those identifying cultural words as a learning barrier,
since both the instructors and students benefit from more valid assessment.
6 - Conclusions
From this study we have learned that colloquial language as a learning barrier can be characterized along a spectrum from visible to invisible; which types of words fall into each of these categories; and how we can use this information to develop possible mitigation tactics. Within the context of ten words, our results show
that undergraduate engineering students view and understand colloquial language uniquely from each other and
from the instructor. Further, the accuracy of self-assessing one’s understanding of inaccessible language is
determined by the visibility of the learning barrier itself. These inaccessible terms can be roughly classified into
colloquial cultural, technical, and linguistically-difficult language; only the first appears to be a visible
inaccessibility for students according to our dataset. To mitigate potential effects of using colloquial-cultural
language on exams, we suggest that the use of word sheets containing these terms might be effective while
promoting a UDE approach to instruction. To reduce inaccessible vocabulary, the authors’ future work includes
broadening the scope of this study to a larger corpus of language, then analyzing and developing a software-based
approach whose interface suggests accessible alternatives for identified visible and invisible language issues on
engineering assessment instruments.
Table 1. Individual analysis of each word. In the original layout, each word was labelled with its category (in parentheses) and its OU/PU ratio, and was accompanied by two charts: a sum-of-scores bar chart comparing the total perceived-understanding (PU, left bar) and observed-understanding (OU, right bar) scores, showing overall confidence and the relative difference between the two; and a scatter-plot of PU (x-axis, the student's self-assessed understanding) against OU (y-axis, understanding assessed from the written definition), showing interaction effects. The residual reported for each word is the sum of absolute differences from each data point to the line PU=OU (y=x); smaller residuals indicate more accurate self-assessment.

Word        Category    OU/PU   Residual
Succinct    Linguistic  0.76    31
Propagate   Linguistic  0.69    57
Feasible    Linguistic  0.65    56
Field       Linguistic  0.70    53
Mold        Technical   0.87    34
Tolerance   Technical   0.47    81
Fax         Technical   0.95    24
Jell-O®     Cultural    0.93    17
Bungalow    Cultural    0.95    39
Bonnet      Cultural    0.87    29
Figure 1. Shows the number of times the self-assessment is ideal (OU-PU=0) and the general tendency towards over-assessment (OU-PU<0). This is an aggregate of all words used in this study.
[Figure 1 chart: histogram titled “Accuracy of Self-Assessment”; x-axis: OU-PU, ranging from -5 to 5; y-axis: number of occurrences, ranging from 0 to 160.]
Figure 2. Shows that the relative frequency of over-rating understanding is greater than accurate and under-rating understanding combined. This is an aggregate of all words used in this study.
Word       PU mean (SD)    OU mean (SD)    PU-OU mean (SD)   t-test
Bonnet     2.10 (1.582)    1.83 (1.810)    0.275 (1.062)     t(39)=1.64, p=.109
Bungalow   3.25 (1.565)    3.08 (1.940)    0.175 (1.338)     t(39)=0.83, p=.413
Fax        4.03 (0.800)    3.83 (0.747)    0.200 (0.883)     t(39)=1.43, p=.160
Feasible   3.93 (0.797)    2.55 (1.011)    1.375 (1.125)     t(39)=7.73, p=.000
Field      4.13 (0.686)    2.88 (0.822)    1.250 (1.056)     t(39)=7.49, p=.000
Jell-O     3.73 (1.219)    3.48 (1.585)    0.250 (0.742)     t(39)=2.13, p=.040
Mold       3.03 (1.310)    2.63 (1.462)    0.400 (1.105)     t(39)=2.29, p=.028
Propagate  2.88 (1.285)    1.98 (1.544)    0.900 (1.336)     t(39)=4.26, p=.000
Succinct   1.95 (1.974)    1.48 (1.853)    0.475 (1.132)     t(39)=2.65, p=.011
Tolerance  3.90 (0.672)    1.83 (1.174)    2.075 (1.385)     t(39)=9.48, p=.000

Table 2. Shows the statistical significance of accurate self-assessment: means and standard deviations of the PU and OU scores, the paired PU-OU difference, and the paired t-test result for each word.
[Figure 2 chart: pie chart titled “Frequency in Self-Assessment”: OU-PU < 0 (over-rated): 53%; OU-PU = 0 (accurate): 34%; OU-PU > 0 (under-rated): 13%.]
Acknowledgment
The authors gratefully acknowledge Prof. Mark Chignell for his input on computational methods, and the participants of this study for their time.
References
1. North Carolina State University Center for Universal Design, http://www.design.ncsu.edu/cud/univ_design/ud.htm, accessed 5 April 2010.
2. W. L. Wilkoff and L. W. Abed, Practicing Universal Design: An Interpretation of the ADA, Van Nostrand Reinhold, New York, NY, 1994.
3. Americans with Disabilities Act of 1990, P.L. 101-336, 104 Stat. 327, 42 U.S.C. 12101 et seq.
4. Telecommunications Act of 1996, P.L. 104-104, 110 Stat. 56.
5. F. Bowe, Universal Design in Education: Teaching Nontraditional Students, Bergin & Garvey, Westport, CT, 2000.
6. J. M. McGuire, S. S. Scott and S. F. Shaw, Universal design and its applications in educational environments, Remedial and Special Education, 27(3), May-June 2006, pp. 166-175.
7. S. M. Pliner and J. R. Johnson, Historical, theoretical, and foundational principles of universal instructional design in higher education, Equity & Excellence in Education, 37(2), 2004, pp. 105-113.
8. C. Variawa and S. McCahan, Design of the learning environment for inclusivity, Proceedings of the 2010 American Society for Engineering Education Annual Conference and Exposition, Louisville, KY, June 20-23, 2010.
9. L. C. Brinckerhoff, J. M. McGuire and S. F. Shaw, Postsecondary Education and Transition for Students with Learning Disabilities, Pro-Ed, Inc., Austin, TX, 2002.
10. D. Trytten, A. Lowe and S. Walden, Racial inequality exists in spite of over-representation: The case of Asian American students in engineering education, Proceedings of the 2009 American Society for Engineering Education Annual Conference and Exposition, Austin, TX, June 14-17, 2009.
11. D. Bartholomae, Inventing the university, Journal of Basic Writing, 5, 1986, pp. 4-23.
12. A. Johns, Text, Role, and Context: Developing Academic Literacies, Cambridge University Press, New York, 1997.
13. K. S. Taber, The mismatch between assumed prior knowledge and the learner's conceptions: A typology of learning impediments, Educational Studies, 27(2), 2009, pp. 159-171.
APPENDIX A.3 – FREQUENCY ANALYSIS OF TERMINOLOGY ON ENGINEERING EXAMINATIONS
C. Variawa, and S. McCahan. “Frequency Analysis of Terminology on Engineering Examinations.” Proc. of 118th ASEE Annual Conference and Exposition. ASEE Paper No. AC 2011-1565. Vancouver, 2011. This paper was presented at the 2011 American Society for Engineering Education Annual Conference. This paper reviews the literature on frequency analysis of words, and presents a study that analyses the frequency of words on engineering final exams at the University of Toronto. Discussed within the context of Universal Instructional Design and learner characteristics, this work is an initial investigation in designing a strategy that computationally characterizes vocabulary in engineering education.
Frequency Analysis of Terminology on Engineering Examinations
CHIRAG VARIAWA AND SUSAN MCCAHAN University of Toronto
Abstract
There have always been differences between instructor expectations of what students “should know” and the actual background experience that students have entering an engineering program. The divergence between this assumed knowledge and the actual knowledge base may be increasing as the student population diversifies. The issue is not just wide differences in preparation in basic math, or science, or communication ability, but diversity in the cultural background of students. While we frequently laud diversity, we have not always followed this up by supporting inclusivity in our classrooms and finding ways to bridge cultural differences that may exist. Specifically, when we contextualize technical material to situate an engineering problem in a real-world scenario, students are subject to a test of their background experience and vocabulary; so, instead of clarifying a technical concept, the context may make the concept more inaccessible. This may also compromise the inclusivity of the learning environment, causing students to doubt their suitability for studying engineering.
This represents an instance where learner characteristics are misaligned with the expectations of the learning environment, and there has been little research in this particular area of engineering education. The goal of the current study is to evaluate the vocabulary we use in engineering education, so in future work we can consider the alignment between the vocabulary used and learner characteristics. As raw data we are using an exam bank that contains final examinations for all engineering courses at the University of Toronto. A frequency analysis of the words and terms used on the exams has been carried out, excluding course specific technical terminology. The hypothesis is that infrequently used words and terms are typically less familiar to students. This study is the first step to testing this hypothesis.
The results of the frequency analysis are analyzed with respect to:
1. The distribution of words and terms that are used on exams.
2. The relationship between the vocabulary used on particular types of exams and natural language.
3. The work of authors like van Rijsbergen to understand how a proxy system for familiarity might be developed.
The results are discussed within the theoretical framework of learner characteristics and interaction with the learning environment. In particular, the results are examined with reference to the literature including Universal Instructional Design (UID), and critique of the UID approach.
Introduction
The engineering student population has become increasingly diverse in recent years.1,2 As a result, the diversity of background experience and vocabulary that students bring with them to university is increasing as well. As these students integrate within engineering institutions, they may face issues of inclusivity and accessibility to course material because of their diverse backgrounds. One dimension that particularly impacts student inclusivity is that of language.3 Students may face barriers to learning when the language of instruction and assessment does not accommodate differences in learner characteristics. The problem is that students may actually have a different corpus of language than instructors assume they have. For example, when a student encounters a term that is unfamiliar to them, the word creates a barrier to understanding. This barrier may inhibit learning, or compromise the validity of assessment, if the student's lack of understanding is not addressed. Some vocabulary (course-specific vocabulary) is explicitly taught. However, when unfamiliar vocabulary is used, and not explicitly taught, it creates a misalignment between the learning environment and the learner. The learner experiences this as a barrier to accessibility of the learning environment.
A potential solution to the issue of inaccessible language might appear to be the use of plain language. Plain language is the notion that clear and simple language is the most accessible and logical way of communicating with one another. There is plenty of literature in this area, and there are several studies that show the benefits of using plain language.4-6 However, as educators, we want our students to develop a deep and robust vocabulary as part of their engineering education. This is particularly important because the mastery of technical and professional corpora of language is beneficial for students and practicing engineers alike. As a result, educators cannot simply use plain language at an elementary level to address this issue, but instead need to investigate the issue of inaccessible language in their curricula.
The identification of inaccessible vocabulary has several advantages for engineering education. First, it addresses a barrier to accessibility that is becoming increasingly prevalent as the learning population diversifies. Second, it encourages both the instructor and student to develop resilient professional vocabulary while helping students over the barrier.
Our hypothesis is that word familiarity is correlated with word frequency. If this is true then words that appear frequently in teaching materials are better understood by students, and words that appear infrequently are more likely to be unfamiliar. The first step in this investigation is analyzing the frequency of words in a typical engineering classroom. Specifically, we believe that this approach will provide some insight on the issue of inaccessible vocabulary used in engineering education. Additionally, we hypothesize that words which appear frequently are a less significant accessibility issue than those that appear infrequently. This study measures the frequency of words in one particular type of learning material, undergraduate final exams, because this method of closely-supervised assessment is common in engineering education and provides a substantial database.
Some analysis techniques in the area of vocabulary frequency-analysis are presented by C.J. van Rijsbergen.7,8 Primarily, his work comments on the use of Zipf's Law to understand the statistical distribution of words in language. Zipf's law states that the most frequent word in an article of text will appear twice as frequently as the second most frequent word, three times as frequently as the third most frequent word, and so on.9 Thus, the expected result of a frequency analysis is a hyperbolic curve with a narrow range of frequently-appearing words and a broad range of infrequently occurring words. To better understand the validity of this law, Li performed a study using a uniform distribution of all 26 letters plus a “space” character to study the effect of Zipf's law in different cases. His approximation established that the law holds no matter what vocabulary is used, but that its effect is more pronounced in natural language.10 The rough theory behind this phenomenon is that humans use both frequent and infrequent words that may or may not have meaning on their own.11,12 This theory also helps explain why it is important to remove the word “the” (and other confounding strings, such as vowel-less “words”) from frequency analysis studies; the word “the” is the most common word in the English language.7,13 Overall, Zipf's law is one concept we can use to interpret the data acquired from the frequency analysis of words.
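Under Zipf's law, frequency is roughly inversely proportional to rank, so the rank-1 count is about r times the rank-r count. A minimal rank-frequency sketch (the token list is a toy example constructed to follow an exact 1/rank profile, not exam data):

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) tuples sorted by descending frequency."""
    counts = Counter(tokens)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Toy token list whose counts follow an exact 1/rank profile: 6, 3, 2.
tokens = ["of"] * 6 + ["beam"] * 3 + ["stress"] * 2
for rank, word, count in rank_frequency(tokens):
    print(rank, word, count)  # rank-1 count is roughly rank times the rank-r count
```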
Methodology
The objective of the current work is to develop a list of words ranked by frequency for a set of engineering course final exams. These lists will then be processed in two ways. First, the word lists will be input to a database program so that each frequency and rank can be accurately matched to its corresponding word and exam. Second, the lists will be plotted graphically to determine overall trends in the data. The expected output from this process will be a dataset of vocabulary with each word tagged by rank and frequency.
The study investigates the frequency of words used on engineering examinations at the University of Toronto. Final examinations were chosen for this study for several reasons. The database of final exams is readily available. At this institution final exams from previous years are posted on a publicly-accessible website so that students can use them as study aids. Also, students are not able to access assistance during an exam, which means that they must rely on their a priori vocabulary to make sense of the questions. And as a critical assessment in a course, the exam should be testing the student’s understanding of the course concepts rather than the student’s vocabulary. Presumably, the instructor has taken this into account when developing the exam. Finally, every exam in this program is the same duration, 2.5 hours, which allows for some common basis of comparison (e.g. number of words on the exam).
These exams are posted in PDF format, and include information about the course and instructor. We started by downloading the most recent exams from the freshman courses in Materials Science Engineering (MSE). An advantage of using this set of exams is that these courses are the same, or very similar to, courses taken by other freshman engineers at many institutions. Further, the authors have experience with the content and assessment objectives of each exam,
making it easy to identify vocabulary that is explicitly taught in each course. Additionally, using electronic exams enabled a computational solution to performing the frequency analysis.
To perform the frequency analysis, several software tools were used. Each exam was first processed using Adobe® Acrobat Pro v.9.4.1 to make the text in each document electronically searchable. The optical character recognition (OCR) engine in this software converts static images into text. The main advantages of using this software are that it dramatically reduces the time and effort required to input text for the frequency computation, and reduces human error in data entry. A disadvantage of this approach, however, is that each word is not vetted by a human prior to entry. This means that typos in the original document are treated as actual words, and these eventually become part of the compiled database. Additionally, there may be cases where the software disregards disfigured words because they are too distorted to recognize. In such cases, the authors entered the problematic word manually.
The next step in this process was to use a program called Hermetic Word Frequency Counter Advance v.12.45. The program calculates the frequency of each word, and outputs the data as a text file which includes the rank, word, word frequency, and a unique identifier for each word. During this process, we instructed the program to disregard particular words, i.e. exclude them from the computation. Specifically, the program ignores specified character strings that are illogical and may confound the results (i.e. words that have fewer than 2 characters, contain a hyphen, are just repetitions of the same letter, or lack a vowel or ‘y’). In addition, the word “the” was excluded from the data. “The” is the most frequent word in the English language13 and dwarfs all other words in terms of frequency, making it difficult to illustrate the data graphically. As discussed in the previous section, literature in the field of word frequency generally suggests excluding “the” from an analysis. One particular advantage of removing these words at this stage is to reduce the likelihood of an erroneous word being processed, and to reduce the clutter in the database. One disadvantage of this software, however, is that words with prefixes and/or suffixes are treated as their own unique words. For example, the word “gear” would be considered different from “gears”. This limitation is one that the literature recommends be addressed, but there is currently no automated method for accomplishing this, and no clear systematic approach described in the literature. So we have not manipulated the data to combine the results for variations of the same word. Additionally, the set of vocabulary that is very clearly course-specific was removed from the frequency analysis.
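The exclusion rules described above can be sketched as a simple filter. This is a hypothetical re-implementation for illustration, not the Hermetic software itself:

```python
import re
from collections import Counter

def is_valid_word(word):
    """Apply the exclusion rules: at least 2 characters, no hyphen,
    not a repetition of a single letter, must contain a vowel or 'y',
    and not the word 'the'."""
    if len(word) < 2:
        return False
    if "-" in word:
        return False
    if len(set(word)) == 1:          # e.g. "zz", "aaaa"
        return False
    if not re.search(r"[aeiouy]", word):
        return False
    if word == "the":
        return False
    return True

def filtered_frequencies(text):
    """Count word frequencies after applying the exclusion rules."""
    return Counter(w for w in text.lower().split() if is_valid_word(w))

freqs = filtered_frequencies("the gear and the gears x-axis zz shaft")
# "the", "x-axis", and "zz" are excluded; "gear" and "gears" stay distinct,
# mirroring the prefix/suffix limitation noted above.
```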
For instance, words such as “modulus” and “necking”, which are specific to the materials science course content, were removed from the dataset because the meaning of these types of specific technical words would be explicitly taught in the course. Our intent is to focus on vocabulary that is not explicitly taught; that is, words that the instructor has assumed every student understands without instruction. Specifically, we believe that accessibility issues can result when a term is unfamiliar a priori and not explicitly taught. The authors acknowledge that labelling a word as course-specific is subjective and presently non-rigorous; we intend to use a
corpus of technical vocabulary distilled from the textbooks used in each course to mitigate this problem. We also acknowledge that the exam sample size (N=9) is presently small. However, the word lists are substantive (n=565, in total). This provides a large vocabulary sample that is indicative of the language used in introductory engineering courses.
The word lists that were produced from this process are sorted by frequency. The data was then plotted to produce graphs comparing the vocabulary frequency, and this was used as the basis for comparing different exams to one another. Further, the frequency distributions for each exam were examined statistically to understand how different exams compare with one another. This method produces data that can be mined in a variety of ways to better understand the language we use in the engineering learning environment.
Results and Discussion
The frequency analysis produced 9 datasets containing ranked frequency distributions of the words used on each exam. The data shows that words which we might assume are very familiar, such as “name”, “clear”, or “length”, are not particularly frequent (nor consistently infrequent). The data also shows that mathematics exams generally have fewer words than other exams. All exams have a roughly hyperbolic distribution of words, per Zipf’s law, with some words occurring extremely frequently and most occurring only once. Plots of the frequency distributions are shown in Table 1.
Table 1 shows the exams and the word frequencies. The first column shows the course title and the total number of words. The information in parentheses gives a very brief description of the type of course. The third column in Table 1 graphically shows the frequency analysis. The vertical axis is the “occurrence percentage”. This normalized value is calculated by dividing the number of occurrences of a specific word by the total number of words on that exam. This number shows how common the word is on the particular exam. For instance, the highest occurrence percentage is 12%, which is the word “marks” on the Physical Chemistry exam.
The data shows that the correlation between the independent and dependent axes is not linear but hyperbolic. This follows Zipf’s law, demonstrating that a small number of words are used much more frequently than all others. The frequency distributions show that exams with fewer words have a weaker hyperbolic correlation between occurrence percentage and rank. According to Zipf’s law, this could mean that the language used in these exams has less similarity to natural language. Table 1 shows that mathematics courses generally have lower word counts than other types of courses, meaning that they may use language that has less resemblance to natural language.
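One hedged way to quantify a “weaker hyperbolic correlation” is a least-squares fit of log-frequency against log-rank: Zipf’s law (frequency proportional to 1/rank) predicts a slope near -1, and weaker Zipfian behaviour shows up as a slope further from -1. This sketch is illustrative and was not the statistical method used in the study:

```python
import math

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) vs. log(rank) for a list of
    frequencies sorted in descending order. A slope near -1 suggests a
    Zipf-like (hyperbolic) distribution."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# A perfectly Zipfian list (frequency = 100 / rank) gives a slope of -1.
slope = zipf_slope([100 / r for r in range(1, 7)])
```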
Another observation is that the maximum value of the percentage-occurrence of the most frequently used word is not predicted by the type of exam being analyzed. It is also noted that the most frequent word (other than “the”) is not the same for any two of these exams. So it
appears that different exams use a different corpus of vocabulary, even within a particular subject, e.g. mathematics.
Basic statistical analysis was also carried out for this dataset. The minimum number of words used is 91 (Calculus I), and the maximum is 565 (Introduction to Materials Science). The mean and standard deviations for this dataset are 282.7 and 164.3, respectively. This indicates that there is large variability in the word count for the exams studied; some exams have a much higher count than others. Overall, the data offers a preliminary look at the way vocabulary is utilized in engineering learning materials.
Table 1 - Course information and frequency of unique words on the final exam. For each course, the table includes a plot of word frequency: horizontal axis - unique words (by rank); vertical axis - % occurrence (calculated by taking the # of occurrences of that word, dividing by the # of unique words present, and then multiplying by 100).

Name (Category): # of unique words
Calculus I (Mathematics): 91
Calculus II (Mathematics): 172
Linear Algebra (Mathematics): 165
Physical Chemistry (Chemistry): 221
Engineering Strategies and Practice (Engineering Design and Communication): 525
Introduction to Materials Science (Materials Science): 565
Fundamentals of Computer Programming (Computer Programming): 315
Electrical Fundamentals (Electrical Circuits): 304
Mechanics (Statics): 183

[Frequency-distribution plots omitted; each shows occurrence percentage falling off roughly hyperbolically with rank.]
The results of this study can be situated in the context of the existing literature. For example, if the language used on engineering exams is typical of natural language, then it should follow Zipf’s law of word frequency. Luhn’s work in the field of information retrieval suggests methods of data mining that can be applied to word frequency datasets.7,14,15 And Van Rijsbergen provides a critique of these approaches, and investigates other methods that can aid in creating word frequency analyses that are more meaningful.7
Zipf’s law states that the frequency of any word is proportional to its rank in the frequency table.9 From the limited set of exams analyzed to date it is clear that mathematics exams have a weaker correlation with Zipf’s law than the other types of exams. As with Li’s study10 mentioned earlier, this implies that the mathematics exams in this sample use language that departs from natural language. Examining the word lists from the math courses and design
course supports this. Words that are common in natural language, such as “then”, “if”, and “but”, appear infrequently on the math exams. Words that appear frequently on the math exams include “point”, “space”, and “choice”, which are less common in natural language.
Luhn worked extensively with information retrieval technologies, and suggests ways of data mining for accurate retrieval based on input queries. Specifically, he suggests that words be assigned tags and weightings. Tagging, for example, can be used to distinguish unique word definitions, e.g. allow “Apple” the company name to be distinguished from “apple” the fruit. Further, words that are variations of the same root word can be assigned the same tag. This reduces the clutter in the dataset because equivalent words are merged, assuming the tagging has been done carefully. Weighting, in contrast, means that we assign values to words; the values could be assigned based on a particular set of criteria. For example, if there were data on familiarity, we could assign terms that are less familiar to our students a higher weighting than terms that are more familiar. In general, tagging and assigning weights are both grouping techniques that help condense the dataset into more manageable units. Used together, these methods may help to identify words that combine frequent use with low familiarity, i.e. words that may pose the most frequent and significant learning barriers for students. This is an important consideration if we want to distinguish inaccessible terms from accessible ones. However, this separation first requires that we understand the characteristics of the learner’s vocabulary a priori. Knowing these characteristics, it is then possible to use the frequency distribution graphs that have been developed to isolate regions where inaccessible terms are most likely to appear. The literature suggests that upper and lower cut-off points can be defined on a word frequency graph.
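As a loose illustration of Luhn-style tagging and weighting (the tag map and familiarity scores below are entirely hypothetical, since no automated tagging method or familiarity data existed in this study):

```python
from collections import defaultdict

def group_by_tag(word_freqs, tags):
    """Collapse word frequencies onto shared tags, e.g. mapping
    'gear' and 'gears' to the same root tag."""
    grouped = defaultdict(int)
    for word, freq in word_freqs.items():
        grouped[tags.get(word, word)] += freq
    return dict(grouped)

def weight_terms(grouped, familiarity):
    """Weight each tag by frequency / familiarity: terms that are frequent
    but unfamiliar score highest, flagging likely learning barriers."""
    return {tag: freq / familiarity.get(tag, 1.0)
            for tag, freq in grouped.items()}

tags = {"gear": "gear", "gears": "gear"}       # hand-made tag map (hypothetical)
freqs = {"gear": 3, "gears": 2, "dilemma": 4}
grouped = group_by_tag(freqs, tags)            # variations of "gear" merge
scores = weight_terms(grouped, {"gear": 5.0, "dilemma": 0.5})
# "dilemma" (frequent but unfamiliar) outranks "gear" (frequent but familiar).
```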
However, it is not clear how best to apply this methodology if the learner’s a priori vocabulary is both unknown and continuously shifting. With the growing use of electronic textbooks, it may be possible for students to identify unfamiliar words as they study a subject and to have this data collected automatically. We can imagine a system that weights words based on how frequently they are identified as unfamiliar by the students in a given class, and that uses this to help instructors identify words that may be problematic as they develop an exam, much like a spell checker works. However, to understand the current data in the framework of learner characteristics it is necessary to establish a proxy system that makes use of some other feature that is common to inaccessible vocabulary in order to bring it to the attention of the instructor and the student. Further, it is important to understand the limitations of these approaches so that we can more fully elucidate the issues with word frequency analysis.
Van Rijsbergen’s critique organizes several approaches into a framework that can be used to understand word frequency analysis better and suggests, at least minimally, how to begin to develop a proxy system. Specifically, his critique is important because it articulates the limitations of this work while informing a potential direction for the analysis. Van Rijsbergen explains how prefixes and suffixes affect the meaning of words. Moreover, an understanding of this issue allows us to remove related words to simplify the resulting dataset. For example, the
removal of “ual” from “factual” retains the meaning in the root, but this is not true if “ual” is removed from “equal”. In his interpretation of Luhn’s work, he establishes that most unique intermediate terms appear between the upper and lower cut-off points, as seen in Figure 1.
In our results, this range includes words such as “coexistence”, “conversion” and “dilemma”. These words do, from a purely subjective perspective, appear to be potentially more unfamiliar or less accessible for students. Moreover, these words may be more challenging than words such as “marks” or “thanks” which are very frequent or very infrequent. That is to say, although this is a blunt approach that may capture some very familiar terms, or leave out some unfamiliar terms, there appears to be some promise that inaccessible language can be bounded, to some degree, by using this word-frequency analysis technique. However, more work and a larger sample size will be required before a definitive conclusion is possible. It is also not clear yet where exactly to draw the cut-off lines.
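Under the (admittedly blunt) assumption that candidate inaccessible terms sit between an upper and lower frequency cut-off, they could be pulled from a ranked wordlist as follows. The cut-off fractions here are arbitrary placeholders, since, as noted above, it is not yet clear where to draw the lines:

```python
def intermediate_band(ranked_words, upper_cut, lower_cut):
    """Return the words between an upper and lower cut-off on a ranked list
    (most frequent first). The cut-off fractions are free parameters."""
    n = len(ranked_words)
    start = int(n * upper_cut)
    stop = int(n * lower_cut)
    return ranked_words[start:stop]

# Hypothetical ranked list mixing very frequent, intermediate, and rare words.
ranked = ["marks", "find", "coexistence", "conversion", "dilemma",
          "valve", "lattice", "torque", "flux", "thanks"]
band = intermediate_band(ranked, 0.2, 0.5)
# Drops the very frequent head ("marks", "find") and the infrequent tail.
```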
Figure 1 - Significant terms are likely located between the upper and lower cut-off regions. (Reproduced from refs. 7, 15.)
At present, the accuracy of finding unfamiliar and inaccessible language is low. It is difficult to predict where these inaccessible terms are simply from browsing and comparing the graphs in Table 1 alone. Our initial hypothesis was that inaccessible terminology would be used less frequently than accessible terminology. However, the individual data sets do not support this hypothesis. We found that unfamiliar and inaccessible terms are not necessarily infrequent. Rather, these terms may occupy a region that is intermediate between high frequency and low frequency. Further, the characteristics of unfamiliar language are vague; it is difficult to predict where these terms are without further and more in-depth work.
Such work could involve compiling a larger set of exams, reducing cluttering of the data by removing pre-/suffixes, using tagging as suggested by Luhn’s work and comparing individual exams to an amalgamated dataset of all exams. At present, this small dataset is useful for exploratory work in a specific area. However, having a larger dataset can help to better assess the hypothesis. In addition, reducing the clutter in the database by focussing on the root word, rather than the form that includes pre-/suffixes, can assist in compacting the dataset. This is
particularly useful in maintaining the integrity of the database because having multiple permutations of the same word still retains the same basic meaning, but adds to the overall word count of the exam. Also, comparing individual or groups of exams to an amalgamated dataset of all exams might yield interesting results. This comparison may identify how a given exam or group of exams (for example, design courses) compares to the general characteristics of vocabulary used in these materials. Discussing the common features of these exams versus the large dataset may yield information about how a specific type of course might be more/less likely to have unfamiliar/inaccessible language for its learning population.
This is a first exploratory step in a line of study that informs an approach that might make engineering education more accessible for the majority of students. As such it is situated in a Universal Instructional Design (UID) approach to improving the learning environment for students.16 However, it should be noted that there will be limitations to any set of results or remediation strategy that is developed from this work. First, there remains a portion of learners that are “high-risk”. This population includes learners who require specialized individual attention or accommodation. For example, simply making vocabulary more familiar will not remove the need for accommodation for students with learning disabilities, but it may make the learning environment somewhat more accessible for these students. Another limitation to the applicability of this research is that vocabulary is not the only barrier to accessibility in the engineering classroom. There are many dimensions to learner characteristics that impact accessibility.3
There are, however, a number of advantages to finding and mitigating inaccessible vocabulary. Using accessible language may assist students who would not otherwise self-identify as people who face barriers in the learning environment. This is related to the “curb cut” effect mentioned frequently in the UID literature.16 Overall, making language more accessible helps a diverse learning population feel more included in an environment conducive to professional skills development. UID describes principles that make the learning environment more accessible to students. For example, encouraging clarity and flexibility in the delivery of instructional material has a positive effect on a variety of students, each having different learning characteristics.
Inclusivity can also potentially encourage greater student involvement in the learning process. Language is often cited as an issue in the literature on inclusivity.3 Further, understanding language supports and encourages the development of a robust professional vocabulary while maintaining the integrity of the course learning objectives.
Conclusions
Language can be one dimension of some inclusivity and accessibility issues students face in engineering education. Identifying vocabulary that might be unfamiliar and inaccessible has
many benefits for all students. It helps students overcome learning barriers, while giving instructors information they can use to help students develop a robust professional vocabulary.
Frequency analysis of language has several limitations, but this exploratory study has shown some interesting results. Specifically, on a given exam, infrequently used words are just as likely to be inaccessible as frequently used words, and vice versa. Moreover, words near the centre of the frequency distribution appeared less accessible in general. However, more work needs to be done to accurately identify inaccessible words using frequency analysis. At present, we need to establish criteria to help focus our search for inaccessible vocabulary.
The applicability of accessible language in engineering pedagogy is profound. Using a UID approach, we can create more inclusive learning environments that are more flexible and can accommodate different learner characteristics. Our future work will investigate ways of improving the process of finding and mitigating inaccessible language used in all levels of engineering education, in addition to making the environment more accessible and inclusive for students.

References

1 “Synergies (2008 Annual Report)”. National Action Council for Minorities in Engineering. Web. <http://www.nacme.org/user/docs/NACME_AnnualReport2008.pdf>.
2 “Vision, the NACME Continuum (2010 Annual Report)”. National Action Council for Minorities in Engineering. Web. <http://www.nacme.org/user/docs/NACME_AR_2010_FINAL.pdf>.
3 Variawa, C., and S. McCahan. 2010. “Design of the Learning Environment for Inclusivity.” In Proceedings of the 2010 American Society for Engineering Education Annual Conference and Exposition. Louisville, KY.
4 Bello, D. “President Signs ‘Plain Language’ Bill into Law.” Safety + Health 182.6 (2010): 21.
5 Petelin, R. “Considering Plain Language: Issues and Initiatives.” Corporate Communications 15.2 (2010): 205-16.
6 Harper, R., and D. Zimmerman. 2009. “Exploring Plain Language Guidelines.” IEEE International Professional Communication Conference.
7 Van Rijsbergen, C.J. 1979. “Chapter 2: Automatic Text Analysis.” Information Retrieval, 2nd ed. London: Butterworth. pp. 10-15. Web. <http://www.dcs.gla.ac.uk/Keith/pdf/Chapter2.pdf>.
8 Lease, Matthew. 2007. Natural Language Processing for Information Retrieval: The Time is Ripe (Again). New York, NY: Association for Computing Machinery.
9 Zipf, G.K. 1949. Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
10 Li, W. “Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution.” IEEE Transactions on Information Theory 38.6 (1992): 1842-5.
11 Bloom, L. “Cognition and the Development of Language.” Language 50.2 (1974): 398-412.
12 Saffran, Jenny R., et al. “Incidental Language Learning: Listening (and Learning) Out of the Corner of Your Ear.” Psychological Science 8.2 (1997): 101-5.
13 The Linguistics Encyclopedia. Ed. Kirsten Malmkjær. 2nd ed. New York: Routledge, 2002.
14 Luhn, H.P. “The Automatic Creation of Literature Abstracts.” IBM Journal of Research and Development 2.2 (1958): 159-165.
15 Schultz, C.K., ed. 1968. H.P. Luhn: Pioneer of Information Science - Selected Works. London: Macmillan.
16 Bowe, F. 2000. Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey.
APPENDIX A.4 – COMPUTATIONAL METHOD FOR IDENTIFYING INACCESSIBLE VOCABULARY IN ENGINEERING EDUCATIONAL MATERIALS
C. Variawa, and S. McCahan. “Computational Method for Identifying Inaccessible Vocabulary used in Engineering Education.” Proc. of 119th ASEE Annual Conference and Exposition. San Antonio, 2012. This paper was presented at the 2012 American Society for Engineering Education Annual Conference. This paper investigates the design of a computational approach to characterize vocabulary on engineering examinations. The work describes an application of the Term Frequency-Inverse Document Frequency (TF-IDF) equation and the effect of using different comparator sets of documents on the wordlists generated. The discussion elaborates on these effects, and ways in which to promote vocabulary characterization that increases the TF-IDF scores of specific words.
Computational Method for Identifying Inaccessible Vocabulary in Engineering Educational Materials
Introduction

Instructors often face the challenge of making students feel more included in the classroom, especially in freshman engineering classes. In the freshman classroom, instructors increasingly find that their students depart from the “traditional” homogeneous demographics of engineering in the past. Engineering classrooms have broader representation from all cultural and socio-economic backgrounds, and even greater variance in approaches to learning, leading to greater diversity. This increased diversity among students may also lead to barriers that impede accessibility to learning and, as a result, inclusivity.
Universal Instructional Design (UID) is a pedagogical philosophy that has emerged in the field of higher education research. It aims to increase accessibility to learning materials. The core concept of UID, universal design, comes from civil engineering and calls for increasing accessibility to physical structures by incorporating accessibility as a priority in the design process. Applied to education, this design philosophy attempts to “make instruction accessible to the greatest extent for the largest number of people possible”.1 The literature on this subject suggests the use of seven principles that guide teachers to create accessible learning material by increasing the clarity, transparency, flexibility and usability of instruction. However, the use of UID has not been rigorously examined within the context of engineering education as a tool to create more inclusive learning environments. The premise of our study is to use a UID-inspired approach to make engineering education accessible to the greatest possible diversity of students; we hope to maximize accessibility to engineering course material with the goal of making learning environments more inclusive.
One particular learning barrier that our students face is inaccessible language. In engineering, we generally encourage the development of a robust engineering vocabulary to help students develop as professionals. However, a critical look at the language we use in the classroom may raise questions about the accessibility to course material when we begin to use vocabulary that is external to the corpus of language our diverse students bring with them. So, while we claim to promote professional language development, we may inadvertently create a less than inclusive environment for our students. Specifically, when we use uncommon language that is neither discipline-specific nor explicitly taught, we are creating an environment that is biased towards learners that have the same corpus of vocabulary as the instructor – this is particularly evident when colloquial or cultural vocabulary is used in the classroom. In particular, our study attempts to investigate the communication barriers that exist between instructors and diverse student populations within the context of engineering education.
Final examinations are a standardized artifact of the engineering classroom whose purpose is to assess the student’s understanding of course material. As a summative evaluation technique, the exam probes the student’s mastery of what was taught in class and how well they can apply this material to answer the provided questions. In many cases, instructors attempt to provide realistic problems using course material that is contextualized within a particular setting under given conditions. The goal of providing such authentic, contextualized questions is to mimic a “real world” situation where engineering knowledge can be applied. In doing so, the instructors may inadvertently test the student’s cultural knowledge of the contextualized environment rather than testing course-specific instruction exclusively. Specifically, the vocabulary used in creating authentic engineering problems on such assessments may cause an inaccessible and non-inclusive environment for some students. For example, when we contextualize an engineering problem by using societal references that we assume are “widely known”, some students may find the vocabulary to be unclear or foreign; the challenge of the question becomes trying to understand that vocabulary, rather than using engineering knowledge to answer the question. By extension, the use of such vocabulary may yield an invalid performance assessment because the final exam is no longer testing what it purports to test: a student’s understanding and mastery of course material. The use of accessible vocabulary while maintaining an authentic assessment environment may lead to final exams of higher quality that promote robust vocabulary development as well. In our investigation, we aim to maximize accessible vocabulary while leaving the course- and discipline-specific vocabulary as-is, so that creating a more inclusive exam does not affect the integrity of the course material.
The goal of the study is to develop a computational approach to identify potentially inaccessible vocabulary and bring it to the attention of the instructor, while ignoring engineering-specific vocabulary explicitly taught in a course. In the process of doing so, we must examine the language currently used on engineering final exams to determine if there is a method to distinguish course-specific or discipline-specific language from the rest. Specifically, to find inaccessible vocabulary we are going to find words that are course-specific and then, in future work, take the complement as a possible source of inaccessible vocabulary. This is to ensure we are not inadvertently labeling course-specific words as being inaccessible. This paper focuses on this specific aspect of the larger study on inaccessible language. In particular, we find that literature from the fields of linguistics, computer science, computational linguistics, higher education, and even some work from cultural studies suggests tools for this type of work. While there are some limited corpora of discipline-specific vocabulary, our intention is to establish a dataset using language relevant to engineering education at a typical North American engineering institution. We begin this particular component of the larger study by trying to find course-specific language.
Computational linguistics suggests the use of keyword-generation algorithms to establish a corpus of words that are characteristic of a particular piece of work.2,3 In our case, we can use this approach to develop a quantitatively-described hierarchy of potential keywords for final
exams used in engineering. The hypothesis is that we can then compare the keywords found in one exam to others, as a group or individually, to reveal trends that describe the use of language in engineering education. Keyword comparison can help us see how language changes across disciplines, how the vocabulary of freshman classes differs from that of upper years, and so on. The aim is to compare keywords of different test cases to suggest a usable approach for determining course-specific language on final exams used in engineering. Although keyword-generating algorithms are neither the only approach to this issue nor an exhaustive one, the goal is to explore the intersection of computational linguistics and engineering education and see if we can use a tool from one discipline to help make the other more accessible.

Methodology

This paper focusses on classifying the vocabulary found on engineering final exams using a quantitative computational method. The work presented here builds on earlier studies that examined students’ self-assessment of vocabulary understanding, and word frequency analysis in engineering exams.4,5 Our methodology makes use of the results from these earlier works.
Specifically, this investigation compares keywords from different exams to one another with the goal of finding a method where course-specific terms are clustered and segregated from other vocabulary. Earlier studies found that word frequency analysis on its own was not sufficient for this purpose.4 The fields of computational linguistics and computer science suggest several potential approaches that can be used to generate and compare keywords among groups of documents, and these are an improvement over word frequency analysis. Although the algorithms used in these approaches are often different, the goal remains to find words that effectively characterize the vocabulary used in a particular document.
One such approach is the Term-Frequency Inverse-Document Frequency (TFIDF) algorithm. This technique compares the frequency of words in a single document (TF) to the vocabulary used in a set of documents. The mathematical formula for TFIDF is:

    TFIDF = TF × IDF

where, in a single document,

    TF = (# of occurrences of the term) / (total # of words)

and, in a set of documents,

    IDF = log( (# of documents) / (# of documents containing the word) )
The IDF is a measure of how important a particular term is within a set of documents; it is calculated by dividing the total number of documents by the number of documents in the set that contain the term, and then taking the logarithm of the quotient. The TFIDF formula assigns a score to each word in the test document.
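As an illustrative sketch of the scoring just described (our own Python, not the authors' Visual Basic implementation; all names are ours), the TF and IDF parts can be combined as follows:

```python
import math
from collections import Counter

def tfidf_scores(target_words, comparison_docs):
    """Score every word in a target document (a list of word tokens)
    against a set of comparison documents, per the formulas above:
    TF = occurrences / total words; IDF = log(docs / docs containing word)."""
    counts = Counter(target_words)
    total = len(target_words)
    n_docs = len(comparison_docs)
    doc_sets = [set(doc) for doc in comparison_docs]
    scores = {}
    for word, count in counts.items():
        tf = count / total
        containing = sum(word in doc for doc in doc_sets)
        # Guard: a word absent from every comparator would make the
        # quotient undefined, so treat it as appearing in one document.
        idf = math.log(n_docs / max(containing, 1))
        scores[word] = tf * idf
    # Rank in decreasing order of TFIDF score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A word such as "the", which appears in every comparator document, receives IDF = log(1) = 0 and falls to the bottom of the ranking, which is exactly the filtering behavior described below.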
As an information retrieval tool, TFIDF is effective at finding characteristic words in a document when multiple documents are compared.6,7 The existing literature describes TFIDF as a technique for classifying documents based on keywords and modifiers. In particular, TFIDF has been used to describe documents using hierarchical subclasses, or other creative methods where the algorithm is applied repeatedly per subclass. For example, a computer hardware document might be classified as "comp.sys.ibm.pc.hardware", with the algorithm run in a loop within each subclass. From a computational perspective this is processor-intensive, but the results are generally accurate. Although this study does not use a repeated looping method within subclasses, TFIDF can still provide insight about words that are diagnostic for a document. Further, since fewer processing loops are used, computational load and limitations on available computer memory are less of a concern.
The literature in computational linguistics and computer science describes several other methods for generating keywords,7-10 as well as related algorithms for classifying documents, together with their strengths and weaknesses. In particular, the TFIDF method is recognized as a validated approach to finding keywords in documents.7,8 Existing work also demonstrates approaches for describing the magnitude of word-grouping behaviour using more sophisticated mathematical and statistical techniques, including the bottleneck method for disambiguation and greedy heuristic approaches.8-10 This literature focuses on the algorithm itself and on ways to describe documents using subclasses (for information retrieval), rather than on how to classify individual vocabulary in a context similar to our work.
A higher TFIDF score means that the word being examined is diagnostic of that particular document, and a low TFIDF score means that the word is not a keyword for the document. The algorithm is particularly effective because it tends to filter out terms that are common across the set of documents being compared. As such, the TFIDF method can potentially find words that are diagnostic of a particular exam while filtering out words that are commonly used on engineering exams in general.
We use a repository of electronically available final exams in the Faculty of Applied Science and Engineering at the University of Toronto for this research. The total number of usable exams is 2254, with a final total of over 2,300,000 English words being examined in our work. The dataset spans the last ten years, covers all departments in the Faculty and is robust enough to
cover over 98% of all words on each exam, including captions on figures. Words that contain numbers or foreign characters are discarded from the study: only words composed of ASCII characters 65-122 (inclusive) are taken into account.
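A minimal sketch of this filtering rule (our own code, not the authors' original implementation):

```python
def is_usable_word(word):
    """Keep a word only if every character lies in the ASCII range 65-122
    described above (A-Z, a few punctuation codes, a-z); words containing
    digits or foreign (non-ASCII) characters are discarded."""
    return bool(word) and all(65 <= ord(ch) <= 122 for ch in word)

def scrub(words):
    """Drop unusable words from a token list."""
    return [w for w in words if is_usable_word(w)]
```

For example, `scrub(["stress", "R2D2", "crème", "slip"])` keeps only `"stress"` and `"slip"`: the digit in "R2D2" and the accented character in "crème" both fall outside the stated range.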
This phase of our study begins by converting the approximately 3000 electronically-available engineering exams from an image-PDF format to plain text using Optical Character Recognition (Adobe Acrobat®). All of the words from each exam are then extracted and "scrubbed" to remove words that contain numbers or foreign characters. Each resulting word set is automatically placed in its own text-only file, cross-referenced with the course designation, year, and discipline. The user can then select which of these files to compare against which group of exams, to develop TFIDF values for that word set. The data are then exported and further scrubbed using a Microsoft Excel® macro to remove duplicates. The TFIDF processing code was created by the investigators, who used four test cases for the TFIDF comparisons; these are detailed in the subsequent sections.
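The duplicate-removal step (performed by the authors with an Excel macro) can be sketched in a few lines; this is our own illustrative code, not the original macro:

```python
def dedupe(words):
    """Remove duplicate words while preserving first-seen order,
    mirroring the role of the Excel duplicate-removal macro
    described above."""
    seen = set()
    out = []
    for w in words:
        if w not in seen:
            seen.add(w)
            out.append(w)
    return out
```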
Results
The TFIDF computational method identifies keywords in a particular document by comparing it to a specified group of other documents. For this phase of the study, we compared one particular exam to four different groups of exams from the same institution to generate four case scenarios. The exam chosen as the “control” is from the Department of Materials Science and Engineering (MSE), for a third-year undergraduate course called ‘Mechanical Behavior of Materials’ held in 2009 (MSE316). This exam was chosen because the author is already familiar with the course-specific vocabulary and the course is very typical of a technical engineering course. We tried comparing this exam against 4 different document sets:
1. compared against all electronically available exams;
2. compared against all exams created in the year 2009;
3. compared against all exams from the same department (MSE);
4. compared against all exams from a different engineering department (Civil Engineering).
In all cases, each word in the control exam is given a TFIDF score and ranked based on decreasing score. As mentioned, a higher TFIDF value means that the specific word has a higher probability of being a diagnostic word for that document.
Table 1 - TFIDF values in decreasing order for exam words across four test cases; bolded words indicate that the word is potentially course- or discipline-specific

All four cases compare MSE316-2009 against a different document set: Case 1 - all exams; Case 2 - all exams in 2009; Case 3 - all MSE exams; Case 4 - all CIV exams.

| Rank | Case 1 word | Score | Case 2 word | Score | Case 3 word | Score | Case 4 word | Score |
|------|-------------|-------|-------------|-------|-------------|-------|-------------|-------|
| 1 | dislocation | 0.048687 | dislocation | 0.054035 | dislocation | 0.022046 | dislocation | 0.05481 |
| 2 | stress | 0.026217 | stress | 0.026728 | ofll | 0.01602 | gb | 0.025465 |
| 3 | ofll | 0.023793 | dislocations | 0.022067 | segment | 0.015675 | cry | 0.02527 |
| 4 | gb | 0.020656 | ofll | 0.021895 | gb | 0.01388 | crystal | 0.025138 |
| 5 | dislocations | 0.018314 | grain | 0.018767 | stress | 0.01372 | dislocations | 0.022343 |
| 6 | cry | 0.017527 | gb | 0.01759 | segments | 0.011745 | ofll | 0.022282 |
| 7 | creep | 0.017253 | cry | 0.01718 | creep | 0.011022 | grain | 0.016282 |
| 8 | grain | 0.017148 | slip | 0.01653 | subgrain | 0.01047 | slip | 0.014039 |
| 9 | subgrain | 0.016841 | crystal | 0.015391 | dissociate | 0.01047 | creep | 0.013568 |
| 10 | partials | 0.015869 | deformation | 0.014911 | move | 0.009878 | stress | 0.013437 |
| … | … | … | … | … | … | … | … | … |
| 20 | material | 0.011816 | tensile | 0.011252 | slope | 0.00798 | formation | 0.010984 |
| 30 | formation | 0.009302 | strain | 0.008904 | appearance | 0.006332 | ofllv | 0.008937 |
| 40 | continuation | 0.008237 | softening | 0.007719 | onset | 0.005516 | agb | 0.00783 |
| 50 | onset | 0.007397 | agb | 0.007072 | cry | 0.004803 | leads | 0.006723 |
| 60 | test | 0.006167 | theoretical | 0.005505 | unique | 0.004257 | strain | 0.005992 |
| 70 | crystalline | 0.005691 | yx | 0.004857 | shaded | 0.004044 | plastic | 0.005177 |
| … | … | … | … | … | … | … | … | … |
| 100 | matrix | 0.004542 | lowering | 0.004413 | pinned | 0.00349 | speaking | 0.004469 |
| 200 | offset | 0.002845 | atoms | 0.002752 | mainly | 0.002289 | reflect | 0.002914 |
| 300 | fx | 0.00157 | lb | 0.001529 | first | 0.001323 | stop | 0.001542 |
| 400 | so | 0.000285 | what | 0.000403 | of | 0.00036 | the | 0.000396 |

The table above shows a small sample of the 457 words from the MSE316 exam, with the corresponding TFIDF value for each word ranked in decreasing order. Course- or discipline-specific words are highlighted in bold for each case; these words are explicitly taught in class and are expected to be known for this exam. For instance, the instructor lectures on crystal lattice structures and mentions deformation behavior along slip planes; these words then become part of the "course-specific" corpus of the class. Ideally, this method would place course-specific words near the top of the list, since that would indicate a tool that can successfully isolate these terms computationally.

The data show that Case 2 has the highest number of course-specific terms near the top of the TFIDF list: eight of the ten highest-ranked TFIDF words are course-specific, with the farthest outlier ranked at 20. Case 2 therefore emerges as potentially the best of the four approaches, although confirming this conclusion may require further testing with other exams. The data show that Case 1 has seven of its top ten words as
course-specific; however, some clustering of course-specific words appears near rank 70. By "clustering" we mean that some course-specific terms appear together as a group. Similar low-end clustering is seen in Case 4, around both ranks 20 and 70. This behavior indicates that the computational method leaves some course-specific words scattered throughout the list when an exam is compared to the entire repository of exams or to just one other discipline. The data also show that when MSE316 is compared to all MSE exams, the course-specific terms are often clustered near the top of the list, but the top ten also contains other words; comparing an exam to exams of the same discipline therefore does not yield particularly accurate results either. Within the scope of this study, comparing one exam to all exams in the same year shows the highest number of course-specific terms among the top ten, with minimal outliers elsewhere.
Discussion
The data show that the TFIDF method gathers course-specific terms at the top of the word lists. In computer science, the purpose of the TFIDF method is to find words characteristic of a particular document when it is compared to a group of other documents. In our case, TFIDF appears to have done that successfully for all test cases, though with some limitations based on the test cases reported in this study. First, not all course-specific terms are flagged by the algorithm: some course-specific terms still appear at other locations on the word lists, although the majority appear near the top. This tells us that the computational method can be improved to become more accurate.
Another important limitation is that the data are not sufficient to demonstrate a reliable process. We compared just one exam to different groups of exams to generate our sample test cases. As an exploratory test, this shows that the TFIDF method gathers course-specific terms near the top of the word lists; it does not show whether this holds for exams other than MSE316-2009. Expanding the study to include other exams is therefore necessary to better evaluate the reliability of the TFIDF method.
Another limitation of this study is that the TFIDF method analyzes only single words, not phrases. The program is blind to the fact that multiple words appearing in a particular sequence can constitute a course-specific phrase. For example, the term "face centered cubic" is treated as three separate words by the TFIDF algorithm, while "FCC" is treated as one word, even though both are course-specific to MSE316-2009. This limitation needs to be addressed.
It should also be noted that the top ten words represent only the top 2% of the words on this particular exam, which does not provide conclusive evidence about the applicability
of the computational process; more than just the top ten words may need to be examined. However, the purpose of this exploratory investigation is to compare the cases to one another and determine which comparison type holds the most promise computationally; future work is needed to extend the investigation of these lists to a more diverse set of exams.
Based on the literature in computer science, TFIDF is quite effective at making keyword lists that are characteristic of a particular document when compared to groups of other documents. Our study takes the TFIDF tool and tests it in the context of engineering education. Specifically, it extends the work found in the literature and shows that TFIDF is potentially able to find characteristic words on engineering exam documents as well. As instructors, we see from the data that the "characteristic words" of exams correspond to course-specific terms. By extension, it appears that this algorithm, originally developed in computer science, can be used in engineering education for future work on vocabulary analysis of assessment instruments.
The data show that comparing one engineering exam to all engineering exams in the same year results in course-specific vocabulary having high TFIDF scores. In particular, the course-specific terms tend to group near the top of the list rather than being dispersed throughout the ranked vocabulary wordlist. The literature does not explain why this behavior exists, because such work has not been performed in this context before, so it is difficult to describe the reasoning behind it based on existing work. However, TFIDF is well known as a method for identifying words that characterize a single document with respect to a body of documents. From this perspective it makes sense that comparing an exam to a wide range of other exams from the same year (which share the same slang and current English colloquialisms) would allow identification of the keywords that differentiate this exam from the others.
While the data set presented here is limited, this study begins to offer insight into the development of more accessible course material for engineering. The vocabulary analysis process can potentially categorize words as course-specific, common, or a third category: uncommon and not course-specific. It is this third category of terms that may pose accessibility challenges for students and can be brought to the attention of the instructor. Additional clarification or other assistance (such as a visual aid) can then be used to improve the accessibility of the text. As instructors begin to create performance assessments that contain more accessible language, they also promote more valid assessment, with the ability to contextualize material for more authentic questions. The premise of inclusive learning environments in engineering education is critical for the transition of students into post-secondary education, as the goal is to develop a bias-free education system for a diverse learning population.
Future Work
This study attempts to investigate the language used in engineering education in order to promote inclusive learning environments. This particular investigation looks at the vocabulary used in engineering examinations and explores a computational method for distinguishing course-specific vocabulary. From the four cases analyzed, comparing a particular exam to all exams in the same year emerges as a promising approach for further investigation. Further work in this area will include a more diverse set of test cases and of exams compared to one another. In addition, the limitations of this study must be addressed before the effectiveness of TFIDF as a method of finding course-specific terms can be stated conclusively. This exploratory study does, however, provide insight into the use of a computational tool to investigate the vocabulary used in engineering education. Future work in this area would help maintain the integrity of course-specific vocabulary on engineering examinations while other ways to identify inaccessible language in engineering education are explored.

References

1. Bowe, F. 2000. Universal design in education: Teaching nontraditional students. Bergin & Garvey, Westport, CT.
2. Hausser, R. R. 2001. Foundations of computational linguistics: Human-computer communication in natural language. Berlin: Springer.
3. Damascelli, A. T., and A. Martelli. 2003. Corpus linguistics and computational linguistics: An overview with special reference to English. Torino: Celid.
4. Variawa, C., and S. McCahan. 2012. Identifying language as a learning barrier in engineering. International Journal of Engineering Education 28(1): 183-191.
5. Variawa, C., and S. McCahan. 2011. Design of the learning environment for inclusivity: A review of the literature. Proceedings of the 117th ASEE Annual Conference and Exposition. Louisville, KY.
6. Matsunaga, L. 2008. Term weighting approaches for text categorization improving. Proceedings of the Eighth International Conference on Intelligent Systems Design and Applications. Kaohsiung, Taiwan.
7. Russell, M. A. 2011. Mining the social web. Beijing: O'Reilly.
8. Foster, A., and P. Rafferty. 2010. Innovations in information retrieval: Perspectives for theory and practice. London: Facet.
9. Bekkerman, R., et al. 2003. Distributional word clusters vs. words for text categorization. Journal of Machine Learning Research 3: 1183-1208.
10. Hogenhout, W., and Y. Matsumoto. 1997. A preliminary study of word clustering based on syntactic behavior. In T.M. Ellison (ed.), CoNLL97: Computational Natural Language.
APPENDIX A.5 – AN AUTOMATED APPROACH FOR FINDING COURSE-SPECIFIC VOCABULARY
C. Variawa, S. McCahan, and M. Chignell. “An Automated Approach for Finding Course-specific Vocabulary”. Proc. of 120th ASEE Annual Conference and Exposition. Atlanta, 2013. This paper was presented at the 2013 American Society for Engineering Education Annual Conference. This paper reviews the literature on automated indexing and language characterization, and presents a vocabulary characterization study. This work presents a modified algorithm based on the Term Frequency-Inverse Document Frequency word classification equation. The results present a wordlist and other data that suggest that this algorithm can characterize discipline-specific vocabulary on engineering final exams.
An Automated Approach for Finding Course-specific Vocabulary
Introduction
This study introduces methods to increase the transparency of specific learning outcomes expected in an engineering course. Freshman engineering students face the challenge of absorbing a new set of terminology associated with their discipline, while also adjusting to the university environment. As they learn, students may inaccurately grasp course concepts due to lack of understanding of domain vocabulary. One strategy for addressing this problem is to make design of vocabulary part of overall course design. This requires explicitly identifying the vocabulary that students need to learn in the course of their studies. Proper specification of vocabulary is likely to be particularly important in introductory courses that form the foundation of engineering disciplines.
Identifying discipline-specific words helps instructors establish clear expectations of required vocabulary knowledge, while building robust technical communication skills. If students have a clear understanding of required vocabulary, then instructors will be able to develop higher-quality teaching and assessment material. As a result, instructors can be confident that students will not be handicapped by language usages that are neither part of their cultural background nor inherent to the course or domain. At the freshman level, vocabulary lists might be developed that highlight terms pertinent to the field. However, language has a fluidity that cannot be accurately captured by static wordlists that do not accommodate context, and manual updating of word lists each year is an additional (and probably unwelcome) burden on instructors. In this study, the authors investigate an efficient, semi-automated approach for developing up-to-date course-specific vocabulary lists while requiring minimal contextual input from the instructor. The focus of this research is on engineering course material, with the ultimate goal of helping freshman students adjust to new terminology in their field of study without increasing the workload of teaching faculty. Going forward, the proposed computational method can inform the development of a tool - like a software program - that can automatically compile a list of course-specific vocabulary.

Literature

There are several approaches that can characterize language in document text. The fields of research that contain literature in this area include education, linguistics, computational linguistics, and industrial engineering, among others. Specifically, literature in the field of education pertinent to this study ranges from the Plain Language Movement to language acquisition and English as a Second Language research.1-3 These approaches aim to simplify
language structure and vocabulary to maximize accessibility.2,4 Further, research in this area focuses on the relationship of words in generating meaning and on how language development is affected by choice of vocabulary.1,2,4 This research informs an understanding of the importance of language development and motivates the use of accessible, yet immersive, language in learning environments.4-6 While simplification is important in public documents (e.g., tax forms), overly simplified language does not suit the purposes of the engineering classroom. Engineering students need to develop robust vocabulary ability that is authentic to their field and will stand them in good stead when they take up their careers.
The fields of linguistics and computational linguistics are particularly broad, and they study language from several perspectives. Some approaches examine the development of language, symbolic meaning, and the structure of words.7,8 Other approaches look at differences between languages and their evolution over time.9,10 Computational approaches tend to convert the complexities of language into bits of information that can be quantified, classified, and analyzed as packets of data. More specifically, this field investigates algorithms and tools that can measure and quantify vocabulary.11-14 Some algorithms and methods are broadly applicable across a range of linguistic fields. Classification algorithms use various corpora to organize words into hierarchical structures. Word hierarchies can also be elaborated with syntactic and semantic information to create a comprehensive representation of knowledge about the English lexicon. The most extensive tool of this type is WordNet, a database that contains words and their synonyms, classified by relevance and similarity into groups referred to as synsets.14 This approach forms lexical repositories of words that can be used to analyze the relationships of sets of words with one another.12-14 An advantage of this approach is that it develops a common lexical database of words pertinent to a field; a disadvantage is that the repository continues to grow without a structured ability to prune words over time.12-14 Further, tools like WordNet are designed to deal with the vocabulary of language in general and are less useful for organizing and explaining domain-specific vocabularies. Thus an approach is needed that generates manageable, domain-specific vocabulary lists.
Another area of research in computational linguistics is keyword generation (automated indexing): the development and application of algorithms to statistically determine the characteristics of words based on frequency. Frequently used approaches in this area include frequency analysis of words, keyword-generation algorithms, and artificial intelligence methods. Frequency analysis attempts to correlate the frequency of use of a word in a target document with a corpus of English, or of a specific discipline.14,16,17 Prior work shows that this method is useful for understanding natural language, and it can be combined with algorithms supplemented by statistical theory, like Zipf's Law.16-18 Another approach is Latent Semantic Indexing, which uses singular value decomposition (analogous to principal components analysis on large, sparse matrices) to identify associations between words based on their context, and which can be used to generate data about the meaning of words used in similar contexts.11,13,19 Multiword Expressions are
another set of approaches that investigate the meaning of words based on lexeme analysis.11,12,20 Specifically, multiword expression analysis shows that words can change meaning based on how they are used in a sentence, which can inform a keyword-generation procedure.11,20 This general field of language analysis using computational approaches falls under the category of computer science and engineering known as artificial intelligence (AI), because words are translated from human vocabulary into computer-based representations, and then into a form that allows their characteristics to be better understood.
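To make the statistical idea behind Zipf's Law concrete: it predicts that a word's frequency is roughly inversely proportional to its frequency rank, so rank × frequency is approximately constant. A toy check (the token counts are invented for illustration; this is our own code):

```python
from collections import Counter

# Invented token stream whose counts follow Zipf's Law exactly:
# the rank-r word occurs proportionally to 1/r (60, 30, 20, 15).
tokens = (["the"] * 60) + (["of"] * 30) + (["stress"] * 20) + (["slip"] * 15)

ranked = Counter(tokens).most_common()

# Under Zipf's Law, rank * frequency should be roughly constant.
products = [(rank + 1) * freq for rank, (word, freq) in enumerate(ranked)]
```

Here every product equals 60, the idealized Zipfian case; real corpora only approximate this.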
TF-IDF Approach
A preliminary analysis shows that a simplistic approach such as frequency analysis on its own is inadequate for determining terms characteristic of a piece of text.21-23 Frequency analysis alone only generates information about how often certain words are used. This information is not particularly useful to this study, because the characteristic words in engineering documents are not necessarily those that appear most frequently. Specifically, the literature shows that commonly occurring words are indicative of natural language, not a measure of vocabulary diagnostic of an input document.13,15,18 As such, a more advanced approach is required: one that can characterize diagnostic words in documents while requiring minimal contextual data beyond the documents themselves, and one that can handle large sets of words and documents.
Term Frequency-Inverse Document Frequency (TF-IDF) analysis is a well-known indexing method in information retrieval, used to characterize vocabulary across sets of documents.11,13,15,18,23-25
The TF-IDF technique compares the frequency of words in a single document (TF) to the vocabulary used in a set of documents. The mathematical formula for TFIDF is:

TFIDF = TF × IDF

where

TF = (# of occurrences of the word) / (total # of words), in a single target document

and

IDF = log( (# of documents) / (# of documents containing the word) ), in a set of comparator documents
There are two main parts to the TF-IDF algorithm, and they work together to assign a score for each word in the target document. The TF counts the number of occurrences of a particular word, and divides that number by the total number of words in the target document, which is a simple measure of frequency. The IDF is a measure of how important a particular term is within a set of documents, and is calculated by dividing the total number of documents by the number
of documents in the set that contain the term, and then taking the logarithm of the quotient. The TFIDF formula multiplies these together and attaches the resulting score to each unique word in the target document. A higher TFIDF score means that the word is diagnostic of that particular document, and a low TFIDF score means that the word is not a keyword for the document. This approach allows us to differentiate common vocabulary from words that are characteristic of the target document, such as course-specific language in this case. The approach works reasonably well for one target document (e.g., the final exam in a course), but it does a poor job of differentiating course-specific or discipline-specific vocabulary from words that simply appear infrequently in natural language. For example, it might identify both "enthalpy" and "circulation" as characteristic words on a thermodynamics exam, because both are likely to occur rarely in a comparator document set; but a thermodynamics instructor would easily recognize "enthalpy" as key disciplinary jargon and "circulation" as not specific to the discipline.
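A toy illustration of this limitation (hypothetical comparator set; our own code, with a smoothing term added so absent words get a finite score — an assumption of this sketch, not the paper's formulation): plain IDF cannot separate a disciplinary term from a merely uncommon one when both are absent from the comparators.

```python
import math

def idf(word, docs, smoothing=1):
    """IDF with simple smoothing so that a word absent from every
    comparator document still receives a finite score."""
    containing = sum(word in doc for doc in docs)
    return math.log(len(docs) / (containing + smoothing))

# Ten hypothetical comparator exams, none mentioning either word.
comparators = [{"stress", "strain", "beam"} for _ in range(10)]

# The disciplinary term and the incidental word score identically:
# nothing in the statistics distinguishes "enthalpy" from "circulation".
assert idf("enthalpy", comparators) == idf("circulation", comparators)
```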
Interpreting the mechanics of the approach
The authors propose a method that should improve the effectiveness of the TF-IDF algorithm for the purposes of investigating the language used by engineering documents. Specifically, we suggest developing two TF-IDF scores for each word in a document and then calculating their difference to maximize accuracy in finding course-specific vocabulary. The approach would be to use two different contexts for the same document to calculate two TF-IDF scores:
1. Compare a target document to all documents in engineering, minus those that are in the same discipline. This should highlight terms that are characteristic of the discipline.
2. Compare a target document to all documents within the same discipline as that input document. This should highlight terms that are characteristic to that course.
This method generates two wordlists – one from each context listed above. These lists can then be sorted alphabetically while subtracting the TF-IDF scores for context #2 from context #1. This produces a list where words that are both course-specific and discipline-specific are given a high score, whereas all other types of words are given a lower score. This modified use of the TF-IDF algorithm can be expressed as:
TFIDF_mod = TFIDF_1 − TFIDF_2 = TF × (IDF_1 − IDF_2)
where subscripts 1 and 2 represent context #1 and context #2 respectively, and TF is identical for both because the input exam is the same.
And where:

D_E = # of documents in engineering, minus the discipline
D_E,W = # of documents in engineering, minus the discipline, containing the word
D_D = # of documents in the discipline
D_D,W = # of documents in the discipline containing the word

Condensing and simplifying:

IDF_mod = log( (D_E × D_D,W) / (D_E,W × D_D) )
Using this approach, words can be characterized based on how prevalent they are in engineering and in their respective discipline.
• If D_E × D_D,W is large, because many documents in the discipline contain the word, the numerator increases and IDF_mod becomes larger, which amplifies the TFIDF_mod value. This means the word occurs frequently in the discipline but not necessarily across all of engineering, which implies it is likely discipline-specific.

• Conversely, if D_E,W × D_D is large, because many documents across engineering contain the word, IDF_mod becomes smaller, which reduces the TFIDF_mod value. This means the word occurs frequently in engineering but is not necessarily unique to the discipline, which implies it may not be discipline-specific.
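A numerical sketch of the condensed formula and the two behaviours just described (all counts are invented for illustration; the function name is ours):

```python
import math

def idf_mod(d_e, d_ew, d_d, d_dw):
    """IDF_mod = log((D_E * D_D,W) / (D_E,W * D_D)), the condensed
    form above; all counts are assumed nonzero."""
    return math.log((d_e * d_dw) / (d_ew * d_d))

# A word common within the discipline (150 of 200 documents) but rare
# across the rest of engineering (20 of 2000) gets a boosted, positive score.
boosted = idf_mod(d_e=2000, d_ew=20, d_d=200, d_dw=150)

# A word common across all of engineering (1800 of 2000) is suppressed.
suppressed = idf_mod(d_e=2000, d_ew=1800, d_d=200, d_dw=150)
```

With these counts, `boosted` is positive and `suppressed` is negative, matching the amplification and reduction described in the two bullet points above.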
As a result, when a word has a high term frequency in a document and occurs frequently in the discipline but not across all of engineering, the modified approach boosts the score of that word. However, if the word does not occur frequently in the discipline but is common across engineering, the algorithm shrinks its score. The boosting effect therefore significantly affects only words that are characteristic of that document, meaning words that appear preferentially in the discipline but not necessarily across all of engineering.

Methodology

This study develops a method for characterizing the wording of engineering documents. In particular, we are interested in an approach that automatically identifies course-specific language so that instructors can help first-year students adjust to the terminology used in their chosen field of study. This is relevant to the field of accessible language in general, because it identifies vocabulary that students need to be familiar with in a professional context. The approach is outlined in Figure 1 below. Words are prepared for analysis by converting all input
documents to text-only format, then the TFIDF algorithm is used to develop word lists based on a target document (e.g. the final exam for a course) and sets of comparator documents. These word lists are then used to differentiate and highlight course-specific vocabulary that characterizes the target document.
Figure 1 - Shows graphically the methodology used in this study from top to bottom: PROCESSING, preparing and cleaning the raw text of the engineering exams (Adobe Acrobat X); ORDERING, ranking characteristic keywords using the TF-IDF computational method (Visual Basic .NET), once across engineering (comparing an exam to all exams in engineering) and once within subject (comparing the exam to all exams in the same discipline), yielding two wordlists; DIFFERENTIATING, pulling out the language used specifically in that engineering course (across-engineering wordlist MINUS within-subject wordlist); and POST-PROCESSING, normalizing the TFIDF values of the outputted wordlists (MS Excel).
The type of engineering document chosen for this study is the engineering final exam. These documents are standardized artifacts of the engineering learning environment and are publicly available for research and study purposes at the University of Toronto. The large dataset of exams spans several years, providing a substantial amount of vocabulary for examination. For this study, the authors began by acquiring all electronically available engineering exams at the University of Toronto: in total, 2254 exams administered in the Faculty of Applied Science and Engineering between 1999 and 2009. These exams were in a variety of graphics and document formats; all were converted to PDF format using Adobe Acrobat X Professional to simplify subsequent coding and processing.
Clean Data and OCR exams
The text for each exam was subjected to an optimization process, as outlined in the top-most box of Figure 1. This process removes the majority of non-word artifacts that occur because of the original hardcopy-to-electronic conversion. Some of these artifacts included specks, misshapen words, improperly-oriented pages, equations, and foreign non-ASCII characters. Text conversion failed for roughly 20 of the exams, which were excluded from the remainder of the analysis.
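A minimal sketch of this kind of cleaning pass is shown below. The actual process used Adobe Acrobat and OCR tooling; the regular expression and the decision to keep only alphabetic tokens are illustrative assumptions, and the example deliberately shows how artifacts such as mis-recognized characters slip through:

```python
import re

def clean_text(raw: str) -> str:
    """Remove common OCR artifacts from extracted exam text."""
    # Drop non-ASCII characters introduced by the hardcopy-to-electronic
    # conversion (Greek symbols, ligatures, speck misreads, etc.)
    text = raw.encode("ascii", errors="ignore").decode("ascii")
    # Keep only alphabetic tokens; this discards specks, equation fragments,
    # and digit runs, at the cost of also dropping legitimate numbers
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return " ".join(tokens)
```

Note that a token such as "d1slocation" (an OCR misread of "dislocation") would be split into "d" and "slocation" by this pass, which is one source of the nonsensical terms discussed in the Results section.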
TF-IDF Algorithm and equations
Once the text files for each of the exams in the study were created and optimized, the authors developed an applet in Visual Basic .NET to compute the TFIDF score for words in target documents. Specifically, the program prompts for an input document and a folder where comparator documents are located. It computes the TFIDF score for each word in the target document based on the words found within the text files contained in the specified folder. It then generates a list of words and their associated TFIDF scores and outputs that list as another text file. Each sample exam is run through the program twice: one pass compares the exam against a comparator set of exams within the same discipline, while the other compares the exam to all exams in the repository. This procedure results in two word lists.
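The core computation of such an applet can be sketched as follows (a Python approximation of the Visual Basic .NET program described above; the tokenization and the +1 smoothing in the denominator, which avoids division by zero for words absent from the comparator set, are assumptions):

```python
import math
import os
import re
from collections import Counter

def tokenize(text: str):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_scores(target_path: str, comparator_dir: str) -> dict:
    """Score each word in the target document against a comparator folder."""
    with open(target_path, encoding="utf-8") as f:
        target_tokens = tokenize(f.read())
    tf = Counter(target_tokens)
    total = len(target_tokens)

    # Document frequency: number of comparator files containing each word
    paths = [os.path.join(comparator_dir, name)
             for name in os.listdir(comparator_dir)]
    df = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for word in set(tokenize(f.read())):
                df[word] += 1

    n_docs = len(paths)
    return {word: (count / total) * math.log(n_docs / (1 + df[word]))
            for word, count in tf.items()}
```

Running this once with a within-discipline folder and once with an all-of-engineering folder produces the two word lists described above.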
For each input exam, the TFIDFmod score is developed by subtracting the two word lists for the target document, as outlined in Figure 1. This step is critical to the process because it helps distinguish vocabulary used in a discipline from vocabulary used across engineering. Specifically, this approach highlights and further differentiates course-specific words from other vocabulary on the sample exam by increasing the spread of TFIDF values, which are output as a scored wordlist.
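The subtraction step can be sketched as follows, assuming each wordlist is a word-to-score mapping produced by one of the two TF-IDF passes. Per Figure 1, the within-subject score is subtracted from the across-engineering score; the scores in the usage example are invented:

```python
def subtract_wordlists(across_engineering: dict, within_discipline: dict) -> dict:
    """TFIDFmod: across-engineering score minus within-discipline score.

    A word common within the discipline scores low against the discipline
    comparator set but high against all of engineering, so the difference
    boosts discipline-characteristic vocabulary.
    """
    words = set(across_engineering) | set(within_discipline)
    return {w: across_engineering.get(w, 0.0) - within_discipline.get(w, 0.0)
            for w in words}

ranked = sorted(subtract_wordlists({"dislocation": 0.05, "find": 0.01},
                                   {"dislocation": 0.003, "find": 0.02}).items(),
                key=lambda kv: kv[1], reverse=True)
```

In this toy example "dislocation" ends up with a positive TFIDFmod score and "find" with a negative one, mirroring the boosting and shrinking behavior described earlier.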
Post-processing the TFIDF scores
The wordlist generated in the previous step is plotted to depict the quantity and range of TFIDF values across an exam.
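The normalization mentioned in the post-processing step of Figure 1 is not specified in detail; one plausible choice, shown here purely as an assumption, is min-max scaling of the scores before plotting:

```python
def normalize(scores: dict) -> dict:
    """Min-max scale TFIDF scores to [0, 1] (one plausible normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 0.0 for w in scores}
    return {w: (v - lo) / (hi - lo) for w, v in scores.items()}
```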
Results
Sample Case –Materials Engineering Exam
The results below track an exam from a course called “Fracture and Failure of Engineering Materials”, part of the Materials Engineering curriculum at the University of Toronto, where this research was conducted. The data show the TFIDF scores for a sample exam from the repository. Table 1 lists the words in the target exam ranked in order of decreasing TFIDF score. Figure 2 plots the rank of every word on the same exam against its corresponding TFIDF score.
Table 1 - Shows the TFIDF scores (top 25 and selected others) for a sample exam from the course "Fracture and Failure of Engineering Materials"
Rank  Word           Modified TFIDF Score
1     dislocation    0.046749
2     dislocations   0.016992
3     cry            0.016379
4     grain          0.015939
5     crystal        0.014845
6     stress         0.013639
7     material       0.011965
8     strength       0.010907
9     deformation    0.008955
10    creep          0.008446
11    partials       0.008165
12    ofll           0.007426
13    intermetallic  0.007198
14    subgrain       0.007193
15    tensile        0.007181
16    metallic       0.006853
17    gb             0.006749
18    hardening      0.006659
19    boundaries     0.006414
20    hallpetch      0.006259
21    crss           0.00569
22    composite      0.005598
23    strengthening  0.005518
24    elastic        0.005376
25    lattice        0.005137
…
200   fact           0.000435
…
350   able           -0.000104
…
450   equals         -0.001426
Figure 2 - Shows all of the TFIDF-scored words from a course called “Fracture and Failure of Engineering Materials”
The wordlist in Table 1 contains a high proportion of course-specific vocabulary, especially near the top of the list. This is the expected result: words that are characteristic of the sample document are assigned a higher TFIDF score than words that are commonly found on all engineering exams. As the rank grows, the number of non-course-specific words increases significantly. Though there are too many words to list individually here, sample words at various points along the TFIDF scale are included. For example, word 350, “able”, is assigned a negative value, a direct result of TFIDFmod shrinking the score because the word occurs frequently across all of engineering, in the discipline, and on the exam.
It is also worth noting that some “nonsensical” terms are prevalent on this exam. Though only a small portion appear in Table 1, such as “ofll”, “gb” and “crss”, most of them occur at ranks beyond those shown. Further, it is important to note that “gb” is shorthand for “grain boundary” and “crss” is shorthand for “critical resolved shear stress”, both of which are terms characteristic of the course but might otherwise be mistaken for noise. Other terms such as “ae”, “gc”, “ndx”, “derisity”, and many others pollute the dataset even though the exams have been carefully processed. Unfortunately, these words persist across all of the datasets and affect the computation of accurate TF-IDF values. This shows that although the approach shows promise for distinguishing course-specific words from “everyday” language, many artifacts remain that compromise the accuracy of the method as currently defined.
Figure 2 graphically depicts the words and their corresponding TFIDF scores, ranked in decreasing order from left to right. The data show that a small subset of vocabulary – seen here as being ranked from 1 to roughly 50 – has a much higher TFIDF score than the majority of other words on the list. That is, a few words have a high TFIDF score while the majority have a consistently low score. It is also important to note that the tail of the data in Fig. 2 trends downward (negative) as it approaches the lowest TFIDF scores. In the wordlist, these words are typically nonsensical artifacts that pollute the dataset and are not course-specific.
Discussion
Critique of the Approach as situated in the Literature
The TF-IDF method is one in a spectrum of approaches that range in utility and feasibility when applied to investigating the discipline-specific terminology of a course. Ideally, a method can be found that is easy to employ with minimal effort by the instructor (i.e., highly feasible) and produces a list that is of high value to the freshman student (i.e., of high utility). At one end of the spectrum are approaches that examine only the frequency of words (e.g., Zipf’s Law12, 13, 18); these are highly feasible but low in utility. Frequency information is useful for explaining how ‘conversational-sounding’ a text is12 and which words are used more or less frequently than others, but it does not provide much utility toward the purpose identified here. The ease of implementation is high, though, because documents can be submitted to a software program that tallies the occurrences of each word and graphs this information; the shape of the graph can then easily be used to characterize the language.13, 18
Conversely, there are approaches that use synsets (relationships of words to one another derived from language corpora) to characterize the language of documents.13, 14 These approaches rely on comparing the meanings of words, identified using tools such as WordNet. Synset-based approaches produce a large quantity of rich data about the vocabulary, including the meaning of words in sentences and how that meaning evolves with the context in which they appear. Though very informative, and thus high in utility, such approaches have low feasibility because a large amount of information about the vocabulary must be known a priori. For example, the corpora used to identify meaning need to be continuously updated by an expert (or the instructor) to account for ever-changing vocabulary. As such, synset-based approaches require substantial support to produce and use corpora that include not only a list of words but also information about how those words associate. This may be preferable, but it also necessitates a large back-end support system, in contrast to an overly simplistic frequency-analysis approach that does not provide much utility.
The method we have identified, a modified application of the TFIDF algorithm, works toward creating a dataset that has higher utility than pure frequency analysis yet is more feasible to
194
implement than synset-based approaches. This is because the approach does not require multiple corpora and systems to understand the specific meanings of and relationships between words; instead, it uses contextual information provided by the comparator document sets. Specifically, the user provides comparator sets in the form of groups of exams or other teaching materials. Users are not required to know specific details about the comparator sets, other than the course and discipline, but they do need to provide the documents in a machine-readable format. This approach is a tradeoff between utility and feasibility: although it requires some contextual information, in the form of other documents, it does not require a continuously updated support system to maintain the meanings of and relationships between words. As such, the TF-IDF method has higher utility than a pure frequency-analysis approach while being more feasible to implement than synset-based approaches, and it still provides information that can assist in separating discipline-specific language from other vocabulary.
One can imagine a system that automatically files final exams into a database based on course information and a few keywords that identify the field, e.g., materials science fundamentals or materials and metallurgy. The instructor could simply identify the target document or documents (such as last year’s midterm or final exam for her course), identify courses in the same discipline by course number or keyword, and then hit “run”. The program would automatically produce a word list for her to distribute to her students at the start of the new term. For a freshman student, a word list like this lessens anxiety about what they need to learn and creates a starting vocabulary of terms relevant to the field.
Critique of the Methodology as it exists right now
The preliminary results suggest that the modified TFIDF approach is able to distinguish discipline-specific vocabulary from other words. The methodology is soundly grounded in existing methods of automated indexing, and the TFIDF lists that we have produced appear to be largely discipline-specific for the first 50 or so words.
This method needs further improvement to eliminate artifacts before progressing further. The data currently contain a high number of nonsensical terms that do not appear in the English language, such as ‘ofll’ and ‘gb’. Ideally, artifacts would be eliminated before the TFIDF calculations are made. However, this is not a straightforward task because of the use of acronyms and other anomalies in engineering jargon. Two approaches to this problem are suggested. First, it may be possible to remove words that do not include vowels prior to scoring. Doing so removes a significant portion of terms that might not exist in the English language6, 7 but risks removing important engineering acronyms. A second strategy is to incorporate a ‘spell-checker’ application that scans the text and highlights such terms using a combination of English-language tools and existing corpus-based tools such as WordNet. This approach draws on a larger set of language-comparison tools than most word processors because it can use engineering corpora as well as standard English corpora. Though more involved, this second strategy does not remove words
195
automatically and thus ensures that the process does not remove vocabulary that may be pertinent to the student, such as technical jargon. As a result, however, each word on the outputted wordlists would need to be reviewed manually, which could be a lengthy process that reduces the feasibility of a computational approach for an instructor. Such a method might instead be used to highlight terms on the final outputted list; this way, an instructor can visually identify words that appear high in the TF-IDF lists and also appear in relevant corpora. Ideally, a combination of the vowel-removal and corpora-scan methods could be employed to remove as many non-English words as possible, maximizing the effectiveness of a computational approach to characterizing the language in engineering courses. As an added benefit, an instructor could input a draft of an exam or other piece of course material and explicitly identify the vocabulary it tests.
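The vowel-removal strategy can be sketched as follows. The whitelist of acronyms is a hypothetical addition, included because acronyms such as “gb” and “crss”, identified earlier as course-characteristic, would otherwise be discarded:

```python
VOWELS = set("aeiouy")
# Hypothetical whitelist of known engineering acronyms to preserve
KNOWN_ACRONYMS = {"gb", "crss"}

def filter_vowelless(words):
    """Drop tokens containing no vowels unless they are whitelisted acronyms.

    Removes many non-English artifacts (e.g. 'ndx') at the risk of losing
    unlisted acronyms, which is exactly the tradeoff described above.
    """
    return [w for w in words
            if (set(w) & VOWELS) or w in KNOWN_ACRONYMS]
```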
Conclusions
This study uses a modified approach from the field of computational linguistics to characterize vocabulary on engineering exams. The objective is to increase the transparency of learning outcomes expected in an engineering classroom, specifically the development of a professional vocabulary. Using a repository of 2229 exams, a modified term-frequency inverse-document-frequency (TFIDF) algorithm assigns a weight to each word in an input exam by comparing it to the occurrences of those words across all exams; the weight represents the degree to which the word is characteristic of that document. The data show that this method does appear to preferentially give course-specific words higher rank. However, we also found that the wordlists are polluted with non-English words and that further work in cleaning the input text files a priori is required. Going forward, this method can be used to create tools, such as a software program, that automatically compile a list of professional vocabulary for a course.

References
1. Mazur, Beth. "Revisiting Plain Language." Technical Communication: Journal of the Society for Technical Communication 47.2 (2000): 205-11.
2. Robinson, Peter, and Nick C. Ellis, eds. Handbook of Cognitive Linguistics and Second Language Acquisition. London and New York: Routledge, 2008.
3. Ahearn, Laura M. "Language Acquisition and Socialization." Living Language: An Introduction to Linguistic Anthropology (2011): 50-64.
4. Bunch, George C., Percy L. Abram, Rachel A. Lotan, and Guadalupe Valdés. "Beyond sheltered instruction: Rethinking conditions for academic language development." TESOL Journal 10.2-3 (2001): 28-33.
5. Braun, Sabine. "From pedagogically relevant corpora to authentic language learning contents." ReCALL 17.1 (2005): 47-64.
6. Krashen, Stephen D. Explorations in Language Acquisition and Use. Portsmouth, NH: Heinemann, 2003.
7. De Saussure, Ferdinand. Course in General Linguistics. Columbia University Press, 2011.
8. Jackendoff, Ray. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, USA, 2002.
9. Dąbrowska, Ewa, and James Street. "Individual differences in language attainment: Comprehension of passive sentences by native and non-native English speakers." Language Sciences 28.6 (2006): 604-615.
10. Aitchison, Jean. Language Change: Progress or Decay? Cambridge University Press, 2000.
11. Church, Kenneth W., and Robert L. Mercer. "Introduction to the special issue on computational linguistics using large corpora." Computational Linguistics 19.1 (1993): 1-24.
12. Mitkov, Ruslan, ed. The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press, 2003.
13. McEnery, Tony, Andrew Wilson, and Geoff Barnbrook. "Corpus linguistics." Computational Linguistics 24.2 (2003).
14. Budanitsky, Alexander, and Graeme Hirst. "Evaluating WordNet-based measures of lexical semantic relatedness." Computational Linguistics 32.1 (2006): 13-47.
15. Bybee, Joan L., and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Vol. 45. John Benjamins Publishing Company, 2001.
16. Phillips, Betty S. "Lexical diffusion, lexical frequency, and lexical analysis." Typological Studies in Language 45 (2001): 123-136.
17. Roland, Douglas, Frederic Dick, and Jeffrey L. Elman. "Frequency of basic English grammatical structures: A corpus analysis." Journal of Memory and Language 57.3 (2007): 348-379.
18. Montemurro, Marcelo A. "Beyond the Zipf–Mandelbrot law in quantitative linguistics." Physica A: Statistical Mechanics and its Applications 300.3 (2001): 567-578.
19. Bellegarda, Jerome R. "Exploiting latent semantic information in statistical language modeling." Proceedings of the IEEE 88.8 (2000): 1279-1296.
20. Maynard, Diana, and Sophia Ananiadou. "Trucks: a model for automatic multiword term recognition." Journal of Natural Language Processing 8.1 (2000): 101-126.
21. Variawa, Chirag, and Susan McCahan. "Identifying Language as a Learning Barrier in Engineering." International Journal of Engineering Education 28.1 (2012): 183-191.
22. Shi, Congying, Chaojun Xu, and X. Yang. "Study of TFIDF algorithm." Journal of Computer Applications 29 (2009): 167-170.
23. Robertson, Stephen. "Understanding inverse document frequency: on theoretical arguments for IDF." Journal of Documentation 60.5 (2004): 503-520.
24. Singhal, Amit. "Modern information retrieval: A brief overview." IEEE Data Engineering Bulletin 24.4 (2001): 35-43.
25. Han, Eui-Hong, and George Karypis. "Centroid-based document classification: Analysis and experimental results." Principles of Data Mining and Knowledge Discovery (2000): 116-123.
APPENDIX A.6 – AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM
C. Variawa, and S. McCahan. “Exploring the Applicability of an Automated Course-specific Vocabulary Search Program.” Proc. of Canadian Engineering Education Association Conference. CEEA Paper No. 59. Montreal, 2013. This paper was presented at the 2013 Canadian Engineering Education Association Annual Conference. It discusses the application of a modified algorithm based on the Term Frequency-Inverse Document Frequency (TF-IDF) equation as applied to a second-year undergraduate chemical engineering exam. The results suggest clustering of TF-IDF scores around specific values, with similar data within each cluster.
AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM
Variawa, C., and McCahan, S. Department of Mechanical and Industrial Engineering, University of Toronto, Ontario Canada
[email protected]; [email protected]
1. INTRODUCTION
Professional engineering language is often used to precisely describe specific objects, processes, or situations but may not be taught to students in the engineering classroom as an explicit course objective. Though the method of teaching this vocabulary varies, students’ mastery in understanding this corpus is usually assessed (explicitly or implicitly) using written tests and exams based on course content.
As instructors develop their courses around specific learning outcomes, it can become difficult to accurately characterize the vocabulary that ought to be learned. In particular, the words may change over time and from instructor to instructor, the ability to discern course-specific from non-course-specific words can be subjective, and a list of required vocabulary may not be feasible to produce manually. In addition, if the list of vocabulary is not defined a priori, the transparency of the learning outcomes and the validity of assessment are reduced.
One approach to developing a critical vocabulary list is to deploy an automated strategy that statistically produces a list of “course-specific” words with minimal user intervention. The strategy relies on analyzing recent existing teaching materials for characteristic vocabulary. This dynamic approach maximizes objectivity while giving the instructor a starting point for defining a list of necessary vocabulary.
The general field of language analysis using automated computational approaches falls under a category of computer science and engineering defined as artificial intelligence. Specifically, words are being translated from human vocabulary to statistical values, and then combined to form a hierarchy of diagnostic keywords for a given document. The output of such an approach would be a word list of course-specific terms.
For this study, the researchers modified an algorithm called Term Frequency-Inverse Document Frequency (TF-IDF) to classify the vocabulary on a group of engineering exams. This algorithm first calculates the term frequency of each word on an exam by dividing the number of occurrences of a word by the total number of words in that document. The term frequency is then multiplied by the inverse document frequency, which is the logarithm of the number of documents in a comparator set divided by the number of documents in this set containing the word. This factor is dependent on the sample size; so, the more documents that are available in the database, the more accurate the output score for the word. Then, the word and its TF-IDF score are tabulated.
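As a worked example with invented numbers (the base of the logarithm is not specified in the paper; the natural logarithm is assumed here):

```python
import math

occurrences, doc_length = 4, 400         # a word appears 4 times in a 400-word exam
n_docs, docs_with_word = 100, 5          # 5 of 100 comparator documents contain it

tf = occurrences / doc_length            # term frequency = 0.01
idf = math.log(n_docs / docs_with_word)  # inverse document frequency = ln(20)
tfidf = tf * idf                         # roughly 0.03
```

A word appearing in nearly every comparator document would instead get an IDF near ln(1) = 0, driving its TF-IDF score toward zero regardless of its term frequency.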
This method appears to show a correlation between course-specific words and TF-IDF score, but some common words also receive high TF-IDF scores. To mitigate this, the researchers developed a second word list using an identical TF-IDF approach but a different comparator set. The first word list is created by comparing an exam to all exams in the same discipline. The second word list is created by comparing the same exam to exams in all of engineering. The first score is subtracted from the second, resulting in a word list that differentiates course-specific vocabulary more reliably.
2. RESULTS/DISCUSSION
The automated method is used at the University of Toronto (UofT) on a database of 2251 engineering exams from the years 2007-2011. The method was coded into a software program that automates serial tasks such as optical character recognition, content organization, and processing of calculated results and words.
The authors combined several instances of one course – CHE230 “Environmental Chemistry” – over a span of three years, 2009-2011, into a large word list. This wordlist was then compared to the 21-million-word UofT Engineering Corpus using the TF-IDF algorithm. The resulting wordlist is graphically presented in Fig. 1.
The figure above shows that the first 75 or so words on this list have a significantly higher modified TF-IDF score than the rest of the 507-word list. Since the score is a measure of how characteristic a word is to a given document (in this case, three instances of the same exam), the first 75 words can be investigated in detail to see whether they appear to be course- or domain-specific. The long “plateau” of words, starting around 75 on the x-axis, are those that appear just as frequently in the CHE230 exams as they do in all of engineering, while the tail end around 500 contains words that occur less frequently in CHE230 than in engineering in general.
Theoretically, high scoring words should be the only ones that are characteristic to the document being investigated. To test whether the vocabulary is properly sorted, the authors are working with the course staff of CHE230 and a linguistics expert to identify whether the method has captured an appropriate word list. Future work includes increasing the robustness of this method by sifting words that are misspelled, or are non-English words. The goal is to reduce inaccessible language while promoting technical vocabulary development.
Figure 1 - Shows the combined results of CHE230 over a 3 year period. The TFIDF score is an indicator of keywords, and appears to plateau around the 75-word mark
APPENDIX A.7 – ENGINEERING VOCABULARY DEVELOPMENT USING AN AUTOMATED SOFTWARE TOOL
C. Variawa, S. McCahan. “Engineering Vocabulary Development using an Automated Software Tool”. Proc. of 121st ASEE Annual Conference and Exposition. Indianapolis, 2014. This paper will be presented at the 2014 American Society for Engineering Education Annual Conference. It evaluates the effectiveness of a modified algorithm based on Term Frequency-Inverse Document Frequency (TF-IDF) in characterizing vocabulary. The dataset used for this study consists of written final exams at the University of Toronto. The paper presents and discusses a study in which subject-matter experts evaluate the efficacy of this algorithm in identifying discipline-specific vocabulary on written final exams in engineering.
Engineering Vocabulary Development using an Automated Software Tool
Abstract
Understanding technical vocabulary is often a desired learning outcome in engineering education, and a significant part of professional communication in the engineering profession. Language used in engineering education plays a key role in creating an accessible and inclusive learning environment. The corpus of language common to both the instructor and student ought to converge as the student masters the course content. Instructors may currently use techniques to help identify this vocabulary, including referring to glossaries and increasing the frequency of their use in the classroom. There is an opportunity to increase transparency and accessibility to such vocabulary by developing an automated software-based tool that can be used by instructors to create customized course-specific wordlists for their courses. Using text extracted from instructional material in a course, the algorithm developed for this study is able to hierarchically identify and display course-specific terminology using principles from artificial intelligence, linguistics, higher education, and industrial engineering. Grounded in the theory of Universal Instructional Design, these wordlists can be integrated into a syllabus and then be used as a teaching aid to promote an accessible engineering education. The goal is to reduce barriers to learning by developing an explicitly-identified and robust list of vocabulary for all students in a given course. Creating an automated program that improves vocabulary information over time keeps it relevant and usable by instructors as well as students.
Presently, there is no automated method to develop course-specific vocabulary lists. To fill this gap, the authors have created a computer program, using a repository of over 2200 engineering exams since the year 2000 from the University of Toronto, which automatically identifies domain-specific terms on any given engineering exam. Specifically, each word from each exam is digitized and computed against others using a modified form of the Term-Frequency Inverse Document-Frequency (TF-IDF) algorithm to generate lists of context-specific characteristic terms. This well-known algorithm is used in the field of computational linguistics as a method of identifying words characteristic to a document, given a comparator set of documents. In this work, a modified approach has been developed that uses several comparator sets to produce a list of engineering vocabulary for a course. The effectiveness of this approach is evaluated by comparing the results to the judgment of subject-matter experts. This paper will use the data gathered to discuss the efficacy of this automated program in the context of engineering research methods, and will identify ways in which to make this program accessible to, and usable by, more educators in the field of engineering education.
Introduction
This study investigates an approach to increasing the transparency of learning outcomes by explicitly defining them for students. Engineering students, particularly at the undergraduate level, are expected to understand terminology relevant to their discipline, as well as the contexts in which these terms can be used appropriately. Through understanding discipline-specific vocabulary, each student eventually forms a corpus of words that they can use in professional practice. The learning of discipline-specific vocabulary thus forms a critical component of engineering education and is an area for research and optimization.
Currently, identifying discipline-specific vocabulary must be done manually. If instructors choose to, they review course material and make a list of course vocabulary based on their subject-matter expertise. Sometimes an instructor may defer to the “glossary” of a required course textbook, or to the body of the textbook, to support the teaching of vocabulary. However, this may imply that all terms are equally weighted in importance; it relies on having an up-to-date text; and it relies on the text matching the instructor’s terminology and teaching methods. In general, these manual processes are time-consuming and not particularly robust to evolving knowledge and instructional environments.
An automated strategy can be based on existing instructional materials and be used as a starting point for further refinement by the instructor of the course. In this work we explore whether a computational method can be used to characterize vocabulary in engineering documents, and the efficacy of doing so. The approach used in this research is to develop and evaluate a computer program that can replicate human subject-matter expertise in characterizing vocabulary in instructional materials. This would provide a basis for further refining the learning outcomes to increase transparency, and as a result, accessibility to learning materials. The strategy for addressing this problem is to make design of vocabulary part of overall course design. This requires explicitly identifying the vocabulary that students need to learn in the course of their studies and is based on the framework of universal instructional design.
Literature

This research is based on the framework of Universal Instructional Design (UID). The goal of the study is to increase accessibility to education by providing clearly defined learning outcomes. In this specific study, this is done by identifying the discipline-specific requisite vocabulary that students need to master in a course. The goal of the UID framework is to “maximize accessibility to the greatest degree possible for the greatest number of users possible”. Here, the research study attempts to maximize the accessibility of language used in engineering education. As such, the principles of universal design should help guide research toward more accessible learning-environment design for diverse student populations. A number of authors have interpreted the principles of universal instructional design.1-3 The
universal design framework applies the principle of “learner centered” not just to one teaching instance, but to the design of the whole learning environment at every level. McGuire, Scott, and Shaw suggest that this framework is a “paradigm shift” that promotes uniformity of academic goals and standards by designing accessibility into a course, curriculum, and institution, rather than making exceptions for individual students who do not fit our preconceived idea of what is “typical”.1 They point out that individualized accommodation will still be necessary for some students. However, pervasive use of exceptions may undermine the integrity of a course, whereas designing accessibility into a course opens up learning opportunities for a broad range of students. Additionally, they have noted that this framework remains a largely untested strategy that requires further testing and validation. Pliner and Johnson discuss UID in relation to transforming social relationships which can be negatively affected by invisible barriers to inclusivity.3 Their work suggests that implementing UID pedagogy creates a more “inclusive” environment which can decrease the barriers to learning that all individuals may have to some extent.
A review of the literature shows that there is serious concern about barriers to success for students, and a wide variety of approaches have been employed to try to mitigate barriers for at-risk students. Universal Instructional Design offers one possible approach and a framework for interpreting the impact of mitigation tactics. It will serve as a useful context for designing instructional tools that aim to maximize accessibility to education. However, instructors should also bear in mind that this is not the only framework and other ways of thinking about these issues should be investigated.
Building on previous work by the authors4, this paper expands on the modified Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, which is already used extensively in the area of vocabulary analysis. In summary, the TF-IDF algorithm originates in the fields of automated indexing and computational linguistics and is a widely-accepted form of vocabulary characterization.5-10 It takes an input document, stores each word in an array element, then performs a series of mathematical calculations to assign a numerical score to each word. This score is a diagnostic measure of how characteristic a word is of that specific document. The algorithm assigns this score based on term frequency, and on how often each of the words in that document appears in a comparator set of documents. The TF-IDF algorithm is based on the following equation:
TF-IDF = TF × IDF

where

TF = (# of occurrences of the word) / (total # of words), in a single target document

and

IDF = log( (# of documents) / (# of documents containing the word) ), over a set of comparator documents
The TF-IDF equation is a measure of how characteristic a word is of a document, and can be discussed in terms of its constituent factors. The TF is determined by counting the occurrences of a particular word and dividing that count by the total number of words in the target document: as such, it is a measure of frequency. The IDF is a measure of how important a particular term is within a set of documents, and is calculated by dividing the total number of documents by the number of documents in the set which contain that term, and then taking the logarithm of that ratio. The TF-IDF formula multiplies these together and attaches the resulting score to each unique word in the target document. The equation works by weighting the term frequency score for each word in an input document by a variable inverse document frequency score.
Since the comparator set of documents can change based on a number of factors, including year, instructors, etc., the IDF score can be updated, which in turn changes the TF-IDF score for all documents. This allows the TF-IDF statistic to evolve with changing datasets, and helps address the issue of evolving language. Additionally, the logarithm moderates the effect of document frequency and increases the resolution for finding characteristic terms within the input document. A high TF-IDF weight is reached by a high term frequency in the target document combined with a low document frequency (and hence a high IDF) in the comparator set. The weights therefore tend to filter out common terms. Since the ratio inside the logarithm is greater than or equal to 1, the value of IDF (and TF-IDF) is greater than or equal to 0. As a term appears in more documents, the ratio inside the logarithm approaches 1, bringing the IDF and TF-IDF closer to 0. This dampens the effect of terms that appear in many documents, minimizing their contribution to the TF-IDF score even though their TF scores may be similar to those of more characteristic terms in a particular document.
The modification to this approach, also discussed in previous work4, is to apply the TF-IDF algorithm repeatedly in different contexts. Specifically, the words of an input document can be scored with TF-IDF against one comparator set, and then scored again against another comparator set. In both cases the words are the same, since the same input document is used; the TF-IDF scores, however, differ because of the comparator sets. Depending on the context, words will have a lower or higher TF-IDF score. This phenomenon can be exploited to further extend the resolution of TF-IDF scores, in particular by helping the experimenters discern vocabulary that is characteristic of an input document in a specific user-defined context – a particular discipline, for example.
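The dual-context computation described above can be sketched as follows. This is an illustrative reconstruction of the technique, not the Visual Basic .NET program used in the study; the function name, toy documents, and word-list representation are assumptions for demonstration.

```python
import math

def tf_idf_scores(target_doc, comparator_docs):
    """Score each unique word in target_doc against a comparator set.

    target_doc is a list of words; comparator_docs is a list of word-lists,
    assumed to include the target document itself (so every word of the
    target appears in at least one comparator document).
    """
    n_docs = len(comparator_docs)
    scores = {}
    for word in set(target_doc):
        tf = target_doc.count(word) / len(target_doc)          # term frequency
        containing = sum(1 for doc in comparator_docs if word in doc)
        idf = math.log(n_docs / containing)                    # inverse document frequency
        scores[word] = tf * idf
    return scores

# Toy data: one "exam" scored in two different contexts
exam = ["the", "circuit", "uses", "a", "resistor", "and", "the", "capacitor"]
all_engineering = [exam,
                   ["the", "beam", "carries", "a", "bending", "load"],
                   ["the", "circuit", "dissipates", "heat"],
                   ["the", "reactor", "vessel", "pressure", "rises"]]
same_discipline = [exam,
                   ["the", "circuit", "dissipates", "heat"]]

across = tf_idf_scores(exam, all_engineering)   # context 1: all of engineering
within = tf_idf_scores(exam, same_discipline)   # context 2: same discipline only

# "circuit" looks characteristic across engineering, but not within its own
# discipline, where every exam mentions it (IDF = log(2/2) = 0).
print(across["circuit"] > within["circuit"])  # True
```

The same word thus receives two different scores depending on the comparator set, which is exactly the property the modified approach exploits.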
Methodology

This study investigates the efficacy of the modified TF-IDF algorithm in mimicking human subject-matter expertise as it develops wordlists of discipline-specific vocabulary. The methodology comprises two phases: the automated production of discipline-specific wordlists, and the testing of the efficacy of these wordlists. The first phase has been extensively published in previous work4, and those results show that the TF-IDF algorithm appears to work. The second phase of the study, discussed in this paper, uses subject-matter experts – faculty members – to evaluate the efficacy of the wordlists developed. The correlation between the judgment of the subject-matter experts and the list generated through the computational method is assessed.

The overall research study is outlined in Figures 1 and 2 below; Figure 1 shows phase one and Figure 2 shows phase two. In phase one, words are prepared for analysis by converting all input documents to text-only format. The modified TF-IDF algorithm is then used to develop word lists based on a target document (i.e., a specific document or set of documents from a specific course) and sets of comparator documents. The word list generated is a hierarchical discipline-specific vocabulary list that characterizes the target document. In phase two, human subject-matter experts were recruited to evaluate the efficacy of the automated approach in accurately identifying discipline-specific vocabulary.

The documents used for this study are 2254 electronically-available undergraduate engineering final exams from the University of Toronto. These exams are a summative assessment of a student's mastery of course concepts, and are intended to measure learning of the entire body of knowledge – or as close as possible – of a course.
These documents are standardized across all engineering courses at the institution, are roughly the same length, administered in a closely-supervised environment, and are electronically available for data mining and study purposes. Due to the large quantity of words used in this study – over 22 million – this body of data serves as a starting point for additional research in the area of vocabulary characterization in engineering education.
[Figure 1 flowchart: Processing – preparing and cleaning raw text from engineering exams (Adobe Acrobat X); Ordering – ranking characteristic keywords using the TF-IDF computational method (Visual Basic .NET), both across engineering (comparing an exam to all exams in Engineering) and within subject (comparing an exam to all exams in the same discipline), each producing a wordlist; Differentiating – pulling out the language used specifically in that engineering course (engineering minus discipline); Post-processing – eliminating duplicates and sorting in decreasing order of TF-IDF score (MS Excel).]
Figure 1 - Shows graphically the methodology used in Phase One of the research study from top to bottom
Figure 2- Shows graphically the methodology used in Phase Two of the research study from top to bottom
Overview of Phase Two – Evaluating the Efficacy of the Wordlists in Capturing Discipline-specific Vocabulary
This study focuses on gauging how well the wordlists capture discipline-specific vocabulary. To evaluate this, 9 subject-matter experts were recruited from the pool of faculty members teaching the courses whose exams were processed in phase one. As instructors, these faculty members are very familiar with the language that ought to be discipline-specific for the courses that they teach. This aspect of the research has passed the ethics review at the institution where this study was conducted.
The methodology of this phase of the research involves training, calibration, quantitative data collection, and debriefing of each participant. A condensed methodology is described below:
1. Participants were recruited using a standardized email request. In some cases, participants were asked in-person as a follow-up to the email, to ensure that the email was read.
2. A Doodle.com account was created, and each willing participant was scheduled into a 1-hour meeting timeslot; one participant per timeslot.
3. At each meeting, the participant was provided with an “Informed Consent” document. This required form was signed by each participant of the study. The study was briefly explained. This exercise reaffirmed the goal and purpose of this research, and emphasized the importance of providing authentic input.
4. The participant was told that they would be provided with a randomized list of 100 words, extracted from final exams of courses they had instructed in the past. Though the course for each participant was unique, each wordlist was developed using combined data across all years that the participant taught that course.
5. Participants were told that they would assign a number to each word in the list, using a 5-point scale provided to them, ranging from not discipline-specific to very discipline-specific. A brief calibration exercise preceded data collection: the participant was given a print-out of the scale and five words orally, and briefly discussed how they would score these words. Once the participant was confident in using the scale, the study progressed.
6. The participant was then given a list of 100 words from their own course and asked to assign a number from 1 to 5 to each word.
7. After completing the study, the participant was debriefed and given a complete wordlist for their course. This wordlist contained ranked words with corresponding TF-IDF scores, and a copy of a short academic paper explaining the study (written by the experimenter). Each participant was also thanked for their time and contribution to this study.
8. The 1100+ datapoints (scores) were then manually entered into Excel spreadsheets for analysis, to measure how they compare to the TF-IDF-generated wordlists.
Results
The results from this evaluation study are currently being analyzed for statistical significance. Preliminary calculations show that the algorithm works well for a yes/no characterization – domain-specific or not domain-specific – but is weak in identifying words that fall between 2 and 4 (inclusive) on the 5-point scale. In other words, the initial data shows that the program can identify words that are characteristic of a discipline, or not characteristic of a discipline, but has difficulty finely differentiating words that are only somewhat characteristic, as judged by the subject-matter experts.
A sample output that shows the TF-IDF output and the human subject-matter expert score is provided in Table 1 below. The full wordlist from a sample freshman electrical engineering final exam is condensed to 100 words and sorted by decreasing TF-IDF score, with rank in the left-most column. The word itself is in the second column, followed by its TF-IDF score. The participant-assigned score is a value assigned by the faculty member, on a scale ranging from 1 to 5 (inclusive), with a high value indicating a high degree of confidence that the word is discipline-specific. The quintile rank is determined by binning the 100-word sample wordlist into 5 bins, and is used to map the TF-IDF scores for each exam to the 5-point scale used by the faculty members. As such, in the ideal case a quintile rank of 5 corresponds to a participant rank of 5, and so on.
Table 1 – Shows a condensed sample of a wordlist from a freshman electrical engineering final exam. The word list is separated into 5 quintiles, indicated by differences in cell colour, and ranked in decreasing order of TF-IDF score. For brevity, the 100-word list has been condensed to show a sample of words from each quintile. Correlations are highlighted in yellow to the right.
RANK (/100)   WORD            TF-IDF SCORE    PARTICIPANT-ASSIGNED SCORE (/5)   QUINTILE-RANK (/5)
1             circuit          0.033323128    5                                 5
2             voltage          0.015487884    5                                 5
3             electric         0.014911103    5                                 5
4             capacitor        0.009280436    5                                 5
5             resistor         0.00906219     5                                 5
40            result           0.000262347    3                                 4
41            motor            0.000260432    3                                 4
42            discontinuous    0.000254686    3                                 4
43            tesla            0.000239045    5                                 4
44            deactivated      0.000227847    3                                 4
70            associated       0.000121868    3                                 3
71            respectively     0.000121452    1                                 3
72            half             0.00011827     1                                 3
73            results          0.000117417    3                                 3
74            losses           0.000112727    4                                 3
81            cannot           2.31533E-05    1                                 2
82            indicate         2.30839E-05    3                                 2
83            generated        2.05447E-05    3                                 2
84            difficulty       2.03236E-05    1                                 2
85            right            1.88357E-05    1                                 2
91            inside          -9.57969E-05    1                                 1
92            variety         -0.000101296    1                                 1
93            of              -0.000115615    1                                 1
94            at              -0.000124816    1                                 1
95            place           -0.000125485    1                                 1

100-word CORRELATION (using full 5-pt scale): 0.7165
100-word CORRELATION (using only extremes of 5-pt scale): 0.9272
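The quintile-rank mapping and rank correlation used in the table can be sketched as follows. The helper names and the scoring data here are hypothetical illustrations, not the study's actual spreadsheet calculations.

```python
def quintile_rank(rank, list_len=100):
    """Map ranks 1..20 -> 5, 21..40 -> 4, ..., 81..100 -> 1."""
    return 5 - (rank - 1) * 5 // list_len

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical (rank, participant score) pairs for a few sampled words
data = [(1, 5), (3, 5), (22, 4), (45, 3), (60, 2), (78, 2), (92, 1), (99, 1)]
quintiles = [quintile_rank(r) for r, _ in data]
participant = [s for _, s in data]

print([quintile_rank(r) for r in (1, 20, 21, 100)])  # [5, 5, 4, 1]
print(round(pearson(quintiles, participant), 3))
```

A quintile rank agrees with the participant score whenever the word's position in the TF-IDF ordering matches the expert's judgment, which is what the correlation measures.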
Discussion
The data shows that a correlation exists between the participant-assigned scores and the software-assigned scores for the sample case chosen. An initial investigation shows that a correlation across the full 5-point scale between software scores and human scores is present, but it is not as high as the correlation computed using only the extremes of the scale. In particular, the 5-point scale given to the participants maps to a quintile categorization of TF-IDF scores. The sample case shows a correlation of 0.71, and this is similar to the other courses still being analyzed for statistical significance. A preliminary observation of the participant-assigned scores suggests that although participants used the full resolution of the 5-point scale, they tended to assign very high or very low scores to each word. This appears to be consistent among all participants so far, and may suggest that a 5-point scale offers more resolution than participants can fully utilize. Even though each participant was calibrated to the 5-point scale before beginning the study, the preference for extremes suggests that participants may not be able to discern the gradations in between.
If only the extremes of the scale are considered, the data shows that the computational method works very well. Specifically, if words scored a "5" or "1" by the participant are compared to their TF-IDF quintile bin, there is a strong correlation. Sample data from a test case, a freshman Electrical Engineering core course, has a correlation of 0.927 and is shown in Table 1. Initial observations suggest that the subject-matter experts and the TF-IDF program agree on high-ranked and low-ranked words for most of the data collected so far. Currently, 11 studies have been completed and 4 remain; the data so far suggest that the program works, as the correlations are comparable across all of these courses.
When data is compiled from courses with less technical vocabulary, such as design courses, an initial examination suggests that the correlations between the subject-matter experts and the TF-IDF program are lower. In planning the survey, the experimenter deliberately assigned three subject-matter experts to score the same design-heavy course. Though the data is still being compiled, initial observations show that the correlation between participant-assigned and computer-assigned scores is noticeably lower, at slightly less than 0.7.
Currently, a group of senior-year computer engineering students is developing a web-based platform based on the modified TF-IDF algorithm. The goal is to make the tool accessible to people around the world, so that they can submit their own exams for analysis. This is in response to questions asked during ASEE 2013, where instructors wanted access to this software for their own courses. Users of this platform will have their documents categorized and added to the existing repository, and in return will receive a scored wordlist based on the modified TF-IDF algorithm.
Conclusions
The computational approach based on a modified TF-IDF algorithm appears to successfully replicate human subject-matter expertise in identifying discipline-specific vocabulary. Though the dataset is currently limited to 9 exams, initial statistical measures of correlation show strong results. In particular, the software is able to accurately characterize vocabulary that is discipline-specific, and this is a promising starting point for further research in the area of language analysis in engineering education. This work can lead to the development of clearer and more explicitly-defined learning outcomes, with the goal of increasing accessibility to technical terminology and robust vocabulary development for all students.
References
1. Bowe, F. Universal Design in Education: Teaching Nontraditional Students. Bergin & Garvey, Westport, CT, 2000.
2. McGuire, J.M., Scott, S.S., and Shaw, S.F. "Universal design and its applications in educational environments." Remedial and Special Education, 27(3), 2006, pp. 166-175.
3. Pliner, S.M., and Johnson, J.R. "Historical, theoretical, and foundational principles of universal instructional design in higher education." Equity & Excellence in Education, 37(2), 2004, pp. 105-113.
4. Variawa, C., McCahan, S., and Chignell, M. "An automated approach for finding course-specific vocabulary." Proc. of the 120th ASEE Annual Conference and Exposition, Atlanta, 2013.
5. Church, K.W., and Mercer, R.L. "Introduction to the special issue on computational linguistics using large corpora." Computational Linguistics, 19(1), 1993, pp. 1-24.
6. McEnery, T., Wilson, A., and Barnbrook, G. "Corpus linguistics." Computational Linguistics, 24(2), 2003.
7. Bybee, J.L., and Hopper, P., eds. Frequency and the Emergence of Linguistic Structure. Vol. 45. John Benjamins Publishing Company, 2001.
8. Shi, C., Xu, C., and Yang, X. "Study of TFIDF algorithm." Journal of Computer Applications, 29, 2009, pp. 167-170.
9. Robertson, S. "Understanding inverse document frequency: on theoretical arguments for IDF." Journal of Documentation, 60(5), 2004, pp. 503-520.
10. Singhal, A. "Modern information retrieval: A brief overview." IEEE Data Engineering Bulletin, 24(4), 2001, pp. 35-43.
APPENDIX A.8 – EXPLORING THE INSTRUCTIONAL IMPLICATIONS OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM

C. Variawa, and P. Kinnear. "Exploring the Instructional Implications of an Automated Course-specific Vocabulary Search Program." Proc. of Canadian Engineering Education Association Conference. CEEA Paper No. 86. Montreal, 2013.

This paper was presented at the 2013 Canadian Engineering Education Association Annual Conference. It discusses the instructional implications of employing an automated vocabulary characterization tool in a learning environment. The discussion focuses on vocabulary learning, and suggests that the tool helps create a foundation that supports lexicogrammatical development of professional language.
EXPLORING THE INSTRUCTIONAL IMPLICATIONS OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM
Variawa, C., and Kinnear, P.
Department of Mechanical and Industrial Engineering, University of Toronto, Ontario Canada [email protected]; [email protected]
Keywords: computational linguistics, vocabulary identification, vocabulary instruction, EAP, ESP.

Based on the principles of universal instructional design [1], a research study is being performed at the University of Toronto to explore whether an automated technique can be used to identify course-specific vocabulary. The motivation is to create clearer course objectives and highlight the importance of developing a robust professional vocabulary, while promoting a more accessible engineering education. Additionally, the vocabulary used in engineering courses often contains vernacular that is neither technical nor course-specific, but is used to help contextualize the course content; the automated technique can also be used to identify such language.
The Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is one approach that can help identify keywords that are specific to documents, when compared to relevant comparator sets. In this study, final exams from engineering courses are compared to a corpus of all electronically-available engineering final exams to develop wordlists for many courses. The algorithm works by multiplying the term frequency of each word on the exam being studied by the logarithm of the inverse of the fraction of exams in the group in which that word appears. The resulting data is tabulated to form a wordlist, with words characteristic of the input document having higher TF-IDF values. This approach has been modified so that the wordlists are generated twice – once by comparing across exams from the same discipline, and once by comparing across all engineering disciplines – to increase the reliability of the wordlists. So far, the data generated shows that words that appear to be course-specific are assigned higher TF-IDF scores, and preliminary research is being conducted to understand the effectiveness of this sample data for a chemical engineering course. The theoretical effectiveness of distributing such wordlists as part of required course syllabi and course material is examined in this paper.
Vocabulary is critical to the academic and professional success of engineering students. All students must learn to understand and use discipline-specific professional language to practice engineering. For multilingual students this is a daunting task. While estimates vary, a student needs to know 98-99% of the lexical items in a text to understand written discourse.
It is slightly lower for spoken discourse, around 80% for good comprehension (e.g. lectures, discussions). That translates to over 7000 word families [2]. Two questions confront students entering an English-medium university and
their instructors. The first asks which vocabulary students need to learn; the second asks what it means to "know" a word. These questions are fraught: even among discipline experts, key vocabulary is contested. Attempts to deal with the first question include Coxhead's Academic Word List (AWL) [3] and Xue and Nation's University Word List [3]. However, these lists do not address the discipline-specific vocabulary students will encounter in engineering. English for Specific Purposes (ESP) teaching has attempted to target discipline-specific technical vocabulary in its instruction. Mudraya [4], however, cites prior research, along with her own corpus-based findings, indicating that learning the technical vocabulary is not the biggest challenge; rather, it is the subtechnical vocabulary – words that sit between the technical and the general, everyday vocabulary. Deciding where to focus student efforts is not a simple decision, but we can explore the use of automated course-specific vocabulary identification techniques to address this problem.
The problem we face currently is how to more effectively ensure that our multilingual students have access to the lexical resources they need to successfully develop and use professional engineering language. Evidence indicates that incidental learning enhances knowledge of vocabulary the student has already seen more than it supports the acquisition of new vocabulary [2]. The "word list" approach is equally inefficient, as it relies on memorization of single forms and meanings. Students new to a discipline may well benefit from knowing which words carry discipline-specific meanings, even if they do not yet understand the conceptual meanings and associations of those words; presenting students with a well-chosen word list with definitions at this point could therefore be an effective way to introduce words. At other times a focus on collocations and constraints may be most useful. Instruction that concentrates on the polysemous nature of the subtechnical vocabulary that students encounter in engineering communication documents and journal articles also has a place. Having a well-defined word list, derived in a principled way from the discipline corpus, provides a solid foundation from which to develop various instructional and self-study strategies to support student lexicogrammatical development in their professional language.
The development of a course-specific wordlist is a starting point for further research in the area of instructional support for professional language development in engineering education.
References
[1] F. Bowe, Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey, 2000.
[2] N. Schmitt, "Instructed second language vocabulary learning," Language Teaching Research, vol. 12, pp. 329-363, 2008.
[3] N. Hancioglu, S. Neufeld, and J. Eldridge, "Through the looking glass and into the land of lexico-grammar," English for Specific Purposes, vol. 27, pp. 459-479, 2008.
[4] O. Mudraya, "Engineering English: A lexical frequency instructional model," English for Specific Purposes, vol. 25, pp. 235-256, 2006.
APPENDIX A.9 – EVALUATING THE USABILITY OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY TO MEASURE LANGUAGE PROFICIENCY IN ORAL PRESENTATIONS

C. Variawa, and L. Wilkinson. "Evaluating the Usability of an Automated Course-specific Vocabulary to Measure Language Proficiency in Oral Presentations." Proc. of Canadian Engineering Education Association Conference. CEEA Paper No. 75. Montreal, 2013.

This paper was presented at the 2013 Canadian Engineering Education Association Annual Conference. It tests the usability of an automated vocabulary characterization tool in identifying course-specific vocabulary used by students during oral presentations. The investigation shows that the tool should allow for the characterization of acronyms and for mapping across root words. (Note: this study informed an update to the methodology used to condition input documents, and the updated methodology was used for the contents of the dissertation.)
EVALUATING THE USABILITY OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY TO MEASURE LANGUAGE PROFICIENCY IN ORAL PRESENTATIONS
Chirag Variawa and Lydia Wilkinson Department of Mechanical and Industrial Engineering, Engineering Communication Program, University of Toronto
[email protected]; [email protected]

Abstract – A study is being conducted to automatically identify and highlight course-specific language used in engineering courses. Specifically, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, from the field of computational linguistics, is used to identify words that are characteristic to exams. This paper analyzes the success of a TF-IDF-generated word list in capturing discipline-specific language in oral design presentations in a second-year Environmental Chemistry course.
1. INTRODUCTION
This paper examines the use of computer-generated wordlists to identify relevant course-specific vocabulary in an engineering classroom. Principles from the fields of computational linguistics, industrial engineering, and higher education are employed to create a software program that performs statistical calculations on groups of input documents to generate keyword lists. Specifically, the program uses a combination of documents from the same discipline and from all of engineering to assign a value, called a Term Frequency-Inverse Document Frequency (TF-IDF) score, to each word in an input document. In the studies so far, the words have been extracted from all electronically-available engineering exams at the University of Toronto. These documents are used because they are a standardized artifact of the learning environment, and are usually a summative indicator of the learning objectives of different courses. As a result, the wordlists are specific to each course, even though certain words may overlap with other exams: the program finds keywords in documents, and course-specific vocabulary constitutes the keywords of engineering exams. The words from each exam are then tabulated in order of decreasing TF-IDF score. The higher the TF-IDF score, the more likely it is that the word is a keyword. However, this approach does have some critical features that limit its applicability in its current form.
2. USING TF-IDF IN AN ORAL DESIGN PROJECT
In this study, TF-IDF was used to measure the effectiveness of students' discipline-specific language in a second-year environmental chemistry class. We focused specifically on the formal client meeting, an oral presentation that is an important step in the design project. Mapping the word list against vocabulary in the presentation exposed
limits arising from differences in purpose and mode between exams and design projects.
The relative purpose of exams and design projects affects the type of word forms that are used. Exams aim to test knowledge, and as such often use the imperative form to deliver a set of instructions: students are asked to calculate the alkalinity of…, give the equation for…, or estimate the concentration of… in order to display knowledge. In comparison, design projects, and these oral presentations specifically, often use the past or future tense to describe what has been accomplished or what will be done next, explaining for instance that an estimated total will be collected, or that ratios were calculated. A direct mapping of the wordlist against the presentation does not capture the form of the word or acknowledge shared root words.
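One crude way to acknowledge shared root words when mapping a wordlist onto a transcript is simple suffix-stripping. The sketch below is illustrative only (the suffix list and helpers are assumptions, not the study's tooling); a real implementation would use a proper stemmer such as the Porter stemmer.

```python
# Illustrative root-word matcher: strips a few common suffixes so that
# "calculated" and "calculate" are treated as forms of the same root.
SUFFIXES = ("ations", "ation", "ated", "ates", "ate", "ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least 4 characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 4:
            return word[: -len(suf)]
    return word

def matches(wordlist_term, transcript_words):
    """Return transcript words sharing a (crude) root with the term."""
    root = crude_stem(wordlist_term)
    return [w for w in transcript_words if crude_stem(w) == root]

transcript = ["ratios", "were", "calculated", "and", "we", "calculate", "totals"]
print(matches("calculate", transcript))  # ['calculated', 'calculate']
```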
The skills tested for and used in exams and design projects – knowledge versus problem solving, investigation, and project management – also impact vocabulary. Design projects require students to look for solutions outside the lecture content, and as a result words occur in the project that may not be tested on the exam. For example, in one group of presentations, gas was the most frequently recurring word-list term; but while it would be used on an exam to refer to a gas state, it was used here to refer to gasoline manufacturers, key contributors to pollution on the site being evaluated. Gas was also used as a modifier in gas chromatography, and notably, chromatography was the next most frequent word-list term. Both in usage and collocation we again see the importance of interpretation in applying the word list.
The list also does not capture acronyms, which are a key expression of professional competency within this context. A student at the beginning of their presentation may indicate that they are testing for Volatile Organic Compounds, but through the rest of the presentation consistently refer to VOCs. While the full term is only captured once, the repeated acronym has the same basic meaning. Within chemical engineering, condensing terms is a necessary means of ensuring concision in a discipline where lengthy word chains are the norm. A student's inability to recognize acronyms by graduation would suggest a lack of proficiency in their chosen field.
Given the above limitations, it becomes clear that the method as currently implemented cannot fully characterize the course-specific vocabulary used in oral presentations. Specifically, the program is not yet capable of discerning acronyms or shorthand, the demonstration of skills in a design course, or differences in word form and usage. Future work will investigate these areas, as well as explore the use of speech-to-text tools to aid in identifying word usage in oral presentations.
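As an illustration of how acronym recognition might be automated in a future version of the tool, the following Python sketch links an all-capitals token to a spelled-out phrase appearing earlier in the text whose initials match it. The function name and matching heuristic are assumptions for illustration only; they are not part of the thesis software.

```python
import re

def link_acronyms(text):
    """Map each acronym to a spelled-out form appearing earlier in the text
    whose word initials match it (an illustrative heuristic, not the thesis
    program's method)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    mapping = {}
    for i, tok in enumerate(tokens):
        clean = tok.rstrip("s'")           # treat "VOC's" and "VOCs" as "VOC"
        if clean.isupper() and len(clean) >= 2:
            n = len(clean)
            # look backwards for n consecutive words whose initials spell the acronym
            for j in range(i - n, -1, -1):
                window = tokens[j:j + n]
                if len(window) == n and "".join(w[0] for w in window).upper() == clean:
                    mapping[clean] = " ".join(window)
                    break
    return mapping
```

With this kind of mapping, repeated acronyms could be counted together with their full terms when the wordlist is applied to a presentation transcript.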
APPENDIX B.1 – INPUT CONDITIONING
This section expands on the process used to prepare the dataset for processing.
Input Conditioning

The first phase of the study is to design and prototype the computational approach. Based on the modified TF-IDF method discussed earlier, the program assigns a TF-IDF weight to each word in a given input document. In order to develop this program, it was first necessary to prepare each input document so that it could be accessed by a computer as a string. The input documents are electronically-available final exams in engineering, received in a variety of file formats including .jpg, .bmp, .pdf, .doc, .docx, and .txt. The common computational aspect of these documents is that they all contain text. Though some of these files contain explicitly-defined text, as in .docx files, the image files do not allow text to be selected and extracted as-is. In order to employ the computational approach, all of the input documents needed to be converted into a standardized, machine-readable format on which TF-IDF could be calculated; each file was therefore processed by a program that converts it into a text-only document. In this processing step, graphical elements are lost, because the intended output, the text-only file, cannot contain elements other than those found on a standardized ASCII table. For the purpose of this study, the researcher is interested in the analysis of vocabulary, and though graphical elements may contain text, at this point in the research they are not considered part of the input data.

These text-only files should not include any symbolic characters, including commas, quotation marks, brackets, and other non-alphanumeric characters. This is because the TF-IDF algorithm treats characters bounded by white space as individual words; commas and other symbolic characters would be considered part of a word, since they are not separated from it by white space. In addition, it is important to remove any numbers from the documents being converted into text-only format. The study is interested in exploring discipline-specific vocabulary, and numbers would distort the TF-IDF values; numbers are not treated as words in this study.
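The stripping of symbolic characters and numbers described above can be sketched in a few lines of Python. This is an illustrative sketch of the conditioning rules, not the thesis's Visual Basic implementation (which appears in Appendix B.2); the function name is hypothetical.

```python
import re

def to_plain_words(raw_text):
    """Strip everything except letters and whitespace, so that commas,
    brackets, and digits never attach to a word (a sketch of the input
    conditioning rules described above)."""
    # Replace any character that is not a letter or whitespace with a space.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", raw_text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(cleaned.split())
```

Because every non-letter becomes white space before tokenization, a trailing comma or an embedded digit can no longer cause two surface forms of the same word to be counted separately.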
So far, any characters outside the ASCII ranges for uppercase and lowercase letters, including all numbers, are removed during the conversion of input documents to text-only format. In particular, the conversion process creates a single string from each input document. This string represents each final exam and becomes the input document that the computational approach uses in subsequent aspects of the study.

In using this conversion approach iteratively, once per input document across all documents, it was determined that a further refinement of the conversion process was necessary. Specifically, a word beginning with an uppercase letter is treated differently than the same word in all lowercase letters. In addition, eliminating words that do not contain a vowel can artificially remove acronyms that would otherwise be used in a discipline-specific manner. As such, the processing also includes an additional “cleaning step” in which only certain kinds of “words” are added: the set of characters between two white spaces is converted into text and added to a text-only file if it follows the schema in Table 1.
Table 1 - Modifications made during conversion from PDF to TXT

Converted and Added to Text File:

• Contains at least one vowel, and only characters between 65 and 90, and 97 and 122 (inclusive) on the standard ASCII table.
  e.g. “beam”, “catalyst”, “Lattice”, “electron”, “Operation”, “caffeine”

• Contains exclusively capital letters.
  e.g. “BLDG”, “RXN”, “FCP”, “GRND”, “IF”, “STDNT”

Not Converted and Not Added to Text File:

• Any character not on the ASCII table.
  e.g. “要” (foreign characters)

• No vowels (and NOT all caps).
  e.g. “bem”, “ctlst”, “ltc”, “lctrn”, “prtn”, “cffn”

• Is a “space” character (ASCII value = 32).
  e.g. “ ” (blank space)

• Contains any numbers.
  e.g. “be1m”, “catalyst3”, “lt4c”, “electr55on”, “IF4f”, “ST1p4”
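The keep/discard schema in Table 1 can be sketched as a single predicate in Python. This is an illustrative simplification (it omits the single-letter and vowel/consonant edge cases that the Visual Basic code in Appendix B.2 handles); the function name is hypothetical.

```python
def keep_word(token):
    """Apply the keep/discard schema of Table 1: keep a token if it is all
    capital letters (a likely acronym), or if it is purely alphabetic ASCII
    and contains at least one vowel (a simplified sketch of the rules)."""
    if not token.isalpha() or not token.isascii():
        return False          # numbers, symbols, non-ASCII characters
    if token.isupper():
        return True           # acronyms such as "RXN" or "BLDG"
    return any(c in "aeiouy" for c in token.lower())
```

Filtering a token stream with this predicate keeps discipline-specific acronyms while discarding the vowel-less fragments that PDF extraction tends to produce.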
APPENDIX B.2 – SOFTWARE CODE: INPUT CONDITIONING
This section presents the code used to create the prototype software which prepares the input.
Imports System.IO
Imports Microsoft.VisualBasic
Imports System
Imports System.Text

Module Module1

    Sub Main()
        'Declare our variables
        Dim DirArray() As String
        Dim FileArray() As String
        Dim FinalStringArray() As String
        Dim strName As String
        Dim Mydir As String
        Dim MyFileName As String
        Dim First As Boolean
        Dim rowcnt As Integer
        Dim FCOUNT As Integer
        Dim AcroApp As Acrobat.AcroApp                  'sets up an object of type Acrobat.AcroApp (the whole Acrobat app)
        Dim AcroAVDoc As Acrobat.AcroAVDoc              'AVDoc is opened in Acrobat’s user interface
        Dim AcroPDDoc As Acrobat.AcroPDDoc              'PDDoc is opened in the background and manipulated without the user seeing it
        Dim AcroTextSelect As Acrobat.AcroPDTextSelect  'Allows text in PDF to be selected
        Dim PageNumber, PageContent As Object
        Dim content As String = ""
        Dim i, a, j, p, l, counter As Integer
        Dim txtflname As String
        Dim finalstring As String

        'Prompt user to enter the particular Folder Name to get files and file-names from folder directory
        'strName = InputBox(Prompt:="Please Enter The Folder Name", Title:="Enter The Folder Name")

        'Take from folder existing in a particular directory
        Mydir = "C:\Users\MCCAHAN-LAB\Desktop\New folder\" & strName

        'Code below finds all PDF files in specified folder, returns name and file directory location,
        'and stores the results into DirArray() and FileArray()
        rowcnt = 1
        FCOUNT = 0
        First = True
        Do While (1)
            If First = True Then
                MyFileName = Dir(Mydir + "\*.pdf", vbDirectory)
                First = False
            Else
                MyFileName = Dir()
            End If
            If MyFileName = "" Then Exit Do
            ReDim Preserve DirArray(0 To rowcnt)
            DirArray(rowcnt) = (Mydir + "\" + MyFileName)
            ReDim Preserve FileArray(0 To rowcnt)
            FileArray(rowcnt) = MyFileName
            rowcnt = rowcnt + 1     'represents number of files in folder, or UBound(DirArray or FileArray)
        Loop

        'Code below will extract text from PDF's, and return final string with @ at the end
        p = 1
        l = 1
        finalstring = ""
        Do Until p = rowcnt
            AcroApp = CreateObject("AcroExch.App")
            AcroAVDoc = CreateObject("AcroExch.AVDoc")
            If AcroAVDoc.Open(DirArray(p), vbNull) <> True Then
                Exit Sub
            End If
            AcroAVDoc = AcroApp.GetActiveDoc
            AcroPDDoc = AcroAVDoc.GetPDDoc
            For i = 0 To AcroPDDoc.GetNumPages - 1      'For all the page numbers
                PageNumber = AcroPDDoc.AcquirePage(i)
                PageContent = CreateObject("AcroExch.HiliteList")
                If PageContent.Add(0, 9000) <> True Then
                    Exit Sub
                End If
                AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
                For j = 0 To AcroTextSelect.GetNumText - 1
                    content = content & AcroTextSelect.GetText(j)
                Next j
            Next i

            'Clean PDF Content string using ASCII before we store it into the FinalStringArray() below
            For counter = 1 To Len(content)
                a = Asc(Mid(content, counter, 1))
                If (a >= 64 And a <= 90) Or (a >= 97 And a <= 122) Or a = 32 Or a = 10 Then
                    finalstring = finalstring & Mid(content, counter, 1)
                End If
            Next

            Dim new_line_char As Char = Chr(10)
            finalstring = Replace(finalstring, new_line_char, " ")
            Dim allwords() As String = Split(finalstring)
            Dim numwords As Integer = UBound(allwords) + 1
            Dim counter1 As Integer = 1
            Dim wcount As Integer = 1
            Dim finalwords() As String
            Dim count2, count3 As Integer
            Do Until counter1 = numwords + 1
                Dim currword As String = allwords(counter1 - 1)
                count2 = 1
                count3 = 0
                If currword = "" Then
                    'if there are extra spaces in between words in the .txt file, there's an empty element in the array
                    'this line accounts for those empty elements and instructs program to continue with the loop
                ElseIf isallcaps(currword) = True Then
                    'if word is all caps, program adds it to finalwords
                    ReDim Preserve finalwords(0 To wcount)
                    finalwords(wcount - 1) = currword
                    wcount += 1
                Else
                    currword = LCase(currword)  'if the entire word isn't all capitals, turns it all to lowercase to make it easier
                    If Len(currword) = 1 Then
                        'if the word is only one letter long, program only adds it to finalwords if it's 'a' or 'i'
                        If Asc(currword) = 97 Or Asc(currword) = 105 Then
                            ReDim Preserve finalwords(0 To wcount)
                            finalwords(wcount - 1) = currword
                            wcount += 1
                        End If
                    ElseIf isavowel(currword(0)) = True Then
                        'if the word begins with a vowel, program loops through word and only adds it to
                        'finalwords if the word also contains a consonant
                        For count2 = 1 To Len(currword) - 1
                            If isavowel(currword(count2)) = False Then
                                count3 += 1     'counter goes up if there's a consonant
                            End If
                        Next count2
                        Dim boolvar As Boolean = Not (count3 = 0)   'boolvar = True if count3 isn't 0, meaning there is a consonant in the word
                        If boolvar = True Then
                            ReDim Preserve finalwords(0 To wcount)
                            finalwords(wcount - 1) = currword
                            wcount += 1
                        End If
                    Else
                        'if the word begins with a consonant, it's only added to finalwords if the word also contains a vowel
                        For count2 = 1 To Len(currword) - 1
                            If isavowel(currword(count2)) = True Then
                                count3 += 1     'counter goes up if there's a vowel
                            End If
                        Next count2
                        Dim boolvar As Boolean = Not (count3 = 0)   'boolvar = True if count3 isn't 0, meaning there is a vowel in the word
                        If boolvar = True Then
                            ReDim Preserve finalwords(0 To wcount)
                            finalwords(wcount - 1) = currword
                            wcount += 1
                        End If
                    End If
                End If
                counter1 += 1
            Loop
            finalstring = Join(finalwords)

            'Concatenate an @ at the end of the extracted PDF text
            finalstring = finalstring & " @"

            'Store each of the resultant strings extracted from each of the PDF's into the array
            'called FinalStringArray() - NOTE: THIS NEEDS FIXTURE
            ReDim Preserve FinalStringArray(0 To l)
            FinalStringArray(l) = finalstring
            l = l + 1

            'Create a text-file and store the resultant 'finalstring' into the textfile, just for insurance purposes - NOTE: THIS NEEDS FIXTURE
            'NOTE: Current code will export the extracted text to the same directory where the original PDF files are located
            txtflname = Replace(DirArray(p), ".pdf", ".txt")
            Dim fs As Scripting.FileSystemObject
            Dim ts As Scripting.TextStream
            fs = New Scripting.FileSystemObject
            ts = fs.CreateTextFile(txtflname)
            ts.WriteLine(LCase(finalstring))
            ts.Close()

            'Reset values of content and finalstring to Null, as well close previously accessed PDF's
            'before exiting the loop and moving on to the next PDF
            content = vbNullString
            finalstring = vbNullString
            AcroAVDoc.Close(True)
            AcroApp.Exit()
            AcroApp = Nothing

            'Increase counter, and move onto the next PDF to extract text from, in the specified directory above
            p = p + 1
        Loop
    End Sub

    Public Function isavowel(ByVal inputchar As Char) As Boolean
        If Asc(inputchar) = 97 Or Asc(inputchar) = 101 Or Asc(inputchar) = 105 Or Asc(inputchar) = 111 _
                Or Asc(inputchar) = 117 Or Asc(inputchar) = 121 Then
            Return True
        End If
        Return False
    End Function

    Public Function isallcaps(ByVal inputstr As String) As Boolean
        Dim a1 As Integer = Asc(inputstr)
        Dim count4 As Integer = 1
        Dim noncapcount As Integer = 0
        If (a1 >= 65 And a1 <= 90) Then             'If first letter is caps
            For count4 = 1 To Len(inputstr) - 1     'Loops through rest of word
                a1 = Asc(inputstr(count4))
                If (a1 < 65 Or a1 > 90) Then
                    noncapcount += 1                'Increments noncapcount if a char in word isn't caps
                End If
            Next count4
        Else
            Return False
        End If
        If noncapcount = 0 Then
            Return True
        Else
            Return False
        End If
    End Function

End Module
APPENDIX C.1 – COMPUTATIONAL APPROACH
This section expands on the process used to deploy the modified algorithm based on the Term Frequency-Inverse Document Frequency (TF-IDF) equation.
Coding the Modified TF-IDF Program

First, the experimenter chooses one file to be used as the input file, and renames that
text-only file “1.txt”. This indicates to the TF-IDF program that this file will be the input which needs
processing. The second action that the experimenter performs is to isolate that “1.txt” file with either
all exams from that same discipline, or with all exams from engineering except for that same discipline.
For example, if the input file, “1.txt” is a first year chemical engineering course called CHE101, then it is
placed into a folder with either all other “CHE” text files, or into a folder with all engineering text files
except ones that contain “CHE” in the title. This is due to the modification to the TF-IDF algorithm
mentioned earlier, and uses context-based calculations to increase the resolution and spread of the TF-
IDF scores during calculation later on. As such, the input going into the TF-IDF program is a folder with
one file called “1.txt” and the rest of the files in that folder being text-only files of engineering exams,
either of the same discipline, or of all other engineering disciplines. The experimenter also creates a
blank file called “OUTPUT.txt” for the words and TF-IDF scores to be output to, at the completion of the
calculations.
Now that the input is fully prepared, the TF-IDF program can be developed and used to assign a
score to each word in that input “1.txt” file, to help characterize key words in that file. As with the text-extractor program, the TF-IDF program contains several major components, including: declaring
variables, memory allocation, calling and using object libraries, creating and using file structures within
the Microsoft Windows® operating system, and several iterative calculations. The differences between
the text-extractor and this TF-IDF program are the type of calculations being performed, the amount of
memory being used, and the object libraries being called. Specifically, the TF-IDF program is more
involved, and uses six major user-coded subroutines instead of one in the previous program.
The TF-IDF program first prompts the user, using a dialog box, for the folder location where the input document and its comparator set are stored. This information tells the program where all of the input documents are coming from, and is later used to prepare a dynamically-allocated
array. First, the program counts the number of files in the input folder. It then creates an array with the
first element of each row containing the file path of each file in that folder. For example, row 1 may
contain the header “C:\ExperimentOne\CHE100versusALLCHE\CHE100.txt”. The program then
continues populating the first element of each row with the file path of each file in the folder, until no
more files are left to add. This is the main array that the program will use for calculation in future steps.
The next step is for the program to open “1.txt”, the main input document – the exam that the
user wishes to encode using the TF-IDF algorithm – and inputs the words into the array. In particular,
the TF-IDF program uses the My.Computer.FileSystem.ReadAllText function to read all text from “1.txt”.
Each word from that file is assigned a unique element in the row; starting from the beginning of the
document, each word gets its own element in the row assigned to that document. As such, the first
word in the document would be assigned to row 1, column 1, and the last word in the document would be assigned to row 1, column X, where X is the total number of words in that document. The program then
moves to the next row in the dynamically-allocated array, and stores the text from the next file in the
input folder in the same manner. After having completed processing all words from all text files in the
input folder, the TF-IDF program now has a large array with the words from the input exam occupying
individual elements in row 1, and all subsequent words from all documents occupying their
corresponding rows and columns. With this large grid now prepared, the program can use a coordinate
system to tactically pinpoint any word given the row and column number as necessary. As with the
previous program, all “int” variables used in the numerous loops and counters have been replaced with “double” variables to increase the maximum number of row or column entries to 1.79×10^308 (int values can extend only to 2.1×10^9 in the positive domain).
The first major operation is to get the instances of a particular word. As such, the program
starts at the first element in the array – the first word of the input document – and counts how many
times it appears in that document. In addition to the system function called “FileReader”, the advanced feature used here is the public shared function “Regex.Matches”. This function
buffers the element string, in this case the first word, into memory and then scans across the same row
to count the number of times that identical string appears. Each time the Regex.Matches comes across
the same word, it increases the counter by one. For example, the word “name” can appear ten times in
a document, with the element containing “name” appearing a few times near the beginning of the row,
some near the middle, and a few more near the end; for each occurrence, it increases the counter by
one, ignoring all entries that do not match the word “name”. The program returns a count after each
word, and performs a series of steps before it moves onto the next word.
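The word-counting step can be sketched in Python using exact whole-token comparison, so that a match on “gas” is not also triggered by “gasoline”. This is an illustrative sketch with hypothetical function names, not the thesis's Regex.Matches-based Visual Basic implementation.

```python
def instances_of_word(word, doc_words):
    """Count how many elements of a tokenized document equal `word`
    (case-sensitive, whole-token comparison)."""
    return sum(1 for w in doc_words if w == word)
```

Here `doc_words` stands for one row of the word array described above: the document split on white space into one word per element.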
The next function is the term frequency subroutine. It takes the value just returned by the
previous function, and divides it by the total number of elements in the same row. As such, it is dividing
the number of occurrences of a particular word by the total number of words in the same document.
This function uses Regex.Matches to help distinguish between white space entries, and elements
actually containing a word so that the word count is more accurate. This value, the term frequency (TF),
is stored in another dynamically-allocated array. Here the TF number can be easily mapped using a
coordinate system to the original word in the original array, and is useful for future calculations and
debugging purposes.
In order to generate the inverse document frequency, the TF-IDF program then needs to count how many times each word in the input file – row 1 – occurs across the large initial array. In order to do
this, the program uses the coordinate system along with several loops and conditional statements to
count the number of documents being compared to. Specifically, the program increases a counter for
each instance of a unique file path, along column 1 of the initial array. It then buffers the first word of
the input exam into memory and uses a Regex.Match command to loop through each row, increasing
another counter for the number of times an identical word is found. If the counter increases past 0 for
the row being examined, then it means that that row contains at least one instance of the word in the
buffer. As such, the document word count increases by one. This repeats for each row – document in
the folder – until all rows have been searched. In particular, the program is interested in the number of
documents containing at least one instance of the specific word that occurs in the input document. The
number of times that word appears in other documents is not a critical feature, but is stored for future
analysis to help improve the system at a later date. For the purpose of this study, there is now a term
frequency value for each word in the input document, and a value that corresponds to the number of documents that contain the identical word. The inverse document frequency (IDF) is then calculated: it is the logarithm of the quotient obtained when the total number of files in the input folder (the upper bound of the initial array) is divided by the number of documents containing the word being investigated.
The TF-IDF score is calculated by multiplying the TF determined earlier by the IDF just calculated. This
TF-IDF score is then printed to the screen using the Console.WriteLine subroutine. As such, the user
will now see the first element of the input exam and a TF-IDF score next to it printed to their screen.
The program then uses the File.AppendAllText subroutine to append the word, the TF-IDF score, and a
newline character to the originally-blank “OUTPUT.txt” output file.
The TF-IDF program then continues this process for each word in row 1 so that each word in the
input file is assigned a TF-IDF score, printed to the screen, and to the output file. The program exits all
of the loops and commands an exit of the program when the “@” symbol is reached. As noted earlier,
the “@” symbol is at the end of each input-text file, and is used to tell the TF-IDF program that all words
in the exam have been calculated. As such, since there are no instances of the “@” symbol present,
other than at the end of the document, the TF-IDF program now knows that it has calculated a TF-IDF
score for each word in the input document. By choosing which comparator sets are included with the
original file, an authentic TF-IDF score can be calculated using context-aware frequency values; the
process is then repeated for the same document using either all exams in engineering or all exams
within the same discipline.
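The arithmetic walked through above—term frequency, inverse document frequency, and their product—can be summarized in a short Python sketch. This is a sketch of the TF-IDF calculation as described, not the Visual Basic implementation in Appendix C.2; each document is represented as a list of words, and the comparator set includes the input document itself (so the document count for any word in it is at least one).

```python
import math

def tf(word, doc_words):
    """Term frequency: occurrences of `word` divided by total words in the document."""
    return doc_words.count(word) / len(doc_words)

def idf(word, all_docs):
    """Inverse document frequency: log of (number of documents divided by the
    number of documents containing the word)."""
    containing = sum(1 for d in all_docs if word in d)
    return math.log(len(all_docs) / containing)

def tf_idf(word, doc_words, all_docs):
    """TF-IDF score of `word` in one document relative to a comparator set."""
    return tf(word, doc_words) * idf(word, all_docs)
```

A word that appears in every document in the comparator set (such as “the”) receives an IDF of log(1) = 0, so its TF-IDF score is zero, while a word concentrated in the input exam scores highly; this is what lets the comparator-set choice control which vocabulary stands out as course-specific.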
The first post-processing step is to import the data from the text-file. In Excel, this is done by
pointing the import data wizard to the output file, and specifying “space” as the delimiter. This
causes the data to be inputted as two columns: the first column is each word in the exam, and the
second column is the corresponding TF-IDF score.
Now that the data has been inputted into Excel, the experimenter then deletes any information
in the output.txt file so that the next iteration of the TF-IDF program will have a clean output file. Each
exam is passed through the TF-IDF program twice: one instance is where the input file is compared
against all documents within the same engineering discipline, and another instance is when the file is
compared against all exams in engineering (minus the same discipline). For each iteration, the data is
stored into the spreadsheet created above.
The result of all of the steps in the study so far is as follows: the experimenter now has a list of
words from the input exam, sorted in decreasing order of TF-IDF score with no redundant pairs of cells.
This is the wordlist portion of the study, and completes the computational approach.
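The spreadsheet post-processing described above—removing redundant word/score pairs and sorting by descending TF-IDF score—can be sketched as a small script. The study performs these steps manually in Excel; this Python version is illustrative only, and the function name is hypothetical.

```python
def build_wordlist(scored_pairs):
    """Collapse repeated words and sort by descending TF-IDF score,
    mirroring the manual spreadsheet post-processing described above.
    Repeated occurrences of a word carry identical scores, so keeping
    one entry per word loses no information."""
    best = {}
    for word, score in scored_pairs:
        best[word] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Applied to the two columns imported from “OUTPUT.txt”, this yields the ranked, de-duplicated wordlist that completes the computational approach.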
APPENDIX C.2 – SOFTWARE CODE: COMPUTATIONAL APPROACH
This section presents the code used to deploy the modified algorithm based on the Term Frequency-Inverse Document Frequency (TF-IDF) equation.
Imports System.IO
Imports Microsoft.VisualBasic
Imports System
Imports System.Text

Module Module1

    Sub Main()
        'Declare our variables
        Dim DirArray() As String
        Dim FileArray() As String
        'Dim strName As String
        Dim Mydir As String
        Dim MyFileName As String
        Dim First As Boolean
        Dim rowcnt As Integer
        Dim FCOUNT As Integer
        Dim wrdcnt(), wrdinst(), termfrq() As Double
        Dim i, j, ubDA, ubWA As Long

        'Prompt user to enter the particular Folder Name to get files and file-names from folder directory
        'strName = InputBox(Prompt:="Please Enter The Folder Name", Title:="Enter The Folder Name")

        'Take from folder existing in a particular directory
        Mydir = "C:\Users\MCCAHAN-LAB\Desktop\Program\CALCULATIONS\MIE350\DISC\" '& strName

        'Code below finds all TXT files in specified folder, returns name and file directory location,
        'and stores the results into DirArray() and FileArray()
        rowcnt = 1
        FCOUNT = 0
        First = True
        Do While (1)
            If First = True Then
                MyFileName = Dir(Mydir + "\*.txt", vbDirectory)
                First = False
            Else
                MyFileName = Dir()
            End If
            If MyFileName = "" Then Exit Do
            ReDim Preserve DirArray(0 To rowcnt)
            DirArray(rowcnt) = (Mydir + "\" + MyFileName)
            ReDim Preserve FileArray(0 To rowcnt)
            FileArray(rowcnt) = MyFileName
            rowcnt = rowcnt + 1
        Loop

        'Do Until i = UBound(DirArray() + 1)
        '    Dim fileReader As String
        '    fileReader = My.Computer.FileSystem.ReadAllText(DirArray(i))
        '    numdocs = UBound(DirArray())
        '    Dim WordArray() As String = Split(fileReader, " ")
        '    Do Until j = UBound(WordArray() + 1)
        '        wrdcnt(i) = getWordCount(fileReader)
        '        wrdinst(i) = getInstancesofWord(WordArray(j), fileReader)
        '        termfrq(i) = TermFrequency(WordArray(j), fileReader)
        '        j = j + 1
        '    Loop
        '    i = i + 1
        'Loop

        Dim documentlist() As String
        ubDA = UBound(DirArray) + 1
        Dim k As Long
        k = 1
        Dim fileReader As String
        Do Until k = ubDA
            fileReader = My.Computer.FileSystem.ReadAllText(DirArray(k))
            ReDim Preserve documentlist(0 To k)
            documentlist(k) = fileReader
            fileReader = ""
            k = k + 1
        Loop

        Dim numberofarrayelements As Double
        Dim counterp As Double = 0
        Dim TF As Double
        Dim IDF As Double
        Dim TFIDF, IOW, WC, NDCW As Double
        Dim numdocs As Double
        Dim TFIDFA() As Double
        ubDA = UBound(DirArray) + 1
        i = 1
        'Do Until i = ubDA
        fileReader = My.Computer.FileSystem.ReadAllText(DirArray(i))
        Dim Words() As String = Split(fileReader, " ", , CompareMethod.Text)
        numdocs = UBound(DirArray)
        numberofarrayelements = UBound(Words) - LBound(Words)
        ubWA = UBound(Words) + 1

        Do Until counterp = ubWA
            TF = TermFrequency(Words(counterp), fileReader)
            IOW = getInstancesofWord(Words(counterp), fileReader)
            WC = getWordCount(fileReader)
            NDCW = numofDocsContainingWord(numdocs, Words(counterp), documentlist)
            TFIDF = TermFreqInverseDocFreq(Words(counterp), fileReader, numdocs, documentlist)
            'ReDim Preserve TFIDFA(0 To counterp)
            'TFIDFA(counterp) = TFIDF
            Console.WriteLine(Words(counterp) & " " & TFIDF)
            File.AppendAllText("C:\Users\MCCAHAN-LAB\Desktop\Program\OUTPUTTEMP\MEOW.txt", _
                Words(counterp) & " " & TFIDF & Environment.NewLine)
            counterp += 1
        Loop
        i += 1
        counterp = 0
        'Loop
        Console.ReadLine()
    End Sub

    '***CALCULATES NUMBER OF WORDS IN THE INPUT WHILE CHECKING FOR ERRORS***
    Public Function getWordCount(ByVal InputString As String) As Double
        Dim TotalWords As Double    'Finds the Total Number of character strings separated by a space
        Dim strtest() As String
        Dim u As Integer
        TotalWords = System.Text.RegularExpressions.Regex.Matches(InputString, "\w+").Count
        'Dim WordsWithNumbers As Double  'Finds the Total Number of character strings containing a numeric character
        'WordsWithNumbers = System.Text.RegularExpressions.Regex.Matches(InputString, "\d+").Count
        Return TotalWords
    End Function

    '***CALCULATES INSTANCES OF A PARTICULAR WORD IN A STRING***
    Public Function getInstancesofWord(ByVal InputWord As String, ByVal InputString As String) As Double
        Dim TotalInstancesofWord As Double
        TotalInstancesofWord = System.Text.RegularExpressions.Regex.Matches(InputString, InputWord).Count
        Return TotalInstancesofWord
    End Function

    '***CALCULATES THE NUMBER OF DOCUMENTS THAT CONTAIN A PARTICULAR WORD***
    Public Function numofDocsContainingWord(ByVal numDocuments As Double, ByVal InputWord As String, ByVal documentList() As String) As Double
        Dim counterj As Double = 1
        Dim count As Double = 0
        While (counterj <= numDocuments - 1)
            If getInstancesofWord(InputWord, documentList(counterj)) > 0 Then
                count += 1
            End If
            counterj += 1
        End While
        Return count
    End Function

    '**CALCULATES THE FREQUENCY OF A WORD IN A STRING**
    Function TermFrequency(ByVal InputWord As String, ByVal InputString As String) As Double
        Dim TFreq As Double
        TFreq = (getInstancesofWord(InputWord, InputString)) / (getWordCount(InputString))
        Return TFreq
    End Function

    '**CALCULATES THE INVERSE-DOCUMENT FREQUENCY OF THE WORD IN THE ARRAY**
    Public Function InverseDocFrequency(ByVal InputWord As String, ByVal numDocuments As Double, ByVal documentList As Array) As Double
        Dim IDF As Double
        IDF = Math.Log(UBound(documentList) / numofDocsContainingWord(numDocuments, InputWord, documentList))
        Return IDF
    End Function

    '**CALCULATES THE TERM-FREQUENCY-INVERSE-DOCUMENT-FREQUENCY OF A WORD IN THE ARRAY**
    Public Function TermFreqInverseDocFreq(ByVal InputWord As String, ByVal InputString As String, ByVal numDocuments As Double, ByVal documentList As Array) As Double
        Dim TFIDF As Double
        TFIDF = TermFrequency(InputWord, InputString) * InverseDocFrequency(InputWord, numDocuments, documentList)
        Return TFIDF
    End Function

End Module
APPENDIX D.1 – SAMPLE PARTICIPANT-RECRUITMENT EMAIL
This section presents a sample email sent to potential participants for the evaluation study.
Dear Professor <surname>,
I am a Ph.D. student in the Department of Mechanical and Industrial Engineering, University of Toronto. This email is to request your participation in a short study to test the validity and efficacy of a course-specific wordlist-generating software program that I have developed for my Ph.D. dissertation. Specifically, the program creates ranked wordlists of vocabulary on publicly available UofT engineering exams in order to identify course-specific vocabulary. The goal is to give instructors a tool to automatically identify requisite technical vocabulary that students ought to be familiar with by the end of a specific course; the software helps students develop a robust technical vocabulary, while reducing learning barriers due to inaccessible language. In particular, I need you to gauge whether this program works for your course, as you are the best subject-matter expert in this regard.
Specifically, I am seeking your expertise as the professor for <course code> to help determine how integral certain words are to that course’s curriculum. Participation in this online, web-accessible study will take no more than 50 minutes of your time and can be completed at your convenience. Your participation would involve completing an online survey designed especially for you, using Google Forms. In the survey you will be presented with 100 different words found in the final exam for a course you instruct, and asked to rate, using checkboxes on a five-point scale, how specific each word is to the <course code> curriculum. This component of my research study is critical to gauging the effectiveness of my automated course-specific wordlist-generating program.
I would really appreciate it if you would agree to become a study participant. If you are interested in participating, or have further questions about my dissertation or the study in general, please reply to this email, or feel free to contact me at [email protected] or 555-555-5555.
Thank you so much for your time, and I look forward to hearing from you.
Sincerely,
Chirag Variawa
APPENDIX D.2 – INFORMED CONSENT
This section presents the informed consent form signed by each participant in the evaluation study.
INFORMED CONSENT FORM

Please read the ENTIRE form before answering the question at the bottom. Answering the question at the bottom of this page confirms that you have read and understood this entire form and have had any questions about this study satisfactorily answered.

Introductory Information
The purpose of the following survey is to determine which of the following words are central to the course Terrestrial Energy Systems. The results of this survey will be used to help validate the output of a computer program that ranks words in an attempt to create course-specific vocabulary lists. These vocabulary lists can be used as teaching aids to help improve accessibility to information taught in courses. You have been asked to participate because of your knowledge of the Terrestrial Energy Systems content and all of the vocabulary pertaining to that course. This survey should take at most 50 minutes of your time. You may complete the survey whenever and wherever you wish; however, we ask that you complete the entire survey in one sitting to improve the consistency of the data.

Participation and Withdrawal
Your participation in this study is voluntary. You may refuse to participate, withdraw at any time, or decline to answer any questions without any negative consequences. If you wish to withdraw at any point, you may do so by sending an email to [email protected] stating that you wish to withdraw. Any and all data pertaining to you would then be removed and deleted from this study.

Risks/Benefits
There are no reasonably foreseeable risks or harm related to participating in this study. No payments or compensation will be given for participation in this study. However, after the study has been completed, a summary of the results will be emailed to you, along with the course-specific vocabulary list developed for CIV300, as determined by the computer program and algorithm. You may use this information and the wordlist for future teaching, or in whatever other way you see fit.

Access to Information, Confidentiality, and Publication of Results
The data for this study will be accessible only to the investigator, Chirag Variawa, his advisor, and a summer undergraduate student for the duration of her term of work. Upon completion of this specific study, estimated to be July 2014, all data pertaining to this study will be anonymized and made accessible only to the supervisor. The results of this study, however, will likely be published and used in public presentations. Due to the nature of this specific survey, your name and this course code will not be anonymized, and your participation in this study is not confidential. However, the researchers will NOT disseminate your name or the course code as part of the outcomes of this study. Fictitious course codes and course titles will be generated for purposes of dissemination.
Questions and Contact Information
If you have any questions about the terms of consent or this study in general, please feel free to contact the research team at the following email address: [email protected] If you have any questions about your rights as a participant in a study, you can contact the Office of Research Ethics at [email protected] or 416-946-3273. As stated above, a copy of this form will be sent to your email for your own reference. If you would like a hard copy of this form, please contact the research team at [email protected] so that your request can be fulfilled.

Consent
Do you, XXXX XXXX, consent to the above terms and agree to participate in this study? *
If you click 'YES', and then 'Continue', you will be taken to the first page of the survey. If you click 'NO', then the survey will not be administered. By clicking 'YES', you are agreeing that you have read and understood this ENTIRE form.
YES NO
APPENDIX D.3 – INSTRUCTIONS AND SCALE USED FOR EVALUATION STUDY
This section presents the instructions and scale provided to each participant of the evaluation study.
The following words or acronyms are presented to you in alphabetical order.
Please rate them from 1 to 5 using the following scale.
1 - I STRONGLY DISAGREE that this is a course-specific word. This word is not specifically relevant to the content of my course.
2 - I DISAGREE that this is a course-specific word.
3 - I am UNDECIDED whether this is a course-specific word.
4 - I AGREE that this is a course-specific word.
5 - I STRONGLY AGREE that this is a course-specific word. This term is central and specific to the content of my course.
APPENDIX D.4 – SAMPLE WORDLIST
This section presents the sample wordlist, as presented in the survey, given to one participant in the evaluation study.
about acwp and architects base bcwp bidders boeck bore boxshield bucket build casting city civil cleats compura concrete construction consultant corrosion costing cover cpm crane cranes
crashing crawler crosssection does doyles drill estimate excavate excavated excavation excavator faculty filling find footing formwork foundation fumes gaining garage grout hammerhead haul hawthorne here highways
holdback incremental indicators is islands items lower luffing material metre might network pave payments performing precast predecessor project proposing pumps scaffold scheduling seal shaded
shoring shortfall side sideboom similarities sketch soil stated steel storey structural tables their these time trench tunnel tunnels type utilities vaning voids waste yes
APPENDIX D.5 – COMPLETE WORDLIST
This section presents one complete wordlist produced by the computational approach (sorted by decreasing TF-IDF score) and used to create the sample wordlist for the evaluation study. The colours represent the quintile to which each word is assigned: red (5), yellow (4), green (3), blue (2), and purple (1).
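The quintile assignment described above can be sketched as follows. The exact binning used in the thesis is not reproduced here, so this mapping (top fifth of ranks receives quintile 5, bottom fifth receives quintile 1) is an assumption:

```python
def quintile(rank, total):
    """Map a 1-based rank in a list of `total` words to a quintile,
    where 5 = top fifth (most course-specific) and 1 = bottom fifth.
    Assumed binning for illustration; the thesis's exact rule may differ."""
    fifth = (rank - 1) * 5 // total  # 0..4, counted from the top of the list
    return 5 - fifth

# For a list of 867 ranked words, rank 3 falls in the top fifth
q = quintile(3, 867)
```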
Rank Word TF-IDFmod 1 civ 0.056646635 2 concrete 0.015357495 3 excavate 0.009444958 4 construction 0.008459422 5 crane 0.007979697 6 similarities 0.00692172 7 tunnel 0.006666238 8 build 0.006388426 9 excavation 0.005518436
10 footing 0.005484795 11 excavated 0.005302422 12 holdback 0.005003342 13 formwork 0.004482372 14 storey 0.004384565 15 civil 0.004357722 16 soil 0.003602975 17 acwp 0.00354386 18 grout 0.003432802 19 metre 0.00339367 20 project 0.003354013 21 designbuild 0.003182623 22 bore 0.002921799 23 liner 0.002876685 24 mccabe 0.002725251 25 foundation 0.002702925 26 days 0.002530305 27 building 0.002481471 28 shoring 0.002481035 29 contractor 0.002434942 30 precast 0.002411362
31 cost 0.002405664 32 airport 0.002399101 33 bcwp 0.002362574 34 readymix 0.002332372 35 ofciv 0.002131704 36 island 0.001928155 37 similar 0.001895199 38 costs 0.001881067 39 passengers 0.001842953 40 struts 0.001831161 41 rakers 0.001831161 42 cpm 0.001758225 43 civf 0.00173563 44 mday 0.001653446 45 schedule 0.00158489 46 houston 0.001542524 47 section 0.001486652 48 contract 0.001475886 49 shown 0.001449173 50 drifts 0.001444948 51 jib 0.001412156 52 install 0.001376665 53 deep 0.00137432 54 fill 0.00130245 55 port 0.001283033 56 shore 0.001272473 57 highway 0.001228635 58 finish 0.001214661 59 safety 0.00118392 60 waterproof 0.001181287 61 designbidbuild 0.001181287
62 falsework 0.001181287 63 landside 0.001181287 64 bcws 0.001181287 65 bonnyville 0.001181287 66 design 0.001170817 67 paving 0.001156893 68 tower 0.001131172 69 wall 0.001108628 70 roof 0.001055789 71 restraint 0.001011111 72 bidding 0.001011111 73 bac 0.00101024 74 bid 0.001005988 75 site 0.001002769 76 day 0.000977339 77 total 0.000973858 78 way 0.000960041 79 partnerships 0.000941438 80 arrest 0.000941438 81 bottom 0.000925113 82 wales 0.00091558 83 travel 0.000908589 84 she 0.00088307 85 later 0.000882428 86 assumptions 0.000880653 87 mobilize 0.000879113 88 column 0.000876162 89 ground 0.000872169 90 act 0.000869316 91 rebar 0.000861655 92 parking 0.000861655 93 lien 0.000850212 94 street 0.000843786 95 truck 0.000837584 96 soldier 0.000830493 97 spent 0.000825415 98 owner 0.000813705 99 free 0.000798794
100 bidder 0.000797119 101 sidewalks 0.000795656 102 gravel 0.000795656
103 undertaking 0.000795656 104 ohsa 0.000795656 105 certified 0.000795656 106 ytz 0.000771262 107 excavator 0.000771262 108 cranes 0.000771262 109 water 0.000735917 110 department 0.000733905 111 metres 0.000733457 112 critical 0.000726156 113 western 0.000726096 114 ferry 0.000710568 115 planks 0.000710568 116 noticed 0.000710568 117 contractors 0.000710568 118 inspector 0.000710568 119 requirements 0.000695017 120 gross 0.000686612 121 continue 0.000686068 122 procurement 0.000675731 123 yellow 0.000672443 124 sections 0.00066208 125 safely 0.000641472 126 downtown 0.000640053 127 strength 0.000636313 128 reasonable 0.000635717 129 vehicles 0.000630138 130 esi 0.000627198 131 cables 0.000618833 132 mechanically 0.000616063 133 factor 0.000613516 134 columns 0.000613118 135 show 0.000604977 136 method 0.000592945 137 piles 0.000590643 138 torontos 0.000590643 139 dispute 0.000590643 140 condominium 0.000590643 141 subdivision 0.000590643 142 vaning 0.000590643 143 bishop 0.000590643
144 partnered 0.000590643 145 walkways 0.000590643 146 concreting 0.000590643 147 timeline 0.000590643 148 digs 0.000590643 149 demobilize 0.000590643 150 nimiber 0.000590643 151 univerity 0.000590643 152 assumptios 0.000590643 153 collectie 0.000590643 154 crawler 0.000590643 155 hawthorne 0.000590643 156 crtificat 0.000590643 157 freeonboard 0.000590643 158 similrities 0.000590643 159 afsuming 0.000590643 160 drainagewaterproofing 0.000590643 161 grond 0.000590643 162 reiriforcing 0.000590643 163 infractions 0.000590643 164 costplus 0.000590643 165 activltv 0.000590643 166 ouration 0.000590643 167 finisll 0.000590643 168 pagtt 0.000590643 169 diys 0.000590643 170 becomescritical 0.000590643 171 depository 0.000590643 172 eeoc 0.000590643 173 jurisdictional 0.000590643 174 sideboom 0.000590643 175 lanyard 0.000590643 176 fallarrest 0.000590643 177 cleats 0.000590643 178 doyles 0.000590643 179 primavera 0.000590643 180 overead 0.000590643 181 iciv 0.000590643 182 orillia 0.000590643 183 rsmeans 0.000590643 184 thecostcapacity 0.000590643
185 coststate 0.000590643 186 moreexpensivethan 0.000590643 187 boeck 0.000590643 188 boxshield 0.000590643 189 sews 0.000590643 190 luffing 0.000590643 191 hammerhead 0.000590643 192 crashing 0.000590643 193 compura 0.000590643 194 kingston 0.000590643 195 grassy 0.000590643 196 can 0.000590388 197 slope 0.000589818 198 ree 0.000579737 199 clause 0.000566969 200 summarize 0.000560274 201 pave 0.000557746 202 material 0.000554249 203 garage 0.00054505 204 up 0.000543229 205 tunnels 0.000536718 206 under 0.000533299 207 structural 0.000532638 208 theyve 0.000531413 209 incurred 0.000531413 210 bidders 0.000531413 211 table 0.000530947 212 short 0.00053094 213 crash 0.000518575 214 scale 0.000516984 215 diameter 0.000514206 216 highways 0.000505556 217 drawing 0.000504822 218 estimate 0.000495166 219 investigation 0.000494393 220 shaded 0.000491125 221 plan 0.000489963 222 crosssection 0.000475033 223 haul 0.000470719 224 gaining 0.000470719 225 conveniently 0.000470719
226 trench 0.000470719 227 built 0.000461135 228 tape 0.000457373 229 location 0.000450455 230 steel 0.000441077 231 bucket 0.000440226 232 eciv 0.000439556 233 estimated 0.000437159 234 quantities 0.000431835 235 streets 0.000425126 236 oec 0.000412708 237 pump 0.000410161 238 summarized 0.00040755 239 drawings 0.000402537 240 top 0.000395048 241 truckloads 0.000385631 242 wbs 0.000385631 243 civq 0.000385631 244 activitys 0.000385631 245 crashed 0.000385631 246 costtime 0.000385631 247 yonge 0.000385631 248 bloor 0.000385631 249 handy 0.000385631 250 birds 0.000385631 251 recev 0.000385631 252 trades 0.000385631 253 kcc 0.000385631 254 planning 0.000385135 255 will 0.000383527 256 constructed 0.000374468 257 name 0.000372915 258 corrosion 0.000365225 259 glue 0.000361237 260 finishing 0.000361237 261 agreement 0.000361237 262 airside 0.000361237 263 have 0.000359591 264 learning 0.000354468 265 experienced 0.00034663 266 inspect 0.000345029
267 drill 0.000341965 268 implied 0.000340038 269 placed 0.000336787 270 underground 0.000331706 271 swell 0.000319632 272 undertaken 0.000319632 273 weather 0.000319632 274 sheet 0.000318478 275 documents 0.0003113 276 framework 0.000308031 277 behind 0.000304857 278 office 0.000301028 279 was 0.000295392 280 number 0.000287379 281 youve 0.000286592 282 december 0.000282884 283 authority 0.000282612 284 include 0.000280773 285 are 0.000279719 286 toronto 0.000275734 287 sides 0.000275569 288 dec 0.000272418 289 graded 0.000271933 290 view 0.00027095 291 use 0.0002676 292 completed 0.000267416 293 voids 0.000265706 294 essentials 0.000265706 295 billy 0.000265706 296 mainland 0.000265706 297 mobilization 0.000265706 298 scissor 0.000265706 299 billed 0.000265706 300 pleased 0.000265706 301 invoiced 0.000265706 302 expensive 0.00025958 303 utility 0.00025958 304 circle 0.000257596 305 using 0.000257345 306 engineer 0.00025569 307 not 0.00025365
308 bar 0.000252651 309 original 0.000251516 310 student 0.000251274 311 management 0.00025102 312 lagging 0.000249715 313 if 0.000248702 314 area 0.000246474 315 last 0.000244553 316 over 0.000243969 317 do 0.00024391 318 calculations 0.000243299 319 measures 0.000242032 320 been 0.000241937 321 examiner 0.000237079 322 waste 0.000229444 323 decided 0.000229238 324 fairnes 0.000229238 325 hour 0.000227956 326 suggested 0.000223381 327 ring 0.000220783 328 taken 0.000219461 329 engineering 0.000217878 330 nothing 0.000217524 331 materials 0.000217487 332 little 0.000213153 333 casting 0.000211781 334 copies 0.000211781 335 manage 0.000208956 336 term 0.000204446 337 nor 0.000196007 338 quality 0.000184666 339 square 0.000184402 340 least 0.000182204 341 corner 0.000180618 342 overlapping 0.000180618 343 filling 0.000180618 344 competent 0.000180618 345 carry 0.000174643 346 available 0.00016816 347 need 0.000165931 348 how 0.000165507
349 were 0.000163972 350 suggest 0.000161644 351 answers 0.00015985 352 city 0.000157882 353 brought 0.000156944 354 end 0.000154906 355 during 0.000154296 356 together 0.000153823 357 completing 0.000152429 358 ff 0.000150026 359 new 0.000146704 360 your 0.000146638 361 islands 0.000145782 362 proposing 0.000145782 363 late 0.000141608 364 ignore 0.000141598 365 meet 0.000140844 366 lift 0.000139281 367 this 0.000136789 368 expected 0.000136364 369 options 0.000136271 370 payments 0.000135025 371 high 0.000133606 372 no 0.000132071 373 attached 0.000131006 374 utilities 0.00012979 375 long 0.000129428 376 but 0.000128311 377 network 0.000127783 378 incremental 0.000126693 379 profile 0.000126693 380 simplified 0.000123193 381 mens 0.000120912 382 provided 0.000120422 383 asked 0.000120174 384 right 0.000119825 385 plot 0.000119294 386 due 0.000118172 387 made 0.000117484 388 flights 0.000114619 389 fumes 0.000114619
390 extensively 0.000114619 391 pictured 0.000114619 392 fairness 0.000114619 393 indicators 0.000114619 394 take 0.000113715 395 exam 0.00011172 396 tables 0.00011161 397 details 0.000110945 398 understand 0.000105335 399 engineers 0.00010436 400 sketch 0.00010396 401 must 0.000103041 402 j 9.81113E-05 403 direct 9.68097E-05 404 university 9.59314E-05 405 slightly 9.55306E-05 406 productivity 9.55306E-05 407 added 9.50926E-05 408 protect 9.44452E-05 409 hours 9.31326E-05 410 driven 9.27002E-05 411 away 9.25263E-05 412 factory 9.02956E-05 413 cover 9.01614E-05 414 five 8.76356E-05 415 public 8.66942E-05 416 costing 8.64292E-05 417 scheduled 8.64292E-05 418 effect 8.51011E-05 419 near 8.48745E-05 420 held 8.48518E-05 421 detail 8.37709E-05 422 below 8.32747E-05 423 diagram 8.298E-05 424 scenario 8.26128E-05 425 determine 8.26128E-05 426 college 8.10999E-05 427 spi 8.08597E-05 428 required 8.0641E-05 429 million 7.97824E-05 430 including 7.88868E-05
431 scaffold 7.75996E-05 432 year 7.7173E-05 433 maximum 7.56574E-05 434 scheduling 7.40391E-05 435 per 7.12205E-05 436 consultant 7.07209E-05 437 according 6.79218E-05 438 pay 6.76253E-05 439 lf 6.66396E-05 440 purposes 6.52095E-05 441 open 6.44617E-05 442 techniques 6.39986E-05 443 space 6.29956E-05 444 base 6.01693E-05 445 would 5.69347E-05 446 be 5.67737E-05 447 closed 5.62524E-05 448 attention 5.259E-05 449 believe 5.18642E-05 450 seal 5.09956E-05 451 extra 4.98296E-05 452 preference 4.86199E-05 453 ladder 4.86199E-05 454 page 4.74948E-05 455 continuous 4.73083E-05 456 after 4.40676E-05 457 while 4.30683E-05 458 specifications 4.16052E-05 459 calculate 4.1298E-05 460 applied 3.87469E-05 461 better 3.80104E-05 462 before 3.74146E-05 463 circumstances 3.70196E-05 464 assurance 3.70196E-05 465 as 3.67345E-05 466 lob 3.55066E-05 467 permitted 3.36843E-05 468 special 3.27156E-05 469 job 3.25039E-05 470 mark 3.21408E-05 471 k 3.14642E-05
472 pumps 2.95313E-05 473 assuming 2.9374E-05 474 typical 2.79879E-05 475 assume 2.74957E-05 476 for 2.71397E-05 477 incident 2.68546E-05 478 aid 2.67784E-05 479 case 2.67199E-05 480 three 2.62673E-05 481 why 2.57423E-05 482 used 2.50566E-05 483 has 2.48354E-05 484 far 2.11992E-05 485 architects 1.51006E-05 486 collective 1.51006E-05 487 unionized 1.51006E-05 488 looked 1.51006E-05 489 client 1.41649E-05 490 types 1.40718E-05 491 answer 1.10522E-05 492 also 9.9191E-06 493 criteria 8.14099E-06 494 demonstrate 6.7685E-06 495 early 4.99649E-06 496 outside 4.72164E-06 497 typically 2.68884E-06 498 crews 2.68273E-06 499 workers 2.68273E-06 500 hire 1.34137E-06 501 inserted -1.33531E-06 502 book -2.00636E-06 503 hard -2.25624E-06 504 above -2.36973E-06 505 question -3.43808E-06 506 trade -4.95016E-06 507 there -5.28964E-06 508 performance -5.65828E-06 509 ec -6.45721E-06 510 normal -7.85311E-06 511 value -7.8695E-06 512 manager -8.57366E-06
513 situation -8.86899E-06 514 allowing -1.18062E-05 515 property -1.24822E-05 516 discuss -1.26828E-05 517 ef -1.2936E-05 518 fall -1.33407E-05 519 units -1.36376E-05 520 point -1.51391E-05 521 performing -1.69058E-05 522 discu -1.90243E-05 523 one -1.95518E-05 524 between -1.99948E-05 525 constraints -2.06736E-05 526 only -2.12218E-05 527 map -2.17414E-05 528 best -2.18332E-05 529 predecessor -2.4394E-05 530 especially -2.4394E-05 531 gof -2.4394E-05 532 shortfall -2.4394E-05 533 achieve -2.67697E-05 534 items -2.77361E-05 535 place -2.80918E-05 536 eac -2.84732E-05 537 doing -3.21238E-05 538 elements -3.28051E-05 539 needed -3.33653E-05 540 side -3.61072E-05 541 private -3.63341E-05 542 lower -3.6468E-05 543 is -3.7774E-05 544 tank -3.78264E-05 545 give -3.94277E-05 546 might -4.03855E-05 547 rank -4.07391E-05 548 budget -4.09127E-05 549 qc -4.2325E-05 550 about -4.56817E-05 551 faculty -4.59875E-05 552 reduce -4.61297E-05 553 occur -4.78536E-05
554 summary -4.80683E-05 555 shortfall -2.4394E-05 556 filled -4.87881E-05 557 tcpi -4.87881E-05 558 sloping -4.87881E-05 559 when -4.90922E-05 560 purpose -5.15204E-05 561 arch -5.25679E-05 562 works -5.29039E-05 563 run -5.33299E-05 564 contact -5.44983E-05 565 from -5.68757E-05 566 until -5.76617E-05 567 level -5.77997E-05 568 properly -5.92308E-05 569 them -5.98388E-05 570 another -6.0289E-05 571 linear -6.19014E-05 572 resulting -6.19568E-05 573 completion -6.33948E-05 574 could -6.56103E-05 575 tell -6.6864E-05 576 see -6.71569E-05 577 plants -6.99873E-05 578 willing -6.99873E-05 579 development -7.0608E-05 580 characteristics -7.10971E-05 581 x -7.11932E-05 582 outlined -7.29359E-05 583 saddle -7.31821E-05 584 program -7.50767E-05 585 form -7.56358E-05 586 person -7.67749E-05 587 problem -7.71345E-05 588 values -7.7498E-05 589 eg -7.77208E-05 590 science -7.81278E-05 591 relationship -7.85872E-05 592 pays -8.03662E-05 593 mass -8.07303E-05 594 than -8.11601E-05
595 fast -8.2905E-05 596 attributes -8.2905E-05 597 bank -8.37465E-05 598 increasing -8.39865E-05 599 learned -8.44179E-05 600 going -8.465E-05 601 released -8.54223E-05 602 pour -8.58076E-05 603 cannot -8.6893E-05 604 pipe -8.73754E-05 605 make -8.75435E-05 606 all -8.91625E-05 607 weaker -9.03933E-05 608 swing -9.03933E-05 609 dummy -9.03933E-05 610 durations -9.03933E-05 611 begin -9.17347E-05 612 acceptable -9.307E-05 613 to -9.37489E-05 614 off -9.44908E-05 615 price -9.4909E-05 616 boss -9.61366E-05 617 places -9.62503E-05 618 invoice -9.75761E-05 619 two -9.77922E-05 620 lsi -9.87254E-05 621 awarded -0.000100092 622 across -0.000100092 623 task -0.000100092 624 of -0.000100259 625 element -0.000100425 626 information -0.000101121 627 many -0.000101212 628 or -0.000101561 629 requires -0.000102805 630 know -0.000105651 631 advantages -0.000106385 632 issues -0.000106829 633 expect -0.000107892 634 es -0.000108072 635 conventional -0.000109482
636 week -0.000112547 637 first -0.000112718 638 said -0.000113156 639 it -0.000113589 640 preferred -0.000114068 641 boring -0.000118462 642 overheads -0.000118462 643 mayor -0.000118583 644 reduced -0.000119487 645 second -0.000121465 646 being -0.000122041 647 evaluation -0.0001224 648 which -0.000122924 649 control -0.000123139 650 may -0.000125243 651 business -0.000127413 652 empty -0.000127413 653 at -0.000128254 654 removed -0.000128498 655 through -0.000129065 656 any -0.000129744 657 terminal -0.000131731 658 by -0.000132019 659 choice -0.000132245 660 general -0.00013255 661 sv -0.000137858 662 unit -0.000137935 663 its -0.000138112 664 means -0.000139975 665 once -0.000140602 666 problems -0.000140672 667 the -0.000141274 668 generally -0.000144185 669 substantially -0.000144319 670 both -0.000146335 671 tf -0.000147915 672 so -0.000148444 673 report -0.000148461 674 main -0.000148686 675 an -0.00015104 676 produce -0.000151278
677 ways -0.000151362 678 difficult -0.000152422 679 companies -0.000158749 680 lot -0.000160529 681 working -0.000160732 682 paid -0.000160732 683 application -0.000162546 684 code -0.000165427 685 received -0.000165454 686 ever -0.000167771 687 employee -0.000167993 688 indirect -0.000167993 689 except -0.000169738 690 that -0.000169918 691 contain -0.00017051 692 final -0.000171212 693 data -0.000171627 694 work -0.000173109 695 time -0.000174359 696 type -0.000180481 697 cv -0.000180787 698 become -0.000180787 699 and -0.000181211 700 moving -0.000185656 701 sequence -0.000186215 702 here -0.000187044 703 yes -0.000187792 704 rate -0.000189324 705 comparisons -0.000189912 706 window -0.000189912 707 each -0.000190538 708 production -0.00019773 709 physical -0.000198486 710 then -0.000198838 711 operate -0.000202564 712 risk -0.00020463 713 crew -0.000207403 714 operation -0.000208198 715 pulp -0.000210318 716 vour -0.000210318 717 their -0.000210619
718 find -0.000211213 719 handle -0.000214235 720 smaller -0.000215899 721 b -0.000218379 722 g -0.000222005 723 these -0.000223202 724 allowed -0.000225003 725 picture -0.000228135 726 expenses -0.000229407 727 profit -0.000234468 728 does -0.000235376 729 shafts -0.000236923 730 earned -0.000237266 731 stated -0.000238508 732 description -0.00024148 733 industry -0.000242956 734 hit -0.000245178 735 communication -0.000245398 736 describe -0.000246207 737 profits -0.000247337 738 beneficial -0.000247337 739 such -0.00025046 740 every -0.000252268 741 largest -0.000254216 742 inside -0.000254216 743 books -0.000254216 744 went -0.000254826 745 now -0.000257423 746 should -0.000257946 747 booklet -0.000258127 748 help -0.000262689 749 latest -0.000264243 750 choic -0.000264489 751 in -0.000264775 752 win -0.000269407 753 into -0.000269952 754 m -0.000270374 755 dates -0.000277105 756 credit -0.000278674 757 did -0.000278935 758 differences -0.000279062
759 doesnt -0.000280235 760 company -0.000283124 761 arrow -0.000287918 762 difference -0.000292583 763 qa -0.000295406 764 quantity -0.000295638 765 series -0.000305104 766 cells -0.000305785 767 seven -0.000314494 768 define -0.000316156 769 triangle -0.000316796 770 equipment -0.00031775 771 ms -0.000318418 772 big -0.000318671 773 ls -0.000321633 774 lowest -0.000323596 775 call -0.000324375 776 net -0.000329834 777 where -0.000329848 778 takes -0.000331702 779 small -0.000331905 780 concept -0.000332475 781 chief -0.000335986 782 bold -0.000335986 783 path -0.000338153 784 more -0.0003386 785 printed -0.00034102 786 local -0.000342372 787 updated -0.000349331 788 grown -0.000349331 789 entered -0.000353411 790 flow -0.000355385 791 rings -0.000358432 792 wave -0.000359246 793 dur -0.000362229 794 much -0.000374646 795 date -0.000374901 796 none -0.000375272 797 manufacturing -0.000381969 798 start -0.000399632 799 fin -0.000413825
800 on -0.000416427 801 goes -0.000417785 802 go -0.000422677 803 wrong -0.000425029 804 electric -0.000426298 805 with -0.000428941 806 substantial -0.000432956 807 because -0.000437054 808 batch -0.000439005 809 overhead -0.000451967 810 logic -0.000457738 811 delivered -0.000460924 812 next -0.000463833 813 currently -0.000465225 814 other -0.0004832 815 who -0.000483262 816 software -0.000489114 817 f -0.00048955 818 lag -0.000491088 819 star -0.000492486 820 duty -0.000494675 821 her -0.00050156 822 cell -0.000502531 823 state -0.000508474 824 told -0.000526923 825 kings -0.000528487 826 explain -0.00054289 827 d -0.000543948 828 examiners -0.00055315 829 h -0.000562602 830 tracking -0.000566418 831 balance -0.00058306 832 they -0.000593565 833 create -0.000595617 834 s -0.000607414
835 meaning -0.00060969 836 very -0.000611945 837 vac -0.000615635 838 labour -0.000641241 839 put -0.000661731 840 plant -0.00066507 841 vou -0.000678608 842 definitions -0.000678928 843 l -0.000685939 844 estimating -0.00068822 845 c -0.000760889 846 what -0.000799191 847 dig -0.00084458 848 gap -0.000863595 849 cash -0.000911956 850 terms -0.00103995 851 activities -0.001120939 852 t -0.00116309 853 mob -0.001201179 854 a -0.00122672 855 set -0.001269138 856 drift -0.001272467 857 complete -0.001296933 858 i -0.001298824 859 index -0.00130204 860 shaft -0.001374978 861 union -0.001737676 862 e -0.001749475 863 cpi -0.001772435 864 duration -0.002160036 865 activity -0.003442018 866 float -0.004356521 867 -0.018155076
APPENDIX E – INDIVIDUAL COURSE STATISTICS
This section presents the statistical measure (Pearson correlation) for each of the individual exams used in the study presented in Chapter 5.
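The correlations reported in the tables below relate participant ratings to algorithm-assigned quintiles. A minimal sketch of the underlying Pearson computation follows; the example ratings are made up for illustration and are not the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: participant ratings (1-5) and algorithm quintiles (1-5)
participant = [5, 4, 4, 2, 1, 3, 5, 2]
quintiles = [5, 5, 4, 2, 1, 3, 4, 1]
r = pearson_r(participant, quintiles)
```

A value of r near 1 indicates that the participant's judgments of course-specificity track the algorithm's quintile assignments closely, which is how the tables below should be read.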
Correlations Code = CHE412
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.43 1.653 100
Quintile 3.60 1.287 100
a. Code = CHE412
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .642**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .642** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = CHE412
Code = CIV100
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.60 1.621 100
Quintile 3.60 1.287 100
a. Code = CIV100
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .625**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .625** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = CIV100
Code = CIV280
Descriptive Statisticsa
Mean Std. Deviation N
Participant 3.12 1.604 100
Quintile 3.60 1.287 100
a. Code = CIV280
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .621**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .621** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = CIV280
Code = ECE110
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.95 1.720 100
Quintile 3.60 1.287 100
a. Code = ECE110
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .717**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .717** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = ECE110
Code = ECE221
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.70 1.403 100
Quintile 3.60 1.287 100
a. Code = ECE221
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .749**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .749** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = ECE221
Code = MIE242
Descriptive Statisticsa
Mean Std. Deviation N
Participant 3.25 1.480 100
Quintile 3.60 1.287 100
a. Code = MIE242
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .727**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .727** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MIE242
Code = MIE262
Descriptive Statisticsa
Mean Std. Deviation N
Participant 1.93 1.513 100
Quintile 3.60 1.287 100
a. Code = MIE262
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .452**
Sig. (2-tailed) .000
N 100 100
Quintile Pearson Correlation .452** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MIE262
Code = MIE350
Descriptive Statisticsa
Mean Std. Deviation N
Participant 1.48 .990 100
Quintile 3.60 1.287 100
a. Code = MIE350
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .319**
Sig. (2-tailed) .001
N 100 100
Quintile
Pearson Correlation .319** 1
Sig. (2-tailed) .001
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MIE350
Code = MSE101
Descriptive Statisticsa
Mean Std. Deviation N
Participant 3.37 1.397 100
Quintile 3.60 1.287 100
a. Code = MSE101
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .785**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .785** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MSE101