Investigating the Language of Engineering Education
by
Chirag Variawa
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Mechanical and Industrial Engineering University of Toronto
© Copyright by Chirag Variawa 2014
Abstract
A significant part of professional communication development in engineering is the ability to
learn and understand technical vocabulary. Mastering such vocabulary is often a desired
learning outcome of engineering education. In promoting this goal, this research investigates
the development of a tool that creates wordlists of characteristic discipline-specific
vocabulary for a given course. These wordlists explicitly highlight requisite vocabulary
learning, and when used as a teaching aid, can promote greater accessibility in the learning
environment.
Literature, including work in higher education, diversity, and language learning, suggests that
designing accessible learning environments can increase the quality of instruction and
learning for all students. Studying the student/instructor interface using the framework of
Universal Instructional Design identified vocabulary learning as an invisible barrier in
engineering education. A preliminary investigation of this barrier suggested that students
have difficulty assessing their understanding of technical vocabulary. Subsequently,
computing word frequency on engineering course material was investigated as an approach
for characterizing this barrier. However, it was concluded that a more nuanced method was
necessary.
This research program was built on previous work in the fields of linguistics and computer
science, and led to the design of an algorithm. The developed algorithm is based on a
statistical technique called Term Frequency-Inverse Document Frequency (TF-IDF). Comparator sets
of documents are used to hierarchically identify characteristic terms in a target document,
such as course materials from a previous term of study. The approach draws on a
standardized artifact of the engineering learning environment as its dataset: a repository of
2,254 engineering final exams from the University of Toronto, used to process the target material.
After producing wordlists for ten courses, with the goal of highlighting characteristic
discipline-specific terms, the effectiveness of the approach was evaluated by comparing the
computed results to the judgment of subject-matter experts. The overall data show a good
correlation between the program and the subject-matter experts. The results indicated a
balance between accuracy and feasibility, and suggested that this approach could mimic
subject-matter expertise to create a list of discipline-specific vocabulary from course materials.
Acknowledgments
This research was made possible by the invaluable counsel from Prof. Susan McCahan, Prof. Mark Chignell, Prof. Michael Grüninger, Prof. Eunice Jang, Prof. Greg Jamieson, and Prof. Clifton Johnston.
Special thanks to the participants of the studies for their time and experience.
This thesis is dedicated to the following:
My family – Mr. Ravindra Variawa, Mrs. Kavita Variawa, and my brother, Mr. Kunal Variawa.
Professor Susan McCahan.
Friends and colleagues.
TABLE OF CONTENTS
1 Introduction ....................................................................................................................... 1
1.1 The Problem ............................................................................................................... 2
1.1.1 Learning Barriers ................................................................................................ 5
1.1.2 Vocabulary Learning .......................................................................................... 7
1.1.3 Types of Vocabulary ........................................................................................... 8
1.2 Framing the Research ................................................................................................. 9
1.2.1 Research Questions ............................................................................................. 9
1.2.2 Theoretical Framework ..................................................................................... 11
1.2.3 Research Strategy .............................................................................................. 11
1.3 Roadmap of the Thesis ............................................................................................. 10
2 Literature Review............................................................................................................ 16
2.1 Universal Instructional Design ................................................................................. 17
2.1.1 Framework and Principles ................................................................................ 19
2.1.2 Criticism of Universal Instructional Design ..................................................... 23
2.1.3 The Implications of UID on the Study of Vocabulary in Engineering Education .......... 25
2.2 Language Instruction ................................................................................................ 26
2.3 Automated Indexing ................................................................................................. 30
3 Student Self-Assessment Study ...................................................................................... 35
3.1 The Dataset ............................................................................................................... 36
3.2 Overview of the Study .............................................................................................. 37
3.2.1 Methodology ..................................................................................................... 39
3.2.2 Outcomes .......................................................................................................... 41
3.3 Discussion of Study .................................................................................................. 44
4 Frequency Analysis Study .............................................................................................. 47
4.1 Overview of the Study .............................................................................................. 47
4.2 Discussion of Study .................................................................................................. 49
5 Automated Indexing and Evaluation .............................................................................. 53
5.1 Artifacts of Study ..................................................................................................... 54
5.2 TF-IDF Algorithm and Modification ....................................................................... 54
5.2.1 Modification of the TF-IDF Algorithm ............................................................ 57
5.3 Computational Approach ......................................................................................... 61
5.3.1 Software Development ...................................................................................... 62
5.3.2 Results using the Modified TF-IDF Algorithm ................................................ 66
5.4 Evaluation Study ...................................................................................................... 73
5.5 Results of the Automated Indexing Study ................................................................ 79
5.5.1 Courses Selected for this Study ........................................................................ 80
5.5.2 Sample Dataset for One Trial ........................................................................... 81
5.5.3 Summary of Quantitative Results for All Courses ........................................... 84
5.5.4 Statistical Analysis ............................................................................................ 89
5.5.5 Special Case: the statistical effect of using a design-heavy exam .................... 91
5.6 Discussion of Study .................................................................................................. 98
5.6.1 Correlation ........................................................................................................ 98
5.6.2 Implication of Results on Empirical Process .................................................... 99
5.6.3 Insight on Potential Impact on Teaching and Learning .................................. 100
6 Discussion ..................................................................................................................... 101
6.1 Recognition of Engineering Vocabulary as an Accessibility Barrier .................... 101
6.2 Creation of an Approach to Identify Characteristic Discipline-specific Vocabulary in Engineering Education .................................................................................................. 103
6.2.1 The Role of Technology in Vocabulary Characterization .............................. 103
6.2.2 Empirical Contribution ................................................................................... 103
6.3 Implications of the Approach on Teaching and Learning ...................................... 105
6.3.1 Converging Perspectives from the Literature ................................................. 105
6.3.2 The Development of Teaching Aids ............................................................... 105
6.3.3 Producing a Research-based Artifact of the Application of UID ................... 106
7 Conclusions ................................................................................................................... 111
7.1 Research Contributions .......................................................................................... 111
1. Contribution to Theory ....................................................................................... 111
2. Contribution to the Design of Learning Environments ...................................... 111
3. Important Findings ............................................................................................. 112
4. Contributions with respect to recommendations for future practice .................. 112
7.2 Limitations ............................................................................................................. 112
1. Feasibility vs. Accuracy ..................................................................................... 112
2. Difficulties with respect to Measurement ........................................................... 113
3. Single-word processing ...................................................................................... 114
4. Human Intervention ............................................................................................ 114
5. New words inclusion .......................................................................................... 115
7.3 Implications for Further Research .......................................................................... 116
7.4 Final Word .............................................................................................................. 116
List of Tables

Table 1 - Principles of UD and UID .......... 20
Table 2 - Ten words and statistical significance as described using ANOVA .......... 42
Table 3 - Sample wordlist from a second year Materials Science Engineering course .......... 67
Table 4 - Course exams used for the evaluation study .......... 81
Table 5 - Sample trial wordlist from a first-year electrical fundamentals course .......... 82
Table 6 - Symmetric Measures .......... 90
Table 7 - Chi-Square Tests .......... 90
Table 8 - Case Processing Summary .......... 91
Table 9 - Reliability Statistics .......... 91
Table 10 - Hypothesis Test Summary .......... 91
Table 11 - Symmetric Measures (APS111 omitted) .......... 92
Table 12 - Chi-Square Tests (APS111 omitted) .......... 93
Table 13 - Case Processing (APS111 omitted) .......... 93
Table 14 - Reliability Statistics (APS111 omitted) .......... 93
Table 15 - Independent Correlations .......... 94
Table 16 - Inter-rater Correlation for APS111 exam .......... 95
Table 17 - Effect of comparator sets .......... 97
Table 18 - Implications of the research using the framework of Universal Instructional Design .......... 106
List of Figures

Figure 1 - Individualized versus systemic approaches to increasing accessibility .......... 4
Figure 2 - Adapted Johari Window and invisible/visible learning barriers .......... 6
Figure 3 - Language types in engineering education .......... 9
Figure 4 - Comparison of the OU and PU scores for sample words .......... 42
Figure 5 - Sample data from the Frequency Study .......... 49
Figure 6 - Major components of the computational approach .......... 62
Figure 7 - TF-IDF scores for a sample course .......... 69
Figure 8 - Three regions on the plotted TF-IDF scores .......... 70
Figure 9 - Comparing TF-IDF plots of three courses .......... 71
Figure 10 - Comparing TF-IDF plots of different years .......... 71
Figure 11 - Major components of the evaluation study .......... 74
Figure 12 - Relationship between quintile and participant-assigned scores .......... 85
Figure 13 - Count of participant scores for all exams grouped by exam .......... 88
Figure 14 - Count of TFIDF binned-quintile scores for each exam .......... 88
List of Appendices

Appendix A1 - ASEE2010 - Literature Review Paper .......... 129
Appendix A2 - IJEE - Student Self Assessment .......... 144
Appendix A3 - ASEE2011 - Frequency Analysis .......... 161
Appendix A4 - ASEE2012 - Automated Indexing Scoping .......... 174
Appendix A5 - ASEE2013 - Automated Indexing Modified Algorithm .......... 184
Appendix A6 - CEEA2013 - Automated Indexing CHE230 Sample Analysis .......... 198
Appendix A7 - ASEE2014 - Automated Indexing Evaluation .......... 200
Appendix A8 - CEEA2013 - Automated Indexing Language Learning .......... 213
Appendix A9 - CEEA2013 - Automated Indexing Oral Presentations .......... 216
Appendix B1 - Input Conditioning Walkthrough .......... 219
Appendix B2 - Input Conditioning Software .......... 223
Appendix C1 - Coding the Modified TFIDF Walkthrough .......... 229
Appendix C2 - Coding the Modified TFIDF Software .......... 235
Appendix D1 - Recruitment Email .......... 240
Appendix D2 - Informed Consent .......... 242
Appendix D3 - Instructions and Scale for Evaluation Study .......... 245
Appendix D4 - SAMPLE wordlist for CIV280 .......... 247
Appendix D5 - COMPLETE wordlist for CIV280 .......... 249
Appendix E1 - Course-by-course Correlations .......... 260
Appendix E2 - Inter-rater Correlation, and dataset for APS111 course .......... 264
1 INTRODUCTION

The work discussed in this dissertation is motivated by the idea that changing the
design of the learning environment can increase accessibility for a broad range of learners,
leading to a more inclusive classroom. By increasing accessibility to instructional material,
instructors can provide a higher-quality learning experience for students and potentially
reduce the need for individual accommodation to some extent. Proactively planning for
students with diverse characteristics and transforming the learning environment to increase
inclusivity enables individuals with differences to be accommodated without additional effort
within that system. This inclusive and systemic perspective on educational improvement is
based on a theoretical framework called Universal Instructional Design. The application of
this framework to engineering education, specifically, the vocabulary used in this
environment, is yet untested. This research tests a specific application using this framework
to increase accessibility to course material.
This research investigates the language used in engineering education. Using a multi-
disciplinary approach based in industrial engineering, the communication interface between
students and instructors is examined to identify and reduce learning barriers associated with
inaccessible vocabulary. Three successive research studies are used to: identify vocabulary
that can act as a learning barrier; develop and refine an automated approach to reduce that
barrier; and evaluate the efficacy of that automated approach.
Some of the vocabulary used in engineering education is domain-specific. This type
of vocabulary refers to words that are technical and specialized to the engineering profession.
The learning of this kind of vocabulary is often an outcome of the educational experience. As a
student proceeds through engineering education, the corpus of language common to both the
instructor and student ought to converge as the student masters the course content. By
maximizing transparency in identifying these corpora, students are better equipped to
develop a robust professional vocabulary while learning the course content. Additionally,
once instructors have identified the core technical vocabulary required for their course, they
can integrate this list into the instructional material knowing that students are aware of the
authentic language they need to master.
The goal of this study is to develop a tool that can be used to increase transparency in
communication in engineering education. Specifically, this tool should be able to replicate
human subject-matter expertise in characterizing discipline-specific vocabulary used in
instructional material. The novel contribution of this research to the existing literature is the
design and evaluation of an automated process that can characterize technical vocabulary
using engineering assessment instruments as its dataset.
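The modified algorithm itself is developed in Chapter 5. As background, the standard TF-IDF weighting it builds on can be sketched as follows; this is a minimal illustration over a hypothetical toy corpus, not the implementation used in the thesis:

```python
import math
from collections import Counter

def tfidf_rank(target_doc, comparator_docs):
    """Rank the words of target_doc by term frequency times inverse
    document frequency over a set of comparator documents.  Higher
    scores suggest terms more characteristic of the target."""
    tf = Counter(target_doc)
    n_docs = len(comparator_docs)
    scores = {}
    for word, count in tf.items():
        # Number of comparator documents containing the word.
        df = sum(1 for doc in comparator_docs if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / len(target_doc)) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: the target "exam" versus two comparator exams.
target = "laminar flow in a laminar boundary layer".split()
others = ["the model of the circuit".split(),
          "stress and strain in a beam model".split()]
ranked = tfidf_rank(target, others)
```

Words common across the comparator set (such as "in" and "a") receive low scores, while the discipline-specific "laminar" rises to the top of the ranking; this behaviour is what a wordlist tool of the kind described here relies on.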
1.1 THE PROBLEM

Student differences should be taken into account in the classroom. The existing focus
is often on the individuals with differences and the development of strategies for those
individuals to cope with disabling environments [1-3]. However, it may be that the
environment itself is contributing to disability [4]. As such, changing this environment may
lead to a more accessible and inclusive space [1-8]. One case of a disabling environment is
when a wheelchair user encounters a set of stairs that blocks their path. Another case is when
a learner is unintentionally alienated by the context of the instructional material used to
create an authentic learning experience. A third case is the unintentional use of unfamiliar
vocabulary to communicate ideas in a conversation. In the first case, it is clear to see that the
individual is disabled by their physical environment. In the second case, it may become
difficult for an instructor to identify this barrier to accessibility if unaided by the student.
The third case may be completely invisible to the instructor. Individuals in the third case
may feel discouraged to seek clarification of ideas, especially if they feel judged for their
lack of understanding of the conversation. All of these are examples of situations in which
environmental factors can contribute to a lack of access.
If an environment is created such that it accommodates people with a wide range of
abilities, right from the initial design stage, then many disabling situations can be minimized
or designed out, whether or not they are identified a priori [5, 9-11]. The area of disability
studies refers to this as the Social Model of Disability. When disability is seen as coming
from the environment, systems, and attitudes, instead of from the individuals themselves,
then it becomes increasingly possible to create a framework and tools with which
accessibility can be increased [2, 5, 12, 13], and this is visually depicted in Figure 1 below.
This figure represents a shift away from individualized learning strategies to cope with
disabling learning environments. It instead focuses on identifying and mitigating
characteristics of the learning environment as a whole, addressing learning barriers and, in
turn, inaccessible education.
The left side of Figure 1 shows individual accommodations, each specific to a
particular user. This shows multiple strategies all with the same goal of increasing
accessibility. On the right is a broad-based systemic accommodation, available to everybody
in the set, with each user making use of it according to their needs. This represents a strategy
that is flexible and can be used as needed to increase accessibility for all users.
Figure 1 - From left to right: an individualized approach versus a systemic approach designed to increase accessibility in a classroom
In a diverse and inclusive learning environment, instructors see students having a
varied set of characteristics, which is viewed as a normal situation [2, 14-16]. These may
include physical and sensory disabilities, as well as learning disabilities, mental health
challenges, chronic illnesses, and psychological factors including attention deficit disorders
[2, 12-14, 17]. If the instructor adapts the learning material so that accessible technology,
active and participatory methods of teaching, online tools, and flexible instruction are added,
then students can experience a more inclusive learning experience [7, 14, 16, 18-21]. This
allows these students to interact with the material and be more comfortable with their
learning environment, potentially leading to a more effective or higher-quality experience
[14, 18]. As a result, the classroom can become enabling for all students, not only for those
who have an identified disability, and this population can include non-traditional students,
second language learners, and a larger proportion of a diverse student body [22, 23].
1.1.1 Learning Barriers

Existing literature on classroom inclusivity was reviewed for a conference publication,
“Design of the Learning Environment for Inclusivity: a Review of the Literature”, which
is reprinted in Appendix A.1.
Learning barriers are roughly classified in two dimensions: physical or non-physical,
and visible or invisible. Physical barriers are features of spaces or buildings that make it
difficult or impossible for people to access, often due to anthropometrical or mobility-related
limitations. Non-physical barriers are obstacles that discriminate individuals based on other
features, like understanding of information or not supporting assistive devices. Visible
barriers are obstacles to access that people can clearly identify in an environment. Invisible
barriers are obstacles that people cannot easily identify, clearly understand, accurately
predict, or reliably characterize and discern from the environment.
This research focuses on non-physical invisible learning barriers in the classroom.
These barriers are characterized as impediments that can reduce accessibility to learning and
reduce student performance. Examples of such barriers may include cultural aspects of the
classroom that impede effective two-way communication, unclear classroom rules, ill-
defined learning outcomes, and so on. As instructors try to make the learning environment
more inclusive and accessible for all students, learning barriers reduce the amount of
instructional material that is teachable and learnable.
Figure 2 - An adapted Johari Window that suggests decreasing invisible barriers as an approach to maximize what is teachable and learnable in a classroom.
One way of visualizing the invisible/visible dimension of barriers in the context of
teaching and learning is by referring to a schematic called a Johari Window. The adapted
Johari Window in Figure 2 can be used to help frame the research by representing the
interface between an instructor and students. It also shows a relationship between teaching,
learning, and learning barriers. The top-left quadrant represents a case where course material
is teachable by an instructor, and learnable by students, if the interface is void of invisible
barriers. The top-right quadrant represents a situation where students face an invisible
barrier, and this reduces learning. The bottom-right quadrant shows a situation where
invisible barriers are present for both the instructor and the students, and this means that
course material can neither be taught nor learned. The bottom-left quadrant represents a case
where an instructor faces an invisible barrier, reducing what course material can be taught in
the classroom.
The arrows show that modifying the visibility of barriers affects what is teachable and
learnable. The dotted line represents what a reduction in invisible barriers for the
instructor and students would look like. Decreasing invisible barriers at the interface increases
what can be taught by instructors, and also increases what students can learn. This window
suggests that it is possible to engineer an approach to increase learning by changing the
visibility of barriers. Specifically, this window is useful to help guide an approach that can
increase learning of a particular aspect of engineering education, vocabulary learning.
1.1.2 Vocabulary Learning

Understanding technical vocabulary is a component of the engineering learning
environment. Technical jargon can lead to more accurate and precise communication
especially in professional education, such as engineering. As instructors facilitate learning in
the classroom, teaching vocabulary can be simultaneous with the other instruction being
presented. Specifically, the use of technical vocabulary is pervasive as it aids in creating
authentic contexts, develops deeper meaning, and can be employed for assessment purposes.
The problem is that technical vocabulary is not necessarily knowledge that students bring
with them as they enter an engineering learning environment. Students often have to learn a
new corpus of vocabulary – or add to an existing corpus – to develop a more robust
professional repertoire of terminology. In developing this vocabulary, students may face
invisible non-physical learning barriers.
Vocabulary learning in engineering education may be an invisible non-physical
learning barrier for students. Specifically, a priori language differs from person to person,
and in a diversified student body it becomes increasingly challenging to manage language
learning. Due to existing student characteristics, some students may be more adept at
learning this vocabulary than others, whereas some may face learning barriers without even
knowing it. This may be particularly evident if the required learning is not identified a priori.
Additionally, students may have difficulty self-assessing their mastery of vocabulary used in
engineering education and this reduces the efficacy of learning technical vocabulary.
Individualized accommodation in learning discipline-specific jargon becomes increasingly
difficult to administer with a growing student body. Systemic design change that enables
language learning in engineering education becomes increasingly appropriate. As a result,
this can increase accessibility to course material, in turn increasing inclusivity in the
classroom.
1.1.3 Types of Vocabulary

There are different types of vocabulary present in an engineering learning
environment. While some of this vocabulary may be technical, there is other vocabulary that
is not. This research treats the characterization of these corpora in a simplified way,
represented in the diagram shown below. Figure 3 shows the corpus of vocabulary used
in engineering education categorized into three types. The outer circle represents all of the
language used in the learning environment. This would include both non-technical and
technical words. Examples include: “pilates”, “model”, and “laminar”. Contained within
that larger circle is a subset that contains engineering-specific vocabulary. Examples from
this corpus include, “model”, and “laminar”. Within the engineering corpus, there is a subset
of vocabulary that is discipline-specific. An example from this corpus could include the
word “laminar”. Instructors use a combination of words across all of these subsets because
understanding technical language is often a learning outcome of courses in engineering
education.
Figure 3 - Language used in engineering education has an engineering-specific subset, which in turn has a discipline-specific subset
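The nesting described above can be expressed with simple set membership, using the example words from this section; the three corpora here are a hypothetical three-word illustration of Figure 3, not real course data:

```python
# Hypothetical illustration of the three nested corpora from Figure 3.
all_language = {"pilates", "model", "laminar"}  # all language heard in class
engineering = {"model", "laminar"}              # engineering-specific subset
discipline = {"laminar"}                        # discipline-specific subset

# The corpora are nested, as the concentric circles in Figure 3 depict.
assert discipline <= engineering <= all_language

def classify(word):
    """Return the innermost corpus a word belongs to."""
    if word in discipline:
        return "discipline-specific"
    if word in engineering:
        return "engineering-specific"
    if word in all_language:
        return "non-technical"
    return "not observed"
```

In this framing, "laminar" is discipline-specific, "model" is engineering-specific but shared across disciplines, and "pilates" is non-technical language that nevertheless appears in the learning environment.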
1.2 FRAMING THE RESEARCH

The research investigates the language of engineering education, with a specific focus
on identifying and decreasing invisible learning barriers due to vocabulary presently used in
the classroom. The overall goal is to make the discipline-specific vocabulary used in
engineering education visible to both the instructor and the students. There are three main
research questions that this research attempts to answer.
1.2.1 Research Questions

The research attempts to address the following research questions:
1. Do language-related learning barriers exist in engineering education?
2. If language-related learning barriers exist, can they be characterized?
3. Can a strategy be found or developed to assist in the identification and
characterization of these learning barriers, transforming them from invisible to
visible?
The first research question is used to help gauge, scope, and focus the research.
Investigating language in engineering education can be a broad area, and so this question is
used to gauge the potential for study while defining the specific area of research if one exists.
This question is answered using responses acquired from 40 undergraduate
students. Specifically, participants from a representative sample of the engineering learning
population were recruited to take part in a specially designed study. The study aimed to find
whether the participants encountered learning barriers when given language-samples found in
engineering.
The second research question is an extension of the first, and is used to group
common aspects of language together in an attempt to further define the problem area. The
goal of this question is to define what language-related learning barriers look like. Since
language is broad and dynamic, this question attempts to build on commonalities between
problematic vocabularies to characterize the learning barriers.
The third research question is two-part, and forms the bulk of the research.
Specifically, this question probes whether an engineered tool can be developed to replicate
subject-matter expertise in identifying and characterizing inaccessible vocabulary in
engineering education. This question is integral to the development of a strategy that can be
used to mitigate learning barriers due to language. The objective is to replicate human
expertise to a statistically significant degree. As such, the expected outcome is likely to
involve a tradeoff between accuracy and feasibility, given the massive dataset of language
that exists in the area of engineering education.
1.2.2 Theoretical Framework

The theoretical framework used to ground this research program is Universal
Instructional Design (UID), explained in greater detail in Section 2.1. UID is a set of
guiding principles that seek to maximize accessibility to education for the greatest number
of learners possible. The intent of the research program is aligned with this well-cited
framework from the academic literature. The framework itself is cross-disciplinary and is
used in several domains to guide the development of more accessible environments. In this
research program, however, the theoretical framework guides the development of a strategy
used to increase accessibility to vocabulary in engineering education.
1.2.3 Research Strategy

Three major research studies structure the research program. The first
study investigates how prevalent non-physical, invisible learning barriers are in the context of
vocabulary on existing course material in engineering education. This study involved forty
undergraduate students and a set of words commonly found on existing artifacts of the
engineering learning environment. The outcomes of this study help to scope the investigation
of language to a more focused area of research.
The second study is a preliminary investigation of an algorithm-based mitigation
strategy whose goal is to help instructors disclose requisite technical jargon
automatically. This study builds on the outcomes of the first to present a cross-disciplinary
approach in identifying vocabulary that would otherwise be seen as an invisible learning
barrier.
The third study introduces a novel process, based on a modified version of an existing
computational keyword-search algorithm, to automatically generate wordlists of
characteristic discipline-specific vocabulary based on a dataset of all existing engineering
final exams at the University of Toronto. This process is coded into a computer program that
attempts to replicate human subject-matter expertise in characterizing vocabulary. The
efficacy of this program is then evaluated with eleven faculty members, each presented
with a sample wordlist from their course. These participants then evaluate whether, and to
what degree, each of those sample words is discipline-specific, and these findings are
compared to the output from the modified algorithm and computational process developed
previously. The outcome is a measure of correlation between the software and the instructor,
and is used to gauge the efficacy of the novel software program. These studies work together
to build and evaluate a strategy that can identify and potentially mitigate a learning barrier in
engineering education.
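The core of this third study, ranking words by TF-IDF against a comparator set, can be sketched briefly. The version below is a minimal Python illustration, not the thesis's implementation (which is written in Visual Basic .NET and reprinted in Appendix C.2); the function name, smoothing choices, and sample inputs are assumptions for illustration only.

```python
from collections import Counter
import math

def tfidf_wordlist(course_doc, comparator_docs, top_n=10):
    """Rank words in a course document by TF-IDF against a comparator set.

    TF is a word's relative frequency in the course document; IDF penalizes
    words that also appear in many comparator documents, so generic
    vocabulary scores low and discipline-specific terms score high.
    The course document itself is counted in the document totals.
    """
    course_words = course_doc.lower().split()
    tf = Counter(course_words)
    n_docs = 1 + len(comparator_docs)
    comparator_sets = [set(d.lower().split()) for d in comparator_docs]
    scores = {}
    for word, count in tf.items():
        df = 1 + sum(word in s for s in comparator_sets)
        scores[word] = (count / len(course_words)) * math.log(n_docs / df)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A word that occurs often in the target course document but in few comparator documents receives a high score, approximating the expert judgment that it is discipline-specific.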
The outcomes of the research build on the existing literature and combine approaches
from different domains to produce a theoretical strategy to reduce learning barriers. The
methodology employed in this study is cross-disciplinary and intended to serve as a starting
point for future research in the area of language analysis in engineering education.
1.3 ROADMAP OF THE THESIS

This dissertation contains three major research studies, all aspects of which have been
published in conference proceedings and peer-reviewed literature, and serve to guide the
development of a strategy that can increase accessibility in engineering education. After a
literature review section that describes prior art in the relevant cross-disciplinary fields, this
dissertation outlines the main findings from the three components of the research.
Chapter 3 presents the first phase of the research, which is to characterize learning
barriers due to language in engineering education. This area is supplemented by literature
written by the author and reprinted in Appendix A.2. This is the first of three major studies
which are conducted to develop a strategy to increase accessibility in engineering education.
Chapter 4 presents the second phase of the research, which builds on the first study,
and investigates frequency analysis as an approach to characterize language. This area is
supplemented by literature written by the author and reprinted in Appendix A.3.
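Frequency analysis of course material can be sketched in a few lines: tokenize a document and rank words by raw count. The sketch below is a minimal Python illustration (the thesis's own tooling was written in Visual Basic .NET; the function name and sample text are hypothetical).

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Tokenize course material and rank words by raw frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)
```

Raw counts are quickly dominated by common function words such as "the", which illustrates why a more nuanced ranking is pursued in the third study.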
Chapter 5 presents the third phase of the research, which builds on the first and
second studies, and comprises the bulk of the research program. This section discusses the
design of a strategy that is used to identify and characterize inaccessible language found on
existing standardized artifacts of the engineering learning environment. In addition, this
section also investigates the effectiveness of the strategy developed with respect to how well
it can replicate human subject-matter expertise. This information is used to evaluate the
approach in the context of engineering education, and is used to inform the results section
which follows. The design of the strategy is discussed in depth in Section 5.3.
This chapter is complemented by literature, program-specific information, and
evaluation-specific material in the appendices.
• Literature published is reprinted in Appendix A. The researcher is first-author.
o Investigating the application of the TF-IDF equation, and experimenting
with different algorithms: Appendix A.4.
o Piloting the modified algorithm on a chemical engineering course:
Appendix A.5.
o Applying the modified algorithm based on the TF-IDF equation:
Appendix A.6.
o Evaluating the efficacy of the computational approach: Appendix A.7.
o Extending the computational approach to subtechnical vocabulary
learning: Appendix A.8.
o Extending the computational approach to measure language proficiency
in oral presentations: Appendix A.9.
• Supplemental information about the input preparation and software programming
is presented in Appendix B.
o A walkthrough of the input conditioning process is presented in
Appendix B.1.
The Visual Basic .NET program code for the input conditioning
process is located in Appendix B.2.
o A walkthrough of the TF-IDF computational method is presented in
Appendix C.1.
The Visual Basic .NET program code developed for the
computational method is located in Appendix C.2.
• Supplemental information about the evaluation study is located in Appendix D.
o Participant recruitment material is presented in Appendix D.1.
o Informed Consent documentation used in the study is located in
Appendix D.2.
o A complete wordlist produced by the computational approach, containing
all words from an exam, is presented in Appendix D.3.
Section 5.4 presents summarized results. This section contains quantitative
outcomes from the strategy employed to characterize vocabulary, as well as statistical
correlations which are used to measure efficacy. These results are supplemented by course-
by-course measurements, reprinted in Appendix E.
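One common way to compute such a correlation between instructor ratings and algorithm scores is Spearman's rank statistic. The sketch below is an illustration with hypothetical ratings; it is not a claim about which statistic the thesis actually uses, which is reported in Section 5.4.

```python
def spearman_rho(x, y):
    """Spearman rank correlation for two lists without tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for position, index in enumerate(order):
            r[index] = position + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Exact closed form when there are no ties.
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical example: instructor ratings of discipline-specificity for
# five sampled words versus the algorithm's scores for the same words.
instructor = [5, 4, 3, 2, 1]
algorithm = [0.91, 0.72, 0.55, 0.30, 0.12]
```

Identical rank orderings yield a coefficient of 1.0; fully reversed orderings yield -1.0.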
Chapter 6 discusses the three research studies and their outcomes in the context of
one another, the literature, and more globally. This section highlights key relationships
between the outcomes of each of these studies, and how those are applied to addressing the
research questions presented in Section 1.2.1.
Chapter 7 presents conclusions and future work, and summarizes the contributions of
the research. The conclusion also discusses limitations of the current work, improvement
strategies, and an outlook on potential future work for researchers in this area. This section
also provides an overview of tangential work that was performed to supplement the doctoral
research performed for this dissertation.
2 LITERATURE REVIEW

There are several different bodies of literature that are conceptually related to this
research. Specifically, this literature is collected from the areas of universal instructional
design, second language learning, and automated indexing.
This section builds on a peer-reviewed conference publication written by the researcher
at an earlier stage of the research program (reprinted in Appendix A.1, “Design of the
Learning Environment for Inclusivity: A Review of the Literature”). That publication, a state
of the art review of the literature, discusses diversity in learning environments, retention in
technical education, disability studies, learning barriers, and literature relevant to this
dissertation. In that paper, the researcher organizes approaches to mitigate learning barriers
into two categories: those which accommodate individual students and those which change the
learning system. The first category includes one-to-one mentoring and personalized
instruction. The second category includes learning strategies for large groups of students and
changes in the design of the learning environment. In general, there are advantages and
disadvantages to both of these types of approaches and there is no single “optimal” method to
address all learning barriers or all learners in the classroom.
Literature about the design of accessible spaces is relevant to the dissertation, and is
used to contextualize the study of accessible learning environments in the field of
engineering education. Literature about the design of public spaces contains discussion
about Universal Design (UD), one of the schools of thought in the literature about
accessibility. A permutation of UD is Universal Instructional Design (UID), which is the
application of design principles to optimize accessibility in educational environments. These
concepts are discussed in the context of relevant criticisms, and are then examined in the
context of this research study about language in engineering education.
2.1 UNIVERSAL INSTRUCTIONAL DESIGN

The foundations of accessible learning environments stem from areas like Universal
Design (UD). Universal Design is an approach to making public spaces as accessible as
possible to the greatest number of people [24-28]. UD is an extension of barrier-free
design, in that it attempts to reduce barriers while also promoting design decisions that
increase accessibility for diverse populations. Over several years, as the UD philosophy was
integrated into the design of physical spaces, it became apparent that the environmental
changes needed to accommodate people with disabilities were also benefiting people who did
not identify as having a disability. Recognizing that many accessibility features
could be commonly integrated, UD became a tool to make designs more cost-effective, more
attractive, and more usable by larger and more diverse populations. According to several
authors, universal design can be used as a strategy to counter the presence of invisible
barriers [10, 11, 27-33]. UD is a set of design principles that promote interest and awareness
in proactively identifying and reducing accessibility barriers before they become an obstacle
[32].
Universal design has gained public interest and has prompted significant updates to
laws and regulations. Examples include the Americans with Disabilities Act (ADA) [34-36]
and the Accessibility for Ontarians with Disabilities Act (AODA) [37]. In particular,
standards of accessibility have become integral to the design of public spaces to maximize
usability and functionality for users who may be physically or mentally disabled. Examples
of such accessibility standards include: the number of required accessible parking spaces,
curb-cuts on sidewalks, dwelling-related construction features, and so on. From the fields of
design, architecture, and construction, it appears that making such changes to existing
structures for accommodation often occurs at a higher financial cost than incorporating these
changes into the design from the start [9, 22, 35, 38, 39]. As designers create more
accessible public spaces for people with physical disabilities, there is also much work being
done to make spaces accessible for people with non-physical disabilities: the work in this
area is focused on perception of physical spaces and using multi-modal sensory inputs to
understand one’s surroundings [40-42]. Additionally, the ADA is beginning to increase
accessibility to online environments with design considerations that impact information
transfer, e-business, and also aspects of cyber security [10, 26, 41, 43-47].
The Accessibility for Ontarians with Disabilities Act (AODA) is a more local
application of accessibility legislation that directly affects public spaces and services
including universities [37]. The AODA’s requirements for educational institutions, as
interpreted by the Council of Ontario Universities (COU), emphasize the importance of increasing
accessibility in the classroom [37, 48]. One of COU’s most publicized mandates is to
connect students to resources on campus which work to raise awareness for, and reduce,
mental health-related issues. More recently, the COU and AODA are interested in deploying
online resources to reduce barriers to accessibility. In the context of this research,
understanding language barriers can become a component of COU’s strategy and may
eventually be used to decrease learning barriers for university environments. As such,
institutions are obligated to understand and mitigate learning barriers, and the research being
performed is a step towards achieving this goal.
Universal design has been applied to the area of teaching and learning to maximize
accessibility in learning environments. When UD is applied to education, it can be referred
to as Universal Instructional Design (UID), Universal Design in Learning (UDL), or
Universal Design of Education (UDE), all of which are used interchangeably [49-52]. In the
context of this research study, the phrase Universal Instructional Design will be used. UID
applies the principles of Universal Design to teaching and learning. UID is not just about
accessibility for persons with a disability, but like UD, it is about considering the potential
needs of all learners when designing and delivering instruction. Through that process, one
can identify and eliminate barriers to teaching and learning [53]. This can improve access to
learning for students of all backgrounds and learner characteristics, while maintaining
academic integrity and minimizing the need for special accommodations.
2.1.1 Framework and Principles
Universal Instructional Design is a set of principles that form a practical framework
with the goal of improving learning opportunities. It represents a set of initiatives, principles,
guidelines, and projects that promote and work toward inclusive and equitable access to
learning [14, 22, 23, 27, 54]. In particular, it is a process that involves considering the
potential needs of all learners when designing and delivering instruction or creating
institutional policy and systems.
The goal of UID is to maximize accessibility for all students, including students with
disabilities and differences, in educational environments [23, 53]. In addition to the number
of students who face physical barriers, an increasing number of students are identifying other
types of learning barriers in classrooms. This challenges the existing approach of teaching in
diverse environments [55]. Currently, some of the challenges associated with removing or
mitigating learning barriers are managed by individual one-on-one learning support [33, 53,
55]. As the learning population increases and diversifies, this places a significant strain
on existing resources which in turn necessitates considering systemic changes [55]. UID
attempts to increase accessibility for as many students as possible. Hence, the UID approach
is in contrast to providing accommodations for a specific student: it is a systemic approach
used to increase accessibility in the classroom [55].
UD is a design philosophy for physical spaces and UID is an approach used in
instructional environments. The principles intrinsic to both can be compared in the context
of this research program. Both the UD and UID schools of thought are grounded in seven
principles which serve to codify design methodologies (see Table 1). This table shows the
principles of UD to the left [25-28, 31, 32, 56-59], and the corresponding principles of UID
on the right [22, 23, 49, 53, 60], as significantly condensed from the literature.
Table 1 - The principles of UD and UID

Principle from Universal Design           Principle from Universal Instructional Design
1. Equitable Use                          Class climate
2. Flexibility in use                     Interaction
3. Simple and intuitive to use            Physical environments and products
4. Perceptible information                Delivery methods
5. Tolerance for error                    Information resources and technology
6. Low physical effort                    Feedback, Assessment
7. Size and space for approach and use    Accommodation
The principles of UD and UID were developed and subsequently refined by several
authors in the field. Ronald Mace is credited as being the founder of Universal Design [57,
59, 61], and founded the Center for Universal Design [33]. Mace and others [25-28, 31, 32,
56-59] place high value on the importance of diversity and inclusiveness, and promote
maximizing accessibility to an engineered outcome to the greatest extent possible [33, 59].
Similarly, centres for Universal Instructional Design at several universities developed and
subsequently refined the principles of accessibility as a codified aspect of the curriculum and
learning environment. Initially, these principles were guidelines formed by instructors using
tools that allow for creativity in teaching methods, alternative means of presentation, and
choices for effective assessment in the classroom [23, 51, 52]. Over time, several authors in
the field of instructional design coalesced the ideas that eventually led to the creation of UID
principles, based on research about accessibility in education [22, 23, 49, 53, 60]. Both
design philosophies are roughly comparable to one another and can be discussed in the
context of the research being performed.
Principle #1 removes value judgment from the design. It suggests providing access in
a way that values diversity as a normal part of the environment. It also suggests that
instructors adopt practices that respect both diversity and inclusiveness, and make this an
explicit objective of their teaching experiences.
Principle #2 places an emphasis on the capacity of the design to accommodate a wide
range of individual preferences and abilities. An example of this would be the design of a
warning signal to be both audible and visible. It also encourages regular and effective
interactions between all members of the learning community including all students and the
instructors, and emphasizes the importance of clear and accessible communication. For
example, this principle implies that instructors should be explicit about communicating
learning outcomes and how they will be measured, as well as maximizing use of accessible
language for all people in the learning environment.
Principle #3 promotes designs that are easy to understand, regardless of the user’s
experience, knowledge, and language skills. It suggests using instruction that is appropriate
for the level of learner, not necessarily simplifying the content.
Principle #4 suggests that a design should communicate information effectively
regardless of ambient conditions or the user’s sensory abilities. It also suggests using
multiple modes to deliver content, including in-person, online, collaboration, and
independent learning.
Principle #5 suggests that the design should minimize hazards and the adverse
consequences of accidental or unintended actions. In particular, the designer should begin to
predict potential misuses and design to reduce the probability of any associated risks. It also
discourages instructors from using course materials and resources that unintentionally bypass
learning. One example of good practice would be to discourage students from closely
following a template for an open-ended design report, as this may constrain students from
investigating deeper learning in some cases.
Principle #6 suggests that the design should be usable with minimum fatigue. For
example, ergonomics principles can be incorporated in the design of products and processes. As
applied to the classroom, this means that instruction should minimize unnecessary repetition that
does not encourage new learning or strengthening of what is already learned.
Principle #7 suggests that appropriate physical size and space is provided for
approach, reach, manipulation, and use regardless of the user’s body size, posture, or
mobility. For example, consider the design of a laboratory environment where a user has adequate
uncluttered experimental space. This encourages proactively planning for student needs that
are not met by the non-physical aspects of instructional design. Consulting campus resources
such as note-takers, providing course materials in alternate formats, and arranging for other
accommodations for students with known disabilities are all examples of how this principle
is incorporated in the classroom setting.
The principles above show how UD and UID are codified approaches that can be
used to increase accessibility in both physical spaces and the learning environment. These
guidelines all work to improve accessibility in products and environments by specifying
general goals that an engineer and designer ought to work towards [27, 28, 31-33, 57, 59].
These are generally applicable to most engineering designs and they can be applied to other
disciplines as well, including education, as shown by the principles of UID.
2.1.2 Criticism of Universal Instructional Design

Though UID appears to be a promising approach, there are criticisms of this
methodology discussed in existing work [14, 29, 49, 53, 55, 60]. It is suggested that UID
falls short of increasing accessibility for everyone when a specific student requires
individualized attention or accommodation. Here, the systemic approach of UID assumes
that all students are being served by increasing accessibility within a course. However, this
neglects to take into account the repercussions or effects of the holistic approach on an
individual for whom this may not be enough. UID is a step towards making these
environments more accessible, but personalized individual attention would still be required
for some people [55].
Another criticism is that UID simplifies existing learning material. Here, the use of
accessible language and multi-modal instruction is seen as a factor that makes learning
“easy” instead of being rigorous. This criticism is based on the logic that existing material is
difficult and that the energy spent in understanding is part of the learning process. UID,
however, suggests that learning material is presented at a level appropriate for the audience,
and not to pointlessly complicate or simplify the instructional content or its delivery [23,
55]. This may help to promote deeper learning than previously thought because the students
are potentially developing a clearer and more intentional sense of what they are learning.
Further criticism of the UID approach is that it is quite resource-intensive to
implement [2, 16, 35, 62]. In particular, the cost of changing pedagogical approaches and
physical spaces is higher when completed on a large scale than when performed on an
individual basis. Here, the argument is that UID is becoming more necessary because more
students are requiring such additional assistance, and instead of having repeated
accommodations for a growing population, it would be more favourable to increase
accessibility for all students [53, 55]. The additional benefit of this is that accessibility is
increased for students who may not have asked for special accommodation, and as such is
useful to a greater population than initially intended.
This criticism applies to the research study as well. Here, the proposed research
accepts the fact that not all students will benefit by increasing clarity and transparency of
learning outcomes. Some students may still require additional accommodation and
individual learning strategies to cope with specific learning barriers. Reasons for their
unique learning barriers may vary, and as such a system-level approach to increasing
accessibility across the board may leave out areas where specific students still need support.
For example, increasing accessibility to course material by using multi-modal instruction
may make it easier for the majority of students to engage with the material, but it still leaves
out some students who require additional one-on-one remedial support.
The second area of criticism misses the point, however, because the intent is not to
simplify the vocabulary used in engineering education, but rather to highlight terms that
students ought to learn in order to develop a robust professional vocabulary. The use of a
UID strategy to make invisible learning barriers visible does not eliminate inaccessible
words, but rather empowers students to learn these words proactively.
The third criticism is also addressed, and this is due to the increasing need to identify
and characterize learning barriers in education that affect students. In particular, legislation
such as the AODA places a high priority on accessible learning environments, and this
motivates the development of strategies to increase accessibility for all students. Though
there are criticisms in deploying the UID system-focused method to investigate invisible
language-related learning barriers in engineering education, there is still significant reason to
pursue this research project through to completion.
2.1.3 The Implications of UID on the Study of Vocabulary in Engineering Education
Universal Instructional Design is used as the conceptual framework for this research
that investigates language in engineering education. The systems-focused approach of
UID serves to increase accessibility for a broad and diverse population without the need to
specifically test the accessibility requirements of each individual within that environment
[55, 63, 64]. This serves the study well because the student population is constantly
changing [55]. In addition, the intention is to study and develop instructional tools that can
work regardless of the institution, and a broad scope would benefit transferability in this
regard. Also, UID serves as a framework for developing a credible research strategy that
focuses on using an engineered approach to design the learning environment for accessibility.
Instead of focusing specifically on each user, the approach targets larger situational factors
that affect teaching and learning [53, 63]. Additionally, this encourages inclusivity in the
learning environment by creating an atmosphere where all learners are treated equally rather
than singling out individuals who require accommodation. This may increase learning
because all learners feel like they are part of a community where everyone has equal access
to course instruction and learning [65]. The theoretical implications of UID to the research
on language in engineering education focus on: designing a research study that maximizes
accessibility for a large and diverse user group; increasing teaching and learning in the
classroom; and promoting the development of an inclusive learning environment.
2.2 LANGUAGE INSTRUCTION
Existing literature in the area of language instruction relevant to this research can be
classified quite broadly into first-language acquisition and second-language learning.
Literature in the area of first-language learning often focuses on the mechanics and the
understanding of the structure and meaning of vocabulary as well as the use of this
vocabulary in synthesizing new meaning [66-68]. There is a substantial amount of work in
this area and it includes material from early childhood development and related studies
[66-69]. Existing research focuses on phonology, morphology, syntax, semantics, and the
development of vocabulary [68, 70, 71]. Though language can be vocalized or presented in
more physical form, the capacity of learning language is based on a syntactic principle called
recursion [70, 72]. There is also work that describes how language is interpreted when
being learned, and aspects that characterize this development of meaning.
A characteristic aspect of language acquisition is the ability to make connections that
appear arbitrary. For example, there is nothing about the word “mouse” that connects to the
meaning of this word. Additionally, the combination of multiple symbols and meanings to
produce novel meanings is also an area of research in this field of language instruction [73].
Hockett defines this as "productivity", a critical element of
human first-language acquisition: the ability to use an unlimited number of words that are
constantly changing and developing new meaning [74].
acquisition emphasizes the ability to use a seemingly unlimited range of vocabulary tokens
and actively produce new meaning [66, 67]. In addition to this work, new and emerging
research speaks to language learning from a biological perspective.
More recent research in the specific area of first-language acquisition also focuses on
how language capacities are developed by young children and whether there is a biological
component to first-language acquisition [68, 69]. In particular, the literature suggests that
first-language acquisition may be partially based on how the human brain functions innately
versus the language environment in which a person is raised [68-70]. These discussions lead
to questions about what happens when multiple languages are learned.
Second-language learning (SLL) refers to any language learned in addition to a
person’s first language, and is not restricted to the order in which these subsequent languages
are learned. In particular, the field of SLL is related to applied linguistics and is connected to
fields like psychology, cognitive psychology, and education [75, 76]. Two quintessential
papers in this field are S. P. Corder's "The Significance of Learners' Errors" and L. Selinker's
“Interlanguage”, both of which are pioneering and highly-cited in this area [77, 78]. These
two papers are used as a basis for much other work in this field. Corder’s work examines the
types and causes of language errors and suggests that they can occur not because of
similarities or differences between a learner's first and second language, but rather because of
faulty inferences about the rules of the new language. In the context of the dissertation, this
means that errors in new language learning (like technical vocabulary learning) are affected
more by clarifying rules of that new language rather than trying to establish a connection
between existing knowledge and the new vocabulary. Selinker’s highly-cited work builds on
initial findings by Weinreich, and suggests that language can, in addition to other things,
become “fossilized” between low and high proficiency [79-81]. In the context of this
dissertation, it means that as students learn technical language they can stop at a “hybrid
point” between not knowing the meaning of a word, and fully understanding the meaning of
a word. Regular and intended use of new language helps to improve proficiency. It also
reduces the stagnation that can occur if words are not used natively during communication.
The implication is that if students are made aware, ahead of time, of the technical vocabulary they ought to be familiar with, instructors then have a starting point for using that vocabulary naturally in teaching. Furthermore, according to theory, this reduces the fossilization of language, in turn promoting more robust vocabulary development and potentially more effective communicators.
At least two major perspectives have emerged since these works and are based
loosely on universal grammar [82-84]. These perspectives include skill acquisition theory
and connectionism. They maintain that learning another language is rooted in how a person
can incorporate the symbolic representation of ideas into their existing knowledge, and
connect those new symbols to the ability to communicate with others [76, 85]. As these
topics emerged, debate has increased on exactly how language is learned. One difficulty is that the multidisciplinarity of this field causes experts from each
discipline to favour theories they identify with rather than a larger unifying
understanding of second-language acquisition [85]. However, experts agree that the stages of
second-language acquisition are as follows: preproduction, early production, intermediate
fluency, and advanced fluency [86, 87]. This means that learners gradually increase their vocabulary knowledge: initially using imitation to form short sentences, then developing simple conversational sentences, and finally mastering vocabulary through repetition and the creative combination of words to express ideas.
A large differentiating factor between first- and second-language learning is that the latter is
influenced by the languages that the learner already knows. The literature describes the
interaction between languages as language transfer, and this can also be influenced by
external non-communication-related factors as well [88-90]. According to Krashen’s theory,
the sequencing found in traditional classrooms to learn new languages may be detrimental to
language development because learners often use a universal grammar model [90-92]. This
is further seen in the literature in the theory of comprehensible output and studies of
bilingualism, where learning additional languages using a sentence-based protocol is less
likely to produce proficient speakers than learning with smaller pieces of that language [93].
Long’s interaction hypothesis suggests that second-language acquisition is
particularly strong when there is “normal” communication in that second-language by at least
one speaker [94, 95]. These studies imply that it may be more beneficial to have
students learn additional languages using pieces of language rather than always using them to
develop sentences and longer linguistic artifacts. In addition, it suggests that the instructor
ought to continue speaking in the “new language” to strengthen vocabulary development in
the student population. Further, these findings suggest that teaching students words in a new
language can preferentially help them develop an understanding of concepts in that language,
and then those students can gradually increase their proficiency over time [88-91, 93-95].
The theoretical implication is that learning additional languages, such as the
technical-speak used in engineering disciplines, can be made stronger by emphasizing and
encouraging learning of key words, as the instructor continues to use them normally.
Further, the process of understanding language may be just as effective, if not more effective,
when the student learns key pieces of a new language within an immersive context of
introductory learning.
2.3 AUTOMATED INDEXING

The literature discussed in this section builds on the literature reviews in academic
papers reprinted in Appendix A.4-A.9.
Language is an ever-changing form of communication that has elements particular to
the domain in which it is used [96, 97]. In the context of engineering education, this means
that the vocabulary can change based on a number of factors including: time, discipline (field
of engineering), intended audience, instructor, technology, situational factors, and so forth.
Other elements of language include structural properties, linguistic roots, presentation, and
rhetorical aspects [98]. For the purpose of this research, the scope of the investigation will be
limited to the vocabulary itself to help bound this very broad area of study. Research in the
area of vocabulary analysis is also quite broad and includes literature about the formation and
use of words, the contextual properties of different types of words, and also about how words
change over time [96-101].
Of the many types of vocabulary studies performed, there are a few that are most
directly applicable to the study of vocabulary in the context of engineering education. In
particular, literature in the area of indexing is pertinent to the analysis of vocabulary and to the development of an instructional aid based on principles of universal design and language instruction.
This particular focus became more evident as the research progressed: the start of the
research study did not focus directly on indexing approaches because it was only one of
many approaches under consideration. As the research continued, it became clearer that a
research strategy that included indexing as a vocabulary characterizing tool was of great
importance. As such, the papers provided in Appendix A.4-A.9 include more literature
background about the area of indexing than the publications produced at the beginning of the
study. Indexing appears to provide a computational technique with which to analyze
vocabulary sets that are diagnostic to the disciplines within the field [102, 103], and this is
central to the research being performed.
Indexing is used to characterize information for ease of access. Traditionally,
indexing was used by professions such as library studies to classify books and other
instruments of information [102]. Indexing is also widely used in the broader scope of
communication, to condense volumes of knowledge into meaningful bits of information
[104]. This emphasis on succinctness has an effect that can make processing of multiple bits
of information manageable, especially with limited resources.
The word “index” has multiple definitions across different domains, ranging from
business to mathematics and others [105]. As used in this study, indexing refers to the
collection and characterization of information within documents – specifically, vocabulary.
In the broad field of information studies, indexing is considered one of the foundational
elements of research in this area [106-109]. With changing technology, research in this field
has become increasingly based on the technology of the time – from manually indexing and
archiving work to, more recently, search-engine programming. Indexing is critical to the
characterization of document text [97-99, 110].
In the fields of linguistics and the philosophy of language and education, indexicality
bridges language and learning [111-113]. In the process of characterizing an idea, C.S.
Peirce suggests that the subject becomes a meaningful symbol through the use of accurate
vocabulary [114, 115]. Noted scholar E. Ochs suggests that indexing language can be an
extension of linguistic anthropology, where the inclusion of gender-related mechanics of
language influences broader societal studies [116, 117]. Indexing is also conceptually linked
to semantics, since it allows researchers to better-understand the area of communication and
its use to convey meaning [111].
Research in the area of indexing includes studies about deixis and deictic terms,
which have meanings that vary with contextual elements (examples include “now”,
“here”, and “I”). Authors have also performed extensive studies on the pragmatic aspects of
indexing [108, 118]. Specifically, the Peirce Trichotomy of Signs discusses sign-relations
and the bases being indexed [111, 114, 115]. Using more comprehensive linguistics-based
approaches, models, and research, classifying language can be performed in terms of tokens,
symbols, and arguments, as well as sign-to-object relationships [119-121]. This research also
discusses referential and non-referential indexicality, orders of reference, deference and
interlocutor studies, and extensions to other ontological areas of research [120-126]. Though
there are several highly-cited authors in this field, much of the work in these areas can be
traced back to researchers and philosophers that include: Y. Bar-Hillel, R. Lingens, J. Locke,
U. Eco and J. Lotman [127-131]. Their combined works include research on pragmatism,
semiotics, semantics, literary theory, and cultural and social discourse. Many of these areas of
research are linked to indexing studies and characterizing vocabulary, which are, in turn,
central components of language, learning, and social development.
Automated indexing is the use of assistive means to characterize pieces of
information within datasets. In the context of this research study, automated indexing refers
to using a computer program to characterize vocabulary used in engineering education. With
advances in technology, computers are becoming more capable of processing larger datasets
and utilizing more complex algorithms in a reasonable run time to identify key terms in
documents [132, 133].
A pioneer in the field of automated indexing and information studies is Gerard Salton.
Salton’s work is critical to this research study for a number of reasons, the most important of
which is his contribution to the development of algorithms that characterize words [132, 134,
135]. Specifically, he was involved in building a vector space model for information
retrieval - and subsequently the Term Frequency-Inverse Document Frequency equation - in
the field of computational linguistics. The algorithm which employs this equation is
modified for use in the current research. In Salton’s model, both documents and queries are
represented as vectors of term counts, and the similarity between a document and a query is
given by the cosine of the angle between the query vector and the document vector [134-140]. Though the
application of this is intended for retrieving information from a large dataset, it can also be
used for automated indexing and summarization of vocabulary.
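Salton’s weighting scheme can be illustrated concretely. The following is a minimal sketch, not the modified algorithm developed in this research: it computes TF-IDF weights for a toy set of tokenized documents and scores document similarity by the cosine of the angle between term vectors.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Weight each document's term counts by TF-IDF.

    documents: list of token lists. Returns one {term: weight} dict
    per document, where weight = tf * log(N / df).
    """
    n_docs = len(documents)
    counts = [Counter(doc) for doc in documents]
    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this formulation, a query would simply be treated as another short document whose vector is compared against each document vector, and terms appearing in every document receive a weight of zero.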
Other authors in this field have contributed to research in the area of automated
indexing, and have evaluated the effectiveness of different algorithms. Some of these
approaches include: Kullback-Leibler divergence, latent semantic analysis and indexing,
singular value decomposition, multiword, correspondence analysis, and latent Dirichlet
allocation [141-146]. These approaches are based on computationally decomposing
relationships between queries and datasets, in an attempt to improve automated information
retrieval and indexing.
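One of the simpler members of this family can be shown directly. The sketch below computes a Kullback-Leibler divergence between the term-probability distributions of two documents; it is a generic textbook formulation with ad hoc smoothing, not an implementation of any of the cited systems.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) between two term-probability distributions.

    p, q: dicts mapping term -> probability (each summing to ~1).
    A small epsilon guards against terms of p that are absent from q.
    """
    return sum(
        prob * math.log(prob / (q.get(term, 0.0) + eps))
        for term, prob in p.items()
        if prob > 0
    )
```

A divergence near zero indicates that two documents use vocabulary in nearly identical proportions; larger values indicate increasingly dissimilar term usage.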
An advantage of using automated indexing over unassisted indexing is that it uses the
processing power of a computer to characterize language as it evolves over time [147, 148].
In particular, advances in computer technology have made it possible to rapidly mine large
sets of vocabulary data for characteristic vocabulary using an algorithm-based computational
strategy. The increased speed and precision of this computation enables researchers to
investigate language quantitatively, to help optimize language learning to promote a more
accessible learning environment. As a result, research in the area of automated indexing is
an important component in the investigation of language in engineering education.
3 STUDENT SELF-ASSESSMENT STUDY

This dissertation discusses three research studies performed sequentially in the
investigation of language in engineering education. These studies are: ‘student self-
assessment’ (Chapter 3), ‘word frequency analysis’ (Chapter 4), and ‘automated indexing
and evaluation’ (Chapter 5). The first study, student self-assessment, was used to scope the
investigation of engineering language, and gauge the pervasiveness of vocabulary-related
learning barriers currently present in the classroom. The second study, word frequency
analysis, was used to investigate word-frequency as a technique that can characterize text
across datasets. This study also helped identify research in computer science and tested its
applicability to engineering education. The third study, ‘automated indexing and evaluation’,
was about the design, development, and testing of a novel approach to characterizing
document text in engineering education. All three of these studies have been published (see
Appendix A for reprints).
The first study, which is the focus of this chapter, shows the research on student self-
assessment of vocabulary proficiency. This study was conducted early in the research
program and was published in the International Journal of Engineering Education, a reprint
of which is in Appendix A.2. This chapter is heavily based on that publication, and draws
from the material contained therein.
The purpose of the student self-assessment study was to gauge whether language-
related learning barriers and single-word understanding are existing problems in engineering
education, and if these problems could be characterized. This and subsequent studies relied
on a common dataset that contained examples of language used in engineering education. As
such, identifying an artifact of the engineering learning environment that is available from as
many courses as possible was essential.
3.1 THE DATASET

Investigating language in engineering education is a broad field of research. One reason for its breadth is the abundance of language in a learning environment. Language permeates the instructor/student interface through many modes and mediums. From
traditional “lecturing” environments to more textbook-based approaches, language is a key
component of learning in the classroom. Navigating the dataset of language in engineering
required a clearly defined scope, because of the sheer quantity of data available. In order to
define this scope, it was important to select an artifact of the engineering learning
environment that was common across many courses and readily available. It also had to be
an artifact that accurately captures the language used in engineering education.
Engineering courses have many documents associated with them: textbooks,
handouts, assessments (e.g. assignment instructions and tests), syllabi, and so on. These
documents are not necessarily representative of the entire course; instead they are a snapshot of a particular aspect or learning outcome. Syllabi are common and can provide a broader understanding of a particular course, but they are not necessarily indicative of teaching materials. Additionally, the discipline-specific vocabulary in them might be sparse. In
particular, they are usually employed to address course content at a meta-discourse level.
One of the only written documents beyond the syllabus that is common to the majority of
engineering courses is a final examination. These are documents that provide a summative
encapsulation of course content, in a medium that is closely-supervised and often carefully
constructed. The main reasons for using final exams as the artifact for study in this research
program are that they are comprehensive, written carefully, are relatively standardized in
length, are intended to be interpreted without additional assistance, and that they represent a
substantial available dataset of language in engineering education.
Final examinations at the University of Toronto were chosen for several reasons.
First, the database of final exams is readily available and in electronic format at the Faculty
of Applied Science and Engineering at the University of Toronto. At this institution final
exams from previous years are available on a publicly-accessible website so that students can
use them for study purposes. The electronic-format of these exams allows for text-analysis
using software programs that can identify strings of characters as words, making it feasible to
perform research on large quantities of written language. Second, the rules for administering
the exams indicate that students are not able to access assistance during an exam, which
means that they must rely on their a priori vocabulary to make sense of the questions and the
vocabulary therein. And as a critical assessment in a course, the exam should be testing the
student’s understanding of the course concepts comprehensively rather than the student’s
vocabulary (unless vocabulary knowledge is a defined learning outcome). Finally, every
undergraduate-level final exam is the same duration, 2.5 hours, which allows for some
common basis of comparison.
3.2 OVERVIEW OF THE STUDY

The first of three sequential studies, the “self-assessment study”, investigated the
prevalence of language-related learning barriers in engineering education, and was published
in the academic journal paper reprinted in Appendix A.2. There were a number of key
outcomes from this study that formed the foundation of our understanding moving forward
into the primary research.
Language used in engineering course materials can potentially be a barrier to learning
and inclusivity because students may perceive the meanings of words differently: this
variance could be attributed to cultural, technical preparation, and linguistic differences
among learners. This perception of vocabulary changes with education, experience, and
other factors. In an engineering classroom, however, there needs to be a convergent
understanding of language so that the course content can be interpreted accurately [149].
Basic TOEFL exams and English proficiency tests are not calibrated to gauge the cultural or
engineering-specific technical components of language used [150]. Furthermore, if students
cannot accurately assess their existing understanding of words, then it becomes increasingly
difficult to build a converging corpus of language used to communicate in the classroom
[149, 150].
Vocabulary used in final exams plays an important role in accurately assessing
student performance. If instructors use vocabulary that is not understood by the student,
then the assessment changes from testing course concepts to testing understanding of
vocabulary. As a consequence, the validity of engineering examinations may be
compromised when non-technical vocabulary that is never explicitly defined is tested in addition to
course learning objectives. Specifically, using inaccessible vocabulary on final exams would
mean that the assessment no longer exclusively tests what it purports to test: students’
mastery of course concepts. Instead, the exam is now also testing whether students
understand this additional vocabulary used to contextualize course concepts.
The existing strategy used in a large first-year design course at the University of
Toronto is to use a word list, which is provided to each student prior to each test in this
course. This word list contains all of the infrequently used words (i.e. words such as “and”,
“the”, “are”, etc. are not included) that appear on that particular test. This word list is then
padded with some additional vocabulary and then alphabetized so that the questions on the
exam are not apparent from the words present on the list. The intent is to give the students an
opportunity to gauge their own level of understanding of the test vocabulary beforehand, and
if required, consult information sources to correct any gaps ahead of time. This strategy
allows the instructors to contextualize questions and use accurate, authentic vocabulary,
including engineering terminology. However, this approach is predicated on the assumption
that given a list of words, students can correctly assess their level of understanding of these
words [64].
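The word-list strategy described above can be sketched as a short procedure. In the sketch below, the stopword list and the padding words are illustrative assumptions rather than the actual lists used in the course.

```python
import re

# A few common function words to exclude; a real course would use a
# much fuller stopword list (this short set is illustrative only).
STOPWORDS = {"and", "the", "are", "a", "an", "of", "to", "in", "is", "for"}

def build_word_list(exam_text, padding_words):
    """Build an alphabetized vocabulary list for distribution before a test.

    Content words extracted from the exam are merged with extra "padding"
    words so that the exam questions cannot be inferred from the list.
    """
    tokens = re.findall(r"[a-zA-Z\-]+", exam_text.lower())
    content_words = {t for t in tokens if t not in STOPWORDS}
    return sorted(content_words | {w.lower() for w in padding_words})
```

For example, `build_word_list("Estimate the tolerance of the feasible design.", ["bungalow", "propagate"])` returns the alphabetized list `['bungalow', 'design', 'estimate', 'feasible', 'propagate', 'tolerance']`, in which the padded decoys are indistinguishable from the exam words.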
These reasons motivated the study of student self-assessment of vocabulary
proficiency, as it is important to gauge what students think they know versus what they
actually know. Understanding this gap, if one exists, would help gauge the severity of
invisible learning barriers due to language in engineering education. A significant difference
between perceived understanding and actual understanding would indicate the presence of an
invisible learning barrier, and this would provide scope for the research program.
This study was used to gauge if students can accurately self-assess their
understanding of vocabulary on exams. In particular, this study tested whether vocabulary
understanding is a ‘visible’ or ‘invisible’ learning barrier from the learner’s point of view.
Better understanding this learning barrier, if it were to exist, would provide useful data to
help further investigate language in engineering education.
3.2.1 Methodology

To carry out this study, an ethics protocol was established and approved by the Board
of Ethics at the University of Toronto. Then, posters and other signage were used to recruit
forty undergraduate engineering students of diverse (self-reported) cultural backgrounds and
proficiencies in English (including native speakers).
The study tested participants’ understanding of ten words that might appear on an
undergraduate engineering exam. These words were chosen from a dataset of final
engineering exams across all disciplines and years, with a focus on selecting words that were
very different from one another. For example, the words “car” and “truck” would be highly
similar, but the words “car” and “ratatouille” are different. The criteria used to select the 10 words for this study were to first generate a list of 30 words, each used at least once in an existing engineering exam, and then to remove the 20 words that appeared most similar to others on the list, until only 10 dissimilar terms remained. These ten words were given to each participant in alphabetized order, and are shown in the left column of Table 2.
The participants’ task was to rate, quantitatively and qualitatively, their perceived
understanding (PU) of each of the ten words. Each participant was asked to assign a
numerical value from 1-5 for their self-assessed understanding for each word using a scale.
If the student believed that they were very proficient in understanding that word, a high PU
score was assigned (“5”). If the student believed that they were not as proficient in
understanding that word, a low PU score was assigned (“1”). Supplementary information
about this scale is presented in Appendix A.2. Each participant was also asked to provide
synonyms and/or definition(s) to each word, to provide “evidence” for substantiating their
numerical understanding score.
The second task was to develop an observed understanding (OU) score that would
identify what students actually know. For this step, the experimenter used a reference source
to characterize the qualitative responses received by each student. Specifically, each
student’s explanations and synonyms for each word were compared to the Oxford Dictionary
of English, an authoritative standard for word definitions. If the student-provided synonyms
and explanations were sufficiently close to the one provided in the dictionary, then a high OU
score (“5”) was given to that word. If not, then a low OU score (“1”) was assigned along a 5-
point scale.
3.2.2 Outcomes

This study produced 800 data points in total (400 OU and 400 PU scores), which were then analyzed using analysis of variance (ANOVA). The students’ perceived understanding of each word was compared quantitatively with their observed understanding. Some of these results and analyses of these data are shown
in Figure 4 and Table 2, with additional details in the journal article reprinted in Appendix
A.2.
The ten words and the statistical significance (ANOVA) are presented in Table 2.
This table also shows the means and standard deviations for the OU and PU scores, as well as
the difference between the two, and the t-test results for each word. The results indicate that
“bonnet”, “bungalow”, and to some degree “Jell-O” are self-assessed accurately. These
words do not appear to be an invisible learning barrier because they are accurately assessed.
These terms also had minimal variability. In addition, the data shows that though students
may be unfamiliar with these terms, they recognize this lack of familiarity; it is a visible
learning barrier to them. These results also suggest that the other words are not assessed
accurately, and this is shown by the values of the t-tests.
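The t(39) values reported in Table 2 are paired comparisons of each student’s PU and OU scores for a word. A minimal sketch of that computation, on hypothetical score lists, follows; the study itself used standard statistical software, so this only shows the form of the paired t statistic.

```python
import math

def paired_t(pu_scores, ou_scores):
    """Paired t-test on per-student (PU - OU) differences.

    Returns (t, df). With n = 40 students, df = 39, matching the
    t(39) values reported in Table 2.
    """
    diffs = [p - o for p, o in zip(pu_scores, ou_scores)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

Large positive t values correspond to words whose perceived understanding systematically exceeds observed understanding, i.e. candidate invisible learning barriers.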
Table 2 - The ten words with their PU and OU scores and statistical significance (ANOVA and paired t-tests)

Word        PU mean (SD)    OU mean (SD)    PU-OU mean (SD)    OU/PU    t-test
Bonnet      2.10 (1.582)    1.83 (1.810)    0.275 (1.062)      0.87     t(39)=1.64, p=.109
Bungalow    3.25 (1.565)    3.08 (1.940)    0.175 (1.338)      0.95     t(39)=0.83, p=.413
Fax         4.03 (0.800)    3.83 (0.747)    0.200 (0.883)      0.95     t(39)=1.43, p=.160
Feasible    3.93 (0.797)    2.55 (1.011)    1.375 (1.125)      0.65     t(39)=7.73, p=.000
Field       4.13 (0.686)    2.88 (0.822)    1.250 (1.056)      0.70     t(39)=7.49, p=.000
Jell-O      3.73 (1.219)    3.48 (1.585)    0.250 (0.742)      0.93     t(39)=2.13, p=.040
Mold        3.03 (1.310)    2.63 (1.462)    0.400 (1.105)      0.87     t(39)=2.29, p=.028
Propagate   2.88 (1.285)    1.98 (1.544)    0.900 (1.336)      0.69     t(39)=4.26, p=.000
Succinct    1.95 (1.974)    1.48 (1.853)    0.475 (1.132)      0.76     t(39)=2.65, p=.011
Tolerance   3.90 (0.672)    1.83 (1.174)    2.075 (1.385)      0.47     t(39)=9.48, p=.000
Figure 4 - Comparison of the OU and PU scores for two sample words, Bungalow (left) and Tolerance (right). The vertical axis represents the summed total of each score. Reproduced from the publication reprinted in Appendix A.2.
Figure 4 illustrates an example where a technical term can be an invisible barrier to
learning. The left side of this figure shows a bar chart of OU and PU scores for the word,
“bungalow”. The right side of this figure shows a bar chart for the word, “tolerance”. When
these two sample words were compared, the difference between OU and PU scores for
“bungalow” was less than “tolerance”. This shows that the word “tolerance” could be
inaccurately self-assessed by the students, when compared to the word “bungalow”. Though
this is a snapshot of the words used in engineering education, it shows that some words can
potentially have variations between OU and PU assessments. Overall, data collected from
the study showed that word mastery is non-uniform, and that PU is almost always ranked
higher than OU.
An accurate self-assessment would mean that the OU and PU scores would be identical
(PU-OU=0). However, the data showed that for this study, students correctly self-assessed
their understanding only 34.5% of the time, overrated their understanding 52.8% of the time,
and underrated their understanding 12.8% of the time. Additionally, there were noticeable
differences between words. Words that had an OU/PU ratio close to 1, as seen in Table 2,
included words like, “bungalow”, “Jell-O” and “fax”, all of which were present on at least
one engineering exam in the dataset. Moreover, the OU/PU ratio illustrated that these 40
participants were less likely to correctly self-assess their understanding of more technical
words such as “tolerance” and “propagate”. This is important because, although words like
“bonnet” had a low overall PU and OU, students were apparently aware of their lack of
understanding, which made this word a visible learning barrier for them. Students’ perceptions of their understanding of technical words, however, were often overrated, indicating that this was an invisible learning barrier.
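The correct/overrated/underrated percentages above can be reproduced mechanically from paired scores. The sketch below assumes that an exact PU/OU match counts as a correct self-assessment, which is a simplification of the scoring used in the study.

```python
def self_assessment_rates(pu_scores, ou_scores):
    """Fractions of responses where perceived understanding (PU)
    matches, exceeds, or falls below observed understanding (OU)."""
    n = len(pu_scores)
    pairs = list(zip(pu_scores, ou_scores))
    correct = sum(1 for p, o in pairs if p == o) / n
    over = sum(1 for p, o in pairs if p > o) / n
    under = sum(1 for p, o in pairs if p < o) / n
    return correct, over, under
```

Applied to the full set of 400 word ratings, this classification yields the 34.5% / 52.8% / 12.8% split reported above.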
From this study it was determined that undergraduate students do face vocabulary-
related visible and invisible learning barriers in engineering education. In particular, words
used on existing engineering exams may be incorrectly understood by students, and
sometimes students are unaware of their lack of understanding. Words that appeared to be
technical (like “tolerance”), were less likely to be identified as unknown to the student.
Students are therefore less likely to seek assistance for these types of words, because they
think they know what they mean. This describes a “blind spot” where students are falsely
assuming their mastery of technical vocabulary, creating an invisible barrier to learning. In
contrast, words that appeared more cultural (like “bungalow”) were more accurately self-assessed and are therefore a less significant learning barrier. As described in Section 1.2, the
approach employed was to increase visibility of invisible learning barriers, and as such,
increasing the visibility of learning barriers associated with technical vocabulary. Therefore,
it is particularly important to focus on the development of technical vocabulary as part of the
learning experience, as this data shows that it is a valid starting point for further research in
engineering education.
3.3 DISCUSSION OF STUDY

The outcomes of this study showed that vocabulary characterization in engineering
education is a topic area that needs to be investigated further. The analyses of the data
suggested that since students are unable to accurately self-assess their understanding of
vocabulary used in engineering education, there is a need to focus on developing robust
language skills. Here, the goal is to make terminology in the student’s blind spot, that is, areas of inaccurate self-assessment, more visible. Specifically, this study helped scope the
investigation of language in engineering education to vocabulary that appears to be
“technical”, as this is an area where students cannot accurately see their deficiencies in
learning.
In the larger scope of the doctoral research, this study narrowed the focus to a
specific type of vocabulary used in engineering education: technical vocabulary. The study
reinforced the need to develop mitigation strategies to identify and eliminate invisible
learning barriers, and connected that to technical vocabulary commonly used in engineering
education. The resulting journal paper also situated the research in the international scholarly
community and set up an expectation to follow up with the development of strategies to
reduce and eliminate learning barriers due to inaccessible vocabulary in this area.
In the context of investigating language in engineering education, this study suggested
a focal point where instructors can concentrate instructional efforts to improve learning in
their classroom. By drawing attention to invisible learning barriers, instructors can educate
students to master course concepts more thoroughly. As described in Section 1.1.1, through the adapted Johari Window concept, instructors can optimize teaching and learning by increasing the visibility of learning barriers in education. Now that the study had identified one specific invisible learning barrier due to language – technical vocabulary – the researcher could investigate ways to make that barrier more visible to students and instructors.
The student self-assessment study also provides some insight into the methodology
that could be employed to further investigate the language of engineering education. This
study suggested that we investigate the development of an objective approach in
characterizing vocabulary, so that invisible language barriers in engineering education can be
made clear. In general, this study helped to scope the larger problem of investigating
language in engineering education to a set of vocabulary that needs focused attention, while
informing a potentially more-reliable approach for further investigation. Specifically, this
study helps inform a strategy that focuses on an identified invisible vocabulary-related
learning barrier prevalent in engineering education, and that is to characterize the vocabulary
present on engineering exams.
4 FREQUENCY ANALYSIS STUDY

The frequency analysis study was conducted to investigate an experimenter-independent and computationally-based approach to characterizing language in engineering
education. Chapter 4 is based on this study, which was published in the proceedings of the
American Society for Engineering Education (ASEE), and is reprinted in Appendix A.3.
Specifically, the purpose of this study is to gauge if word frequency and computational
linguistics can be applied to further research in the context of investigating language in
engineering education. Additionally, this study further informs the development of an
automated tool that instructors can use as an identification strategy for invisible language
barriers in this field.
4.1 OVERVIEW OF THE STUDY

Investigating frequently and infrequently used vocabulary by means of word-frequency
analysis informed a research direction for characterizing language in engineering
education. This approach provided some insight on the issue of inaccessible vocabulary used
in engineering education, and a potential application of an automated approach in identifying
this learning barrier.
As stated in the publication, the question was whether word familiarity could be
correlated with word frequency on engineering exams. The expectation is that familiar
words are used more frequently, and are part of natural “everyday” language. The first step
in this investigation was analyzing the frequency of words on engineering exams [151].
The methodology used the same standardized database of documents used throughout
the research (i.e. final exams), and calculated word frequency for each of those documents.
The output was a ranked word-frequency list for each input document. The results were
compared and contrasted with theory from the literature.
As stated in the publication, some analysis techniques in the area of vocabulary
frequency-analysis are presented by C.J. van Rijsbergen, who described the use of Zipf's Law
to understand the statistical distribution of words in language. Zipf's law states that the most
frequent word in an article of text appears roughly twice as frequently as the second most
frequent word, three times as frequently as the third most frequent word, etc. Thus, the
expected result of the frequency analysis, if natural “everyday” language was used, is a
hyperbolic curve with a narrow range of frequently-appearing words and broad range of
infrequently occurring words [151].
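The rank-frequency relationship predicted by Zipf's law can be illustrated with a short Python sketch. This is illustrative only, not the software used in the study, and the sample sentence is a toy example:

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, count) pairs for each unique word, most frequent first."""
    counts = Counter(text.lower().split())
    return list(enumerate(sorted(counts.values(), reverse=True), start=1))

# Under Zipf's law, the count at rank r is roughly count(1)/r, so plotting
# the ranked counts of natural-language text yields a hyperbolic curve.
sample = "the cat sat on the mat and the dog sat on the rug"
print(rank_frequency(sample)[:3])  # [(1, 4), (2, 2), (3, 2)]
```

Plotting count against rank for each exam, as in Figure 5, shows how closely its language follows this hyperbolic shape.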
The frequency study analyzed undergraduate final exams: a closely-supervised
assessment common in engineering education with a substantial volume of vocabulary to be
used as a database. Nine undergraduate first-year final exams, from courses mostly taught by
engineering faculty, were used. These exams were:
1. Calculus-I
2. Calculus-II
3. Linear Algebra
4. Physical Chemistry
5. Engineering Strategies and Practice I (Engineering Design and Technical
Communication)
6. Introduction to Materials Science
7. Fundamentals of Computer Programming
8. Electronics Fundamentals
9. Mechanics (Statics)
After acquiring electronic copies of the exams, the text contained in each document
was extracted and processed through word-frequency software. This procedure was
completed by manually copying all of the text in each document into a licensed program
called "Hermetic Word Frequency Counter Advance v.12.45". This produces data which
can be transferred to Microsoft® Excel spreadsheets for analysis; data from two of the nine
courses are shown in Figure 5, with the full dataset presented in Appendix A.3 [151].
Figure 5 - Shows sample data from the Frequency Study, and is reprinted from the paper in Appendix A.3. This shows that language from a Mechanics (Statics) course, above, more closely follows a natural language frequency distribution than the language from an Engineering Design and Communication course, shown below.
4.2 DISCUSSION OF STUDY

The analysis of the data provided some insights into the characteristics of the
vocabulary used on engineering exams. The analysis focused on three areas:
1. the distribution of words on exams;
2. the relationship between the vocabulary used on particular types of exams and natural
language;
3. the relationship between the results and previous literature to understand how a proxy
system for familiarity might be developed.
Investigating the distribution of words on exams shows that words which people might
assume are very familiar, such as "name", "clear", or "length", are not particularly
frequent (nor consistently infrequent). In addition, the data shows that
mathematics exams generally have fewer unique words than other exams. It was found that
the relationship between word-frequency and word-rank is not linear but rather hyperbolic,
as seen in Figure 5. This follows Zipf's law, an empirical law from
computational linguistics that characterizes the frequency of word use in natural language
[152]. In applying existing theory to this study, the data suggests that exams from traditional
“fact and principle”1 engineering courses that are heavily math-based contain a word-
frequency distribution less typical of natural ‘everyday’ language than design courses, for
instance. This data makes sense – exams from design courses may tend to have a greater
amount of contextual information and writing, which is closer to natural language. Calculus
courses, in contrast, may have writing that is less characteristic of natural ‘everyday’
language on assessments. By seeing this distinction replicated in a quantitative format,
namely through frequency analysis of final engineering exams, it suggests a research
direction for further characterizing vocabulary in engineering documents.
1 “Fact and principle courses” is a term coined by L. Dee Fink, to describe “traditional” engineering courses like thermodynamics, electrical fundamentals, etc.
The frequency analysis of words in the context of engineering education is a valid but
largely untested area of research in computational linguistics. In the context of van
Rijsbergen's work [153], it is clear that more refined quantitative measures are needed
to better characterize vocabulary in documents [151].
Using frequency analysis alone is not enough to discern between different vocabulary types,
other than to gauge whether the document is written using natural language or not.
In order to characterize the kind of vocabulary contained in these documents, more
advanced approaches need to be used – and frequency analysis can be used as a foundation
upon which these approaches can be built [151]. Further discussion suggested that
contextual information is a key element to being able to discern and characterize vocabulary
in documents. Specifically, word frequency alone is not an accurate approach to understand
key words or degree of familiarity, but coupling word frequency with contextual information
about the document and words being analyzed can help better distinguish characteristic
language. The research from this study suggests that comparing individual exams,
differentiated based on discipline, may yield a quantitative understanding of vocabulary.
This comparison may identify how a given exam or group of exams compares to the general
characteristics of vocabulary used in these types of materials.
In general, this study concluded that computational frequency analysis of single-
words is an approach to understanding vocabulary in engineering exams, but that a more
nuanced approach is necessary. Based on the data produced using a simple word-frequency
approach, infrequently and frequently used words appeared equally likely to belong to
natural 'everyday' language. Building on the previous study, this shows that
frequency-analysis alone is not enough to identify this specific kind of invisible learning barrier in
engineering education. Moreover, this study informs a research direction that builds on
frequency-analysis using a more advanced and context-aware quantitative computational
approach.
At the conclusion of this particular study, it was found that a more nuanced approach
was needed to accurately characterize the vocabulary on engineering exams. Additionally,
the outcomes of this study suggested the inclusion of contextual data to more accurately
characterize vocabulary on exams. This experience was critical to the design of the research
study discussed in Section 5.3.1.
5 AUTOMATED INDEXING AND EVALUATION

Theory and literature from the area of information retrieval and automated indexing
(see Section 2.3) informed a methodology that builds on the research studies discussed in
Section 3 and Section 4. The outcomes of the first study - the student self-assessment study
- suggested that technical words can be invisible barriers to accessibility, and need to be
explicitly identified in an instructional environment to promote more accurate vocabulary
learning. This led to the second study – the frequency analysis of words study. The second
study added to the previous knowledge by investigating a computational approach that
compared documents based on frequency analysis, to see if it can be used to quantitatively
characterize words in engineering education. Though the outcomes were not able to discern
technical vocabulary from other types of vocabulary, it was an important starting point to
begin designing a more robust computational approach to characterizing vocabulary.
The two previous studies, and theory from the literature on automated indexing (see
Section 2.3), suggest a methodology that is based on a more advanced approach than just
frequency analysis. Ideally, the goal is to design and evaluate an automated approach that
can mimic subject-matter expertise in identifying discipline-specific vocabulary in
engineering documents, yet remain flexible enough to account for changing language over
time and across contexts. This methodology is a computational approach that identifies
keywords on input documents based on more advanced mathematical word-frequency
calculations by comparing to contextually-relevant datasets. This is the basis for the third
study, the automated indexing and evaluation study.
The automated indexing and evaluation study is organized chronologically. Section
5.1 specifies the input dataset used for this study. Section 5.2 discusses the theory used to
characterize vocabulary on engineering exams. This study has two components: the
computational approach (Section 5.3), and the evaluation of the computational approach
(Section 5.4). The results of this study are presented in Section 5.5, and discussed in the
context of the literature in Section 5.6.
5.1 ARTIFACTS OF STUDY

The input selected for the automated indexing approach is a standardized artifact of
the undergraduate engineering learning environment: written final examinations. This
dataset is the same as that used in the second study, the Frequency Analysis Study (see
Chapter 4 and Appendix B.1-B.2). As mentioned in Section 3.1, final examinations at the
Faculty of Applied Science and Engineering at the University of Toronto are available in
electronic format, with the text being readable by a computer string-recognition algorithm.
Using the PDF-to-TXT program from the second study, described in Section 4.1, 2254
exams were converted to a text-only format. The researcher created a text-cleaning
software program that eliminated special characters and nonsensical terms (words
containing digits, etc.), which is described in Appendix B.1-B.2. The total word count
of these documents is approximately 22.5 million words.
5.2 TF-IDF ALGORITHM AND MODIFICATION

The approach used in this investigation is the Term Frequency Inverse Document
Frequency (TF-IDF) algorithm, which mathematically determines key terms on a document,
using a comparator set of documents. This algorithm is discussed in Section 2.3 in the
context of the existing literature, and in papers by the author (the proceedings of CEEA and
ASEE in Appendix A.4-A.7). The TF-IDF equation is written as follows:
TFIDF = TF × IDF

where

TF = (# of occurrences of the word) / (total # of words), in a single target document

and

IDF = log( (# of documents) / (# of documents containing the word) ), in a set of comparator documents
The TF counts the number of occurrences of a particular word, and divides that
number by the total number of words in the target document, which is a simple measure of
frequency. The IDF is a measure of how characteristic a particular term is within a set of
comparator documents. It is calculated by dividing the total number of documents by the
number of documents in the set which contain at least one instance of that term, and then
takes the logarithm of this fraction. The choice of logarithm base only scales every score
by the same constant factor, so it does not affect the relative ranking of words.
The TF-IDF formula multiplies the TF and the IDF together and attaches the resulting
score to each unique word in the target document. A high TF-IDF score means that the word
is characteristic to the target document, whereas a low TF-IDF score means that the word is
not characteristic to that target document. For example, based on Zipf's Law and the data
shown in the previous chapter, the most frequently occurring words in a document tend to be
function words like "the" and "a". These words are likely to appear just as frequently in the
comparator set; as a result, the TF-IDF scores for these words tend to be low.
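The calculation just described can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the researcher's Visual Basic .NET implementation; the function name and documents are hypothetical:

```python
import math

def tf_idf(word, target_words, comparator_docs):
    """TF: frequency of the word in the target document.
    IDF: log of (number of comparator documents / number containing the word)."""
    tf = target_words.count(word) / len(target_words)
    containing = sum(1 for doc in comparator_docs if word in doc)
    idf = math.log(len(comparator_docs) / containing)  # assumes containing > 0
    return tf * idf

target = "the creep rate of the alloy".split()
comparators = [set("the exam has three questions".split()),
               set("creep deformation of the lattice".split())]
print(tf_idf("the", target, comparators))    # 0.0: appears in every comparator document
print(tf_idf("creep", target, comparators))  # positive: characteristic of the target
```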
The comparator set of documents used for the IDF score can be selected based on:
year of the exam, discipline, instructors, etc. The choice of comparator set changes the TF-
IDF score. Consequently, one benefit of this method is that new exams added to the dataset
affect the TF-IDF scores of existing words. This helps address the issue associated with
evolving language by reducing the stagnation of a vocabulary list that can occur with a
constantly-aging dataset.
The TF-IDF value is dependent on the degree to which the TF and IDF terms are high
or low. The TF is high when a term is frequently used in a target document. The TF is low
when the term is infrequently used in a target document. The IDF, by contrast, is high when
a word is infrequently used in a comparator set, and low when a word is frequently used in
the comparator set. Summarizing, the TF-IDF score is:
1. highest when a word is frequently used in a target document, but infrequently
used in the comparator set. This means that the word is characteristic to the
target document;
2. middling when a word occasionally appears in the target document and also
occasionally in the comparator set;
3. lowest when a word is commonly used in the comparator set regardless of its
frequency in the document set. This means that the word is not characteristic
to the target document.
A vector space model is one perspective that can help interpret the mechanics of the
TF-IDF statistic. Multiple documents in a collection can be viewed as a compilation of
vectors in a vector space – each term has one axis [154]. This means that information about
the order of words in a document is lost; words are treated independently of where they may
occur in a sentence. This is referred to as a “bag of words” model in the literature [155-157].
This is in contrast to a Boolean model, which simply records whether or not a term
appears. In TF-IDF, these sets of vectors are compared to one another using both
magnitude and angle [140, 158]. In the frequency-analysis study (see Chapter 4), this vector
space model perspective can be used to say that only the magnitude difference between each
word vector was calculated. That study did not provide much information with respect to
comparing individual words, because it does not incorporate a sense of context (direction)
but rather just magnitude. In using TF-IDF, the vector space perspective incorporates angles
between each of these individual word-vectors. This angle is developed by weighting the
term frequency of each word by the inverse document frequency of that word, based on a comparator
set of documents. By extension, this means that the TF-IDF statistic gives more information
about two vectors, compared to the previous study which just compared magnitude. The
additional contextual piece of information, the user-defined comparator set of documents, is a
feature which helps to better define the characteristics of words. Using this perspective, the
TF-IDF algorithm distinguishes vocabulary within documents and thus should theoretically
provide more discriminating information about characterizing vocabulary on documents than
by using frequency alone.
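In this vector space perspective, the angle between two bag-of-words vectors is conventionally measured with cosine similarity. The following Python sketch is illustrative only and is not part of the thesis software; the word weights are hypothetical TF-IDF-style scores:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two bag-of-words vectors, represented as
    dicts mapping word -> weight; word order is ignored, as in bag-of-words."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = lambda x: math.sqrt(sum(w * w for w in x.values()))
    return dot / (norm(u) * norm(v))

a = {"creep": 0.5, "grain": 0.2}
b = {"creep": 0.4, "lattice": 0.3}
print(round(cosine(a, b), 3))  # shared "creep" weight gives a similarity near 0.743
```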
The TF-IDF approach was experimentally coded and tested for a sample dataset.
However, it was found that the basic TF-IDF algorithm did not characterize
vocabulary on engineering exams as well as desired. To remedy this, the algorithm was
modified to provide more resolution, which could be used to better distinguish the
vocabulary on the input documents.
5.2.1 Modification of the TF-IDF Algorithm

The modification of the existing TF-IDF algorithm uses different comparator sets to
extract and amplify the characteristic vocabulary of comparator sets within a domain. The
novel approach calculates the TF-IDF score using exams in the same discipline as one
comparator set and contrasts that with the TF-IDF score produced using all engineering
exams as another comparator set. The difference between these scores for each word
increases the resolution, and possibly accuracy, in finding discipline-specific vocabulary.
The procedure is:
1. Compare each word in the target document to all documents in engineering, minus
those that are in the same discipline, and generate a TF-IDF score.
2. Compare each word in the target document to all documents within the same
discipline as that input document, and generate another TF-IDF score. This should
distinguish terms that are characteristic to that course.
3. Subtract the two scores for the word, and then repeat for all words in the target
document.
This method generates three wordlists – one from each of the above contexts and the
difference. The first wordlist should theoretically highlight terms that are characteristic of
the discipline. This is because the target is compared to documents external to its discipline
but still within the same domain, engineering. The second wordlist should theoretically
highlight terms that are characteristic to the target document. This is because the target
document is compared to documents within the same discipline. Subtracting the two terms
should yield a third term that theoretically highlights characteristic discipline-specific
vocabulary on the target document.
This produces a list where words that are both course-specific and discipline-specific
are given a high score, whereas all other types of words are given a lower score. This
modified algorithm of the TF-IDF equation can be expressed as:
TFIDF_mod = TFIDF_1 − TFIDF_2

TFIDF_mod = TF × (IDF_1 − IDF_2)

where

IDF_x = log( (# of documents) / (# of documents containing the word) ), in comparator set x

and x denotes the context: subscripts 1 and 2 represent context #1 and context #2
respectively. TF is the same for both terms because the target document is the same, so:

IDF_mod = IDF_1 − IDF_2

Incorporating the document counts expands the equation into:

IDF_mod = log( D_E / D_E,W ) − log( D_D / D_D,W )

IDF_mod = log( (D_E × D_D,W) / (D_E,W × D_D) )

And this produces the equation for the modified TF-IDF algorithm:

TFIDF_mod = TF × log( (D_E × D_D,W) / (D_E,W × D_D) )

where

D_E = # of documents in engineering, minus the discipline
D_E,W = # of documents in engineering, minus the discipline, containing the target word
D_D = # of documents in the discipline
D_D,W = # of documents in the discipline containing the target word
This is discussed in a paper by the author (reprinted in Appendix A.5). Based on the
proposed modification, the expected performance of the resulting algorithm is as follows:
1. D_E × D_D,W is large when there are many documents in the discipline containing the
target word. This makes IDF_mod larger, which amplifies the TFIDF_mod value. In
particular, this means that the word frequently occurs in the discipline but infrequently
in the rest of engineering, which implies it is likely discipline-specific.

2. Conversely, D_E,W × D_D is large when there are many documents in engineering
containing the target word, but not necessarily in the target discipline; in this case
IDF_mod gets smaller, which reduces the TFIDF_mod value. This means that the word
occurs frequently in engineering but is not unique to the discipline, which implies it
may not be discipline-specific.
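The two-context behaviour above can be sketched in Python. This is an illustrative fragment only (the study's implementation is the Visual Basic .NET program in Appendix C.2); the add-one smoothing that guards against words absent from a comparator set is an assumption of this sketch, not part of the thesis formulation:

```python
import math

def tfidf_mod(word, target_words, discipline_docs, other_eng_docs):
    """Modified TF-IDF: score against engineering exams outside the discipline
    (context 1) minus score against exams within the discipline (context 2)."""
    tf = target_words.count(word) / len(target_words)

    def idf(docs):
        containing = sum(1 for doc in docs if word in doc)
        return math.log(len(docs) / (containing + 1))  # +1 smoothing (assumption)

    # TF x (IDF1 - IDF2): TF is shared because the target document is the same
    return tf * (idf(other_eng_docs) - idf(discipline_docs))

score = tfidf_mod("dislocation",
                  ["dislocation", "creep", "dislocation"],
                  [{"dislocation", "creep"}, {"dislocation", "grain"}],
                  [{"circuit", "voltage"}, {"matrix", "vector"}])
print(score > 0)  # True: common in the discipline, rare elsewhere in engineering
```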
Although the input words are being treated with a “bag of words” model, the
calculations incorporate two different contexts, in the form of comparator sets of documents,
to help characterize the vocabulary. This is one step closer towards developing a future
strategy that incorporates word meaning into the automated approach in characterizing
discipline-specific vocabulary on an engineering exam.
By introducing a modification that amplifies the range of potential TF-IDF scores, the
resulting list appears to identify characteristic discipline-specific vocabulary on engineering
exams. Due to the iterative application of the TF-IDF equation across different comparator
sets, the following behaviours should occur. First, when there is a word that has a high term
frequency in a document, while occurring frequently in the discipline but not in all of
engineering, then the modified approach would boost the score of that word. Second, if that
word does not occur frequently in the discipline but is common to engineering, then this
approach should lessen its score. Therefore, the amplifying effect of the iterative approach
preferentially affects words that are characteristic of the target document and discipline.
5.3 COMPUTATIONAL APPROACH

To test whether the modified TF-IDF approach can be used to identify
characteristic discipline-specific technical vocabulary on engineering exams, the researcher:
1) Coded the modified TF-IDF algorithm into a software program that produces
ranked word lists from target documents (engineering final exams)
2) Tested the quality of the generated lists using subject-matter experts.
The first process is heavily computer programming-based. The first major
programming component is to develop input conditioning software that prepares the exams
for calculation. The second major programming component is to develop the software that
processes each input document using the modified TF-IDF algorithm. These components
will be discussed in Section 5.3.1, with a more detailed discussion in Appendix B.1 and C.1,
and the code itself replicated in Appendix B.2 and C.2.
The second process is based on interviewing human subject-matter experts, and is
presented in Section 5.4. This aspect of the research evaluates the efficacy of the
computational approach in replicating human expertise in identifying characteristic
discipline-specific vocabulary. In this step, eleven faculty members scored words, and these
scores were correlated to the outputs of the first step to measure efficacy of the modified TF-
IDF algorithm.
5.3.1 Software Development

This section describes the first major component of this study – developing the
software necessary to prepare the input documents for electronic analysis, and to perform the
TF-IDF calculations. A graphical representation of the methodology is shown in Figure 6.
[Figure 6: a flowchart with four stages, from top to bottom. PROCESSING: preparing and cleaning raw text from engineering exams (Adobe Acrobat X). ORDERING: ranking characteristic keywords using the TF-IDF computational method, comparing an exam to all exams in Engineering (TFIDF1) and to all exams in the same discipline (TFIDF2) (Visual Basic .NET). DIFFERENTIATING: extracting the discipline-specific language used in that engineering course (TFIDF1 − TFIDF2). POST-PROCESSING: eliminating duplicates and sorting in decreasing order of TF-IDF score (MS Excel). Each stage produces a wordlist.]

Figure 6 - Shows the major components of the computational approach in chronological order, starting at the top. This is reprinted from the paper in Appendix A.6.
5.3.1.1 Document Preparation

A total of 2254 unique electronically-available final exams were used in
this study. The exams were created between 1999 and 2011 (inclusive). The exams span all
of the engineering disciplines at this institution, including: aerospace, biomedical, chemical,
computer, electrical, industrial, materials, mechanical, and mining engineering. The exams
cover a breadth of engineering disciplines and years of study, from first-year to senior-year
courses. Discussion about why this artifact of the engineering learning environment was
chosen is given in Section 3.1.
The input documents came in a variety of formats, so they needed to be converted
into machine-readable form. Specifically, the exams were in one of the following file
formats: .jpg, .bmp, .PDF, .doc, .docx, and .txt. Each input file needed to be standardized
into one common format for usability. To accomplish this, each of the input files was first
converted into a widely-used document container format called “Portable Document Format”
(PDF). This file format is commonly used to represent documents in a manner independent
of application software, hardware, or operating system. The PDF format also encapsulates
document properties including different fonts, graphics, and the information needed to
display them. Originally a proprietary standard introduced by Adobe® Systems Inc., this file
format is widely-used today to ensure that documents are accurately replicated for the reader
as intended by the original author(s). The majority of input documents used for this study
were already in PDF format.
Since the documents collected were over the span of 12 years, which coincide with
the diffusion of the PDF format as an industry standard, there were improvements in the later
years’ PDF documents that were not necessarily present in the documents created in the early
2000’s. Specifically, newer PDF documents contain embedded metadata that can include
text-only information, security features, better rendering, etc. All files in the dataset were
converted into the most-recent version of the PDF-standard to ease the processing. Files not
already in PDF format were converted using a freeware package called PDFCreator, which is
an online-accessible PDF conversion tool. All of the files were then arranged into folders
corresponding to discipline and course name.
Once all of the documents were in PDF format, the researcher developed a program
that extracts only the meaningful text from each file and places that extracted text into
another file. The majority of the input files did not have computer-selectable text. The
researcher employed the Optical Character Recognition (OCR) tool of Adobe Acrobat X® to
render the PDF files with a text-based layer above each word in the file. This tool matches
each glyph in the input document to the most likely character in the ASCII table, creating
a file that has user-selectable text.
This process, however, has some faults. Though the vast majority of characters
chosen by the OCR tool are accurate, errors do occur: for example, the word "Indiana" may
be overlaid with "Ind1ana" if the scan quality is poor. This produced some nonsensical
terms in the dataset. These, and other
types of problematic character groupings were removed by a filter created by the researcher
and discussed in Appendix B.1, the code for which is reprinted in Appendix B.2.
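A minimal sketch of such a filter is shown below. The actual filter is the program reprinted in Appendix B.2; the regular expression here is an assumption of this sketch, keeping only purely alphabetic tokens:

```python
import re

def clean_tokens(text):
    """Keep only alphabetic tokens, dropping OCR artifacts such as words
    containing digits ("Ind1ana") or stray special characters."""
    return [t for t in text.split()
            if re.fullmatch(r"[A-Za-z][A-Za-z\-']*", t)]

print(clean_tokens("Ind1ana grain bound@ry creep 3x"))  # ['grain', 'creep']
```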
This program was created after multiple iterations using the object-oriented
Microsoft® VB.NET programming language, and can be tuned further in the future.
The program also extracts the text from each PDF input file, creates a unique text
file (.txt) with the same name as the input document, and inserts the extracted
text into this new file. Therefore, each PDF-input file has a corresponding text-only file, and
these were used as the input for the TF-IDF program.
5.3.1.2 Coding for the TF-IDF Calculations

After input conditioning, the modified TF-IDF algorithm was applied to the data
using a researcher-developed software program written in Visual Basic.NET. This program
assigns a score to each word in a user-specified input text file (i.e. target document). This
score, as discussed in Section 5.2 and specifically in Section 5.2.1, is a measure of how
characteristic each word is to a target document, when compared to others in user-defined
comparator sets.
Each word in a target document is assigned to multiple dynamically-allocated
matrices and double-length integers, and then given a score which is
computed using the modified TF-IDF algorithm discussed in Section 5.2. This process
utilizes two large comparator sets of documents: engineering exams of the same discipline,
and engineering exams of dissimilar disciplines. The program developed is discussed in
greater detail in Appendix C.1, with the code reprinted in Appendix C.2.
5.3.1.3 Post-Processing of Output

The output from each run of the TF-IDF program is written to a file called
“output.txt”. This file contains all of the words from the input exam (target document), as
well as their corresponding TF-IDF scores. The main component of the post-processing is to
import this data into spreadsheet software, specifically Microsoft Excel®. This allows the
researcher to automatically remove duplicates, sort, and perform mathematical calculations
which cannot be performed directly in the text-file. This process is discussed in Appendix
C.1.
The Excel program is used to automatically remove redundant pairs of cells from the
full list. Using the "remove duplicates" command, any word-score pair that occurs more
than once is reduced to a single instance. This leaves only unique entries: each word
used in the input document appears exactly once in the wordlist, along with its
corresponding TF-IDF score.
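The same deduplicate-and-sort step can be expressed programmatically. The sketch below is illustrative only, since the study performed these steps in Excel; duplicate pairs are assumed to carry identical scores, as stated above:

```python
def postprocess(pairs):
    """Remove duplicate (word, score) pairs and sort by decreasing score,
    mirroring the "remove duplicates" and sort steps performed in Excel."""
    unique = dict(pairs)  # identical duplicates collapse onto one entry
    return sorted(unique.items(), key=lambda ws: ws[1], reverse=True)

raw = [("dislocation", 0.046749), ("grain", 0.015939), ("dislocation", 0.046749)]
print(postprocess(raw))  # [('dislocation', 0.046749), ('grain', 0.015939)]
```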
5.3.2 Results using the Modified TF-IDF Algorithm

Initially, the researcher compared the "traditional" unmodified TF-IDF method to the
modified method. The researcher also began to explore the kinds of words that appeared to
preferentially have higher TF-IDF scores (see Appendix A.4). Later work investigated the
modelling of the TF-IDF scores graphically, to compare disciplines, courses, and years of
study to one another (see Appendix A.5-A.6). Additionally, some of the later work was a
collaborative effort with communications instructors at the University of Toronto, who
situated the application of this modified TF-IDF approach in classroom learning, and oral
presentations (see Appendix A.8-A.9). Six peer-reviewed conference papers were produced
from these data, including three presented at the Canadian Engineering Education
Association (CEEA) annual conference, and three presented at the American Society of
Engineering Education (ASEE) annual conference. These papers are reprinted in Appendix
A.4-A.9.
A condensed example of a third-year Materials Science Engineering wordlist
generated by the code is shown in Table 3 below.
Table 3 - Shows a sample wordlist from a third-year Materials Science Engineering course
Rank  Word           Modified TF-IDF Score
1     dislocation    0.046749
2     dislocations   0.016992
3     cry            0.016379
4     grain          0.015939
5     crystal        0.014845
6     stress         0.013639
7     material       0.011965
8     strength       0.010907
9     deformation    0.008955
10    creep          0.008446
11    partials       0.008165
12    ofll           0.007426
13    intermetallic  0.007198
14    subgrain       0.007193
15    tensile        0.007181
16    metallic       0.006853
17    gb             0.006749
18    hardening      0.006659
19    boundaries     0.006414
20    hallpetch      0.006259
21    crss           0.00569
22    composite      0.005598
23    strengthening  0.005518
24    elastic        0.005376
25    lattice        0.005137
…
200   fact           0.000435
…
350   able           -0.000104
…
450   equals         -0.001426
The sample wordlist in Table 3 shows what a typical output from one instance of the
computational program looks like. The first column shows word rank in decreasing order of
TF-IDF score. The second column lists the words found on that exam. The third column shows
the TF-IDF score produced using the modified TF-IDF algorithm discussed in Section 5.2.1.
This sample wordlist shows the top 25 ranked words, and then snapshots at 200, 350, and
450 words, which are representative of the entire wordlist. These data are reproduced from the
ASEE publication located in Appendix A.6. The list shows the prevalence of discipline-
specific vocabulary near the top of the wordlist. This kind of output is typical of other
courses in other disciplines and years as well. Other key observations from Table 3 include
the tendency of words near the top of the list to follow an exponential decline of modified
TF-IDF score as the rank increases, and how the modified TF-IDF score changes from
positive to negative as the rank increases. This shows us that there are terms on the target
document that are characteristic of the discipline, namely ones with high scores. As these
scores near zero, we can interpret this as words that are just as frequent in the discipline as
the larger dataset. Words with a negative score are more frequent in the larger dataset than in
the discipline. As expected, words with a low score do not appear to characterize the input
document nor do they appear to be discipline-specific. This is consistent among all of the
wordlists created using the computational approach. More thorough discussion of these
wordlists is given in the publications reprinted in Appendix A.4-A.7.
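This sign-based reading of the scores can be summarized in code. The following is a sketch, where the near-zero tolerance is a hypothetical cut-off rather than a value taken from the thesis:

```python
def interpret_score(score, tol=1e-4):
    """Read a modified TF-IDF score as described above: clearly positive
    scores mark characteristic, discipline-specific terms; scores near zero
    mark words that are as frequent in the discipline as in the larger
    dataset; and negative scores mark words that are more frequent in the
    larger dataset than in the discipline."""
    if score > tol:
        return "discipline-specific"
    if score < -tol:
        return "more frequent in larger dataset"
    return "uncharacteristic"
```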
The data produced by the computational approach can also be plotted to graphically
depict the behaviour of TF-IDF scores. A visual representation of the complete wordlist for
the same third-year Materials Science Engineering course, Fracture and Failure of Materials,
is seen in Figure 7. The horizontal axis represents word rank (rank 1 has the highest TF-IDF
score), while the vertical axis represents the TF-IDF score computed using the method
described in Section 5.1.
Figure 7 - Shows the TF-IDF scores for a sample course, Fracture and Failure of Engineering Materials. The horizontal axis represents rank of the word along the wordlist, while the vertical axis represents modified TF-IDF score for each word. This is reproduced from the publication reprinted in Appendix A.6.
The data appear to exhibit a declining slope with increasing rank. The rate of change
of the slope declines until a plateau region appears, as seen in the sample case above, Figure
7, between word #60 and just prior to word #400. This long horizontal section represents
words that are just as frequently occurring in the discipline as they are in all of engineering.
These words are labelled “uncharacteristic” terms because of how similarly prevalent they
are regardless of document. Figure 8 below shows an approximate distinction between these
“regions” on the graph. This observation is discussed in the publications in Appendix A.5-
A.6.
Figure 8 - Shows approximately the three regions on the graph that correspond to areas where the TF-IDF score is (1) amplified, (2) not amplified nor suppressed, and (3) suppressed.
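The three regions sketched in Figure 8 can also be separated programmatically. This is a hedged sketch in which the plateau tolerance is an illustrative threshold, not one taken from the thesis:

```python
def split_regions(ranked_scores, tol=1e-3):
    """Partition a wordlist's scores (already ranked in decreasing order)
    into the three regions of Figure 8: (1) amplified, clearly positive;
    (2) neither amplified nor suppressed, near zero; and (3) suppressed,
    clearly negative."""
    amplified = [s for s in ranked_scores if s > tol]
    plateau = [s for s in ranked_scores if -tol <= s <= tol]
    suppressed = [s for s in ranked_scores if s < -tol]
    return amplified, plateau, suppressed
```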
The observed behaviour of TF-IDF scores remains largely consistent among all of the
courses that were processed through the modified algorithm. This is true across different
courses, disciplines and years of study. The differences, however, include where the graph
“drops off” at the end, and this changes based on the course because there are a different
number of words used in each course. Specifically, the number of words controls how many
datapoints there are, and the higher the number of words, the farther the graph extends. The
following figures, Figure 9 and Figure 10, show datasets compared across different
disciplines, and also the same discipline across different years. These figures are
representative of all datasets across this study, and can be used to show similarities and
differences among vocabulary use across courses.
Figure 9 - Shows the Materials Science Engineering (MSE) exam when compared to exams from Biomedical Engineering (BME) and Aerospace Engineering (AERO)
Figure 10 - Shows a comparison across different years of Materials Science Engineering exams
The data, as seen in Figure 9 and Figure 10 above, show that the TF-IDF scores
appear to follow similar behaviour as the word rank increases. This is discussed in greater
detail in the publications reprinted in Appendix A.5-A.6. It appears that the first hundred
words on the wordlists produced for each course are generally characteristic of the course
being examined, and the discipline. Furthermore, the plotted data show that this observation
is consistent across different disciplines, courses, and years of study. This means that the
modified TF-IDF algorithm results appear to be repeatable regardless of the exam, just as
long as the comparator sets are appropriate for the exam chosen. When these data are viewed
in the context of the words themselves, as in Table 3 for example, it appears that words that
have higher TF-IDF scores are discipline-specific. This observation is uniform across
different datasets as well.
We have made an assumption, based on theory and reasoning, that the words which
have a high modified TF-IDF score are characteristic of the discipline of the document being
processed. However, there are several different courses, disciplines, and years of exams that
are contained within this large dataset of final exams. The researcher himself is not an expert
in identifying characteristic vocabulary across such a rich dataset. It is therefore important to test
this assumption with individuals who are best-suited to gauge the validity and reliability of
these wordlists in capturing discipline-specific vocabulary. This leads to the second part of
this study, where subject-matter experts are presented with words that appear on final exams
for their course, and asked to rate how discipline-specific those words are. These ratings
are then compared to the ratings produced by the computational approach. If there is
measurable and statistically significant agreement, then it becomes valid to suggest that the
computational approach can replicate subject-matter expertise in identifying and
characterizing discipline-specific vocabulary.
5.4 EVALUATION STUDY

The goal of this study is to evaluate whether the computational method is capable of
identifying discipline-specific vocabulary on existing engineering final exams. In this study,
subject-matter experts were tasked with identifying discipline-specific vocabulary from the
same set of sample documents used in the computational study. By observing a correlation
between these two approaches, we can measure if and how well the computational approach
works.
Faculty members who set the exam are the most comprehensive subject-matter
experts for this dataset. These individuals are the ones best suited to assess whether an
external aid, like the software program, is able to accurately identify the discipline-specific
vocabulary used in the exam. The participants in this study are current faculty members of
the Faculty of Applied Science and Engineering at the University of Toronto.
A diagram of the study is shown in Figure 11. This shows the three major
components of the study – pre-processing, processing, and post-processing – with a brief
description of the main steps. This study will be published as a paper for the 121st American
Society for Engineering Education Annual Conference, to be presented at the Engineering
Research and Methods division in June 2014. A pre-print is located in Appendix A.7.
[Figure 11 flowchart: (1) recruiting participants and preparing wordlists - ethics review, recruitment, assigning quintiles, drawing the 100-word sample, and scheduling; (2) training, calibration, and administering the study - explaining the study, calibrating participants on the 5-point scale, and administering the survey; (3) analyses - compiling participant scores into Excel spreadsheets and comparing them to TF-IDF scores to gauge the efficacy of the computational approach.]
Figure 11 - Shows the major components of the study that evaluates the efficacy of the computational approach using subject-matter experts. This is reproduced from the paper in Appendix A.7.
This section will outline the major steps of the methodology, which have been
discussed previously in the paper reprinted in Appendix A.7. A wordlist was created by the
researcher for each of the 10 exams by:
1. Processing a complete full-length wordlist using the computational method,
2. Ranking the words in decreasing order of TF-IDF score,
3. Splitting the list into five “equally-sized” bins called quintiles, and
4. Selecting words from each bin until a 100-word sample wordlist was created.
The word selection was deliberately biased: more words were drawn from the
higher TF-IDF bins than from the lower ones. The top bin contributed the 30 highest-ranked
words, and the subsequent bins contributed 30, 20, 10, and 10 words, respectively.
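The binning and biased sampling steps above can be sketched as follows. The function names are hypothetical, the input is assumed to be a wordlist already ranked in decreasing order of TF-IDF score, and which words are taken within each bin is illustrative:

```python
# Bias toward the high-TF-IDF quintiles: 30/30/20/10/10 from quintiles 5..1.
SAMPLE_COUNTS = {5: 30, 4: 30, 3: 20, 2: 10, 1: 10}

def split_into_quintiles(ranked_words):
    """Split a ranked wordlist into five roughly equal bins; quintile 5
    holds the highest-scoring words, quintile 1 the lowest."""
    n = len(ranked_words)
    size = n // 5
    bins = {}
    for q in range(5, 0, -1):
        start = (5 - q) * size
        end = start + size if q > 1 else n  # last bin absorbs any remainder
        bins[q] = ranked_words[start:end]
    return bins

def sample_100_words(bins):
    """Draw the biased 100-word sample, then alphabetize it so the survey
    list carries no hint of the underlying TF-IDF ranking."""
    sample = []
    for q, count in SAMPLE_COUNTS.items():
        sample.extend(bins[q][:count])  # within-bin choice is illustrative
    return sorted(sample)
```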
Separately, 11 participants (i.e. faculty members) were recruited. Each participant
was trained to use a 5-point ranking scale. This scale is used to quantitatively measure the
degree to which the expert thinks a word is discipline-specific. Unknown to the participant,
this scale is intended to map to the five bins which were used to split up the full wordlist. A
word in quintile #1 should have a participant-ranked score of #1, and so on for each quintile.
The alphabetized 100-word sample wordlist was then given to the participant, and they ranked
each word using the 5-point scale. The participant returned the completed wordlist to the
researcher, and the researcher checked for statistical correlation between quintile and
assigned-score using a statistics package called IBM SPSS v21.
The steps presented are discussed in greater detail below, and in the paper reprinted in
Appendix A.7.
Step 1: Ethics Review - An ethics protocol was submitted to and approved by the Board
of Ethics at the University of Toronto.
Step 2: Recruitment - Eleven participants were recruited by email. These people
are current faculty members at the Faculty of Applied Science and Engineering at the
University of Toronto, who have taught at least one course that has an exam contained in the
repository of exams available to the researcher. A sample recruitment document is presented
in Appendix D.1.
Step 3: Input Selection - Ten exams developed by the recruited faculty were
identified. These represent a breadth of engineering disciplines and years of study. The goal
was to maximize the diversity of the exams used for the study. One of these exams was from
a design course. Two of the faculty members already recruited had taught that course and
were subject-matter experts in this area.
Step 4: Input Preparation - Eleven 100-word sample wordlists were created.
Specifically, the complete wordlist from each participant’s course was first generated and
ranked in decreasing order of TF-IDF score. This process is discussed in Section 5.3.1.
Then, each wordlist was split into five partitions, called quintiles. Each quintile
contains roughly the same number of words. The top quintile, #5, contains words that have
the highest TF-IDF scores, whereas the lowest quintile, #1, has a roughly equal number of
words with the lowest TF-IDF scores. The next step was to downsample this full list of
words to 100-words for the survey. Words from each quintile were selected until 100 words
were chosen in total. Since the goal is to gauge whether high-scoring words contain domain-specific
jargon, words from the higher quintiles (i.e. higher TF-IDF scores) were selected more
often than words from the lower quintiles. Specifically, the words for the 100-word sample
survey were selected as follows: 30 words from top (#5) quintile, 30 words from second-
highest (#4) quintile, 20 words from middle (#3) quintile, 10 words from each of the lowest
(#2, and #1) quintiles. The resulting wordlist was alphabetized to rearrange words
irrespective of TF-IDF score, and stripped of TF-IDF scores.
Step 5: Interviewing, Calibration - Each participant was scheduled for a 1-hour
meeting. At the meeting, the participant was provided with an “Informed Consent”
document, as seen in Appendix D.2. This required form was explained by the researcher,
and signed by each participant in the study. A copy of this form was given to the participant.
The study was briefly explained to the participant. This exercise reaffirmed the goal and
purpose of the research, and emphasized the importance of providing thoughtful input. The
experimenter used a semi-scripted approach to discuss the importance of using accessible
language in the classroom, and the critical need of subject-matter expertise to evaluate the
effectiveness of an automated tool to help do so. The participant was told that they would be
provided with a randomized list of 100 words, extracted from final exams of courses they
have instructed in the past. The participants were also told that these exams were gathered
from a publicly available online repository of existing final exams, intended to be used for
study and research purposes.
A brief calibration exercise preceded data collection. The participant was given a
print-out of the scale, and the researcher read roughly five words aloud. Some of these words
were pertinent to the course and discipline, and some were not. For example, words like
“magnetism” would be pertinent to an Electrical Engineering course, whereas words like,
“green” and “walk”, would be more general. The participant briefly discussed how they
would score these words, and after they were confident in using the scale, the study
progressed.
Step 6: Interviewing, Instructions - The participant was presented with the 100
alphabetized words in a two-column spreadsheet layout, with the first column having one
word per cell element, and the second column blank. The participant was asked to enter a
numerical score into the blank element next to each word. The scale is provided in
Appendix D.3. The number assigned is a representation of how discipline-specific that word
is, and how critical it is for a student to understand that word in the discipline, according to
the participant. If a word is critical to the course, discipline, or both, then the participant
would rate that word highly according to the 5-point scale provided to them. If the word is
neither critical to the course nor the discipline, then the word would be scored a low value. A
sample survey is available in Appendix D.4. The corresponding full-length wordlist from
which that was created is shown in Appendix D.5.
Each participant was given as much time as they needed, but all completed the survey
within 30 minutes. The experimenter did not leave the room for this part of the study even
though he suggested it as an option – none of the participants said they felt affected by the
presence of the experimenter in the room. There was no conversation as the participant
completed the survey, even though the experimenter offered to provide clarification at any
time.
Step 7: Interviewing, Debriefing - After completing the study, the participant was
thanked for their time and debriefed. After the survey was collected by the experimenter, the
participant was given a hardcopy of the complete wordlist for their course, ranked in
decreasing order of TF-IDF score. In addition to this wordlist, each participant was offered a
hardcopy of a short academic paper explaining the TF-IDF program development process
(written by the experimenter), and reprinted in Appendix A.5. The purpose for doing so was
to provide additional background about the study, and to encourage participants to provide
additional feedback about the study itself. To date, no further feedback has been received
beyond positive comments about the purpose and execution of the study.
Step 8: Tabulation of Data - After the data were collected and the participants were
debriefed, the data were manually entered into computer spreadsheets by the researcher.
There are four columns in the resulting spreadsheet for each exam – a column containing
each of the 100 words, a TF-IDF score column, a quintile-ranking column, and a participant-
score column. A sample spreadsheet is shown in Section 5.5.2, as Table 5.
Step 9: Analysis of Data - The data were then analyzed to compare the participant-
assigned scores and the TF-IDF scores. The goal is to understand if any correlations exist
between the subject-matter expert’s ranking of words and the modified TF-IDF
computational approach developed by the experimenter. The data were analyzed by
observing if there are significant differences between the participants and the computational
approach. Using the 5-point scale that each participant used to score each word, these
participant responses were compared to the corresponding quintile bins, which grouped
words of similar TF-IDF scores. To perform the analyses, the experimenter used
spreadsheet software (Microsoft Excel® 2013) and a software-based statistics package (IBM
SPSS version 21). The results are discussed in the following section.
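The analysis itself was run in Excel and SPSS. As a rough, software-agnostic sketch of the two headline statistics (not the author's code), Pearson's R and Spearman's correlation between quintile bins and participant scores can be computed directly:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def rank_with_ties(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's correlation is the Pearson correlation of the ranks."""
    return pearson_r(rank_with_ties(x), rank_with_ties(y))
```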
5.5 RESULTS OF THE AUTOMATED INDEXING STUDY

The results presented in this section supplement the conference papers reprinted in
Appendix A.3-A.7. The collection of exams selected as the dataset for this part of the study
is presented in Section 5.5.1. A sample of the data collected from one trial is discussed in
Section 5.5.2. Data collected from all trials are presented in Section
5.5.3. The statistical analyses including correlations and data reliability measures are
presented in Section 5.5.4. These statistical analyses include Pearson’s R, Spearman’s
Correlation, Pearson’s Chi-Square, and Cronbach’s Alpha. Though some of these measures
might be considered redundant, the multi-disciplinary nature of this study warrants reporting
statistical methods appropriate to all of the disciplines that intersect this investigation. One
special case is used to highlight the limitations of the program, and the data resulting from
this evaluation are presented in Section 5.5.5.
In addition to the results presented here, more detailed data are included in Appendix
E. These include individual case-by-case statistical analyses of each of the courses used in
this study. All of these results are then discussed in the context of the other studies and
literature in Chapter 6, with an explicit set of novel scholarly contributions presented in
Chapter 7.
5.5.1 Courses Selected for this Study

The researcher has developed complete wordlists for over 30 different courses in
engineering, but only a subset of 10 courses were used in this part of the study, as presented
in Table 4 below. The 10 exams were chosen based on breadth and year of study within
engineering. The intent was to choose courses so that all engineering disciplines at the
University of Toronto would have representation, from a sample across first-year (freshman)
to fourth-year (senior) undergraduate studies. The left column shows the course name and
course code, the second column shows the discipline that exam belongs to, the third column
shows the comparator set used to process the exam, and the fourth column identifies the year
of study in which this exam was administered.
Table 4 - Shows the course exams used for the evaluation study
Course Name and Code | Discipline | Comparator Set | Year of Study
Engineering Strategies and Practice I, APS111 (x2 participants) | Common across Engineering | APS: Applied Practical Science | 1
Advanced Reactor Design, CHE412 | Chemical Engineering | CHE: Dept. of Chemistry and Chemical Engineering | 3 and 4
Construction Management, CIV280 | Civil Engineering | CIV: Dept. of Civil Engineering | 2 and 3
Electrical Fundamentals, ECE110 | Common across Engineering | ECE: Dept. of Electrical and Computer Engineering | 1
Electric and Magnetic Fields, ECE221 | Electrical and Computer Engineering | ECE: Dept. of Electrical and Computer Engineering | 2
Introduction to Psychological Science for Engineers, MIE242 | Industrial Engineering | MIE: Dept. of Mechanical and Industrial Engineering | 2
Operations Research I, MIE262 | Industrial Engineering | MIE: Dept. of Mechanical and Industrial Engineering | 2 and 3
Design and Analysis of Software Systems, MIE350 | Industrial Engineering | MIE: Dept. of Mechanical and Industrial Engineering | 3 and 4
Introduction to Materials Science, MSE101 | Common across Engineering | MSE: Dept. of Materials Science and Engineering | 1
Mechanics, CIV100 | Common across Engineering | CIV: Dept. of Civil Engineering | 1
5.5.2 Sample Dataset for One Trial

Sample results for one trial of the study are shown in Table 5 below. This table is
reproduced from a conference paper written by the researcher and reprinted in Appendix
A.7. The table shows the results for a first-year electrical fundamentals course, and the
correlation between the TF-IDF binned-quintile scores and the human-participant scores.
The first column, on the left, shows the rank of each word in decreasing order of TF-IDF
score. The second column shows the word, while the third column shows the TF-IDF score.
The fourth column from the left shows the score out-of-five assigned by the human subject-
matter expert, and the last column shows the quintile-bin into which the word was sorted
based on TF-IDF score. The rows are colour-coded to match the quintile number. Two
correlations (explained in more detail in Section 5.6.1) are presented at the top-right corner
of the table. The correlation at the top, in yellow, is between the participant-assigned score
column and the quintile-bin column. The one immediately below shows the correlation
between those same columns, but only for the words in quintiles ‘1’ and ‘5’.
Table 5 - Shows a sample trial wordlist from a first-year electrical fundamentals course.

100-word CORRELATION (using full 5-pt scale): 0.7165
100-word CORRELATION (using only extremes of 5-pt scale): 0.9272

RANK (/100) | WORD | TF-IDF SCORE | PARTICIPANT-ASSIGNED SCORE (/5) | QUINTILE-RANK (/5)
1 | circuit | 0.033323128 | 5 | 5
2 | voltage | 0.015487884 | 5 | 5
3 | electric | 0.014911103 | 5 | 5
4 | capacitor | 0.009280436 | 5 | 5
5 | resistor | 0.00906219 | 5 | 5
40 | result | 0.000262347 | 3 | 4
41 | motor | 0.000260432 | 3 | 4
42 | discontinuous | 0.000254686 | 3 | 4
43 | tesla | 0.000239045 | 5 | 4
44 | deactivated | 0.000227847 | 3 | 4
70 | associated | 0.000121868 | 3 | 3
5.5.2.1 Observations from the Trial

There are several items to note in Table 5. First, the words near the top of the list
appear to be more discipline-specific than the words near the bottom of the list. This
observation is validated by the human subject-matter expert – the faculty member who taught
this course.
The data show that the correlation is dependent on whether the full 5-point scale is
used or if only the ‘1’ and ‘5’ quintile bins are used. If the full scale is used for the
correlation, it is 0.717. However, if only the 1 and 5 quintiles are examined, the correlation
is much higher, at 0.927. A perfect agreement between the computational approach and
human subject-matter expert would mean that the correlation is ‘1’. The observation about
using the extremes of the scale to achieve a higher correlation is typical for all eleven trials.
(Table 5, continued)
71 | respectively | 0.000121452 | 1 | 3
72 | half | 0.00011827 | 1 | 3
73 | results | 0.000117417 | 3 | 3
74 | losses | 0.000112727 | 4 | 3
81 | cannot | 2.31533E-05 | 1 | 2
82 | indicate | 2.30839E-05 | 3 | 2
83 | generated | 2.05447E-05 | 3 | 2
84 | difficulty | 2.03236E-05 | 1 | 2
85 | right | 1.88357E-05 | 1 | 2
91 | inside | -9.57969E-05 | 1 | 1
92 | variety | -0.000101296 | 1 | 1
93 | of | -0.000115615 | 1 | 1
94 | at | -0.000124816 | 1 | 1
95 | place | -0.000125485 | 1 | 1

The data suggest that there is a good correlation between the computational approach
and the human participant for the case shown above. So, in the context of the sample course
chosen above, it appears that the computational method is very likely to accurately identify
discipline-specific vocabulary on this particular target document.
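The two correlations reported in Table 5 differ only in which rows they include. The filtering step can be sketched as follows (with illustrative data in the usage below, not the trial above):

```python
def pearson_r(x, y):
    """Pearson correlation, used here to compare the two variants."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def extremes_only(quintiles, scores):
    """Keep only the word pairs whose quintile bin is 1 or 5, mirroring the
    second correlation reported in Table 5."""
    kept = [(q, s) for q, s in zip(quintiles, scores) if q in (1, 5)]
    return [q for q, _ in kept], [s for _, s in kept]
```

Running `pearson_r` on the unfiltered lists gives the full-scale figure; running it on the filtered lists gives the extremes-only figure.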
5.5.3 Summary of Quantitative Results for All Courses
5.5.3.1 Summary of Participant Scores

Figure 12 below shows the data collected from the participants as plotted on a bar
chart. This chart summarizes and presents information about which participant scores were
assigned to which quintile, across all courses. The horizontal axis along the top shows the
computed quintile score for the words. The horizontal axis along the bottom shows the
participant scores for the words. The vertical axis represents the total number of times that
quintile/participant-score combination was recorded.
The data clearly show that the high-quintile words are frequently ranked a ‘5’ by the
experts. Furthermore, a quintile score of ‘1’ is frequently scored a ‘1’ by the subject-matter
experts. The experts also frequently ranked the words in intermediate quintiles – ‘2’, ‘3’, and
‘4’ – with a score of ‘1’. This means the experts displayed a tendency to rank words in
quintile ‘5’ very highly (i.e. ‘5’), but scored essentially all of the other words very low (i.e.
‘1’), i.e. the participants employed a bimodal scoring strategy. They under-utilized scores of
‘2’, ‘3’, and ‘4’, and overused a score of ‘1’. This might suggest a bias that over-emphasizes
the low score of ‘1’ over intermediate scores.
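The counts behind Figure 12, and the bimodal-scoring observation, can both be derived from a simple cross-tabulation. This is a sketch using invented data, not the study's:

```python
from collections import Counter

def crosstab(quintiles, participant_scores):
    """Count how often each (quintile, participant-score) combination was
    recorded, i.e. the bar heights plotted in Figure 12."""
    return Counter(zip(quintiles, participant_scores))

def extreme_fraction(participant_scores):
    """Fraction of scores at the extremes ('1' or '5'); a high value
    reflects the bimodal scoring strategy noted above."""
    hits = sum(1 for s in participant_scores if s in (1, 5))
    return hits / len(participant_scores)
```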
Figure 12 - Shows the relationship between quintile and participant-assigned scores. The top horizontal axis is the quintile score, and the bottom horizontal axis is the participant-assigned score. The vertical axis represents the count for the number of times that combination was made.
5.5.3.2 Summary of Participant Scores versus Computational Scores

The data for each of the 11 trials are shown in Figure 13 and Figure 14. Both of
these figures show the distribution of scores across all exams, including APS111 which was
used twice. Both figures are structured the same, but whereas Figure 13 shows the subject-
matter expert scores, Figure 14 shows the TF-IDF binned-quintile scores. The horizontal axis
shows the course code of the exam. Each of the coloured bars represents the participant score
(Figure 13) or the quintile score (Figure 14). The colours of these bars represent whether
the score is a ‘1’ (blue), ‘2’ (green), ‘3’ (beige), ‘4’ (purple), or ‘5’ (yellow). The height of
these bars, the vertical axis, is the count of the number of times that score was assigned.
Figure 13 shows that the blue bars are almost always the highest bar for each course.
This again shows that participants are over-utilizing a score of ‘1’, and under-utilizing the
intermediate numbers.
A more in-depth look at Figure 13 shows that MIE262 and MIE350 have a
considerable number of words that were ranked a ‘1’ by their subject-matter expert. These
courses are “Operations Research 1” and “Design and Analysis of Software Systems”, and
subsequently also have among the lowest correlation values between the computational
approach and subject-matter expert (see Appendix E). While the first course has a large
design component, the second uses very specialized sets of symbols in that exam. The
former case is investigated in more detail using a proxy first-year design course case study
presented in Section 5.5.5, while the effects of the latter can be explained by poor input
conditioning of words used in the exam. This input conditioning step, as discussed in
Appendix B.1, is incapable of accurately processing exams with symbols and special
characters, and this likely leads to an inaccurate wordlist, resulting in the high count of ‘1’
scores.
Figure 13 also shows that the second highest count occurs when the participant-
scores are a ‘4’ or ‘5’. This suggests that participants are actively finding discipline-specific
vocabulary on their wordlists. This also corresponds to Figure 14, where ‘4’ and ‘5’ scores
also have a high count. A visual comparison shows that there is agreement between the
computational scores and the human subject-matter expert scores for these cases, and is a
visual representation of correlation (discussed quantitatively in Section 5.5.4).
Figure 13 - Shows the count of participant scores for all exams grouped by exam
Figure 14 - Shows the count of TF-IDF binned-quintile scores for each exam
The appearance of a double-sized APS111 dataset in Figure 13 and Figure 14 is
because that course was scored by two subject-matter experts. When the data for this course
are compared to the other courses, a high count of ‘1’ scores in Figure 13 suggests that both
experts have found minimal discipline-specific vocabulary in that course. Figure 14,
however, would suggest that the majority of words should be ranked a ‘4’ or ‘5’. This
difference is further investigated in Section 5.5.5.
5.5.4 Statistical Analysis

Statistics are used to measure the correlation between the calculated quintile scores
and the scores assigned by the human subject-matter experts. Tables 6-10 below present
the results of several statistical methods used to measure this correlation.
The symmetric measures, Table 6, show the Pearson’s R value as 0.570, and the
Spearman Correlation as 0.578, which are both significant. These clearly show that there is
strong agreement between the computational approach and the scores assigned by the
experts. These statistics are widely-used in the literature to describe the quality of
experimental results, and a correlation of 0.570 is considered an acceptable degree of
correlation.
Pearson’s Chi-Square statistic, Table 7, also shows a significant result for the 1100
cases, Table 8. This measure signifies that it is extremely unlikely that participants selected
a random number for each word. It shows us that the participants used
considerable judgment in assigning scores to words and that the scores they assigned are
unlikely to be due to chance.
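Pearson's Chi-Square is computed from the contingency table of quintile versus participant score. The following is a minimal sketch of the statistic itself; the counts in the usage test are illustrative, not the thesis data:

```python
def chi_square(table):
    """Pearson's chi-square statistic for a contingency table given as a
    list of rows. Expected counts come from the row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```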
Cronbach’s Alpha (Table 9) and the Kruskal-Wallis test (Table 10) are used for
additional statistical validation. The Cronbach’s Alpha statistic assesses internal consistency,
and returns a value of 0.714, which is an acceptable result. The independent-samples
Kruskal-Wallis Test, at a 95% confidence interval, also returns a consistency result that is
congruent with Cronbach’s Alpha, clearly demonstrating that the results are not because of
chance.
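Cronbach's Alpha for the two "items" here (the quintile bin and the participant score) follows the standard formula. This is a sketch, not the SPSS implementation:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item).
    With two items, as in this study, this mirrors the two-item
    reliability reported in Table 9."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    k = len(items)
    item_var = sum(variance(item) for item in items)
    totals = [sum(vals) for vals in zip(*items)]
    return k / (k - 1) * (1 - item_var / variance(totals))
```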
By using all of these measures, the data quantitatively demonstrate a clear and
convincing correlation between the computational approach and the human subject-matter
experts. The test statistics presented are widely-used in related fields to test data and
measure similarities. When applied to this research, these statistics show that the data are
valid and experimentally consistent, supporting the deduction that the computational approach works to
characterize discipline-specific vocabulary on engineering exams.
Table 6 - Symmetric Measures

 | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig.
Interval by Interval, Pearson's R | .570 | .019 | 22.965 | .000(c)
Ordinal by Ordinal, Spearman Correlation | .578 | .021 | 23.465 | .000(c)
N of Valid Cases | 1100 | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Table 7 - Chi-Square Tests

 | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 482.834(a) | 16 | .000
Likelihood Ratio | 501.727 | 16 | .000
Linear-by-Linear Association | 356.597 | 1 | .000
N of Valid Cases | 1100 | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.10.
Table 8 - Case Processing Summary

                     N     %
Cases  Valid         1100  100.0
       Excluded(a)   0     .0
       Total         1100  100.0

a. Listwise deletion based on all variables in the procedure.
Table 9 - Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.714               .726                                           2
Table 10 - Hypothesis Test Summary
5.5.5 Special Case: the statistical effect of using a design-heavy exam

The quantitative and numerical data for all of the exams, along with their course-
specific correlations (Pearson’s R, etc.), are shown in Appendix E. These data suggest that
some courses, in particular ones that have high technical content, have much higher
correlations between subject-matter experts and the computational method. Conversely,
courses that have a comparatively high design-content or are intended to cut across
disciplines appear to have a lower correlation between participant scores and quintiles from
the computational method.
In order to better characterize this observation, the researcher investigated a design-
heavy course using two different subject-matter experts – both of whom have taught this
course recently. This special case was used to check if the issue of low correlations was due
to:
1. the target document representing a design course (see Section 5.5.5.1);
2. the participants scoring the study (see Section 5.5.5.2); or,
3. the comparator set being used (see Section 5.5.5.3).
The researcher first compared the statistical correlation across all courses studied
(Pearson’s R = 0.570), and again a second time, but now excluding the APS111 design-heavy
course (Pearson’s R = 0.587). The implications of removing APS111 from the dataset are
shown quantitatively using Pearson’s R, Spearman’s Correlation, Pearson’s Chi-Square, and
Cronbach’s Alpha in Tables 11-14 below. When comparing to Tables 6-10, we see that the
Spearman’s Correlation increases from 0.578 to 0.598, Pearson’s Chi-Square remains
significant, and Cronbach’s Alpha increases from 0.714 to 0.728. This shows that almost all
statistics improve when the design-heavy course is removed from the dataset.
Table 11 - Symmetric Measures (APS111 omitted)

                                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Interval by Interval   Pearson's R           .587    .020                   21.731         .000(c)
Ordinal by Ordinal     Spearman Correlation  .598    .023                   22.357         .000(c)
N of Valid Cases       900

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Table 12 - Chi-Square Tests (APS111 omitted)

                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             435.246(a)  16   .000
Likelihood Ratio               453.649     16   .000
Linear-by-Linear Association   309.833     1    .000
N of Valid Cases               900

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 6.70.
Table 13 - Case Processing (APS111 omitted)

                     N    %
Cases  Valid         900  100.0
       Excluded(a)   0    .0
       Total         900  100.0

a. Listwise deletion based on all variables in the procedure.
Table 14 - Reliability Statistics (APS111 omitted)

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.728               .740                                           2
5.5.5.1 Course-specific Correlations

The increased correlation and reliability statistics suggest that the computational
approach is less effective at processing design-heavy exams than traditional “fact and
principle” courses (e.g., electrical fundamentals). Table 15 below shows a condensed form of
the course-by-course correlations presented in Appendix E.1. The left column shows the
name of the course, the middle column states whether that course is design-based, and the
right column shows the Pearson correlation between the participant score and the quintile
score. This table shows that design courses tend to have lower correlations
than non-design courses.
Table 15 - Shows that design courses appear to have lower correlations than non-design courses. Extracted from data presented in Appendix E.1
Course Code               Design-based Course? (Yes/No)   Pearson Correlation (participant score vs. quintile)
APS111 – Participant #1   Yes                             0.511
APS111 – Participant #2   Yes                             0.503
CHE412                    Yes                             0.642
MIE262                    Yes                             0.452
MIE350                    Yes                             0.319
CIV100                    No                              0.625
CIV280                    No                              0.621
ECE110                    No                              0.717
ECE221                    No                              0.749
MIE242                    No                              0.717
MSE101                    No                              0.785
5.5.5.2 Inter-rater Correlation

To test whether the choice of participant affects the scores assigned to words, the researcher
measured the correlation between two participants scoring the same exam using the same
comparator set. Results from this analysis are shown in Table 16 below. The table
presents the APS111 exam with the word shown in the first column on the left, the score
assigned by participant #1 in the second column, and the score assigned by participant #2 in
the third column. This is a 10-word snapshot of the larger list, presented in Appendix E.2.
Pearson’s R correlation between participant #1 and participant #2 is 0.68, suggesting that the
low scores are not due to differences in people scoring the exam.
Table 16 - Shows a sample of the scores for the APS111 exam, as assigned by two different participants. The correlation for the full list is 0.68.

WORD            Score Assigned by Participant #1   Score Assigned by Participant #2
accessibility   4                                  2
administered    2                                  1
allocation      2                                  1
allowed         4                                  1
alternatives    5                                  5
aurora          1                                  1
automobile      2                                  1
biogas          4                                  3
brainstorming   5                                  5
caffeinated     2                                  1
5.5.5.3 Effect of Comparator Set on TF-IDF values

Quintiles may not be an accurate measure of how discipline-specific a word ought to
be. The TF-IDF value, however, may be a better indicator of this. The analysis performed in
this section observes the misalignment between TF-IDF value and quintile score. It also
checks if this misalignment can be characterized by the comparator set being used to
compute the TF-IDF scores.
Quintiles are used to group TF-IDF scores together. The highest TF-IDF values on a
wordlist are assigned a higher quintile, and the lower TF-IDF values are assigned lower
quintiles. There may be instances where the highest TF-IDF value of one exam is much
lower than the highest TF-IDF value of another exam. However, the quintiles for top scores
on both exams will be the same. This causes a problem when participants score the exam.
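The quintile grouping described above can be sketched as follows. This is a hypothetical helper, not the author’s implementation; it illustrates why the mapping is relative to each exam’s own wordlist, which is the source of the cross-exam misalignment.

```python
def assign_quintiles(tfidf_scores):
    """Map each word's TF-IDF score to a quintile: 1 for the lowest fifth
    of this wordlist, 5 for the highest.  Because ranks are relative to a
    single exam's wordlist, the top words of two different exams both
    receive a '5' even when their absolute TF-IDF values differ greatly."""
    ranked = sorted(tfidf_scores, key=tfidf_scores.get)  # ascending by score
    n = len(ranked)
    return {word: (i * 5) // n + 1 for i, word in enumerate(ranked)}

# Ten words with strictly increasing scores: two land in each quintile
scores = {"w%d" % i: i / 100 for i in range(10)}
quintiles = assign_quintiles(scores)
assert quintiles["w0"] == 1 and quintiles["w9"] == 5
```

Note that the function discards the absolute magnitude of the scores entirely, so a top-quintile word on a low-scoring exam and a top-quintile word on a high-scoring exam look identical to a participant.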
For the case of the APS111 design course, both subject-matter experts tended to
assign low scores to most words. This happened even though the majority of words
were ranked high (‘4’ and ‘5’) according to the quintiles. This may indicate the presence of a
misalignment between the TF-IDF values and the quintiles assigned to each word. One way
of checking this misalignment is by processing two exams through the same comparator set
and observing the difference, if any, that this makes to the TF-IDF value.
The comparator sets being used are APSC (applied practical science, which is a
collection of exams not specific to any discipline), and ALLENGR (all exams in
engineering). The data produced from this procedure are two wordlists having different
words and TF-IDF scores, as presented in Table 17 below. The table shows sample words
from the top quintile for two courses, ECE110 (electrical fundamentals), and APS111
(Engineering Strategies and Practice I). The ECE110 course has been processed once
‘normally’ using its existing comparator set (Electrical Engineering, EE), and again using the
APSC comparator set. The first and second columns show the ECE110 course, with words in
decreasing order of TF-IDF score using the EE and APSC comparator sets, respectively. The
right column shows the APS111 course, computed using the APSC comparator set, with
words in decreasing order of TF-IDF score. This table is used to see the kinds of words that
appear near the top of each of the lists, and their corresponding TF-IDF scores.
The data in Table 17 show that the electrical fundamentals course does not appear
to have discipline-specific words when using the APSC comparator set. Instead, the words at
the top of the list appear to be ‘general’ instead of discipline-specific to electrical
engineering. This is different than the wordlist produced using the Electrical Engineering
(EE) comparator set, as shown in the first column. Specifically, the APSC comparator set
appears to have eliminated the presence of discipline-specific words for the ECE110 course.
Additionally, the ECE110 wordlist has slightly lower TF-IDF scores (magnitude of 10^-3)
using the APSC comparator set compared to using the EE comparator set (magnitude ranges
from 10^-2 to 10^-3, see Table 5). The low TF-IDF scores of the ECE110 exam are also similar
to the low TF-IDF scores on the APS111 exam. It is plausible that the low overall TF-IDF
scores for both exams might be a result of using the APSC comparator set of documents
instead of using a more discipline-specific comparator set.
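As a hedged illustration of how the choice of comparator set drives these scores, a textbook TF-IDF computation might look like the sketch below. This is not the modified algorithm of Section 5.2.1; the function name and input layout are hypothetical.

```python
import math
from collections import Counter

def tfidf_wordlist(target_words, comparator_docs):
    """Score each word of a target exam against a comparator set.

    Textbook TF-IDF for illustration only; the thesis uses a modified form.
    target_words: list of tokens from the target exam.
    comparator_docs: list of sets of tokens, one per comparator document.
    """
    tf = Counter(target_words)
    n_docs = len(comparator_docs)
    scores = {}
    for word, count in tf.items():
        # Document frequency: how many comparator documents contain the word
        df = sum(1 for doc in comparator_docs if word in doc)
        idf = math.log((n_docs + 1) / (df + 1))  # smoothed to avoid div-by-zero
        scores[word] = (count / len(target_words)) * idf
    return scores

# A word absent from every comparator document outscores one present in all of them
docs = [{"the", "circuit"}, {"the", "voltage"}, {"the", "design"}]
scores = tfidf_wordlist(["the", "inductor", "inductor"], docs)
assert scores["inductor"] > scores["the"]
```

The sketch makes the dependence explicit: the same target exam produces different wordlists, and different absolute score magnitudes, under different comparator sets, which is the behaviour observed for ECE110 above.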
Table 17 – Shows the top quintile when the ECE110 course wordlist is developed using the EE and APSC comparator sets of documents, and when compared to the APS111 course.

ECE110 Using EE             ECE110 Using APSC            APS111 Using APSC
Comparator Set              Comparator Set               Comparator Set
circuit     0.033323        fundamentals  0.003368       aps              0.006177
voltage     0.015487        sourc         0.003010       dichloromethane  0.006044
electric    0.014911        source        0.002984       ethic            0.005329
capacitor   0.009280        question      0.002224       stake            0.00452
resistor    0.009062        part          0.002107       identify         0.004097
current     0.007712        rint          0.001989       ethics           0.00402
switch      0.005743        print         0.001845       human            0.003983
power       0.005706        vo            0.001539       life             0.003967
magnetic    0.005631        field         0.001397       residuals        0.003941
inductor    0.004922        printed       0.001372       lemessurier      0.003846
Neither the APS111 exam nor the APSC comparator set belongs to a specialized
engineering discipline. Instead, they fall under a category of undergraduate engineering
exams that is “general engineering”. It appears that the modified TF-IDF algorithm had
difficulty in detecting characteristic vocabulary on APS111 using this comparator set.
Additionally, this approach produced a wordlist that was unable to characterize the
vocabulary accurately, and this was reflected in the low scores assigned by both subject-
matter experts.
5.6 DISCUSSION OF STUDY

This section builds on the discussion presented in the 2013 and 2014 ASEE and
CEEA papers reprinted in Appendix A.4-A.7, and is specific to the studies presented in
Section 5.3 and Section 5.4. This section uses the results presented in Section 5.5 and
discusses the findings in more detail.
5.6.1 Correlation

The results from Section 5.5 show that the computed scores and the expert-assigned
scores are correlated. Correlation indicates that there is agreement between multiple sets of
data. Though there are different measures of correlation, e.g. Pearson’s R and Spearman’s
Correlation, the idea is to use this statistic to quantitatively identify similarities and
differences among groups of data. This section examines cases where the correlation was
high and where it was low.
The goal of this study is to have the highest correlation possible, because this
indicates that the computational approach is successful in identifying characteristic
discipline-specific vocabulary on exams. As shown in Table 6, the overall correlation
between human subject-matter experts and the computational approach is 0.570.
5.6.1.1 Design Courses

Some courses have a higher correlation between computed score and expert-assigned
scores than others do. Results showing course-specific correlations are located in
Appendix E.1 and Table 15. The breadth of courses enables us to see the effects of
individual exams on the overall correlation between the computer-assigned scores and
human-assigned scores. One such effect is that exams that are design-based appear to have
lower correlations than traditional “fact and principle” courses, like electrical fundamentals.
The implications of the comparator set on TF-IDF score can be quantitatively
explained using the modified algorithm presented in Section 5.2.1. If design-based courses
have less common vocabulary among each other, then the T_{D,W} value is small.
This impacts the TF-IDF score by reducing the effect of the IDF part of the equation. By
extension, this means that comparator sets sharing common vocabulary are essential to
increasing the accuracy of the modified TF-IDF algorithm used to characterize discipline-
specific vocabulary.
5.6.1.2 Scoring

Data about inter-rater correlation and participant scores can be investigated together
to better evaluate the computational approach. The inter-rater correlation using the Pearson
R statistic is 0.68 (as shown in Table 16). This suggests that participants agree with one
another. In addition, the counts of ‘1’ and ‘5’ scores are much higher than the counts for
intermediate scores (see Figure 13). Also, the correlation between participant-score and
quintile is higher when only the quintile scores of ‘5’ and ‘1’ are considered (as shown in
Section 5.5.2). This may suggest that the subject-matter experts had a tendency of being
dichotomous when evaluating whether the computational approach works or not.
5.6.2 Implication of Results on Empirical Process

This study substantiated the importance of calibrating the subject-matter experts.
Initially, the first participant was tasked with scoring the words in the study using an
electronic version of the 100-word sample survey. The robustness of the 5-point scale
calibration strategy was tested using this medium and it was found that the full resolution of
this scale was not consistently used when scoring the words. As a result, the experimenter
adjusted the strategy to focus more attention on the resolution of the scale, in turn promoting
a greater spread of responses. The improved strategy included changing the study from
being electronic-based to in-person based. This helped control the contact time that each
participant had with the surveys, while promoting two-way communication to clarify
instructions as necessary. This strategy may have contributed to making the data reliable
(as seen in Section 5.5.4). The results from each of the exams show that the reliability of
using the scale appears approximately consistent even though there are multiple participants
each scoring different exams. Extending this to more exams would likely produce similar
results, as the 10-exam dataset used is a representative sample of the existing engineering
curriculum at this institution.
5.6.3 Insight on Potential Impact on Teaching and Learning

All participants commented positively on the usefulness of providing an explicitly-
defined vocabulary list to their students, and the practical benefits of designing a tool to help
in this regard. Some participants were more inclined to use these lists in their courses than
others. Instructors of design courses suggested that the word lists produced for their courses
did not appear to be ideally suited to capture discipline-specific vocabulary, and as such may
be less likely to be used as a teaching aid. The instructors of more traditional “fact and
principle” engineering courses, however, reacted more positively to these lists, commenting
on the high accuracy with which their list captured discipline-specific language.
In general, this part of the research suggests that a novel approach based on a
modified TF-IDF keyword-search algorithm can be used to identify characteristic vocabulary
on most engineering exams, and is likely to be used by most instructors as a teaching tool for
traditional “fact and principle” engineering courses.
6 DISCUSSION

This chapter discusses the outcomes and implications of all three studies (Chapters 3,
4, and 5) and presents three major contributions resulting from this work (Sections 6.1, 6.2,
and 6.3). These contributions are framed using the three research questions articulated in
Chapter 1, and discussed in the context of the literature presented in Chapter 1 and
Chapter 2. This chapter also builds on the existing discussion presented in the papers
written by the author and reprinted in Appendix A.1-A.9.
Three research questions, articulated in Chapter 1, are used to frame the discussion
of the results and the major contributions. These research questions are:
1. Do language-related learning barriers exist in engineering education?
2. If language-related learning barriers exist, can they be characterized?
3. Can an effective strategy be found or developed to assist in the identification and
characterization of these learning barriers?
A combination of literature analysis and empirical study were used to answer these
questions. The first research question was largely addressed by the first study (see Chapter
3), whereas the second and third questions were addressed by the studies presented in
Chapters 4 and 5. The answers to these questions are presented in the form of contributions.
6.1 RECOGNITION OF ENGINEERING VOCABULARY AS AN ACCESSIBILITY BARRIER

The outcome of the study presented in Chapter 3 answers the first research question.
Literature from the areas of disability studies, higher education, and second-language
learning all contributed to the development of a research program to address this question
(see Section 1.1, 2.1, 2.2, and Appendix A.1, A.2).
Final exams are a representative artifact of the engineering learning environment.
They need to be clearly written so that all students can understand and respond to questions
as accurately and precisely as possible. Using foreign, fossilized, and misunderstood
vocabulary on such assessments may decrease clarity in communication, especially if this
vocabulary is not an explicit learning outcome of the course [55, 63, 98].
Research shows that inaccessible vocabulary is present on engineering final exams,
and that this language falls into a student’s ‘blind-spot’ (see Chapter 1 for discussion of
the Johari Window and blind-spot). The research study presented in Chapter 3 and published
in the journal paper reprinted in Appendix A.2 shows evidence to support this claim.
Specifically, the research shows that multiple undergraduate students of diverse backgrounds
and post-secondary educational levels are unable to accurately self-assess their mastery of
vocabulary on summative assessments. This suggests that students do not understand
vocabulary as well as they ought to, and are sometimes unable to accurately gauge their
learning. Therefore, vocabulary learning in engineering education falls into a student’s blind
spot, and can reduce their capacity to learn and master appropriate technical vocabulary. If
students are unable to see this obstacle to understanding, as discussed in the literature
reprinted in Appendix A.1, then this demonstrates the presence of an invisible learning
barrier. Hence, the research finds that engineering vocabulary is an invisible barrier to
accessibility in engineering education.
6.2 CREATION OF AN APPROACH TO IDENTIFY CHARACTERISTIC DISCIPLINE-SPECIFIC VOCABULARY IN ENGINEERING EDUCATION

6.2.1 The Role of Technology in Vocabulary Characterization

Technology can be used as a tool to analyze and measure different aspects of
language. This research shows an approach where frequency analysis of vocabulary is tested
to help characterize differences in language use. According to authors in the field of
keyword generation, software tools are becoming more integral to language analysis due to
the computational advantages of accuracy, precision, and timeliness [113, 116, 123-126, 132,
137, 139]. With more advanced hardware, for example, computers are increasingly able to
compute fine-grained differences in language use. The literature states that although
frequency analysis and the TF-IDF vocabulary characterization approach measure different
aspects of language, both can be employed using a computational strategy [102, 124-126,
132].
6.2.2 Empirical Contribution

In the context of this research study, the successful design of an approach to identify
characteristic discipline-specific vocabulary depends on this approach being feasible and
accurate. Feasibility refers to the capacity for processing data quickly, with minimal effort,
and reliably. Accuracy refers to the proximity of the results to a true value [105], where in
this case it would be a vocabulary list carefully developed by a subject-matter expert. For
this dissertation, the computational process designed and deployed was able to systematically
score words using a modified algorithm based on the TF-IDF equation, and this was
reproducible across datasets as well as correlated to subject-matter expertise.
The computational process systematically scored words on a large dataset within a
reasonable amount of time. In this study, the process computed TF-IDF scores by cross-
calculating several million words across multiple comparator sets of documents. It is a
feasible process because it takes minimal user input and can now take less than an hour to
produce a wordlist. Comparing this to a manual approach, a person would take much longer
to perform the same task. Overall, the computational strategy demonstrates increased
efficiency over a “manual” approach for processing quantitative aspects of vocabulary, e.g.
word frequency, in multiple documents across large datasets.
Employing a computational vocabulary analysis approach over a manual human-
based strategy may reduce accuracy. An individual may characterize vocabulary based on
prior experience or subject-matter expertise, and this is not computable by the program.
Language is a field that is affected by human-interpretation. So, there are aspects of
language that are not accounted for by the computational approach. For example, the student
self-assessment study of Chapter 3 is sufficiently broad to also examine detailed
characteristics of inaccessible language. Additional accuracy can be achieved by
investigating: the order of words, presentation of text, graphical elements, and so on. Similar
arguments can be made for the depth of other studies in this dissertation as well. However,
the focus of this dissertation is to identify and describe an approach that characterizes text on
documents, and is feasible given a reasonable trade-off with accuracy.
As shown in the results (Section 5.5), the data show an acceptable correlation between
the outcomes of the computational approach and the human subject-matter experts. This
correlation, together with the respective reliability measures, demonstrates an appropriate
balance between feasibility and accuracy. The data show that it is possible to reasonably mimic
subject-matter expertise to a significant degree with a technology-based computational
strategy to characterize vocabulary in engineering education. With respect to the research
question, this dissertation shows that vocabulary barriers in engineering education can begin
to be addressed using a computational approach.
6.3 IMPLICATIONS OF THE APPROACH ON TEACHING AND LEARNING
6.3.1 Converging Perspectives from the Literature

Characterizing vocabulary in engineering education using an assistive tool has
implications for reducing learning barriers in the classroom. By looking at the overlap of
literature in several areas, the effect of this tool becomes increasingly clear. Literature on
good educational practice shows that clearly defined learning outcomes can lead to
more usable and higher-quality learning [23, 53, 55, 159]. In parallel, literature in the area of
accessible design shows that clear instructions lead to greater usability of a product or service
[54, 61, 160]. In areas related to public policy, frameworks for accessibility indicate that
increasing accessibility may also increase inclusivity [49, 56, 161].
6.3.2 The Development of Teaching Aids

The results of this research can be used to develop teaching aids that can reduce a
specific learning barrier in engineering education. According to the research performed, it
appears that the learning barrier associated with technical vocabulary can be reduced by
explicitly making this vocabulary visible to students. This research can help instructors
develop wordlists that they can distribute to their students to actively promote the
development of a robust professional vocabulary. These wordlists can explicitly show the
student the vocabulary that they need to master, thereby increasing the visibility of the
learning barrier, and helping students become aware of what they need to know.
Using the concept of the blind-spot from the Johari Window, the outcomes of the
research demonstrate an approach that can decrease an invisible learning barrier to increase
what is teachable and learnable. If students are given the requisite special vocabulary for a
course (invisible made visible), then they are better equipped to learn that vocabulary and be
assessed on their grasp of it (increasing teachability and learnability of vocabulary).
6.3.3 Producing a Research-based Artifact of the Application of UID

In this research, Universal Instructional Design (UID) is used to motivate the
development of a tool to increase accessibility in the engineering learning environment [23,
49, 53, 159]. The outcomes of this research satisfy many of the principles outlined in UID.
Using UID for this investigation contributes empirical evidence to the existing framework in
the context of engineering education. Table 18 shows an overview of how the research
outcomes address each of the principles of universal instructional design.
Table 18 - Shows the implications of the research using the framework of Universal Instructional Design

Universal Instructional Design Principle   Addresses (+) / No Change (NC)   Comment
Class Climate                              +                                Supports differences in communication and provides a way to converge corpora of technical vocabulary
Interaction                                +                                Encourages effective interaction between students and instructors around the use of technical jargon
Delivery Methods                           +                                Promotes multi-modal learning by not prescribing methods in which to learn the vocabulary identified
Information Resources and Technology       +                                Encourages instructors to produce and use engaging, accessible course material
Assessment                                 +                                Instructor and student have clearer understanding of mastery being assessed
Feedback                                   +                                Design explicitly defines mastery of vocabulary
Accommodation                              + (Systemic)                     Based on universal access, not individualized accommodation strategy
Physical Environments and Products         NC                               Design does not affect physical characteristics of the learning environment
Table 18 shows the implications of the research on the engineering learning
environment. The largest impact that this research could have on an existing engineering
classroom is that it helps recognize that all students have a diverse corpus of vocabulary.
This recognition is a step forward in respecting student differences in the classroom, and
potentially has the effect of making students feel more comfortable and included while being
different. Specifically, this research approach embraces diversity while promoting technical
vocabulary development.
Increasing clarity of requisite vocabulary has the effect of promoting clearer
communication between instructor and student, as well as between students. By extension,
increasing use of accurate technical vocabulary in one discipline may promote refinement of
one’s existing corpus of engineering language. This could, in addition, promote vocabulary
learning across disciplines due to overlapping terms. Therefore, clarifying requisite
vocabulary could lead to greater and higher-quality interaction in the classroom.
Clarifying requisite technical vocabulary and learning outcomes can also increase
flexibility in understanding course concepts. The instructor can now use advanced technical
terminology naturally during teaching. As an element of course material, students can seek
clarification from textbooks and other learning aids as well.
This technical vocabulary can be used to create engaging and accessible course
material and learning resources, delivered in an inclusive manner, based on authentic
contexts. Specifically, the language being used by the instructor will be more consistent with
the language that students are expected to know, resulting in increased quality of instruction.
In the context of STEM learning, specifically engineering, this vocabulary knowledge and
learning is key to the technical nature of the profession [97-99, 149, 150]. Also, instead of
guessing at whether students are correctly identifying and learning discipline-specific
vocabulary themselves, having this vocabulary explicitly-identified places the onus on the
student to learn. As such, the instructor can use these terms in the classroom knowing that
the students are aware that they will be asked to demonstrate their mastery of using them on
an assessment. This leads to another area where this research affects the learning
environment: student assessment.
Since the computational approach generates wordlists of characteristic discipline-
specific vocabulary, these wordlists can be employed for assessing student performance. In
particular, the instructor can construct authentic contexts using this advanced vocabulary while
knowing that students will have mastered it prior to the assessment being administered.
This improves the robustness of an engineering assessment instrument as the test is now
assessing what it purports to assess, and that is mastery of course concepts.
6.3.3.1 Systemic Accommodation Strategy

As this research is based on the foundations of Universal Instructional Design, the
tool is designed to increase accessibility for the greatest number of individuals and is not to
be considered an individual accommodation strategy. By generating a discipline-specific
wordlist that is the same for all students in the classroom, all students can take advantage of
explicitly-identified technical terms they need to know. This does not take the specific
previous understanding of each student into account, but rather serves to create a common
baseline that all students are expected to see.
The developed approach does not focus on accommodating individual-specific
learning. The same wordlist can be given to all students in that class, and all students are
expected to be familiar with the words on that list. By contrast, an individual strategy would
give personalized lists to each student and this would require a thorough understanding of
each learner’s characteristics and knowledge base. Individualized wordlists may be a
resource-intensive exercise especially if the class size is large. Therefore, developing an
institutional systemic strategy for promoting technical vocabulary learning by explicitly-
identifying learning outcomes is perhaps a more favourable approach in this context.
The research outcomes do not appear to change the physical characteristics
of the learning environment. The only physical outcome of this research study would likely
be the wordlists themselves, which can be distributed in an electronic form. This may
increase accessibility for students using technology for accommodation (e.g. screen readers)
to more easily access the course vocabulary.
6.3.3.2 Implications of a Research-based Artifact of UID

The literature suggests a framework for accessible instruction that supports second
language learning using an automated indexing-based software tool. The multi-disciplinary
research strategy employed here identifies learning barriers, tests an approach, and modifies
this approach to incorporate multiple comparator sets; the approach is then evaluated in the
context of engineering education by experts.
The outcomes of the research build on the existing literature and combine approaches
from different domains to produce a strategy for increasing the visibility of learning
barriers. By identifying and making visible the learning barriers associated with inaccessible
language, instructors explicitly identify some learning outcomes for their students.
The wordlists generated from this research process enable students to more accurately
identify and understand characteristic discipline-specific vocabulary. By using this systemic
approach, students have the opportunity to better understand the course material.
With respect to the research question, the results clearly show that a computational
strategy can be designed, prototyped, and successfully validated. This strategy increases the
visibility of learning barriers in the classroom by identifying and characterizing discipline-
specific vocabulary, and can be used to deploy teaching aids that increase accessibility to
engineering education.
7 CONCLUSIONS

This dissertation contributes to the improvement of learning environment design by
developing a process for characterizing one particular learning barrier present in engineering
education. The research discusses the design and evaluation of a computational approach
used to identify characteristic discipline-specific vocabulary on engineering final exams.
Contributions include recognizing engineering vocabulary as an accessibility barrier; creating
and validating an approach to characterize that barrier; creating course-specific learning
materials that can be directly used; and successfully applying the theory of universal
instructional design (UID) to the development process.
7.1 RESEARCH CONTRIBUTIONS
1. Contribution to Theory

Developed an application of UID theory; the application addresses inaccessible
vocabulary identification, learning, and evaluation in engineering education. This
helps to strengthen the framework by providing a successful case study.
2. Contribution to the Design of Learning Environments

Designed and validated a modified algorithm based on the Term Frequency-Inverse
Document Frequency (TF-IDF) equation to detect technical vocabulary.
Built a prototype software program to generate wordlists of characteristic discipline-
specific terms, usable as teaching and learning aids. This:
1. Increases visibility of vocabulary-related learning barriers;
2. Is usable by instructors to explicitly clarify learning outcomes; and,
3. Promotes the development of a robust discipline-specific vocabulary.
3. Important Findings

Identification of three areas for future review:
o measuring detailed characteristics of inaccessible vocabulary,
o investigating vocabulary on alternate instructional instruments, and
o expansion of software capabilities.
Undergraduate engineering students often over-rate their level of understanding of
technical vocabulary.
The language of design-heavy exams differs from the language used
on more traditional “fact and principle” engineering exams.
Carefully selecting comparator sets for use with the TF-IDF algorithm changes the
value of the scores, in turn changing the arrangement of words on the wordlist.
4. Contributions with respect to Recommendations for Future Practice

A software framework that can be refined to include additional functionality to
investigate language in engineering education, and improve technical language
acquisition.
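The scoring idea underlying the second contribution can be sketched in a few lines. The sketch below is illustrative only: the function name, tokenization, and IDF smoothing are assumptions for exposition, not the dissertation's modified algorithm, which additionally incorporates multiple comparator sets.

```python
import math
from collections import Counter

def tfidf_wordlist(course_doc, comparator_docs):
    """Rank words in a course document by TF-IDF against a comparator set.

    Words frequent in the course document but rare across the comparator
    documents score highest -- a proxy for characteristic
    discipline-specific vocabulary.
    """
    tokens = course_doc.lower().split()
    tf = Counter(tokens)
    n_docs = len(comparator_docs)
    comparator_vocab = [set(d.lower().split()) for d in comparator_docs]

    scores = {}
    for word, count in tf.items():
        # Document frequency: how many comparator documents contain the word
        df = sum(1 for vocab in comparator_vocab if word in vocab)
        idf = math.log((1 + n_docs) / (1 + df))  # smoothed inverse document frequency
        scores[word] = (count / len(tokens)) * idf
    return sorted(scores, key=scores.get, reverse=True)

course = "entropy enthalpy entropy gibbs energy the the process"
comparators = ["the cat sat on the mat", "energy of the process the"]
wordlist = tfidf_wordlist(course, comparators)  # "entropy" ranks first, "the" last
```

Because common function words appear in every comparator document, their IDF collapses toward zero, leaving the discipline-specific terms at the top of the list.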
7.2 LIMITATIONS

There are five main limitations of the research described in this dissertation: feasibility
versus accuracy, difficulties with respect to measurement, single-word processing,
human intervention, and inclusion of new words.
1. Feasibility vs. Accuracy

The research conducted for this dissertation attempted to navigate a tradeoff between
accuracy and feasibility. A highly accurate wordlist can be produced by a human subject-
matter expert, given enough time and resources. The subject-matter expert would have to
carefully consider each word for inclusion, and sort through a large quantity of vocabulary.
This process may need to be repeated with each course, as the evolution of language may
require new vocabulary to be introduced and old vocabulary to be retired. The computational
approach greatly reduces the strain on the instructor to produce the wordlist, and can produce
a wordlist in a reasonable amount of time. However, because it relies on mechanical
computation rather than on the expertise of the instructor, the computational method
can be less accurate. As such, whereas the subject-matter expert favours accuracy, the
computational approach favours feasibility. This is discussed further in Section 6.2.2.
2. Difficulties with respect to Measurement

Measurement-related challenges included establishing criteria for measurement and
isolating problems with respect to input conditioning. Understanding and measuring
implications of learning barriers is difficult because of the variability in assessing meaning
due to student differences. For example, assigning an Observed Understanding score to
students (see Chapter 3 and Appendix A.2) is a subjective measure and depends on the
researcher’s ability to map a length-constrained student response to a standard provided by a
dictionary. Furthermore, the transfer of words from different data files into a text-only
format introduces foreign artifacts due to the optical character recognition algorithm. Each
artifact is unique, and so developing a filter to clean the text file needs to incorporate as many
potential permutations as possible.
These limitations were partially addressed by using an integrative approach (e.g.,
multiple participants, multiple comparator sets) and by adjusting the input-conditioning
algorithms to condition the input vocabulary more accurately (e.g., removing words
containing irregular ASCII characters).
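As a hedged illustration of this kind of input conditioning, the filter below keeps only tokens composed of ordinary alphabetic ASCII characters (optionally hyphenated). The rule and the example artifact tokens are assumptions for illustration; the dissertation's actual conditioning algorithms may use different criteria.

```python
import re

# Hypothetical conditioning rule: a token is kept only if it is made of
# ASCII letters, optionally joined by hyphens. This discards OCR artifacts
# such as "str~in" (stray symbol) or "ﬁbre" (non-ASCII ligature).
TOKEN_RE = re.compile(r"^[A-Za-z]+(?:-[A-Za-z]+)*$")

def condition_tokens(tokens):
    """Remove tokens containing irregular or non-ASCII characters."""
    return [t for t in tokens if TOKEN_RE.match(t)]

raw = ["stress", "str~in", "ﬁbre", "load-bearing", "3D", "beam"]
clean = condition_tokens(raw)  # ["stress", "load-bearing", "beam"]
```

A rule this strict also drops legitimate tokens containing digits (e.g., "3D"), which is the same feasibility-versus-accuracy tradeoff discussed above.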
3. Single-word Processing

Prevailing concerns about single-word understanding centre around the generation of
meaning from neighboring words, the structure of language, polysemous vocabulary, etc.
This dissertation instead focuses on whether words can be correctly understood if shown in a
“bag of words” (BOW) model. The assumption is that words in a BOW model are less
accessible in isolation than when used as a phrase or sentence, because additional meaning
can be derived from structural placement. This simplification does not allow for a more accurate
understanding of the prevalence of the learning barrier. However, this is somewhat
addressed by the computational method because all words are equivalently considered as
single-words. Since all structural information for all words is lost uniformly, it is assumed
that all words have an equivalent chance of being inaccessible. Using the BOW model errs
on the side of caution by assuming that words are most inaccessible when seen in isolation.
The program does not address the situation of two or more neighboring words
combining to produce a technical concept. An example of this is the phrase “fibre bundle”,
where the individual words may not be discipline-specific, but the word group is. The
program is currently not able to recognize when a set of words that appear together represent
a single concept.
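One way a future version of the program might begin to detect such word groups is simple bigram counting: adjacent word pairs that recur throughout a document are candidate multi-word concepts. This is an exploratory sketch, not part of the dissertation's software; the threshold is an arbitrary assumption.

```python
from collections import Counter

def frequent_bigrams(tokens, min_count=2):
    """Count adjacent word pairs; pairs recurring at least min_count
    times may mark multi-word technical concepts such as 'fibre bundle',
    where neither word alone is discipline-specific."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return [" ".join(p) for p, c in pairs.items() if c >= min_count]

text = "a fibre bundle is a structure the fibre bundle projection"
bigrams = frequent_bigrams(text.split())  # ["fibre bundle"]
```

A production version would need to score candidate pairs against comparator sets, as is done for single words, rather than rely on a raw frequency threshold.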
4. Human Intervention

Reducing the full wordlists for each course into a 100-word survey given to each
participant may have introduced bias. Though the larger wordlist was binned into quintiles,
and the number of words chosen from each quintile for each exam was the same, the
researcher was responsible for selecting the words from each quintile for transfer into the
survey. Due to this selection, the researcher may have inadvertently inserted or eliminated
words due to subjectivity.
The 100 words could have been chosen randomly, but a random draw might have
included non-word artifacts rather than actual words. The automated filtering process
used to prepare the wordlists reduces this pollution considerably, and an instructor using
a wordlist could easily delete the few remaining “junk” terms before providing it to
students. Given this, it seemed unproductive to allow “junk” terms into the 100-word list
presented to the subject-matter experts, so the researcher selected the words from each
quintile to go into the survey.
In this study, the researcher had prior knowledge of Materials Science Engineering,
and therefore the word selection for MSE101 may reflect more inherent subject-matter
expertise than the selections for the other courses. Ideally, the subject-matter experts
would have been given the complete wordlist produced by the computational method;
however, these wordlists are several hundred words long, and pruning them into a
smaller list was necessary for the evaluation component of the study.
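A bias-reducing alternative to manual selection, sketched under the assumption that the full list is already TF-IDF-ranked, would be to bin the ranked list into quintiles and draw from each quintile uniformly at random, with a fixed seed so the survey is reproducible. The function name and parameters here are illustrative.

```python
import random

def sample_by_quintile(ranked_words, per_quintile=20, seed=0):
    """Split a ranked wordlist into five equal bins and draw the same
    number of words at random from each bin, mirroring the study's
    20-words-per-quintile survey design without researcher selection."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    n = len(ranked_words)
    survey = []
    for q in range(5):
        lo, hi = q * n // 5, (q + 1) * n // 5
        bin_words = ranked_words[lo:hi]
        survey.extend(rng.sample(bin_words, min(per_quintile, len(bin_words))))
    return survey

ranked = [f"word{i}" for i in range(500)]
survey = sample_by_quintile(ranked)  # 100 words, 20 drawn from each quintile
```

Random sampling would, however, reintroduce the "junk"-term problem discussed below, so some light manual cleaning would still be needed.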
5. Inclusion of New Words

The manual addition of new words to the repository of exams is currently the only
way to account for evolution of language over time. The specific limitation is that the word
must be added to the repository in the context of an exam for it to be calculated with the
same degree of consideration as the existing words. Specifically, there is currently no
method for incorporating a single word into a repository. This is a consequence of the TF-
IDF algorithm utilizing comparator sets and the prevalence of that word within that
comparator set to produce a valid characterization of the vocabulary. This computational
method, however, relies on a relatively up-to-date set of comparator documents. If the
comparator sets become stale because new documents are not added to the dataset, or old
documents are not retired from it, the tool is apt to become increasingly inaccurate.
7.3 IMPLICATIONS FOR FURTHER RESEARCH

Future research should increase the accuracy with which discipline-specific technical
vocabulary is characterized. The goal would be first to understand the detailed characteristics of
learning barriers due to language, including meaning disambiguation using structural and
graphical elements, and then to apply these findings to improving the TF-IDF processing strategy. One
area of further study could be to characterize differences between the TF-IDF scoring and the
participant-assigned scoring, perhaps using additional interviewing of subject-matter experts.
Another research area could be to develop a strategy to mathematically model the TF-
IDF curves shown in Section 5.3.2 so that instructors can roughly predict the length of word
list that should be created for their course. This would help inform the development of a
strategy to understand the accessibility of a word without performing a full computation
using the complete dataset, reducing computational load. It could also automate the process
of selecting a range of words used for a vocabulary list.
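A very rough stand-in for such curve modeling, offered only as an assumption-laden sketch, is to pick the list length at which the ranked TF-IDF scores first decay below a fraction of the top score; a fitted decay model could then predict this index without recomputing over the full dataset.

```python
import math

def suggest_cutoff(scores, fraction=0.1):
    """Suggest a wordlist length: the index where the ranked TF-IDF score
    first falls below a fraction of the top score. A crude proxy for
    modeling the full score-decay curve."""
    threshold = fraction * scores[0]
    for i, s in enumerate(scores):
        if s < threshold:
            return i
    return len(scores)

# Synthetic exponentially decaying scores, standing in for a ranked list
decaying = [math.exp(-0.05 * i) for i in range(200)]
length = suggest_cutoff(decaying)
```

Fitting the decay rate from a small prefix of the list would let an instructor estimate this cutoff cheaply, which is the point of the proposed modeling work.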
Another area of additional research is to use this multi-disciplinary approach to create a
software program that can identify authentic sentences within the repository that are most
characteristic of a user-inputted query word. This would assist instructors in developing
wordlists that have a sentence-based example. This may add additional meaning to the
discipline-specific word identified by the current research, and begin to address limitations
associated with word meaning ambiguity.
Further research could also be performed to understand other types of barriers in
engineering education, and to apply multi-disciplinary perspectives to identify and mitigate
those barriers. Though the scope of investigating language in engineering education is broad,
the potential for investigating other artifacts of the learning environment for other types of
learning barriers also exists, and should be studied. The application of Universal
Instructional Design to these materials may work to identify, characterize, and decrease
learning barriers in these contexts as well. Two exploratory short papers examine
potential future applications of this research; they are reprinted in Appendices A.8 and
A.9.
7.4 FINAL WORD

The research investigation of language in engineering education has provided a
perspective with which to begin characterizing learning barriers in this field. Using a
multi-disciplinary approach, the research shows that engineering education can be made
more accessible by increasing the visibility of learning barriers through the creation of
course-specific vocabulary lists.
117
References
[1] United Nations Dept of International Economic and Social Affairs, Disability: Situation, Strategies and Policies. New York: United Nations, 1986.
[2] T. Campbell, Disability Studies : Emerging Insights and Perspectives. Leeds, England: Disability Press, 2008.
[3] S. D. Edwards, Disability: Definitions, Value and Identity. Oxford: Radcliffe, 2005.
[4] M. Oliver, Social Work: Disabled People and Disabling Environments. London: Kingsley, 1991.
[5] J. Swain, Disabling Barriers, Enabling Environments. London: SAGE, 2004.
[6] P. Jarvis and S. Parker, Human Learning: An Holistic Approach. New York: Routledge, 2005.
[7] G. Reid, Effective Learning. New York, NY: Continuum International Pub. Group, 2009.
[8] K. J. M. Underwood, Teacher and Parent Beliefs about Barriers to Learning for Students with Disabilities: An Analysis of Theory and Practice. |c2006.: 2006.
[9] K. Bird and A. Mathis, Design for Accessibility: A Cultural Administrator's Handbook. [Washington, D.C.]: MetLife Foundation, 2003.
[10] A. Grant, Designing for Accessibility. London: RIBA Publishing, 2012.
[11] R. J. Sorenson, Design for Accessibility. New York: McGraw-Hill, 1979.
[12] C. Barnes and G. Mercer, The Social Model of Disability : Europe and the Majority World. Leeds: Disability Press, 2005.
[13] L. Davis, The Disability Studies Reader. Abingdon, Oxon: Routledge, 2006.
[14] P. Blakely and A. H. Tomlin, Adult Education: Issues and Developments. New York: Nova Science Publishers, 2008.
[15] H. L. Hodgkinson, Higher Education: Diversity is our Middle Name. Washington, D.C.: National Institute of Independent Colleges and Universities, 1986.
[16] T. Loreman, Inclusive Education: A Practical Guide to Supporting Diversity in the Classroom. London.: RoutledgeFalmer, 2005.
[17] K. A. Joseph, Implementing the Social Model of Disability: Theory and Research. Leeds: Disability Press, 2004.
118
[18] Anonymous How People Learn: Brain, Mind, Experience, and School. Washington, D.C.: National Academy Press, 1999.
[19] L. Campbell, Mindful Learning: 101 Proven Strategies for Student and Teacher Success. Thousand Oaks, Calif.: Corwin Press, 2003.
[20] C. S. Claxton, Learning Styles: Their Impact on Teaching and Administration. Washington: American Association for Higher Education, 1978.
[21] L. C. Sarasin, Learning Style Perspectives: Impact in the Classroom. Madison, WI.: Atwood Publishing, 1999.
[22] F. Bowe, Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey, 2000.
[23] S. E. Burgstahler and R. C. Cory, Universal Design in Higher Education : From Principles to Practice. Cambridge: Harvard Education Press, 2008.
[24] Anonymous "Universal design," Engineers Australia, vol. 73, pp. 28, -07-01, 2001.
[25] W. F. E. Preiser, Universal Design Handbook. New York: McGraw-Hill, 2001.
[26] J. L. Nasar and J. Evans-Cowley, Universal Design and Visitability: From Accessability to Zoning. Columbus, Ohio,: [The John Glenn School of Public Affairs?]|c2007., 2007.
[27] A. Colburn, "Universal design," The Science Teacher, vol. 77, pp. 8, 03; 2014/1, 2010.
[28] H. M. Hebdon, "Universal Design," The Exceptional Parent, vol. 37, pp. 70, May 2007. 2007.
[29] C. Koch, "Marketing Universal Design," Qualified Remodeler, vol. 38, pp. 12, May 2012. 2012.
[30] H. Lawford-Smith, "Non-Ideal Accessibility," Ethic Theory Moral Prac, vol. 16, pp. 653-669, 2013.
[31] W. Lidwell, Universal Principles of Design: 125 Ways to Enhance Usability, Influence Perception, Increase Appeal, make Better Design Decisions, and Teach through Design. Beverly, Mass.: Rockport, 2010.
[32] Mary Brown Malouf, "Universal Design," The Salt Lake Tribune, pp. D.1-D1, Sep 22, 2003, 2003.
[33] E. Steinfeld, Universal Design: Creating Inclusive Environments. Hoboken, N.J.: John Wiley & Sons, 2012.
119
[34] Anonymous "Americans With Disabilities Act - Federal Agency Decisions," .
[35] L. Gostin and H. Beyer, Implementing the Americans with Disabilities Act. Cambridge, Mass.: Blackwell Publishers, 1996.
[36] M. C. Jasper, The Americans with Disabilities Act. Dobbs Ferry, N.Y.: Oceana Publications, 1998.
[37] Anonymous "Compliance manual Accessibility Standards for Customer Service, Ontario Regulation 429/07 : Accessibility for Ontarians with Disabilities Act, 2005 (AODA)," .
[38] 1. Medline, "Does accessibility of services lead to uncontrolled costs?" Employee Benefit Plan Rev., vol. 32, pp. 95, -03-01, 1978.
[39] National Research Council (U.S.)., Cost of Meeting Accessibility Requirements for Over-the-Road Buses. [Washington, D.C.]: Transportation Research Board, 2000.
[40] J. P. Conway, "Workplace discrimination and learning disability: The national EEOC ADA research project," ProQuest Dissertations and Theses, 2009.
[41] L. Snyder, J. Carmichael, L. Blackwell, J. Cleveland and G. Thornton, "Perceptions of Discrimination and Justice Among Employees with Disabilities," Employ Respons Rights J, vol. 22, pp. 5-19, 2010.
[42] U. Vu, "Physical disability going down, mental disability going up," Canadian HR Reporter, vol. 17, pp. 6, Mar 22, 2004, 2004.
[43] M. Berry, "Businesses must adapt to accommodate disabilities," Personnel Today, pp. 9, Sep 21, 2004, 2004.
[44] D. Carr, "Constructing disability in online worlds: conceptualising disability in online research," London Review of Education, vol. 8, pp. 51-61, March 2010, 2010.
[45] T. L. Childers and C. Kaufman-Scarborough, "Expanding opportunities for online shoppers with disabilities," Journal of Business Research, vol. 62, pp. 572-578, 200905, 2009.
[46] E. Ellcessor, "Access Ability: Policies, Practices, and Representations of Disability Online," ProQuest Dissertations and Theses, 2012.
[47] G. H. Pike, "Disability access and the Internet," Information Today, vol. 20, pp. 19, Feb 2003, 2003.
[48] L. E. Pinto, Curriculum Reform in Ontario: 'Common Sense' Policy Processes and Democratic Possibilities. Toronto, ON: University of Toronto Press, 2012.
120
[49] C. Bernacchio and M. Mullen, "Universal design for learning," Psychiatr. Rehabil. J., vol. 31, pp. 167-169, 2007.
[50] C. Curry, L. Cohen and N. Lightbody, "Universal Design in Science Learning," The Science Teacher, vol. 73, pp. 32-37, Mar 2006, 2006.
[51] D. Glass, A. Meyer and D. H. Rose, "Universal Design for Learning and the Arts," Harvard Educational Review, vol. 83, pp. 98-119,266,270,272, Spring 2013, 2013.
[52] M. King-Sears, "Universal Design for Learning: Technology and Pedagogy," Learning Disability Quarterly, vol. 32, pp. 199-201, Fall, 2009.
[53] S. Scott, J. Mcguire and S. Shaw, "Universal Design for Instruction," Remedial and Special Education, vol. 24, pp. 369-379, 2003.
[54] M. F. Story, "Maximizing Usability: The Principles of Universal Design," Assistive Technology, vol. 10, pp. 4-12, 1998, 1998.
[55] C. Variawa and S. McCahan, "Design of the learning environment for inclusivity: A review of the literature," in ASEE Annual Conference and Exposition, Conference Proceedings, 2010, .
[56] S. Brown, "Universal Design and me," Inside MS, vol. 25, pp. 43-44, Aug/Sep 2007. 2007.
[57] v. Bronswijk, "Ronald L. Mace FAIA (1941-1998), inventor of universal design," Gerontechnology, vol. 4, 2006.
[58] E. Steinfeld, Universal Design: Creating Inclusive Environments. Hoboken, N.J.: John Wiley & Sons, 2012.
[59] D. Zhang, "Research on Landscape Environmental Design of Universal Design," Applied Mechanics and Materials, vol. 71-78, pp. 4756, Jul 2011, 2011.
[60] D. Rose and A. Meyer, "Universal Design for Learning," Journal of Special Education Technology, vol. 15, pp. 67-70, 2000.
[61] R. L. Mace, "Universal Design in Housing," Assistive Technology, vol. 10, pp. 21-28, 1998, 1998.
[62] D. C. Ralston and J. Ho, Philosophical Reflections on Disability. New York: Springer Verlag, 2010.
[63] C. Variawa and S. McCahan, "Computational method for identifying inaccessible vocabulary in engineering educational materials," in ASEE Annual Conference and Exposition, Conference ProceedingsAnonymous 2012, .
121
[64] C. Variawa and S. Mccahan, "Identifying language as a learning barrier in engineering," The International Journal of Engineering Education, vol. 28, pp. 183-191, 2012.
[65] D. Handle, "Universal Instructional Design and World Languages," Equity & Excellence in Education, vol. 37, pp. 161-166, June 2004, 2004.
[66] P. Fletcher and M. Garman, Language Acquisition: Studies in First Language Development. New York: Cambridge University Press, .
[67] E. V. Clark, First Language Acquisition. New York: Cambridge University Press, 2009.
[68] M. Cruz-Ferreira, "First Language Acquisition and Teaching," AILA Review, vol. 24, pp. 78-87, 2011.
[69] C. Painter, Learning through Language in Early Childhood. New York: Cassell, 1999.
[70] C. Maienborn, K. v. Heusinger and P. Portner, Semantics: An International Handbook of Natural Language Meaning. New York: De Gruyter Mouton, 2011.
[71] J. Malrieu, Evaluative Semantics. Routledge, .
[72] H. v. d. Hulst, Recursion and Human Language. New York]: De Gruyter Mouton, 2010.
[73] A. Carstairs-McCarthy, An Introduction to English Morphology : Words and their Structure. Edinburgh: Edinburgh University Press, 2002.
[74] R. J. Teutsch and D. W. Jamieson, "Hockett on Effective Computability," Foundations of Language, vol. 11, pp. 287-293, Mar., 1974.
[75] K. De Bot, Second Language Acquisition: An Advanced Resource Book. New York: Routledge, 2005.
[76] S. M. Gass, Second Language Acquisition: An Introductory Course. New York: Routledge/Taylor and Francis Group, 2008.
[77] L. Selinker, "Interlanguage," International Review of Applied Linguistics in Language Teaching, IRAL, vol. 10, pp. 209, 1972.
[78] S. P. Corder and S. P. Corder, "The Significance of Learners Errors," International Review of Applied Linguistics in Language Teaching, IRAL, vol. 5, pp. 161-170, -01-01, 1967.
[79] L. Cummings. The Pragmatics Encyclopedia 2010.
[80] T. Riney, "Rediscovering interlanguage," System, vol. 22, pp. 119-122, 1994.
122
[81] U. Weinreich, Languages in Contact.: Findings and Problems. The Hague, Mouton: 1974.
[82] W. Hinzen, "The philosophical significance of Universal Grammar," Language Sciences, vol. 34, pp. 635-649, 201209, 2012.
[83] S. Naidu, "Connectionism," Distance Education, vol. 33, pp. 291-294, Nov 2012, 2012.
[84] C. A. Perfetti, "The Universal Grammar of Reading," Scientific Studies of Reading, vol. 7, pp. 3-24, 01Jan2003, 2003.
[85] R. Mitchell, Second Language Learning Theories. New York: Routledge, 2013.
[86] M. Sharwood Smith and J. Truscott, "Stages or Continua in Second Language Acquisition: A MOGUL Solution," Applied Linguistics, vol. 26, pp. 219-240, 2005.
[87] F. Mansouri, Second Language Acquisition Research : Theory-Construction and Testing. Newcastle-upon-Tyne: Cambridge Scholars, 2007.
[88] S. M. Gass and L. Selinker, Language Transfer in Language Learning. Philadelphia: J. Benjamins Pub. Co., 1992.
[89] S. Jarvis and S. A. Crossley, Approaching Language Transfer through Text Classification : Explorations in the Detection-Based Approach. Buffalo: Multilingual Matters, 2012.
[90] A. Y. Durgunoglu, W. E. Nagy and B. J. Hancin-Bhatt, "Cross-Language Transfer of Phonological Awareness," J. Educ. Psychol., vol. 85, pp. 453-465, 1993.
[91] S. D. Krashen, Language Acquisition and Language Education : Extensions and Applications. London: Prentice Hall International, 1989.
[92] D. A. Bilash Watkin, "An instructional model for facilitating second language acquisition integrating the Suzuki philosophy of learning and Krashen's Natural Approach," ProQuest Dissertations and Theses, 1996.
[93] B. C. Ng, Bilingualism : An Advanced Resource Book. New York: Routledge, 2007.
[94] R. Ellis, Study of Second Language Acquisition. Oxford: Oxford University Press, 2008.
[95] J. Hall and Joan Kelly Hall, "Classroom interaction and language learning Classroom interaction and language learning," Ilha do Desterro, pp. 165-187, -04-30, 2008.
[96] M. Mernik, Formal and Practical Aspects of Domain-Specific Languages: Recent Developments. Hershey, PA: Information Science Reference, 2013.
123
[97] Christiansen, Morten H.,Kirby, Simon. Language Evolution 2003.
[98] R. L. Trask, Language: The Basics. New York: Routledge, 1999.
[99] J. Fisiak, Historical Semantics, Historical Word Formation. New York: Mouton Publishers, 1985.
[100] P. Stekauer and R. Lieber, Handbook of Word-Formation. Dordrecht, The Netherlands: Springer, 2005.
[101] W. E. Nagy, P. A. Herman and R. C. Anderson, "Learning Words from Context," Reading Research Quarterly, vol. 20, pp. 233-253, Winter, 1985.
[102] P. De Keyser, Indexing: From Thesauri to the Semantic Web. Oxford: Chandos, 2009.
[103] S. Mishra, "Automated media indexing," Broadcast Engineering, vol. 45, pp. 12, Aug 2003, 2003.
[104] J. N. Olsgaard and J. E. Evans, "Improving keyword indexing," J. Am. Soc. Inf. Sci., vol. 32, pp. 71-72, 1981.
[105] Anonymous Oxford Dictionary of English. New York: Oxford University Press, 2003.
[106] T. C. Craven, String Indexing. Toronto: Academic Press, 1986.
[107] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science (1986-1998), vol. 41, pp. 391, Sep 1990, 1990.
[108] T. B. Hahn, Subject Indexing : An Introductory Guide. Washington, D.C.: Special Libraries Association, 1991.
[109] P. Rafferty, Indexing Multimedia and Creative Works : The Problems of Meaning and Interpretation. Burlington, VT: Ashgate, 2005.
[110] M. W. Berry, Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia: Society for Industrial and Applied Mathematics, Jan. 1999, .
[111] M. J. Cresswell, Semantic Indexicality. Boston: Kluwer Academic Publishers, 1996.
[112] A. P. Palma, "Indexicality," ProQuest Dissertations and Theses, 1989.
[113] B. Sherman, "Indexicality," ProQuest Dissertations and Theses, 2008.
[114] J. Brent, Charles Sanders Peirce: A Life. Bloomington: Indiana University Press, 1993.
124
[115] T. L. Short, Peirce's Theory of Signs. New York: Cambridge University Press, 2007.
[116] E. Ochs, "Experiencing language," Anthropological Theory, vol. 12, pp. 142-160, 2012.
[117] E. Ochs, Culture and Language Development: Language Acquisition and Language Socialization in a Samoan Village. New York: Cambridge University Press, 1988.
[118] G. Nunberg, "Indexicality and Deixis," Linguistics and Philosophy, vol. 16, pp. 1-43, Feb., 1993.
[119] S. Olderr, Symbolism : A Comprehensive Dictionary. Jefferson, N.C.: McFarland, 1986.
[120] U. REFFLE, "Efficiently generating correction suggestions for garbled tokens of historical language," Natural Language Engineering, vol. 17, pp. 265-282, Apr 2011, 2011.
[121] T. Wynn and F. Coolidge, "Beyond Symbolism and Language: An Introduction to Supplement 1, Working Memory," Curr. Anthropol., vol. 51, pp. S5-S16, June, 2010.
[122] L. A. Carlson and E. v. d. Zee, Functional Features in Language and Space: Insights from Perception, Categorization, and Development. New York: Oxford University Press, 2005.
[123] M. Aurnague, M. Hickmann and L. Vieu, The Categorization of Spatial Entities in Language and Cognition. Philadelphia: J. Benjamins, 2007.
[124] B. M. Amine and M. Mimoun, "WordNet based cross-language text categorization," in Computer Systems and Applications, 2007. AICCSA '07. IEEE/ACS International Conference On, 2007, pp. 848-855.
[125] L. Campbell, Language Classification: History and Method. New York: Cambridge University Press, 2008.
[126] P. Jackson, Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. Philadelphia: John Benjamins Pub., 2007.
[127] A. Kasher, Language in Focus: Foundations, Methods, and Systems : Essays in Memory of Yehoshua Bar-Hillel. Boston: D. Reidel Pub. Co., 1976.
[128] E. Andrews, Conversations with Lotman: Cultural Semiotics in Language, Literature, and Cognition. Toronto: University of Toronto Press, 2003.
[129] R. Gramigna, "The place of language among sign systems: Juri Lotman and Émile Benveniste," Sign Systems Studies, vol. 41, -12-31, 2013.
125
[130] W. R. Ott, Locke's Philosophy of Language. New York: Cambridge University Press, 2004.
[131] G. P. Radford, On Eco. United States: Thomson/Wadsworth, 2003.
[132] Y. Ledeneva and G. Sidorov, "Recent advances in computational linguistics," vol. 34, pp. 3+, 03; 2014/1. 2010.
[133] J. Kacprzyk and S. Zadrozny, "Modern data-driven decision support systems: the role of computing with words and computational linguistics," International Journal of General Systems, vol. 39, pp. 379-393, May 2010, 2010.
[134] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Reading, Mass.: Addison-Wesley, 1989.
[135] G. Salton, Dynamic Information and Library Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1975.
[136] D. Dubin, "The Most Influential Paper Gerard Salton Never Wrote," Library Trends, vol. 52, pp. 748-764, Spring 2004, 2004.
[137] G. Salton, "Automatic Text Analysis," Journal of the American Society for Information Science (Pre-1986), vol. 30, pp. 116, Mar 1979, 1979.
[138] G. Salton and C. Buckley, "Improving retrieval performance by relevance feedback," J. Am. Soc. Inf. Sci., vol. 41, pp. 288-297, 1990.
[139] G. Salton, New Approaches to Automatic Document Processing. Ithaca, N.Y.: Dept. of Computer Science, 1971.
[140] G. Salton, A Theory of Indexing. Philadelphia: Society for Industrial and Applied Mathematics, 1975.
[141] F. Béchet, R. De Mori and D. Janiszek, "Data augmentation and language model adaptation using singular value decomposition," Pattern Recog. Lett., vol. 25, pp. 15-19, 2004.
[142] P. Bissiri and S. Walker, "Converting information into probability measures with the Kullback–Leibler divergence," Ann Inst Stat Math, vol. 64, pp. 1139-1160, 2012.
[143] E. M. da Silva and R. R. Souza, "Information retrieval system using Multiwords Expressions (MWE) as descriptors," Journal of Information Systems & Technology Management, vol. 9, pp. 213+, May; 2014/1, 2012.
126
[144] R. Hu, W. Xu and F. Kuang, "An Improved Incremental Singular Value Decomposition," International Journal of Advancements in Computing Technology, vol. 4, pp. 95-102, Feb 2012, 2012.
[145] Yanmin He, Tao Gan, Wufan Chen and Houjun Wang, "Adaptive Denoising by Singular Value Decomposition," Signal Processing Letters, IEEE, vol. 18, pp. 215-218, 2011.
[146] Zhi-Yong Shen, Jun Sun and Yi-Dong Shen, "Collective latent dirichlet allocation," in Data Mining, 2008. ICDM '08. Eighth IEEE International Conference On, 2008, pp. 1019-1024.
[147] R. Fidel, "User-centered indexing," J. Am. Soc. Inf. Sci., vol. 45, pp. 572-576, 1994.
[148] L. L. Hill, "Automated support to indexing," Information Processing and Management, vol. 29, pp. 528-531, 1993.
[149] I. W. Wait and J. W. Gressel, "Relationship Between TOEFL Score and Academic Success for International Engineering Students," J Eng Educ, vol. 98, pp. 389-398, Oct 2009, 2009.
[150] G. M. Vogel, "Language & cultural challenges facing business faculty in the ever-expanding global classroom," Journal of Instructional Pedagogies, vol. 11, pp. 1-32, May 2013, 2013.
[151] C. Variawa and S. Mccahan, "Frequency analysis of terminology on engineering examinations," in American Society for Engineering Education, 2011, .
[152] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT press, 1999.
[153] C. J. Van Rijsbergen, "A non-classical logic for information retrieval," The Computer Journal, vol. 29, pp. 481-485, 1986.
[154] M. Moens, Automatic Indexing and Abstracting of Document Texts. Boston: Kluwer Academic Publishers, 2000.
[155] C. D. Manning, Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press, 1999.
[156] M. Stevenson, Word Sense Disambiguation : The Case for Combination of Knowledge Sources. Stanford, Calif.: Center for the Study of Language and Information, 2003.
[157] P. A. M. Seuren, A View of Language. New York: Oxford University Press, 2001.
[158] G. Salton, A Vector Space Model for Automatic Indexing. Ithaca, N.Y.: Dept. of Computer Science, Cornell University, 1974.
[159] R. Zeff, "Universal design across the curriculum," New Directions for Higher Education, vol. 2007, pp. 27-44, 2007.
[160] B. R. Connell, M. Jones, R. Mace, J. Mueller, A. Mullick, E. Ostroff, J. Sanford, E. Steinfeld, M. Story and G. Vanderheiden, "What is Universal Design?" The Exceptional Parent, vol. 38, p. 97, May 2008.
[161] W. McCann, "Ontario universities uniting to help faculty identify and help students struggling with mental health issues, improve accessibility for all," Council of Ontario Universities, Oct. 15, 2013.
APPENDIX A.1 – DESIGN OF THE LEARNING ENVIRONMENT FOR INCLUSIVITY: A REVIEW OF THE LITERATURE
C. Variawa and S. McCahan. "Design of the Learning Environment for Inclusivity." Proc. of 117th ASEE Annual Conference and Exposition. ASEE Paper No. AC 2010-1195. Louisville, 2010. This paper was presented at the 2010 American Society for Engineering Education Annual Conference. It reviews the literature on the subject of inclusivity with respect to learning disabilities, minority students, and gender issues. Discussed within the context of first-year post-secondary education, this work develops a framework that unites the different approaches into an up-to-date resource that is relevant for engineering education.
Design of the Learning Environment for Inclusivity: A Review of the Literature
Abstract

Retention, especially of under-represented populations through the first year of university, is an on-going concern in engineering programs. While this is a very complex issue, one of the aspects of retention being studied is the barriers to inclusion that some students feel when they enter university. There are many programs aimed at helping freshmen acclimatize to the university environment, and the issue of inclusivity is becoming more pronounced as we strive to increase and then maintain the diversity of our student population in engineering programs. There are many ways of approaching issues of student success toward a goal of improving diversity. However, the literature on this subject is highly fragmented. There is a cluster of work on students with learning disabilities, found primarily in the equity and disability literature. There is a considerable cluster of work on first-generation students and minorities and the cultural issues that these students may face when entering university. And in the engineering education literature there is some work on minority student success strategies and a substantial amount of work on improving the retention of women in engineering programs. A fraction of this literature across all of these fields considers the barriers to inclusion that students may encounter in their engineering studies and, in particular, how the design of the learning environment impacts retention. The work in the area of design for retention comes mainly from literature in the field of higher education studies. In this paper we review the research on this subject, both in the engineering education literature and in literature from other disciplines. From this review we have created a framework for understanding the different approaches that have been taken to making the learning environment more inclusive for diverse student populations.
This research identifies approaches that may be effective and transferable, and a number of open questions that should be investigated further.

Introduction

A look at current engineering classrooms shows how the demographic composition has diversified, especially in recent years. Most retention programs are aimed at freshmen because of the vulnerability of this population, so questions of inclusivity and retention are particularly applicable to freshman programs. With constant change in the learner base, coupled with increasing diversity, one begins to question how engineering education should evolve to meet the needs of the next generation of students, and how this evolution affects the students.
Learning disabilities (physical and mental), the cultural undertones of contextualization that affect minority students, and gender issues are three major areas of diversity that are affected by inclusivity in the classroom. This paper attempts to review the literature on the subject of inclusivity with respect to these issues, within the context of first-year post-secondary education, to create a practical framework that unites the different approaches into an up-to-date resource that is relevant for engineering.
The Online Ethics Center at the National Academy of Engineering1 has a collection of over 50 abstracts that address teaching to diversity in engineering. Minority retention rates in post-secondary education, for instance, are a topic that also falls into this category. The 2008 annual report by the National Action Council for Minorities in Engineering2 reviews the statistics on minority engineering students and practicing engineers. Similar statistics exist for women in engineering.3,4 The statistics clearly show that women and minorities are often under-represented in engineering, and there are programs at many universities related to recruitment and retention that attempt to address this issue.
Although many programs exist, it is unclear what makes a retention program effective. It would be inappropriate to simply assume that a specific effective program at one institution could be successfully replicated at a very different type of institution. However, it would be useful to know if there are particular types of programs or approaches that have been successfully implemented across a variety of institutions. We might be able to conclude that there are proven methods that can be adapted to a specific institution to work in a particular context. Furthermore, by looking at the literature on inclusivity across diversity (gender, minority, and learning disabilities) we can see if there are commonalities in effective approaches that can be leveraged. Applying such strategies in an engineering context also has some unique challenges that need to be addressed.
The literature that was reviewed for this project covered three major populations: women, minorities, and people with learning disabilities. While it is possible to find hundreds of citations for each of these categories, references were chosen for breadth. For this reason some of the references are review articles that draw together literature from a large number of primary sources, but virtually all of the literature focuses on one population or another, or on the learning environment in general. Our purpose here is to view this literature altogether to identify commonalities that are relevant and usable for engineering, thus creating a framework for understanding effective approaches to inclusivity that can operate across a variety of populations.

Students with Learning Disabilities

Learning Disabilities (LD) are defined as "the conditions giving rise to a difficulty in acquiring knowledge and skills, especially in comparison with the norm for one's peer group, typically because of a mental disability or cognitive disorder."5 A number of recent publications look at the prevalence of learning disabilities in the classroom. These include studies on the identification of students with visual impairment, autism,6 and auditory processing disorders.7 A review of the results from these sources indicates an increasing prevalence of children with LD, which will translate into an increase in engineering students with LD. Further, research suggests that learning disabilities have no effect on an individual's intelligence, and therefore students in this population ought to have an equal opportunity to be successful in a learning institution. The studies generally conclude that increased inclusivity in the learning environment is beneficial. The sources reviewed for this project are from the engineering, equity, and disability literature, and pertain to a wide variety of identified disabilities. A particularly comprehensive resource,
Brinckerhoff et al.8 has over 900 references that discuss approaches that identify and address LD issues. It includes an analysis of the LD population, the dynamic process of providing accommodation, as well as tools for performing future work in the field. Scotch9 has reviewed the issues of preexisting bias, the presence of dominant tendencies in the workplace, disabling environments, assumptions of incapacity, and the culture of disability policy. A more recent article by Kavale et al.10 argues that the traditional definition of a learning disability "has remained static for 40 years, creating a schism between theory and practice." In particular, the authors suggest that the definition of LD ought to be updated to a more rigorous construct of physical as well as mental disability, including emotional disturbances and environmental, cultural, or economic disadvantage. Similarly, Williams et al.11 recently published the results of their research on learning disabilities and how the sharing of information can influence particular outcomes. Their article concluded that sharing knowledge about student behaviors creates an increasingly personalized relationship with the learning population.
One route to addressing the issue of disabling learning environments is universal instructional design (UID), also called universal design in education. Bowe12 and Burgstahler et al.13 review the history of this approach and explore the principles behind it. The UID principles are aimed at changing the learning environment to reduce the barriers to learning for a broad range of students, while enhancing the environment for all students. The principles, drawn from universal design in architecture, are intuitively appealing. However, McGuire et al.14 have pointed out that this approach has not yet been rigorously tested.
Research on methods for addressing issues of LD has increased over time. Again, although there are many publications in this area, only the most recent and summative examples are discussed here. Research published in late 2009 considers the effects of computer-assisted instruction on the mathematics performance of students with learning disabilities.15,16 Although instructors were generally willing to provide additional instructional and adapted materials to assist LD students, increased class sizes and a lack of additional support structure made this approach difficult. Seo and Bryant17 analyzed 11 existing studies that compared computer-assisted instruction (CAI) to face-to-face teaching. Although there was no consensus on whether CAI is advantageous per se, the authors were able to identify several key issues which need to be addressed before CAI can be realistically compared with traditional teaching practice. They suggest that CAI should be based on a valid learning theory (i.e. based on cognitive and constructivist models rather than behavioral ones) and should incorporate critical instruction features (such as feedback). The validity of using CAI to assist LD students still needs to be studied further. The Seo and Bryant study is important because "e-learning" is gaining increased attention as a method of assisting students with learning disabilities. Another example is Todd's work, which considers several recent studies that aim to promote e-learning as a tool for assistive education.18 LoPresti et al.19 review assistive technologies currently being explored to reduce accessibility barriers and provide improved quality of life.

The literature shows that learning disabilities can affect both student success and inclusivity. In general, the literature suggests that increasing interaction between the instructor and the student is effective, and when that becomes difficult, methods such as e-learning that supplement traditional learning can be useful.
However, e-learning is not universally effective; to be effective it must be understood well by both the instructor and the student, and it must
incorporate key elements of pedagogy. The advantages and disadvantages of a student-centered approach versus changing the institutional environment as a whole are summarized in Table 1.
The literature is now clear that bright students with learning disabilities also have much to contribute to the engineering profession. Most of our current practice in terms of retaining these students is based on finding appropriate individualized accommodations, but increasingly the literature points to changing teaching practice as a means of creating inclusivity. The literature on learning disability has also begun to point to a wider variety of factors such as economic and cultural differences (e.g. Kavale et al.10) that should be accounted for in the learning environment if we intend to create inclusivity.
Minorities and First Generation Students: Cultural Issues

When students first enter university there is a period of adjustment when they must transition from the environment and learning skills they were accustomed to in high school to a new environment with new demands. This period of transition, or of feeling they are not yet successfully adjusted, can be especially acute for first-generation and minority students. First-generation students are those who are the first in their family to go to college. Admission decisions are generally based on grades, extracurricular activities, capacity to communicate in the language of instruction, etc. However, these attributes do not necessarily measure how easily a student will fit into the learning environment, especially if the new learning environment and culture of the institution are very different from what they have experienced before. Nor do we want to exclude students who come from diverse backgrounds because they may have difficulty adjusting. This would have significant negative consequences for the institution, the learning environment, and the engineering profession.
Traditionally there has been an over-representation (relative to the general population) of white men in engineering in North America. This is a simplistic statement because it ignores hidden diversity. However, many aspects of current learning environments in engineering implicitly assume this simplistic homogeneity. As a result, students from diverse backgrounds may have difficulty adjusting to the institutional environment. This may be felt both inside and outside the classroom. We will focus here on the learning environment, where cultural differences can result in unnecessary barriers to learning, for example, in making meaning of the contextualization used in engineering applications. Eventually this can affect student success and retention because it leads to a disconnect between the learner and the material, which can compromise grades and lead to a sense of alienation.
The cluster of work in this area is extensive, and is spread over many disciplines. For this reason, recent work, and that most closely related to inclusivity in the first-year engineering classroom, will be examined preferentially.
In a recent article, Tapia20 argues that diversity requires attention to the student and institutional commitment. He gives examples of exemplary programs at various "top-tier" universities that support inclusive environments for minority students, and contends that a supportive institutional environment benefits everyone. Malone and Barabino21 considered such environments as they examined the role of environment in identity formation. They also performed a comprehensive
analysis of narrations of race in science, technology, engineering, and math (STEM) settings. Their work identifies themes of invisibility and lack of recognition, exclusivity, racialization, and issues of integration of identity. In general, their work pulls together research from various sources, including existing literature and primary research studies.
Understanding the relationship between racial difference and minority inequality is complex. Trytten et al.,22 for example, contend that racial inequality can exist in spite of over-representation. They point to the example of Asian American students in engineering in North America. Specifically, they argue that over-representation "does not remove the racially-based stereotyping and discrimination in our society," and hence minority status. In their work, they describe five approaches for making engineering institutions more equitable, including: creating a support system for all minority groups; educating faculty and students about stereotyping; and remaining vigilant for possible issues, including instances of discrimination not reported to the institution. Generally, they claim that minority students may require additional support to facilitate inclusivity, whether they are members of an over-represented or under-represented minority. This article exemplifies a message that is repeated in other sources: that while students from a particular background may face similar obstacles, we need to be careful not to stereotype, but instead to consider how diversity, both visible and invisible, can result in a disconnect between the learner and the learning environment. There are a variety of valuable recent articles in this field that are directly applicable to first-year engineering and are suggested for further reading.23,24,25
In terms of creating a framework for addressing the needs of culturally-diverse students, we have identified several underlying trends in the literature. First, minority students (cultural, racial, etc.) are subject to unique barriers to learning that "traditional" engineering students do not have to face. Second, the probability of minority student success depends on the degree to which the institution is able to develop and support an inclusive environment. Further, students from over-represented minorities and those with hidden diversity may encounter some of the same barriers to accessibility. Several approaches to mitigating these learning barriers were also examined in the literature, including increased resources and counseling, recognition of achievements, and peer/faculty support groups. Effectively, these add up to a student-centered approach that decreases a sense of alienation. One of the significant current trends is an emphasis on community building to achieve a sense of inclusion. A key recommendation for the in-class engineering learning environment is that contextualization of knowledge should take into account differences in the environmental, cultural, and economic backgrounds of students. The advantages and disadvantages of a student-centered approach versus changing the institutional environment as a whole for addressing the needs of first-generation and minority students are summarized in Table 2.
Improving Retention of Women in Engineering Education

There is a huge body of literature in the field of gender differences in education, and a portion of it analyzes methods for improving the retention rate of women in engineering education. The number of women entering engineering has risen, but has not risen steadily, and has been out-paced by female representation in other professional fields. Some research suggests that recruitment into engineering is the primary issue, as opposed to retention.26 However, other
research suggests that women continue to experience a sense of exclusion in the engineering environment, which may feed back and influence decisions made by the next generation of students. This has been an on-going issue in engineering education, and the consensus is that it is a complex issue that will require a societal as well as an institutional evolution.
There are some excellent recent articles in this area that pertain to engineering education. Buchmann27 identifies areas where women lead and trail men in higher education. Essentially an up-to-date literature review of women in higher education, Buchmann also investigated the correlation between gender differences and student success rates. Leicht-Scholten et al.28 describe how the international community is fostering gender inclusivity in engineering education. And Garforth and Kerr29 analyze the issues of gender differences in science, technology, and engineering using a Foucauldian approach. This approach seeks to identify a feminine perspective by considering how women describe their interaction with the institution. They advocate incorporating this perspective into the academy instead of trying to acclimatize women into a preexisting environment. Gender disparity is also analyzed in a cluster of articles summarized in the summer 2009 National Women’s Studies Association Journal.30 The consensus is that inclusivity in science requires approaches that can be “varied and thus appeal to a wide variety of learners, and the applications would benefit all facets of society.”30 This idea echoes the learning disabilities and minority studies in STEM education literature. Du and Kolmos31 also suggest methods of improving inclusivity for women engineers, but their approach uses problem-based learning (PBL) courses. In their study, they analyze how PBL courses offer not only the usual learning benefits associated with PBL, but also increased female recruitment into areas where they are under-represented.
The relatively low percentage of women pursuing engineering degrees is also a societal issue. Studies by McCarthy32 and Chen33 suggest that negative cultural messages, restrictive role modeling, and a lack of constructive middle and high school guidance contribute to the problem. McCarthy advocates fostering inclusive attitudes and language, reframing physical project assessments to foster a less destructive approach, and, among other things, carefully marketing STEM education. In another study,34 researchers found that the perceived importance of engineering competencies is subconsciously influenced by gendered assumptions. Engineering competencies that are perceived as "feminine" are regarded as soft skills that are less valued. As a mitigation strategy, they and others35,36 suggest emphasizing the value and importance of a wide variety of competencies in engineering, and being careful not to reinforce stereotypes. They contend that to be effective, improvement strategies should be structural rather than individualistic.
In general, the literature on gender issues in engineering education shows that the current population of women in STEM education is low relative to the general population, and that the inclusion of feminine identity plays a key role in the formation of an inclusive environment. University is an essential developmental period for many students, and it is important that women see in engineering education the opportunity to develop in an environment that affords their perspectives and goals equal value. A summary of the key advantages and disadvantages of a few different approaches that have been tried in this field is shown in Table 3.
We have reviewed the literature in three clusters that pertain to specific learner populations: students with learning disabilities, minority students and cultural differences, and women. Along
with the literature on these specific populations, there is another body of literature which looks at the learning environment overall.
Design for Retention

There is a body of literature in the field of higher education studies that pertains to retention. The literature in this area can be roughly subdivided into two categories: research into the attributes that make students more likely to succeed (with the aim of helping students boost their competencies in these areas), and research into intervention strategies or environmental factors that impact success.
There is research that demonstrates that the preexisting psychological state of the student, and their social and coping skills, have an effect on retention. Solberg Nes et al.37 surveyed over 2000 students to determine the effects of dispositional and academic optimism on college student retention. The former affected retention via motivation and adjustment, whereas the latter did the same, but affected GPA as well. One area that has received much attention is Emotional Intelligence (EI) and how that impacts retention. Qualter et al.38 showed that higher EI positively influences a student’s ability to progress, while also evaluating an EI-based intervention program using recent theoretical work to ground their results. This approach is typical. Schools that use EI assessment will generally follow up with the student, i.e. offer opportunities for the student to boost their competency in areas where their EI assessment is low.
Other researchers have focused on retention programs and the characteristics of the learning environment that positively impact retention. Jones and Braxton39 offer a good current review of the extent and types of recent approaches institutions are taking to reduce college student attrition. Bai and Pan40 performed an analysis of four different types of intervention. In their study, they found that social integration programs improve retention for female students, and they identified which types of advising programs benefited first-year students. Croft et al.41 examined a program which increased support of mathematics instruction to assist in retention efforts, and showed that the institution also progressed in other areas as a result of this university-wide support strategy. McQueen42 recently reviewed various models that are currently being used in the field of retention. She argues that an internationally prevalent model currently used by institutions for student retention, Tinto's Student Integration Model, although useful in certain areas, is not particularly applicable for education. She suggests that a more contextualized, nuanced, and psychosocial approach be used in the field.
The institutional environment, including the student community, also plays a key role in retention. Oseguera and Rhee43 studied how the characteristics of the student population affected retention over a 6-year period. They found that better academically-prepared and better-resourced students can act as buffers for at-risk students. That is, the better prepared students can help retain their peers during times of failure and self-doubt.
Overall, we found through the literature search that much of the research, although carried out in other fields, is applicable to engineering education. The issues of student attributes (e.g. EI) and approaches suggested for retention programming appear to be transferable to engineering. The literature suggests that supporting the development of student coping skills, and creating an
environment that encourages mentoring and a positive sense of community and inclusion have a positive impact on retention.
Like the other clusters of work we reviewed, the body of material in this field is huge. There appear to be many possible strategies that could be implemented to positively impact retention. However, we are faced with two difficulties. First, programs or approaches need to be fit to the needs of the particular institution, and simply "lifting" a strategy from elsewhere is probably not effective. So we need to understand not just the details of the strategy, but also the principles that make it effective. Second, given limited resources we need to decide, on a practical level, which approaches will yield the most impact for the resources invested.
Discussion
This review has considered clusters of literature which all pertain to inclusivity and, by extension, retention. Within each of these clusters, the authors have examined recent literature with an emphasis on breadth. These sources include up-to-date literature surveys, statistics, and quintessential studies that examine inclusivity across diversity. Although each article takes a unique approach, there are some generalized conclusions which we can draw from this review.
Two schools of thought emerge from the literature examined; both have at their core the intent of increasing student success and retention in diverse learning environments via inclusivity. The individual-focused (IF) approach attempts to mitigate learning barriers by helping the individual student fit into the environment, while the system-focused (SF) approach attempts to change the environment to fit the broadest possible variety of students. All of the strategies and programs discussed in the literature, across all of the clusters we reviewed, can be categorized along this spectrum. Some approaches are purely IF or SF, but many are a mixture.
Tables 1, 2 and 3 summarize some of the main advantages and disadvantages of the IF and SF strategies for each cluster of literature. Table 1 shows how learning disabilities can be mitigated using IF and SF approaches. There is a tradeoff between individual accommodation or intervention and increasing the accessibility of the system overall. The goal in both approaches is to improve inclusivity. However, the SF strategy adjusts the system to make the environment more accessible to a greater number of students. This, if done effectively, will improve the learning environment for LD students, and may also create a better learning environment for others (what is known as the "curb cut" effect). It also inherently accommodates students who may have a learning disability but have not yet been assessed. The disadvantage is that even a system that is well designed for a broad set of users may not accommodate people on the far end of the spectrum in terms of needs. And there may be a perception that building accommodation into the system compromises the integrity of the education. This may not be the reality, but it can impact the effectiveness of an institutional change. The IF approach, in contrast, targets LD learners specifically and seeks to provide accommodation or teach coping skills. As discussed in sources like Williams et al.,11 the creation of a personalized relationship between the accommodation service and the student increases a sense of inclusivity while reducing barriers to learning. However, other authors in the field argue that as more and more students resort to accommodation the system becomes strained, and students may become too dependent on this service for their sense of inclusion. Increasing load on individual accommodation services
requires greater resources while only meeting the needs of a limited portion of the learning population. Hence, there are disadvantages to using IF or SF strategies exclusively.

Table 1 – Strategies for people with learning disabilities (physical/mental)

Individual-Focused (References: 8, 11, 17, 18, 19)
  Example: Accessibility volunteer who helps in note-taking, physical assistance for transportation, extended duration for assessment completion, etc. (per-case basis).
  Pros: Provides assistance to individuals who are at highest risk of not succeeding. Demonstrates a strong sense of institution-learner commitment due to the personalized response.
  Cons: May promote a sense of unequal treatment among non-assisted and assisted learners. Although the student is being assisted, they may feel more out of place because of accepting this assistance. Generally requires greater resources, as students are addressed individually.

System-Focused (References: 8, 9, 12, 13, 14)
  Example: Universal Design in Education – maximize accessibility for the greatest number of learners possible; provide an environment that is flexible, transparent, and more tolerant of user error.
  Pros: Provides an increased level of accessibility for all students, regardless of prior disability level. Increases universal access to education. May promote/supplement alternative ways of learning, resulting from greater variability of access methods.
  Cons: May leave out students at highest risk of not succeeding. There is a concern that this may compromise the integrity of education by "simplifying." Does not address barriers to individual learning specifically (addresses several barriers in a general sense, but none specific to any student).
Table 2 compares the advantages and disadvantages of the IF and SF approaches and provides some examples for first-generation, minority and culturally based student issues. The individual-focused strategy typically employs a personal tutor, coaching, or mentoring system. This approach encourages person-to-person interaction, and may greatly benefit individuals who severely lack support and have a substantial sense of isolation or exclusion. Although the IF approach promotes a kind of inclusion, it also segregates individuals from their peers. Further, one may argue that the learner may develop a dependence on this resource, and such dependency could reduce the learner’s independent motivation and self-confidence. In terms of adjusting the environment to fit the student’s needs (i.e. the SF approach), sources such as Malone et al.21 suggest that identity creation is a major factor in increasing inclusivity, and the institution can affect this by supporting initiatives that build a sense of community belonging. Further, changing the classroom environment to include applications and contextualization that take into account a diverse student population can have a
positive effect. However, similar to the shortcomings of the SF approach used for learning disabilities, this approach may not meet the needs of the highest-risk individuals.
Table 2 – Strategies for first generation and minority students, and/or to address cultural issues
Individual-Focused
e.g. Personal tutor/mentor assigned to a student (or small group); clear lines of communication between the instructor and the learning population, promoted through a human-centered approach (telephone, in-person meetings, etc.); individual-specific learning objectives
Pros: Individualized sense of inclusivity – the student feels closely associated with a ‘mentor’; increases self-confidence by providing a resource that may know the learner at a personal level
Cons: May form a dependence on the ‘mentor’ to act as an interface between self and environment; addresses very specific issues – knowledge gained may have variable applicability
References: 20, 21, 22, 23, 25

System-Focused
e.g. Restricting the use of colloquial terms on assessment materials; promoting and funding cultural/minority groups on campus whose aim is to increase understanding between the learning population and society; diversity in methods of instruction, allowing learners to use the one they are most familiar with (e.g. lecturing vs. teaching using multimedia)
Pros: Promotes an environment that increases inclusivity for all students to a greater degree; enhances instructional material by contextualizing content generally – improves transferability of knowledge/application; limits feelings of ‘alienation’ because the learner self-creates a model of effective learning (and is not dependent on a ‘mentor’ for assistance)
Cons: Learners at highest risk who need additional assistance still face their barriers to learning
References: 20, 21, 24, 25
Table 3 considers the strategies available for addressing gender issues in engineering. One example is the lack of female role models in engineering education. Using an IF approach, an institution might develop a coaching or mentoring program. The advantage of approaching inclusivity in gender issues from the IF angle is that it promotes the sense of a personal relationship between a mentor and an individual student, which fosters identity creation and increased self-confidence, among other benefits. A critique of the IF approach to gender issues is that it may promote a sense of exclusion for women because it suggests they are a foreign entity in engineering, in need of support to operate successfully in the engineering
profession. This may be a source of alienation, and may be counter-productive if not addressed by the system. The system-focused approach treats gender issues as a way to embrace differences and incorporate them into a diverse learning environment. This approach frames gender issues not as a problem of women failing to fit in, but as part of the greater problem of an exclusive environment, which also has implications for other types of diversity. A systems approach aims to address all of these issues via universal design applicable to the greatest number of users to the greatest degree possible. The difficulty is implementing such a change. There are numerous obstacles, including societal factors and institutional inertia, and one can ask whether engineering currently has the means to make this change if there are not enough women to reach a critical mass, or tipping point.
Table 3 – Strategies to deal with gender issues

Individual-Focused
e.g. Individual role models in the faculty who act as nodes for personal growth
Pros: A highly personal relationship between the individual and a ‘mentor’ may increase the sense of identity and decrease self-insecurity issues; embraces gender differences as a means to accept diversity in the classroom
Cons: May further segregate genders because of an increased sense of exclusivity between “them” and “us”
References: 26, 27, 28, 29, 33

System-Focused
e.g. Increasing enrolment rates for women in STEM education
Pros: Increases gender equality and promotes universal treatment of all learners; self-identity creation is supplemented by the system addressing all students equally; gender differences are given the same ‘importance weighting’ as others – no group receives exclusive treatment over another, system-wide
Cons: Gender issues may not be fully addressed for all persons affected – a surface-level approach to solving this problem promotes a partial understanding of the specific issue
References: 26, 28, 29, 30-36
Conclusion
Studying student success in learning environments has roots in inclusivity studies in education. Recent literature was used for this project, which aims to identify means of increasing inclusivity by addressing the needs of students with learning disabilities, minority students and those who face cultural barriers to learning, and women in STEM education. We have also included the literature on retention in the review, particularly design for retention.
The breadth of work examined here is an attempt to create a list of resources that can serve as a starting point for future work. Several approaches currently being investigated in other disciplines, such as an understanding of emotional intelligence (EI) as it pertains to retention, have the potential to be used directly in engineering, or to be adapted for use in engineering.
Much of the literature focuses on the benefits of a human-centered approach to revising the learning environment, either at the individual level or at the systemic level. The educational system could, in principle, be engineered around the users (students) to address their needs. This is a concept familiar to engineers in product or system design, and we have the opportunity to apply our expertise in this area to improve the learning environment. Increased inclusivity will ideally accommodate the increasing diversity of tomorrow’s engineering population. However, the challenges of designing intervention programs, or redesigning the learning environment, are enormous, and to date no one approach can be identified as the “standard” or best practice.
Considering the literature from a purely individual-focused or system-focused perspective is perhaps simplistic, because so many of the suggested, and tested, strategies blend the two approaches. However, we need a way of conceptualizing this vast body of research to make it meaningful and useable, and this framework helps consolidate the literature into a manageable form. In summary, the individual-focused approach addresses barriers to learning at a personal level, which works best for learners who are most at risk, and it is far easier to implement. However, it may require more resources and reach fewer students as the population diversifies. The system-focused approach, on the other hand, aims to increase inclusivity for the greatest number of students possible. So, whereas IF focuses on depth, SF focuses on breadth of learning barriers mitigated. The SF approach is harder to implement in many ways and may not meet the needs of the students who are most at risk. However, it is geared toward developing a more inclusive environment, which should be the goal of every engineering school. Overall, we should be considering both pathways to creating a more inclusive system.

Bibliography
1 "Abstracts of Studies about Diversity in Engineering and Science." Online Ethics Center for Engineering, 8/6/2009, National Academy of Engineering. <www.onlineethics.org/Topics/LegalIssues/Diversity/abstractsindex.aspx>
2 "Synergies (2008 Annual Report)." Rep. National Action Council for Minorities in Engineering. Web. <http://www.nacme.org/user/docs/NACME_AnnualReport2008.pdf>
3 Lim, V. "A Feeling of Belonging and Effectiveness Key to Women's Success." Diverse: Issues in Higher Education 26.2 (2009): 17.
4 Kukreti, A., Simonson, K., Johnson, K., and L. Evans. "A NSF-Supported S-STEM Scholarship Program for Recruitment and Retention of Underrepresented Ethnic and Women Students in Engineering." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
5 Oxford University Press. Oxford English Dictionary Online, 2010.
6 Li, A. "Identification and Intervention for Students Who are Visually Impaired and Who have Autism Spectrum Disorders." Teaching Exceptional Children 41.4 (2009): 22-32.
7 Iliadou, V., Bamiou, D., Kaprinis, S., Kandylis, D., and G. Kaprinis. "Auditory Processing Disorders in Children Suspected of Learning Disabilities--a Need for Screening?" International Journal of Pediatric Otorhinolaryngology 73.7 (2009): 1029-34.
8 Brinckerhoff, L.C., McGuire, J.M., and S.F. Shaw. Postsecondary Education and Transition for Students with Learning Disabilities. Austin, TX: Pro-Ed, Inc., 2002.
9 Scotch, R.K. "Disability Policy: An Eclectic Overview." Journal of Disability Policy Studies 11.1 (2000): 6-11.
10 Kavale, K., Spaulding, L., and A. Beam. "A Time to Define: Making the Specific Learning Disability Definition Prescribe Specific Learning Disability." Learning Disability Quarterly 32.1 (2009): 39-48.
11 Williams, V., Ponting, L., Ford, K., and P. Rudge. "A Bit of Common Ground: Personalisation and the use of Shared Knowledge in Interactions between People with Learning Disabilities and their Personal Assistants." Discourse Studies 11.5 (2009): 607-24.
12 Bowe, F. Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey, 2000.
13 Universal Design in Higher Education: From Principles to Practice. Eds. S.E. Burgstahler and R.C. Cory. Cambridge: Harvard Education Press, 2008.
14 McGuire, J., Scott, S., and S. Shaw. "Universal Design and its Applications in Educational Environments." Remedial and Special Education RASE 27.3 (2006): 166.
15 Busch, T., Pederson, K., and C. Espin. "Teaching Students with Learning Disabilities: Perceptions of a First-Year Teacher." The Journal of Special Education 35.2 (2001): 92-9.
16 Schumm, S.J., Vaughn, S., Haager, D., McDowell, J., Rothlein, L., and L. Saumell. "General Education Teacher Planning: What can Students with Learning Disabilities Expect?" Exceptional Children 61 (1995): 335.
17 Seo, Y., and D.P. Bryant. "Analysis of Studies of the Effects of Computer-Assisted Instruction on the Mathematics Performance of Students with Learning Disabilities." Computers & Education 53.3 (2009): 913-28.
18 Todd, R. E-Learning for Secondary School Teachers: Inclusive Science and Math Instruction for Students with Disabilities. Berlin: Springer, 2008.
19 LoPresti, E.F., Bodine, C., and C. Lewis. "Assistive Technology for Cognition." IEEE Engineering in Medicine and Biology Magazine 27.2 (2008): 29.
20 Tapia, R. "Minority Students and Research Universities: How to Overcome the 'Mismatch'." The Chronicle of Higher Education 55.29 (2009): A72.
21 Malone, K., and G. Barabino. "Narrations of Race in STEM Research Settings: Identity Formation and its Discontents." Science Education 93.3 (2009): 485.
22 Trytten, D., Lowe, A., and S. Walden. "Racial Inequality Exists in Spite of Over-Representation: The Case of Asian American Students in Engineering Education." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
23 Crown, S., Fuentes, A., Tarawneh, C., Freeman, R., and H. Mahdi. "Student Academic Advisement: Innovative Tools for Improving Minority Student Attraction, Retention, and Graduation." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
24 Gomez, T. "Integrating Engineering, Modeling and Computation into the Biology Classroom: Development of a Multi-Disciplinary High School Neuroscience Curricula." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
25 Lambright, J., Johnson, W., and C. Coates. "Attracting Minorities to Engineering Careers: Addressing the Challenges from K-12 to Post Secondary Education." ASEE Annual Conference and Exposition, Conference Proceedings (2009).
26 de Cohen, C., and N. Deterding. "Widening the Net: National Estimates of Gender Disparities in Engineering." Journal of Engineering Education 98.3 (2009): 211-226.
27 Buchmann, C. "Gender Inequalities in the Transition to College." Teachers College Record 111.10 (2009): 2320.
28 Leicht-Scholten, C., Weheliye, A., and A. Wolffram. "Institutionalisation of Gender and Diversity Management in Engineering Education." European Journal of Engineering Education 34.5 (2009): 447.
29 Garforth, L., and A. Kerr. "Women and Science: What's the Problem?" Social Politics 16.3 (2009): 379.
30 Norton, C., and D. Wygal. "Inclusive Science: Articulating Theory, Practice, and Action." National Women's Studies Association Journal 21.2 (2009): vii.
31 Du, X., and A. Kolmos. "Increasing the diversity of engineering education – a gender analysis in a PBL context." European Journal of Engineering Education 34.5 (2009).
32 McCarthy, R. "Beyond Smash and Crash: Gender-Friendly Tech Ed." The Technology Teacher 69.2 (2009): 16-21.
33 Chen, X. Students Who Study Science, Technology, Engineering, and Mathematics (STEM) in Postsecondary Education. Stats in Brief. NCES 2009-161. National Center for Education Statistics, 2009. <http://nces.ed.gov/help/orderinfo.asp>
34 Male, S., Bush, M., and K. Murray. "Think engineer, think male?" European Journal of Engineering Education 34.5 (2009).
35 Cronin, C., and A. Roger. "Theorizing progress: women in science, engineering, and technology in higher education." Journal of Research in Science Teaching 36.6 (2009): 637-661.
36 Fox, M.F., Sonnert, G., and I. Nikiforova. "Successful programs for undergraduate women in science and engineering: Adapting versus adopting the institutional environment." Research in Higher Education 50.4 (2009): 333-353.
37 Solberg Nes, L., Evans, D.R., and S.C. Segerstrom. "Optimism and College Retention: Mediation by Motivation, Performance, and Adjustment." Journal of Applied Social Psychology 39.8 (2009): 1887-912.
38 Qualter, P., Whiteley, H., Morley, A., and H. Dudiak. "The Role of Emotional Intelligence in the Decision to Persist with Academic Studies in HE." Research in Post-Compulsory Education 14.3 (2009): 219.
39 Jones, W.A., and J.M. Braxton. "Cataloging and Comparing Institutional Efforts to Increase Student Retention Rates." Journal of College Student Retention 11.1 (2009-2010): 123-139.
40 Haiyan, B., and W. Pan. "A Multilevel Approach to Assessing the Interaction Effects on College Student Retention." Journal of College Student Retention 11.2 (2009): 287-301.
41 Croft, A.C., M.C. Harrison, and C.L. Robinson. "Recruitment and Retention of Students - an Integrated and Holistic Vision of Mathematics Support." International Journal of Mathematical Education in Science and Technology 40.1 (2009): 109-25.
42 McQueen, H. "Integration and Regulation Matters in Educational Transition: A Theoretical Critique of Retention and Attrition Models." British Journal of Educational Studies 57.1 (2009): 70-88.
43 Oseguera, L., and B.S. Rhee. "The Influence of Institutional Retention Climates on Student Persistence to Degree Completion: A Multilevel Approach." Research in Higher Education 50.6 (2009): 546.
APPENDIX A.2 – IDENTIFYING LANGUAGE AS A LEARNING BARRIER IN ENGINEERING
C. Variawa and S. McCahan. “Identifying Language as a Learning Barrier in Engineering.” International Journal of Engineering Education. Vol. 28:1, pp. 183-191, 2012. This journal paper is published in the International Journal of Engineering Education.

Language used in engineering course materials may be a barrier to accurate assessment because students perceive the meanings of words differently. Universal Design in Education (UDE) has emerged as a strategy for making course material more accessible, but remains largely untested in this area. This study investigates whether students can accurately self-assess their understanding of vocabulary, i.e. whether this is a ‘visible’ or ‘invisible’ deficit from the student’s point of view, using a limited sample of ten words found on engineering exams. This is a preliminary investigation toward testing the efficacy of a UDE-based mitigation strategy, and finds that students often inaccurately self-assess their understanding of language used on engineering examinations.
1.0 – Introduction/Background
Often when we think of accessibility issues in higher education what comes to mind are physical barriers to
facility access for students or staff. However, increasingly we are aware of, and trying to address, more subtle
obstacles that may create unnecessary challenges that impact student success. These include creating appropriate
support systems for students with learning disabilities, and other “invisible” disabilities. More recently, we have
begun to recognize barriers that become perceptible as the student population diversifies. As engineers, we are
ideally situated to address this as a design problem because we recognize that designing for a broader set of users
has the potential to improve the design of a system for everyone.
The principles of universal design were first articulated in the 1960s and 1970s by Ron Mace and others
in the field of architecture [1, 2]. Fundamentally the goal of universal design in architecture is to design a building
or space that is intentionally accessible to the broadest range of people possible. Conceptually this means taking
accessibility into account from the beginning of the design process rather than as an afterthought. The accessible
design movement played a role in the development of legislation [3, 4]. As a result, many of our university and
college environments are now more physically accessible.
In the past two decades the principles of universal design have found their way into a number of other
fields, notably engineering. The principles of universal design in engineering have given rise to the development of
accessible transit systems, accessible information technology systems, and ergonomically designed household
products. The use of universal design in information technology is now pervasive: televisions come with built-in
closed captioning systems; text messaging is a standard feature on cell phones; screen magnifiers and readers are
readily available; and ATMs have Braille lettering on the buttons. Universal design features are now built
into many systems, allowing the user to create a customized environment that fits their needs (e.g. Web 2.0). Design
engineers have moved from a mentality of human-centered design to interaction design and, most recently, to
experience design. In doing so, the concept of creating a system that is barrier-free and intuitive for a diverse set of
users has become a central theme in the engineering design process.
Universal design has now begun to permeate education, first at the K-12 level and more recently at the
post-secondary education level [5]. If we look at a course, a curriculum, or an institution as a designed system, then
the principles of universal design should help guide us toward a more accessible learning environment design for a
more diverse user group. There have been a number of authors who have re-interpreted the principles of universal
design to make them applicable to educational systems [5 - 7]. The universal design framework applies the principle
of “learner centered” not just to an individual class, but to the design of the whole learning environment at every
level. McGuire, Scott, and Shaw suggest that universal design in education (UDE) is a “paradigm shift” that
promotes uniformity of academic goals and standards by designing accessibility into a course, curriculum, and
institution, rather than making exceptions for individual students who do not fit our preconceived idea of what is
“typical” [6]. They point out that individualized accommodation will still be necessary for some students.
However, pervasive use of exceptions may undermine the integrity of a course, whereas designing accessibility into
a course opens up learning opportunities for a broad range of students. However, they have also noted that UDE
remains a largely untested strategy that requires further testing and validation. Pliner and Johnson discuss UDE in
relation to social justice and transforming social relationships which can be negatively affected by invisible barriers
to inclusivity [7]. Their work suggests that implementing UDE pedagogy creates a more “inclusive” environment
which can decrease the barriers to learning that all individuals may have to some extent (i.e. the so-called “curb cut”
effect).
A recent review of the literature shows that there is serious concern about barriers to success for students,
in both engineering and other fields, and a wide variety of approaches have been employed to try to mitigate barriers
for at-risk students [8]. UDE offers one possible approach and a framework for interpreting the impact of mitigation
tactics. It will serve as a useful context for considering the results of this study. However, we should also bear in
mind that UDE is not the only possible approach and other ways of thinking about these issues should be utilized.
2 - Purpose
A look at today’s educational institutions shows a dramatic increase in the cultural diversity of the student
population, and institutions have not fully evolved to account for this diversity. One example is the use of colloquial
language or culturally specific references on assignments and other learning materials. The student’s inability to
understand a question on an assignment, for example, can create a misalignment between the results of the
assessment and the learning objectives of the course. In essence, it compromises the validity of the assessment
because it may test colloquial vocabulary to some extent rather than just the engineering concepts. While virtually
all assessment instruments have this issue, in engineering it can be a particular problem because it is pedagogically
preferable to situate problems in an authentic context and use terminology that is authentic to practice in the
profession. This inaccessibility can cause a bias in the assessment, favouring those individuals who have a
particular background.
In engineering assessments, students may find questions difficult to answer if they are not familiar with the
non-course-related terminology used. In the case of an assignment, the student can get help understanding the
question. However, in a closely-supervised exam situation, which is often time-limited, it is usually not possible to
get assistance. In our experience, words such as ‘blob’ or ‘kettle’ are not specific to the engineering course material
being taught, yet they present a problem for some students when they appear on tests. Students knowing the
meaning of the word will have less difficulty in understanding the question, and ought to be able to answer it
correctly, as intended. Students not having any exposure to the word beforehand, but having sound knowledge of
the course material and the English language otherwise, will not be able to understand exactly what is being asked.
This concern is balanced by a need to ensure that students graduate with a vocabulary that allows them to operate
effectively in the profession. A broad vocabulary is a professional asset. So ideally, we would want students to
acquire a robust vocabulary but this is generally not specified in our learning objectives, and not explicitly taught or
assessed.
This vocabulary problem perennially arose in our large first year design course. Although we tried to write
tests using clear, non-culturally specific language, we continued to experience problems. We did not want to “dumb
down” the language because we felt it is important to use accurate and authentic terminology. Therefore, we
took steps to mitigate the problem using the principles of UDE. We now develop a word list, which is posted prior
to each test in this course. This word list contains all of the infrequently used words (i.e. we leave out words such as
“and”, “the”, “are”, etc.) that appear on that particular test. We put some extra words on the list that do not appear
on the test but which are words we think are useful for an engineer to know. The word list is in alphabetic order so
the questions on the exam are not apparent from the list. The intent is to give the students an opportunity to gauge
their own level of understanding of the test vocabulary beforehand, and if required, consult information sources to
correct any gaps ahead of time. This strategy allows us to contextualize questions and use accurate, authentic
engineering terminology. However, this practical and simple approach to dealing with the problem is predicated on
several key assumptions, only a subset of which is investigated by the study. Some of the broader assumptions
include:
1. That the use of common, but infrequently used, words and terms may compromise the validity of the
assessment for some students. We are relatively certain this is true, based on experience, but research data
on the frequency and degree of the problem is not available. In addition, there is currently no existing data
about language on engineering examinations.
2. That given a list of words, students can correctly assess their level of understanding of these words. To
make good use of the word list this must be true, but we have no research that supports this assumption.
This study attempts to generate some data to test this assumption.
3. That students can independently learn the meaning of the words and phrases on this word list effectively.
Again, to make good use of the word list this must be true, but we have no research that supports this
assumption.
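The word-list preparation described earlier (strip very common words, optionally add extra useful vocabulary, and alphabetize so the exam questions are not apparent) can be sketched in a few lines of Python. This is an illustration only: the stoplist below is a small invented set, and `build_word_list` is a hypothetical helper name, not the course's actual tooling.

```python
import re

# Illustrative stoplist; the actual set of words excluded by the
# course staff is not specified in the text.
COMMON_WORDS = {"and", "the", "are", "a", "an", "of", "to", "in", "is",
                "for", "on", "with", "that", "this", "be", "as", "it"}

def build_word_list(exam_text, extra_words=()):
    """Return an alphabetized list of infrequently used words from an
    exam, optionally padded with extra vocabulary worth knowing."""
    tokens = re.findall(r"[a-zA-Z']+", exam_text.lower())
    candidates = {t for t in tokens if t not in COMMON_WORDS}
    candidates.update(w.lower() for w in extra_words)
    # Alphabetic order hides which question each word came from.
    return sorted(candidates)

print(build_word_list("The kettle and the blob are on the power bar.",
                      extra_words=["succinct"]))
# → ['bar', 'blob', 'kettle', 'power', 'succinct']
```

Sorting the deduplicated set is what makes the posted list safe to publish before the test: students can check their understanding of each word without learning anything about the questions themselves.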
This study begins to address the second assumption. The primary objective of the broader study is to
analyze how well students can self-assess their understanding of problematic words that could appear on engineering
assignments or tests, i.e. to identify whether or not infrequently used words are an invisible or visible barrier for
students. This is the beginning of a long-term study to describe the accessibility issues that arise from language on
engineering learning materials, and develop tools for addressing this issue (if we find it exists). The specific
element of this larger study that we are examining here is whether students can correctly assess their understanding
of non-course-related words used in engineering examinations. While it is relatively easy to measure how the
addition of a ramp in place of stairs makes a building more accessible for many types of users, it is more challenging
to test how a change aimed at reducing language barriers in an engineering course could result in improved learning
for a variety of people. However, applying the principles of UDE has the potential to not only result in
improvements for people who would otherwise be “at-risk” but also improve the quality of the learning environment
for a broad range of students (i.e. the so-called “curb cut effect”). Other concurrent concerns are maintaining the
integrity of the learning objectives and the economic feasibility of changes to the system.
Within this type of potential barrier the authors chose to focus their attention on three categories of
colloquial language prevalent in engineering examinations, namely:
1. non-course-specific technical terminology,
2. culturally-based words, and
3. linguistically-difficult terminology.
These categories are very rough, and there is overlap between them, but we developed these approximate
groupings based on examination of the word lists we had prepared for our first year design course exams over a
number of years. It is worthwhile noting that while the authors did not find any pertinent literature suggesting such
groupings, literature in areas such as composition studies and linguistics may inform the development of such rough
categorization by viewing them from unique perspectives that take into account language development.
The first category contains words commonly used in North American society that have reference to
technology. Examples of such words or short-phrases include: mouse pad, power bar, remote control, and ear buds.
The second category, culturally-based words, includes words that are used only regionally or within a specific
culture. For example, a “typical” North American would be familiar with the “hood” of an automobile, whereas a
“typical” Western European would refer to it as the “bonnet”. Further examples of such words and phrases include:
loonie, Jell-O®, an efficiency apartment, and flapjack. There are, of course, words that fall into both the technical
terminology and culturally-based word categories. An example is “coordinates” (i.e. email address) which is used in
some regions of the world but is not common in the U.S. This is both cultural and technical. However, for this
preliminary study we assigned words to only one of the three categories for simplicity.
The third category that we have used for the classification is ‘linguistically-difficult’ terminology.
Essentially, words in this category fall into neither the first nor second classification, are not course-specific, yet
may cause difficulty in understanding the elements of an engineering assessment because they are outside the
everyday vocabulary of students. Examples of such words include: propagate, succinct, and happenstance.
3 - Design/Method
Our study analyzed the responses of forty undergraduate engineering students who each
completed a questionnaire containing ten words that might be found on an assignment or test. The participants
represented a diverse mix of ethnic and cultural backgrounds, a variety of native and non-native
speakers of English, and different genders; all were aged 18-22 (typical undergraduate student
age). These words were chosen by the authors because they fit fairly well into one of the three categories we are
interested in exploring. In this preliminary study no attempt was made to choose the words using a more systematic
method. After fulfilling the ethics process at our institution, the study began by training the participants: they
learned about the task they were being asked to perform, the scale they would be using, and the motivation for the
study. This was meant to establish a clear purpose to this study and motivate the participants to provide genuine
answers. Then, each participant individually rated their understanding of each of the 10 words on the questionnaire on an equal-interval scale from ‘0’ to ‘5’, with ‘0’ representing no knowledge of the word and ‘5’ representing superior understanding. This is the “perceived-understanding” (PU) score. These words represented
the three categories mentioned, and a detailed explanation of the rating scale was provided on the question sheet to
minimize ambiguity. Finally, each participant was asked to write a maximum of 5 synonyms and/or a brief
definition of each word within a textbox to provide evidence of their level of understanding. To reduce ambiguity,
the most recent Oxford English Dictionary’s (O.E.D.) definition of ‘synonym’ was written as a footnote on the
question sheet, and participants were free to ask questions at any time. The researchers then consulted the O.E.D.
for the correct definitions and synonyms for each of the ten words used in the study. The responses from each
participant were compared against these dictionary definitions by the researchers and given a score. We counted the
definition as fully correct if it matched in meaning to at least one of the definitions for the word. The closeness of
correlation between the dictionary definition and the student’s definition was assigned an integer value from 0 to 5 –
this is the “observed-understanding” (OU) score. Finally, the participant’s PU score was compared to the OU score.
4 - Results
Table 1 shows the words used in the study, each labelled with the category that best describes it. For each word, a histogram compared the sums of the OU and PU scores, and a scatter-plot showed the relationship between these scores by occurrence, with larger-diameter circles indicating a larger proportion of participants having that specific outcome. Although only 10 words
were used in this investigation, the number of participants resulted in a substantial data set requiring further methods
of analysis. Table 2 shows a summarized ANOVA for the statistical significance of the findings, in addition to
Figures 1 and 2 that examine the aggregate data of all words combined.
Figure 1 shows the frequency of the difference between OU and PU for all of the words together. Ideally,
an accurate self-assessment would mean that the OU and PU scores would be identical (OU-PU=0). The data,
however, shows that OU and PU scores were quite often different. The skew in Figure 1 demonstrates visually that
participants more often overrated their understanding of the words, and the bar charts in Table 1 and pie chart of
Figure 2 reiterate this point. We found that students correctly self-assessed their understanding 34.5% of the time
and overrated their understanding 52.8% of the time; they only under-rated their own understanding 12.8% of the
time as summarized in Figure 2.
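The aggregate breakdown above (over-rated, accurate, under-rated) can be computed directly from paired PU/OU scores. The sketch below is illustrative only; the function name and the score lists are hypothetical, not the study's data or code:

```python
from collections import Counter

def self_assessment_breakdown(pu_scores, ou_scores):
    """Classify each paired response by the sign of OU - PU and return percentages."""
    counts = Counter()
    for pu, ou in zip(pu_scores, ou_scores):
        diff = ou - pu
        if diff < 0:
            counts["over-rated"] += 1      # claimed more understanding than demonstrated
        elif diff == 0:
            counts["accurate"] += 1        # self-assessment matched observed understanding
        else:
            counts["under-rated"] += 1     # demonstrated more understanding than claimed
    total = len(pu_scores)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical scores for five responses (not the study data).
print(self_assessment_breakdown([5, 4, 3, 2, 0], [3, 4, 1, 2, 1]))
```

With real data, the three percentages reproduce the proportions reported for Figure 2.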
Examining the OU/PU ratio for each word (Table 1, left column), we see that there are noticeable
differences between words. Words such as “bungalow”, “fax”, “Jell-O”, “bonnet” and “mold” have an OU/PU ratio
relatively close to 1, suggesting that students are more likely to correctly self-assess their understanding of these
words. Conversely, the OU/PU ratio tells us that students are less likely to correctly self-assess their understanding
of words such as “tolerance”, “feasible”, “propagate” and “field”. This is important because, although words like
“bonnet” have a low overall PU and OU, students are apparently aware of their lack of understanding which makes
this type of word a visible learning barrier for them.
The data also shows that students believe they understand some of these words well; these words have
higher PU scores relative to the other words. For example, the students think they know the word “tolerance” better
than the word “bonnet”, however the observed understanding scores of these two words are quite similar. Table 1
shows that the words “field”, “fax”, and “feasible” are known to many of the participants; however, the students
substantially overrated their understanding in several of these cases.
To better understand the accuracy of self-assessment, we calculated the residual of each data point to the
line PU=OU for each word (shown in parentheses in the scatter plots in Table 1). This number is calculated by
taking the sum of absolute differences for each data point to the line PU=OU. The results show that a word like
“tolerance” is consistently misjudged since it has a high residual relative to the other words. In this study, smaller
residuals suggest a more accurate self-assessment. For example, the scatter-plot for “fax” is skewed and clustered toward the upper-right (Table 1), so it is difficult to interpret the data from the scatter-plot alone. However, it has a small residual relative to the other words, which demonstrates that students typically self-assess their understanding of this word correctly. The combination of a high average OU score and a low residual tells us that students both understand this word accurately and are aware that they know it.
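The residual and the OU/PU ratio described above reduce to a sum of absolute differences and a ratio of score sums. A minimal sketch, assuming per-word lists of paired scores (the example scores and the function name are hypothetical):

```python
def word_metrics(pu_scores, ou_scores):
    """Residual: total absolute distance of the (PU, OU) points from the line PU = OU.
    Ratio: sum of observed scores divided by sum of perceived scores."""
    residual = sum(abs(ou - pu) for pu, ou in zip(pu_scores, ou_scores))
    ratio = sum(ou_scores) / sum(pu_scores)
    return residual, ratio

# Hypothetical scores for a single word (not the study data).
residual, ratio = word_metrics([4, 5, 3, 4], [3, 4, 3, 2])
print(residual, ratio)  # smaller residual suggests more accurate self-assessment
```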
In contrast, the sum of scores plot in Table 1 for the word “succinct” suggests that it was not a well-known
word to most participants. Interestingly, it also has a lower residual than most of the other words. This is because 38% of the participants had OU and PU scores that were both zero, and those points lie exactly on the line PU=OU. So, although the lower residual suggests an accurate self-assessment (close agreement between OU and PU), the unclustered distribution of its scatter-plot suggests it is a particularly inaccessible word for most students. Additionally, this case illustrates that the residual,
OU/PU ratio, sum of scores graph and scatter-plot should be considered together to formulate a more complete
understanding.
We also investigated the statistical significance by performing an ANOVA on the mean of the OU score
and PU score for each word, as seen in Table 2. The results show that “bonnet”, “bungalow” and “fax” are self-assessed accurately, while the other words are not; “Jell-O” just misses the threshold of p=.05. Table 2 shows the means and standard deviations for the OU and PU scores, as well as the difference between the two, and the t-test results for each word. It is interesting to note that “Jell-O” is excluded from the list of accurately self-assessed words by this method, even though the scatter-plot in Table 1 might indicate otherwise. It is clear that the cultural terms we tested had the least variability in OU/PU ratio, and the highest
OU/PU values. In addition, the distribution of scores on the scatter-plots for the cultural terms shows that many
students are unfamiliar with these terms, but they recognize this lack of familiarity; it is visible to them. These
results appear to indicate that students more accurately assess their understanding of cultural words, or at least this
small subset of words.
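The per-word comparison in Table 2 amounts to a paired t-test on the PU-OU differences, which can be sketched with the standard library alone (the ratings below are illustrative, not the study data, and the function name is our own):

```python
import math
from statistics import mean, stdev

def paired_t(pu_scores, ou_scores):
    """t statistic for the paired differences PU - OU, with df = n - 1."""
    diffs = [pu - ou for pu, ou in zip(pu_scores, ou_scores)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical ratings for one word from four participants (not the study data).
print(round(paired_t([4, 3, 5, 2], [3, 3, 4, 2]), 3))  # prints 1.732
```

Applied to the real 40-participant samples, this statistic with 39 degrees of freedom yields the t(39) values reported in Table 2.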
The results for linguistically-difficult and technical words are more complex. The OU/PU ratio and
residual values for linguistically-difficult words are relatively consistent. The OU/PU values, for example, fall in a
relatively narrow range from 0.65 to 0.76 which is lower than the values obtained for the cultural words. This
indicates that students are consistently unaware of their misunderstanding of these words. The technical words, by
comparison, show far less consistency. There seems to be no clear trend for the technical terms: some, such as “fax” and “mold”, are very well understood and accurately self-assessed, while “tolerance” showed a surprisingly low degree of understanding that was also poorly self-assessed. Further investigation involving more words, ideally evaluated in context, will be needed to fully characterize the issue, particularly for technical non-course-specific terminology.
In this study, we found that students are typically better at self-assessing their understanding of cultural
words and had difficulty assessing their understanding of linguistically difficult words. This suggests that cultural
and perhaps even technical words are more often visible barriers to accessibility, while non-course-related
linguistically difficult words may more often represent invisible barriers. That is, students may not seek clarification
of a linguistically difficult word because they incorrectly believe they have a sufficient understanding of the word.
This type of invisible barrier has an analogy in misperceptions of basic physics concepts (which have been studied
extensively, e.g. the force concept inventory), or other pre-existing misconceptions, which need to be taken into
account to make instruction effective. These conclusions are limited, however, by the words that were used in this
study. A more extensive investigation, particularly examining the understanding of words in context, would be
needed to fully elucidate this issue.
It is important to note that the scope of this study limits the generalizability of the data. Specifically, we
cannot confidently predict whether misunderstanding a specific term can inhibit overall learning and the student’s
ability to succeed on assessment measures. Although making such claims might sound intuitive, this data is limited
and there is little additional data in the literature to support such a claim. Further research needs to be performed.
5 - Discussion
We can draw some preliminary conclusions from this study that should be tested further. From our
observations in the classroom, we find that language can be a barrier to accurate assessment of learning for some
students. This study examines, in the very limited form of ten words, whether these barriers are visible or invisible to students. Although it is just a small element in the larger investigation of inaccessible language, it provides preliminary data about how students perceive their understanding of ten words found in engineering exams. We found that all of the words tested were unfamiliar to some degree: no term had an average
observed understanding score above four. As expected the findings illustrate that students do not understand
colloquial language identically. We also found that these students did not accurately self-assess their understanding
of such words consistently. Perceived-understanding scores were consistently higher than the observed-understanding scores. This shows that these students tended to over-rate their understanding of colloquial words, and this appears to be especially true for linguistically-difficult words. This consistent over-rating is an example of a learning barrier that students are unaware of; it is an “invisible” barrier to learning. This information can help us
create techniques that assist in vocabulary clarification to reduce these learning barriers.
The existing literature on accessibility is extensive and spans several disciplines, including equity, disability, gender, and higher education studies, among others [9]. This literature helps to explain why language is
integral to an inclusive learning environment [8]. Specifically, the fact that learning barriers exist in the language
of engineering course materials may be one reason why students (especially first-year students) find it challenging to
adjust to an environment that appears to be culturally foreign [10]. The finding that cultural language is a visible
barrier might be why students often attribute this alienation mainly to cultural acclimatization. We may be
underestimating the role of invisible language barriers, such as the use of linguistically-difficult words. Specifically,
our findings suggest that it would be worthwhile to investigate further the impact of these invisible language barriers
on inclusivity.
Some work in the field of composition studies appears to link vocabulary and related issues to educational
discourse, and may inform a promising approach to such further investigation. Specifically, Bartholomae’s seminal
work has led to further exploration of how language can create a barrier to learning [11, 12]. For instance, learning
how to write like an “expert” may create barriers if the student lacks confidence in their current writing style, and further research shows how individualized approaches to language and vocabulary in the classroom may conflict with what is considered “correct” in that field. Though such work is integral to understanding language in academia, the present study has a limited scope: students’ self-efficacy in accurately assessing their understanding of ten words on engineering examinations, and whether students can use this information to gauge their understanding of those words. A deeper engagement with composition studies and related fields would be valuable, albeit outside the scope of this particular study.
While both visible and invisible learning barriers hinder student success, this study might hint that a UDE
approach such as word lists posted prior to an exam may be useful as a mitigation technique particularly for some
types of words. Since students are likely to accurately self-assess their understanding of colloquial-cultural
language, word lists of cultural terms may be an effective mitigation strategy for this particular type of learning
barrier. However, this is a very preliminary study of the situation, and a more thorough investigation can provide a
more complete picture of the issues. In addition, our results suggest that such word lists may not be as useful for
technical and linguistically-difficult words. Linguistically-difficult words, in particular, are different because they
often appear to be invisible barriers to understanding, which suggests that these words need to be identified as
unfamiliar before word lists can become an effective tool. Additionally, this mitigation tactic continues to assume
that students can independently learn the meaning of words once they are aware of their lack of understanding. The
principles of UDE provide guidance on creating a more accessible learning environment, but further study is needed
to identify how UDE can be used when the barriers to accessibility are invisible to the student.
This study is just a first step in elucidating the issues that arise with the contextualization of problems in
engineering learning materials. We need to better describe the vocabulary that is presenting difficulty for our
students, and then find methods for dealing with these barriers. One way of possibly alleviating language issues is to
develop tools (e.g. software) that explicitly identify inaccessible language for both the instructors and students. This
would allow the participants in the learning environment to personally choose how to mitigate the potential barriers.
Our future work will also consider learning barriers in engineering more broadly: Taber’s typology of learning impediments can potentially be a starting point for this research [13]. Ideally, confronting these issues using a UDE-based approach increases accessibility for everyone, not just those identifying cultural words as a learning barrier,
since both the instructors and students benefit from more valid assessment.
6 - Conclusions
From this study we have learned that colloquial language as a learning barrier can be characterized along a spectrum from visible to invisible; which types of words fall into each of these categories; and how we can use this information to develop possible mitigation tactics. Within the context of ten words, our results show
that undergraduate engineering students view and understand colloquial language uniquely from each other and
from the instructor. Further, the accuracy of self-assessing one’s understanding of inaccessible language is
determined by the visibility of the learning barrier itself. These inaccessible terms can be roughly classified into
colloquial cultural, technical, and linguistically-difficult language; only the first appears to be a visible
inaccessibility for students according to our dataset. To mitigate potential effects of using colloquial-cultural
language on exams, we suggest that the use of word sheets containing these terms might be effective while
promoting a UDE approach to instruction. To reduce inaccessible vocabulary, the authors’ future work includes
broadening the scope of this study to a larger corpus of language, then analyzing and developing a software-based
approach whose interface suggests accessible alternatives for identified visible and invisible language issues on
engineering assessment instruments.
Table 1. Individual analysis of each word. In the original layout, each word was labelled with its category (in parentheses) and its OU/PU ratio, and was accompanied by two charts: a sum-of-scores bar chart comparing the total perceived-understanding (PU, left bar) and observed-understanding (OU, right bar) scores, showing overall confidence and the relative difference between the two; and a scatter-plot of PU (x-axis, the student's self-assessed understanding) against OU (y-axis, understanding assessed from the written definition), showing interaction effects. The residual reported for each word is the sum of absolute differences from each data point to the line PU=OU (y=x); smaller residuals indicate more accurate self-assessment.

Word        Category    OU/PU   Residual
Succinct    Linguistic  0.76    31
Propagate   Linguistic  0.69    57
Feasible    Linguistic  0.65    56
Field       Linguistic  0.70    53
Mold        Technical   0.87    34
Tolerance   Technical   0.47    81
Fax         Technical   0.95    24
Jell-O®     Cultural    0.93    17
Bungalow    Cultural    0.95    39
Bonnet      Cultural    0.87    29
Figure 1. Shows the number of times the self-assessment is ideal (OU-PU=0) and the general tendency towards over-assessment (OU-PU<0). This is an aggregate of all words used in this study.
[Figure 1 chart: histogram titled “Accuracy of Self-Assessment”; x-axis: OU-PU, ranging from -5 to 5; y-axis: number of occurrences, ranging from 0 to 160.]
Figure 2. Shows that the relative frequency of over-rating understanding is greater than accurate and under-rating understanding combined. This is an aggregate of all words used in this study.
Word       PU mean (SD)    OU mean (SD)    PU-OU mean (SD)   t-test
Bonnet     2.10 (1.582)    1.83 (1.810)    0.275 (1.062)     t(39)=1.64, p=.109
Bungalow   3.25 (1.565)    3.08 (1.940)    0.175 (1.338)     t(39)=0.83, p=.413
Fax        4.03 (0.800)    3.83 (0.747)    0.200 (0.883)     t(39)=1.43, p=.160
Feasible   3.93 (0.797)    2.55 (1.011)    1.375 (1.125)     t(39)=7.73, p=.000
Field      4.13 (0.686)    2.88 (0.822)    1.250 (1.056)     t(39)=7.49, p=.000
Jell-O     3.73 (1.219)    3.48 (1.585)    0.250 (0.742)     t(39)=2.13, p=.040
Mold       3.03 (1.310)    2.63 (1.462)    0.400 (1.105)     t(39)=2.29, p=.028
Propagate  2.88 (1.285)    1.98 (1.544)    0.900 (1.336)     t(39)=4.26, p=.000
Succinct   1.95 (1.974)    1.48 (1.853)    0.475 (1.132)     t(39)=2.65, p=.011
Tolerance  3.90 (0.672)    1.83 (1.174)    2.075 (1.385)     t(39)=9.48, p=.000

Table 2. Shows the statistical significance of accurate self-assessment: means and standard deviations of the PU and OU scores, the paired PU-OU difference, and the paired t-test result for each word.
[Figure 2 chart: pie chart titled “Frequency in Self-Assessment”: OU-PU < 0 (over-rated): 53%; OU-PU = 0 (accurate): 34%; OU-PU > 0 (under-rated): 13%.]
Acknowledgment
The authors gratefully acknowledge Prof. Mark Chignell for his input on computational methods, and the participants of this study for their time.
References
1. North Carolina State University Center for Universal Design, http://www.design.ncsu.edu/cud/univ_design/ud.htm, accessed 5 April 2010.
2. W. L. Wilkoff and L. W. Abed, Practicing Universal Design: An Interpretation of the ADA, Van Nostrand Reinhold, New York, NY, 1994.
3. Americans with Disabilities Act of 1990, P.L. 101-336, 104 Stat. 327, 42 U.S.C. 12101 et seq.
4. Telecommunications Act of 1996, P.L. 104-104, 110 Stat. 56.
5. F. Bowe, Universal Design in Education: Teaching Nontraditional Students, Bergin & Garvey, Westport, CT, 2000.
6. J. M. McGuire, S. S. Scott and S. F. Shaw, Universal design and its applications in educational environments, Remedial and Special Education, 27(3), May-June 2006, pp. 166-175.
7. S. M. Pliner and J. R. Johnson, Historical, theoretical, and foundational principles of universal instructional design in higher education, Equity & Excellence in Education, 37(2), 2004, pp. 105-113.
8. C. Variawa and S. McCahan, Design of the learning environment for inclusivity, Proceedings of the 2010 American Society for Engineering Education Annual Conference and Exposition, Louisville, KY, June 20-23, 2010.
9. L. C. Brinckerhoff, J. M. McGuire and S. F. Shaw, Postsecondary Education and Transition for Students with Learning Disabilities, Pro-Ed, Inc., Austin, TX, 2002.
10. D. Trytten, A. Lowe and S. Walden, Racial inequality exists in spite of over-representation: The case of Asian American students in engineering education, Proceedings of the 2009 American Society for Engineering Education Annual Conference and Exposition, Austin, TX, June 14-17, 2009.
11. D. Bartholomae, Inventing the university, Journal of Basic Writing, 5, 1986, pp. 4-23.
12. A. Johns, Text, Role, and Context: Developing Academic Literacies, Cambridge University Press, New York, 1997.
13. K. S. Taber, The mismatch between assumed prior knowledge and the learner's conceptions: A typology of learning impediments, Educational Studies, 27(2), 2009, pp. 159-171.
APPENDIX A.3 – FREQUENCY ANALYSIS OF TERMINOLOGY ON ENGINEERING EXAMINATIONS
C. Variawa, and S. McCahan. “Frequency Analysis of Terminology on Engineering Examinations.” Proc. of 118th ASEE Annual Conference and Exposition. ASEE Paper No. AC 2011-1565. Vancouver, 2011. This paper was presented at the 2011 American Society for Engineering Education Annual Conference. This paper reviews the literature on frequency analysis of words, and presents a study that analyses the frequency of words on engineering final exams at the University of Toronto. Discussed within the context of Universal Instructional Design and learner characteristics, this work is an initial investigation in designing a strategy that computationally characterizes vocabulary in engineering education.
Frequency Analysis of Terminology on Engineering Examinations
CHIRAG VARIAWA AND SUSAN MCCAHAN University of Toronto
Abstract
There have always been differences between instructor expectations of what students “should know” and the actual background experience that students have entering an engineering program. The divergence between this assumed knowledge and the actual knowledge base may be increasing as the student population diversifies. The issue is not just wide differences in preparation in basic math, or science, or communication ability, but diversity in the cultural background of students. While we frequently laud diversity, we have not always followed this up by supporting inclusivity in our classrooms and finding ways to bridge cultural differences that may exist. Specifically, when we contextualize technical material to situate an engineering problem in a real-world scenario, students are subject to a test of their background experience and vocabulary; so, instead of clarifying a technical concept, the context may make the concept more inaccessible. This may also compromise the inclusivity of the learning environment, causing students to doubt their suitability for studying engineering.
This represents an instance where learner characteristics are misaligned with the expectations of the learning environment, and there has been little research in this particular area of engineering education. The goal of the current study is to evaluate the vocabulary we use in engineering education, so in future work we can consider the alignment between the vocabulary used and learner characteristics. As raw data we are using an exam bank that contains final examinations for all engineering courses at the University of Toronto. A frequency analysis of the words and terms used on the exams has been carried out, excluding course specific technical terminology. The hypothesis is that infrequently used words and terms are typically less familiar to students. This study is the first step to testing this hypothesis.
The results of the frequency analysis are analyzed with respect to:
1. The distribution of words and terms that are used on exams.
2. The relationship between the vocabulary used on particular types of exams and natural language.
3. The work of authors like van Rijsbergen to understand how a proxy system for familiarity might be developed.
The results are discussed within the theoretical framework of learner characteristics and interaction with the learning environment. In particular, the results are examined with reference to the literature including Universal Instructional Design (UID), and critique of the UID approach.
Introduction
The engineering student population has become increasingly diverse in recent years.1,2 As a result, the diversity of background experience and vocabulary that students bring with them to university is increasing as well. As these students integrate within engineering institutions, they may face issues of inclusivity and accessibility to course material because of their diverse backgrounds. One dimension that particularly impacts student inclusivity is that of language.3 Students may face barriers to learning when the language of instruction and assessment does not accommodate differences in learner characteristics. The problem is that students may actually have a different corpus of language than instructors assume they have. For example, when a student encounters a term that is unfamiliar to them, the word creates a barrier to understanding. This barrier may inhibit learning, or compromise the validity of assessment, if the student's lack of understanding is not addressed. Some vocabulary (course-specific vocabulary) is explicitly taught. However, when unfamiliar vocabulary is used, and not explicitly taught, it creates a misalignment between the learning environment and the learner. The learner experiences this as a barrier to accessibility of the learning environment.
A potential solution to the issue of inaccessible language might appear to be the use of plain language. Plain language is the notion that clear and simple language is the most accessible and logical way of communicating with one another. There is plenty of literature in this area, and there are several studies that show the benefits of using plain language.4-6 However, as educators, we want our students to develop a deep and robust vocabulary as part of their engineering education. This is particularly important because the mastery of technical and professional corpora of language is beneficial for students and practicing engineers alike. As a result, educators cannot simply use plain language at an elementary level to address this issue, but instead need to investigate the issue of inaccessible language in their curricula.
The identification of inaccessible vocabulary has several advantages for engineering education. First, it addresses a barrier to accessibility that is becoming increasingly prevalent as the learning population diversifies. Second, it encourages both the instructor and student to develop resilient professional vocabulary while helping students over the barrier.
Our hypothesis is that word familiarity is correlated with word frequency. If this is true then words that appear frequently in teaching materials are better understood by students, and words that appear infrequently are more likely to be unfamiliar. The first step in this investigation is analyzing the frequency of words in a typical engineering classroom. Specifically, we believe that this approach will provide some insight on the issue of inaccessible vocabulary used in engineering education. Additionally, we hypothesize that words which appear frequently are a less significant accessibility issue than those that appear infrequently. This study measures the frequency of words in one particular type of learning material, undergraduate final exams, because this method of closely-supervised assessment is common in engineering education and provides a substantial database.
Some analysis techniques in the area of vocabulary frequency-analysis are presented by C.J. van Rijsbergen.7,8 Primarily, his work comments on the use of Zipf's Law to understand the statistical distribution of words in language. Zipf's law states that the most frequent word in an article of text will appear twice as frequently as the second most frequent word, three times as frequently as the third most frequent word, and so on.9 Thus, the expected result of a frequency analysis is a hyperbolic curve with a narrow range of frequently-appearing words and a broad range of infrequently occurring words. To better understand the validity of this law, Li performed a study using a uniform distribution of all 26 letters plus a “space” character to study the effect of Zipf's law in different cases. His approximation established that the law holds no matter what vocabulary is used, but that its effect is more pronounced in natural language.10 The rough theory behind this phenomenon is that humans use both frequent and infrequent words that may or may not have meaning on their own.11,12 This theory also helps explain why it is important to remove the word “the” (and other confounding strings, such as vowel-less “words”) from frequency analysis studies; the word “the” is the most common word in the English language.7,13 Overall, Zipf's law is one concept we can use to interpret the data acquired from the frequency analysis of words.
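Under Zipf's law, frequency is roughly inversely proportional to rank, so the rank-1 count is about r times the rank-r count. A minimal rank-frequency sketch (the token list is a toy example constructed to follow an exact 1/rank profile, not exam data):

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) tuples sorted by descending frequency."""
    counts = Counter(tokens)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Toy token list whose counts follow an exact 1/rank profile: 6, 3, 2.
tokens = ["of"] * 6 + ["beam"] * 3 + ["stress"] * 2
for rank, word, count in rank_frequency(tokens):
    print(rank, word, count)  # rank-1 count is roughly rank times the rank-r count
```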
Methodology
The objective of the current work is to develop a list of words ranked by frequency for a set of engineering course final exams. These lists will then be processed in two ways. First, the word lists will be input to a database program so that each frequency and rank can be accurately matched to its corresponding word and exam. Second, the lists will be plotted graphically to determine overall trends in the data. The expected output from this process will be a dataset of vocabulary with each word tagged by rank and frequency.
The study investigates the frequency of words used on engineering examinations at the University of Toronto. Final examinations were chosen for this study for several reasons. The database of final exams is readily available. At this institution final exams from previous years are posted on a publicly-accessible website so that students can use them as study aids. Also, students are not able to access assistance during an exam, which means that they must rely on their a priori vocabulary to make sense of the questions. And as a critical assessment in a course, the exam should be testing the student’s understanding of the course concepts rather than the student’s vocabulary. Presumably, the instructor has taken this into account when developing the exam. Finally, every exam in this program is the same duration, 2.5 hours, which allows for some common basis of comparison (e.g. number of words on the exam).
These exams are posted in PDF format, and include information about the course and instructor. We started by downloading the most recent exams from the freshman courses in Materials Science Engineering (MSE). An advantage of using this set of exams is that these courses are the same, or very similar to, courses taken by other freshman engineers at many institutions. Further, the authors have experience with the content and assessment objectives of each exam,
making it easy to identify vocabulary that is explicitly taught in each course. Additionally, using electronic exams enabled a computational solution to performing the frequency analysis.
To perform the frequency analysis, several software tools were used. Each exam was first processed using Adobe® Acrobat Pro v.9.4.1 to make the text in each document electronically searchable. The optical character recognition (OCR) engine in this software converts static images into text. The main advantages of using this software are that it dramatically reduces the time and effort required to input text for the frequency computation, and reduces human error in data entry. A disadvantage of this approach, however, is that each word is not vetted by a human prior to entry. This means that typos in the original document are treated as actual words, and these eventually become part of the compiled database. Additionally, there may be cases where the software disregards disfigured words because they are too distorted to recognize. In such cases, the authors entered the problematic word manually.
The next step in this process was to use a program called Hermetic Word Frequency Counter Advance v.12.45. The program calculates the frequency of each word, and outputs the data as a text file which includes the rank, word, word frequency, and a unique identifier for each word. During this process, we instructed the program to disregard particular words, i.e. exclude them from the computation. Specifically, the program ignores specified character strings that are illogical and may confound the results (i.e. words that have fewer than 2 characters, contain a hyphen, are just repetitions of the same letter, or lack a vowel or ‘y’). In addition, the word “the” was excluded from the data. “The” is the most frequent word in the English language13 and dwarfs all other words in terms of frequency, making it difficult to illustrate the data graphically. As discussed in the previous section, literature in the field of word frequency generally suggests excluding “the” from an analysis. One particular advantage of removing these words at this stage is to reduce the likelihood of an erroneous word being processed, and to reduce the clutter in the database. One disadvantage of this software, however, is that words with prefixes and/or suffixes are treated as their own unique words. For example, the word “gear” would be considered different from “gears”. This limitation is one that the literature recommends be addressed, but there is currently no automated method for accomplishing this, and no clear systematic approach described in the literature. So we have not manipulated the data to combine the results for variations of the same word. Additionally, the set of vocabulary that is very clearly course-specific was removed from the frequency analysis.
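The exclusion rules described above can be sketched as a simple filter. This is a hypothetical re-implementation for illustration, not the Hermetic software itself:

```python
import re
from collections import Counter

def is_valid_word(word):
    """Apply the exclusion rules: at least 2 characters, no hyphen,
    not a repetition of a single letter, must contain a vowel or 'y',
    and not the word 'the'."""
    if len(word) < 2:
        return False
    if "-" in word:
        return False
    if len(set(word)) == 1:          # e.g. "zz", "aaaa"
        return False
    if not re.search(r"[aeiouy]", word):
        return False
    if word == "the":
        return False
    return True

def filtered_frequencies(text):
    """Count word frequencies after applying the exclusion rules."""
    return Counter(w for w in text.lower().split() if is_valid_word(w))

freqs = filtered_frequencies("the gear and the gears x-axis zz shaft")
# "the", "x-axis", and "zz" are excluded; "gear" and "gears" stay distinct,
# mirroring the prefix/suffix limitation noted above.
```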
For instance, words such as “modulus” and “necking”, which are specific to the materials science course content, were removed from the dataset because the meaning of these types of specific technical words would be explicitly taught in the course. Our intent is to focus on vocabulary that is not explicitly taught; that is, words that the instructor has assumed every student understands without instruction. Specifically, we believe that accessibility issues can result when a term is unfamiliar a priori and not explicitly taught. The authors acknowledge that labelling a word as course-specific is subjective and presently non-rigorous; we intend to use a
corpus of technical vocabulary distilled from the textbooks used in each course to mitigate this problem. We also acknowledge that the exam sample size (N=9) is presently small. However, the word lists are substantive (n=565, in total). This provides a large vocabulary sample that is indicative of the language used in introductory engineering courses.
The word lists that were produced from this process are sorted by frequency. The data was then plotted to produce graphs comparing the vocabulary frequency, and this was used as the basis for comparing different exams to one another. Further, the frequency distributions for each exam were examined statistically to understand how different exams compare with one another. This method produces data that can be mined in a variety of ways to better understand the language we use in the engineering learning environment.
Results and Discussion
The frequency analysis produced 9 datasets containing ranked frequency distributions of the words used on each exam. The data shows that words which we might assume are very familiar, such as “name”, “clear”, or “length”, are not particularly frequent (nor consistently infrequent). The data also shows that mathematics exams generally have fewer words than other exams. All exams have a roughly hyperbolic distribution of words, per Zipf’s law, with some words occurring extremely frequently and most occurring only once. Plots of the frequency distributions are shown in Table 1.
Table 1 shows the exams and the word frequencies. The first column shows the course title and the total number of words. The information in parentheses gives a very brief description of the type of course. The third column in Table 1 graphically shows the frequency analysis. The vertical axis is the “occurrence percentage”. This normalized value is calculated by dividing the number of occurrences of a specific word by the total number of words on that exam. This number shows how common the word is on the particular exam. For instance, the highest occurrence percentage is 12%, which is the word “marks” on the Physical Chemistry exam.
The data shows that the correlation between the independent and dependent axes is not linear but hyperbolic. This follows Zipf’s law, demonstrating that a small number of words are used much more frequently than all others. The frequency distributions show that exams with fewer words have a weaker hyperbolic correlation between occurrence percentage and rank. According to Zipf’s law, this could mean that the language used in these exams has less similarity to natural language. Table 1 shows that mathematics courses generally have lower word counts than other types of courses, meaning that they may use language that has less resemblance to natural language.
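One hedged way to quantify a “weaker hyperbolic correlation” is a least-squares fit of log-frequency against log-rank: Zipf’s law (frequency proportional to 1/rank) predicts a slope near -1, and weaker Zipfian behaviour shows up as a slope further from -1. This sketch is illustrative and was not the statistical method used in the study:

```python
import math

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) vs. log(rank) for a list of
    frequencies sorted in descending order. A slope near -1 suggests a
    Zipf-like (hyperbolic) distribution."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# A perfectly Zipfian list (frequency = 100 / rank) gives a slope of -1.
slope = zipf_slope([100 / r for r in range(1, 7)])
```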
Another observation is that the maximum value of the percentage-occurrence of the most frequently used word is not predicted by the type of exam being analyzed. It is also noted that the most frequent word (other than “the”) is not the same for any two of these exams. So it
appears that different exams use a different corpus of vocabulary, even within a particular subject, e.g. mathematics.
Basic statistical analysis was also carried out for this dataset. The minimum number of words used is 91 (Calculus I), and the maximum is 565 (Introduction to Materials Science). The mean and standard deviations for this dataset are 282.7 and 164.3, respectively. This indicates that there is large variability in the word count for the exams studied; some exams have a much higher count than others. Overall, the data offers a preliminary look at the way vocabulary is utilized in engineering learning materials.
Table 1 - Course information and frequency of unique words on the final exam. For each course, the table includes a plot of word frequency: horizontal axis - unique words (by rank); vertical axis - % occurrence (calculated by taking the # of occurrences of that word, dividing by the # of unique words present, and then multiplying by 100).

Name (Category): # of unique words
Calculus I (Mathematics): 91
Calculus II (Mathematics): 172
Linear Algebra (Mathematics): 165
Physical Chemistry (Chemistry): 221
Engineering Strategies and Practice (Engineering Design and Communication): 525
Introduction to Materials Science (Materials Science): 565
Fundamentals of Computer Programming (Computer Programming): 315
Electrical Fundamentals (Electrical Circuits): 304
Mechanics (Statics): 183

[Frequency-distribution plots omitted; each shows occurrence percentage falling off roughly hyperbolically with rank.]
The results of this study can be situated in the context of the existing literature. For example, if the language used on engineering exams is typical of natural language, then it should follow Zipf’s law of word frequency. Luhn’s work in the field of information retrieval suggests methods of data mining that can be applied to word frequency datasets.7,14,15 And Van Rijsbergen provides a critique of these approaches, and investigates other methods that can aid in creating word frequency analyses that are more meaningful.7
Zipf’s law states that the frequency of any word is proportional to its rank in the frequency table.9 From the limited set of exams analyzed to date it is clear that mathematics exams have a weaker correlation with Zipf’s law than the other types of exams. As with Li’s study10 mentioned earlier, this implies that the mathematics exams in this sample use language that departs from natural language. Examining the word lists from the math courses and design
course supports this. Words that are common in natural language, such as “then”, “if”, and “but”, appear infrequently on the math exams. Words that appear frequently on the math exams include “point”, “space”, and “choice”, which are less common in natural language.
Luhn worked extensively with information retrieval technologies, and suggests ways of data mining for accurate retrieval based on input queries. Specifically, he suggests that words be assigned tags and weightings. Tagging, for example, can be used to distinguish unique word definitions, e.g. allow “Apple” the company name to be distinguished from “apple” the fruit. Further, words that are variations of the same root word can be assigned the same tag. This reduces the clutter in the dataset because equivalent words are merged, assuming the tagging has been done carefully. Weighting, in contrast, means that we assign values to words; the values could be assigned based on a particular set of criteria. For example, if there were data on familiarity, we could assign terms that are less familiar to our students a higher weighting than terms that are more familiar. In general, tagging and assigning weights are both grouping techniques that help condense the dataset into more manageable units. Used together, these methods may help to identify words that combine frequent use with low familiarity, i.e. words that may pose the most frequent and significant learning barriers for students. This is an important consideration if we want to distinguish inaccessible terms from accessible ones. However, this separation first requires that we understand the characteristics of the learner’s vocabulary a priori. Knowing these characteristics, it is then possible to use the frequency distribution graphs that have been developed to isolate regions where inaccessible terms are most likely to appear. The literature suggests that upper and lower cut-off points can be defined on a word frequency graph.
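As a loose illustration of Luhn-style tagging and weighting (the tag map and familiarity scores below are entirely hypothetical, since no automated tagging method or familiarity data existed in this study):

```python
from collections import defaultdict

def group_by_tag(word_freqs, tags):
    """Collapse word frequencies onto shared tags, e.g. mapping
    'gear' and 'gears' to the same root tag."""
    grouped = defaultdict(int)
    for word, freq in word_freqs.items():
        grouped[tags.get(word, word)] += freq
    return dict(grouped)

def weight_terms(grouped, familiarity):
    """Weight each tag by frequency / familiarity: terms that are frequent
    but unfamiliar score highest, flagging likely learning barriers."""
    return {tag: freq / familiarity.get(tag, 1.0)
            for tag, freq in grouped.items()}

tags = {"gear": "gear", "gears": "gear"}       # hand-made tag map (hypothetical)
freqs = {"gear": 3, "gears": 2, "dilemma": 4}
grouped = group_by_tag(freqs, tags)            # variations of "gear" merge
scores = weight_terms(grouped, {"gear": 5.0, "dilemma": 0.5})
# "dilemma" (frequent but unfamiliar) outranks "gear" (frequent but familiar).
```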
However, it is not clear how best to apply this methodology if the learner’s a priori vocabulary is both unknown and continuously shifting. With the growing use of electronic textbooks, it may be possible for students to identify unfamiliar words as they study a subject and to have this data collected automatically. We can imagine a system that weights words based on how frequently they are identified as unfamiliar by the students in a given class, and that uses this to help instructors identify words that may be problematic as they develop an exam, much like a spell checker works. However, to understand the current data in the framework of learner characteristics it is necessary to establish a proxy system that makes use of some other feature that is common to inaccessible vocabulary in order to bring it to the attention of the instructor and the student. Further, it is important to understand the limitations of these approaches so that we can more fully elucidate the issues with word frequency analysis.
Van Rijsbergen’s critique organizes several approaches into a framework that can be used to understand word frequency analysis better and suggests, at least minimally, how to begin to develop a proxy system. Specifically, his critique is important because it articulates the limitations of this work while informing a potential direction for the analysis. Van Rijsbergen explains how prefixes and suffixes affect the meaning of words. Moreover, an understanding of this issue allows us to remove related words to simplify the resulting dataset. For example, the
removal of “ual” from “factual” retains the meaning in the root, but this is not true if “ual” is removed from “equal”. In his interpretation of Luhn’s work, he establishes that most unique intermediate terms appear between the upper and lower cut-off points, as seen in Figure 1.
In our results, this range includes words such as “coexistence”, “conversion” and “dilemma”. These words do, from a purely subjective perspective, appear to be potentially more unfamiliar or less accessible for students. Moreover, these words may be more challenging than words such as “marks” or “thanks” which are very frequent or very infrequent. That is to say, although this is a blunt approach that may capture some very familiar terms, or leave out some unfamiliar terms, there appears to be some promise that inaccessible language can be bounded, to some degree, by using this word-frequency analysis technique. However, more work and a larger sample size will be required before a definitive conclusion is possible. It is also not clear yet where exactly to draw the cut-off lines.
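Under the (admittedly blunt) assumption that candidate inaccessible terms sit between an upper and lower frequency cut-off, they could be pulled from a ranked wordlist as follows. The cut-off fractions here are arbitrary placeholders, since, as noted above, it is not yet clear where to draw the lines:

```python
def intermediate_band(ranked_words, upper_cut, lower_cut):
    """Return the words between an upper and lower cut-off on a ranked list
    (most frequent first). The cut-off fractions are free parameters."""
    n = len(ranked_words)
    start = int(n * upper_cut)
    stop = int(n * lower_cut)
    return ranked_words[start:stop]

# Hypothetical ranked list mixing very frequent, intermediate, and rare words.
ranked = ["marks", "find", "coexistence", "conversion", "dilemma",
          "valve", "lattice", "torque", "flux", "thanks"]
band = intermediate_band(ranked, 0.2, 0.5)
# Drops the very frequent head ("marks", "find") and the infrequent tail.
```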
Figure 1 - Significant terms are likely located between the upper and lower cut-off regions. (Reproduced from refs. 7, 15.)
At present, the accuracy of finding unfamiliar and inaccessible language is low. It is difficult to predict where these inaccessible terms are simply from browsing and comparing the graphs in Table 1 alone. Our initial hypothesis was that inaccessible terminology would be used less frequently than accessible terminology. However, the individual data sets do not support this hypothesis. We found that unfamiliar and inaccessible terms are not necessarily infrequent. Rather, these terms may occupy a region that is intermediate between high frequency and low frequency. Further, the characteristics of unfamiliar language are vague; it is difficult to predict where these terms are without further and more in-depth work.
Such work could involve compiling a larger set of exams, reducing cluttering of the data by removing pre-/suffixes, using tagging as suggested by Luhn’s work and comparing individual exams to an amalgamated dataset of all exams. At present, this small dataset is useful for exploratory work in a specific area. However, having a larger dataset can help to better assess the hypothesis. In addition, reducing the clutter in the database by focussing on the root word, rather than the form that includes pre-/suffixes, can assist in compacting the dataset. This is
particularly useful in maintaining the integrity of the database because having multiple permutations of the same word still retains the same basic meaning, but adds to the overall word count of the exam. Also, comparing individual or groups of exams to an amalgamated dataset of all exams might yield interesting results. This comparison may identify how a given exam or group of exams (for example, design courses) compares to the general characteristics of vocabulary used in these materials. Discussing the common features of these exams versus the large dataset may yield information about how a specific type of course might be more/less likely to have unfamiliar/inaccessible language for its learning population.
This is a first exploratory step in a line of study that informs an approach that might make engineering education more accessible for the majority of students. As such it is situated in a Universal Instructional Design (UID) approach to improving the learning environment for students.16 However, it should be noted that there will be limitations to any set of results or remediation strategy that is developed from this work. First, there remains a portion of learners that are “high-risk”. This population includes learners who require specialized individual attention or accommodation. For example, simply making vocabulary more familiar will not remove the need for accommodation for students with learning disabilities, but it may make the learning environment somewhat more accessible for these students. Another limitation to the applicability of this research is that vocabulary is not the only barrier to accessibility in the engineering classroom. There are many dimensions to learner characteristics that impact accessibility.3
There are, however, a number of advantages to finding and mitigating inaccessible vocabulary. Using accessible language may assist students who would not otherwise self-identify as people who face barriers in the learning environment. This is related to the “curb cut” effect mentioned frequently in the UID literature.16 Overall, making language more accessible helps a diverse learning population feel more included in an environment conducive to professional skills development. UID describes principles that make the learning environment more accessible to students. For example, encouraging clarity and flexibility in the delivery of instructional material has a positive effect on a variety of students, each having different learning characteristics.
Inclusivity can also potentially encourage greater student involvement in the learning process. Language is often cited as an issue in the literature on inclusivity.3 Further, understanding language supports and encourages the development of a robust professional vocabulary while maintaining the integrity of the course learning objectives.
Conclusions
Language can be one dimension of some inclusivity and accessibility issues students face in engineering education. Identifying vocabulary that might be unfamiliar and inaccessible has
many benefits for all students. It helps students overcome learning barriers, while giving instructors information they can use to help students develop a robust professional vocabulary.
Frequency analysis of language has several limitations, but this exploratory study has shown some interesting results. Specifically, on a given exam, infrequently used words are just as likely to be inaccessible as frequently used words, and vice versa. Moreover, words near the centre of the frequency distribution appeared less accessible in general. However, more work needs to be done to accurately identify inaccessible words using frequency analysis. At present, we need to establish criteria to help focus our search for inaccessible vocabulary.
The applicability of accessible language in engineering pedagogy is profound. Using a UID approach, we can create more inclusive learning environments that are more flexible and can accommodate different learner characteristics. Our future work will investigate ways of improving the process of finding and mitigating inaccessible language used in all levels of engineering education, in addition to making the environment more accessible and inclusive for students.

References

1 “Synergies (2008 Annual Report)”. National Action Council for Minorities in Engineering. Web. <http://www.nacme.org/user/docs/NACME_AnnualReport2008.pdf>.
2 “Vision, the NACME Continuum (2010 Annual Report)”. National Action Council for Minorities in Engineering. Web. <http://www.nacme.org/user/docs/NACME_AR_2010_FINAL.pdf>.
3 Variawa, C., and S. McCahan. 2010. “Design of the Learning Environment for Inclusivity.” In Proceedings of the 2010 American Society for Engineering Education Annual Conference and Exposition. Louisville, KY.
4 Bello, D. “President Signs ‘Plain Language’ Bill into Law.” Safety + Health 182.6 (2010): 21.
5 Petelin, R. “Considering Plain Language: Issues and Initiatives.” Corporate Communications 15.2 (2010): 205-16.
6 Harper, R., and D. Zimmerman. 2009. “Exploring Plain Language Guidelines.” IEEE International Professional Communication Conference.
7 Van Rijsbergen, C.J. 1979. “Chapter 2: Automatic Text Analysis.” Information Retrieval, 2nd ed. London: Butterworth. pp. 10-15. Web. <http://www.dcs.gla.ac.uk/Keith/pdf/Chapter2.pdf>.
8 Lease, Matthew. 2007. Natural Language Processing for Information Retrieval: The Time is Ripe (Again). New York, NY: Association for Computing Machinery.
9 Zipf, G.K. 1949. Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
10 Li, W. “Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution.” IEEE Transactions on Information Theory 38.6 (1992): 1842-5.
11 Bloom, L. “Cognition and the Development of Language.” Language 50.2 (1974): 398-412.
12 Saffran, Jenny R., et al. “Incidental Language Learning: Listening (and Learning) Out of the Corner of Your Ear.” Psychological Science 8.2 (1997): 101-5.
13 The Linguistics Encyclopedia. Ed. Kirsten Malmkjær. 2nd ed. New York: Routledge, 2002.
14 Luhn, H.P. “The Automatic Creation of Literature Abstracts.” IBM Journal of Research and Development 2.2 (1958): 159-165.
15 Schultz, C.K., ed. 1968. H.P. Luhn: Pioneer of Information Science - Selected Works. London: Macmillan.
16 Bowe, F. 2000. Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey.
APPENDIX A.4 – COMPUTATIONAL METHOD FOR IDENTIFYING INACCESSIBLE VOCABULARY IN ENGINEERING EDUCATIONAL MATERIALS
C. Variawa, and S. McCahan. “Computational Method for Identifying Inaccessible Vocabulary used in Engineering Education.” Proc. of 119th ASEE Annual Conference and Exposition. San Antonio, 2012. This paper was presented at the 2012 American Society for Engineering Education Annual Conference. This paper investigates the design of a computational approach to characterize vocabulary on engineering examinations. The work describes an application of the Term Frequency-Inverse Document Frequency (TF-IDF) equation and the effect of using different comparator sets of documents on the wordlists generated. The discussion elaborates on these effects, and ways in which to promote vocabulary characterization that increases the TF-IDF scores of specific words.
Computational Method for Identifying Inaccessible Vocabulary in Engineering Educational Materials
Introduction

Instructors often face the challenge of making students feel more included in the classroom, especially in freshman engineering classes. In the freshman classroom, instructors increasingly find that their students depart from the “traditional” homogeneous demographics of engineering in the past. Engineering classrooms have broader representation from all cultural and socio-economic backgrounds, and even greater variance in approaches to learning, leading to greater diversity. This increased diversity among students may also lead to barriers that impede accessibility to learning and, as a result, inclusivity.
Universal Instructional Design (UID) is a pedagogical philosophy that has emerged in the field of higher education research. It aims to increase accessibility to learning materials. The core concept of UID, universal design, comes from civil engineering and calls for increasing accessibility to physical structures by incorporating accessibility as a priority in the design process. Applied to education, this design philosophy attempts to “make instruction accessible to the greatest extent for the largest number of people possible”.1 The literature on this subject suggests the use of seven principles that guide teachers to create accessible learning material by increasing the clarity, transparency, flexibility and usability of instruction. However, the use of UID has not been rigorously examined within the context of engineering education as a tool to create more inclusive learning environments. The premise of our study is to use a UID-inspired approach to make engineering education accessible to the greatest possible diversity of students; we hope to maximize accessibility to engineering course material with the goal of making learning environments more inclusive.
One particular learning barrier that our students face is inaccessible language. In engineering, we generally encourage the development of a robust engineering vocabulary to help students develop as professionals. However, a critical look at the language we use in the classroom may raise questions about the accessibility to course material when we begin to use vocabulary that is external to the corpus of language our diverse students bring with them. So, while we claim to promote professional language development, we may inadvertently create a less than inclusive environment for our students. Specifically, when we use uncommon language that is neither discipline-specific nor explicitly taught, we are creating an environment that is biased towards learners that have the same corpus of vocabulary as the instructor – this is particularly evident when colloquial or cultural vocabulary is used in the classroom. In particular, our study attempts to investigate the communication barriers that exist between instructors and diverse student populations within the context of engineering education.
Final examinations are a standardized artifact of the engineering classroom whose purpose is to assess the student’s understanding of course material. As a summative evaluation technique, the exam probes the student’s mastery of what was taught in class and how well they can apply this material to answer the provided questions. In many cases, instructors attempt to provide realistic problems using course material that is contextualized within a particular setting under given conditions. The goal of providing such authentic, contextualized questions is to mimic a “real world” situation where engineering knowledge can be applied. In doing so, the instructors may inadvertently test the student’s cultural knowledge of the contextualized environment rather than testing course-specific instruction exclusively. Specifically, the vocabulary used in creating authentic engineering problems on such assessments may cause an inaccessible and non-inclusive environment for some students. For example, when we contextualize an engineering problem by using societal references that we assume are “widely known”, some students may find the vocabulary to be unclear or foreign; the challenge of the question becomes trying to understand that vocabulary, rather than using engineering knowledge to answer the question. By extension, the use of such vocabulary may yield an invalid performance assessment because the final exam is no longer testing what it purports to test: a student’s understanding and mastery of course material. The use of accessible vocabulary while maintaining an authentic assessment environment may lead to final exams of higher quality that promote robust vocabulary development as well. In our investigation, we aim to maximize accessible vocabulary while leaving the course- and discipline-specific vocabulary as-is, so that creating a more inclusive exam does not affect the integrity of the course material.
The goal of the study is to develop a computational approach to identify potentially inaccessible vocabulary and bring it to the attention of the instructor, while ignoring engineering-specific vocabulary explicitly taught in a course. In the process of doing so, we must examine the language currently used on engineering final exams to determine if there is a method to distinguish course-specific or discipline-specific language from the rest. Specifically, to find inaccessible vocabulary we are going to find words that are course-specific and then, in future work, take the complement as a possible source of inaccessible vocabulary. This is to ensure we are not inadvertently labeling course-specific words as being inaccessible. This paper focuses on this specific aspect of the larger study on inaccessible language. In particular, we find that literature from the fields of linguistics, computer science, computational linguistics, higher education, and even some work from cultural studies suggests tools for this type of work. While there are some limited corpora of discipline-specific vocabulary, our intention is to establish a dataset using language relevant to engineering education at a typical North American engineering institution. We begin this particular component of the larger study by trying to find course-specific language.
Computational linguistics suggests the use of keyword-generation algorithms to establish a corpus of words that are characteristic of a particular piece of work.2,3 In our case, we can use this approach to develop a quantitatively-described hierarchy of potential keywords for final
exams used in engineering. The hypothesis is that we can then compare the keywords found in one exam to others, as a group or individually, to reveal trends that describe the use of language in engineering education. Keyword comparison can help us see how language changes across disciplines, how the vocabulary of freshman classes differs from that of upper years, and so on. The aim is to compare keywords of different test cases to suggest a usable approach for determining course-specific language on final exams used in engineering. Although keyword-generating algorithms are neither the only approach to this issue nor an exhaustive one, the goal is to explore the intersection of computational linguistics and engineering education and see if we can use a tool from one discipline to help make the other more accessible.

Methodology

This paper focusses on classifying the vocabulary found on engineering final exams using a quantitative computational method. The work presented here builds on earlier studies that examined students’ self-assessment of vocabulary understanding, and word frequency analysis in engineering exams.4,5 Our methodology makes use of the results from these earlier works.
Specifically, this investigation compares keywords from different exams to one another with the goal of finding a method where course-specific terms are clustered and segregated from other vocabulary. Earlier studies found that word frequency analysis on its own was not sufficient for this purpose.4 The fields of computational linguistics and computer science suggest several potential approaches that can be used to generate and compare keywords among groups of documents, and these are an improvement over word frequency analysis. Although the algorithms used in these approaches are often different, the goal remains to find words that effectively characterize the vocabulary used in a particular document.
One such approach is the Term-Frequency Inverse-Document Frequency (TFIDF) algorithm. This technique compares the frequency of words in a single document (TF) to the vocabulary used in a set of documents. The mathematical formula for TFIDF is:

    TFIDF = TF × IDF

where, in a single document,

    TF = (# of occurrences of the term) / (total # of words)

and, in a set of documents,

    IDF = log( (# of documents) / (# of documents containing the word) )
The IDF is a measure of how important a particular term is within a set of documents; it is calculated by dividing the total number of documents by the number of documents in the set that contain the term, and then taking the logarithm of the quotient. The TFIDF formula assigns a score to each word in the test document.
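As an illustrative sketch of the scoring just described (our own Python, not the authors' Visual Basic implementation; all names are ours), the TF and IDF parts can be combined as follows:

```python
import math
from collections import Counter

def tfidf_scores(target_words, comparison_docs):
    """Score every word in a target document (a list of word tokens)
    against a set of comparison documents, per the formulas above:
    TF = occurrences / total words; IDF = log(docs / docs containing word)."""
    counts = Counter(target_words)
    total = len(target_words)
    n_docs = len(comparison_docs)
    doc_sets = [set(doc) for doc in comparison_docs]
    scores = {}
    for word, count in counts.items():
        tf = count / total
        containing = sum(word in doc for doc in doc_sets)
        # Guard: a word absent from every comparator would make the
        # quotient undefined, so treat it as appearing in one document.
        idf = math.log(n_docs / max(containing, 1))
        scores[word] = tf * idf
    # Rank in decreasing order of TFIDF score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A word such as "the", which appears in every comparator document, receives IDF = log(1) = 0 and falls to the bottom of the ranking, which is exactly the filtering behavior described below.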
As an information retrieval tool, TFIDF is effective at finding characteristic words in a document when multiple documents are compared.6,7 The existing literature describes TFIDF as a technique for classifying documents based on keywords and modifiers. In particular, TFIDF has been used to describe documents using hierarchical subclasses, or other creative methods where the algorithm is applied repeatedly per subclass. For example, a computer hardware document might be classified as "comp.sys.ibm.pc.hardware", with the algorithm run in a loop within each subclass. From a computational perspective this is processor-intensive, but the results are generally accurate. Although this study does not use a repeated looping method within subclasses, TFIDF can still provide insight about words that are diagnostic for a document. Further, since fewer processing loops are used, computational load and limitations on available computer memory are less of a concern.
The literature in computational linguistics and computer science describes several other methods for generating keywords,7-10 as well as related algorithms for classifying documents, together with their strengths and weaknesses. In particular, the TFIDF method is recognized as a validated approach to finding keywords in documents.7,8 Existing work also demonstrates approaches for describing the magnitude of word-grouping behaviour using more sophisticated mathematical and statistical techniques, including the bottleneck method for disambiguation and greedy heuristic approaches.8-10 This literature focuses on the algorithm itself and on ways to describe documents using subclasses (for information retrieval), rather than on how to classify individual vocabulary in a context similar to our work.
A higher TFIDF score means that the word being examined is diagnostic of that particular document, and a low TFIDF score means that the word is not a keyword for the document. The algorithm is particularly effective because it tends to filter out terms that are common across the set of documents being compared. As such, the TFIDF method can potentially find words that are diagnostic of a particular exam while filtering out words that are commonly used on engineering exams in general.
We use a repository of electronically available final exams in the Faculty of Applied Science and Engineering at the University of Toronto for this research. The total number of usable exams is 2254, with a final total of over 2,300,000 English words being examined in our work. The dataset spans the last ten years, covers all departments in the Faculty and is robust enough to
cover over 98% of all words on each exam, including captions on figures. Words that contain numbers or foreign characters are discarded from the study: only words composed of ASCII characters 65-122 (inclusive) are taken into account.
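A minimal sketch of this filtering rule (our own code, not the authors' original implementation):

```python
def is_usable_word(word):
    """Keep a word only if every character lies in the ASCII range 65-122
    described above (A-Z, a few punctuation codes, a-z); words containing
    digits or foreign (non-ASCII) characters are discarded."""
    return bool(word) and all(65 <= ord(ch) <= 122 for ch in word)

def scrub(words):
    """Drop unusable words from a token list."""
    return [w for w in words if is_usable_word(w)]
```

For example, `scrub(["stress", "R2D2", "crème", "slip"])` keeps only `"stress"` and `"slip"`: the digit in "R2D2" and the accented character in "crème" both fall outside the stated range.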
This phase of our study begins by converting the approximately 3000 electronically-available engineering exams from an image-PDF format to plain text using Optical Character Recognition (Adobe Acrobat®). All of the words from each exam are then extracted and "scrubbed" to remove words that contain numbers or foreign characters. Each resulting word set is automatically placed in its own text-only file, cross-referenced with the course designation, year, and discipline. The user can then select which of these files to compare against which group of exams, to develop TFIDF values for that word set. The data are then exported and further scrubbed using a Microsoft Excel® macro to remove duplicates. The TFIDF processing code was created by the investigators, who used four test cases for the TFIDF comparisons; these are detailed in the subsequent sections.
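The duplicate-removal step (performed by the authors with an Excel macro) can be sketched in a few lines; this is our own illustrative code, not the original macro:

```python
def dedupe(words):
    """Remove duplicate words while preserving first-seen order,
    mirroring the role of the Excel duplicate-removal macro
    described above."""
    seen = set()
    out = []
    for w in words:
        if w not in seen:
            seen.add(w)
            out.append(w)
    return out
```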
Results
The TFIDF computational method identifies keywords in a particular document by comparing it to a specified group of other documents. For this phase of the study, we compared one particular exam to four different groups of exams from the same institution to generate four case scenarios. The exam chosen as the “control” is from the Department of Materials Science and Engineering (MSE), for a third-year undergraduate course called ‘Mechanical Behavior of Materials’ held in 2009 (MSE316). This exam was chosen because the author is already familiar with the course-specific vocabulary and the course is very typical of a technical engineering course. We tried comparing this exam against 4 different document sets:
1. compared against all electronically available exams;
2. compared against all exams created in the year 2009;
3. compared against all exams from the same department (MSE);
4. compared against all exams from a different engineering department (Civil Engineering).
In all cases, each word in the control exam is given a TFIDF score and ranked based on decreasing score. As mentioned, a higher TFIDF value means that the specific word has a higher probability of being a diagnostic word for that document.
Table 1 - TFIDF values in decreasing order for exam words across four test cases; bolded words indicate that the word is potentially course- or discipline-specific

All four cases compare MSE316-2009 against a different document set: Case 1 - all exams; Case 2 - all exams in 2009; Case 3 - all MSE exams; Case 4 - all CIV exams.

| Rank | Case 1 word | Score | Case 2 word | Score | Case 3 word | Score | Case 4 word | Score |
|------|-------------|-------|-------------|-------|-------------|-------|-------------|-------|
| 1 | dislocation | 0.048687 | dislocation | 0.054035 | dislocation | 0.022046 | dislocation | 0.05481 |
| 2 | stress | 0.026217 | stress | 0.026728 | ofll | 0.01602 | gb | 0.025465 |
| 3 | ofll | 0.023793 | dislocations | 0.022067 | segment | 0.015675 | cry | 0.02527 |
| 4 | gb | 0.020656 | ofll | 0.021895 | gb | 0.01388 | crystal | 0.025138 |
| 5 | dislocations | 0.018314 | grain | 0.018767 | stress | 0.01372 | dislocations | 0.022343 |
| 6 | cry | 0.017527 | gb | 0.01759 | segments | 0.011745 | ofll | 0.022282 |
| 7 | creep | 0.017253 | cry | 0.01718 | creep | 0.011022 | grain | 0.016282 |
| 8 | grain | 0.017148 | slip | 0.01653 | subgrain | 0.01047 | slip | 0.014039 |
| 9 | subgrain | 0.016841 | crystal | 0.015391 | dissociate | 0.01047 | creep | 0.013568 |
| 10 | partials | 0.015869 | deformation | 0.014911 | move | 0.009878 | stress | 0.013437 |
| … | … | … | … | … | … | … | … | … |
| 20 | material | 0.011816 | tensile | 0.011252 | slope | 0.00798 | formation | 0.010984 |
| 30 | formation | 0.009302 | strain | 0.008904 | appearance | 0.006332 | ofllv | 0.008937 |
| 40 | continuation | 0.008237 | softening | 0.007719 | onset | 0.005516 | agb | 0.00783 |
| 50 | onset | 0.007397 | agb | 0.007072 | cry | 0.004803 | leads | 0.006723 |
| 60 | test | 0.006167 | theoretical | 0.005505 | unique | 0.004257 | strain | 0.005992 |
| 70 | crystalline | 0.005691 | yx | 0.004857 | shaded | 0.004044 | plastic | 0.005177 |
| … | … | … | … | … | … | … | … | … |
| 100 | matrix | 0.004542 | lowering | 0.004413 | pinned | 0.00349 | speaking | 0.004469 |
| 200 | offset | 0.002845 | atoms | 0.002752 | mainly | 0.002289 | reflect | 0.002914 |
| 300 | fx | 0.00157 | lb | 0.001529 | first | 0.001323 | stop | 0.001542 |
| 400 | so | 0.000285 | what | 0.000403 | of | 0.00036 | the | 0.000396 |

The table above shows a small sample of the 457 words from the MSE316 exam, with the corresponding TFIDF value for each word ranked in decreasing order. Course- or discipline-specific words are highlighted in bold for each case; these words are explicitly taught in class and are expected to be known for this exam. For instance, the instructor lectures on crystal lattice structures and mentions deformation behavior along slip planes; these words then become part of the "course-specific" corpus of the class. Ideally, this method would place course-specific words near the top of the list, since that would indicate a tool that can successfully isolate these terms computationally.

The data show that Case 2 has the highest number of course-specific terms near the top of the TFIDF list: eight of the ten highest-ranked TFIDF words are course-specific, with the farthest outlier ranked at 20. Case 2 therefore emerges as potentially the best of the four approaches, although confirming this conclusion may require further testing with other exams. The data show that Case 1 has seven of its top ten words as
course-specific; however, some clustering of course-specific words appears near rank 70. By "clustering" we mean that some course-specific terms appear together as a group. Similar low-end clustering is seen in Case 4, around both ranks 20 and 70. This behavior indicates that the computational method leaves some course-specific words scattered throughout the list when an exam is compared to the entire repository of exams or to just one other discipline. The data also show that when MSE316 is compared to all MSE exams, the course-specific terms are often clustered near the top of the list, but the top ten also contains other words; comparing an exam to exams of the same discipline therefore does not yield particularly accurate results either. Within the scope of this study, comparing one exam to all exams in the same year shows the highest number of course-specific terms among the top ten, with minimal outliers elsewhere.
Discussion
The data show that the TFIDF method gathers course-specific terms at the top of the word lists. In computer science, the purpose of the TFIDF method is to find words characteristic of a particular document when it is compared to a group of other documents. In our case, TFIDF appears to have done that successfully for all test cases, though with some limitations based on the test cases reported in this study. First, not all course-specific terms are flagged by the algorithm: some course-specific terms still appear at other locations on the word lists, although the majority appear near the top. This tells us that the computational method can be improved to become more accurate.
Another important limitation is that the data are not sufficient to demonstrate a reliable process. We compared just one exam to different groups of exams to generate our sample test cases. As an exploratory test, this shows that the TFIDF method gathers course-specific terms near the top of the word lists; it does not show whether this holds for exams other than MSE316-2009. Expanding the study to include other exams is therefore necessary to better evaluate the reliability of the TFIDF method.
Another limitation of this study is that the TFIDF method analyzes only single words, not phrases. The program is blind to the fact that multiple words appearing in a particular sequence can constitute a course-specific phrase. For example, the term "face centered cubic" is treated as three separate words by the TFIDF algorithm, while "FCC" is treated as one word, even though both are course-specific to MSE316-2009. This limitation needs to be addressed.
It should also be noted that the top ten words represent only the top 2% of the words on this particular exam, which does not provide conclusive evidence about the applicability
of the computational process; more than just the top ten words may need to be examined. However, the purpose of this exploratory investigation is to compare the cases to one another and determine which comparison type holds the most promise computationally; future work is needed to extend the investigation of these lists to a more diverse set of exams.
Based on the literature in computer science, TFIDF is quite effective at making keyword lists that are characteristic of a particular document when compared to groups of other documents. Our study takes the TFIDF tool and tests it in the context of engineering education. Specifically, it extends the work found in the literature and shows that TFIDF is potentially able to find characteristic words on engineering exam documents as well. As instructors, we see from the data that the "characteristic words" of exams correspond to course-specific terms. By extension, it appears that this algorithm, originally developed in computer science, can be used in engineering education for future work on vocabulary analysis of assessment instruments.
The data show that comparing one engineering exam to all engineering exams in the same year results in course-specific vocabulary having high TFIDF scores. In particular, the course-specific terms tend to group near the top of the list rather than being dispersed throughout the ranked vocabulary wordlist. The literature does not explain why this behavior exists, because such work has not been performed in this context before, so it is difficult to describe the reasoning behind it based on existing work. However, TFIDF is well known as a method for identifying words that characterize a single document with respect to a body of documents. From this perspective it makes sense that comparing an exam to a wide range of other exams from the same year (which share the same slang and current English colloquialisms) would allow identification of the keywords that differentiate this exam from the others.
While the data set presented here is limited, this study begins to offer insight into the development of more accessible course material for engineering. The vocabulary analysis process can potentially categorize words as course-specific, common, or a third category: uncommon and not course-specific. It is this third category of terms that may pose accessibility challenges for students and can be brought to the attention of the instructor. Additional clarification or other assistance (such as a visual aid) can then be used to improve the accessibility of the text. As instructors begin to create performance assessments that contain more accessible language, they also promote more valid assessment, with the ability to contextualize material for more authentic questions. The premise of inclusive learning environments in engineering education is critical for the transition of students into post-secondary education, as the goal is to develop a bias-free education system for a diverse learning population.
Future Work
This study attempts to investigate the language used in engineering education in order to promote inclusive learning environments. This particular investigation looks at the vocabulary used in engineering examinations and explores a computational method for distinguishing course-specific vocabulary. From the four cases analyzed, comparing a particular exam to all exams in the same year emerges as a promising approach for further investigation. Further work in this area will include a more diverse set of test cases and of exams compared to one another. In addition, the limitations of this study must be addressed before the effectiveness of TFIDF as a method of finding course-specific terms can be stated conclusively. This exploratory study does, however, provide insight into the use of a computational tool to investigate the vocabulary used in engineering education. Future work in this area would help maintain the integrity of course-specific vocabulary on engineering examinations while other ways to identify inaccessible language in engineering education are explored.

References

1. Bowe, F. 2000. Universal design in education: Teaching nontraditional students. Bergin & Garvey, Westport, CT.
2. Hausser, R. R. 2001. Foundations of computational linguistics: Human-computer communication in natural language. Berlin: Springer.
3. Damascelli, A. T., and A. Martelli. 2003. Corpus linguistics and computational linguistics: An overview with special reference to English. Torino: Celid.
4. Variawa, C., and S. McCahan. 2012. Identifying language as a learning barrier in engineering. International Journal of Engineering Education 28(1): 183-191.
5. Variawa, C., and S. McCahan. 2011. Design of the learning environment for inclusivity: A review of the literature. Proceedings of the 117th ASEE Annual Conference and Exposition. Louisville, KY.
6. Matsunaga, L. 2008. Term weighting approaches for text categorization improving. Proceedings of the Eighth International Conference on Intelligent Systems Design and Applications. Kaohsiung, Taiwan.
7. Russell, M. A. 2011. Mining the social web. Beijing: O'Reilly.
8. Foster, A., and P. Rafferty. 2010. Innovations in information retrieval: Perspectives for theory and practice. London: Facet.
9. Bekkerman, R., et al. 2003. Distributional word clusters vs. words for text categorization. Journal of Machine Learning Research 3: 1183-1208.
10. Hogenhout, W., and Y. Matsumoto. 1997. A preliminary study of word clustering based on syntactic behavior. In T.M. Ellison (ed.), CoNLL97: Computational Natural Language.
APPENDIX A.5 – AN AUTOMATED APPROACH FOR FINDING COURSE-SPECIFIC VOCABULARY
C. Variawa, S. McCahan, and M. Chignell. “An Automated Approach for Finding Course-specific Vocabulary”. Proc. of 120th ASEE Annual Conference and Exposition. Atlanta, 2013. This paper was presented at the 2013 American Society for Engineering Education Annual Conference. This paper reviews the literature on automated indexing and language characterization, and presents a vocabulary characterization study. This work presents a modified algorithm based on the Term Frequency-Inverse Document Frequency word classification equation. The results present a wordlist and other data that suggest that this algorithm can characterize discipline-specific vocabulary on engineering final exams.
An Automated Approach for Finding Course-specific Vocabulary
Introduction
This study introduces methods to increase the transparency of specific learning outcomes expected in an engineering course. Freshman engineering students face the challenge of absorbing a new set of terminology associated with their discipline, while also adjusting to the university environment. As they learn, students may inaccurately grasp course concepts due to lack of understanding of domain vocabulary. One strategy for addressing this problem is to make design of vocabulary part of overall course design. This requires explicitly identifying the vocabulary that students need to learn in the course of their studies. Proper specification of vocabulary is likely to be particularly important in introductory courses that form the foundation of engineering disciplines.
Identifying discipline-specific words helps instructors establish clear expectations of required vocabulary knowledge, while building robust technical communication skills. If students have a clear understanding of required vocabulary, then instructors will be able to develop higher-quality teaching and assessment material. As a result, instructors can be confident that students will not be handicapped by language usages that are neither part of their cultural background nor inherent to the course or domain. At the freshman level, vocabulary lists might be developed that highlight terms pertinent to the field. However, language has a fluidity that cannot be accurately captured by static wordlists that do not accommodate context, and manual updating of word lists each year is an additional (and probably unwelcome) burden on instructors. In this study, the authors investigate an efficient, semi-automated approach for developing up-to-date course-specific vocabulary lists while requiring minimal contextual input from the instructor. The focus of this research is on engineering course material, with the ultimate goal of helping freshman students adjust to new terminology in their field of study without increasing the workload of teaching faculty. Going forward, the proposed computational method can inform the development of a tool - like a software program - that can automatically compile a list of course-specific vocabulary.

Literature

There are several approaches that can characterize language in document text. The fields of research that contain literature in this area include education, linguistics, computational linguistics, and industrial engineering, among others. Specifically, literature in the field of education pertinent to this study ranges from the Plain Language Movement to language acquisition and English as a Second Language research.1-3 These approaches aim to simplify
language structure and vocabulary to maximize accessibility.2,4 Further, research in this area focuses on the relationship of words in generating meaning and on how language development is affected by choice of vocabulary.1,2,4 This research informs an understanding of the importance of language development and motivates the use of accessible, yet immersive, language in learning environments.4-6 While simplification is important in public documents (e.g., tax forms), overly simplified language does not suit the purposes of the engineering classroom. Engineering students need to develop robust vocabulary ability that is authentic to their field and will stand them in good stead when they take up their careers.
The fields of linguistics and computational linguistics are particularly broad, and they study language from several perspectives. Some approaches examine the development of language, symbolic meaning, and the structure of words.7,8 Other approaches look at differences between languages and their evolution over time.9,10 Computational approaches tend to convert the complexities of language into bits of information that can be quantified, classified, and analyzed as packets of data. More specifically, this field investigates algorithms and tools that can measure and quantify vocabulary.11-14 Some algorithms and methods are broadly applicable across a range of linguistic fields. Classification algorithms use various corpora to organize words into hierarchical structures. Word hierarchies can also be elaborated with syntactic and semantic information to create a comprehensive representation of knowledge about the English lexicon. The most extensive tool of this type is WordNet, a database that contains words and their synonyms, classified by relevance and similarity into groups referred to as synsets.14 This approach forms lexical repositories of words that can be used to analyze the relationships of sets of words with one another.12-14 An advantage of this approach is that it develops a common lexical database of words pertinent to a field; a disadvantage is that the repository continues to grow without a structured ability to prune words over time.12-14 Further, tools like WordNet are designed to deal with the vocabulary of language in general and are less useful for organizing and explaining domain-specific vocabularies. Thus an approach is needed that generates manageable, domain-specific vocabulary lists.
Another area of research in computational linguistics is keyword generation (automated indexing): the development and application of algorithms to statistically determine the characteristics of words based on frequency. Frequently used approaches in this area include frequency analysis of words, keyword-generation algorithms, and artificial intelligence methods. Frequency analysis attempts to correlate the frequency of use of a word in a target document with a corpus of English, or of a specific discipline.14,16,17 Prior work shows that this method is useful for understanding natural language, and it can be combined with algorithms supplemented by statistical theory, like Zipf's Law.16-18 Another approach is Latent Semantic Indexing, which uses singular value decomposition (analogous to principal components analysis on large, sparse matrices) to identify associations between words based on their context, and which can be used to generate data about the meaning of words used in similar contexts.11,13,19 Multiword Expressions are
another set of approaches that investigate the meaning of words based on lexeme analysis.11,12,20 Specifically, multiword expression analysis shows that words can change meaning based on how they are used in a sentence, which can inform a keyword-generation procedure.11,20 This general field of language analysis using computational approaches falls under the category of computer science and engineering known as artificial intelligence (AI), because words are translated from human vocabulary into computer-based representations, and then into a form that allows their characteristics to be better understood.
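To make the statistical idea behind Zipf's Law concrete: it predicts that a word's frequency is roughly inversely proportional to its frequency rank, so rank × frequency is approximately constant. A toy check (the token counts are invented for illustration; this is our own code):

```python
from collections import Counter

# Invented token stream whose counts follow Zipf's Law exactly:
# the rank-r word occurs proportionally to 1/r (60, 30, 20, 15).
tokens = (["the"] * 60) + (["of"] * 30) + (["stress"] * 20) + (["slip"] * 15)

ranked = Counter(tokens).most_common()

# Under Zipf's Law, rank * frequency should be roughly constant.
products = [(rank + 1) * freq for rank, (word, freq) in enumerate(ranked)]
```

Here every product equals 60, the idealized Zipfian case; real corpora only approximate this.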
TF-IDF Approach
A preliminary analysis shows that a simplistic approach such as frequency analysis on its own is inadequate for determining terms characteristic of a piece of text.21-23 Frequency analysis alone only generates information about how often certain words are used. This information is not particularly useful to this study, because the characteristic words in engineering documents are not necessarily those that appear most frequently. Specifically, the literature shows that commonly occurring words are indicative of natural language, not a measure of vocabulary diagnostic of an input document.13,15,18 As such, a more advanced approach is required: one that can characterize diagnostic words in documents while requiring minimal contextual data beyond the documents themselves, and one that can handle large sets of words and documents.
Term Frequency-Inverse Document Frequency (TF-IDF) analysis is a well-known indexing method in information retrieval, used to characterize vocabulary across sets of documents.11,13,15,18,23-25
The TF-IDF technique compares the frequency of words in a single document (TF) to the vocabulary used in a set of documents. The mathematical formula for TFIDF is:

TFIDF = TF × IDF

where

TF = (# of occurrences of the word) / (total # of words), in a single target document

and

IDF = log( (# of documents) / (# of documents containing the word) ), in a set of comparator documents
There are two main parts to the TF-IDF algorithm, and they work together to assign a score for each word in the target document. The TF counts the number of occurrences of a particular word, and divides that number by the total number of words in the target document, which is a simple measure of frequency. The IDF is a measure of how important a particular term is within a set of documents, and is calculated by dividing the total number of documents by the number
of documents in the set that contain the term, and then taking the logarithm of the quotient. The TFIDF formula multiplies these together and attaches the resulting score to each unique word in the target document. A higher TFIDF score means that the word is diagnostic of that particular document, and a low TFIDF score means that the word is not a keyword for the document. This approach allows us to differentiate common vocabulary from words that are characteristic of the target document, such as course-specific language in this case. The approach works reasonably well for one target document (e.g., the final exam in a course), but it does a poor job of differentiating course-specific or discipline-specific vocabulary from words that simply appear infrequently in natural language. For example, it might identify both "enthalpy" and "circulation" as characteristic words on a thermodynamics exam, because both are likely to occur rarely in a comparator document set; but a thermodynamics instructor would easily recognize "enthalpy" as key disciplinary jargon and "circulation" as not specific to the discipline.
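A toy illustration of this limitation (hypothetical comparator set; our own code, with a smoothing term added so absent words get a finite score — an assumption of this sketch, not the paper's formulation): plain IDF cannot separate a disciplinary term from a merely uncommon one when both are absent from the comparators.

```python
import math

def idf(word, docs, smoothing=1):
    """IDF with simple smoothing so that a word absent from every
    comparator document still receives a finite score."""
    containing = sum(word in doc for doc in docs)
    return math.log(len(docs) / (containing + smoothing))

# Ten hypothetical comparator exams, none mentioning either word.
comparators = [{"stress", "strain", "beam"} for _ in range(10)]

# The disciplinary term and the incidental word score identically:
# nothing in the statistics distinguishes "enthalpy" from "circulation".
assert idf("enthalpy", comparators) == idf("circulation", comparators)
```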
Interpreting the mechanics of the approach
The authors propose a method that should improve the effectiveness of the TF-IDF algorithm for the purposes of investigating the language used by engineering documents. Specifically, we suggest developing two TF-IDF scores for each word in a document and then calculating their difference to maximize accuracy in finding course-specific vocabulary. The approach would be to use two different contexts for the same document to calculate two TF-IDF scores:
1. Compare a target document to all documents in engineering, minus those that are in the same discipline. This should highlight terms that are characteristic of the discipline.
2. Compare a target document to all documents within the same discipline as that input document. This should highlight terms that are characteristic to that course.
This method generates two wordlists – one from each context listed above. These lists can then be sorted alphabetically while subtracting the TF-IDF scores for context #2 from context #1. This produces a list where words that are both course-specific and discipline-specific are given a high score, whereas all other types of words are given a lower score. This modified use of the TF-IDF algorithm can be expressed as:
TFIDF_mod = TFIDF_1 − TFIDF_2 = TF × (IDF_1 − IDF_2)
where subscripts 1 and 2 represent context #1 and context #2 respectively, and TF is identical for both because the input exam is the same.
And where:

D_E = # of documents in engineering, minus the discipline
D_E,W = # of documents in engineering, minus the discipline, containing the word
D_D = # of documents in the discipline
D_D,W = # of documents in the discipline containing the word

Condensing and simplifying:

IDF_mod = log( (D_E × D_D,W) / (D_E,W × D_D) )
Using this approach, words can be characterized based on how prevalent they are in engineering and in their respective discipline.
• If D_E × D_D,W is large, because many documents in the discipline contain the word, the numerator increases and IDF_mod becomes larger, which amplifies the TFIDF_mod value. This means the word occurs frequently in the discipline but not necessarily across all of engineering, which implies it is likely discipline-specific.

• Conversely, if D_E,W × D_D is large, because many documents across engineering contain the word, IDF_mod becomes smaller, which reduces the TFIDF_mod value. This means the word occurs frequently in engineering but is not necessarily unique to the discipline, which implies it may not be discipline-specific.
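A numerical sketch of the condensed formula and the two behaviours just described (all counts are invented for illustration; the function name is ours):

```python
import math

def idf_mod(d_e, d_ew, d_d, d_dw):
    """IDF_mod = log((D_E * D_D,W) / (D_E,W * D_D)), the condensed
    form above; all counts are assumed nonzero."""
    return math.log((d_e * d_dw) / (d_ew * d_d))

# A word common within the discipline (150 of 200 documents) but rare
# across the rest of engineering (20 of 2000) gets a boosted, positive score.
boosted = idf_mod(d_e=2000, d_ew=20, d_d=200, d_dw=150)

# A word common across all of engineering (1800 of 2000) is suppressed.
suppressed = idf_mod(d_e=2000, d_ew=1800, d_d=200, d_dw=150)
```

With these counts, `boosted` is positive and `suppressed` is negative, matching the amplification and reduction described in the two bullet points above.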
As a result, when a word has a high term frequency in a document and occurs frequently in the discipline but not across all of engineering, the modified approach boosts the score of that word. However, if the word does not occur frequently in the discipline but is common across engineering, the algorithm shrinks its score. The boosting effect therefore significantly affects only words that are characteristic of that document, meaning words that appear preferentially in the discipline but not necessarily across all of engineering.

Methodology

This study develops a method for characterizing the wording of engineering documents. In particular, we are interested in an approach that automatically identifies course-specific language so that instructors can help first-year students adjust to the terminology used in their chosen field of study. This is relevant to the field of accessible language in general, because it identifies vocabulary that students need to be familiar with in a professional context. The approach is outlined in Figure 1 below. Words are prepared for analysis by converting all input
documents to text-only format, then the TFIDF algorithm is used to develop word lists based on a target document (e.g. the final exam for a course) and sets of comparator documents. These word lists are then used to differentiate and highlight course-specific vocabulary that characterizes the target document.
Figure 1 - Shows graphically the methodology used in this study from top to bottom: PROCESSING, preparing and cleaning the raw text of the engineering exams (Adobe Acrobat X); ORDERING, ranking characteristic keywords using the TF-IDF computational method (Visual Basic .NET), once across engineering (comparing an exam to all exams in engineering) and once within subject (comparing the exam to all exams in the same discipline), yielding two wordlists; DIFFERENTIATING, pulling out the language used specifically in that engineering course (across-engineering wordlist MINUS within-subject wordlist); and POST-PROCESSING, normalizing the TFIDF values of the outputted wordlists (MS Excel).
The type of engineering document chosen for this study is the engineering final exam. These documents are standardized artifacts of the engineering learning environment and are publicly available for research and study purposes at the University of Toronto. The large dataset of exams spans several years, providing a substantial amount of vocabulary for examination. For this study, the authors began by acquiring all electronically available engineering exams at the University of Toronto: in total, 2254 exams administered in the Faculty of Applied Science and Engineering between 1999 and 2009. These exams were in a variety of graphics and document formats; all were converted to PDF format using Adobe Acrobat X Professional to simplify subsequent coding and processing.
Clean Data and OCR exams
The text for each exam was subjected to an optimization process, as outlined in the top-most box of Figure 1. This process removes the majority of non-word artifacts that occur because of the original hardcopy-to-electronic conversion. Some of these artifacts included specks, misshapen words, improperly-oriented pages, equations, and foreign non-ASCII characters. Text conversion failed for roughly 20 of the exams, which were excluded from the remainder of the analysis.
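A minimal sketch of this kind of cleaning pass is shown below. The actual process used Adobe Acrobat and OCR tooling; the regular expression and the decision to keep only alphabetic tokens are illustrative assumptions, and the example deliberately shows how artifacts such as mis-recognized characters slip through:

```python
import re

def clean_text(raw: str) -> str:
    """Remove common OCR artifacts from extracted exam text."""
    # Drop non-ASCII characters introduced by the hardcopy-to-electronic
    # conversion (Greek symbols, ligatures, speck misreads, etc.)
    text = raw.encode("ascii", errors="ignore").decode("ascii")
    # Keep only alphabetic tokens; this discards specks, equation fragments,
    # and digit runs, at the cost of also dropping legitimate numbers
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return " ".join(tokens)
```

Note that a token such as "d1slocation" (an OCR misread of "dislocation") would be split into "d" and "slocation" by this pass, which is one source of the nonsensical terms discussed in the Results section.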
TF-IDF Algorithm and equations
Once the text files for each of the exams in the study were created and optimized, the authors developed an applet in Visual Basic .NET to compute the TFIDF score for words in target documents. Specifically, the program prompts for an input document and a folder where comparator documents are located. It computes the TFIDF score for each word in the target document based on the words found within the text files contained in the specified folder. It then generates a list of words and their associated TFIDF scores and outputs that list as another text file. Each sample exam is run through the program twice: one pass compares the exam against a comparator set of exams within the same discipline, while the other compares the exam to all exams in the repository. This procedure results in two word lists.
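The core computation of such an applet can be sketched as follows (a Python approximation of the Visual Basic .NET program described above; the tokenization and the +1 smoothing in the denominator, which avoids division by zero for words absent from the comparator set, are assumptions):

```python
import math
import os
import re
from collections import Counter

def tokenize(text: str):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_scores(target_path: str, comparator_dir: str) -> dict:
    """Score each word in the target document against a comparator folder."""
    with open(target_path, encoding="utf-8") as f:
        target_tokens = tokenize(f.read())
    tf = Counter(target_tokens)
    total = len(target_tokens)

    # Document frequency: number of comparator files containing each word
    paths = [os.path.join(comparator_dir, name)
             for name in os.listdir(comparator_dir)]
    df = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for word in set(tokenize(f.read())):
                df[word] += 1

    n_docs = len(paths)
    return {word: (count / total) * math.log(n_docs / (1 + df[word]))
            for word, count in tf.items()}
```

Running this once with a within-discipline folder and once with an all-of-engineering folder produces the two word lists described above.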
For each input exam, the TFIDFmod score is developed by subtracting the two word lists for the target document, as outlined in Figure 1. This step is critical to the process because it helps distinguish vocabulary used in a discipline from vocabulary used across engineering. Specifically, this approach highlights and further differentiates course-specific words from other vocabulary on the sample exam by increasing the spread of TFIDF values, which are output as a scored wordlist.
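The subtraction step can be sketched as follows, assuming each wordlist is a word-to-score mapping produced by one of the two TF-IDF passes. Per Figure 1, the within-subject score is subtracted from the across-engineering score; the scores in the usage example are invented:

```python
def subtract_wordlists(across_engineering: dict, within_discipline: dict) -> dict:
    """TFIDFmod: across-engineering score minus within-discipline score.

    A word common within the discipline scores low against the discipline
    comparator set but high against all of engineering, so the difference
    boosts discipline-characteristic vocabulary.
    """
    words = set(across_engineering) | set(within_discipline)
    return {w: across_engineering.get(w, 0.0) - within_discipline.get(w, 0.0)
            for w in words}

ranked = sorted(subtract_wordlists({"dislocation": 0.05, "find": 0.01},
                                   {"dislocation": 0.003, "find": 0.02}).items(),
                key=lambda kv: kv[1], reverse=True)
```

In this toy example "dislocation" ends up with a positive TFIDFmod score and "find" with a negative one, mirroring the boosting and shrinking behavior described earlier.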
Post-processing the TFIDF scores
The wordlist generated in the previous step is plotted to depict the quantity and range of TFIDF values across an exam.
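The normalization mentioned in the post-processing step of Figure 1 is not specified in detail; one plausible choice, shown here purely as an assumption, is min-max scaling of the scores before plotting:

```python
def normalize(scores: dict) -> dict:
    """Min-max scale TFIDF scores to [0, 1] (one plausible normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 0.0 for w in scores}
    return {w: (v - lo) / (hi - lo) for w, v in scores.items()}
```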
Results
Sample Case –Materials Engineering Exam
The results below track an exam from a course called “Fracture and Failure of Engineering Materials”, part of the Materials Engineering curriculum at the University of Toronto, where this research was conducted. The data show the TFIDF scores for a sample exam from the repository. Table 1 lists the words in the target exam ranked in order of decreasing TFIDF score. Figure 2 plots the rank of every word on the same exam against its corresponding TFIDF score.
Table 1 - Shows the TFIDF scores (top 25 and selected others) for a sample exam from the course "Fracture and Failure of Engineering Materials"
Rank  Word           Modified TFIDF Score
1     dislocation    0.046749
2     dislocations   0.016992
3     cry            0.016379
4     grain          0.015939
5     crystal        0.014845
6     stress         0.013639
7     material       0.011965
8     strength       0.010907
9     deformation    0.008955
10    creep          0.008446
11    partials       0.008165
12    ofll           0.007426
13    intermetallic  0.007198
14    subgrain       0.007193
15    tensile        0.007181
16    metallic       0.006853
17    gb             0.006749
18    hardening      0.006659
19    boundaries     0.006414
20    hallpetch      0.006259
21    crss           0.00569
22    composite      0.005598
23    strengthening  0.005518
24    elastic        0.005376
25    lattice        0.005137
…
200   fact           0.000435
…
350   able           -0.000104
…
450   equals         -0.001426
Figure 2 - Shows all of the TFIDF-scored words from a course called “Fracture and Failure of Engineering Materials”
The wordlist in Table 1 contains a high proportion of course-specific vocabulary, especially near the top of the list. This is the expected result: words that are characteristic of the sample document are assigned a higher TFIDF score than words that are commonly found on all engineering exams. As the rank grows, the number of non-course-specific words increases significantly. Though there are too many words to list individually here, sample words at various points along the TFIDF scale are included. For example, word 350, “able”, is assigned a negative value, a direct result of TFIDFmod shrinking the score because the word occurs frequently across all of engineering, in the discipline, and on the exam.
It is also worth noting that some “nonsensical” terms are prevalent on this exam. Though only a small portion appear in Table 1, such as “ofll”, “gb” and “crss”, most of them occur at ranks beyond those shown. Further, it is important to note that “gb” is shorthand for “grain boundary” and “crss” is shorthand for “critical resolved shear stress”, both of which are terms characteristic of the course but might otherwise be mistaken for noise. Other terms such as “ae”, “gc”, “ndx”, “derisity”, and many others pollute the dataset even though the exams have been carefully processed. Unfortunately, these words persist across all of the datasets and affect the computation of accurate TF-IDF values. This shows that although the approach shows promise for distinguishing course-specific words from “everyday” language, many artifacts remain that compromise the accuracy of the method as currently defined.
Figure 2 graphically depicts the words and their corresponding TFIDF scores, ranked in decreasing order from left to right. The data show that a small subset of vocabulary – seen here as being ranked from 1 to roughly 50 – has a much higher TFIDF score than the majority of other words on the list. That is, a few words have a high TFIDF score while the majority have a consistently low score. It is also important to note that the tail of the data in Fig. 2 trends downward (negative) as it approaches the lowest TFIDF scores. In the wordlist, these words are typically nonsensical artifacts that pollute the dataset and are not course-specific.
Discussion
Critique of the Approach as situated in the Literature
The TF-IDF method is one in a spectrum of approaches that range in utility and feasibility when applied to investigating the discipline-specific terminology of a course. Ideally, a method can be found that is easy to employ with minimal effort by the instructor (i.e., highly feasible) and produces a list that is of high value to the freshman student (i.e., of high utility). At one end of the spectrum are approaches that examine only the frequency of words (e.g., Zipf’s Law12, 13, 18); these are highly feasible but low in utility. Frequency information is useful for explaining how ‘conversational-sounding’ a text is12 and which words are used more or less frequently than others, but it does not provide much utility toward the purpose identified here. The ease of implementation is high, though, because documents can be submitted to a software program that tallies the occurrences of each word and graphs this information; the shape of the graph can then easily be used to characterize the language.13, 18
Conversely, there are approaches that use synsets (relationships of words to one another derived from language corpora) to characterize the language of documents.13, 14 These approaches rely on comparing the meanings of words, identified using tools such as WordNet. Synset-based approaches produce a large quantity of rich data about the vocabulary, including the meaning of words in sentences and how that meaning evolves with the context in which they appear. Though very informative, and thus high in utility, such approaches have low feasibility because a large amount of information about the vocabulary must be known a priori. For example, the corpora used to identify meaning need to be continuously updated by an expert (or the instructor) to account for ever-changing vocabulary. As such, synset-based approaches require substantial support to produce and use corpora that include not only a list of words but also information about how those words associate. This may be preferable, but it also necessitates a large back-end support system, in contrast to an overly simplistic frequency-analysis approach that does not provide much utility.
The method we have identified, a modified application of the TFIDF algorithm, works toward creating a dataset that has higher utility than pure frequency analysis yet is more feasible to
194
implement than synset-based approaches. This is because the approach does not require multiple corpora and systems to understand the specific meanings of and relationships between words; instead, it uses contextual information provided by the comparator document sets. Specifically, the user provides comparator sets in the form of groups of exams or other teaching materials. Users are not required to know specific details about the comparator sets, other than the course and discipline, but they do need to provide the documents in a machine-readable format. This approach is a tradeoff between utility and feasibility: although it requires some contextual information, in the form of other documents, it does not require a continuously updated support system to maintain the meanings of and relationships between words. As such, the TF-IDF method has higher utility than a pure frequency-analysis approach while being more feasible to implement than synset-based approaches, and it still provides information that can assist in separating discipline-specific language from other vocabulary.
One can imagine a system that automatically files final exams into a database based on course information and a few keywords that identify the field, e.g., materials science fundamentals or materials and metallurgy. The instructor could simply identify the target document or documents (such as last year’s midterm or final exam for her course), identify courses in the same discipline by course number or keyword, and then hit “run”. The program would automatically produce a word list for her to distribute to her students at the start of the new term. For a freshman student, a word list like this lessens anxiety about what they need to learn and creates a starting vocabulary of terms relevant to the field.
Critique of the Methodology as it exists right now
The preliminary results suggest that the modified TFIDF approach is able to distinguish discipline-specific vocabulary from other words. The methodology is soundly grounded in existing methods of automated indexing, and the TFIDF lists that we have produced appear to be largely discipline-specific for the first 50 or so words.
This method needs further improvement to eliminate artifacts before progressing further. The data currently contain a high number of nonsensical terms that do not appear in the English language, such as ‘ofll’ and ‘gb’. Ideally, artifacts would be eliminated before the TFIDF calculations are made. However, this is not a straightforward task because of the use of acronyms and other anomalies in engineering jargon. Two approaches to this problem are suggested. First, it may be possible to remove words that do not include vowels prior to scoring. Doing so removes a significant portion of terms that might not exist in the English language6, 7 but risks removing important engineering acronyms. A second strategy is to incorporate a ‘spell-checker’ application that scans the text and highlights such terms using a combination of English-language tools and existing corpus-based tools such as WordNet. This approach draws on a larger set of language-comparison tools than most word processors because it can use engineering corpora as well as standard English corpora. Though more involved, this second strategy does not remove words
195
automatically and thus ensures that the process does not remove vocabulary that may be pertinent to the student, such as technical jargon. As a result, however, each word on the outputted wordlists would need to be reviewed manually, which could be a lengthy process that reduces the feasibility of a computational approach for an instructor. Such a method might instead be used to highlight terms on the final outputted list; this way, an instructor can visually identify words that appear high in the TF-IDF lists and also appear in relevant corpora. Ideally, a combination of the vowel-removal and corpora-scan methods could be employed to remove as many non-English words as possible, maximizing the effectiveness of a computational approach to characterizing the language in engineering courses. As an added benefit, an instructor could input a draft of an exam or other piece of course material and explicitly identify the vocabulary it tests.
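The vowel-removal strategy can be sketched as follows. The whitelist of acronyms is a hypothetical addition, included because acronyms such as “gb” and “crss”, identified earlier as course-characteristic, would otherwise be discarded:

```python
VOWELS = set("aeiouy")
# Hypothetical whitelist of known engineering acronyms to preserve
KNOWN_ACRONYMS = {"gb", "crss"}

def filter_vowelless(words):
    """Drop tokens containing no vowels unless they are whitelisted acronyms.

    Removes many non-English artifacts (e.g. 'ndx') at the risk of losing
    unlisted acronyms, which is exactly the tradeoff described above.
    """
    return [w for w in words
            if (set(w) & VOWELS) or w in KNOWN_ACRONYMS]
```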
Conclusions
This study uses a modified approach from the field of computational linguistics to characterize vocabulary on engineering exams. The objective is to increase the transparency of learning outcomes expected in an engineering classroom, specifically the development of a professional vocabulary. Using a repository of 2229 exams, a modified term-frequency inverse-document-frequency (TFIDF) algorithm assigns a weight to each word in an input exam by comparing it to the occurrences of those words across all exams; the weight represents the degree to which the word is characteristic of that document. The data show that this method does appear to preferentially give course-specific words higher rank. However, we also found that the wordlists are polluted with non-English words and that further work in cleaning the input text files a priori is required. Going forward, this method can be used to create tools, such as a software program, that automatically compile a list of professional vocabulary for a course.

References
1. Mazur, Beth. "Revisiting Plain Language." Technical Communication: Journal of the Society for Technical Communication 47.2 (2000): 205-11.
2. Robinson, Peter, and Nick C. Ellis, eds. Handbook of Cognitive Linguistics and Second Language Acquisition. London and New York: Routledge, 2008.
3. Ahearn, Laura M. "Language Acquisition and Socialization." Living Language: An Introduction to Linguistic Anthropology (2011): 50-64.
4. Bunch, George C., Percy L. Abram, Rachel A. Lotan, and Guadalupe Valdés. "Beyond sheltered instruction: Rethinking conditions for academic language development." TESOL Journal 10.2-3 (2001): 28-33.
5. Braun, Sabine. "From pedagogically relevant corpora to authentic language learning contents." ReCALL 17.1 (2005): 47-64.
6. Krashen, Stephen D. Explorations in Language Acquisition and Use. Portsmouth, NH: Heinemann, 2003.
7. De Saussure, Ferdinand. Course in General Linguistics. Columbia University Press, 2011.
8. Jackendoff, Ray. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, USA, 2002.
9. Dąbrowska, Ewa, and James Street. "Individual differences in language attainment: Comprehension of passive sentences by native and non-native English speakers." Language Sciences 28.6 (2006): 604-615.
10. Aitchison, Jean. Language Change: Progress or Decay? Cambridge University Press, 2000.
11. Church, Kenneth W., and Robert L. Mercer. "Introduction to the special issue on computational linguistics using large corpora." Computational Linguistics 19.1 (1993): 1-24.
12. Mitkov, Ruslan, ed. The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press, 2003.
13. McEnery, Tony, Andrew Wilson, and Geoff Barnbrook. "Corpus linguistics." Computational Linguistics 24.2 (2003).
14. Budanitsky, Alexander, and Graeme Hirst. "Evaluating WordNet-based measures of lexical semantic relatedness." Computational Linguistics 32.1 (2006): 13-47.
15. Bybee, Joan L., and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Vol. 45. John Benjamins Publishing Company, 2001.
16. Phillips, Betty S. "Lexical diffusion, lexical frequency, and lexical analysis." Typological Studies in Language 45 (2001): 123-136.
17. Roland, Douglas, Frederic Dick, and Jeffrey L. Elman. "Frequency of basic English grammatical structures: A corpus analysis." Journal of Memory and Language 57.3 (2007): 348-379.
18. Montemurro, Marcelo A. "Beyond the Zipf–Mandelbrot law in quantitative linguistics." Physica A: Statistical Mechanics and its Applications 300.3 (2001): 567-578.
19. Bellegarda, Jerome R. "Exploiting latent semantic information in statistical language modeling." Proceedings of the IEEE 88.8 (2000): 1279-1296.
20. Maynard, Diana, and Sophia Ananiadou. "Trucks: a model for automatic multiword term recognition." Journal of Natural Language Processing 8.1 (2000): 101-126.
21. Variawa, Chirag, and Susan McCahan. "Identifying Language as a Learning Barrier in Engineering." International Journal of Engineering Education 28.1 (2012): 183-191.
22. Shi, Congying, Chaojun Xu, and X. Yang. "Study of TFIDF algorithm." Journal of Computer Applications 29 (2009): 167-170.
23. Robertson, Stephen. "Understanding inverse document frequency: on theoretical arguments for IDF." Journal of Documentation 60.5 (2004): 503-520.
24. Singhal, Amit. "Modern information retrieval: A brief overview." IEEE Data Engineering Bulletin 24.4 (2001): 35-43.
25. Han, Eui-Hong, and George Karypis. "Centroid-based document classification: Analysis and experimental results." Principles of Data Mining and Knowledge Discovery (2000): 116-123.
APPENDIX A.6 – AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM
C. Variawa, and S. McCahan. “Exploring the Applicability of an Automated Course-specific Vocabulary Search Program.” Proc. of Canadian Engineering Education Association Conference. CEEA Paper No. 59. Montreal, 2013. This paper was presented at the 2013 Canadian Engineering Education Association Annual Conference. It discusses the application of a modified algorithm based on the Term Frequency-Inverse Document Frequency (TF-IDF) equation as applied to a second-year undergraduate chemical engineering exam. The results suggest clustering of TF-IDF scores around specific values, with similar data within each cluster.
AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM
Variawa, C., and McCahan, S. Department of Mechanical and Industrial Engineering, University of Toronto, Ontario Canada
[email protected]; [email protected]
1. INTRODUCTION
Professional engineering language is often used to precisely describe specific objects, processes, or situations but may not be taught to students in the engineering classroom as an explicit course objective. Though the method of teaching this vocabulary varies, students’ mastery in understanding this corpus is usually assessed (explicitly or implicitly) using written tests and exams based on course content.
As instructors develop their courses around specific learning outcomes, it can become difficult to accurately characterize the vocabulary that ought to be learned. In particular, the words may change over time and from instructor to instructor, the ability to discern course-specific from non-course-specific words can be subjective, and a list of required vocabulary may not be feasible to produce manually. In addition, if the list of vocabulary is not defined a priori, the transparency of the learning outcomes and the validity of assessment are reduced.
One approach to developing a critical vocabulary list is to deploy an automated strategy that statistically produces a list of “course-specific” words with minimal user intervention. The strategy relies on analyzing recent existing teaching materials for characteristic vocabulary. This dynamic approach maximizes objectivity while giving the instructor a starting point for defining a list of necessary vocabulary.
The general field of language analysis using automated computational approaches falls under a category of computer science and engineering defined as artificial intelligence. Specifically, words are being translated from human vocabulary to statistical values, and then combined to form a hierarchy of diagnostic keywords for a given document. The output of such an approach would be a word list of course-specific terms.
For this study, the researchers modified an algorithm called Term Frequency-Inverse Document Frequency (TF-IDF) to classify the vocabulary on a group of engineering exams. This algorithm first calculates the term frequency of each word on an exam by dividing the number of occurrences of a word by the total number of words in that document. The term frequency is then multiplied by the inverse document frequency, which is the logarithm of the number of documents in a comparator set divided by the number of documents in this set containing the word. This factor is dependent on the sample size; so, the more documents that are available in the database, the more accurate the output score for the word. Then, the word and its TF-IDF score are tabulated.
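As a worked example with invented numbers (the base of the logarithm is not specified in the paper; the natural logarithm is assumed here):

```python
import math

occurrences, doc_length = 4, 400         # a word appears 4 times in a 400-word exam
n_docs, docs_with_word = 100, 5          # 5 of 100 comparator documents contain it

tf = occurrences / doc_length            # term frequency = 0.01
idf = math.log(n_docs / docs_with_word)  # inverse document frequency = ln(20)
tfidf = tf * idf                         # roughly 0.03
```

A word appearing in nearly every comparator document would instead get an IDF near ln(1) = 0, driving its TF-IDF score toward zero regardless of its term frequency.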
This method appears to show a correlation between course-specific words and TF-IDF score, but some common words also receive high TF-IDF scores. To mitigate this, the researchers developed a second word list using an identical TF-IDF approach but a different comparator set. The first word list is created by comparing an exam to all exams in the same discipline. The second word list is created by comparing the same exam to exams in all of engineering. The first score is subtracted from the second, resulting in a word list that differentiates course-specific vocabulary more reliably.
2. RESULTS/DISCUSSION
The automated method is used at the University of Toronto (UofT) on a database of 2251 engineering exams from the years 2007-2011. The method was coded into a software program that automates serial tasks such as optical character recognition, content organization, and processing of calculated results and words.
The authors combined several instances of one course – CHE230 “Environmental Chemistry” – over a span of three years, 2009-2011, into a large word list. This wordlist was then compared to the 21-million-word UofT Engineering Corpus using the TF-IDF algorithm. The resulting wordlist is graphically presented in Fig. 1.
The figure above shows that the first 75 or so words on this list have a significantly higher modified TF-IDF score than the rest of the 507-word list. Since the score is a measure of how characteristic a word is to a given document (in this case, three instances of the same exam), the first 75 words can be investigated in detail to see whether they appear to be course- or domain-specific. The long “plateau” of words, starting around 75 on the x-axis, are those that appear just as frequently in the CHE230 exams as they do in all of engineering, while the tail end around 500 contains words that occur less frequently in CHE230 than in engineering in general.
Theoretically, high scoring words should be the only ones that are characteristic to the document being investigated. To test whether the vocabulary is properly sorted, the authors are working with the course staff of CHE230 and a linguistics expert to identify whether the method has captured an appropriate word list. Future work includes increasing the robustness of this method by sifting words that are misspelled, or are non-English words. The goal is to reduce inaccessible language while promoting technical vocabulary development.
Figure 1 - Shows the combined results of CHE230 over a 3 year period. The TFIDF score is an indicator of keywords, and appears to plateau around the 75-word mark
APPENDIX A.7 – ENGINEERING VOCABULARY DEVELOPMENT USING AN AUTOMATED SOFTWARE TOOL
C. Variawa, S. McCahan. “Engineering Vocabulary Development using an Automated Software Tool”. Proc. of 121st ASEE Annual Conference and Exposition. Indianapolis, 2014. This paper will be presented at the 2014 American Society for Engineering Education Annual Conference. It evaluates the effectiveness of a modified algorithm based on Term Frequency-Inverse Document Frequency (TF-IDF) in characterizing vocabulary. The dataset used for this study consists of written final exams at the University of Toronto. The paper presents and discusses a study in which subject-matter experts evaluate the efficacy of this algorithm in identifying discipline-specific vocabulary on written final exams in engineering.
Engineering Vocabulary Development using an Automated Software Tool
Abstract
Understanding technical vocabulary is often a desired learning outcome in engineering education, and a significant part of professional communication in the engineering profession. Language used in engineering education plays a key role in creating an accessible and inclusive learning environment. The corpus of language common to both the instructor and student ought to converge as the student masters the course content. Instructors may currently use techniques to help identify this vocabulary, including referring to glossaries and increasing the frequency of their use in the classroom. There is an opportunity to increase transparency and accessibility to such vocabulary by developing an automated software-based tool that can be used by instructors to create customized course-specific wordlists for their courses. Using text extracted from instructional material in a course, the algorithm developed for this study is able to hierarchically identify and display course-specific terminology using principles from artificial intelligence, linguistics, higher education, and industrial engineering. Grounded in the theory of Universal Instructional Design, these wordlists can be integrated into a syllabus and then be used as a teaching aid to promote an accessible engineering education. The goal is to reduce barriers to learning by developing an explicitly-identified and robust list of vocabulary for all students in a given course. Creating an automated program that improves vocabulary information over time keeps it relevant and usable by instructors as well as students.
Presently, there is no automated method to develop course-specific vocabulary lists. To fill this gap, the authors have created a computer program, using a repository of over 2200 engineering exams since the year 2000 from the University of Toronto, which automatically identifies domain-specific terms on any given engineering exam. Specifically, each word from each exam is digitized and computed against others using a modified form of the Term-Frequency Inverse Document-Frequency (TF-IDF) algorithm to generate lists of context-specific characteristic terms. This well-known algorithm is used in the field of computational linguistics as a method of identifying words characteristic to a document, given a comparator set of documents. In this work, a modified approach has been developed that uses several comparator sets to produce a list of engineering vocabulary for a course. The effectiveness of this approach is evaluated by comparing the results to the judgment of subject-matter experts. This paper will use the data gathered to discuss the efficacy of this automated program in the context of engineering research methods, and will identify ways in which to make this program accessible to, and usable by, more educators in the field of engineering education.
Introduction
This study investigates an approach to increasing the transparency of learning outcomes by explicitly defining them for students. Engineering students, particularly at the undergraduate level, are expected to understand terminology relevant to their discipline, as well as the contexts in which these terms can be used appropriately. Through understanding discipline-specific vocabulary, each student eventually forms a corpus of words that they can use in professional practice. The learning of discipline-specific vocabulary thus forms a critical component of engineering education and is an area for research and optimization.
Currently, identifying discipline-specific vocabulary must be done manually. If instructors choose to, they review course material and make a list of course vocabulary based on their subject-matter expertise. Sometimes an instructor may defer to the “glossary” of a required course textbook, or to the body of the textbook, to support the teaching of vocabulary. However, this may imply that all terms are equally weighted in importance; it relies on having an up-to-date text; and it relies on the text matching the instructor’s terminology and teaching methods. In general, these manual processes are time-consuming and not particularly robust to evolving knowledge and instructional environments.
An automated strategy can be based on existing instructional materials and be used as a starting point for further refinement by the instructor of the course. In this work we explore whether a computational method can be used to characterize vocabulary in engineering documents, and the efficacy of doing so. The approach used in this research is to develop and evaluate a computer program that can replicate human subject-matter expertise in characterizing vocabulary in instructional materials. This would provide a basis for further refining the learning outcomes to increase transparency, and as a result, accessibility to learning materials. The strategy for addressing this problem is to make design of vocabulary part of overall course design. This requires explicitly identifying the vocabulary that students need to learn in the course of their studies and is based on the framework of universal instructional design.
Literature

This research is based on the framework of Universal Instructional Design (UID). The goal of the study is to increase accessibility to education by providing clearly defined learning outcomes. In this specific study, this is done by identifying the discipline-specific requisite vocabulary that students need to master in a course. The goal of the UID framework is to “maximize accessibility to the greatest degree possible for the greatest number of users possible”. Here, the research study attempts to maximize the accessibility of language used in engineering education. As such, the principles of universal design should help guide research toward more accessible learning-environment design for diverse student populations. A number of authors have interpreted the principles of universal instructional design.1-3 The
universal design framework applies the principle of “learner centered” not just to one teaching instance, but to the design of the whole learning environment at every level. McGuire, Scott, and Shaw suggest that this framework is a “paradigm shift” that promotes uniformity of academic goals and standards by designing accessibility into a course, curriculum, and institution, rather than making exceptions for individual students who do not fit our preconceived idea of what is “typical”.1 They point out that individualized accommodation will still be necessary for some students. However, pervasive use of exceptions may undermine the integrity of a course, whereas designing accessibility into a course opens up learning opportunities for a broad range of students. Additionally, they have noted that this framework remains a largely untested strategy that requires further testing and validation. Pliner and Johnson discuss UID in relation to transforming social relationships which can be negatively affected by invisible barriers to inclusivity.3 Their work suggests that implementing UID pedagogy creates a more “inclusive” environment which can decrease the barriers to learning that all individuals may have to some extent.
A review of the literature shows that there is serious concern about barriers to success for students, and a wide variety of approaches have been employed to try to mitigate barriers for at-risk students. Universal Instructional Design offers one possible approach and a framework for interpreting the impact of mitigation tactics. It will serve as a useful context for designing instructional tools that aim to maximize accessibility to education. However, instructors should also bear in mind that this is not the only framework and other ways of thinking about these issues should be investigated.
Building on previous work by the authors4, this paper expands on the modified Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, which is already used extensively in the area of vocabulary analysis. In summary, the TF-IDF algorithm originates in the fields of automated indexing and computational linguistics and is a widely-accepted form of vocabulary characterization.5-10 It takes an input document, stores each word in an array element, then performs a series of mathematical calculations to assign a numerical score to each word. This score is a diagnostic measure of how characteristic a word is of that specific document. The algorithm assigns this score based on term frequency, and on how often each of the words in that document appears in a comparator set of documents. The TF-IDF algorithm is based on the following equation:
TF-IDF = TF × IDF

where

TF = (# of occurrences of the word) / (total # of words), in a single target document

and

IDF = log( (# of documents) / (# of documents containing the word) ), over a set of comparator documents
The TF-IDF equation is a measure of how characteristic a word is of a document, and can be discussed in terms of its constituent factors. The TF is determined by counting the occurrences of a particular word and dividing that count by the total number of words in the target document: as such, it is a measure of frequency. The IDF is a measure of how important a particular term is within a set of documents, and is calculated by dividing the total number of documents by the number of documents in the set which contain that term, and then taking the logarithm of that ratio. The TF-IDF formula multiplies these together and attaches the resulting score to each unique word in the target document. The equation works by weighting the term frequency score for each word in an input document by a variable inverse document frequency score.
Since the comparator set of documents can change based on a number of factors, including year, instructors, etc., the IDF score can be updated, which in turn changes the TF-IDF score for all documents. This allows the TF-IDF statistic to evolve with changing datasets, and helps address the issue of evolving language. Additionally, the logarithm moderates the effect of document frequency and increases the resolution for finding characteristic terms within the input document. A high TF-IDF weight is reached by a high term frequency in the target document combined with a low document frequency (and hence a high IDF) in the comparator set. The weights therefore tend to filter out common terms. Since the ratio inside the logarithm is greater than or equal to 1, the value of IDF (and TF-IDF) is greater than or equal to 0. As a term appears in more documents, the ratio inside the logarithm approaches 1, bringing the IDF and TF-IDF closer to 0. This dampens the effect of terms that appear in many documents, minimizing their contribution to the TF-IDF score even though their TF scores may be similar to those of more characteristic terms in a particular document.
The modification to this approach, also discussed in previous work4, is to apply the TF-IDF algorithm repeatedly in different contexts. Specifically, the words of an input document can be scored with TF-IDF against one comparator set, and then scored again against another comparator set. In both cases the words are the same, since the same input document is used; the TF-IDF scores, however, differ because of the comparator sets. Depending on the context, words will have a lower or higher TF-IDF score. This phenomenon can be exploited to further extend the resolution of TF-IDF scores, in particular by helping the experimenters discern vocabulary that is characteristic of an input document in a specific user-defined context – a particular discipline, for example.
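The dual-context computation described above can be sketched as follows. This is an illustrative reconstruction of the technique, not the Visual Basic .NET program used in the study; the function name, toy documents, and word-list representation are assumptions for demonstration.

```python
import math

def tf_idf_scores(target_doc, comparator_docs):
    """Score each unique word in target_doc against a comparator set.

    target_doc is a list of words; comparator_docs is a list of word-lists,
    assumed to include the target document itself (so every word of the
    target appears in at least one comparator document).
    """
    n_docs = len(comparator_docs)
    scores = {}
    for word in set(target_doc):
        tf = target_doc.count(word) / len(target_doc)          # term frequency
        containing = sum(1 for doc in comparator_docs if word in doc)
        idf = math.log(n_docs / containing)                    # inverse document frequency
        scores[word] = tf * idf
    return scores

# Toy data: one "exam" scored in two different contexts
exam = ["the", "circuit", "uses", "a", "resistor", "and", "the", "capacitor"]
all_engineering = [exam,
                   ["the", "beam", "carries", "a", "bending", "load"],
                   ["the", "circuit", "dissipates", "heat"],
                   ["the", "reactor", "vessel", "pressure", "rises"]]
same_discipline = [exam,
                   ["the", "circuit", "dissipates", "heat"]]

across = tf_idf_scores(exam, all_engineering)   # context 1: all of engineering
within = tf_idf_scores(exam, same_discipline)   # context 2: same discipline only

# "circuit" looks characteristic across engineering, but not within its own
# discipline, where every exam mentions it (IDF = log(2/2) = 0).
print(across["circuit"] > within["circuit"])  # True
```

The same word thus receives two different scores depending on the comparator set, which is exactly the property the modified approach exploits.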
Methodology

This study investigates the efficacy of the modified TF-IDF algorithm in mimicking human subject-matter expertise as it develops wordlists of discipline-specific vocabulary. The methodology comprises two phases: the automated production of discipline-specific wordlists, and the testing of the efficacy of these wordlists. The first phase has been extensively published in previous work4, and those results show that the TF-IDF algorithm appears to work. The second phase of the study, discussed in this paper, uses subject-matter experts – faculty members – to evaluate the efficacy of the wordlists developed. The correlation between the judgment of the subject-matter experts and the list generated through the computational method is assessed.

The overall research study is outlined in Figures 1 and 2 below; Figure 1 shows phase one and Figure 2 shows phase two. In phase one, words are prepared for analysis by converting all input documents to text-only format. The modified TF-IDF algorithm is then used to develop word lists based on a target document (i.e., a specific document or set of documents from a specific course) and sets of comparator documents. The word list generated is a hierarchical discipline-specific vocabulary list that characterizes the target document. In phase two, human subject-matter experts were recruited to evaluate the efficacy of the automated approach in accurately identifying discipline-specific vocabulary.

The documents used for this study are 2254 electronically-available undergraduate engineering final exams from the University of Toronto. These exams are a summative assessment of a student's mastery of course concepts, and are intended to measure learning of the entire body of knowledge – or as close as possible – of a course.
These documents are standardized across all engineering courses at the institution, are roughly the same length, administered in a closely-supervised environment, and are electronically available for data mining and study purposes. Due to the large quantity of words used in this study – over 22 million – this body of data serves as a starting point for additional research in the area of vocabulary characterization in engineering education.
[Figure 1 flowchart: Processing – preparing and cleaning raw text from engineering exams (Adobe Acrobat X); Ordering – ranking characteristic keywords using the TF-IDF computational method (Visual Basic .NET), both across engineering (comparing an exam to all exams in Engineering) and within subject (comparing an exam to all exams in the same discipline), each producing a wordlist; Differentiating – pulling out the language used specifically in that engineering course (engineering minus discipline); Post-processing – eliminating duplicates and sorting in decreasing order of TF-IDF score (MS Excel).]
Figure 1 - Shows graphically the methodology used in Phase One of the research study from top to bottom
Figure 2- Shows graphically the methodology used in Phase Two of the research study from top to bottom
Overview of Phase Two – Evaluating the Efficacy of the Wordlists in Capturing Discipline-specific Vocabulary
This study focuses on gauging how well the wordlists capture discipline-specific vocabulary. To evaluate this, 9 subject-matter experts were recruited from the pool of faculty members teaching the courses whose exams were processed in phase one. As instructors, these faculty members are very familiar with the language that ought to be discipline-specific for the courses that they teach. This aspect of the research has passed the ethics review at the institution where this study was conducted.
The methodology of this phase of the research involves training, calibration, quantitative data collection, and debriefing of each participant. A condensed methodology is described below:
1. Participants were recruited using a standardized email request. In some cases, participants were asked in-person as a follow-up to the email, to ensure that the email was read.
2. A Doodle.com account was created, and each willing participant was scheduled into a 1-hour meeting timeslot; one participant per timeslot.
3. At each meeting, the participant was provided with an “Informed Consent” document. This required form was signed by each participant of the study. The study was briefly explained. This exercise reaffirmed the goal and purpose of this research, and emphasized the importance of providing authentic input.
4. The participant was told that they would be provided with a randomized list of 100 words, extracted from final exams of courses they had instructed in the past. Though the course for each participant was unique, each wordlist was developed using combined data across all years that the participant taught that course.
5. Participants were told that they would assign a number to each word in the list, using a 5-point scale provided to them, ranging from not discipline-specific to very discipline-specific. A brief calibration exercise preceded data collection: the participant was given a print-out of the scale and five words orally, and briefly discussed how they would score these words. Once the participant was confident in using the scale, the study progressed.
6. The participant was then given a list of 100 words from their own course and asked to assign a number from 1 to 5 to each word.
7. After completing the study, the participant was debriefed and given a complete wordlist for their course. This wordlist contained ranked words with corresponding TF-IDF scores, and a copy of a short academic paper explaining the study (written by the experimenter). Each participant was also thanked for their time and contribution to this study.
8. The 1100+ datapoints (scores) were then manually entered into Excel spreadsheets for analysis, to measure how they compare to the TF-IDF-generated wordlists.
Results
The results from this evaluation study are currently being analyzed for statistical significance. Preliminary calculations show that the algorithm works well for a yes/no characterization – domain-specific or not domain-specific – but is weak in identifying words that fall between 2 and 4 (inclusive) on the 5-point scale. In other words, the initial data shows that the program can identify words that are characteristic of a discipline, or not characteristic of a discipline, but has difficulty finely differentiating words that are only somewhat characteristic, as judged by the subject-matter experts.
A sample output that shows the TF-IDF output and the human subject-matter expert score is provided in Table 1 below. The full wordlist from a sample freshman electrical engineering final exam is condensed to 100 words and sorted by decreasing TF-IDF score, with rank in the left-most column. The word itself is in the second column, followed by its TF-IDF score. The participant-assigned score is a value assigned by the faculty member, on a scale ranging from 1 to 5 (inclusive), with a high value indicating a high degree of confidence that the word is discipline-specific. The quintile rank is determined by binning the 100-word sample wordlist into 5 bins, and is used to map the TF-IDF scores for each exam to the 5-point scale used by the faculty members. As such, in the ideal case a quintile rank of 5 corresponds to a participant rank of 5, and so on.
Table 1 – Shows a condensed sample of a wordlist from a freshman electrical engineering final exam. The word list is separated into 5 quintiles, indicated by differences in cell colour, and ranked in decreasing order of TF-IDF score. For brevity, the 100-word list has been condensed to show a sample of words from each quintile. Correlations are highlighted in yellow to the right.
RANK (/100)   WORD            TF-IDF SCORE    PARTICIPANT-ASSIGNED SCORE (/5)   QUINTILE-RANK (/5)
1             circuit          0.033323128    5                                 5
2             voltage          0.015487884    5                                 5
3             electric         0.014911103    5                                 5
4             capacitor        0.009280436    5                                 5
5             resistor         0.00906219     5                                 5
40            result           0.000262347    3                                 4
41            motor            0.000260432    3                                 4
42            discontinuous    0.000254686    3                                 4
43            tesla            0.000239045    5                                 4
44            deactivated      0.000227847    3                                 4
70            associated       0.000121868    3                                 3
71            respectively     0.000121452    1                                 3
72            half             0.00011827     1                                 3
73            results          0.000117417    3                                 3
74            losses           0.000112727    4                                 3
81            cannot           2.31533E-05    1                                 2
82            indicate         2.30839E-05    3                                 2
83            generated        2.05447E-05    3                                 2
84            difficulty       2.03236E-05    1                                 2
85            right            1.88357E-05    1                                 2
91            inside          -9.57969E-05    1                                 1
92            variety         -0.000101296    1                                 1
93            of              -0.000115615    1                                 1
94            at              -0.000124816    1                                 1
95            place           -0.000125485    1                                 1

100-word CORRELATION (using full 5-pt scale): 0.7165
100-word CORRELATION (using only extremes of 5-pt scale): 0.9272
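The quintile-rank mapping and rank correlation used in the table can be sketched as follows. The helper names and the scoring data here are hypothetical illustrations, not the study's actual spreadsheet calculations.

```python
def quintile_rank(rank, list_len=100):
    """Map ranks 1..20 -> 5, 21..40 -> 4, ..., 81..100 -> 1."""
    return 5 - (rank - 1) * 5 // list_len

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical (rank, participant score) pairs for a few sampled words
data = [(1, 5), (3, 5), (22, 4), (45, 3), (60, 2), (78, 2), (92, 1), (99, 1)]
quintiles = [quintile_rank(r) for r, _ in data]
participant = [s for _, s in data]

print([quintile_rank(r) for r in (1, 20, 21, 100)])  # [5, 5, 4, 1]
print(round(pearson(quintiles, participant), 3))
```

A quintile rank agrees with the participant score whenever the word's position in the TF-IDF ordering matches the expert's judgment, which is what the correlation measures.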
Discussion
The data shows that a correlation exists between the participant-assigned scores and the software-assigned scores for the sample case chosen. An initial investigation shows that a correlation across the full 5-point scale between software scores and human scores is present, but it is not as high as the correlation computed using only the extremes of the scale. In particular, the 5-point scale given to the participants maps to a quintile categorization of TF-IDF scores. The sample case shows a correlation of 0.71, and this is similar to the other courses still being analyzed for statistical significance. A preliminary observation of the participant-assigned scores suggests that although participants used the full resolution of the 5-point scale, they tended to assign very high or very low scores to each word. This appears to be consistent among all participants so far, and may suggest that a 5-point scale offers more resolution than participants can fully utilize. Even though each participant was calibrated to the 5-point scale before beginning the study, the preference for extremes suggests that participants may not be able to discern the gradations in between.
If only the extremes of the scale are considered, the data shows that the computational method works very well. Specifically, if words scored a "5" or "1" by the participant are compared to their TF-IDF quintile bin, there is a strong correlation. Sample data from a test case, a freshman Electrical Engineering core course, has a correlation of 0.927 and is shown in Table 1. Initial observations suggest that the subject-matter experts and the TF-IDF program agree on high-ranked and low-ranked words for most of the data collected so far. Currently, 11 studies have been completed and 4 remain; the data so far suggest that the program works, as the correlations are comparable across all of these courses.
When data is compiled from courses with less technical vocabulary, such as design courses, an initial examination suggests that the correlations between the subject-matter experts and the TF-IDF program are lower. In planning the survey, the experimenter deliberately assigned three subject-matter experts to score the same design-heavy course. Though the data is still being compiled, initial observations show that the correlation between participant-assigned and computer-assigned scores is noticeably lower, at slightly less than 0.7.
Currently, a group of senior-year computer engineering students is developing a web-based platform based on the modified TF-IDF algorithm. The goal is to make the tool accessible to people around the world, so that they can submit their own exams for analysis. This is in response to questions asked during ASEE 2013, where instructors wanted access to this software for their own courses. Users of this platform will have their documents categorized and added to the existing repository, and in return will receive a scored wordlist based on the modified TF-IDF algorithm.
Conclusions
The computational approach based on a modified TF-IDF algorithm appears to successfully replicate human subject-matter expertise in identifying discipline-specific vocabulary. Though the dataset is currently limited to 9 exams, initial statistical measures of correlation show strong results. In particular, the software is able to accurately characterize vocabulary that is discipline-specific, and this is a promising starting point for further research in the area of language analysis in engineering education. This work can lead to the development of clearer and more explicitly-defined learning outcomes, with the goal of increasing accessibility to technical terminology and robust vocabulary development for all students.
References
1. Bowe, F. Universal Design in Education: Teaching Nontraditional Students. Bergin & Garvey, Westport, CT, 2000.
2. McGuire, J.M., Scott, S.S., and Shaw, S.F. "Universal design and its applications in educational environments." Remedial and Special Education, 27(3), 2006, pp. 166-175.
3. Pliner, S.M., and Johnson, J.R. "Historical, theoretical, and foundational principles of universal instructional design in higher education." Equity & Excellence in Education, 37(2), 2004, pp. 105-113.
4. Variawa, C., McCahan, S., and Chignell, M. "An automated approach for finding course-specific vocabulary." Proc. of the 120th ASEE Annual Conference and Exposition, Atlanta, 2013.
5. Church, K.W., and Mercer, R.L. "Introduction to the special issue on computational linguistics using large corpora." Computational Linguistics, 19(1), 1993, pp. 1-24.
6. McEnery, T., Wilson, A., and Barnbrook, G. "Corpus linguistics." Computational Linguistics, 24(2), 2003.
7. Bybee, J.L., and Hopper, P., eds. Frequency and the Emergence of Linguistic Structure. Vol. 45. John Benjamins Publishing Company, 2001.
8. Shi, C., Xu, C., and Yang, X. "Study of TFIDF algorithm." Journal of Computer Applications, 29, 2009, pp. 167-170.
9. Robertson, S. "Understanding inverse document frequency: on theoretical arguments for IDF." Journal of Documentation, 60(5), 2004, pp. 503-520.
10. Singhal, A. "Modern information retrieval: A brief overview." IEEE Data Engineering Bulletin, 24(4), 2001, pp. 35-43.
APPENDIX A.8 – EXPLORING THE INSTRUCTIONAL IMPLICATIONS OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM

C. Variawa, and P. Kinnear. "Exploring the Instructional Implications of an Automated Course-specific Vocabulary Search Program." Proc. of Canadian Engineering Education Association Conference. CEEA Paper No. 86. Montreal, 2013.

This paper was presented at the 2013 Canadian Engineering Education Association Annual Conference. It discusses the instructional implications of employing an automated vocabulary characterization tool in a learning environment. The discussion focuses on vocabulary learning, and suggests that the tool helps create a foundation that supports lexicogrammatical development of professional language.
EXPLORING THE INSTRUCTIONAL IMPLICATIONS OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY IDENTIFICATION PROGRAM
Variawa, C., and Kinnear, P.
Department of Mechanical and Industrial Engineering, University of Toronto, Ontario Canada [email protected]; [email protected]
Keywords: computational linguistics, vocabulary identification, vocabulary instruction, EAP, ESP.

Based on the principles of universal instructional design [1], a research study is being performed at the University of Toronto to explore whether an automated technique can be used to identify course-specific vocabulary. The motivation is to create clearer course objectives and highlight the importance of developing a robust professional vocabulary, while promoting a more accessible engineering education. Additionally, the vocabulary used in engineering courses often contains vernacular that is neither technical nor course-specific, but is used to help contextualize the course content; the automated technique can also be used to identify such language.
The Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is one approach that can help identify keywords that are specific to documents, when compared to relevant comparator sets. In this study, final exams from engineering courses are compared to a corpus of all electronically-available engineering final exams to develop wordlists for many courses. The algorithm works by multiplying the term frequency of each word on the exam being studied by the logarithm of the inverse of the fraction of exams in the group in which that word appears. The resulting data is tabulated to form a wordlist, with words characteristic of the input document having higher TF-IDF values. This approach has been modified so that the wordlists are generated twice – once by comparing across exams from the same discipline, and once by comparing across all engineering disciplines – to increase the reliability of the wordlists. So far, the data generated shows that words that appear to be course-specific are assigned higher TF-IDF scores, and preliminary research is being conducted to understand the effectiveness of this sample data for a chemical engineering course. The theoretical effectiveness of distributing such wordlists as part of required course syllabi and course material is examined in this paper.
Vocabulary is critical to the academic and professional success of engineering students. All students must learn to understand and use discipline-specific professional language to practice engineering. For multilingual students this is a daunting task. While estimates vary, a student needs to know 98-99% of the lexical items in a text to understand written discourse.
It is slightly lower for spoken discourse, around 80% for good comprehension (e.g. lectures, discussions). That translates to over 7000 word families [2]. Two questions confront students entering an English-medium university and
their instructors. The first asks which vocabulary students need to learn; the second asks what it means to "know" a word. These questions are fraught: even among discipline experts, key vocabulary is contested. Attempts to deal with the first question include Coxhead's Academic Word List (AWL) [3] and Xue and Nation's University Word List [3]. However, these lists do not address the discipline-specific vocabulary students will encounter in engineering. English for Specific Purposes (ESP) teaching has attempted to target discipline-specific technical vocabulary in its instruction. Mudraya [4], however, cites prior research, along with her own corpus-based findings, indicating that learning the technical vocabulary is not the biggest challenge; rather, it is the subtechnical vocabulary – words that sit between the technical and the general, everyday vocabulary. Deciding where to focus student efforts is not a simple decision, but we can explore the use of automated course-specific vocabulary identification techniques to address this problem.
The problem we face currently is how to more effectively ensure that our multilingual students have access to the lexical resources they need to successfully develop and use professional engineering language. Evidence indicates that incidental learning enhances knowledge of vocabulary the student has already seen more than it supports the acquisition of new vocabulary [2]. The "word list" approach is equally inefficient, as it relies on memorization of single forms and meanings. Students new to a discipline may well benefit from knowing which words carry discipline-specific meanings, even if they do not yet understand the conceptual meanings and associations of those words; presenting students with a well-chosen word list with definitions at this point could therefore be an effective way to introduce words. At other times a focus on collocations and constraints may be most useful. Instruction that concentrates on the polysemous nature of the subtechnical vocabulary that students encounter in engineering communication documents and journal articles also has a place. Having a well-defined word list, derived in a principled way from the discipline corpus, provides a solid foundation from which to develop various instructional and self-study strategies to support student lexicogrammatical development in their professional language.
The development of a course-specific wordlist is a starting point for further research in the area of instructional support for professional language development in engineering education.
References
[1] F. Bowe, Universal Design in Education: Teaching Nontraditional Students. Westport, CT: Bergin & Garvey, 2000.
[2] N. Schmitt, "Instructed second language vocabulary learning," Language Teaching Research, vol. 12, pp. 329-363, 2008.
[3] N. Hancioglu, S. Neufeld, and J. Eldridge, "Through the looking glass and into the land of lexico-grammar," English for Specific Purposes, vol. 27, pp. 459-479, 2008.
[4] O. Mudraya, "Engineering English: A lexical frequency instructional model," English for Specific Purposes, vol. 25, pp. 235-256, 2006.
APPENDIX A.9 – EVALUATING THE USABILITY OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY TO MEASURE LANGUAGE PROFICIENCY IN ORAL PRESENTATIONS

C. Variawa, and L. Wilkinson. "Evaluating the Usability of an Automated Course-specific Vocabulary to Measure Language Proficiency in Oral Presentations." Proc. of Canadian Engineering Education Association Conference. CEEA Paper No. 75. Montreal, 2013.

This paper was presented at the 2013 Canadian Engineering Education Association Annual Conference. It tests the usability of an automated vocabulary characterization tool in identifying course-specific vocabulary used by students during oral presentations. The investigation shows that the tool should allow for the characterization of acronyms and for mapping across root words. (Note: this study informed an update to the methodology used to condition input documents, and the updated methodology was used for the contents of the dissertation.)
EVALUATING THE USABILITY OF AN AUTOMATED COURSE-SPECIFIC VOCABULARY TO MEASURE LANGUAGE PROFICIENCY IN ORAL PRESENTATIONS
Chirag Variawa and Lydia Wilkinson Department of Mechanical and Industrial Engineering, Engineering Communication Program, University of Toronto
[email protected]; [email protected]

Abstract – A study is being conducted to automatically identify and highlight course-specific language used in engineering courses. Specifically, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, from the field of computational linguistics, is used to identify words that are characteristic to exams. This paper analyzes the success of a TF-IDF-generated word list in capturing discipline-specific language in oral design presentations in a second-year Environmental Chemistry course.
1. INTRODUCTION
This paper examines the use of computer-generated wordlists to identify relevant course-specific vocabulary in an engineering classroom. Principles from the fields of computational linguistics, industrial engineering, and higher education are employed to create a software program that performs statistical calculations on groups of input documents to generate keyword lists. Specifically, the program uses a combination of documents from the same discipline and from all of engineering to assign a value, called a Term Frequency-Inverse Document Frequency (TF-IDF) score, to each word in an input document. In the studies so far, the words have been extracted from all electronically-available engineering exams at the University of Toronto. These documents are used because they are a standardized artifact of the learning environment, and are usually a summative indicator of the learning objectives of different courses. As a result, the wordlists are specific to each course, even though certain words may overlap with other exams: the program finds keywords in documents, and course-specific vocabulary constitutes the keywords of engineering exams. The words from each exam are then tabulated in order of decreasing TF-IDF score. The higher the TF-IDF score, the more likely it is that the word is a keyword. However, this approach does have some critical features that limit its applicability in its current form.
2. USING TF-IDF IN AN ORAL DESIGN PROJECT
In this study, TF-IDF was used to measure the effectiveness of students' discipline-specific language in a second-year environmental chemistry class. We focused specifically on the formal client meeting, an oral presentation that is an important step in the design project. Mapping the word list against vocabulary in the presentation exposed
limits arising from differences in purpose and mode between exams and design projects.
The relative purpose of exams and design projects affects the type of word forms that are used. Exams aim to test knowledge, and as such often use the imperative form to deliver a set of instructions: students are asked to calculate the alkalinity of…, give the equation for…, or estimate the concentration of… in order to display knowledge. In comparison, design projects, and these oral presentations specifically, often use the past or future tense to describe what has been accomplished or what will be done next, explaining for instance that an estimated total will be collected, or that ratios were calculated. A direct mapping of the wordlist against the presentation does not capture the form of the word or acknowledge shared root words.
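One crude way to acknowledge shared root words when mapping a wordlist onto a transcript is simple suffix-stripping. The sketch below is illustrative only (the suffix list and helpers are assumptions, not the study's tooling); a real implementation would use a proper stemmer such as the Porter stemmer.

```python
# Illustrative root-word matcher: strips a few common suffixes so that
# "calculated" and "calculate" are treated as forms of the same root.
SUFFIXES = ("ations", "ation", "ated", "ates", "ate", "ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least 4 characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 4:
            return word[: -len(suf)]
    return word

def matches(wordlist_term, transcript_words):
    """Return transcript words sharing a (crude) root with the term."""
    root = crude_stem(wordlist_term)
    return [w for w in transcript_words if crude_stem(w) == root]

transcript = ["ratios", "were", "calculated", "and", "we", "calculate", "totals"]
print(matches("calculate", transcript))  # ['calculated', 'calculate']
```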
The skills tested for and used in exams and design projects – knowledge versus problem solving, investigation, and project management – also impact vocabulary. Design projects require students to look for solutions outside the lecture content, and as a result words occur in the project that may not be tested on the exam. For example, in one group of presentations, gas was the most frequently recurring word-list term; but while it would be used on an exam to refer to a gas state, it was used here to refer to gasoline manufacturers, key contributors to pollution on the site being evaluated. Gas was also used as a modifier in gas chromatography, and notably, chromatography was the next most frequent word-list term. Both in usage and collocation we again see the importance of interpretation in applying the word list.
The list also does not capture acronyms, which are a key expression of professional competency within this context. A student at the beginning of their presentation may indicate that they are testing for Volatile Organic Compounds, but through the rest of the presentation consistently refer to VOCs. While the full term is only captured once, the repeated acronym has the same basic meaning. Within chemical engineering, condensing terms is a necessary means of ensuring concision in a discipline where lengthy word chains are the norm. A student's inability to recognize acronyms by graduation would suggest a lack of proficiency in their chosen field.
Given the above limitations, it becomes clear that the method as currently implemented cannot fully characterize the course-specific vocabulary used in oral presentations. Specifically, the program is not yet capable of discerning acronyms or shorthand, the demonstration of skills in a design course, or differences in word form and usage. Future work will investigate these areas, as well as explore the use of speech-to-text tools to aid in identifying word usage in oral presentations.
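As an illustration of how acronym recognition might be automated in a future version of the tool, the following Python sketch links an all-capitals token to a spelled-out phrase appearing earlier in the text whose initials match it. The function name and matching heuristic are assumptions for illustration only; they are not part of the thesis software.

```python
import re

def link_acronyms(text):
    """Map each acronym to a spelled-out form appearing earlier in the text
    whose word initials match it (an illustrative heuristic, not the thesis
    program's method)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    mapping = {}
    for i, tok in enumerate(tokens):
        clean = tok.rstrip("s'")           # treat "VOC's" and "VOCs" as "VOC"
        if clean.isupper() and len(clean) >= 2:
            n = len(clean)
            # look backwards for n consecutive words whose initials spell the acronym
            for j in range(i - n, -1, -1):
                window = tokens[j:j + n]
                if len(window) == n and "".join(w[0] for w in window).upper() == clean:
                    mapping[clean] = " ".join(window)
                    break
    return mapping
```

With this kind of mapping, repeated acronyms could be counted together with their full terms when the wordlist is applied to a presentation transcript.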
APPENDIX B.1 – INPUT CONDITIONING
This section expands on the process used to prepare the dataset for processing.
Input Conditioning

The first phase of the study is to design and prototype the computational approach. Based on the modified TF-IDF method discussed earlier, the program assigns a TF-IDF weight to each word in a given input document. In order to develop this program, it was first necessary to prepare each input document so that it could be accessed by a computer as a string. The input documents are electronically-available final exams in engineering, received in a variety of file formats including .jpg, .bmp, .pdf, .doc, .docx, and .txt. The common computational aspect of these documents is that they all contain text. Though some of these files contain explicitly-defined text, as in .docx files, the image files do not allow text to be selected and extracted as-is. In order to employ the computational approach, all of the input documents needed to be converted into a standardized, machine-readable format on which TF-IDF could be calculated; each file was therefore processed by a program that converts it into a text-only document. In this processing step, graphical elements are lost, because the intended output, the text-only file, cannot contain elements other than those found on a standardized ASCII table. For the purpose of this study, the researcher is interested in the analysis of vocabulary, and though graphical elements may contain text, at this point in the research they are not considered part of the input data.

These text-only files should not include any symbolic characters, including commas, quotation marks, brackets, and other non-alphanumeric characters. This is because the TF-IDF algorithm treats characters bounded by white space as individual words; commas and other symbolic characters would be considered part of a word, since they are not separated from it by white space. In addition, it is important to remove any numbers from the documents being converted into text-only format. The study is interested in exploring discipline-specific vocabulary, and numbers would distort the TF-IDF values; numbers are not treated as words in this study.
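The stripping of symbolic characters and numbers described above can be sketched in a few lines of Python. This is an illustrative sketch of the conditioning rules, not the thesis's Visual Basic implementation (which appears in Appendix B.2); the function name is hypothetical.

```python
import re

def to_plain_words(raw_text):
    """Strip everything except letters and whitespace, so that commas,
    brackets, and digits never attach to a word (a sketch of the input
    conditioning rules described above)."""
    # Replace any character that is not a letter or whitespace with a space.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", raw_text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(cleaned.split())
```

Because every non-letter becomes white space before tokenization, a trailing comma or an embedded digit can no longer cause two surface forms of the same word to be counted separately.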
So far, any characters outside the ASCII ranges for uppercase and lowercase letters, including all numbers, are removed during the conversion of input documents to text-only format. In particular, the conversion process creates a single string from each input document. This string represents each final exam and becomes the input document that the computational approach uses in subsequent aspects of the study.

In using this conversion approach iteratively, once per input document across all documents, it was determined that a further refinement of the conversion process was necessary. Specifically, a word beginning with an uppercase letter is treated differently than the same word in all lowercase letters. In addition, eliminating words that do not contain a vowel can artificially remove acronyms that would otherwise be used in a discipline-specific manner. As such, the processing also includes an additional “cleaning step” in which only certain kinds of “words” are added: the set of characters between two white spaces is converted into text and added to a text-only file if it follows the schema in Table 1.
Table 1 - Modifications made during conversion from PDF to TXT

Converted and Added to Text File:

• Contains at least one vowel, and only characters between 65 and 90, and 97 and 122 (inclusive) on the standard ASCII table.
  e.g. “beam”, “catalyst”, “Lattice”, “electron”, “Operation”, “caffeine”

• Contains exclusively capital letters.
  e.g. “BLDG”, “RXN”, “FCP”, “GRND”, “IF”, “STDNT”

Not Converted and Not Added to Text File:

• Any character not on the ASCII table.
  e.g. “要” (foreign characters)

• No vowels (and NOT all caps).
  e.g. “bem”, “ctlst”, “ltc”, “lctrn”, “prtn”, “cffn”

• Is a “space” character (ASCII value = 32).
  e.g. “ ” (blank space)

• Contains any numbers.
  e.g. “be1m”, “catalyst3”, “lt4c”, “electr55on”, “IF4f”, “ST1p4”
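The keep/discard schema in Table 1 can be sketched as a single predicate in Python. This is an illustrative simplification (it omits the single-letter and vowel/consonant edge cases that the Visual Basic code in Appendix B.2 handles); the function name is hypothetical.

```python
def keep_word(token):
    """Apply the keep/discard schema of Table 1: keep a token if it is all
    capital letters (a likely acronym), or if it is purely alphabetic ASCII
    and contains at least one vowel (a simplified sketch of the rules)."""
    if not token.isalpha() or not token.isascii():
        return False          # numbers, symbols, non-ASCII characters
    if token.isupper():
        return True           # acronyms such as "RXN" or "BLDG"
    return any(c in "aeiouy" for c in token.lower())
```

Filtering a token stream with this predicate keeps discipline-specific acronyms while discarding the vowel-less fragments that PDF extraction tends to produce.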
APPENDIX B.2 – SOFTWARE CODE: INPUT CONDITIONING
This section presents the code used to create the prototype software which prepares the input.
Imports System.IO
Imports Microsoft.VisualBasic
Imports System
Imports System.Text

Module Module1

    Sub Main()
        'Declare our variables
        Dim DirArray() As String
        Dim FileArray() As String
        Dim FinalStringArray() As String
        Dim strName As String
        Dim Mydir As String
        Dim MyFileName As String
        Dim First As Boolean
        Dim rowcnt As Integer
        Dim FCOUNT As Integer
        Dim AcroApp As Acrobat.AcroApp                  'sets up an object of type Acrobat.AcroApp (the whole Acrobat app)
        Dim AcroAVDoc As Acrobat.AcroAVDoc              'AVDoc is opened in Acrobat’s user interface
        Dim AcroPDDoc As Acrobat.AcroPDDoc              'PDDoc is opened in the background and manipulated without the user seeing it
        Dim AcroTextSelect As Acrobat.AcroPDTextSelect  'Allows text in PDF to be selected
        Dim PageNumber, PageContent As Object
        Dim content As String = ""
        Dim i, a, j, p, l, counter As Integer
        Dim txtflname As String
        Dim finalstring As String

        'Prompt user to enter the particular Folder Name to get files and file-names from folder directory
        'strName = InputBox(Prompt:="Please Enter The Folder Name", Title:="Enter The Folder Name")

        'Take from folder existing in a particular directory
        Mydir = "C:\Users\MCCAHAN-LAB\Desktop\New folder\" & strName

        'Code below finds all PDF files in specified folder, returns name and file directory location,
        'and stores the results into DirArray() and FileArray()
        rowcnt = 1
        FCOUNT = 0
        First = True
        Do While (1)
            If First = True Then
                MyFileName = Dir(Mydir + "\*.pdf", vbDirectory)
                First = False
            Else
                MyFileName = Dir()
            End If
            If MyFileName = "" Then Exit Do
            ReDim Preserve DirArray(0 To rowcnt)
            DirArray(rowcnt) = (Mydir + "\" + MyFileName)
            ReDim Preserve FileArray(0 To rowcnt)
            FileArray(rowcnt) = MyFileName
            rowcnt = rowcnt + 1     'represents number of files in folder, or UBound(DirArray or FileArray)
        Loop

        'Code below will extract text from PDF's, and return final string with @ at the end
        p = 1
        l = 1
        finalstring = ""
        Do Until p = rowcnt
            AcroApp = CreateObject("AcroExch.App")
            AcroAVDoc = CreateObject("AcroExch.AVDoc")
            If AcroAVDoc.Open(DirArray(p), vbNull) <> True Then
                Exit Sub
            End If
            AcroAVDoc = AcroApp.GetActiveDoc
            AcroPDDoc = AcroAVDoc.GetPDDoc
            For i = 0 To AcroPDDoc.GetNumPages - 1      'For all the page numbers
                PageNumber = AcroPDDoc.AcquirePage(i)
                PageContent = CreateObject("AcroExch.HiliteList")
                If PageContent.Add(0, 9000) <> True Then
                    Exit Sub
                End If
                AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
                For j = 0 To AcroTextSelect.GetNumText - 1
                    content = content & AcroTextSelect.GetText(j)
                Next j
            Next i

            'Clean PDF Content string using ASCII before we store it into the FinalStringArray() below
            For counter = 1 To Len(content)
                a = Asc(Mid(content, counter, 1))
                If (a >= 64 And a <= 90) Or (a >= 97 And a <= 122) Or a = 32 Or a = 10 Then
                    finalstring = finalstring & Mid(content, counter, 1)
                End If
            Next

            Dim new_line_char As Char = Chr(10)
            finalstring = Replace(finalstring, new_line_char, " ")
            Dim allwords() As String = Split(finalstring)
            Dim numwords As Integer = UBound(allwords) + 1
            Dim counter1 As Integer = 1
            Dim wcount As Integer = 1
            Dim finalwords() As String
            Dim count2, count3 As Integer
            Do Until counter1 = numwords + 1
                Dim currword As String = allwords(counter1 - 1)
                count2 = 1
                count3 = 0
                If currword = "" Then
                    'if there are extra spaces in between words in the .txt file, there's an empty element in the array
                    'this line accounts for those empty elements and instructs program to continue with the loop
                ElseIf isallcaps(currword) = True Then
                    'if word is all caps, program adds it to finalwords
                    ReDim Preserve finalwords(0 To wcount)
                    finalwords(wcount - 1) = currword
                    wcount += 1
                Else
                    currword = LCase(currword)  'if the entire word isn't all capitals, turns it all to lowercase to make it easier
                    If Len(currword) = 1 Then
                        'if the word is only one letter long, program only adds it to finalwords if it's 'a' or 'i'
                        If Asc(currword) = 97 Or Asc(currword) = 105 Then
                            ReDim Preserve finalwords(0 To wcount)
                            finalwords(wcount - 1) = currword
                            wcount += 1
                        End If
                    ElseIf isavowel(currword(0)) = True Then
                        'if the word begins with a vowel, program loops through word and only adds it to
                        'finalwords if the word also contains a consonant
                        For count2 = 1 To Len(currword) - 1
                            If isavowel(currword(count2)) = False Then
                                count3 += 1     'counter goes up if there's a consonant
                            End If
                        Next count2
                        Dim boolvar As Boolean = Not (count3 = 0)   'boolvar = True if count3 isn't 0, meaning there is a consonant in the word
                        If boolvar = True Then
                            ReDim Preserve finalwords(0 To wcount)
                            finalwords(wcount - 1) = currword
                            wcount += 1
                        End If
                    Else
                        'if the word begins with a consonant, it's only added to finalwords if the word also contains a vowel
                        For count2 = 1 To Len(currword) - 1
                            If isavowel(currword(count2)) = True Then
                                count3 += 1     'counter goes up if there's a vowel
                            End If
                        Next count2
                        Dim boolvar As Boolean = Not (count3 = 0)   'boolvar = True if count3 isn't 0, meaning there is a vowel in the word
                        If boolvar = True Then
                            ReDim Preserve finalwords(0 To wcount)
                            finalwords(wcount - 1) = currword
                            wcount += 1
                        End If
                    End If
                End If
                counter1 += 1
            Loop
            finalstring = Join(finalwords)

            'Concatenate an @ at the end of the extracted PDF text
            finalstring = finalstring & " @"

            'Store each of the resultant strings extracted from each of the PDF's into the array
            'called FinalStringArray() - NOTE: THIS NEEDS FIXTURE
            ReDim Preserve FinalStringArray(0 To l)
            FinalStringArray(l) = finalstring
            l = l + 1

            'Create a text-file and store the resultant 'finalstring' into the textfile, just for insurance purposes - NOTE: THIS NEEDS FIXTURE
            'NOTE: Current code will export the extracted text to the same directory where the original PDF files are located
            txtflname = Replace(DirArray(p), ".pdf", ".txt")
            Dim fs As Scripting.FileSystemObject
            Dim ts As Scripting.TextStream
            fs = New Scripting.FileSystemObject
            ts = fs.CreateTextFile(txtflname)
            ts.WriteLine(LCase(finalstring))
            ts.Close()

            'Reset values of content and finalstring to Null, as well close previously accessed PDF's
            'before exiting the loop and moving on to the next PDF
            content = vbNullString
            finalstring = vbNullString
            AcroAVDoc.Close(True)
            AcroApp.Exit()
            AcroApp = Nothing

            'Increase counter, and move onto the next PDF to extract text from, in the specified directory above
            p = p + 1
        Loop
    End Sub

    Public Function isavowel(ByVal inputchar As Char) As Boolean
        If Asc(inputchar) = 97 Or Asc(inputchar) = 101 Or Asc(inputchar) = 105 Or Asc(inputchar) = 111 _
                Or Asc(inputchar) = 117 Or Asc(inputchar) = 121 Then
            Return True
        End If
        Return False
    End Function

    Public Function isallcaps(ByVal inputstr As String) As Boolean
        Dim a1 As Integer = Asc(inputstr)
        Dim count4 As Integer = 1
        Dim noncapcount As Integer = 0
        If (a1 >= 65 And a1 <= 90) Then             'If first letter is caps
            For count4 = 1 To Len(inputstr) - 1     'Loops through rest of word
                a1 = Asc(inputstr(count4))
                If (a1 < 65 Or a1 > 90) Then
                    noncapcount += 1                'Increments noncapcount if a char in word isn't caps
                End If
            Next count4
        Else
            Return False
        End If
        If noncapcount = 0 Then
            Return True
        Else
            Return False
        End If
    End Function

End Module
APPENDIX C.1 – COMPUTATIONAL APPROACH
This section expands on the process used to deploy the modified algorithm based on the Term Frequency-Inverse Document Frequency (TF-IDF) equation.
Coding the Modified TF-IDF Program

First, the experimenter chooses one file to be used as the input file, and renames that
text-only file “1.txt”. This indicates to the TF-IDF program that this file will be the input which needs
processing. The second action that the experimenter performs is to isolate that “1.txt” file with either
all exams from that same discipline, or with all exams from engineering except for that same discipline.
For example, if the input file, “1.txt” is a first year chemical engineering course called CHE101, then it is
placed into a folder with either all other “CHE” text files, or into a folder with all engineering text files
except ones that contain “CHE” in the title. This is due to the modification to the TF-IDF algorithm
mentioned earlier, and uses context-based calculations to increase the resolution and spread of the TF-
IDF scores during calculation later on. As such, the input going into the TF-IDF program is a folder with
one file called “1.txt” and the rest of the files in that folder being text-only files of engineering exams,
either of the same discipline, or of all other engineering disciplines. The experimenter also creates a
blank file called “OUTPUT.txt” for the words and TF-IDF scores to be output to, at the completion of the
calculations.
Now that the input is fully prepared, the TF-IDF program can be developed and used to assign a
score to each word in that input “1.txt” file, to help characterize key words in that file. As with the text-extractor program, the TF-IDF program contains several major components, including: declaring
variables, memory allocation, calling and using object libraries, creating and using file structures within
the Microsoft Windows® operating system, and several iterative calculations. The differences between
the text-extractor and this TF-IDF program are the type of calculations being performed, the amount of
memory being used, and the object libraries being called. Specifically, the TF-IDF program is more
involved, and uses six major user-coded subroutines instead of one in the previous program.
The TF-IDF program first prompts the user, using a dialog box, for the folder location where the input document and its comparator set are stored. This information tells the program where all of the input documents are coming from, and is later used to prepare a dynamically-allocated
array. First, the program counts the number of files in the input folder. It then creates an array with the
first element of each row containing the file path of each file in that folder. For example, row 1 may
contain the header “C:\ExperimentOne\CHE100versusALLCHE\CHE100.txt”. The program then
continues populating the first element of each row with the file path of each file in the folder, until no
more files are left to add. This is the main array that the program will use for calculation in future steps.
The next step is for the program to open “1.txt”, the main input document – the exam that the
user wishes to encode using the TF-IDF algorithm – and inputs the words into the array. In particular,
the TF-IDF program uses the My.Computer.FileSystem.ReadAllText function to read all text from “1.txt”.
Each word from that file is assigned a unique element in the row; starting from the beginning of the
document, each word gets its own element in the row assigned to that document. As such, the first
word in the document would be assigned to row 1, column 1, and the last word in the document would be assigned to row 1, column X, where X is the total number of words in that document. The program then
moves to the next row in the dynamically-allocated array, and stores the text from the next file in the
input folder in the same manner. After having completed processing all words from all text files in the
input folder, the TF-IDF program now has a large array with the words from the input exam occupying
individual elements in row 1, and all subsequent words from all documents occupying their
corresponding rows and columns. With this large grid now prepared, the program can use a coordinate
system to tactically pinpoint any word given the row and column number as necessary. As with the
previous program, all “int” variables used in the numerous loops and counters have been replaced with “double” variables to increase the maximum number of row or column entries to 1.79×10^308 (int values can extend only to 2.1×10^9 in the positive domain).
The first major operation is to get the instances of a particular word. As such, the program
starts at the first element in the array – the first word of the input document – and counts how many
times it appears in that document. In addition to the system function called “FileReader”, the advanced feature used here is the public shared function “Regex.Matches”. This function
buffers the element string, in this case the first word, into memory and then scans across the same row
to count the number of times that identical string appears. Each time the Regex.Matches comes across
the same word, it increases the counter by one. For example, the word “name” can appear ten times in
a document, with the element containing “name” appearing a few times near the beginning of the row,
some near the middle, and a few more near the end; for each occurrence, it increases the counter by
one, ignoring all entries that do not match the word “name”. The program returns a count after each
word, and performs a series of steps before it moves onto the next word.
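The word-counting step can be sketched in Python using exact whole-token comparison, so that a match on “gas” is not also triggered by “gasoline”. This is an illustrative sketch with hypothetical function names, not the thesis's Regex.Matches-based Visual Basic implementation.

```python
def instances_of_word(word, doc_words):
    """Count how many elements of a tokenized document equal `word`
    (case-sensitive, whole-token comparison)."""
    return sum(1 for w in doc_words if w == word)
```

Here `doc_words` stands for one row of the word array described above: the document split on white space into one word per element.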
The next function is the term frequency subroutine. It takes the value just returned by the
previous function, and divides it by the total number of elements in the same row. As such, it is dividing
the number of occurrences of a particular word by the total number of words in the same document.
This function uses Regex.Matches to help distinguish between white space entries, and elements
actually containing a word so that the word count is more accurate. This value, the term frequency (TF),
is stored in another dynamically-allocated array. Here the TF number can be easily mapped using a
coordinate system to the original word in the original array, and is useful for future calculations and
debugging purposes.
In order to generate the inverse document frequency, the TF-IDF program then needs to count how many times each word in the input file – row 1 – occurs across the large initial array. In order to do
this, the program uses the coordinate system along with several loops and conditional statements to
count the number of documents being compared to. Specifically, the program increases a counter for
each instance of a unique file path, along column 1 of the initial array. It then buffers the first word of
the input exam into memory and uses a Regex.Match command to loop through each row, increasing
another counter for the number of times an identical word is found. If the counter increases past 0 for
the row being examined, then it means that that row contains at least one instance of the word in the
buffer. As such, the document word count increases by one. This repeats for each row – document in
the folder – until all rows have been searched. In particular, the program is interested in the number of
documents containing at least one instance of the specific word that occurs in the input document. The
number of times that word appears in other documents is not a critical feature, but is stored for future
analysis to help improve the system at a later date. For the purpose of this study, there is now a term
frequency value for each word in the input document, and a value that corresponds to the number of documents that contain the identical word. The inverse document frequency (IDF) is then calculated: it is the logarithm of the quotient obtained when the total number of files in the input folder (the upper bound of the initial array) is divided by the number of documents containing the word being investigated.
The TF-IDF score is calculated by multiplying the TF determined earlier by the IDF just calculated. This
TF-IDF score is then printed to the screen using the Console.WriteLine subroutine. As such, the user
will now see the first element of the input exam and a TF-IDF score next to it printed to their screen.
The program then uses the File.AppendAllText subroutine to append the word, the TF-IDF score, and a
newline character to the originally-blank “OUTPUT.txt” output file.
The TF-IDF program then continues this process for each word in row 1 so that each word in the
input file is assigned a TF-IDF score, printed to the screen, and to the output file. The program exits all
of the loops and commands an exit of the program when the “@” symbol is reached. As noted earlier,
the “@” symbol is at the end of each input-text file, and is used to tell the TF-IDF program that all words
in the exam have been calculated. As such, since there are no instances of the “@” symbol present,
other than at the end of the document, the TF-IDF program now knows that it has calculated a TF-IDF
score for each word in the input document. By choosing which comparator sets are included with the
original file, an authentic TF-IDF score can be calculated using context-aware frequency values; the
process is then repeated for the same document using either all exams in engineering or all exams
within the same discipline.
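The arithmetic walked through above—term frequency, inverse document frequency, and their product—can be summarized in a short Python sketch. This is a sketch of the TF-IDF calculation as described, not the Visual Basic implementation in Appendix C.2; each document is represented as a list of words, and the comparator set includes the input document itself (so the document count for any word in it is at least one).

```python
import math

def tf(word, doc_words):
    """Term frequency: occurrences of `word` divided by total words in the document."""
    return doc_words.count(word) / len(doc_words)

def idf(word, all_docs):
    """Inverse document frequency: log of (number of documents divided by the
    number of documents containing the word)."""
    containing = sum(1 for d in all_docs if word in d)
    return math.log(len(all_docs) / containing)

def tf_idf(word, doc_words, all_docs):
    """TF-IDF score of `word` in one document relative to a comparator set."""
    return tf(word, doc_words) * idf(word, all_docs)
```

A word that appears in every document in the comparator set (such as “the”) receives an IDF of log(1) = 0, so its TF-IDF score is zero, while a word concentrated in the input exam scores highly; this is what lets the comparator-set choice control which vocabulary stands out as course-specific.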
The first post-processing step is to import the data from the text-file. In Excel, this is done by
pointing the import data wizard to the output file, and specifying “space” as the delimiter. This
causes the data to be inputted as two columns: the first column is each word in the exam, and the
second column is the corresponding TF-IDF score.
Now that the data has been inputted into Excel, the experimenter then deletes any information
in the output.txt file so that the next iteration of the TF-IDF program will have a clean output file. Each
exam is passed through the TF-IDF program twice: one instance is where the input file is compared
against all documents within the same engineering discipline, and another instance is when the file is
compared against all exams in engineering (minus the same discipline). For each iteration, the data is
stored into the spreadsheet created above.
The result of all of the steps in the study so far is as follows: the experimenter now has a list of
words from the input exam, sorted in decreasing order of TF-IDF score with no redundant pairs of cells.
This is the wordlist portion of the study, and completes the computational approach.
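The spreadsheet post-processing described above—removing redundant word/score pairs and sorting by descending TF-IDF score—can be sketched as a small script. The study performs these steps manually in Excel; this Python version is illustrative only, and the function name is hypothetical.

```python
def build_wordlist(scored_pairs):
    """Collapse repeated words and sort by descending TF-IDF score,
    mirroring the manual spreadsheet post-processing described above.
    Repeated occurrences of a word carry identical scores, so keeping
    one entry per word loses no information."""
    best = {}
    for word, score in scored_pairs:
        best[word] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Applied to the two columns imported from “OUTPUT.txt”, this yields the ranked, de-duplicated wordlist that completes the computational approach.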
APPENDIX C.2 – SOFTWARE CODE: COMPUTATIONAL APPROACH
This section presents the code used to deploy the modified algorithm based on the Term Frequency-Inverse Document Frequency (TF-IDF) equation.
Imports System.IO
Imports Microsoft.VisualBasic
Imports System
Imports System.Text

Module Module1

    Sub Main()
        'Declare our variables
        Dim DirArray() As String
        Dim FileArray() As String
        'Dim strName As String
        Dim Mydir As String
        Dim MyFileName As String
        Dim First As Boolean
        Dim rowcnt As Integer
        Dim FCOUNT As Integer
        Dim wrdcnt(), wrdinst(), termfrq() As Double
        Dim i, j, ubDA, ubWA As Long

        'Prompt user to enter the particular Folder Name to get files and file-names from folder directory
        'strName = InputBox(Prompt:="Please Enter The Folder Name", Title:="Enter The Folder Name")

        'Take from folder existing in a particular directory
        Mydir = "C:\Users\MCCAHAN-LAB\Desktop\Program\CALCULATIONS\MIE350\DISC\" '& strName

        'Code below finds all TXT files in specified folder, returns name and file directory location,
        'and stores the results into DirArray() and FileArray()
        rowcnt = 1
        FCOUNT = 0
        First = True
        Do While (1)
            If First = True Then
                MyFileName = Dir(Mydir + "\*.txt", vbDirectory)
                First = False
            Else
                MyFileName = Dir()
            End If
            If MyFileName = "" Then Exit Do
            ReDim Preserve DirArray(0 To rowcnt)
            DirArray(rowcnt) = (Mydir + "\" + MyFileName)
            ReDim Preserve FileArray(0 To rowcnt)
            FileArray(rowcnt) = MyFileName
            rowcnt = rowcnt + 1
        Loop

        'Do Until i = UBound(DirArray() + 1)
        '    Dim fileReader As String
        '    fileReader = My.Computer.FileSystem.ReadAllText(DirArray(i))
        '    numdocs = UBound(DirArray())
        '    Dim WordArray() As String = Split(fileReader, " ")
        '    Do Until j = UBound(WordArray() + 1)
        '        wrdcnt(i) = getWordCount(fileReader)
        '        wrdinst(i) = getInstancesofWord(WordArray(j), fileReader)
        '        termfrq(i) = TermFrequency(WordArray(j), fileReader)
        '        j = j + 1
        '    Loop
        '    i = i + 1
        'Loop

        Dim documentlist() As String
        ubDA = UBound(DirArray) + 1
        Dim k As Long
        k = 1
        Dim fileReader As String
        Do Until k = ubDA
            fileReader = My.Computer.FileSystem.ReadAllText(DirArray(k))
            ReDim Preserve documentlist(0 To k)
            documentlist(k) = fileReader
            fileReader = ""
            k = k + 1
        Loop

        Dim numberofarrayelements As Double
        Dim counterp As Double = 0
        Dim TF As Double
        Dim IDF As Double
        Dim TFIDF, IOW, WC, NDCW As Double
        Dim numdocs As Double
        Dim TFIDFA() As Double
        ubDA = UBound(DirArray) + 1
        i = 1
        'Do Until i = ubDA
        fileReader = My.Computer.FileSystem.ReadAllText(DirArray(i))
        Dim Words() As String = Split(fileReader, " ", , CompareMethod.Text)
        numdocs = UBound(DirArray)
        numberofarrayelements = UBound(Words) - LBound(Words)
        ubWA = UBound(Words) + 1

        Do Until counterp = ubWA
            TF = TermFrequency(Words(counterp), fileReader)
            IOW = getInstancesofWord(Words(counterp), fileReader)
            WC = getWordCount(fileReader)
            NDCW = numofDocsContainingWord(numdocs, Words(counterp), documentlist)
            TFIDF = TermFreqInverseDocFreq(Words(counterp), fileReader, numdocs, documentlist)
            'ReDim Preserve TFIDFA(0 To counterp)
            'TFIDFA(counterp) = TFIDF
            Console.WriteLine(Words(counterp) & " " & TFIDF)
            File.AppendAllText("C:\Users\MCCAHAN-LAB\Desktop\Program\OUTPUTTEMP\MEOW.txt", _
                Words(counterp) & " " & TFIDF & Environment.NewLine)
            counterp += 1
        Loop
        i += 1
        counterp = 0
        'Loop
        Console.ReadLine()
    End Sub

    '***CALCULATES NUMBER OF WORDS IN THE INPUT WHILE CHECKING FOR ERRORS***
    Public Function getWordCount(ByVal InputString As String) As Double
        Dim TotalWords As Double    'Finds the Total Number of character strings separated by a space
        Dim strtest() As String
        Dim u As Integer
        TotalWords = System.Text.RegularExpressions.Regex.Matches(InputString, "\w+").Count
        'Dim WordsWithNumbers As Double  'Finds the Total Number of character strings containing a numeric character
        'WordsWithNumbers = System.Text.RegularExpressions.Regex.Matches(InputString, "\d+").Count
        Return TotalWords
    End Function

    '***CALCULATES INSTANCES OF A PARTICULAR WORD IN A STRING***
    Public Function getInstancesofWord(ByVal InputWord As String, ByVal InputString As String) As Double
        Dim TotalInstancesofWord As Double
        TotalInstancesofWord = System.Text.RegularExpressions.Regex.Matches(InputString, InputWord).Count
        Return TotalInstancesofWord
    End Function

    '***CALCULATES THE NUMBER OF DOCUMENTS THAT CONTAIN A PARTICULAR WORD***
    Public Function numofDocsContainingWord(ByVal numDocuments As Double, ByVal InputWord As String, ByVal documentList() As String) As Double
        Dim counterj As Double = 1
        Dim count As Double = 0
        While (counterj <= numDocuments - 1)
            If getInstancesofWord(InputWord, documentList(counterj)) > 0 Then
                count += 1
            End If
            counterj += 1
        End While
        Return count
    End Function

    '**CALCULATES THE FREQUENCY OF A WORD IN A STRING**
    Function TermFrequency(ByVal InputWord As String, ByVal InputString As String) As Double
        Dim TFreq As Double
        TFreq = (getInstancesofWord(InputWord, InputString)) / (getWordCount(InputString))
        Return TFreq
    End Function

    '**CALCULATES THE INVERSE-DOCUMENT FREQUENCY OF THE WORD IN THE ARRAY**
    Public Function InverseDocFrequency(ByVal InputWord As String, ByVal numDocuments As Double, ByVal documentList As Array) As Double
        Dim IDF As Double
        IDF = Math.Log(UBound(documentList) / numofDocsContainingWord(numDocuments, InputWord, documentList))
        Return IDF
    End Function

    '**CALCULATES THE TERM-FREQUENCY-INVERSE-DOCUMENT-FREQUENCY OF A WORD IN THE ARRAY**
    Public Function TermFreqInverseDocFreq(ByVal InputWord As String, ByVal InputString As String, ByVal numDocuments As Double, ByVal documentList As Array) As Double
        Dim TFIDF As Double
        TFIDF = TermFrequency(InputWord, InputString) * InverseDocFrequency(InputWord, numDocuments, documentList)
        Return TFIDF
    End Function

End Module
APPENDIX D.1 – SAMPLE PARTICIPANT-RECRUITMENT EMAIL
This section presents a sample email sent to potential participants for the evaluation study.
Dear Professor <surname>,
I am a Ph.D. student in the Department of Mechanical and Industrial Engineering, University of Toronto. This email is to request your participation in a short study to test the validity and efficacy of a course-specific wordlist-generating software program that I have developed for my Ph.D. dissertation. Specifically, the program creates ranked wordlists of vocabulary on publicly available UofT engineering exams in order to identify course-specific vocabulary. The goal is to give instructors a tool to automatically identify requisite technical vocabulary that students ought to be familiar with by the end of a specific course; the software helps students develop a robust technical vocabulary, while reducing learning barriers due to inaccessible language. In particular, I need you to gauge whether this program works for your course, as you are the best subject-matter expert in this regard.
Specifically, I am seeking your expertise as the professor for <course code> to help determine how integral certain words are to that course’s curriculum. Participation in this online, web-accessible study will take no more than 50 minutes of your time and can be completed at your convenience. Your participation would involve completing an online survey designed especially for you, using Google Forms. In the survey you will be presented with 100 different words found in the final exam for a course you instruct, and asked to rate, using checkboxes on a five-point scale, how specific each word is to the <course code> curriculum. This component of my research study is critical to gauging the effectiveness of my automated course-specific wordlist-generating program.
I would really appreciate it if you would agree to become a study participant. If you are interested in participating, or have further questions about my dissertation or the study in general, please reply to this email, or feel free to contact me at [email protected] or 555-555-5555.
Thank you so much for your time, and I look forward to hearing from you.
Sincerely,
Chirag Variawa
APPENDIX D.2 – INFORMED CONSENT
This section presents the informed consent form signed by each participant in the evaluation study.
INFORMED CONSENT FORM

Please read the ENTIRE form before answering the question at the bottom. Answering the question at the bottom of this page confirms that you have read and understood this entire form and have had any questions about this study satisfactorily answered.

Introductory Information
The purpose of the following survey is to determine which of the following words are central to the course Terrestrial Energy Systems. The results of this survey will be used to help validate the output of a computer program that ranks words in an attempt to create course-specific vocabulary lists. These vocabulary lists can be used as teaching aids to help improve accessibility to information taught in courses. You have been asked to participate because of your knowledge of the Terrestrial Energy Systems content and all of the vocabulary pertaining to that course. This survey should take at most 50 minutes of your time. You may complete the survey whenever and wherever you wish; however, we ask that you complete the entire survey in one sitting to improve the consistency of the data.

Participation and Withdrawal
Your participation in this study is voluntary. You may refuse to participate, withdraw at any time, or decline to answer any questions without any negative consequences. If you wish to withdraw at any point, you may do so by sending an email to [email protected] stating that you wish to withdraw. Any and all data pertaining to you would then be removed and deleted from this study.

Risks/Benefits
There are no reasonably foreseeable risks or harm related to participating in this study. No payments or compensation will be given for participation in this study. However, after the study has been completed, a summary of the results will be emailed to you, along with the course-specific vocabulary list developed for CIV300, as determined by the computer program and algorithm. You may use this information and the wordlist for future teaching, or in whatever other way you see fit.

Access to Information, Confidentiality, and Publication of Results
The data for this study will be accessible only to the investigator, Chirag Variawa, his advisor, and a summer undergraduate student for the duration of her term of work. Upon completion of this specific study, estimated to be July 2014, all data pertaining to this study will be anonymized and made accessible only to the supervisor. The results of this study, however, will likely be published and used in public presentations. Due to the nature of this specific survey, your name and this course code will not be anonymized, and your participation in this study is not confidential. However, the researchers will NOT disseminate your name or the course code as part of the outcomes of this study. Fictitious course codes and course titles will be generated for purposes of dissemination.
Questions and Contact Information
If you have any questions about the terms of consent or this study in general, please feel free to contact the research team at the following email address: [email protected] If you have any questions about your rights as a participant in a study, you can contact the Office of Research Ethics at [email protected] or 416-946-3273. As stated above, a copy of this form will be sent to your email for your own reference. If you would like a hard copy of this form, please contact the research team at [email protected] so that your request can be fulfilled.

Consent
Do you, XXXX XXXX, consent to the above terms and agree to participate in this study? *
If you click 'YES', and then 'Continue', you will be taken to the first page of the survey. If you click 'NO', then the survey will not be administered. By clicking 'YES', you are agreeing that you have read and understood this ENTIRE form.
YES NO
APPENDIX D.3 – INSTRUCTIONS AND SCALE USED FOR EVALUATION STUDY
This section presents the instructions and scale provided to each participant of the evaluation study.
The following words or acronyms are presented to you in alphabetical order.
Please rate them from 1 to 5 using the following scale.
1 - I STRONGLY DISAGREE that this is a course-specific word. This word is not specifically relevant to the content of my course.
2 - I DISAGREE that this is a course-specific word.
3 - I am UNDECIDED whether this is a course-specific word.
4 - I AGREE that this is a course-specific word.
5 - I STRONGLY AGREE that this is a course-specific word. This term is central and specific to the content of my course.
APPENDIX D.4 – SAMPLE WORDLIST
This section presents the sample wordlist, as presented in the survey, given to one participant in the evaluation study.
about acwp and architects base bcwp bidders boeck bore boxshield bucket build casting city civil cleats compura concrete construction consultant corrosion costing cover cpm crane cranes
crashing crawler crosssection does doyles drill estimate excavate excavated excavation excavator faculty filling find footing formwork foundation fumes gaining garage grout hammerhead haul hawthorne here highways
holdback incremental indicators is islands items lower luffing material metre might network pave payments performing precast predecessor project proposing pumps scaffold scheduling seal shaded
shoring shortfall side sideboom similarities sketch soil stated steel storey structural tables their these time trench tunnel tunnels type utilities vaning voids waste yes
APPENDIX D.5 – COMPLETE WORDLIST
This section presents one complete wordlist produced by the computational approach (sorted by decreasing TF-IDF score) and used to create the sample wordlist for the evaluation study. The colours represent the quintile to which each word is assigned: red (5), yellow (4), green (3), blue (2), and purple (1).
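The quintile assignment described above can be sketched as follows. The exact binning used in the thesis is not reproduced here, so this mapping (top fifth of ranks receives quintile 5, bottom fifth receives quintile 1) is an assumption:

```python
def quintile(rank, total):
    """Map a 1-based rank in a list of `total` words to a quintile,
    where 5 = top fifth (most course-specific) and 1 = bottom fifth.
    Assumed binning for illustration; the thesis's exact rule may differ."""
    fifth = (rank - 1) * 5 // total  # 0..4, counted from the top of the list
    return 5 - fifth

# For a list of 867 ranked words, rank 3 falls in the top fifth
q = quintile(3, 867)
```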
Rank Word TF-IDFmod 1 civ 0.056646635 2 concrete 0.015357495 3 excavate 0.009444958 4 construction 0.008459422 5 crane 0.007979697 6 similarities 0.00692172 7 tunnel 0.006666238 8 build 0.006388426 9 excavation 0.005518436
10 footing 0.005484795 11 excavated 0.005302422 12 holdback 0.005003342 13 formwork 0.004482372 14 storey 0.004384565 15 civil 0.004357722 16 soil 0.003602975 17 acwp 0.00354386 18 grout 0.003432802 19 metre 0.00339367 20 project 0.003354013 21 designbuild 0.003182623 22 bore 0.002921799 23 liner 0.002876685 24 mccabe 0.002725251 25 foundation 0.002702925 26 days 0.002530305 27 building 0.002481471 28 shoring 0.002481035 29 contractor 0.002434942 30 precast 0.002411362
31 cost 0.002405664 32 airport 0.002399101 33 bcwp 0.002362574 34 readymix 0.002332372 35 ofciv 0.002131704 36 island 0.001928155 37 similar 0.001895199 38 costs 0.001881067 39 passengers 0.001842953 40 struts 0.001831161 41 rakers 0.001831161 42 cpm 0.001758225 43 civf 0.00173563 44 mday 0.001653446 45 schedule 0.00158489 46 houston 0.001542524 47 section 0.001486652 48 contract 0.001475886 49 shown 0.001449173 50 drifts 0.001444948 51 jib 0.001412156 52 install 0.001376665 53 deep 0.00137432 54 fill 0.00130245 55 port 0.001283033 56 shore 0.001272473 57 highway 0.001228635 58 finish 0.001214661 59 safety 0.00118392 60 waterproof 0.001181287 61 designbidbuild 0.001181287
62 falsework 0.001181287 63 landside 0.001181287 64 bcws 0.001181287 65 bonnyville 0.001181287 66 design 0.001170817 67 paving 0.001156893 68 tower 0.001131172 69 wall 0.001108628 70 roof 0.001055789 71 restraint 0.001011111 72 bidding 0.001011111 73 bac 0.00101024 74 bid 0.001005988 75 site 0.001002769 76 day 0.000977339 77 total 0.000973858 78 way 0.000960041 79 partnerships 0.000941438 80 arrest 0.000941438 81 bottom 0.000925113 82 wales 0.00091558 83 travel 0.000908589 84 she 0.00088307 85 later 0.000882428 86 assumptions 0.000880653 87 mobilize 0.000879113 88 column 0.000876162 89 ground 0.000872169 90 act 0.000869316 91 rebar 0.000861655 92 parking 0.000861655 93 lien 0.000850212 94 street 0.000843786 95 truck 0.000837584 96 soldier 0.000830493 97 spent 0.000825415 98 owner 0.000813705 99 free 0.000798794
100 bidder 0.000797119 101 sidewalks 0.000795656 102 gravel 0.000795656
103 undertaking 0.000795656 104 ohsa 0.000795656 105 certified 0.000795656 106 ytz 0.000771262 107 excavator 0.000771262 108 cranes 0.000771262 109 water 0.000735917 110 department 0.000733905 111 metres 0.000733457 112 critical 0.000726156 113 western 0.000726096 114 ferry 0.000710568 115 planks 0.000710568 116 noticed 0.000710568 117 contractors 0.000710568 118 inspector 0.000710568 119 requirements 0.000695017 120 gross 0.000686612 121 continue 0.000686068 122 procurement 0.000675731 123 yellow 0.000672443 124 sections 0.00066208 125 safely 0.000641472 126 downtown 0.000640053 127 strength 0.000636313 128 reasonable 0.000635717 129 vehicles 0.000630138 130 esi 0.000627198 131 cables 0.000618833 132 mechanically 0.000616063 133 factor 0.000613516 134 columns 0.000613118 135 show 0.000604977 136 method 0.000592945 137 piles 0.000590643 138 torontos 0.000590643 139 dispute 0.000590643 140 condominium 0.000590643 141 subdivision 0.000590643 142 vaning 0.000590643 143 bishop 0.000590643
144 partnered 0.000590643 145 walkways 0.000590643 146 concreting 0.000590643 147 timeline 0.000590643 148 digs 0.000590643 149 demobilize 0.000590643 150 nimiber 0.000590643 151 univerity 0.000590643 152 assumptios 0.000590643 153 collectie 0.000590643 154 crawler 0.000590643 155 hawthorne 0.000590643 156 crtificat 0.000590643 157 freeonboard 0.000590643 158 similrities 0.000590643 159 afsuming 0.000590643 160 drainagewaterproofing 0.000590643 161 grond 0.000590643 162 reiriforcing 0.000590643 163 infractions 0.000590643 164 costplus 0.000590643 165 activltv 0.000590643 166 ouration 0.000590643 167 finisll 0.000590643 168 pagtt 0.000590643 169 diys 0.000590643 170 becomescritical 0.000590643 171 depository 0.000590643 172 eeoc 0.000590643 173 jurisdictional 0.000590643 174 sideboom 0.000590643 175 lanyard 0.000590643 176 fallarrest 0.000590643 177 cleats 0.000590643 178 doyles 0.000590643 179 primavera 0.000590643 180 overead 0.000590643 181 iciv 0.000590643 182 orillia 0.000590643 183 rsmeans 0.000590643 184 thecostcapacity 0.000590643
185 coststate 0.000590643 186 moreexpensivethan 0.000590643 187 boeck 0.000590643 188 boxshield 0.000590643 189 sews 0.000590643 190 luffing 0.000590643 191 hammerhead 0.000590643 192 crashing 0.000590643 193 compura 0.000590643 194 kingston 0.000590643 195 grassy 0.000590643 196 can 0.000590388 197 slope 0.000589818 198 ree 0.000579737 199 clause 0.000566969 200 summarize 0.000560274 201 pave 0.000557746 202 material 0.000554249 203 garage 0.00054505 204 up 0.000543229 205 tunnels 0.000536718 206 under 0.000533299 207 structural 0.000532638 208 theyve 0.000531413 209 incurred 0.000531413 210 bidders 0.000531413 211 table 0.000530947 212 short 0.00053094 213 crash 0.000518575 214 scale 0.000516984 215 diameter 0.000514206 216 highways 0.000505556 217 drawing 0.000504822 218 estimate 0.000495166 219 investigation 0.000494393 220 shaded 0.000491125 221 plan 0.000489963 222 crosssection 0.000475033 223 haul 0.000470719 224 gaining 0.000470719 225 conveniently 0.000470719
226 trench 0.000470719 227 built 0.000461135 228 tape 0.000457373 229 location 0.000450455 230 steel 0.000441077 231 bucket 0.000440226 232 eciv 0.000439556 233 estimated 0.000437159 234 quantities 0.000431835 235 streets 0.000425126 236 oec 0.000412708 237 pump 0.000410161 238 summarized 0.00040755 239 drawings 0.000402537 240 top 0.000395048 241 truckloads 0.000385631 242 wbs 0.000385631 243 civq 0.000385631 244 activitys 0.000385631 245 crashed 0.000385631 246 costtime 0.000385631 247 yonge 0.000385631 248 bloor 0.000385631 249 handy 0.000385631 250 birds 0.000385631 251 recev 0.000385631 252 trades 0.000385631 253 kcc 0.000385631 254 planning 0.000385135 255 will 0.000383527 256 constructed 0.000374468 257 name 0.000372915 258 corrosion 0.000365225 259 glue 0.000361237 260 finishing 0.000361237 261 agreement 0.000361237 262 airside 0.000361237 263 have 0.000359591 264 learning 0.000354468 265 experienced 0.00034663 266 inspect 0.000345029
267 drill 0.000341965 268 implied 0.000340038 269 placed 0.000336787 270 underground 0.000331706 271 swell 0.000319632 272 undertaken 0.000319632 273 weather 0.000319632 274 sheet 0.000318478 275 documents 0.0003113 276 framework 0.000308031 277 behind 0.000304857 278 office 0.000301028 279 was 0.000295392 280 number 0.000287379 281 youve 0.000286592 282 december 0.000282884 283 authority 0.000282612 284 include 0.000280773 285 are 0.000279719 286 toronto 0.000275734 287 sides 0.000275569 288 dec 0.000272418 289 graded 0.000271933 290 view 0.00027095 291 use 0.0002676 292 completed 0.000267416 293 voids 0.000265706 294 essentials 0.000265706 295 billy 0.000265706 296 mainland 0.000265706 297 mobilization 0.000265706 298 scissor 0.000265706 299 billed 0.000265706 300 pleased 0.000265706 301 invoiced 0.000265706 302 expensive 0.00025958 303 utility 0.00025958 304 circle 0.000257596 305 using 0.000257345 306 engineer 0.00025569 307 not 0.00025365
308 bar 0.000252651 309 original 0.000251516 310 student 0.000251274 311 management 0.00025102 312 lagging 0.000249715 313 if 0.000248702 314 area 0.000246474 315 last 0.000244553 316 over 0.000243969 317 do 0.00024391 318 calculations 0.000243299 319 measures 0.000242032 320 been 0.000241937 321 examiner 0.000237079 322 waste 0.000229444 323 decided 0.000229238 324 fairnes 0.000229238 325 hour 0.000227956 326 suggested 0.000223381 327 ring 0.000220783 328 taken 0.000219461 329 engineering 0.000217878 330 nothing 0.000217524 331 materials 0.000217487 332 little 0.000213153 333 casting 0.000211781 334 copies 0.000211781 335 manage 0.000208956 336 term 0.000204446 337 nor 0.000196007 338 quality 0.000184666 339 square 0.000184402 340 least 0.000182204 341 corner 0.000180618 342 overlapping 0.000180618 343 filling 0.000180618 344 competent 0.000180618 345 carry 0.000174643 346 available 0.00016816 347 need 0.000165931 348 how 0.000165507
349 were 0.000163972 350 suggest 0.000161644 351 answers 0.00015985 352 city 0.000157882 353 brought 0.000156944 354 end 0.000154906 355 during 0.000154296 356 together 0.000153823 357 completing 0.000152429 358 ff 0.000150026 359 new 0.000146704 360 your 0.000146638 361 islands 0.000145782 362 proposing 0.000145782 363 late 0.000141608 364 ignore 0.000141598 365 meet 0.000140844 366 lift 0.000139281 367 this 0.000136789 368 expected 0.000136364 369 options 0.000136271 370 payments 0.000135025 371 high 0.000133606 372 no 0.000132071 373 attached 0.000131006 374 utilities 0.00012979 375 long 0.000129428 376 but 0.000128311 377 network 0.000127783 378 incremental 0.000126693 379 profile 0.000126693 380 simplified 0.000123193 381 mens 0.000120912 382 provided 0.000120422 383 asked 0.000120174 384 right 0.000119825 385 plot 0.000119294 386 due 0.000118172 387 made 0.000117484 388 flights 0.000114619 389 fumes 0.000114619
390 extensively 0.000114619 391 pictured 0.000114619 392 fairness 0.000114619 393 indicators 0.000114619 394 take 0.000113715 395 exam 0.00011172 396 tables 0.00011161 397 details 0.000110945 398 understand 0.000105335 399 engineers 0.00010436 400 sketch 0.00010396 401 must 0.000103041 402 j 9.81113E-05 403 direct 9.68097E-05 404 university 9.59314E-05 405 slightly 9.55306E-05 406 productivity 9.55306E-05 407 added 9.50926E-05 408 protect 9.44452E-05 409 hours 9.31326E-05 410 driven 9.27002E-05 411 away 9.25263E-05 412 factory 9.02956E-05 413 cover 9.01614E-05 414 five 8.76356E-05 415 public 8.66942E-05 416 costing 8.64292E-05 417 scheduled 8.64292E-05 418 effect 8.51011E-05 419 near 8.48745E-05 420 held 8.48518E-05 421 detail 8.37709E-05 422 below 8.32747E-05 423 diagram 8.298E-05 424 scenario 8.26128E-05 425 determine 8.26128E-05 426 college 8.10999E-05 427 spi 8.08597E-05 428 required 8.0641E-05 429 million 7.97824E-05 430 including 7.88868E-05
431 scaffold 7.75996E-05 432 year 7.7173E-05 433 maximum 7.56574E-05 434 scheduling 7.40391E-05 435 per 7.12205E-05 436 consultant 7.07209E-05 437 according 6.79218E-05 438 pay 6.76253E-05 439 lf 6.66396E-05 440 purposes 6.52095E-05 441 open 6.44617E-05 442 techniques 6.39986E-05 443 space 6.29956E-05 444 base 6.01693E-05 445 would 5.69347E-05 446 be 5.67737E-05 447 closed 5.62524E-05 448 attention 5.259E-05 449 believe 5.18642E-05 450 seal 5.09956E-05 451 extra 4.98296E-05 452 preference 4.86199E-05 453 ladder 4.86199E-05 454 page 4.74948E-05 455 continuous 4.73083E-05 456 after 4.40676E-05 457 while 4.30683E-05 458 specifications 4.16052E-05 459 calculate 4.1298E-05 460 applied 3.87469E-05 461 better 3.80104E-05 462 before 3.74146E-05 463 circumstances 3.70196E-05 464 assurance 3.70196E-05 465 as 3.67345E-05 466 lob 3.55066E-05 467 permitted 3.36843E-05 468 special 3.27156E-05 469 job 3.25039E-05 470 mark 3.21408E-05 471 k 3.14642E-05
472 pumps 2.95313E-05 473 assuming 2.9374E-05 474 typical 2.79879E-05 475 assume 2.74957E-05 476 for 2.71397E-05 477 incident 2.68546E-05 478 aid 2.67784E-05 479 case 2.67199E-05 480 three 2.62673E-05 481 why 2.57423E-05 482 used 2.50566E-05 483 has 2.48354E-05 484 far 2.11992E-05 485 architects 1.51006E-05 486 collective 1.51006E-05 487 unionized 1.51006E-05 488 looked 1.51006E-05 489 client 1.41649E-05 490 types 1.40718E-05 491 answer 1.10522E-05 492 also 9.9191E-06 493 criteria 8.14099E-06 494 demonstrate 6.7685E-06 495 early 4.99649E-06 496 outside 4.72164E-06 497 typically 2.68884E-06 498 crews 2.68273E-06 499 workers 2.68273E-06 500 hire 1.34137E-06 501 inserted -1.33531E-06 502 book -2.00636E-06 503 hard -2.25624E-06 504 above -2.36973E-06 505 question -3.43808E-06 506 trade -4.95016E-06 507 there -5.28964E-06 508 performance -5.65828E-06 509 ec -6.45721E-06 510 normal -7.85311E-06 511 value -7.8695E-06 512 manager -8.57366E-06
513 situation -8.86899E-06 514 allowing -1.18062E-05 515 property -1.24822E-05 516 discuss -1.26828E-05 517 ef -1.2936E-05 518 fall -1.33407E-05 519 units -1.36376E-05 520 point -1.51391E-05 521 performing -1.69058E-05 522 discu -1.90243E-05 523 one -1.95518E-05 524 between -1.99948E-05 525 constraints -2.06736E-05 526 only -2.12218E-05 527 map -2.17414E-05 528 best -2.18332E-05 529 predecessor -2.4394E-05 530 especially -2.4394E-05 531 gof -2.4394E-05 532 shortfall -2.4394E-05 533 achieve -2.67697E-05 534 items -2.77361E-05 535 place -2.80918E-05 536 eac -2.84732E-05 537 doing -3.21238E-05 538 elements -3.28051E-05 539 needed -3.33653E-05 540 side -3.61072E-05 541 private -3.63341E-05 542 lower -3.6468E-05 543 is -3.7774E-05 544 tank -3.78264E-05 545 give -3.94277E-05 546 might -4.03855E-05 547 rank -4.07391E-05 548 budget -4.09127E-05 549 qc -4.2325E-05 550 about -4.56817E-05 551 faculty -4.59875E-05 552 reduce -4.61297E-05 553 occur -4.78536E-05
554 summary -4.80683E-05 555 shortfall -2.4394E-05 556 filled -4.87881E-05 557 tcpi -4.87881E-05 558 sloping -4.87881E-05 559 when -4.90922E-05 560 purpose -5.15204E-05 561 arch -5.25679E-05 562 works -5.29039E-05 563 run -5.33299E-05 564 contact -5.44983E-05 565 from -5.68757E-05 566 until -5.76617E-05 567 level -5.77997E-05 568 properly -5.92308E-05 569 them -5.98388E-05 570 another -6.0289E-05 571 linear -6.19014E-05 572 resulting -6.19568E-05 573 completion -6.33948E-05 574 could -6.56103E-05 575 tell -6.6864E-05 576 see -6.71569E-05 577 plants -6.99873E-05 578 willing -6.99873E-05 579 development -7.0608E-05 580 characteristics -7.10971E-05 581 x -7.11932E-05 582 outlined -7.29359E-05 583 saddle -7.31821E-05 584 program -7.50767E-05 585 form -7.56358E-05 586 person -7.67749E-05 587 problem -7.71345E-05 588 values -7.7498E-05 589 eg -7.77208E-05 590 science -7.81278E-05 591 relationship -7.85872E-05 592 pays -8.03662E-05 593 mass -8.07303E-05 594 than -8.11601E-05
595 fast -8.2905E-05 596 attributes -8.2905E-05 597 bank -8.37465E-05 598 increasing -8.39865E-05 599 learned -8.44179E-05 600 going -8.465E-05 601 released -8.54223E-05 602 pour -8.58076E-05 603 cannot -8.6893E-05 604 pipe -8.73754E-05 605 make -8.75435E-05 606 all -8.91625E-05 607 weaker -9.03933E-05 608 swing -9.03933E-05 609 dummy -9.03933E-05 610 durations -9.03933E-05 611 begin -9.17347E-05 612 acceptable -9.307E-05 613 to -9.37489E-05 614 off -9.44908E-05 615 price -9.4909E-05 616 boss -9.61366E-05 617 places -9.62503E-05 618 invoice -9.75761E-05 619 two -9.77922E-05 620 lsi -9.87254E-05 621 awarded -0.000100092 622 across -0.000100092 623 task -0.000100092 624 of -0.000100259 625 element -0.000100425 626 information -0.000101121 627 many -0.000101212 628 or -0.000101561 629 requires -0.000102805 630 know -0.000105651 631 advantages -0.000106385 632 issues -0.000106829 633 expect -0.000107892 634 es -0.000108072 635 conventional -0.000109482
636 week -0.000112547 637 first -0.000112718 638 said -0.000113156 639 it -0.000113589 640 preferred -0.000114068 641 boring -0.000118462 642 overheads -0.000118462 643 mayor -0.000118583 644 reduced -0.000119487 645 second -0.000121465 646 being -0.000122041 647 evaluation -0.0001224 648 which -0.000122924 649 control -0.000123139 650 may -0.000125243 651 business -0.000127413 652 empty -0.000127413 653 at -0.000128254 654 removed -0.000128498 655 through -0.000129065 656 any -0.000129744 657 terminal -0.000131731 658 by -0.000132019 659 choice -0.000132245 660 general -0.00013255 661 sv -0.000137858 662 unit -0.000137935 663 its -0.000138112 664 means -0.000139975 665 once -0.000140602 666 problems -0.000140672 667 the -0.000141274 668 generally -0.000144185 669 substantially -0.000144319 670 both -0.000146335 671 tf -0.000147915 672 so -0.000148444 673 report -0.000148461 674 main -0.000148686 675 an -0.00015104 676 produce -0.000151278
677 ways -0.000151362 678 difficult -0.000152422 679 companies -0.000158749 680 lot -0.000160529 681 working -0.000160732 682 paid -0.000160732 683 application -0.000162546 684 code -0.000165427 685 received -0.000165454 686 ever -0.000167771 687 employee -0.000167993 688 indirect -0.000167993 689 except -0.000169738 690 that -0.000169918 691 contain -0.00017051 692 final -0.000171212 693 data -0.000171627 694 work -0.000173109 695 time -0.000174359 696 type -0.000180481 697 cv -0.000180787 698 become -0.000180787 699 and -0.000181211 700 moving -0.000185656 701 sequence -0.000186215 702 here -0.000187044 703 yes -0.000187792 704 rate -0.000189324 705 comparisons -0.000189912 706 window -0.000189912 707 each -0.000190538 708 production -0.00019773 709 physical -0.000198486 710 then -0.000198838 711 operate -0.000202564 712 risk -0.00020463 713 crew -0.000207403 714 operation -0.000208198 715 pulp -0.000210318 716 vour -0.000210318 717 their -0.000210619
718 find -0.000211213 719 handle -0.000214235 720 smaller -0.000215899 721 b -0.000218379 722 g -0.000222005 723 these -0.000223202 724 allowed -0.000225003 725 picture -0.000228135 726 expenses -0.000229407 727 profit -0.000234468 728 does -0.000235376 729 shafts -0.000236923 730 earned -0.000237266 731 stated -0.000238508 732 description -0.00024148 733 industry -0.000242956 734 hit -0.000245178 735 communication -0.000245398 736 describe -0.000246207 737 profits -0.000247337 738 beneficial -0.000247337 739 such -0.00025046 740 every -0.000252268 741 largest -0.000254216 742 inside -0.000254216 743 books -0.000254216 744 went -0.000254826 745 now -0.000257423 746 should -0.000257946 747 booklet -0.000258127 748 help -0.000262689 749 latest -0.000264243 750 choic -0.000264489 751 in -0.000264775 752 win -0.000269407 753 into -0.000269952 754 m -0.000270374 755 dates -0.000277105 756 credit -0.000278674 757 did -0.000278935 758 differences -0.000279062
759 doesnt -0.000280235 760 company -0.000283124 761 arrow -0.000287918 762 difference -0.000292583 763 qa -0.000295406 764 quantity -0.000295638 765 series -0.000305104 766 cells -0.000305785 767 seven -0.000314494 768 define -0.000316156 769 triangle -0.000316796 770 equipment -0.00031775 771 ms -0.000318418 772 big -0.000318671 773 ls -0.000321633 774 lowest -0.000323596 775 call -0.000324375 776 net -0.000329834 777 where -0.000329848 778 takes -0.000331702 779 small -0.000331905 780 concept -0.000332475 781 chief -0.000335986 782 bold -0.000335986 783 path -0.000338153 784 more -0.0003386 785 printed -0.00034102 786 local -0.000342372 787 updated -0.000349331 788 grown -0.000349331 789 entered -0.000353411 790 flow -0.000355385 791 rings -0.000358432 792 wave -0.000359246 793 dur -0.000362229 794 much -0.000374646 795 date -0.000374901 796 none -0.000375272 797 manufacturing -0.000381969 798 start -0.000399632 799 fin -0.000413825
800 on -0.000416427 801 goes -0.000417785 802 go -0.000422677 803 wrong -0.000425029 804 electric -0.000426298 805 with -0.000428941 806 substantial -0.000432956 807 because -0.000437054 808 batch -0.000439005 809 overhead -0.000451967 810 logic -0.000457738 811 delivered -0.000460924 812 next -0.000463833 813 currently -0.000465225 814 other -0.0004832 815 who -0.000483262 816 software -0.000489114 817 f -0.00048955 818 lag -0.000491088 819 star -0.000492486 820 duty -0.000494675 821 her -0.00050156 822 cell -0.000502531 823 state -0.000508474 824 told -0.000526923 825 kings -0.000528487 826 explain -0.00054289 827 d -0.000543948 828 examiners -0.00055315 829 h -0.000562602 830 tracking -0.000566418 831 balance -0.00058306 832 they -0.000593565 833 create -0.000595617 834 s -0.000607414
835 meaning -0.00060969 836 very -0.000611945 837 vac -0.000615635 838 labour -0.000641241 839 put -0.000661731 840 plant -0.00066507 841 vou -0.000678608 842 definitions -0.000678928 843 l -0.000685939 844 estimating -0.00068822 845 c -0.000760889 846 what -0.000799191 847 dig -0.00084458 848 gap -0.000863595 849 cash -0.000911956 850 terms -0.00103995 851 activities -0.001120939 852 t -0.00116309 853 mob -0.001201179 854 a -0.00122672 855 set -0.001269138 856 drift -0.001272467 857 complete -0.001296933 858 i -0.001298824 859 index -0.00130204 860 shaft -0.001374978 861 union -0.001737676 862 e -0.001749475 863 cpi -0.001772435 864 duration -0.002160036 865 activity -0.003442018 866 float -0.004356521 867 -0.018155076
APPENDIX E – INDIVIDUAL COURSE STATISTICS
This section presents the statistical measure (Pearson correlation) for each of the individual exams used in the study presented in Chapter 5.
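The correlations reported in the tables below relate participant ratings to algorithm-assigned quintiles. A minimal sketch of the underlying Pearson computation follows; the example ratings are made up for illustration and are not the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: participant ratings (1-5) and algorithm quintiles (1-5)
participant = [5, 4, 4, 2, 1, 3, 5, 2]
quintiles = [5, 5, 4, 2, 1, 3, 4, 1]
r = pearson_r(participant, quintiles)
```

A value of r near 1 indicates that the participant's judgments of course-specificity track the algorithm's quintile assignments closely, which is how the tables below should be read.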
Correlations Code = CHE412
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.43 1.653 100
Quintile 3.60 1.287 100
a. Code = CHE412
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .642**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .642** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = CHE412
Code = CIV100
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.60 1.621 100
Quintile 3.60 1.287 100
a. Code = CIV100
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .625**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .625** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = CIV100
Code = CIV280
Descriptive Statisticsa
Mean Std. Deviation N
Participant 3.12 1.604 100
Quintile 3.60 1.287 100
a. Code = CIV280
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .621**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .621** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = CIV280
Code = ECE110
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.95 1.720 100
Quintile 3.60 1.287 100
a. Code = ECE110
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .717**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .717** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = ECE110
Code = ECE221
Descriptive Statisticsa
Mean Std. Deviation N
Participant 2.70 1.403 100
Quintile 3.60 1.287 100
a. Code = ECE221
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .749**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .749** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = ECE221
Code = MIE242
Descriptive Statisticsa
Mean Std. Deviation N
Participant 3.25 1.480 100
Quintile 3.60 1.287 100
a. Code = MIE242
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .727**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .727** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MIE242
Code = MIE262
Descriptive Statisticsa
Mean Std. Deviation N
Participant 1.93 1.513 100
Quintile 3.60 1.287 100
a. Code = MIE262
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .452**
Sig. (2-tailed) .000
N 100 100
Quintile Pearson Correlation .452** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MIE262
Code = MIE350
Descriptive Statisticsa
Mean Std. Deviation N
Participant 1.48 .990 100
Quintile 3.60 1.287 100
a. Code = MIE350
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .319**
Sig. (2-tailed) .001
N 100 100
Quintile
Pearson Correlation .319** 1
Sig. (2-tailed) .001
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MIE350
Code = MSE101
Descriptive Statisticsa
Mean Std. Deviation N
Participant 3.37 1.397 100
Quintile 3.60 1.287 100
a. Code = MSE101
Correlationsa
Participant Quintile
Participant
Pearson Correlation 1 .785**
Sig. (2-tailed) .000
N 100 100
Quintile
Pearson Correlation .785** 1
Sig. (2-tailed) .000
N 100 100
**. Correlation is significant at the 0.01 level (2-tailed).
a. Code = MSE101