Which Language to Use? Chinese-English Bilingual Speakers’ Language Selection Criteria for Digital
Information Resources
Peishan Bartley
Simmons College
Abstract
The World Wide Web has made it easy to access information resources in different languages, and
considerable effort has gone into developing information systems that can search for information across
languages in the field of cross language information retrieval. At the same time, a wealth of studies
in the field of information seeking behavior have delved into how people search for information. How users
currently search for and consume information across languages, on the other hand, has received
comparatively little attention. This research explores the variables that influence a bilingual user’s
language choice for digital information. More specifically, this research focuses on the influence of
variables that constitute a bilingual speaker’s language profile, such as language history and language
exposure. Using a mixed-methods approach that includes surveys and an article selection exercise, this
research explores the variables that cause a Chinese-English bilingual user to prefer one language over
the other when given parallel digital content in Chinese and English. The results show that the
user’s language profile and background have a statistically significant impact on the user’s language
CHINESE-ENGLISH WEB USER'S LANGUAGE PREFERENCE.
choice. The results also show that concerns over social interaction and personal biases carry over into
information seeking behavior.
Keywords: information seeking behavior, cross-language information retrieval, multilingual
information access, language choice, language preference, bilingual speakers
Contents
Abstract
Chapter 1. Introduction and Problem Area Overview
Chapter 2. Definition of Terms
Languages
Information and Information Seeking
Bilingual and Multilingual Speakers
Language proficiency, dominance, preference, and attitude
Chapter 3: Literature Review
Overview
Cross Language Information Retrieval and Multilingual Information Access
Machine-Based Approaches
Dictionary-Based Approaches
Parallel Corpora Based Disambiguation Methods
Probabilistic-Based and Statistical Approaches
Latent Semantic Indexing (LSI) and Language Models
Transitive and Triangulation Methods
Other Design Issues
Summary of CLIR Literature Review
CLIR and Bilingual Users
The Occasions for CLIR
Multilingual Information Seeking Behavior, Language Proficiency and Domain Knowledge
The Use of Existing CLIR Features and Systems
Summary of Literature Review on CLIR and Bilingual Users
Information Seeking Behavioral
Information Seeking Behavior Models
Summary of Literature Review on Information Seeking Behavior Studies
Bilingual User's Language Choice and Language Use
Code Switching
The Impact Factors of Language Choices
First and Second Language Uses in Composition
First and Second Language Uses in Composition Computer-Mediated-Communication
Language Exposure and Language Dominance
Summary of Literature Review on Bilingualism
Conclusions of the Literature Review
Chapter 4. Research Question and Research Methodology
Research Question
Research Method
Overview
Measurements
Language Attitude
Language Exposure and Experience
Language Proficiency
Subject Matter
Putting it Together
Material
Language Proficiency and Internet Usage and Experience Survey
Article Selection Software
Think Aloud Protocol
Article Selection Follow-Up Questionnaire
Population
Procedures
Pilot study
General User Survey
Scope and Limitations of the Study
Chapter 5. Data Analysis
Pilot Study Results
General Survey Results
Language Profile
Language Preference in General
Language Use Scenarios
Article Selection Exercise
Article Selection Results
Post Article Selection Survey
Additional Thoughts
Chapter 6. Discussion
Research Question and Method Review
Language Attitude
Result
Discussions and Implications
Summary
Language Exposure and the History of Language Use
Result
Discussion and Implications
Summary
Language Proficiency
Result
Discussion and Implications
Summary
Subject Matter
Result
Discussions and Implications
Other Findings and Observations
Chapter 7. Conclusion, Implications, Limitations, and Future Research
Limitations
Future Research
References
Appendix I. Participant Recruitment Letter
Appendix II. Informed Consent Form
Informed Consent Form English Version
Informed Consent Form Chinese Version
Appendix III. Demographic and Language Skill Questionnaire
English Version
Chinese Version
Appendix IV. User Study Article Selection Samples
Appendix V. Interview Script
Appendix VI. Post Article Selection Questionnaire
English Version with Simulated Article Selection Result
Chinese Version with Simulated Article Selection Result
Appendix VII. Variables represented in the survey items
Appendix VIII. Coding Framework
Initial Coding Example
Secondary Coding Example
Appendix IX. Literature Review Source
Table of Figures
Table 1. Use of Language in Different Situations
Table 2. Participant English Reading Proficiency
Table 3. Participant’s Chinese Reading Proficiency
Table 4. Language Preference and Dominant Language Comparison
Table 5. Participant’s reasons for preferring one language
Table 6. Dominant language and the corresponding language to dominant culture
Table 7. Daily Activity and Language Use
Table 8. Participant Online Activity-language Use Summary
Table 9. Participant Online Activity-language Use
Table 10. Dominant Language and the Frequency of Using English for Online Activities. Mann-Whitney Test Result
Table 11. Preferred Language and the Frequency of Using English for Online Activities
Table 12. Language Choice for Internet Activity and Language Proficiency
Table 13. Number of Years Living in the US and Conducting Internet Activity in English
Table 14. English Daily Exposure (in Percentage) and Conducting Online Activity in English
Table 15. Criteria for online language choice
Table 16. Article Selection Result
Table 17. A Cross Comparison of General Language Preference and Online Language Preference
Table 18. Is it easy or hard to choose between different language excerpts?
Table 19. Language preference for the news article excerpts
Table 20. Language preference reasons
Table 21. Why one language appeals to you first
Figure 1. MLIR process
Figure 2. Language Choice Variables and Information Seeking Behavior
Figure 3. Daily English Exposure in Percentage vs. Number of Years Residing in the US
Figure 4. Survey Language and Amount of Daily English Exposure
Figure 5. Scatter Plot – English as Spoken Language vs. English as Text Language
Figure 6. Daily use of language
Figure 7. Amount of Daily English Exposure and the Length of Daily English Use
Figure 8. English proficiency and number of years residing in US
Figure 9. English Proficiency Level and Dominant Language
Figure 10. English Proficiency and Survey Language
Figure 11. English Proficiency and Answer Language
Figure 12. Language Choice for Different Situations
Figure 13. Language Use Online and the Number of Years Living in US
Figure 14. Language Use Online and Daily Language Exposure
Figure 15. Domain Language and Online Language Preference Comparison
Figure 16. Language Preference in General and Online Language Preference Comparison
Figure 17. Daily Internet use
Figure 18. Amount of Personal/Recreational Research Conducted in English Clustered by English Proficiency (1 – lowest, 5 – highest)
Figure 19. Article Selection Result and Information Source Language
Figure 20. Amount of Daily English Exposure and the Number of English Articles Selected
Figure 21. Number of English Articles Selected and English Proficiency Level
Figure 22. Average Number of English Version Articles Selected and English Proficiency Level
Figure 23. Pie chart - language preference for the news article excerpts
Figure 24. Pie Chart - Preferred Language and English Proficiency
Chapter 1. Introduction and Problem Area Overview
Human language is rich in diversity. This is evident in the Maryland Language Science Center’s
Langscape project (http://langscape.umd.edu), which mapped 6,300 languages from 175 countries, and
in the European Union’s 24 official and working languages
(http://ec.europa.eu/education/official-languages-eu-0_en). This wide linguistic diversity in both text
and speech is reflected in online resources as well. A simple search on Google brings up websites
created in a vast number of languages, from Afrikaans to Kongo to Yiddish. A wealth of information is
expressed in all these different languages, ready for anyone with an Internet connection to tap into
from anywhere in the world. Yet even with this easy access, information seekers may never find the
information because they do not know the language it is written in. They may not be able to understand
it if they happen upon it, and may not even be able to use the language to form a search term and
search for it. The discrepancy between a user’s known languages and the language the relevant
information is written in (henceforth referred to as the document language) is a barrier to information
access. Bridging this language gap is the primary goal of the fields of multilingual information access
(MLIA) and cross language information retrieval (CLIR).
MLIA and CLIR are subfields of information retrieval (IR), a field that strives to solve the issues
of storage, retrieval, and display of information resources (Baeza-Yates & Ribeiro-Neto, 1999). MLIA
emphasizes the discovery, access, querying, and retrieval of multilingual digital documents. It is IR
further complicated by the addition of multiple languages. The multilingual nature lies within the
collection itself, and also between the user and the documents (Oard, 2009; Peters & Sheridan, 2001;
Peinado, Rodrigo, & Lopez-Ostenero, 2013). Three components in MLIA work together,
as described in Figure 1:
(1) a digital collection of multilingual documents,
(2) a computer system that supports information retrieval of multilingual documents, and
(3) users who can or would make use of a multilingual collection.
Figure 1. MLIR process.
On one side of the system is the collection. A collection of multilingual documents could contain
documents each composed in only one language (monolingual documents), or documents each made up of
multiple languages (multilingual documents). An example of a monolingual document collection is the
Parliament of Canada website (www.parl.gc.ca), which provides online access to digital documents written
either completely in French or completely in English. Examples of digital collections with documents
that contain a mix of languages include language teaching websites such as www.guidetojapanese.org. A
language teaching website uses a mix of languages within a single sentence to provide vocabulary
definitions or usage demonstrations.
On the other side of the system are the users. Internet users may or may not be fluent in the
document languages present in the collection. With disparate language skills, some users may be able
to come up with search terms that retrieve information relevant to their information need,
and some would not be able to do so without help. The latter situation can potentially be solved by a
cross language information retrieval (CLIR) system. CLIR systems are the computer systems that sit
between the users and the document collections and act as a bridge. CLIR systems are designed to
accept search terms in one language (the query language) and retrieve relevant documents in other
languages (the document languages).
CLIR is viewed by some researchers as an integral part of MLIA (Oard, 1999; Peters &
Sheridan, 2001). Not only does it address the language identification, character encoding, and
multilingual indexing issues that occur in the processing of a multilingual collection, it also aims to
provide a way to cross the language gap (Peters & Sheridan, 2001; Gey, Kando, & Peters, 2005).
The process of CLIR begins when a user composes a query in a query language of his or her
choice. Once the search terms are entered, the system proceeds to match them to potentially relevant
documents through translation, statistical algorithms, or other matching methods (Herbert, Szarvas, &
Gurevych, 2011; Ye, Huang, He, & Lin, 2012; Nie, 2010). Using a CLIR system, users are no
longer confined by their language ability; they can search foreign-language collections and
retrieve documents in different tongues. A user's access to multilingual digital information resources
would therefore be broadened.
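The query-translation step described above can be illustrated with a minimal sketch. It assumes a dictionary-based approach, which is only one of the matching methods named in the text; the toy English-to-Chinese dictionary, the three-document collection, and the function names `translate_query` and `retrieve` are all hypothetical and chosen for illustration only. It is not the method of any particular CLIR system.

```python
# Hypothetical toy bilingual dictionary (English -> Chinese). A real system
# would rely on a full lexicon, parallel corpora, or machine translation.
EN_TO_ZH = {
    "train": ["火车", "列车"],
    "schedule": ["时刻表"],
    "movie": ["电影"],
}

# Hypothetical Chinese document collection, already segmented into terms.
DOCS = {
    "doc1": ["火车", "时刻表", "查询"],
    "doc2": ["电影", "放映", "时间"],
    "doc3": ["列车", "晚点", "通知"],
}


def translate_query(query, dictionary):
    """Replace each query term with all of its dictionary translations.

    Terms missing from the dictionary are dropped.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, []))
    return translated


def retrieve(query, dictionary, docs):
    """Rank documents by how many translated query terms they contain."""
    target_terms = set(translate_query(query, dictionary))
    scores = {}
    for doc_id, terms in docs.items():
        overlap = len(target_terms & set(terms))
        if overlap:
            scores[doc_id] = overlap
    # Highest overlap first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # An English query retrieves Chinese documents.
    print(retrieve("train schedule", EN_TO_ZH, DOCS))
```

In this sketch the user never needs to type a Chinese term: the query language is English and the document language is Chinese. Keeping every candidate translation of a term is the simplest choice, and it also shows why translation ambiguity is a central design concern in such systems.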
With the rapid spread of the Internet during the 1990s, and the development of a Web structure on
which multilingual content can be hosted and accessed by users, the potential and
importance of CLIR was recognized (Peters, Braschler, & Clough, 2012; Gey, Kando, & Peters, 2005).
Since then, there have been many significant developments within the field, yet the translation of
these technological developments into a comprehensive CLIR system for common users appears to be slow
in coming (Gey et al., 2005; Diekema, 2012; Peters, Clough, Gey, Karlgren, & Magnini, 2007). No CLIR
application was widely available to the general public until Google launched Google
Translated Search, also known as the Translated Foreign Pages search option, in May 2007 (Sterling,
2007).
Google Translated Search combined statistical machine translation (Inside Google Translate, n.d.;
Russell, April 23, 2013) and Web searching. It offered a way to search for digital information resources
across languages. Google Translated Search took a user's query, replaced it with search terms from the
user's intended document languages, and used the new set of search terms to retrieve results. The process
is similar to query reformulation (Belkin, 2000) but with the added complexity of reformulating the
query into a different language. The search results could be viewed in the original document languages,
or be translated into the query language. The search feature was seen as a breakthrough in
transitioning CLIR research into a publicly available, real-life application (Chen & Bao, 2009).
However, this breakthrough was not widely adopted by end users, even though researchers foresaw
many benefits of using such a system, such as planning a foreign trip or broadening the scope of research
(Artiles, Gonzalo, Lopez-Ostenero, & Peinado, 2007; Marlow, Clough, Recuero, & Artiles, 2008). In a
small study, Web users who were introduced to the feature expressed doubt about its usefulness and
practicality (Marlow et al., 2008). Citing lack of use, Google disabled Google Translated Search in 2013
(Schwartz, 2013). To date, there has not been another generally available Web-based CLIR system.
There are many possible reasons for Google Translated Search's lack of use, such as insufficient
user awareness (e.g., Marlow et al., 2008) and degradation of retrieval effectiveness (Savoy & Dolamic,
2009). From the subject responses collected by Marlow et al. (2008), it appears that the lack of use is
possibly a result of too little understanding of the system's potential users and how they approach
multilingual digital information resources.
Ruminating on the applications of CLIR systems, Oard (2009) envisioned two types of users: (1)
multilingual speakers who may be able to formulate queries and read documents in different languages,
and (2) monolingual speakers who need translations to help bridge the language differences between
their known language and the collection language. The biggest difference between the two types of
users is language proficiency. A number of existing MLIA studies hold a similar vision and view users’
language proficiency as one of the major MLIA impact factors. For example, Marlow et al. (2008)
studied the impact of language proficiency on users' multilingual search task results, and Hong (2011)
examined how bilingual users adjust their search strategies according to their language proficiency.
Petrelli, Hansen, Beaulieu, Sanderson, Demetriou, and Herring (2004) inspected the amount of detail
and information needed by users of different language proficiency to judge a document’s relevance.
Other studies observed users’ attitudes toward using a less familiar language with existing
CLIR systems. Examples include Artiles et al.’s (2006) and Marlow et al.’s (2008) studies using
Google Translate and FlickrLing, and Petrelli, Beaulieu, Sanderson, Demetriou, Herring, and Hansen's
(2004) study of the test system CLARITY. Yet is language proficiency the only variable that impacts a
user's CLIR experience?
Some researchers propose search task as the other impact factor on the information seeker’s
CLIR behavior (Petrelli et al., 2004, Rieh and Rieh, 2005, Hong, 2011, Steichen, Ghorab, O’Conner,
Lawless, & Wade, 2014). Research have found that users do not begin an information seeking process
blindly. When an information need rises, users decide upon the language and information resource to
use based on past search experience or speculation of where they can most likely find relevant
information. For example, travelers needing train time table and students looking for movie show time
choose to search for the information using the native language (Aula & Kellar, 2007; Hong, 2011).
When the same students need to do scholarly research, they search in academic databases using English
as their query language (Hong, 2011). From these observations, it appears that users’ language selection
is task based. I would argue that it is task based because information resources are currently segregated
by languages. Information seekers are knowledgeable enough to know that if they are looking for local
information, they need to use the native language. If they are looking for information within a field that
has a dominant language, they need to search within the dominant language. This phenomenon has
been observed by Steichen, et al. (2014), and has more to do with the current availability of
information, and less to do with the user. Whereas language proficiency is an innate capacity and
contributes to a person’s language preference, a task based decision is made on the basis of learned
experience and is not a personal choice. This research focuses on the factors that shape users’ personal
preferences. The emphasis is on the variables brought by the user, such as language proficiency,
which is but one of a complicated set of factors that together form a person’s language profile.
A person’s language profile is a composition of the person’s language history (exposure, length of
use, etc.), current language environment, language attitude (preference and cultural identity) and
language dominance (concepts of language preference and dominance are discussed in detail in the
term definition section that follows) (Marian, Blumenfeld, & Kaushanskaya, 2007). Research in the
field of bilingualism has found that several language profile elements are more influential to the
language choice of a bi- or multilingual speaker in different situations, and has demonstrated how language
choice is complex and involves many different factors (Dewaele, 2007; Bahrick, Hall, Goggin,
Bahrick, & Berger, 1994; Hakuta & D'Andrea, 1992). This is reflected in a dated but still relevant
statement by linguist Fishman, “habitual language choice is far from being a random matter of
momentary inclination, even under those circumstances when it could very well function as such from
a purely probabilistic point of view” (1965, p. 67). Linguists have been studying bilingual speakers and
identifying different features and functions relating to language choice. Their findings, however, have
not been examined in the light of information seeking or CLIR. This study intends to fill this gap by
examining bilingual users’ information behavior through the users’ language profiles.
In summary, cross lingual information retrieval systems have expanded into real world
applications but have not gained traction among users. I propose that part of the reason is that there
has not been enough understanding of CLIR users. There are studies on how users handle language
skill deficiencies when they need to search for information across languages, and on how language
proficiency level impacts users’ information seeking behavior, but there are likely other impact factors
that have not yet been explored. This study looks beyond language skills and incorporates other
language profile elements into the investigation of users’ cross language information behaviors. This
study further differs from existing research by examining potential impact factors of the language
selection process without the interference of search tasks and different system interface designs. By
doing so, this study takes a closer look at bilingual users and deepens the knowledge needed to
help users navigate the multilingual World Wide Web. The results are valuable across several subject
areas including CLIR system designs, web-based information seeking behavior, and bilingualism.
For CLIR, the findings of this research could enhance system designers' understanding of
bilingual users' use of multilingual information resources, and of what type of metadata or search features
could better assist users. Understanding how users identify content with language and how language
factors into information consumption can help decide whether full text translation is needed, or if
supportive features, such as query term suggestions, would be enough. This study's findings will also
contribute to the research into information seeking behavior, where researchers continue to examine
the phenomenon of information seeking on the Web.
Web-based information seeking behavior has been examined from many different angles, such as
the general behavioral patterns of Web users (Jansen & Spink, 2006); the intents of Web searching
(Jansen, Booth, & Spink, 2008); the variability of a user's Web search patterns (White & Drucker,
2007); and the different behaviors of users by subject field (Ge, 2010). The field has so far focused on
monolingual speakers, who represent only a part of the highly heterogeneous population of Internet users.
There are few discussions about information seeking behaviors of bilingual and multilingual users, or
about cross-language information seeking behaviors. This study examines the information seeking
behaviors of bilingual and multilingual users in the hope that it will enrich our understanding of
information users, and add layers to the as yet undefined bilingual user profile. This research
also sheds light on how users perceive the availability of information.
For bilingualism, the findings of this study provide additional data on how bilingual speakers
think about multilingual digital information resources, and how they approach them. Bilingualism
research often focuses on language switching in social contexts, for educational purposes, or on its effects
on mental processing. For example, Androutsopoulos' (2009) examination of language use on an online
forum observed that language choice is used to alter the setting of a discussion, such as from formal to
informal. Another example is Kaushanskaya, Gross, and Buac's (2014) study on the effect of children's
bilingual experience on cognitive skills. Studies of language use online, such as Lam (2004), have
largely been confined to social media, where language is used for communication between people. This
research explores users’ language choice when the resource is text-based information.
The next chapter defines the concepts crucial to this study; the literature review follows in the
chapter after it.
Chapter 2. Definition of Terms
Before we continue any further, important terms and concepts used in this paper need to be
defined. At the center of the discussion are the following concepts: information, information seeking, bilingual
and multilingual speakers, and language proficiency, dominance, attitude, and preference.
Languages
English. English refers to modern English with Americanized grammar and spelling, spoken
and used in the United States.
Chinese. Chinese refers to Mandarin Chinese, the common, standard, working language in
Taiwan and mainland China. Other dialects will be referred to by name, such as Hokkien, a dialect
used in Taiwan, and Cantonese, a major language used in Hong Kong.
Information and Information Seeking
Information. “Information” is not an easy concept to define. Researchers from different fields
have extensively discussed what the term “information” means (e.g. Shannon, 1948; Artandi, 1973;
Belkin & Robertson, 1976; Belkin, 1978; Farradane, 1980; Zhang, 1988; Buckland, 1991; Capurro &
Hjorland, 2003; Bates, 2006; Hjorland, 2007). Information has been described as a message issued
from a source as signals to a receiver (Shannon, 1948); as physical representations of knowledge, such
as books (Buckland, 1991); in relation to a communicated and transformed state of knowledge (Belkin,
1978); or as a pattern or organization of matter and energy (Bates, 2006). This study follows Bates’
(2009) line of thought and defines information through its role in general conversation,
such as: students search for information about the American Civil War for a school project; a person
looks for public transportation information for an upcoming trip in a foreign city; reporters search for
information on different products in order to write a review. In the above cases, “information” refers to facts
and statistics that enter a person's cognitive space either through active pursuit or passive encounter,
and alter the person's knowledge store. Encountering information leaves an impact on the person in
the forms of emotional responses, deeper knowledge, a reaffirmation of pre-existing understandings, or
new ideas and thoughts. In this context, “information”: (1) is fact or data transmitted through a medium
such as text, image, or sound; (2) needs to be received by a person; and (3) changes the emotional and/or
knowledge state of the receiver. The physical vehicles (such as documents, images, or collections of
both) that carry the information are referred to as information resources.
This study is about users’ language choices regarding digital, text-based documents. Henceforth
in this research, the term “information” refers to digital, text-based documents, and “information
resources” refers to databases or document collections that can be accessed through the Internet.
Information seeking. Information seeking is the action of looking for relevant information that
would satisfy one’s information need. The action can be a purposeful search for a specific fact, or
intentional browsing for interesting information resources. It is a deliberate behavior, different from
information encountering in which unexpected discoveries are made through passive, unintended
exposure to information (Erdelez, 1999). The process of online information seeking involves: (1)
recognition of the information need, (2) an initial strategy of browsing or searching, (3) formulation of
search terms, and (4) examination and evaluation of the retrieved document set (Holscher & Strube, 2000).
Bilingual and Multilingual Speakers
The Oxford English Dictionary (OED) defines “monolingual” as “a person who speaks only one language”
(“Monolingual”, 2013). More specifically, monolinguals are people who can speak, read, write, and
comprehend only one language.
The OED defines “bilingual” as “one who can speak two languages” (“Bilingual” [Def.
3], n.d.). In the same spirit, multilingual describes a person who can speak more than two languages.
While there are differences in the number of languages involved in the concepts of multilingualism and
bilingualism, the complexity of both concepts stems from the involvement of more than one language.
As a result, the following discussion groups multilingual speakers with bilingual speakers, and focuses
on the concept of bilingual speakers and bilingualism.
For many scholars, “able to speak two languages” is an oversimplification of bilingualism (Baker
& Jones, 1998). “The ability to speak” is overly vague and does not cover the many levels of
proficiency, frequency of use, and ways of language use. Bloomfield (1935), for example, strictly sees
only speakers with “native-like control of two languages” (p.56) as bilingual. Macnamara (1967), on
the other hand, only requires one to have a minimal competency in any of the four language skills
(listening comprehension, speaking, reading, and writing) in one non-native language to qualify as
bilingual. Grosjean (2012) approaches bilingualism by emphasizing the frequency of language use.
Bilingual speakers are defined as “those who use two or more languages (or dialects) in their everyday
lives” (Grosjean, 2012, p.4). Hamers and Blanc (2000) argue that bilingualism needs to be defined
through societal and cultural context. From their point of view, the definition of bilingualism needs to
account for the psychological and social functions of language.
The definition for “bilingual” in this dissertation falls somewhere between Bloomfield’s (1935)
stringent language skills requirement and Macnamara’s (1967) relaxed condition. A bilingual speaker is
defined in this paper as a person who can read, write, speak, and has adequate listening
comprehension ability to carry on a conversation in a language in addition to their native language.
Some of the subjects in this study are exposed to both languages regularly or habitually in their daily
lives either at home or in professional settings. Others only use their second language sporadically. All
of them are able to communicate to some degree in both languages, in both oral and written form.
The social, cultural, and psychological dimensions as well as language proficiency are not treated
as critical criteria, but are accounted for as potential impact factors in this study.
Language Proficiency, Dominance, Preference, and Attitude
Language proficiency. A person’s use of language is a complicated matter that can be viewed
and measured in various ways. Language proficiency is one such measure, and it is one that is often
examined as an impact factor in cross-lingual information seeking.
Language proficiency is a person's ability to express their thoughts in and comprehend a
language (Francis, 2012). Lim, Liow, Lincoln, Chan, and Onslow (2008) quote Birdsong (2006) and
describe proficiency as related to “the mastery of syntax, vocabulary, and pronunciation of a
language” (p.39). Proficiency is often measured by educators, researchers, or institutions through
language tests such as those offered by the American Council on the Teaching of Foreign Languages
(www.actfl.org) and Cambridge English Language Assessment (www.cambridgeenglish.org). It can
therefore be tested and the result expressed quantitatively. For this study, participants are asked to self-rate their
language proficiency level using the LEAP-Q survey questions (Marian, et al., 2007).
Language dominance and language preference. The meaning of “language dominance” is not
as clear cut as language proficiency. Bedore et al. (2012) view dominance as “a measure of relative
performance” (p. 4). Measurements of it can be taken through the actions of reading and writing. From
another point of view, dominance results from the difference in one's mental processing ability between
first and second languages (Birdsong, 2006; Aparicio & Lavaur, 2013). The dominant language is the one
that a speaker can process faster and more accurately (Aparicio & Lavaur, 2013). Flege, Mackay, and
Piske (2002) took other measurements into consideration, such as self-ratings of a person’s ability
to read, write, speak and understand his/her known languages; and the speed, or “automaticity”, of a
person's language processing capability.
This research uses Grosjean’s (1982) definition of language dominance: a person's inclination to
use one language over other known languages. This is different from language preference, which is a
person’s more favorable attitude toward one language over other languages. Language dominance is a
confluence of a host of variables, including the person’s language proficiency, the degree of ease they
feel when they are mentally processing the language, their cultural identification, the frequency of
language use, and their exposure to the language (Gertken, Amengual, & Birdsong, 2014; Grosjean,
1982). A person who is most proficient in their native tongue could view their second language as the
dominant one because of the environment they are in, the domain in which the language is used, or the
amount of exposure one is subjected to (Lim et al., 2008; Birdsong, 2006; Grosjean, 1982). A person
might prefer one language but have a different dominant language due to the frequency of use. The
above two examples demonstrate how multifaceted and nuanced the formation of language dominance is.
In this research, participants are asked about their dominant languages in the survey. They are also
asked about language preferences. Moreover, language preference is observed through the participants’
choice of language for the survey questions (survey language), and for answering the questions (answer
language).
Language attitude. Language attitude has long been associated with the acquisition and
maintenance of language, language use, identity construction, and other language related issues (Ianos,
Huguet, Janes, & Lapresta, 2015). It is:
The attitudes which speakers of different languages or language varieties have towards each
other’s languages or to their own languages. Expressions of positive or negative feelings
towards a language may reflect impressions of linguistic difficulty or simplicity, ease or
difficulty of learning, degree of importance, elegance, social status, etc. Attitudes towards a
language may also show what people feel about the speakers of that language. Language
attitudes may have an effect on second language or foreign language learning. (Longman
Dictionary of Language Teaching and Applied Linguistics, 2010, p.314).
A classical view of attitude breaks it into three components: cognitive (thoughts and
beliefs), affective (feeling), and readiness for action (behavioral intention or plan of action) (Baker,
1992). For example, a Chinese-English bilingual speaker’s attitude towards English can be seen as the
composition of: (1) a belief that it is important to be able to speak English (cognitive), (2) anxiety about
having to speak English (affective), and (3) avoidance of using English when Chinese is available
and/or accepted (readiness for action). Consequently, the current study measures language attitude
by collecting participants’ thoughts about languages through open-ended questions, and by observing
their language choice for the survey and for question answering.
Now that the terms have been defined, the next chapter reviews literature of relevance.
Chapter 3: Literature Review
Overview
The information seeking behavior of a bilingual user is an interdisciplinary issue. It involves
information seeking, accessing multilingual information resources (MLIA), the use of different
languages, and the possible employment of a cross-language information retrieval (CLIR) system. As a
result, at least three fields are of concern: information seeking behavior, bilingualism, and MLIA and
CLIR. Relevant studies from these three fields are reviewed below (see Appendix VI for resources and
search terms used) beginning with CLIR and MLIA, followed by information seeking behavior, and
bilingualism. The following paragraphs will provide an introduction to and define the scope of the
literature review.
CLIR has been an active and productive field since the 1980s. For this study, an overview of the
major developments in the field is given to illustrate how the main focus of the field has been on the
development of more efficient CLIR technology. The review of MLIA literature focuses on
studies of MLIA user behavior, which mostly examine which resources users use and how; how
users think about and use CLIR systems; how they conduct cross-language information seeking; and
how language related variables impact system use.
Information seeking behavior is multifaceted and complicated. There is the motive that spurred
the action, the selection of the information source, the formulation of the query, and finally the act of
reviewing the retrieved documents and judging their relevancy to one's information need. The focal
point of the literature review is on establishing the importance of considering language choice
in the overall information seeking behavior.
Bilingualism is an interdisciplinary subject involving many fields as well, including linguistics,
psychology, neuroscience, education and sociology. Here, particular interest is paid to research
studying the variables that impact language preference, language choice, and language switching.
As will be shown in the rest of this chapter, literature in CLIR seems to focus on technology and
system development, literature in information seeking behavior focuses on monolingual information
seeking patterns, and literature in bilingualism emphasizes the cultural and social aspects of bilingual
speakers' language use in oral and sometimes written communications. While there is MLIA research
on how bilingual users use existing CLIR systems, not much has been done to explore how they
select the language to begin with. This review will demonstrate that more needs to be done to
understand how bilingual Web users relate to languages with regard to digital information
resources.
Cross Language Information Retrieval and Multilingual Information Access
Cross-Language Information Retrieval (CLIR) addresses the situation in which a user submits a
query in one language to retrieve documents written in other languages. As a recognized sub-field of
information retrieval (IR) (Gey, Kando & Peters, 2005), CLIR shares many characteristics of the
general IR which deals with the representation, storage, retrieval, and access of a document collection
(Baeza-Yates & Ribeiro-Neto, 1999), but has the added complexity of language differences between
the query language and the document language to contend with. The major challenge is to represent and
store documents of multiple languages in a way that can facilitate effective access and retrieval using a
different query language.
A review of CLIR literature suggests that the major trend in the field is in system and technology
development, especially in improving the recall (the percentage of all relevant documents that are retrieved) and
precision (the percentage of retrieved results that are relevant) of the system. More recently,
topics of image and multimedia file retrieval, and users' experience interacting with the system have
also garnered interest in the field (Gey et al., 2005). Even with the emergence of studies on human
interaction with CLIR systems, the focus of the field appears to be on technological advancements (Gey,
Kando, & Peters, 2005). While user behavior, interface designs, and users' information needs are
identified as major issues, they have received little attention and still present research opportunities (Gey,
Kando, & Peters, 2005; Petrelli et al., 2004).
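
The recall and precision measures mentioned above can be illustrated with a minimal sketch. The
function name and the document identifiers below are hypothetical and serve only to show the
arithmetic; they are not drawn from any of the systems reviewed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: identifiers of the documents the system returned
    relevant:  identifiers of all documents judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
```

In this toy case two of the four retrieved documents are relevant (precision of one half), and two of
the three relevant documents are retrieved (recall of two thirds).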
Within the scope of this study (see Chapter 4 for the complete outline of the scope), and given its use
of text documents, the following review of CLIR literature covers the major approaches to CLIR
systems: machine translation, which combines existing machine translation systems with monolingual
IR systems; query and/or document translation using lexical resources such as dictionaries, corpora,
and Web pages; sense disambiguation and query expansion techniques that improve the accuracy or
recall of retrieval results; statistical models, such as latent semantic indexing, that map the
relationships among words; and language modeling, triangulation, and other alternative methods that
can be used when there is a dearth of direct language-to-language translation resources. The next
section begins with machine-based approaches.
Machine-Based Approaches
Machine-based translation (MT) is the use of existing computer software to translate text
documents from one language to another via statistical algorithms or some linguistic resource (Somers,
1999; Pecina et al., 2014). Translation software, such as Google Translate, is designed to produce
translations that are as accurate and fluent as possible for human readers. With the ability to translate either the
documents or the queries into the same language, MT should effectively remove the language barrier
and change a cross lingual situation into a monolingual one (Oard, 1998; McCarley, 1999; Zhou,
Truran, Brailsford, and Ashman, 2008). Yet for a long time, MT was not able to achieve the results of
other CLIR approaches (Oard, 1997; Fujii and Ishikawa, 2000). On one hand, queries, often in the form
of brief sequences of words, do not usually provide enough contextual clues for accurate translation
(Pirkola, 1998). On the other hand, while lengthier documents often do lead to better translation quality,
the translated results are still not good enough to produce a satisfactory CLIR result (Oard, 1998; He,
Wang, Oard, & Nossal, 2002). Early efforts to improve machine translation quality for CLIR purposes
were judged as not effective enough to justify the required cost (Ballesteros and Croft, 1997), but
improvements have been made as technology has advanced. Statistical machine translation based
approaches have come to dominate CLIR efforts as query translation quality has improved (Pecina et al., 2014;
Wu, He, & Grishman, 2008), document translations have become good enough to identify a document's
relevancy to a query (Orengo & Huyck, 2006), and machine translation results have come closer to
simulating what a non-native speaker might be able to produce (Chen, Ding, Jiang, and Knudson,
2012).
Google and Microsoft Bing, two major English search engines, both provide translation features
through MT (Russell, May 24, 2007; DePalma, July 11, 2012). Google Translate
(http://translate.google.com/) supports 80 languages (Google, 2013, December 10), and Bing
(http://www.bing.com/translator) lets users choose among 44 (http://www.bing.com/translator/help/).
In May 2007, Google combined its translation feature with search, and rolled out Google Translated
Search (Sterling, May 23, 2007). Users were able to use Google Translated Search to retrieve results
from multiple languages using one query. The search feature, however, was discontinued in 2013
(Sterling, October 7, 2013), although the machine translation feature continues to serve many CLIR projects as
the basis for statistical machine translation (Pecina et al., 2014).
Dictionary-Based Approaches
Dictionary-based approaches use machine readable dictionaries, bilingual word lists, or other
lexical resources to translate the query terms by replacing them with their target language equivalents
(e.g. Hull and Grefenstette, 1996; Croft, 1998; Oard, 1998; Pirkola et al., 2001; Zhou et al., 2008; Airio
and Kettunen, 2009; Kishida and Ishita, 2009). These types of lexical resources may be different from
conventional dictionaries and thesauri in that they are not used to provide a precise description of the
meaning of the word or examples of uses for human readers, but to provide connections among words
such as synonyms and acronyms that can be read by software. In general, machine readable lexical
resources are easier to construct for different language pairs than an effective MT system that requires
the development of statistical algorithms on top of the lexical resources. Dictionary-based approaches
are therefore viewed as easier to implement (Levow, Oard, and Resnik, 2005; Oard, 1998).
Furthermore, researchers have found that by using language resources with comprehensive coverage
and accurate relationships among words, they are able to produce CLIR results that rival or even
surpass monolingual IR system performances (Zhou, Truran, Brailsford, Wade, & Ashman, 2012). The
quality and coverage of the language resources can have a significant impact on the CLIR system’s
translation performance. A poorly constructed dictionary that fails to identify phrases, does not cover
newly coined terms and compound words, lacks coverage of multi-word expressions and
common out-of-vocabulary terms such as proper names and jargon, or has a narrow coverage can
greatly hinder system performance (Hull & Grefenstette, 1996; Xu & Weischedel, 2000;
Demner-Fushman & Oard, 2003; Zhou et al., 2012).
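
The basic mechanics of dictionary-based query translation, and the way gaps in dictionary coverage
surface as untranslated terms, can be sketched as follows. This is a minimal illustration under stated
assumptions: the word list, the function name, and the sample query are hypothetical and do not
reproduce any of the systems cited above.

```python
# Toy English-to-Chinese word list (an assumption for illustration, not a real resource).
BILINGUAL_DICT = {
    "bank": ["银行", "河岸"],      # financial institution / riverbank
    "interest": ["利息", "兴趣"],  # money paid on a loan / curiosity
    "loan": ["贷款"],
}


def translate_query(query, dictionary):
    """Replace each query term with its target-language equivalents.

    Returns (translated, untranslated): a list of (term, candidates) pairs,
    and a list of terms the dictionary does not cover.
    """
    translated, untranslated = [], []
    for term in query.lower().split():
        candidates = dictionary.get(term)
        if candidates:
            translated.append((term, candidates))
        else:
            # Out-of-vocabulary term, e.g. a proper name missing from the dictionary.
            untranslated.append(term)
    return translated, untranslated


translated, untranslated = translate_query("bank loan interest Taipei", BILINGUAL_DICT)
```

Here the proper name in the query has no dictionary entry and is left untranslated, which is the kind
of coverage gap described above, while the terms with several candidates illustrate the translation
ambiguity discussed next.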
There is also the issue of translation ambiguity that can cause translation errors. Translation
ambiguity occurs because words often carry multiple meanings that can lead to several different
translations. Not all of the translated meanings may be intended in the query. The situation can be
handled either by including all variations of translations, or by discerning, by some means, which translations
best represent the original query. The latter process is referred to as disambiguation.
Using all the possible translations of every word in the query can inadvertently add noise to the
retrieval process (Hull & Grefenstette, 1996; Ballesteros & Croft, 1998). This is because the approach
can lead to words with the most possible translations receiving more weight than words with fewer
translations, devaluing the latter, and thus degrading the retrieval outcome. The approach does not
appear to be used in more recent projects.
The alternative approach of selecting one translation for each word in the query can be carried out in
different ways. One way is to assume that the first definition listed in the dictionary is the most
frequently used, and the most likely to reflect the concepts expressed in the original query term.
Therefore, the system selects the terms corresponding to the first sense, or just the first term (e.g.
Oard, 1998; Ballesteros & Croft, 1998; Zhou et al., 2012). Or, as every term has some probability of being
the correct translation, one can randomly select a term from the potential translations as the new query
term. Oard (1998) showed that selecting a random translation from multiple translations can be as
effective as retaining every possible translation for a query, although both are far below the
performance of monolingual retrieval.
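
The three selection strategies described above (keeping every translation, taking the first listed
sense, and picking a translation at random) can be compared in a short sketch. The candidate list and
function names are hypothetical, chosen only to make the weighting problem visible.

```python
import random

# Hypothetical candidate translations for a two-word query.
CANDIDATES = [
    ("bank", ["银行", "河岸", "岸"]),
    ("loan", ["贷款"]),
]


def keep_all(candidates):
    """Use every possible translation of every query word."""
    return [t for _, translations in candidates for t in translations]


def first_sense(candidates):
    """Assume the first listed translation is the most frequently used one."""
    return [translations[0] for _, translations in candidates]


def random_sense(candidates, rng):
    """Pick one translation per query word at random."""
    return [rng.choice(translations) for _, translations in candidates]


all_terms = keep_all(CANDIDATES)        # "bank" contributes 3 of the 4 terms
first_terms = first_sense(CANDIDATES)   # one term per source word
random_terms = random_sense(CANDIDATES, random.Random(0))
```

With all translations retained, the ambiguous word supplies three of the four terms in the translated
query, which shows how words with many translations can outweigh words with few.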
A better approach is to use a statistical measure, such as the co-occurrence rate, as the basis for deciding
which translation is most likely to be appropriate. For example, Reddy and Hanumanthappa (2012)
devised an approach based on the assumption that the correct translations of the words that form the
query have a higher likelihood of co-occurring in the target document. Other methods used to disambiguate
word sense include part-of-speech tagging (Cutting, Kupiec, Pedersen, & Sibun, 1992; Davis and
Ogden, 1997) and the employment of parallel corpora.
Parallel Corpora Based Disambiguation Methods
Parallel corpora, also referred to as translation corpora, are sets of translation-equivalent texts in
which the corpus in language A mirrors the corpus in language B in both content and structure (Cartoni,
Zufferey, & Meyer, 2013; Johansson, 2007; Dyvik, 2004). Parallel corpora can be used as a direct
translation source through side-by-side analysis of text (e.g. Zhou, Truran, Brailsford, Wade, &
Ashman, 2012), as training text for statistical machine translation systems (e.g. Cartoni, Zufferey, &
Meyer, 2013), to obtain co-occurrence statistics for sense disambiguation (e.g. Ballesteros & Croft,
1998; Ide, Erjavec, & Tufis, 2002, July; Ng, Wang, & Chan, 2003, July), or for linear disambiguation
(e.g. Davis, 1996; Davis, 1998; Davis & Ogden, 1997). These uses of parallel corpora are
discussed further in later paragraphs.
The premise for using the co-occurrence statistics in parallel corpora for sense disambiguation is
that the correct translations are used together in the document language corpus just as the original
terms are in the query language corpus. Therefore, the correct translations should co-occur in the
document language with a frequency and distance similar to those of the original terms in the query
language (e.g. Gao, Nie, He, Chen, & Zhou, 2002). The co-occurrence statistics of the potential
translations are then used as the basis for selecting the correct word sense as the translation.
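
A simplified version of co-occurrence based disambiguation can be sketched as follows. The sketch
assumes a tiny, pre-segmented target-language corpus and scores candidate combinations only by how
often they appear in the same document; it ignores the distance between terms, and it is not the
method of any particular study cited above. All names and data are hypothetical.

```python
from itertools import combinations, product

# Hypothetical target-language corpus, each document already segmented into words.
TARGET_CORPUS = [
    ["银行", "利息", "贷款"],
    ["银行", "利息", "上涨"],
    ["河岸", "散步"],
    ["兴趣", "音乐"],
]

# Candidate translations for each source query term (illustrative only).
CANDIDATES = {"bank": ["银行", "河岸"], "interest": ["利息", "兴趣"]}


def cooccurrence(a, b, corpus):
    """Count the documents in which both terms appear."""
    return sum(1 for doc in corpus if a in doc and b in doc)


def disambiguate(candidates, corpus):
    """Pick the combination of translations that co-occurs most often."""
    best, best_score = None, -1
    for combo in product(*candidates.values()):
        score = sum(cooccurrence(a, b, corpus) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = combo, score
    return dict(zip(candidates.keys(), best))


chosen = disambiguate(CANDIDATES, TARGET_CORPUS)
```

In this toy corpus the financial senses of both words appear together in two documents while no other
pairing co-occurs at all, so the financial pair is selected.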
The parallel nature of the corpora is used differently in the linear disambiguation approach. This
approach is based on the assumption that a term and its translation would retrieve similar sets of
documents in their respective collections. Therefore, systems using linear disambiguation methods
retrieve documents from both language sets using the original query and all its translations. The
retrieved document sets are matched against each other. The translation that produces the set of
documents most similar to that of the original query is selected as the correct translation (Hiemstra & Jong, 1999).
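The matching of retrieved sets can be illustrated as follows. This is a sketch under stated assumptions: the two sides of the parallel corpus share document identifiers, retrieval is plain term matching, and set similarity is measured with the Jaccard coefficient. None of these choices is taken from the cited work, and all names and data are hypothetical.

```python
def retrieve(term, collection):
    """Return the ids of documents whose text contains the term."""
    return {doc_id for doc_id, text in collection.items() if term in text.split()}

def jaccard(a, b):
    """Overlap of two id sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def select_translation(term, candidates, source_side, target_side):
    """Keep the candidate whose retrieved set best matches the source term's set."""
    source_hits = retrieve(term, source_side)
    return max(candidates,
               key=lambda c: jaccard(source_hits, retrieve(c, target_side)))

# Toy parallel corpus: the same id refers to a document and its translation.
source_side = {
    1: "el banco sube la tasa",
    2: "una cuenta en el banco",
    3: "un paseo por el parque",
    4: "un banco en el parque",
}
target_side = {
    1: "the bank raises the rate",
    2: "an account at the bank",
    3: "a walk in the park",
    4: "a bench in the park",
}

best = select_translation("banco", ["bench", "bank"], source_side, target_side)
print(best)
```

Here "banco" retrieves documents 1, 2, and 4; "bank" retrieves two of those three, while "bench" retrieves only one, so "bank" is selected.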
While the translation process may produce excessive translations that are not related to the user's
original search, there is also the possibility that certain meanings are lost in translation. This situation is
often handled by query expansion methods, including local feedback and local context analysis.
Query expansion methods. Local feedback and local context analysis are two popular query
expansion methods used by IR systems to solve the word sense mismatch problems that occur when the
same idea is expressed in different ways in the query and the document, making it difficult for the
system to associate one with the other. Query expansion methods are used in CLIR to reduce this type
of dictionary-based translation error (Ballesteros & Croft, 1997; Ballesteros & Croft, 1998; McNamee
& Mayfield, 2002; McNamee & Mayfield, 2004). Whereas disambiguation approaches are used to
eliminate incorrect translations, query expansion methods are used to make sure the correct sense is
included in the final translation set.
Local feedback is a common query expansion technique that retrieves documents in two steps. A
first set of documents is retrieved using the original query terms. The documents that were ranked
highest in relevancy are used by the system to extract additional query terms to expand the user's query
and retrieve the final set for the user (Xu and Croft, 2000; Wu & He, 2008). Local context analysis is a
method proposed by Xu and Croft (2000) that employs co-occurrence analysis for query expansion.
Concepts, instead of terms, are extracted from the top-retrieved documents. Both methods have
been shown to improve CLIR results (Zhou et al., 2012).
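The two-step retrieval of local feedback can be sketched as below. The ranking function (a raw count of query-term occurrences), the parameter names, and the toy collection are assumptions made for illustration; production systems use weighted ranking and term-selection formulas.

```python
from collections import Counter

def rank(query_terms, documents):
    """Rank document indices by the number of query-term occurrences they contain."""
    scores = [(sum(doc.split().count(t) for t in query_terms), i)
              for i, doc in enumerate(documents)]
    ordered = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [i for score, i in ordered if score > 0]

def local_feedback(query_terms, documents, top_docs=2, new_terms=1):
    """Expand the query with frequent terms from the top-ranked documents, then re-rank."""
    first_pass = rank(query_terms, documents)[:top_docs]
    pool = Counter()
    for i in first_pass:
        pool.update(t for t in documents[i].split() if t not in query_terms)
    expanded = list(query_terms) + [t for t, _ in pool.most_common(new_terms)]
    return expanded, rank(expanded, documents)

# Toy collection (hypothetical data).
documents = [
    "car engine repair",
    "car engine oil",
    "automobile engine maintenance",
    "cooking recipes",
]
expanded, final = local_feedback(["car"], documents)
print(expanded, final)
```

The original query "car" matches only the first two documents. After "engine" is added from those top-ranked documents, the third document, which uses "automobile" rather than "car", is also retrieved.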
In addition to using lexical resources to look up translations, statistical and probabilistic-based
methods, reviewed in the next section, are also used to evaluate possible translations.
Probabilistic-Based and Statistical Approaches
Probabilistic-based approaches in monolingual IR use algorithms to predict the probability of a
document matching a query (Baeza-Yates & Ribeiro-Neto, 1999). In CLIR, rather than seeking direct
translation, probabilistic-based approaches estimate the probability of a term in the document language
being the translation of a term in the query language (e.g. Romdhane, Elayeb, Bounhas, Evrard, &
Saoud, 2013).
There are two great advantages to probabilistic-based approaches. One is that, once developed,
the systems are able to handle the languages in both directions: they can be used to translate language A
to language B, and language B to language A, without modification. The method was found to be
most effective when used among languages with similar structures, such as among European languages, or
among certain Asian languages (McNamee & Mayfield, 2004). Another advantage is that they are not
language dependent. In other words, once the algorithm has been developed, it can be used on any
language pair as long as there are sufficient linguistic training materials such as parallel corpora.
Unfortunately, not every language has sufficient lexical resources for such use. Furthermore, with
the varying sizes and quality of the linguistic resources, different language pairs usually require
individual systems to handle them (Franz, McCarley, & Roukos, 1999). These two disadvantages showcase
the importance of lexical resources to probabilistic-based approaches. Many different types of
resources have been explored for their potential use in probabilistic-based approaches as well as in
statistical-based approaches. These lexical resources include parallel corpora, comparable corpora,
webpages, and other Web resources. The use of these resources in probabilistic- and statistical-based
approaches is discussed next.
Lexical resources for probabilistic and statistical approaches. Parallel corpora are often
viewed as suitable resources for matching terms in one language to those in another because of the
translation-equivalent text they contain (e.g. Cartoni, Zufferey, & Meyer, 2013; Dyvik, 2004). The
aligned texts provide a foundation to construct the correlations between words, and are often used as
training material for statistical machine translation systems such as IBM's fast document translation
system (Franz et al., 1999), which built bilingual dictionaries and translation models using algorithms
automatically learned from aligned texts of parallel corpora, and HAIRCUT (McNamee & Mayfield,
2004). HAIRCUT uses parallel corpora as a basis for an n-gram based statistical model. The model
relies on language similarity instead of direct translation for query term mapping, and was shown to be
highly effective in CLIR testing (McNamee & Mayfield, 2004). Parallel corpora are also used to
augment the coverage of existing bilingual dictionaries (Gao, Nie, Xun, Zhang, Zhou, & Huang, 2001),
or combined with other techniques for improved performance (e.g. Ture, Lin, & Oard, 2012;
Azarbonyad, Shakery, & Faili, 2012). Researchers are able to use parallel corpora to explore the
relationship among words within the same language, and across languages, without resorting to the use
of dictionaries.
But parallel corpora are not without shortcomings. The quality of the translation obtained through
parallel corpora is highly dependent on the scope and domain of the corpora, the vocabulary used in the
text, and the frequencies of word use (Nie et al., 1999; Maeda et al., 2000; McNamee & Mayfield, 2002;
Kraaij, 2001; Rogati & Yang, 2004). The same word may express different concepts when used in
different domains. The word “model”, for example, carries different connotations in the fields of math,
physics, and fashion. Parallel corpora in one subject domain may not be an effective translation
resource for documents written for another domain. Furthermore, parallel corpora are hard to come by
and difficult to develop (Ballesteros & Croft, 1998; Zhou et al., 2011; Pirkola et al., 2001; Gao et al.,
2001).
In addition to parallel corpora, there are comparable corpora. Comparable corpora are collections
of texts that contain similar content, but are not structurally aligned (Shakery & Zhai, 2013). The
contents of each corpus are written independently, and are not direct translations of each other. The
corpora provide a data source for mapping natural language lexical equivalents among languages because
the texts are written in their individual languages for their respective readers. There are several
projects that use content on the World Wide Web as the basis for creating comparable corpora (e.g. Baroni,
Bernardini, Ferraresi, & Zanchetta, 2009; Schäfer & Bildhauer, 2012). Comparable corpora can be used
to generate multilingual thesauri (e.g. Sheridan & Ballerini, 1996), or to extract the translation
relationships of words (e.g. Picchi & Peters, 1998; Franz, McCarley, & Roukos, 1999; Sadat, 2010;
Prochasson & Fung, 2011).
Be it parallel or comparable, the nature of the corpora means that CLIR systems constructed
using corpora are multi-directional. That is, the systems can translate the words in both languages to
and from each other. However, comparable corpora share the disadvantages of parallel corpora. Though
it is presumed that comparable corpora are easier to obtain than parallel corpora, they are still hard to
develop, and a large enough set is hard to acquire. They are also domain specific, with words carrying
meanings that are applicable in one field but not necessarily in another. The sensitivity to domain is an
advantage when the lexical sources used to train the statistical translation system and the document
collection share the same domain (Pecina et al., 2014). It becomes a disadvantage when the training
corpora and the collection vary in topic.
Because of these shortcomings, alternative resources have been tested for their suitability to replace
corpora as linguistic resources. For example, the World Wide Web has been treated as a potential
resource by many researchers for parallel or comparable texts. Efforts using the Web as the training
resource for a probabilistic-based CLIR system include the STRAND system developed by Resnik and
Smith (2003), PTMiner developed by Nie and Cai (2001), and the work of Chiao and Zweigenbaum (2002). In
these systems, software identifies and harvests Web pages of similar content, and uses them as a parallel
or comparable corpus to train a probabilistic model. Chiao and Zweigenbaum (2002), for example,
collected French and English websites on the same topic through the use of a medical thesaurus
(MeSH) and two Internet catalogs of medical websites (CISMeF for French medical websites and
CliniWeb for English medical websites) as the comparable corpus to use in a translation system. In
addition to webpages, other Web resources have also been explored for their potential as translation
material, such as anchor text and link structures (Lu, Chien, & Lee, 2004); search engine results (Zhang
& Vines, 2004; Chen et al., 2004); Web directories (Kumura, Meda, & Uemura, 2004); library online
public access catalogs (OPAC) (Larson, Gey, & Chen, 2003); news archives and blogs (Saralegi & de
Lacalle, 2010); and Wikipedia (Herbert, Szarvas, & Gurevych, 2011).
Instead of finding translations and estimating the probability of a term being the translation of
another, other approaches, namely latent semantic indexing and language models, examine and
calculate the relationships among words. Research on latent semantic indexing and language models is
reviewed in the next section.
Latent Semantic Indexing (LSI) and Language Models
Latent semantic indexing. LSI is a variant of the vector-space model that constructs word-word
inter-relationships in a vector space through the use of a set of multilingual documents (Littman,
Dumais, & Landauer, 1998). LSI does not rely on external lexical resources to determine word
relationships. The relationships are derived from a numerical analysis of the initial training data, such
as a set of multilingual documents. The method examines the contexts in which words appear, and
creates a multidimensional space with each term represented by a vector. In this space,
words used in similar contexts are located close together. Documents are also represented in the same
vector space; therefore, similarities and dissimilarities between words and/or documents can be
determined by the distances among their representations in the multidimensional space.
By exploring the relationships among words and documents, within and across languages, LSI
models are able to retrieve relevant documents even when the documents and the queries do not contain
the same words. Once the vector space is established, new materials can be added without
re-establishment or adjustment. The method is entirely algorithmic, and does not need other lexical
resources besides the initial training data, which can be quickly developed. But as with probabilistic
models, the system's performance depends heavily on the scope, quality, and domain of the training
material. Words with multiple meanings can cause semantic distortions. LSI is also computationally
expensive, and may be quite costly when dealing with a larger data set (Evans et al., 1998; Mori et al.,
2001).
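The idea of placing words and documents from two languages in one reduced space can be sketched as follows. This is an illustrative toy, not the cited implementation. It assumes dual-language training documents (each text joined with its translation), uses raw term counts without weighting, and obtains the top left singular vectors of the term-document matrix by power iteration on the term-term matrix rather than by a library SVD routine. All names and data are hypothetical.

```python
import math

def power_iteration(matrix, found, steps=200):
    """Leading eigenvector of a symmetric matrix, orthogonal to those already found."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]
    for _ in range(steps):
        for u in found:  # deflation: remove components along earlier eigenvectors
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        v = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return v

def lsi_basis(docs, vocab, k):
    """Top-k left singular vectors of the term-document count matrix."""
    a = [[doc.split().count(t) for doc in docs] for t in vocab]  # terms x documents
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]  # A * A^T
    basis = []
    for _ in range(k):
        basis.append(power_iteration(gram, basis))
    return basis

def project(text, vocab, basis):
    """Map a query or document into the reduced space."""
    vec = [text.split().count(t) for t in vocab]
    return [sum(u_i * x for u_i, x in zip(u, vec)) for u in basis]

def cosine(p, q):
    denom = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return sum(a * b for a, b in zip(p, q)) / denom if denom else 0.0

# Dual-language training documents: English words followed by Spanish equivalents.
training = [
    "dog perro",
    "dog cat perro gato",
    "car coche",
    "car road coche carretera",
    "car road coche carretera",
]
vocab = sorted({t for doc in training for t in doc.split()})
basis = lsi_basis(training, vocab, k=2)

query = project("dog", vocab, basis)            # English query
sim_perro = cosine(query, project("perro", vocab, basis))
sim_coche = cosine(query, project("coche", vocab, basis))
print(round(sim_perro, 3), round(sim_coche, 3))
```

The English query "dog" shares no word with the Spanish text "perro", yet the two land in the same direction of the reduced space because they occur in the same training contexts, while "coche" lands in an unrelated direction.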
Language models. Language modeling is used in information retrieval to predict the occurrences
of terms, without regard to their order, in a document (Ponte & Croft, 1998). Where traditional
probabilistic models estimate the probability of a document being relevant to a query, language models
assume that users have a general idea of what terms are likely to be found in their target documents.
Given a query, the language model estimates the probability that the query is generated to search for
each of the documents in the collection. The documents with the highest probabilities are presented to
the users as the search result (Larkey & Connell, 2005; Xu & Weischedel, 2000; Xu & Weischedel,
2001; Xu, Weischedel, & Nguyen, 2001; Lavrenko, Choquette, & Croft, 2002). The language models
can be established through the use of lexical resources such as parallel corpora or bilingual dictionaries.
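The scoring described above can be illustrated with a small query-likelihood sketch. It assumes a common mixture form in which each query term is generated either from a general query-language background model or from a document term via a translation probability. The mixing weight `alpha`, the translation table, and the toy documents are hypothetical values chosen for illustration, not figures from the cited studies.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, background, translation, alpha=0.3):
    """log P(query | doc), mixing a background model with translated document terms."""
    doc_counts = Counter(doc)
    bg_counts = Counter(background)
    score = 0.0
    for q in query:
        p_background = bg_counts[q] / len(background)
        # P(q | doc) = sum over document terms d of P(d | doc) * P(q | d)
        p_doc = sum((n / len(doc)) * translation.get(d, {}).get(q, 0.0)
                    for d, n in doc_counts.items())
        # A tiny floor avoids log(0) when a query term is unseen everywhere.
        score += math.log(max(alpha * p_background + (1 - alpha) * p_doc, 1e-12))
    return score

# Hypothetical translation probabilities P(English term | Chinese term).
translation = {
    "银行": {"bank": 1.0},
    "贷款": {"loan": 1.0},
    "利率": {"interest": 0.5, "rate": 0.5},
    "岸": {"bank": 0.5, "shore": 0.5},
    "河": {"river": 1.0},
    "风景": {"scenery": 1.0},
}
background = ["bank", "loan", "interest", "rate", "shore", "river", "scenery", "bank"]

finance_doc = ["银行", "贷款", "利率"]
river_doc = ["河", "岸", "风景"]
query = ["bank", "loan"]

finance_score = query_log_likelihood(query, finance_doc, background, translation)
river_score = query_log_likelihood(query, river_doc, background, translation)
print(finance_score > river_score)
```

Both documents contain a term that can translate to "bank", but only the finance document also accounts for "loan", so it receives the higher likelihood and would be ranked first.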
Language model approaches are based on statistical theories. They are language independent, and
can incorporate the use of additional enhancements, such as document expansion and stemming
alternatives, into the system (Larkey & Connell, 2005). However, as with other statistical approaches,
lexicon coverage is extremely important for accurate translation probability estimations (Lavrenko,
Choquette, & Croft, 2002); the model still relies on a parallel corpus with comprehensive lexical coverage
for effective retrieval results.
However sophisticated the aforementioned CLIR approaches are, they all share one weakness:
the reliance on some kind of lexical resource, such as machine readable bilingual dictionaries or
corpora. Not all languages have such resources readily available. Researchers have come up with
alternative methods for languages without sufficient lexical resources. Two of these methods
are transitive and triangulation methods.
Transitive and Triangulation Methods
Transitive and triangulation methods are used when there are not enough lexical resources to
establish a system that can directly map language A to and from language B. In this instance, a third
language that has established lexical resources for translations into both languages is used to facilitate
the process (Ballesteros & Sanderson, 2003; Purwarianti, Tsuchiya, & Nakagawa, 2007). These
methods not only allow for CLIR between languages that do not share translation resources; they are
also able to reduce the number of translations that need to be done when a large number of languages
are involved.
Transitive methods. Transitive methods use an intermediary language to bridge the translation gap
between two languages. For example, one may wish to build a CLIR system that takes query terms in
Indonesian to retrieve documents in Japanese, yet there is no available machine readable
Indonesian-Japanese dictionary, parallel corpus, or comparable corpus for this language pair. Fortunately, there
are existing lexical resources between each of the languages and English. Therefore, an
Indonesian-Japanese CLIR system may be built using English as a pivot language (Purwarianti et al., 2007).
Terms in Indonesian are mapped to English terms, which are then mapped to Japanese terms. Another
example uses machine translation systems to translate both queries and documents into an intermediary
language (Kishida & Kando, 2005). In the hybrid system developed by Kishida and Kando (2005), the
query is translated into the intermediary language (query set A), and from the intermediary language into
the document language (query set B). In the meantime, the documents are roughly translated into the
intermediary language. Query set A is used to retrieve a set of documents that have been translated into the
intermediary language. Query set B is used to retrieve a set of documents in the original document
language. The two sets are merged to form the end result.
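The dictionary-chaining form of the transitive method can be sketched as below. The function name and the two small dictionaries are hypothetical toy data, written only to show how terms are mapped from Indonesian to English and then to Japanese.

```python
def pivot_translate(term, source_to_pivot, pivot_to_target):
    """Translate a term by chaining two dictionaries through a pivot language."""
    targets = []
    for pivot_term in source_to_pivot.get(term, []):
        for target_term in pivot_to_target.get(pivot_term, []):
            if target_term not in targets:
                targets.append(target_term)
    return targets

# Toy dictionaries (hypothetical entries): Indonesian to English, English to Japanese.
indonesian_to_english = {
    "buku": ["book"],
    "bisa": ["can", "poison"],
}
english_to_japanese = {
    "book": ["本"],
    "can": ["できる", "缶"],
    "poison": ["毒"],
}

book = pivot_translate("buku", indonesian_to_english, english_to_japanese)
bisa = pivot_translate("bisa", indonesian_to_english, english_to_japanese)
print(book, bisa)
```

The second lookup shows why the quality of the pivot matters. The English word "can" is itself ambiguous, so the chain returns 缶 (a tin can) as a candidate for "bisa" even though no sense of the Indonesian word corresponds to it.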
Triangulation methods are based on a similar concept. Languages with more lexical and
translation resources are used to provide the connection for language pairs without direct translation
resources. For example, assume language X and language Y have no direct translation resources. Query
terms in X are translated into two intermediary languages, A and B. The translations in A and B are
then translated into Y. Translations of A are used to retrieve one set of documents, and translations of B
are used to retrieve a second set of documents. The union of the two sets is kept as the final result
(Gollins & Sanderson, 2001).
The use of transitive and triangulation methods makes it possible to include language pairs
without direct translation resources in a CLIR system. Used in combination with other techniques, such
as query structuring (Ballesteros & Sanderson, 2003), they can be effective. But as with all other
methods, there are weaknesses to these methods as well. The approaches depend on the translation
quality and contents of the intermediary languages. Errors could be introduced if no common words
were found in the translation sets or if non-intersecting but essential query words were dropped from
the translation (Gollins & Sanderson, 2001; Ballesteros & Sanderson, 2003).
Other Design Issues
As retrieval technology continues to improve, more attention has been given to interface design
and supporting features. Studies have found that although users still demand improvement in translation
quality, they also find support features helpful. Such features include phrasal recognition and
translation features, assistance in query formulation, the ability to edit or choose translation terms, and
translated summaries of search results (Marlo et al., 2008; Petrelli et al., 2004; Wu, He, & Luo, 2012).
Wu et al. (2012), especially, argue for the importance of understanding the users and making the
functions and interfaces of a CLIR system the central part of design. A sample of research on CLIR and
user interaction is included in the next segment of the literature review.
Summary of CLIR Literature Review
While the review above does not provide comprehensive coverage of CLIR technologies (see
Zhou et al., 2012 for a detailed discussion), it covers the main strands of research and demonstrates the
efforts invested into improving the effectiveness of the systems. Most of the research involves the
development of new technologies, the combination of existing technologies, the construction or use of
resources, and the refinement of systems. The act of retrieval is confined to the process of matching a
query to a set of documents. With the emphasis on systems, users' profiles, needs, actions, and decisions
are treated as fixed. Yet “if we are to design useful machines, we must understand the process(es) by
which those machines will be used” (He, Oard, & Plenttenberg, 2006, p.2). In recent years, system
designers and researchers have begun to see the importance of understanding user behavior in system
design, and research has been conducted to see exactly how users would use CLIR systems and what
factors influence their behaviors.
CLIR and Bilingual Users
As the last section demonstrates, much of the research in CLIR has been focused on the
performance of systems, which is often measured by precision, recall, and translation quality. It cannot be
forgotten, however, that systems are designed for users. “To be effective, an information system has to
be faithful to a real context and in keeping with the use the end-user will make of it” (Petrelli et al.,
2004). To build an efficient CLIR system, it is important to ask: Who are the users? What do they think
about CLIR? How would they use a CLIR system? Do the users come from a homogeneous group, with
the same need for certain technical support? Or are they heterogeneous, with different user groups
requiring different CLIR features?
Knowledge about the users would better guide system design by identifying the features and
supports that users need, whether it is full text translation, query formulation assistance, or better
interface design, and funnel resources to where they are most required. The following paragraphs
review current studies that seek to understand the users and their interaction with CLIR systems. The
first segment establishes that users search for information in different languages for practical
reasons. The second segment summarizes studies on CLIR users' information seeking behaviors and
strategies. A concluding segment summarizes the literature review findings.
The Occasions for CLIR
There are noticeably fewer studies about users and CLIR than about CLIR technology. While each
study has its own focus and research questions, viewed together, the use of different sample groups
in existing research points to the fact that CLIR users are heterogeneous in nature. For example,
Petrelli, Hansen, Beaulieu, Demetriou, and Herring (2004) involved ten participants from four different
professions (journalists, translators, business analysts, and librarians) in the user-centered design
process of a CLIR system to observe what multilingual support they needed. He, Oard, and Plenttenberg
(2006) synthesized three user studies, conducted during the design of a CLIR system, that involved 20
native English-speaking academics as participants. Rieh and Rieh (2005) interviewed English-Korean
bilingual academic users at a Korean university about their professional and personal Web use. Marlo,
Clough, Recuero, and Artiles (2008) recruited 12 computer science postgraduates or researchers of
different nationalities, and observed how they used Google Translate's search function for a prescribed
task. Artiles, Gonzalo, Lopez-Ostenero, and Peinado (2006) observed 22 native Spanish speakers’
attitudes and responses when they were asked to search for images using FlickLing, an image database
with CLIR search modes. Aula and Kellar (2009) recruited ten participants who use at least two
languages to search the Web, and described the decisions involved in the search sessions. Kralisch
(2005) studied the logs of an international health database and surveyed international students based in
Germany and Malaysia using an online questionnaire to study the cultural and linguistic impact on users.
Brazier and Harvey (2017) asked 10 PhD students who are non-native English speakers to search for
government services.
There are also a few studies that stand out for having larger sample sizes. Wu et al. (2011)
collected 358 survey results, Clough and Eleta (2010) obtained 514 questionnaire responses, and
Steichen et al. (2014) surveyed 448 participants. The participants of these three studies were largely
recruited from within academia. Nevertheless, combined with the previously mentioned studies,
they show that potential CLIR users could be from different nations, speak different
languages, come from different fields and professions, have different information needs, encounter
various difficulties during information seeking, and use different strategies to search the World Wide
Web. As varied as the populations and information seeking tasks are, these users share the recognition that
they need to search beyond their native language for information resources that would fulfill an
information need. Sometimes, a foreign language is treated as a default search language for certain
tasks. Research has found non-English users resorting to English as their search language in the belief
that it would yield the most results (Aula & Kellar, 2009; Nzomo, Rubin, & Ajiferuke, 2012; Steichen
et al., 2014) or “find everything” (Artiles et al., 2007, p.10). In some cases, a language may be viewed
as the dominant language for a specific profession or subject field. For example, English is seen as the
dominant language in technology and the sciences (Petrelli et al., 2004; Clough & Eleta, 2010; Artiles et
al., 2007; Rieh & Rieh, 2005; Aula & Kellar, 2009; Steichen, Ghorab, O’Connor, Lawless, & Wade,
2014), German and French were found to be the dominant search languages for
philosophy, and French that for the field of museum studies (Clough & Eleta, 2010).
And yet, users are not rigid in their language use. Polyglots use multiple languages when
browsing or surfing on the Internet. Their language of choice changes depending on the situation and
the nature of the search tasks (Steichen et al., 2014). This language and information source switching
occurs because “...collections searched by the search engines are often region-specific and lack a
comprehensive understanding of the environment in which they operate” (Chung, 2008, p.36).
Although major search engines such as Google, Bing, and Yahoo let users search non-English
information resources, research has found that they do not have adequate coverage of domain- or
region-specific resources, and users' search results suffer as a consequence (Chung, 2008; Aula &
Kellar, 2009). Users, either through experience or presumption, understand the limits of each search
engine and switch language and/or search engine based on the nature and purpose of their
information seeking session (Aula & Kellar, 2009; Clough & Eleta, 2010; Rieh & Rieh, 2005; Hong,
2011; Steichen et al., 2014). For instance, academic researchers have been observed to search databases in
non-native languages to increase recall (Rieh & Rieh, 2005; Wu, He, & Luo, 2012; Clough & Eleta,
2010). Outside of the academic field, users have been found to prefer local languages for cultural, historical,
linguistic, sight-seeing, and geographical information searches, even if they are not native speakers.
People living abroad have been observed to search for information resources, such as local news, in the
local language of their work place or school (Aula & Kellar, 2009; Rieh & Rieh, 2005; Hughes, 2005;
Hong, 2011; Nzomo, Rubin, & Ajiferuke, 2012; Steichen et al., 2014). For example, international
students living in the United States were found to prefer English as their query language and to search on
English-based search engines for information pertaining to daily life (Hong, 2011).
In these cases, users are aware that they need to use different languages to search in different
regional search engines in order to find the information they need. They choose the search engine and
query language based on their knowledge or assumptions of the information resources. These users see
the Web as segmented by languages, and deal with the fact by searching in each necessary segment. For
them, information seeking involves not only recognizing an information need and formulating the
query term, but also making assumptions about the origin of the information resources and
deciding where and in what language to search for them.
A user's preferred language may differ from the language they need to use in an information
searching session if they wish to obtain optimal search results. If a user does not know which regional
search engine or language to search in, they may not be able to find the most relevant information
resource. Although the World Wide Web has largely reduced the geographic boundaries of information
resources and made it possible for users to access information resources from around the world,
language boundaries still exist. How, then, are users searching for information resources in or across
various languages once they recognize the need to do so?
Multilingual Information Seeking Behavior, Language Proficiency and Domain Knowledge
Searching in a non-native language is a more involved process than searching in a native
language. For one, after deciding on a resource and the language to employ, users need to translate the
search terms into the document language before they enter them into the search engine. It is an extra step in
the query formulation process and could be challenging for users who are not proficient in the
document language. Kralisch (2005), for example, hypothesized that using unfamiliar languages to
search requires more cognitive effort. The requisite cognitive effort might lead users to avoid using
information resources in a language they are less versed in, thereby missing out on potentially relevant
information.
When a user persists regardless of the language challenges, deficiency in language skills can
hamper their efforts to come up with a query term. Some users were observed to use creative methods
to counteract this situation. For example, users use advanced features of search engines, such as
preferred language settings, to restrict search results to web pages constructed in their target languages
(Hughes, 2005; Aula & Kellar, 2009). They may use search engines as a language tool to look up words
or the correct phrases in their non-native language (Aula & Kellar, 2009), or they may make use of
online translation tools, such as Google Translate, to translate the query terms into the document
language (Hughes, 2005; Nzomo et al., 2012; Ruiz & Chin, 2010).
Furthermore, the ability to read does not necessarily translate to the ability to actively come up
with a search term. Users who are able to read documents written in a foreign language may not be able
to formulate queries in that language unaided (Clough & Eleta, 2010) and still depend upon the translation
assistance some systems provide (Artiles et al., 2007). Lack of language proficiency not only impedes a
user's query formulation ability, it also erodes users' confidence, hinders information seeking efficiency,
and requires extra effort from users to discern relevant information resources (Rao & Varma, 2010;
Ruiz & Chin, 2010; Marlo et al., 2008; Petrelli et al., 2004; Clough & Eleta, 2010; Artiles et al., 2007;
Srinivasarao, 2010; Nzomo et al., 2012). For example, Petrelli et al. (2004) found that users who are
less familiar with the document language would open the document and read it in order to judge
its content, whereas users who are more fluent in the language can make a similar judgment from the
title alone. The effort involved in searching in an unfamiliar language may be too demanding for some,
and deter them from accessing information resources in non-native languages (Kralisch & Berendt,
2005; Wu, He, & Luo, 2011). Information seekers may rely on online translation tools, such as Google
Translate, to translate a document into a language they are more fluent in, even though they are not
happy with the machine translation result (Wu, He, & Luo, 2011). It is evident that language
proficiency is a factor that influences users' CLIR behaviors and capacity.
The effect of insufficient language skills can sometimes be mitigated by the depth of a user's
domain knowledge. Users who are familiar with a subject are usually more conversant with the technical
terms and jargon used in a field. They are able to come up with query terms in the
document language and peruse the search results more effectively (Kralisch & Berendt, 2005; Kralisch, 2005; He et al.,
2003; Gaspari, 2004). Kralisch (2005) found users with low language proficiency but high domain
knowledge more likely to have a success rate similar to that of native speakers when seeking information
in the subject domain.
From these cited studies, it appears that without sufficient subject domain knowledge or language
proficiency, users are likely to be limited to searching within a language they know, and to overlook
potentially helpful information resources. For these users, search engines that support CLIR features or
CLIR systems might be able to help them bypass the language barrier on the Web. Though there have
been test systems such as Clarity (Petrelli et al., 2004), and test features such as FlickLing (Peinado,
Artiles, Gonzalo, Barker, & Lopez-Ostenero, 2008) that support CLIR on the online photo-sharing
repository Flickr (www.flickr.com), such features are hard to come by in publicly available search
engines. One exception is Google with its Google Translated Search, which was launched in 2007 and
disabled in 2013. The next section reviews studies on how users react to and use CLIR features
and systems.
The Use of Existing CLIR Features and Systems
Google Translated Search was designed to help users of any language proficiency level search
across information resources in the supported languages (a full description of Google Translated Search
is provided in Chapter 1). Marlo et al. (2008) found that users were largely unaware of it; participants
learned about Google Translated Search only after they were enrolled in the study. Most participants
in Marlo et al. (2008) were not able to envision using CLIR features on their own, even though
international users in other studies have indicated that they would appreciate the ability to read
documents or search for information in their native or active languages (Nzomo, Clough, & Dance,
2011; Srinivasarao, 2010). The researchers wondered whether this response was a result of the experimental
setting, and whether users would find uses for the feature in real-life information tasks. No study has
yet confirmed this supposition, and the disabling of Google Translated Search due to
low usage seems to suggest otherwise. It appears that even when users are presented with a CLIR tool
that would streamline the search process by making it possible to search across languages, they do not
immediately incorporate it into their information seeking tool kit.
Summary of Literature Review on CLIR and Bilingual Users
To briefly summarize, existing studies on users' CLIR behaviors are often conducted with smaller
samples of different user groups. Viewing them together, even without the statistical power for
generalization, there are common observations about users' general behaviors. Yet even with data
collected from the different groups of participants, I propose that there are still large groups of bilingual
users, especially ones not within academia, that have not been studied and might show different
information seeking patterns.
The reviewed literature shows that users do indeed search for information across languages and
that it is a rather common practice among bilingual users. There are several ways a user approaches the
task of selecting a language to use when seeking information on the Web:
1. Confined to using certain language(s) because of one’s language skill limitations,
although language skill can sometimes be supplemented by domain knowledge.
2. Recognize the fact that the Web is fragmented by language, and that each fragment
contains information resources of varying quality on different subjects. Identify the
information need and the purpose of the search, choose the information resource
accordingly and use the language of the information resource for searching, even if it is
a second language.
3. Use a language that is most suitable for the subject matter at hand.
A user with sufficient language skill to choose among different information resources
chooses the language based on what he or she thinks would be most effective in retrieving relevant
information. The language choice is not a personal preference, but a decision arrived at after weighing
various constraints.
What would happen if it is no longer necessary to pre-select a language based on what is
available and how well one speaks a language? What language would a person use if they are able to
access information resources in any language they want? How would they choose among the
languages? What are the impact factors? In the conclusion of their study, Wu, He, and Luo (2011) stated
that their most important finding is that:
…many attributes of users can impact users’ needs and expectations with regard to multilingual
information processing in digital libraries… For example, the languages that the academic users
speak and their countries (thus their environments) can significantly change their motivations,
behaviors, and expectations of multilingual information in digital libraries. (p.194)
This dissertation sets out to explore some of these user attributes to expand upon current
understandings of CLIR users' behaviors and preferences. Kralisch (2005) found that using a less
familiar language results in higher cognitive load. This study would observe whether the higher
cognitive load affects users' language preference, and whether there are other impact factors that would sway
users' language preferences for online information resources.
Furthermore, this study would continue the investigations on how language preference shapes the
users' information seeking behavior. Aula and Kellar (2009) showed that users choose search language
based on the quality of search results. This study takes away the differences in search result contents in
order to observe users' reaction and response to language alone.
The design of this proposed study is also different from previous research. Researchers have
studied users' CLIR behavior through the use of CLIR systems and features (e.g., Petrelli et al., 2004;
Marlo et al., 2008), prescribed tasks (Hong, 2011; Artiles et al., 2007), and interviews about users'
Web searching activities (Rieh & Rieh, 2005; Aula & Kellar, 2009). These studies observe a range
of factors that impact user behaviors, from reactions to search tasks (Rieh & Rieh, 2005; Hong, 2011;
Aula & Kellar, 2009) to language proficiency (Aula & Kellar, 2009; Marlo et al., 2008). This study
builds on this research, but with a narrow focus on the act of language selection. Although the
sample size of this study follows the examples of previous studies and is kept small, the diversity of
the participants would add to the rich narrative of users and their language selection during an
information seeking process.
Information Seeking Behavior
Information seeking behavior (ISB) is the other main thread in this research as users’ language
choices are examined in the context of an information seeking session.
ISB has been a major subject of interest in the library and information science field. At first
glance, the act of information seeking appears to be straightforward: a person has a need for
information, looks for the information from different sources, gets a list of results, and is either satisfied
or disappointed. Yet look at it closely, and one would find it to be a complex process that requires
deliberate thought and decision making. Throughout the process, users are often contending with
a myriad of variables that could be psychological, sociological, or cognitive.
The complexity of ISB is demonstrated in Wilson's (1997) attempt to provide a thorough model
that accounts for all aspects of information seeking behavior. Wilson based his model on theories in
the information science field and expanded it outward to include “the study of personality in psychology; the
study of consumer behavior; innovation research; health communication studies; organizational
decision-making; and information requirements in information systems design” (Wilson, 1997, p.551),
and noted the possibility of applying mass media and communication studies to the model as well.
Not all models are so encompassing. The more common research approach to tackle a subject as
intricate as ISB is to focus on one type of the behavior or on a specific aspect of it. The following
literature review provides an overview of the classic ISB models. The purpose of this review is to
establish the importance of examining language preference as an integral part of ISB. The review
shows that ISB is usually viewed as a monolingual activity. Although language is one of the important
factors that influence users' information seeking patterns, its impact on the behavior has not been
studied. This proposed study will fill this knowledge gap.
Information Seeking Behavior Models
In the 1980s and 1990s, several influential models were developed to describe what is involved
in ISB. The models can be categorized into three groups. The first group focuses on the nature of
information need and how it leads to information seeking behavior. Such models include Taylor's
(1967) four stages of information need, Belkin's ASK (Belkin, Oddy, & Brooks, 1982), and Dervin's
(1983) sensemaking theory. The second group describes the information seeking process which
involves the recognition of the information need, the formulation of the query, and the resulting act of
finding potentially relevant information. Well-cited models within this category include Ellis (1989),
Kuhlthau (1991), and Wilson (1996). The third group incorporates the user's interaction with the
information retrieval systems into the model. Examples include Ingwersen's (1996) cognitive
information retrieval model and Saracevic's (1996) information retrieval process model. The following
paragraphs provide an overview of the models in each group.
Models exploring information need. Models in the first group describe how an information
need emerges, what propels a person to act upon it, and what is generally involved in the act. Noted
studies and models in this area include Taylor's (1967) four levels of information need, Belkin's ASK
(Belkin, Oddy, & Brooks, 1982), and Dervin's (1983) sensemaking theory.
Dervin's (1983) sensemaking theory posits that as individuals move through time and space,
they encounter discontinuities, or “gaps”, in their reality. The gaps lead to questions, confusion,
and angst. A person would need to “make sense” of the situation and construct for themselves the uses of
the new sense in order to move on. In this model, information is highly subjective - it is a product of
human observation colored by the observer's existing knowledge and experiences. Information need
arises from a person's desire to resolve the discontinuities he or she experiences in reality.
Belkin, Oddy, and Brooks (1982) view the occurrence of an information need as an anomalous
state of knowledge (ASK). In ASK, information need is the result of a person recognizing an anomaly
in his/her state of knowledge concerning a subject or a situation. Initially, the disruption in the
knowledge store is difficult for the user to specify. Users are often unable to describe the information
they need to resolve the anomaly. Throughout the information seeking process, the information need is
recognized, disambiguated, formulated into a statement, and presented to an information system in
accordance with what the user anticipates the system can deliver. The evolution of the information need is
similar to the four levels of information need identified by Taylor (1967):
Q1 – the actual, but unexpressed need for information (the visceral need);
Q2 – the conscious, within-brain description of the need (the conscious need);
Q3 – the formal statement of the need (the formalized need);
Q4 – the question as presented to the information system (the compromised need). (p.182)
Visceral need is the first level in the question formation process, in which the user senses, consciously
or unconsciously, a need for more information. It is similar to the “non-specifiability of need” (Belkin,
1980) described in ASK. The need changes in “form, quality, concreteness, and criteria as information
is added” (Taylor, 1967, p.182) and arrives at the second level, the conscious need. At the second level,
the person develops a mental description of the need, but it is still ambiguous and not well defined. At
the third level, the formalized need, the person is able to formulate and describe the information need in
concrete terms. At the fourth level, the compromised need, the information need is modified in
anticipation of the potential information resources that the information system in use can deliver.
Sensemaking theory, ASK, and Taylor's four levels of information need exemplify models that
focus on the cognitive development of an information need. Once the hazy sense of something being
amiss has been turned into an expressible statement and question, the user is likely to engage in a series
of actions intended to resolve the situation. The second group of models describes the behaviors
involved in this process.
Information seeking behavior models. Information seeking behavior models distill the steps
that are taken in general information seeking processes to resolve the information need.
Ellis (1989) observed academic social scientists' information seeking patterns and identified six
features between the start and end of the process: starting, chaining, browsing, differentiating,
monitoring, extracting, verifying, and ending. The order of the features is not fixed but changes
according to the user and the circumstances of the information seeking activities. In this view, the
process of information seeking is flexible. It reacts and adapts to specific situations. With the sequence
and combination of the features being changeable, the model accounts for tasks with a fixed beginning
and an end, as well as ongoing monitoring situations. This model focuses on user behaviors; external
variables, such as context and situation, are discussed but do not receive extra attention. It also does
not cover the cognitive or affective aspects of ISB. In contrast, Kuhlthau's (1991) model is
known for its inclusion of the cognitive (thoughts) and affective (feelings) realms alongside the
physical (actions) realm.
Kuhlthau's model “describes common patterns in users' experience in the process of information
seeking for a complex task that has a discrete beginning and ending” (Kuhlthau, 2005, p.230). The
model identifies common stages that users go through: initiation, selection, exploration, formulation,
collection, and presentation. The often recursive and iterative process hinges upon four criteria: the
amount of time the user has, the nature of the task, the amount of personal interest, and the information
resource that is available. The model accounts for the cognitive and affective consequences of the
physical actions. As the users continue, their cognitive states evolve from vague to focused, their
thoughts develop from full of ambiguity to specificity and increased interest, and their feelings change
from confusion and frustration to confidence and satisfaction or disappointment.
From the model, Kuhlthau (2005) proposes the conceptual premise of an uncertainty principle for
library and information services and systems, drawing attention to uncertainty “as a natural, essential
characteristic of information seeking as a sign of the beginning of learning and creativity” (p.233).
Uncertainty is first experienced when users notice the gap in their knowledge or understanding, and
then when users are unable to express clearly the information they seek. Uncertainty often manifests
as frustration and doubt. Another model that factors in the affective realm was developed by Bystrom
and Jarvelin (1995).
Noting the personal factors of attitude, motivation, mood, etc., Bystrom and Jarvelin (1995)
developed their model based on how task complexity impacts user behavior. The researchers identified
five task categories that range from automatic information processing tasks to genuine decision tasks.
At one end are automatic information processing tasks. These are tasks that are structured, with defined
parameters, and have no case-based arbitration. These types of tasks are routine and can be automated.
At the other end of the spectrum are genuine decision tasks. These tasks are unfamiliar, unexpected, less
defined, and unstructured. For genuine decision tasks, neither the process nor the information
requirements can be defined beforehand. Between these two task categories are, in order, normal
information processing tasks, normal decision tasks, and known, genuine decision tasks. From automatic
information processing tasks to genuine decision tasks, each category requires more case-based
arbitration than the category before.
Corresponding to the different task types are two types of information needs: information need in
problem formulation, and information need in problem solving. To satisfy the information needs, there
are three kinds of information: problem information that describes the problem; domain information
that consists of facts and data in the subject area; and problem-solving information that describes how a
problem should be formulated, as well as what and in which manner domain information should be
used to solve the problem.
In Bystrom and Jarvelin's model, the process of information seeking and the type of information
resources sought are decided by the nature of the task, the situational factors, and the user's personal
factors. In the model, the information need arises when a lack of certain knowledge is encountered
during a task performance. The person decides upon an action based on “the needs, the perceived
accessibility (whether cognitive, economic or physical) of information channels and sources and the
personal information seeking style which evolves on the basis of successfulness of attempted actions”
(Bystrom & Jarvelin, 1995, p.8). The researchers found that task complexity has a direct impact on the
complexity of information needed, the needs for domain and problem-solving information, and the
number of sources needed.
Rather than focusing on tasks, Savolainen (1995) provided a framework to study information
seeking that occurs in non-work-related everyday life. In everyday life, people are confronted with way
of life (the order of things) and the need for mastery of life (keeping things in order). In this context,
users may seek information in order to maintain, restore, or construct way of life or mastery of life. As
a result, people often seek information for three reasons: health, consumption, and leisure. As in
Bystrom and Jarvelin's (1995) model, the specific projects (tasks), situational factors, and personal
factors are all important in Savolainen's model. Personal factors, including values, attitudes, material
capital (e.g. financial resources), social capital (e.g. contact networks), cultural and cognitive capital,
and current situation of life, form a person's basic equipment to seek and use information. Savolainen's
(1995) everyday life information seeking framework reveals the complexity of information seeking
behavior, and the influence of each variable (such as income and education level) alone and in
convergence.
The complexity of ISB is demonstrated in Wilson's model (Wilson, 1997; Wilson, 2005), which
includes theoretical bases from various fields. Whereas Kuhlthau (2005) based her model on
uncertainty, Wilson's (1996) model uses stress and coping as its theoretical basis. There are five types
of information needed: new information, information to clarify the information held, information to
confirm the information held, information to clarify beliefs and values held, and information to confirm
beliefs and values held (Wilson, 1996). Information is sought out for
the goals of: orientation (discover what is happening), reorientation, construction (to form an opinion or
solve a problem), and extension (to build knowledge on a subject). A major part of Wilson's model is
the adoption and incorporation of intervening variables from various fields. These
variables, which may pose a barrier to the activating mechanism of ISB, include psychological,
demographic, interpersonal, environmental, and source-related ones. The variables are rooted in health
information, psychology, marketing, and other fields. Through the construction of the model, Wilson
encourages information scientists to explore other disciplines for research ideas. He states that there are
“analytical concepts, models, and theories that need to be absorbed into information science as a matter
of urgency” (Wilson, 1997, p.570). Such is the angle of this study in examining information seeking
behavior from the view of language choice.
Information seeking and retrieval models. The third type of model extends from the user's
information seeking process to include the design and features of an information system. Rather than
focusing on information seeking behavior, these models are developed to place users within the
information retrieval (IR) process.
Information seeking involves a person's explicit effort to locate information. There are many
different types of information seeking behavior that may engage many different types of information
resources such as other people, books, archives, or the Web. Information retrieval is a type of
information seeking behavior and it describes the process of retrieving potentially relevant information
from a digital document collection using an information retrieval (IR) system. The process is initiated
when the user enters a query term into the system, and ends with the system presenting to the user a set of
documents (usually text, but possibly other types of media) ordered by relevance as judged by the
system. The system is usually the critical element and the focus of IR discussions. Models
developed under the tradition of IR show the shift of emphasis from users in the ISB tradition to the
system. Examples of this type of model include Ingwersen's integrative framework (Ingwersen, 1996),
and Saracevic's (1997) stratified model.
Ingwersen's (1996) model is an information seeking and information retrieval model that reflects not
just the cognitive process of the information seeker, but also the technology behind the information
system, including the information space of the information retrieval system, the information retrieving
algorithms, and the interface through which users interact with the system. Alongside the user's cognitive
space and the social and environmental variables, Ingwersen included in the model the nature of the
information objects (such as the way knowledge is represented), the setting of the information retrieval
system (including search language, IR techniques, etc.), and the interface design as important elements
in the information seeking and retrieval process.
Similarly, in Saracevic's (1997) model, user and computer are the two entities that interact
through an interface. Users come to the interaction with an existing knowledge store and an understanding of
the situation and environment in which the task originated. The computer system is equipped with
system-specific resources that make up the engineering and processing levels, and informational
resources that make up the content level. Interaction between user and computer is initiated through the
interface, and ripples through the typology, occurring sequentially in connected levels.
Summary of Literature Review on Information Seeking Behavior Studies
Three types of ISB models are discussed above. The first group focuses on the nature of
information need, the second group describes the process of information seeking, and the third group
incorporates IR systems into the process. Each of the models has its own perspectives and approaches,
and the models differ in complexity, topology, and terminology. All of them, however,
recognize the impact of situational, personal, and other contextual variables on a user's information
seeking and retrieval process. These variables may be implicit as in Ellis' (1989) model, or featured
prominently as in Wilson's (2005). The influence and importance of these variables are undeniable.
Researchers have isolated individual variables to study their impact on users and the ISB process.
These variables include task (Li & Belkin, 2010), context (Kelly, 2006), gender (Hupfer & Detlor,
2006), demography (Gray, Klein, Noyce, Sesselberg, & Cantrill, 2005), and situational factors (Rieh,
2004).
Although no language-related variables were mentioned in the models, language proficiency,
language preference, and other language-related variables may very well be a part of the personal factor
discussed by Ellis (1989), have an impact on the information resource available to the users as discussed
by Kuhlthau (1991), be related to the mastery of life in Savolainen's (1995) everyday life information
seeking, be an element of the psychological and demographic intervening variables in Wilson's model
(1997), and occupy the cognitive space in Ingwersen's interactive model. Multiple studies in the
context of MLIA or CLIR have shown that language use is, indeed, a part of the information seeking
process. Its impact is seen to manifest in many ways, such as resource availability or the user's query
construction capability (Aula & Kellar, 2009; Kralisch & Berendt, 2005). However, these studies and
the ISB models appear to stand apart. MLIA studies seldom reflect ISB findings, and ISB models have
not discussed the impact of bilingualism or multilingualism.
As Wilson (1997) argues, information seeking behavior is an interdisciplinary subject and should
be studied as such. Bilingual users' information seeking is a complicated matter that stretches across many
different fields, including bilingualism, information seeking behavior, and information retrieval. There
are many elements that should be examined in depth in order to understand bilingual users' ISB in full
and how best to support them. This proposed study focuses on users' language choice, and explores the
impact of language-related variables on a bilingual user's information seeking pattern. It takes on but one
small corner of a large and complex issue. However, it is the beginning of a deeper understanding. This
exploratory study does not seek to fit users' language choice into any model, but to understand the
shaping of the choice. Potential frameworks and models are explored but would need to be studied
further in the future for applicability.
Bilingual User's Language Choice and Language Use
Although bilingual users have received little attention from CLIR or information behavior fields,
bilingualism and multilingualism have received extensive attention in fields including linguistics,
cognitive science, neuroscience, psycholinguistics, and education. As varied as the fields that study it,
the subject is examined from various perspectives that range from how to educate a bilingual child
(Baker, 2011), how languages influence and change the speaker's language patterns (Backus, 2005), to
how policy influences language choice (Heller, 1992). The wide breadth of bilingualism and
multilingualism research is exemplified in the four parts of The Handbook of Bilingualism and
Multilingualism (Bhatia & Ritchie, 2013). This study looks at the use of language as a part of the
information seeking process. The emphasis of this review of bilingualism studies is therefore placed on
bilingual speakers' language use, language choice, and preferences.
It has been established that:
…at any given point in time and based on numerous psychosocial and linguistic factors, the
bilingual has to decide, usually quite unconsciously, which language to use and how much of the
other language is needed – from not at all to a lot. (Grosjean, 2008, p.2)
At any given point of time, a language is chosen as the base language for main use, and when needed,
other languages are brought in to use as the guest language in the form of code-switching or borrowing
(Grosjean, 2008). In code-switching, the speaker shifts to the guest language for a word, a sentence, or
more, whereas in borrowing, a phrase or sentence from a guest language is adapted
morphosyntactically to the base language. For this study, it is the complete switch to a different
language (the act of code-switching) and the factors influencing the language selection process that are
of relevance.
This literature review begins by introducing the term code switching and identifying its
applicability to this study. The following segments provide brief overviews and some example studies
of selected subject areas that pertain to the issue of why and how language choices are made. The
focus is on the user's approach to language choice. Potential impact factors are extracted for closer
examination in this study. The mixing of multiple languages in literature, as seen in War and Peace, or in
other media, such as advertisements or news articles, is not included.
Code Switching
Before the discussion of bilingual speakers' language preference and use, it is prudent to
introduce the concept of code switching (also written as codeswitching or code-switching). The term
“code” is used to refer to a human language such as English, or a linguistic style used within a
communication session, such as regional vernaculars (Nilep, 2006). Code switching occurs when
multiple languages are purposely used by one person within one conversation. The alternation of
languages could be within a sentence (intrasentential switching, sometimes referred to as code mixing)
or between sentences (intersentential switching). Observations of bilingual users' information seeking
behavior in the CLIR and MLIR fields show that the alternate use of languages in information seeking
situations is more similar to intersentential switching than to intrasentential switching. In other words,
users are observed to use one language at a time for a search, and words used in one search are
composed in the same language. However, users may conduct several searches each using a different
language within a search session (Peinado, Artiles, Gonzalo, Barker, & Lopez-Ostenero, 2008; Nzomo,
Rubin, & Ajiferuke, 2012; Hong, 2011).
Code switching could occur in writing, but more often than not, the phenomenon is observed
during verbal communication. Code switching, like bilingualism, is studied from many different
perspectives such as the “syntactic or morphosyntactic constraints on language alternation” (Nilep, 2006,
p.1), the effectiveness of using multiple languages in the classroom (Moschkovich, 2007), the
grammatical properties when code switching occurs (MacSwan, 2012), the neurological undertaking of
switching languages (Meuter & Allport, 1999), or the social-cultural relationships between speakers
that trigger language switches (Saville-Troike, 2003). For this study, it is the why that matters.
Code switching can occur either to fill a linguistic/conceptual gap (Greene, Pena, & Bedore,
2012), or for more intricate reasons that are usually contextual to the social situation in which a person
finds himself or herself. Grosjean (1998) listed language proficiency, language mixing habits and attitudes, usual
mode of interaction, kinship relation, socioeconomic status, the nature of the message, and the function of
the language act among the factors that would make a person more susceptible to using multiple
languages at the same time. Many other linguistic studies approach code switching from the
perspectives of pragmatics and sociolinguistics (Androutsopoulos, 2013). For example, many studies
investigating the cause of code switching point to the use of language to establish social status,
self-pride, and prestige (Rezai & Cheitanchian, 2008). The social aspect is not applicable to an online
information seeking act when the user is interacting with a database or a search engine. Lacking the
intrinsic social dimension, the alternate use of language in an online information seeking session cannot
be described as code switching, but should be viewed as a straightforward switch of language.
However, research on why code switching occurs is still included in the literature review for the
insights it offers into a bilingual user's tendency to choose one language over another in a given
situation.
The Impact Factors of Language Choices
Of the many questions asked by bilingualism researchers, when and why one chooses to
speak a language are the most relevant to this study. When people need to communicate or express their
thoughts, how do they choose which language to use? The decision is complex and involves many
linguistic, psychological, and social factors such as a person's language skill, identity and purpose, as
well as the social setting and the situation in which a conversation takes place. This section
reviews research that explores and describes these factors through the lens of models that were
developed to examine the factors that influence a person's language choice. Included are Blom and
Gumperz's (1972) categorization of language switching occurrences, Fishman's (1965) construct of
domain, and Hakuta and D'Andrea's (1992) discussion of language attitude.
Situational switching and metaphorical switching. One often-cited framework for studying
language choice was developed by Blom and Gumperz (1972), who categorized the occurrence of
language alternation into two types: (a) situational switching, and (b) metaphorical switching.
Situational switching happens when a societal consensus has been reached on what languages to use based
on the topics, situation, participants, and location of the conversation – what Scotton and Ury (1977)
call the cluster. In situational switching, the language serves to define the existing context; therefore,
any change in the cluster might trigger a language switch. Metaphorical switching takes place without
changes in the cluster. The speaker uses a language that is outside of the social situation's agreed-upon
languages in order to draw attention, emphasize, or add connotative meanings to parts of the dialog.
Another influential framework is the markedness model of Myers-Scotton (1983), which views the choice
of code as a way for the speaker to evaluate and index the rights-and-obligations sets that exist between
the speaker and addressee. Both frameworks view the use of language as a social function that changes,
sends a message about, or confirms a social construct.
Social interaction is also an important element in Fishman’s model (Fishman, 1965). His
construct of domain describes the social interaction in four aspects: purpose, topic, role-relationships,
and setting.
Fishman's Domain. Not to be confused with “subject domain,” a term often used in IR and library
and information science to denote a topic, domain is a concept that refers to the social-cultural
construct in which a conversation takes place (Saville-Troike, 2008). The construct is abstracted from
the setting, such as church, home, and school; purpose, such as official business and small talk; topic,
such as work and religion; and the role-relationships of the interlocutors, such as priest-parishioner,
mother-child, and teacher-student. Although individually defined, the four factors are often intertwined
and hard to untangle. For example, the topic of a conversation (e.g. sales transaction) can be highly
related to its purpose (e.g. business) which can also determine the setting (e.g. a formal business
interaction in an office). In another example, a boss and his employees may use one language to discuss
work related topics in the office as employer and employee, and another to ask about each other's
family as friends. Therefore, their role-relationship is dependent upon the setting and topic of the
conversation. As with the framework proposed by Blom and Gumperz (1972), domain is a social-
cultural construct. Of the four facets, only topic and purpose are germane to this study.
Purpose. In information seeking and CLIR research, purpose refers to the goal of an information
seeking task (e.g. personal or professional). In bilingualism, purpose refers to the goal of the
conversation. The purpose of a conversation may be to conduct business, to build relationships, to
exchange pleasantries, etc. Sometimes, a language is chosen to assist the progression towards the
purpose. For instance, a sales clerk may use the customer's native tongue to provide better service
during a transaction. One can also view purpose through Saville-Troike's “genre,” which categorizes
talk by why it occurs (e.g. negotiation or war talk) (Saville-Troike, 2008). Sometimes, the choice of
a language may serve a purpose that is independent from the content of the conversation, such as
expressing an affinity with a population, a heritage, or culture. Mills (2001), for example, interviewed
ten mother and child pairs of Eastern Asian heritage residing in England over the span of
two years regarding language use in daily conversations. She found that languages are crucial in
conveying the core values of religion as well as cultural and community affiliation as they are passed
down from parent to child. Language is used to help form the child's sense of ethnic identity. Similar
observations were made on Russian families who immigrated to England (Kasatkina, 2010), and Scottish
families that use both Gaelic and English (Smith-Christmas, 2012). Language choices can even be used
by children of diaspora families as a way to challenge parental authority, and, through it, forge new
relationships and values within the family (Hua, 2008). In these cases, language is used as a medium to
impart a set of values and signify the cultural heritage of the family.
Beyond family, language is also shown to be instrumental in constructing a child's personal and
social identity among peers (Fuller, 2008; Caldas & Caron-Caldas, 2002), and in constructing and
reflecting the social construct to which the speaker belongs (Cashman, 2001). The purpose of language
choice in these situations has more to do with showing an association with an identity or a culture
than with expressing the message that was uttered. Beyond the social-cultural relationship among
interlocutors, language can also be used to indicate a change in topic, to structure a conversation (such
as indicating a side-sequence), to enhance an expression, or to link an element to a specific domain,
experience, or social-cultural setting (Morel, Bucher, Doehler, & Siebenhaar, 2012).
Regardless of the purpose, one can see the difference between how the fields of bilingualism and
information science treat “purpose”. In bilingualism, the purpose of a language choice is in the message
it sends and the social-cultural construct it conveys. For information science researchers, “purpose” is
tied to the anticipated outcome of the information seeking task. Language is chosen as a part of the
user's information seeking strategy. As of yet, there has not been any indication of language carrying
social meanings when used in an information seeking act.
Topic. Topic is about the content of the conversation. It is perhaps the most recognized factor in
CLIR, multilingual information access, and information behavior fields that influences information
seeking behavior. It is often cited as the motivating factor for language choice during an information
seeking task (e.g. Aula & Kellar, 2009; Hong, 2011). Similarly, topic has been accepted by
bilingualism researchers as one of the major determinants of language choice (Becker, 1997; Saville-Troike,
2003). There are times when a person associates a topic specifically with a language regardless of the
setting or language skills. This may occur because it is the language in which the person learned about
a subject, so that their store of necessary vocabulary and terminology is in that language. Even though
they are not fluent in the language, they would choose to use it when the topic comes up (Saville-
Troike, 2003).
As mentioned previously, topic is often enmeshed with the purpose and settings of a
conversation. For example, Soliman (2008) studied the use of Classical and Egyptian Arabic by
Egyptian scholars during a sermon and found that the topic of the lecture is a strong factor in language
choice. During a sermon, Classical Arabic is used for reciting and quoting from religious texts while
Egyptian Arabic is used for lectures and discussions. Furthermore, Classical Arabic conveys
seriousness and formality while Egyptian Arabic expresses warmth and intimacy (setting). Difficult to
separate from social-cultural setting and purpose, topic itself is seldom studied alone as a determinant
for language switching in bilingualism. Some researchers suggest that language choice is not
driven by semantics, but by the interactional function of the conversation (Nilep, 2006), further
diminishing the importance of topic. Even so, topic has been listed as an important factor that
influences information seeking behavior, and is hence treated as such in this study.
Language attitude. Another attribute that has been found to impact language choice more than
proficiency is the speaker's language attitude (Hakuta & D'Andrea, 1992).
Attitude “is a hypothetical construct used to explain the direction and persistence of human
behavior” that is hard to observe and assess (Baker, 1992, p.10). In their study of the maintenance and
loss of Spanish/English bilingualism, Hakuta and D'Andrea (1992) measured a subject's language
attitude through statements such as “It's O.K. if a person grows up speaking Spanish, and later forgets
it”, and “It is possible to learn English well without forgetting Spanish” (p.80). Similarly, Cherciov
(2013) measured a person's language attitude through questions about his/her cultural preference,
feelings of homesickness, and importance attached to a language as medium of contact with friends and
family. Language attitude has been examined for its impact on heritage language maintenance
(Nesteruk, 2010), language acquisition (Lasagabaster, 2015), language attrition (Cherciov, 2012;
Bahrick et al., 1994), and language preference (Hakuta & D'Andrea, 1992). In fact, Hakuta and
D'Andrea (1992) saw that language attitude acts as a predictor of language preference more so than
language proficiency. In their study, language use in several settings is explored: 1. with adults,
2. with siblings, 3. in school, 4. with peers, 5. in digital media, 6. alone, and 7. in church. It is unclear
whether this language choice covers digital information resources. This study explores the
relationship between language attitude and the language choice for digital information resources through
the questionnaire.
The literature thus far cited mostly concerns verbal communication, which has been the focal
point of bilingualism research. In more recent years, the alternating use of first (L1) and other
languages (referred to as L2 for simplification) has also been observed in non-verbal communication, such as
in composition and in digital media.
First and Second Language Uses in Composition
Though language switching in writing is not studied as extensively as in verbal communication, it
has received more attention from the field of foreign language education, albeit in a different light. In this
capacity, language switching refers to the use of L1 in the composing of an L2 text. The thought
process may be conducted in both L1 and L2, but the outcome is always in L2. The situation is
different from the information seeking setting studied in this research in that during the information
seeking process, users can decide which language to use. Furthermore, writing requires the ability to
actively recall the vocabulary and grammatical rules of a language. While information seeking also
requires active vocabulary recall, queries are often very short, and are not required to have grammatical
structure (Jansen & Booth, 2006). As a result, users who have passive language ability and can read but
not write in L2 might be able to successfully complete an information seeking task. Even with the
inherent differences, research on language switching in composition provides insights into users'
language use that may apply to information seeking situations.
Language switching between L1 and L2 during L2 composition is often studied for how the
behavior might assist and influence the writing process. Language switching was found to help with the
thought and strategy development process, and to improve the quality of the L2 text (Qi, 1998;
Woodall, 2002; Li, 2008; Zarei & Amiryousefi, 2011; Qahfarokhi & Biria, 2012). Researchers suggest
that people might switch between L1 and L2 naturally as “an implicit or explicit problem-solving
strategy” (Qi, 1998, p.429) to facilitate their writing, especially for conceptual activities (Zarei &
Amiryousefi, 2011; Ramirez, 2012). Language proficiency and the difficulty of a task are suggested by
several studies to strongly affect the amount of language switching that takes place. The
more difficult the task is and the higher the level of knowledge the task demands, the more language
switching was observed (Qi, 1998; Wang, 2003; Li, 2008; Qahfarokhi & Biria, 2012).
Although this type of language switching has not been studied in the context of information
seeking, it likely occurs. If the use of L1 can facilitate a person's conceptual activities and help
produce better L2 writing, it might also contribute to forming a better query term in L2. When a user
decides to execute a search in L2, they may engage L1 to process an abstract concept and express it in
words.
The possibility of users incorporating language switching in the query term formulation process
has not been explored in CLIR or information seeking. Though this proposed study focuses on how a
user's L2 proficiency and thought process influence the selection of information resource language,
whether L1 is involved in the process is also of interest.
First and Second Language Uses in Computer-Mediated-Communication
Computer-mediated-communication (CMC) is the act of communicating through digital media
such as mailing lists, instant messengers, chat rooms, online discussion forums, or social media sites. It
is an emerging subject within code switching research (Morel, Bucher, Doehler, & Siebenhaar, 2012;
Androutsopoulos, 2006; Sabahat, 2013). CMC is similar to verbal communication except that it is
conducted through written words and computers. The participants are not face-to-face, and the
communication could happen asynchronously. The differences between CMC and verbal
communication are large enough that there are calls to treat CMC in its own right, separate from verbal
or written communications (Androutsopoulos, 2013). Nevertheless, there are similarities in why people
choose to use one language over the other in digital media and in verbal communication.
Researchers have found that, when communicating online, users choose languages based on their
language proficiency, the specific setting (e.g. formal vs. informal), their purpose (e.g. to express
sarcasm, to emphasize a point, or to express identity), the specific topic and subject domain, and the
common understanding of the group (Morel et al., 2012; Siebenhaar, 2006; Sabahat, 2013; Sperlich,
2005). The online media use examined involves interacting with other people and carries
social-cultural subtexts that are not fully applicable to information seeking situations. It is the purpose of this
research to examine these factors' applicability within an information seeking session.
Language Exposure and Language Dominance
Language exposure concerns the amount of contact a person has with a language. It can be
measured by the length of residency in the second language country, or a person's reported use of the
second language in various forms (Krashen, 1982). Language exposure has been found to have
significant impact on a person's language acquisition (Love, Maas, & Swinney, 2003; Bahrick et al.,
1994) and language processing abilities (Morford, 2002). Several studies have found that the earlier a
person is exposed to a language, the better he or she learns the language (Kovelman, Baker, & Petitto,
2008; Mayberry, Lock, & Kazmi, 2002; Morford, 2002). Yet other studies have refuted the significance
of age of exposure, and point to the length and degree of exposure as more influential (Bahrick et al.,
1994; Bedore, Pena, Summers, Boerger, Resendiz, Greene, Bohman, & Gillan, 2012). These studies
find that current language use is a more important predictor of language performance. Higher exposure
to a language also leads to better retention and less language loss over time (Rott, 1999).
Furthermore, it leads to a change in language dominance. Though in most cases, language dominance
and preference are found to be task specific, for general tasks, the language a person is most exposed to
emerges as the one that is more dominant (Bahrick et al., 1994).
As discussed in a previous section, some researchers view language dominance as language
proficiency. For example, in Aparicio and Lavaur's (2014) study of language proficiency and language
processing speed, the researchers use “dominant language” to refer to the language in which a person is
most proficient. I adopt another point of view and see a dominant language as the one that a person
chooses to use over other learned languages. A language may gain dominance due to a person's
language proficiency, but also because of language exposure, the task at hand, and other variables
(Bahrick et al., 1994; Grosjean, 2008; Lim, Liow, Lincoln, Chan, & Onslow, 2008).
Language dominance is observed as a variable in the maintenance and use of heritage language
in immigrant families (Suarez, 2002), in the acquisition of second language in a bilingual community
(Gathercole & Thomas, 2009), and for clinical assessment and intervention (Lim et al., 2008). Yet there
has been no exploration of the effect of a person's dominant language on their online information seeking
language choice. This study aims to fill this knowledge gap.
Summary of Literature Review on Bilingualism
As previous examples demonstrate, aside from studies of the cognitive functions of the brain and of
pedagogical strategies, motivations behind language choice and language use are often studied from
social-cultural perspectives in bilingualism. Most studies focus on communications in which language
is used for people to relate to each other, and to relay ideas. Language choice is often indicative of the
speaker's perspective on his/her social identity and carries cultural significance. Languages are selected
not only as a medium of communication, but often serve as a message in themselves.
During an online information seeking session, language is used to compose a search term, not to
communicate with another person. Language serves, strictly, the role of the messenger. It is the tool
with which a Web user can express an information need. It is mostly used to construct short queries for the
purpose of retrieving relevant information resources. This type of language use and the dominant
factors that impact it have not been discussed much. This study investigates the impact of topic,
purpose, language attitude, language proficiency, and setting on users' language choice in an
information seeking setting.
Conclusions of the Literature Review
Literature from three fields is reviewed in this section: cross language information retrieval
(CLIR), information seeking behavior, and bilingualism. The three fields have different emphases. The
focus of cross language information retrieval is on the development of the technology and the design of
the system. A subgroup of the studies examined users' interaction with and thoughts on cross language
information retrieval systems. The number of studies on CLIR users may be small, but they illustrate
the heterogeneity of the users and the multiple variables that may impact their behaviors. Most CLIR
user studies examined the overall CLIR process, and observed the behavioral pattern as well as
variables that may influence the behavior. Language proficiency and subject domain have been
identified as the most notable impact factors for CLIR users on selecting information resources of
different languages. This proposed study takes an alternative viewpoint from traditional CLIR research
by treating language choice as an outcome instead of an impact factor. Language proficiency and
subject domain would be observed as potential impact factors for users' language choice. This proposed
study would contribute to the knowledge of bilingual users' CLIR behaviors by understanding bilingual
users' language choice for digital information resources through a user study approach.
The field of information seeking behavior emphasizes users' behavioral patterns and the cognitive
processes that initiate and facilitate an information seeking process. The process is influenced by
several external variables. This research examines language as one such variable and proposes that the
examination of the information seeking process should start with the understanding of a user's language
use. Instead of looking at the information seeking process, this research focuses on how users associate
with languages in an information seeking context. Once a user's language selection process is
understood, one can continue to examine the difference between language preference and actual
language use and the cause of this discrepancy.
Language use is a core subject in bilingualism. For bilingualism researchers, language is often
seen as a part of the message that is being imparted. This study sees language as the medium to
compose search terms and retrieve potentially relevant information resources. Within the context of this
study, the social-cultural aspects of language are not an emphasis. At the core of this study is how users
use the languages that they know. However, studies on bilingual users have shown that language choice
is often tied to elements beyond language proficiency, including a person's language exposure and
language dominance. These are factors that have yet to be explored for their impact on a user's
information seeking behavior.
CLIR, information seeking behavior, and bilingualism each contribute to the understanding of an
aspect of multilingual users' information seeking behavior. This research is positioned at the
intersection of these three fields. Instead of focusing on CLIR system and technology, this study
focuses on users. Instead of asking users how they would use an existing system, this study asks what
language they would prefer to use, and argues that system design should be based on users' language
use. Instead of treating language as a message, this study views language as a medium and its
selection as a decision made by the user under the influence of other factors. As Wilson (1997) argues,
information seeking behavior is an interdisciplinary subject. As Fishman (1965) posits, there are many
motivators behind a person's language use. The literature reviewed in this chapter points to
language proficiency, language exposure, subject domain knowledge, language attitude and
information availability as possible impact factors to a person's language choice. The research method
of this study (see later chapter for detail) lets the researcher remove information availability as a
variable. In its place, a user's language profile is studied in closer detail for its bearing on a user's
habitual use of language and the user's language selection process during an information seeking act.
This research examines whether these variables influence a user's choice of language when using digital
information resources, and explores other possible impact factors. By drawing on existing literature from
the three fields, this study aims to establish a better understanding of CLIR behaviors in relation to
language use.
Chapter 4. Research Question and Research Methodology
Research Question
The previous chapter’s review of CLIR development, user research, and what we know of
bilingual users demonstrated that there is a need for more understanding of bilingual speakers and how
they look at digital documents made up of different languages. Existing studies on bilingual users
focused on language proficiency and search task as the impact factors for information seeking behavior.
The present research expands upon them and asks: “What elements within a user’s language profile
influence his/her language choice for digital text documents?”
Hypothesizing that linguistic and psychological factors including the user's language exposure,
attitude, proficiency, preference, and domain of use have an impact on a bilingual user’s language choice,
the following assumptions are examined:
1. A bilingual speaker’s language attitude influences his/her language choice.
2. The length and breadth of language exposure impact a multilingual person’s language
choice.
3. The history of language use impacts a multilingual person’s language choice.
4. A person’s language fluency impacts his/her language choice.
5. The subject matter has an effect on a person's language choice for the information resource.
The assumptions are operationalized by limiting bilingual speakers to Chinese-English bilingual
speakers for whom Chinese is the first language (L1) and English is the second language (L2), and
examining the relationship of the above listed assumptions with their language selection results:
1. Language attitude: The likelihood that a bilingual speaker chooses L2 increases when he/she
indicates a preference for L2.
2. Language exposure: The longer a bilingual speaker is exposed to an L2 environment, the
longer he/she has been actively using the language, and the more likely he/she would choose
L2.
3. History of language use: The longer a bilingual speaker has been using L2, the more likely
he/she would choose the L2 versions.
4. Language proficiency: The more proficient a bilingual speaker is with L2, the more likely
he/she will choose L2 for digital information resources.
5. Subject matter: The less familiar a bilingual speaker is with a subject matter, the more likely
he/she will choose L1 for information regarding that subject.
The goal for this study is not to produce generalized and conclusive tests of the hypotheses, but
to observe trends and see whether the hypotheses warrant further study and verification.
Research Method
Overview
This section provides a brief literature review on research methods to explain the study design.
Detailed description of the measurements and research procedure begins in the next section.
Research method. Many approaches have been used to study a person’s information seeking
behavior. The first type observes users in-situ, and gathers data about users' natural behaviors as they
encounter information seeking tasks in life or work. Such designs make use of diaries (Kuhlthau, 1991;
Kelly, 2006), browser history (Aula & Kellar, 2009), search engine query logs (Keegan &
Cunningham, 2008; Kralisch & Berendt, 2005; Rao & Varma, 2010; Wu, He, & Luo, 2012), or surveys
and interviews about users' real-life experiences (Rieh & Rieh, 2005; Clough & Eleta, 2010; Nzomo,
Rubin, & Ajiferuke, 2012).
The second type conducts user studies in laboratory settings with prescribed tasks in order to
control the context in which the search occurs. The tasks could be designed to simulate information
seeking situations that users encounter in life (Hong, 2011), or designed to expose users to unfamiliar
situations and tools (Petrelli et al., 2004; Ruiz & Chin, 2010; Artiles, Gonzalo, Lopez-Ostenero, &
Peinado, 2007). This research uses a combination of both types of methods: user background and
actual information seeking habits are gathered through a survey; language preference is observed
through a prescribed document selection exercise.
Bilingualism researchers have used various subjective measurements to measure a person's
language skill and dominance, such as pronunciation (Hopp & Schmid, 2011),
tip-of-the-tongue experiences (Ecke & Hall, 2013), and language tests (He, Wang, Oard, & Nossal, 2002).
More often than not, researchers rely on participants' self-reports (Artiles, Gonzalo, Lopez-Ostenero, &
Peinado, 2007; Clough & Eleta, 2010), which have been shown to produce reliable and valid
measurements, and to correlate highly with standardized tests and judged ratings (Lim, Liow, Lincoln,
Chan, & Onslow, 2008). This study relies on participant self-report through online surveys to gather
data on the participant’s language history, use, proficiency, and other profile elements.
However, to counter any occurrences of over- or under-estimation of one’s language skills or
language uses (Ayers, 2010), the present study also gathers behavioral measurements as additional
evidence by observing participants' language choice and article language selection results. The two groups of
data, observations of actual language selection for the survey and for the article selection exercise, and
the verbal or written input from the surveys, are combined to develop what Yin (2009) calls the
“converging lines of inquiry” (p.115) that more fully describe users’ use of and attitude towards a
language.
Population and sample size. There are no established standard profiles for CLIR users, nor is there a
known population from which to draw for a randomized study. Users who search for
information across languages could be of different nationalities, speak different languages, have different
levels of language proficiency, work in different fields, and have different information needs. As a
response to this immensely heterogeneous user population, MLIA and CLIR researchers have largely
adopted purposive sampling or convenience sampling approaches. Using purposive sampling,
“[m]embers of a sample are chosen with a ‘purpose’ to represent a location or type in relation to a key
criterion” (Ritchie, 2003, p.79). Some studies use a small sample size with a purposive sampling
approach to examine a specific segment of potential CLIR users: Hong (2011) interviewed 21
Chinese-speaking graduate students; and Petrelli et al. (2004) conducted field studies with 10 journalists and
librarians. There are also studies that use a convenience sampling approach for first-hand observation of
multilingual users: Aula and Kellar (2009) did reflective interviews with 10 IT professionals working in
foreign countries. Though findings from such studies cannot be applied to all bilingual Web users, they
nonetheless contribute to the understanding of the greater population.
There are larger scale studies that reached out to a wider population. Clough and Eleta (2010)
recruited 514 participants through online invitations for their study on academics’ use of multilingual
digital libraries. Similarly, Wu, He, and Luo (2012) targeted potential digital library users within
academia and recruited 358 participants for their study, also on multilingual digital library usage,
through an email list. Steichen et al. (2014) used an academic mailing list to recruit 385 users for their study
on multilingual user behaviors.
This study is designed as exploratory research. The purpose is not to produce a generalizable
result, but to observe trends and explore possible impact factors; therefore, the selected sample size is
not large. The participants are recruited from deliberately selected sample pools through convenience
sampling, snowball sampling, and online invitations. The participant recruiting process loosely follows
the heterogeneous sampling approach (Ritchie et al., 2003) in that the invitations are sent to potential
participants who reflect the diversity of Chinese-English bilingual speakers. More specifically,
invitations to participate in the survey are sent to Chinese-English speakers living in the US and in
Taiwan in order to gather responses from bilingual speakers with different language proficiency,
language background, and language exposure.
The sample frame for this study is set to Chinese-English bilingual speakers for two reasons: the
study’s goal of examining bilingual users’ language preference and use, and the researcher’s ability to
communicate with participants and accommodate their language preferences during the study. The
participants are recruited from diverse backgrounds in order to ensure the inclusion of different
language profiles. The main group of participants consists of Chinese speakers with English as their second
language who reside in the United States. A second group of Chinese speakers who know English
was recruited from Taiwan as a comparison group for the impact of language environment factors.
Research process. The study procedure includes a pilot study and a larger scale user survey.
The pilot study sessions are conducted through video conferences so that the researcher can directly
interact with and observe participants. The participants are asked to follow the Think Aloud protocol
during the session and then take part in a semi-structured interview (see Appendix IV for the interview
script). The final survey was modified based on pilot study participants' input so that the questions are
clearer and more precise. The research procedure is detailed in a later section.
Measurements
The present study observes four variables that have potential influence on a person's language
choice: attitude, exposure and use, fluency, and the subject matter of the information
content. Data are collected through a language profile survey (Appendix III) and a user study
(Appendix IV) to help evaluate the existence of any observable trends that support the theses
established in the previous section. A detailed description of the relationships between the variables is
given in the Materials section. The impact of the variables can be expressed as 2×2 or n×m contingency
tables. The variables and how they are operationalized are discussed in the following sections.
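As an illustration of how survey responses could be arranged into such a contingency table, the following is a minimal sketch. The field names (`dominant`, `survey_language`) and the five response records are hypothetical and are not data from this study.

```python
from collections import Counter


def contingency_table(records, row_key, col_key):
    """Cross-tabulate two categorical fields into {row: {col: count}}."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


# Hypothetical responses for illustration only (not study data).
responses = [
    {"dominant": "Chinese", "survey_language": "Chinese"},
    {"dominant": "Chinese", "survey_language": "Chinese"},
    {"dominant": "Chinese", "survey_language": "English"},
    {"dominant": "English", "survey_language": "Chinese"},
    {"dominant": "English", "survey_language": "English"},
]

table = contingency_table(responses, "dominant", "survey_language")
for row, cells in table.items():
    print(row, cells)
```

Two binary variables yield a 2×2 table; variables with more categories yield an n×m table from the same function.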
Language Attitude
As defined in Chapter 2, language attitude is the feeling a person has towards languages that are
either known or not known to the person. Language attitude is viewed as a composition of three
components: cognitive, affective, and readiness for action. In this research, language attitude is
measured through participants’ conscious and subconscious actions, namely the languages they choose
for the survey questions (survey language) and for their answers (answer language); their cognitive
reflections on survey questions about dominant language, language preference, and cultural
identification; and the open-ended questions that collect participants’ affective reactions toward languages.
Language Exposure and Experience
In this study, language exposure refers to coming into contact with a language either passively
in one’s surroundings or through active communication. To be exposed to a language does not require
active participation; hearing English being spoken by others and being surrounded by English signage are
forms of exposure. In the survey, language exposure is measured by: (a) how long a participant has
been living in the US, an English-speaking country, and (b) the amount of English exposure in daily
life.
Unlike exposure, daily use of a language requires the employment of language skills such as
comprehension and reading. It involves actively using the language on the person’s part. Participants in
this study are asked how long they have been consistently using English daily as a measurement of how
much experience they have with using English.
Language Proficiency
Language proficiency refers to how well a person can speak, write, read, and comprehend a
language. There are multiple ways to measure a person’s language proficiency. One of the data
collection materials used in this study is a modified LEAP-Q survey (see the next section, Material, for
more discussion). In the survey, participants are asked to self-rate their language proficiency level as:
(a) able to read place names, signs, some words and phrases; (b) able to read simple paragraphs on
familiar subjects; (c) able to read general news articles as well as reports and technical materials on
familiar subjects; (d) able to read all styles and forms of documents related to professional needs; and (e)
equivalent to the language proficiency level of an educated native speaker.
Subject Matter
Subject matter refers to the topic of the article. It is operationalized using the different news
articles included in the article selection exercise. The subjects of the articles include entertainment
(movie and TV series), cultural commentary, interior design, politics, science, and local news.
Putting it Together
This research assumes that each of the variables listed above has potential influence on how
people select the language to use, which in turn might influence their information seeking behavior
(Figure 2).
[Figure 2 diagram: Language Attitude, Language Exposure, Language History, Language Proficiency, and Subject Matter influence Language Choice, which influences Information Seeking Behavior.]
Figure 2. Language Choice Variables and Information Seeking Behavior
Material
Language Proficiency and Internet Usage and Experience Survey
At the start of the session, after the participant has gone through the informed consent, they are
asked to fill out a language proficiency and Internet usage survey (Appendix III) that collects data on
their language skills, language use, language preferences, language exposure and Internet use
frequency. The survey is based on the Language Experience and Proficiency Questionnaire (LEAP-Q)
developed by Marian, Blumenfeld, and Kaushanskaya (2007). LEAP-Q is designed to create a language
profile of a participant through a series of language use impact factors, including self-rated language
proficiency, history, exposure, attitude, as well as amount of and domain of use
(http://www.bilingualism.northwestern.edu/leapq/). LEAP-Q's reliability and internal validity have been
established through a comparison of survey results and standardized language test results (Marian et
al., 2007).
For this research, LEAP-Q is converted from paper to digital form. The questions include
multiple-choice, single-choice, and open-ended questions. The single-choice questions are presented as
dropdown lists, radio buttons, and sliding scales. The multiple-choice questions are presented
as checkboxes. LEAP-Q’s original questions are adjusted to include language use in the
context of online information seeking sessions: the World Wide Web is added as one of the possible
sources of exposure and language use in the questionnaire, and additional questions about Internet use are
added to the end of the survey to collect users' regular Web use behavior and information seeking
habits.
The survey is available in two languages: Chinese and English. The Chinese version is available
in two forms: Simplified Chinese and Traditional Chinese. Chinese script has evolved over
millennia into the Traditional Chinese form, which can be cumbersome and time-consuming to write.
People have come up with different variants of the traditional form to make the writing process easier.
In the 1950s, in order to regularize simplified forms of the Traditional Chinese script, Simplified
Chinese was developed and promoted for common use in the People’s Republic of China (Bokset,
2006). As a result, Traditional Chinese is the standardized form used in Taiwan, and Simplified Chinese
in Mainland China. Though the two scripts are different, the meaning, pronunciation, and syntax of the
words and phrases remain the same. Both scripts are offered to accommodate the reading habits and
preferences of participants.
The users’ selections of survey language and their subsequent use of languages within the session
are recorded and included in the data analysis as indications of users' language preference and language
fluency.
Article Selection Software
The article selection exercise is conducted through a Web application that I created specifically
for this research. The application contains a database of ten news story titles and excerpts in English,
and the same article titles and excerpts in simplified and traditional Chinese. News stories are used in
this exercise because they are written for the general public and cover a wide range of topics.
The articles are collected from BBC and New York Times during the period of April 1st, 2014 to
April 15th, 2014. Both news services have a main English site (www.bbc.com and www.nytimes.com)
and Chinese site (http://www.bbc.com/zhongwen/simp and cn.nytimes.com). The sites contain
independent stories as well as articles of comparable contents. Only articles that are available in both
English and Chinese are selected. Subject matters of the articles selected include: entertainment (movie
and TV series), cultural commentary, interior design, politics, science, and local news. Each story
contains proper nouns, technical jargon, or both: stories such as those about entertainment and
local news contain proper nouns, while others, such as the science articles, contain technical terms.
The article selection exercise begins with brief instructions asking the user to select between
a Chinese and an English version of the same article. The participants are not asked to search for
information, but to react upon the information that is presented to them. They are asked to choose a
language intuitively without giving it much thought. Once started, the participants are presented with
eight articles in English and the Chinese script of their choice. The articles are shown individually, with
the two language versions side by side (see Appendix IV for the application). The articles and the order
in which the language versions are presented are randomized but consistent from session to session so
that each user sees the articles in the same order and presentation.
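One way to obtain an order that is randomized yet identical in every session is to shuffle with a fixed random seed. The sketch below illustrates that idea only; the source does not describe how the application implements it, and the article identifiers, seed value, and side labels are all hypothetical.

```python
import random

# Hypothetical placeholders for the eight articles; not the study's actual items.
ARTICLE_IDS = [f"article_{i}" for i in range(1, 9)]
SEED = 2017  # arbitrary fixed seed (an assumption for this sketch)


def presentation_plan(article_ids, seed):
    """Return a reproducible list of (article, which language is shown on the left)."""
    rng = random.Random(seed)  # private generator, so the result depends only on the seed
    order = list(article_ids)
    rng.shuffle(order)
    return [(article, rng.choice(["english_left", "chinese_left"])) for article in order]


plan_a = presentation_plan(ARTICLE_IDS, SEED)
plan_b = presentation_plan(ARTICLE_IDS, SEED)
print(plan_a == plan_b)  # the same seed gives the same plan in every session
```

Because the generator is seeded once per call, every participant would see the same article order and the same left/right placement.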
Think Aloud Protocol
The pilot study incorporates the Think Aloud protocol (Ericsson and Simon, 1993) in order to gather as
much data as possible while minimizing possible influences on the participant's thought process.
At the beginning of the study, the users are introduced to the protocol and asked to verbalize
their thoughts as if they are talking to themselves while they perform the article selection tasks. They
may describe what is going through their minds or explicate their thought processes, but they are
warned against providing explanations for their behavior. The purpose is to gather thoughts that would
normally occur during the task instead of asking them to be introspective about their actions. Asking the
users to examine their own motives and logic would likely alter their thought process and change their
behavior. During the Think Aloud protocol, the researcher is present but limits interaction with the
participant to prompts to continue verbalizing so as not to affect the user's articulation. The Think
Aloud protocol is only used during the pilot study.
Article Selection Follow-Up Questionnaire
The final part of the research session is a questionnaire consisting of open-ended questions that
ask users to reflect upon the article selection exercise. The participants are shown all the article titles
they have just seen in a list, with the two language versions side by side and the version they selected
in bold font. The article selection results are presented as prompts to help participants retrospectively
answer questions about how they chose the articles and what they thought of them.
Population
The subjects of this study are Chinese-English bilingual, Internet-using adults. The subjects are
able to use both languages in reading, writing, or conversing. Although there are US census data
on languages spoken at home, there are no corresponding data on bilingual speakers who are also
Internet users. With the population unknown and no established user profile, this study recruits
deliberately within designated sample frames that reflect the diversity of the bilingual
population. Five subjects were recruited for the pilot study. The subjects are Chinese-English bilinguals
who reside in the US and use both languages daily.
Two sets of participants were recruited for the general study. The first consists of Chinese speakers
residing in the United States who are able to speak English. The second consists of Chinese speakers
residing in Taiwan who have studied English. The reason for having two groups of participants is to make
certain that diverse language profiles are reflected in the samples. Invitations for the general study were sent to
Chinese-English bilingual speakers residing in the US as well as in Taiwan through email lists, message
boards, and personal email invitations. Participants were also encouraged to forward the invitation to
others who meet the study criteria. The purpose of this approach is to collect enough responses to show
trends and patterns. Data were gathered during the month of March 2017.
Procedures
Pilot study
Invite Chinese-English speakers of different backgrounds who currently reside in the US and are
eighteen years or older to participate in the pilot study. Schedule an individual session with each. Ask for
their preferred video conference format and set up a time for the study session.
At the time of the study, call the participant. Greet him or her in both Chinese and English, and
ask which language they would like the researcher to use in the remainder of the session. This
will be the language in which the instructions, assistance, and interview questions are given. If they
choose Chinese, ask if they prefer traditional or simplified Chinese for the written materials. Continue
the briefing and the rest of the session in the selected language.
Direct the participant to the survey website and commence the briefing. Introduce the
participant to the research, go over the research agenda, and obtain informed consent (Appendix I).
Give the participant opportunities to ask questions and decline participation if they so wish. If they
wish to continue, instruct the participant on the Think Aloud protocol and ask them to verbalize their
thoughts for the rest of the session. Encourage them to ask questions during the session, then ask them
to click on the “Begin” button to start the survey.
Administer the modified LEAP-Q questionnaire (Appendix II). If the participant seems stuck on a
question or shows signs of confusion, ask them what they are thinking. Remind them to think aloud
while answering the questions.
After the survey, let the participant begin the article selection exercise. Observe quietly and only
prompt the participant to express their thoughts if he or she is silent for an extended period of time (10
seconds).
After the article selection exercise, proceed to the follow-up questionnaire. After the participant
finishes the questionnaire, conduct the final semi-structured interview (Appendix VIII Interview Script).
General User Survey
Revise survey questions and instructions based on pilot study participant inputs.
Send personal email invitations to Chinese-English bilingual speakers residing in the United
States. Post invitations to area Chinese schools and social media lists. Include survey URL in the
invitations.
Continue to send out invitations to the survey and collect data for the duration of two months.
Scope and Limitations of the Study
This proposed project examines the language use of Chinese-English bilingual speakers from the
Chinese-English communities that the researcher has contact with, using materials selected and
designed for this proposed research. The focus on users’ reactions to languages places
system development, interface design, and other technical aspects of CLIR out of the scope of this study.
Without a clearly defined bilingual Web user profile, this study, like others, uses non-random
sampling approaches; consequently, the participants cannot be viewed as representative of all bilingual
speakers. Furthermore, participants of this study are self-selected: they are those who responded to the
online or email invitations and volunteered their time to contribute to this research. As a result, the
participants may be more interested in the subject or more aware of their own language use to begin with.
The use of a digital survey accessible online also led to the exclusion of Chinese-English speakers who
are not comfortable using digital information resources, prefer physical documents, or have no easy
Internet access. Because of these concerns and the small sample size, results of this study cannot be
used to describe all bilingual Web users. The study contributes to the literature on MLIA and bilingual
users in a more confined way.
The proposed research is limited by its procedure to exploring how participants select between
languages they know. The materials used in this research are designed by the researcher and tailored to
this study. The materials may not reflect participants’ preferences for digital information resources or
real life information needs, and may not perfectly simulate real life language choice. Furthermore, the
proposed procedure of this study examines users' attitudes and behavior towards receptive and passive
language use. Users need to decode the language when the article is presented to them, but they are not
required to actively recall words or compose queries. Users might demonstrate different behaviors and
have different preferences when they are asked to formulate search terms themselves and need to
engage in productive language use.
In short, the scope of this study is defined by the use of a specific group of bilingual users and
prescribed materials, the involvement of reading skills, and the selection of reading material between
two languages. The small sample size and the sampling method prevent the results from being generalized.
Regardless, the findings would be valuable to CLIR, information seeking behavior, bilingualism, and
other fields that take interest in bilingual users' information seeking behavior and language use.
Chapter 5. Data Analysis
In this chapter, the survey results are presented and analyzed following the direction of
the research questions detailed in Chapter 2. The majority of the statistical analysis in this
study uses nonparametric statistics such as Kruskal-Wallis and Mann-Whitney tests. This is
because many of the variables, such as English proficiency level and daily English exposure, are
non-normally distributed, have outliers, and/or are ordinal variables. Categorical variables are compared
using chi-square tests (Moore, McCabe, & Craig, 2009).
Pilot Study Results
The researcher was able to recruit five participants for the pilot study. The pilot studies were
conducted through timed video conferencing, and audio recorded. The participants were asked to think
aloud during the sessions. After the survey, they were asked to comment on the study process and
survey questions. The sessions ranged from 25 to 45 minutes.
From the pilot study results, instructions were added at the beginning of the survey to let
participants know that they can choose to answer the survey in any language, even if it is a language
different from the one in which they chose to have the survey presented. They are also assured that they
can switch languages between questions. A few survey questions were reworded for clarity. A
redundant question about the article selection process was replaced with one that asks about language
dominance. The article selection exercise was shortened from twenty articles to eight to avoid
participant fatigue. Finally, a special version for Internet Explorer was created due to the browser’s
inability to display certain HTML5 scripts.
General Survey Results
Forty email invitations were sent to Chinese-English speakers currently residing in the US. Five
email invitations were sent to Chinese-English speakers currently residing in Taiwan. The invitation was
further posted to four Chinese community email lists in the US, at three universities with international
student populations (one on the East Coast, one in the South, and one in the Midwest), and at one
university in Taiwan. In total, 184 responses were received, 32 of which were only partially completed,
leaving 152 (82.6%) completed surveys. Three participants reached out to the researcher after their
survey was submitted to convey further thoughts on language and digital resources.
The results are examined in sections, following the structure of the survey: language profile, daily
language use, use of language online, and the article selection exercise. The data are also analyzed
across sections in correspondence to the research questions. The goal is to identify variables that may be
associated with a bilingual user’s language choice for digital information resources.
Language Profile
Basic demographic information. The 152 participants are made up of 116 females (76.3%)
and 36 males (23.7%). They range from 19 years old to 65 years old (M = 35.7, SD = 12.218). The
participants have on average lived, or have been living, in the US for 11.3 years (SD = 9.92): 106 of the
participants (69.7%) have lived or are still living in the United States, and 46 (30.3%) may have visited
but have never lived in the US. Of those who live in the United States, eight (6.9%) only stayed for a
year, 12 (11.3%) have been here for under five years, and 94 (88.7%) have been in the country for six to 42
years. In total, the 116 participants have lived in the United States for an average of 15.58 years (SD =
8.349), with the longest duration being 42 years.
Regarding the participants’ daily environment, the majority (140, 92.1%) of the participants
speak mainly Chinese within their family; only a minority (12, 7.9%) use mainly English in their
family. Outside of the home, 106 participants (69.7%) have spent more than one year in an
English-speaking work/school environment.
In contrast, all of the participants have lived in a Chinese-speaking country at some point in
their lives. On average, participants lived, or have been living, in a Chinese-speaking country for 23.56
years (SD = 9.21). Furthermore, most of them (124, 81.6%) have lived in a Chinese-speaking country
longer than in an English-speaking country, and most have a Chinese-speaking family (140, 92.1%).
The majority of the participants (138, 90.8%) have spent more than one year in a Chinese-speaking
work/school environment.
First, dominant, and preferred languages. Almost all of the participants (151, 99.3%) speak
Chinese as their first language, and one (0.7%) speaks English as his/her first language. Not everyone
views their first language as their dominant language, however (see Chapter 2 section 3 for a detailed
discussion of the definition of dominant language). When the participants were asked to list the
languages that they most intuitively use in life, the majority of them listed Chinese (113 participants,
74.3%), a second group listed English (33 participants, 21.7%), and the rest listed Cantonese (3
participants, 2%) and Taiwanese¹ (3 participants, 2%). The final count shows that 36 participants
(23.7%) have a dominant language that is not their first language.
All of the participants (100%) know a second language (Cantonese, Chinese, English, French,
Japanese, or Taiwanese), and 96 (63.2%) of them listed English as their second most dominant
language. Thirty-five of them (23%) speak a third language (Chinese, English, French, Italian, Japanese,
Korean, Portuguese, Russian, Shanghainese, or Taiwanese), and four of them (2.6%) speak a fourth
language (Japanese or Taiwanese).
Although Cantonese, Shanghainese, and Taiwanese are languages spoken in specific locales
(Hong Kong and Macau, Shanghai, and Taiwan, respectively), each with its own distinct tonality,
pronunciations, grammar, and lexicon, they are used in areas that deploy Mandarin Chinese, or
Putonghua, as the standard language and script (Gong, Chow, & Ahlstrom, 2011). Participants who
listed Cantonese and Taiwanese as their dominant language all listed Mandarin Chinese as their second
most dominant language. These participants are all well versed in Mandarin Chinese, and are counted
as Chinese-English speakers.
¹ Here, Taiwanese refers to Taiwanese Hokkien (臺灣閩南語), a local language brought to Taiwan in the 1600s by emigrants from Southern China. It is spoken by approximately 70% of the population in Taiwan (Tomala, 2016). Taiwanese has its own tone and pronunciations, terms, and syntax.
A chi-square test of independence was conducted to observe the relation between dominant
language and whether the participants have lived in the US. The result was significant, χ²(1, N = 152)
= 13.081, p < 0.001. Participants who live in the US have a higher chance of having English as their
dominant language than participants who live in a non-English speaking country.
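For readers unfamiliar with the test, the following minimal sketch shows how a Pearson chi-square statistic for a 2×2 table can be computed. The cell counts are hypothetical (the source does not report the cell counts behind the result above), and no continuity correction is applied, which is an assumption of this sketch rather than a statement about the analysis in this study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With one degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT the study's data.
# Rows: lived in the US / never lived in the US.
# Columns: English dominant / Chinese dominant.
chi2, p = chi_square_2x2([[30, 70], [5, 45]])
print(f"chi-square(1, N = 150) = {chi2:.3f}, p = {p:.4f}")
```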
A Mann-Whitney U test was conducted to determine whether there was a difference in the
number of years living in the US between participants who are English dominant and participants who
are Chinese dominant. A Mann-Whitney U test was used because the data do not meet the assumptions
for an independent-samples t test, even though the sample is of adequate size: (a) the number of
Chinese dominant participants is about 3.5 times the number of English dominant
participants, (b) the samples failed the normality test and have two outliers, and (c) the two groups have
different distributions (Moore, McCabe, & Craig, 2009). The result indicates that English dominant
participants have lived longer in the US (Mdn = 17) than Chinese dominant participants (Mdn = 10), U =
1101.5, p < 0.001.
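The rank-based logic of the Mann-Whitney U statistic can be sketched as follows. The two lists of years are hypothetical values for illustration, not the study's data, and the sketch computes only the U statistics (with midranks for ties), not the p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    # Pool the samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical years lived in the US (illustrative only).
english_dominant_years = [12, 17, 20, 25, 9]
chinese_dominant_years = [0, 3, 10, 10, 15, 1, 6]

u_english, u_chinese = mann_whitney_u(english_dominant_years, chinese_dominant_years)
print(u_english, u_chinese)
```

Because the test works on ranks rather than raw values, it is not distorted by outliers or unequal group sizes, which matches the rationale given above.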
When the subjects decide to participate in the study, they are first asked to choose between
Chinese and English as a survey language - the language in which the survey instructions and questions
are presented. They are also given the instruction to answer in whichever language they are most
comfortable with. Of the 119 (78.3%) participants who chose Chinese, Taiwanese, or Cantonese as
their dominant language, 98 (82.4%) selected Chinese as their survey language. Out of the 98,
71 (72.4%) used Chinese as their answer language. On the other hand, 12 out of the 33 participants
(36.4%) who chose English as their most dominant language chose English as the survey language, and
27 (81.8%) answered questions in English.
Chi-square tests of independence were performed to examine the relations between dominant
language, survey language, and answer language. The relation between dominant language and survey
language was significant, χ²(1, N = 152) = 5.325, p = 0.021. From the data, participants who view
Chinese as their dominant language are more likely to select Chinese as the survey language. Dominant
language and answer language also have a significant relation, χ²(1, N = 152) = 17.786, p < 0.001.
Participants whose dominant language is Chinese are more likely to use Chinese as the answer
language, and participants whose dominant language is English are more likely to use English as the
answer language. The relation between survey language and answer language was significant as well,
χ²(1, N = 152) = 43.275, p < 0.001. Participants who choose English as their survey language are more
likely to use English as their answer language, and participants who choose Chinese as their survey
language are more likely to use Chinese as their answer language.
Language exposure. The participants are asked to assign a percentage to the amount of time
they are exposed to each of their known languages in their daily lives. Of the participants, 77 (50.7%)
assigned more percentage points to Chinese than to other languages. This includes 42 participants currently
residing in Taiwan. Three participants (2%) ranked Cantonese, and 52 (34.2%) indicated English, as the
language they are most exposed to. The remaining 19 participants indicated that they are exposed to
Chinese and English equally (50% and 50%) on a daily basis.
A Pearson’s correlation was computed to assess the relationship between the number of years living
in the US and the amount of daily English exposure. The result shows a moderate, positive correlation
between the two variables, r(152) = 0.627, p < 0.001. The longer a participant has lived in the US, the
more they are exposed to English on a daily basis (Figure 3).
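As a reminder of what the coefficient measures, the sketch below computes Pearson's r from its definition. The paired values are hypothetical and chosen only to illustrate a positive relationship; they are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (illustrative only).
years_in_us = [0, 1, 3, 5, 10, 15, 20, 30]
english_exposure_pct = [10, 20, 15, 40, 50, 45, 70, 80]

r = pearson_r(years_in_us, english_exposure_pct)
print(f"r = {r:.3f}")
```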
Figure 3. Daily English Exposure in Percentage vs. Number of Years Residing in the US
A Mann-Whitney U test was conducted to determine whether there was a difference in the
amount of daily English exposure between participants who are English dominant and participants who
are Chinese dominant. A Mann-Whitney U test is used because of the difference in group size, the
non-normal and different distributions of the data, and the existence of outliers. The result shows that
English dominant participants have a higher percentage of daily English exposure (Mdn = 80%)
than Chinese dominant participants (Mdn = 35%), U = 361, p < 0.001.
Furthermore, Kruskal-Wallis tests show that although the amount of daily English use is
positively related to the choice of answer language (χ²[4, N = 152] = 24.232, p < 0.001), it has no
statistically significant relation with the choice of survey language (χ²[4, N = 152] = 9.135, p = 0.058).
As Figure 4 shows, the amount of language exposure does not have a clear
relation with the selection of survey language. There is a slight downward trend that
suggests fewer and fewer participants selected Chinese as the survey language as the
percentage of daily English exposure increases, and a slight upward trend for participants who
selected English as their survey language. The trend is not obvious, however.
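The Kruskal-Wallis H statistic, which is compared against a chi-square distribution with k − 1 degrees of freedom for k groups, can be sketched as follows. The grouping and the values are hypothetical; the source does not specify how the groups in the tests above were formed, and the sketch computes only the tie-corrected H, not the p-value.

```python
def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        rank_sum ** 2 / len(group) for rank_sum, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # every observation is identical, so no group differences exist
        return 0.0
    return h / correction


# Hypothetical daily English exposure percentages for three groups (illustrative only).
exposure_by_group = [[10, 20, 25, 30], [40, 45, 50, 60], [70, 80, 85, 90]]
print(f"H = {kruskal_wallis_h(exposure_by_group):.3f}")
```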
Figure 4. Survey Language and Amount of Daily English Exposure
Language preference. Another aspect of the language profile is the participant’s preference to
speak or read in a language if they are given full control of the situation. Participants are asked which
language they would choose to use if they were to communicate with a person who has the same
language abilities. The three Cantonese and three Taiwanese speakers strongly prefer their dominant
language, Cantonese and Taiwanese respectively, over other languages. Of the rest, 109 (71.7%) chose
Chinese, 21 (13.8%) chose English, and 16 (10.5%) indicated an equal chance of choosing English or
Chinese.
A chi-square test of independence was conducted to examine the relation between dominant
language and the preferred spoken language for the Chinese or English dominant participants who
clearly favor one over the other for communication. The relation between these variables was
significant, X2(1, N=152) = 24.2, p<0.001. Chinese dominant participants are more likely to choose
Chinese as the spoken language, and English dominant participants are more likely to choose English
as the spoken language. A Mann-Whitney U test was conducted to determine if there is a difference in
spoken language preferences between English dominant participants and Chinese dominant
participants. The results show that English dominant participants have a higher preference for English
as the spoken language (Mdn = 50%) than Chinese dominant participants (Mdn = 10%), U = 1070.5, p
< 0.001.
For text language preference, participants were asked which language they would choose to have
a document in an unknown language translated into. Chinese was chosen by 93 (61.2%) participants,
followed by English, chosen by 47 (30.9%) participants. Ten of the participants (6.6%) indicated that there are
equal chances (50% and 50%) they would choose either English or Chinese, and two of the Cantonese
speakers (1.3%) chose Cantonese as their preferred text language. For the English or Chinese dominant
users who have a clear preference for the translation language, a chi-square test of independence
was performed to examine the relation between dominant language and preferred text language. The
relation between these variables was significant, X2(1, N=139) = 31.4, p<0.001. A participant is more
likely to choose their dominant language as the translation language. A Mann-Whitney test was
conducted to determine if there is a difference in translation language preferences between English
dominant participants and Chinese dominant participants. The result shows that English dominant
participants have a higher preference for English as the translation language (Mdn = 65%) than Chinese
dominant participants (Mdn = 15%), U = 746, p<0.001.
A Pearson’s correlation coefficient was calculated to estimate the relationship between the
spoken and the text language preferences. The results, r = 0.420, n = 152, p<0.001, indicate a
weak to moderate, positive relation between the two, as illustrated in Figure 5.
Figure 5. Scatter Plot – English as Spoken Language vs. English as Text Language
Kruskal-Wallis tests were performed to see if language proficiency has any influence on the
spoken and reading language preferences. The nonparametric Kruskal-Wallis test was used because the
sample failed the normality assumption for ANOVA tests (Moore, McCabe, & Craig, 2009). The results
show that the likelihood a participant assigns to using English as the spoken language is higher when
he/she has a higher rated English proficiency, X2(4, 152) = 26.215, p<0.001. Similarly, a participant
assigns a higher likelihood to using English as the reading language when he/she has a higher rated English
proficiency, X2(4, 152) = 47.132, p<0.001.
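As an illustration of how a Kruskal-Wallis statistic of this kind is obtained, the following is a minimal sketch in plain Python with the usual tie correction. The grouped values are hypothetical and do not come from the survey; `kruskal_wallis_h` is an illustrative name, not part of the study's tooling.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of independent samples.

    Assumes the pooled data are not all identical; otherwise the tie
    correction would be zero and H is undefined.
    """
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted_sum = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)

    h = 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Hypothetical "chance of choosing English" percentages for three
# proficiency levels (assumption, not study data).
by_proficiency = [[0, 10, 10, 20], [10, 30, 50, 50], [50, 70, 80, 90]]
h_value, df = kruskal_wallis_h(by_proficiency)
print(f"H = {h_value:.3f}, df = {df}")
```

The statistic is compared against a chi-square distribution with the returned degrees of freedom, which is why the results in this chapter are written in the X2(df, N) form.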
Pearson’s correlation was calculated to examine the relationships between the amount of daily
English use (in percentage) and the participant’s preference of spoken and translation languages.
The results indicate a mild, positive relationship between amount of daily English use and
English as the preferred spoken language (r[150] = 0.409, p<0.001), and a stronger, positive
relationship between amount of daily English use and English as the preferred translation language
(r[150] = 0.543, p<0.001).
Culture identification. To explore the impact environmental language exposure has on a
participant’s language attitude, participants were asked to list and rank the cultures that they most
identify with. Most of the participants express the strongest identification with the culture of their
birth country: 66 participants (43.4%) identified most with Taiwanese culture, 42 (27.6%) with Chinese
culture, and 22 (14.5%) with North American culture. A smaller number of participants identify most
with religions: 14 (9.2%) with Christianity and 2 (1.3%) with Taoism. The rest of the participants
identify most with one of the following, each chosen by one participant (0.7%): Asian culture, Chinese
American, Taiwanese American, Japanese culture, and Western culture (a combination of North American
and European).
Daily language use. When it comes to language use, 114 participants (75%) have been using
English daily for five years or more, 10 participants (6.6%) for three to four years, three (2%) for two
to three years, and four (2.6%) for under a year, while 21 participants (13.8%) have not had a chance to
use English daily consistently. In comparison, 146 participants (96.1%) have been using Chinese daily
for five years or more, one (0.7%) for two to three years, and one (0.7%) for under a year, while four
(2.6%) have never used Chinese daily consistently (Figure 6).
Pearson’s correlation was calculated to observe the relationship between the number of years a
participant has lived in the US and the number of years they have been using English daily. The two
variables are moderately related (r[150] = 0.513, p<0.001). It is likely a person who has lived longer in
the US would also have been using English daily longer. It is, however, not a strong correlation, and
one variable is not an indication of the other.
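One way to make the "moderate but not strong" reading concrete is to square the coefficient: r squared is the share of variance the two variables have in common. The sketch below, in plain Python, computes Pearson's r for a small hypothetical sample (not the study's data) and then the shared variance for the coefficient reported above. The helper names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def shared_variance(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r * r


# Hypothetical participants (assumption): years lived in the US and a
# 0-4 code for how long English has been used daily.
years_in_us = [0, 1, 3, 6, 12, 20]
daily_use_code = [0, 2, 1, 4, 3, 4]
print(f"sample r = {pearson_r(years_in_us, daily_use_code):.3f}")

# Coefficient reported in the text for years in the US vs. years of daily use.
reported_r = 0.513
print(f"shared variance = {shared_variance(reported_r):.3f}")
```

With r = 0.513, roughly a quarter of the variance is shared, which is consistent with the statement that one variable is not a reliable indication of the other.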
[Bar chart: Daily Use of Language. Categories: Never, 1-2 Years, 2-3 Years, 3-4 Years, 5 Years or More. Use English Daily: 21, 4, 3, 10, 114. Use Chinese Daily: 4, 1, 1, 0, 146.]
Figure 6. Daily use of language.
Interestingly enough, Pearson’s correlation finds that a participant’s accumulated experience of using
English, expressed as the length of time the person has been using English daily and consistently, is only
mildly correlated with the amount of daily English exposure the person experiences (r[150] = 0.48, p <
0.001).
Figure 7. Amount of Daily English Exposure and the Length of Daily English Use
Note: For English daily use, 0 – Never consistently used English daily, 1 – Have been using English daily for under a year, 2 – for 1-2 years, 3 – 3-4 years, 4 – 5 years and more.
A Mann-Whitney test result shows that participants with English as dominant language are
likely to have used English daily longer (U = 1470.5, p = 0.004).
Pearson correlation was calculated to see if the number of years participants have used English daily
correlates with their preferred spoken and translation languages. There is a moderate, positive
relationship between the duration of daily English use and English as the preferred spoken language
(r[150] = 0.384, p<0.001), and a weaker, positive relationship between duration of daily English use
and English as the preferred translation language (r[150] = 0.259, p<0.001).
The participants were asked about their language use in eight settings: at work, communicating
with family, watching TV, communicating with friends, learning new subjects, listening to the radio,
using the Internet, and reading. They were asked to indicate which language they use more in each of the
settings. The results are summarized in Table 1.
Table 1. Use of Language in Different Situations
Type of Environment   Use More English           Use More Chinese
                      Frequency   Percentage     Frequency   Percentage
Work                  98          64%            49          32%
Family                22          14%            130         86%
TV                    72          47%            59          39%
Friends               38          25%            113         74%
Learn                 108         71%            41          27%
Radio                 73          48%            41          27%
Internet              78          51%            54          36%
Reading               72          47%            75          49%
Note: Bold font highlights the higher percentage.
Language fluency. Participants are asked to rate their ability to read in each language. They are
asked if they can: (1) only understand words and phrases; (2) read simple paragraphs on familiar
subjects; (3) read general news articles as well as reports on familiar subjects; (4) read all styles and
forms on familiar subjects; and (5) read as well as native speakers. Most of the participants (133,
87.5%) rated their English proficiency as sufficient to at least read and comprehend general articles
written in English. Even more participants (147, 96.7%) rated their Chinese proficiency as sufficient
to at least read and comprehend general articles written in Chinese. The breakdown is presented in
Tables 2 and 3.
Table 2. Participant English Reading Proficiency
Proficiency                                            Frequency   Percent   Cumulative Percent
Words and phrases                                      2           1.3%      1.3%
Simple paragraphs                                      17          11.2%     12.5%
General articles                                       57          37.5%     50.0%
All style and forms of writings on familiar subjects   36          23.7%     73.7%
Equivalent to educated native speakers                 40          26.3%     100.0%
Total                                                  152         100.0%
Table 3. Participant’s Chinese Reading Proficiency
Proficiency                                            Frequency   Percent   Cumulative Percent
Words and phrases                                      3           2.0%      2.0%
Simple paragraphs                                      2           1.3%      3.3%
General articles                                       11          7.2%      10.5%
All style and forms of writings on familiar subjects   2           1.3%      11.8%
Equivalent to educated native speakers                 134         88.2%     100.0%
Total                                                  152         100.0%
In total, 105 participants (69%) rated their Chinese reading ability higher than their English
reading ability, 10 (7%) rated their English reading ability higher than their Chinese reading
ability, and 37 (24%) rated their Chinese and English reading abilities as equal.
A chi-square test was performed to examine the relation between a person’s living in an English-speaking
country and his/her English proficiency. The relation between the variables was significant, X2(2,
N=152) = 42.246, p<0.001. Participants who reside in an English-speaking country are more likely to
have a higher self-rated English proficiency level. A Kruskal-Wallis test was performed to examine the
relation between the length of time a person has lived in an English-speaking country and his/her English
proficiency. The test excludes participants who have not lived in an English-speaking country. The result
shows that there is not a significant relation between the two variables, X2(4, N=116) = 9.430, p=0.051.
Participants with higher English proficiency have not necessarily lived longer in an English-speaking
country. A Pearson correlation coefficient was computed to assess the relationship between the number
of years a participant has lived in the US and the participant’s English proficiency. The result shows a
moderate, positive correlation between the two variables, r=0.545, n=152, p<0.001. The scatterplot in
Figure 8 summarizes the result. The longer a participant has lived in the US, the higher his/her
English proficiency is likely to be.
Figure 8. English proficiency and number of years residing in US
Note: English proficiency level is measured by a five-point Likert scale: 1 is the lowest level: able to recognize words and phrases; 5 is the highest level: equivalent to native born speakers.
However, a Kruskal-Wallis test performed on participants with higher English proficiency
shows that, among participants who can at least read and comprehend English articles written for a
general audience, the ones with a higher English proficiency level are likely to have lived in an
English-speaking country longer, X2(2, N=133) = 19.645, p<0.001.
To examine the effect of daily use on English proficiency, a Kruskal-Wallis test was performed.
The result was significant, X2(4, N=152) = 41.234, p<0.001. Participants who have been using
English for longer periods of time are more likely to report a higher English proficiency rating. However,
as with the duration one has spent in an English-speaking country, this influence of daily use is neither
exclusive nor absolute. There are participants who have not been using English daily or have not lived in
English-speaking countries who have high English proficiency ratings.
Pearson’s correlations were calculated to examine the effects of language use and language
exposure on language proficiency. The results show that there is a strong, positive relationship between
the amount of daily English exposure and a person’s English proficiency (r[150] = 0.600, p<0.001),
and a moderate, positive relationship between the duration in which a person has been using English daily
and the person’s English proficiency level (r[150] = 0.495, p<0.001).
A cross examination of the fluency ranking with dominant language choices shows that none of
the 19 participants who rated themselves at the lower English proficiency levels (only know words and
phrases, or can read simple paragraphs) has English as their dominant language. A chi-square test was
administered on the 133 participants with middle to higher English proficiency ratings (can read general
articles, is able to understand all styles and forms of writing on familiar subjects, and equivalent to
educated native speakers) to examine the relation between English fluency and the choice of English as
dominant language. The relation between the variables was significant, X2(2, N=133) = 28.1, p<0.001.
Participants with higher English proficiency are more likely to choose English as their dominant
language.
The result was confirmed by a Mann-Whitney U test conducted to determine whether there was
a difference in the participants’ English proficiency level between English dominant participants and
Chinese dominant participants. The results indicate that English dominant participants have a higher
English proficiency level (Mdn = 5) than Chinese dominant participants (Mdn = 3), U = 711, p <0.001.
However, Chinese dominant participants do not necessarily have lower English proficiencies. A
majority of them (86%) rated their English proficiency as moderate (able to read general articles) or
higher. In comparison, all of the English dominant speakers (100%) rate their English proficiency as
moderate or higher (Figure 9).
Figure 9. English Proficiency Level and Dominant Language
Furthermore, chi-square tests were performed on these participants to examine the relation between
English proficiency level and the selection of English as survey language and answer language.
The relation between English proficiency level and the choice of survey language was significant, X2(2,
N=133) = 18.933, p<0.001. Participants with higher English proficiency are more likely to choose English
as the survey language. The relation between English proficiency level and the use of English as
the answer language was significant, X2(2, N=133) = 22.25, p<0.001. Participants with higher English
fluency level are more likely to use English to answer the survey questions.
Bar charts provide a closer look at the relationships between English proficiency level and survey
and answer languages (Figure 10 and Figure 11). Although the number of participants choosing English
as survey language increases as English proficiency level increases, Chinese is in general
still the preferred survey language. As for answer language, at the two highest English proficiency
levels, there are more participants who prefer English than participants who prefer Chinese.
Figure 10. English Proficiency and Survey Language
Figure 11. English Proficiency and Answer Language
Language Preference in General
The final question about the participant’s language profile asks which language they prefer to
use, and why: 83 (54.6%) of the participants prefer Chinese, 31 (20.4%) prefer English, and 38 (25%)
indicate that they don’t have a preference. Participants with the lowest two English proficiency ratings
prefer non-English languages, including Mandarin Chinese, Cantonese, and Taiwanese. A chi-square
test was performed on the participants with higher English proficiency ratings to examine the relation
between English proficiency and the selection of English as their preferred language. The result was
significant, X2(4, N=133) = 31.176, p<0.001. A participant with higher English proficiency is more
likely to prefer English.
Looking at the data, it appears that participants’ language preferences can differ from their
dominant language (Table 4). Even so, a chi-square test shows that participants who have English as
their dominant language are more likely to prefer English as well, X2(2, N=152) = 40.675, p<0.001.
Table 4. Language Preference and Dominant Language Comparison
                             Language Preference
Dominant Language            Not English   English   No Preference   Total
English       Count          5             19        9               33
              % of Total     3.3%          12.5%     5.9%            21.7%
Not English   Count          78            12        29              119
              % of Total     51.3%         7.9%      19.1%           78.3%
Total         Count          83            31        38              152
              % of Total     54.6%         20.4%     25%             100.0%
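The chi-square statistic reported for Table 4 can be recomputed from the cell counts. The following is a minimal sketch in plain Python of a test of independence on a contingency table; the function name `chi_square_independence` is illustrative, and the p-value lookup is left to statistical software.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Counts from Table 4. Rows: English dominant, not English dominant.
# Columns: prefers not English, prefers English, no preference.
table_4 = [
    [5, 19, 9],
    [78, 12, 29],
]

statistic, df = chi_square_independence(table_4)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

Run on the Table 4 counts, the statistic comes out close to the reported 40.675 with 2 degrees of freedom.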
Kruskal-Wallis tests were conducted to determine whether a participant’s language exposure,
use, and amount of use are related to his/her preference toward English. The first Kruskal-Wallis test
was conducted to evaluate the effect of the number of years a participant has lived in the US. The result
shows that the longer a participant has lived in the US, the more likely the participant is to lean
towards English as their preference.
A Kruskal-Wallis test was performed to determine whether there are differences in the amount
of time a participant has been using English daily and their language preference. The result shows that
the amount of time a participant has been using English daily has statistically significant impact on a
participant’s language preference, X2(2) = 22.336, p < 0.001.
A Kruskal-Wallis test was performed to examine whether there are differences in the percentage
of daily exposure to language between participants who prefer English, Chinese, and have no language
preference. The result shows that there are significant differences among the amount of daily language
exposures as expressed in percentages among participants with different language preferences, X2(2) =
52.828, p < 0.001.
When asked why they prefer one language over the other, participants cited reasons such
as “convenience”, “work environment”, and “Easy to use”. The results are clustered by the participant’s
choice of preferred language: Chinese, English, or Neither. Their statements were reviewed and
categorized. Statements with similar meanings were grouped together for clarity. An example of the coding
process is displayed in Appendix VIII Coding Framework. To ensure coder reliability, the coding
process was carried out twice: after the results were coded for the first time, the researcher reviewed the
raw data after a few days and coded them again without referring back to the first set of results. The
two sets of codes were compared and, besides minor wording changes (“Integrating into culture” to
“Cultural assimilation”), are in accordance, indicating coding reliability. The results are shown in Table
5.
Table 5. Participant’s reasons for preferring one language.
Chinese as preferred language     English as preferred language     No specific preferences
Fluency                           Fluency                           Depends on purpose
Accustomed                        Accustomed                        Depends on context
Better for expressing thoughts    Better for expressing thoughts    Depends on audience
Frequency of use and exposure     Frequency of use and exposure
Familiarity                       Easier to type
Easier to comprehend              Personal preference
Identify with culture             Language skill maintenance
Personal preference               History of use
For participants who prefer one language over another, fluency, the habit of using a language,
the feeling that one language is better for expressing their thoughts, and the amount of use and
exposure to a language are important factors to language choice.
A chi-square analysis was performed to see if cultural identification has any influence on the
participant’s language preference. A list of the cultures that the participants most identify with was
extracted from the dataset. A second list of languages that each culture corresponds with, such as Chinese
for Taiwanese culture and English for North American culture, was composed. Cultures that do not
directly correspond to a language, such as Christianity and Western culture, are omitted. Participants
without a preferred language are also omitted in this analysis. The results show that participants who
identify most with a Chinese-based culture are more likely to prefer Chinese, X2(2, N=99) = 19.130,
p<0.001, although, as summarized in Table 8, more than half (56%) of the participants who identify
with English-based culture still prefer Chinese.
Table 6. Dominant language and the corresponding language to dominant culture
                        Dominant Culture Corresponding Language
Dominant Language       Chinese   English   Total
Chinese    Count        96        10        106
           %            90.6%     9.4%      100.0%
English    Count        15        15        30
           %            50.0%     50.0%     100.0%
Total      Count        111       25        136
           %            81.6%     18.4%     100.0%
Language Use Scenarios
Language uses in daily life. Participants were asked whether they use English more than Chinese, or
vice versa, for the following everyday tasks: at work (Work), watching TV (TV), on the Internet for
personal or recreational purposes (Internet), to communicate with friends (Friends), reading for fun
(Reading), to learn new things (Learn), to communicate with family (Family), and listening to the radio
(Radio). The results are shown in Table 7 and Figure 12. Chinese is more often used in personal
communications with family and friends: 130 of the participants (86%) use Chinese more than English
to communicate with family, and 113 of the participants (74%) do so with friends. There are also
slightly more participants (75, 49%) who read Chinese materials more than English ones. For the rest
of the activities, more participants prefer English over Chinese. The strongest English
preference appears at work (98 participants, 64%), in learning new subjects (108 participants, 71%), and
in Internet use (78 participants, 51%).
Table 7. Daily Activity and Language Use
Activity   Language   No                 Yes                Total
                      Count   Percent    Count   Percent    Count   Percent
Work       English    54      36%        98      64%        152     100%
Work       Chinese    103     68%        49      32%        152     100%
TV         English    80      53%        72      47%        152     100%
TV         Chinese    93      61%        59      39%        152     100%
Friends    English    114     75%        38      25%        152     100%
Friends    Chinese    39      26%        113     74%        152     100%
Reading    English    80      53%        72      47%        152     100%
Reading    Chinese    77      51%        75      49%        152     100%
Family     English    130     86%        22      14%        152     100%
Family     Chinese    22      14%        130     86%        152     100%
Radio      English    79      52%        73      48%        152     100%
Radio      Chinese    111     73%        41      27%        152     100%
Internet   English    74      49%        78      51%        152     100%
Internet   Chinese    134     88%        18      12%        152     100%
Learn      English    44      29%        108     71%        152     100%
Learn      Chinese    111     73%        41      27%        152     100%
[Bar chart: Language Use in Different Settings and for Different Purposes. Categories: Work, TV, Internet, Friends, Reading, Learn, Family, Radio. Series: English, Chinese.]
Figure 12. Language Choice for Different Situations
The research questions asked in chapter 2 are most concerned with Internet use. Statistical tests were
therefore completed to examine the relation between Internet language choice and variables including
length of consistent daily English use (history of language use), language proficiency, and dominant
language choice.
Mann-Whitney tests were conducted to examine the relation between a participant’s using English
more than Chinese on the Internet and the amount of English he/she is exposed to on a daily basis,
as well as his/her history of language use. The results show that neither variable has a statistically
significant impact on the participant’s Internet use language: the amount of daily English exposure
(U = 2824, p = 0.819) and the number of years a participant has been exposed to English daily (U =
2475.5, p = 0.371) do not influence the participant’s Internet language choice. As Figure 13 and
Figure 14 demonstrate, there is no clear pattern between the language exposure variables and the
preference of English as the online language.
Figure 13. Language Use Online and the Number of Years Living in US
Figure 14. Language Use Online and Daily Language Exposure
Similarly, a Mann-Whitney test also shows English proficiency as non-influential to online
language use (U = 2475.5, p = 0.114).
Users’ attitudes toward and perceptions of languages are examined next. A chi-square test performed on
dominant language and the comparison of English and Chinese use shows that the relation between the
two variables was not significant, X2(1, N=152) =1.456, p=0.228. The likelihood of a participant using
English more than Chinese online does not increase or decrease based on their dominant language. It is
in fact quite even between the English dominant and the Chinese dominant participants, as Figure 15.
Domain Language and Online Language Preference Comparison illustrates.
However, there is a significant relation between a participant’s language preference for the Internet
and his/her actual language use online, X2(2, N=152) =8.393, p=0.015. The relation is not entirely
clear cut, however. Figure 16 shows that participants who prefer Chinese in general are more likely to
prefer Chinese online. Participants who do not have a preference between Chinese and English for
general use are more likely to prefer English for Internet use. Participants who prefer English for
general use are evenly split between preferring English online and preferring Chinese online.
Figure 15. Domain Language and Online Language Preference Comparison
Figure 16. Language Preference in General and Online Language Preference Comparison
Language use and the Internet. All the participants reported spending at least an hour on the
Internet every day, and 46 (30.3%) reported spending more than six hours (Figure 17).
[Bar chart: Daily Internet Use. Categories: Less than 1 hour, 1-2 hours, 3-4 hours, 5-6 hours, More than 6 hours. Bar labels as extracted: 0, 22, 57, 27, 47.]
Figure 17. Daily Internet use
Participants were asked about their language use for the following online activities: doing
work-related research, shopping, doing leisure and personal interest research, catching up on news,
and using social media. They were asked to rate how frequently they engage in each of the activities
in Chinese and/or English on a five-point Likert scale: 1- never, 2- rarely, 3- occasionally,
4- frequently, and 5- always.
Every participant uses a mix of both languages at certain points; none of them used one
language exclusively for all of the listed activities. Most of the participants (106, 69.7%) use both
Chinese and English in every activity, although not equally. A smaller group (46, 30.3%) uses one
language for specific occasions, such as using only Chinese on social media, or using only English at
work. Table 8 counts the participants who use each activity-language pair.
Each activity-language pair is used by over 85% of the participants. The pairs with the least
use are online shopping-Chinese (average rating 2.8) and work-related research-Chinese (average
rating 2.98). These two activity-language pairings received the lowest frequency ratings. The other
activity-language pairings are used by 96% to 99% of the participants. Table 9 contains the complete
count of ratings for each activity-language pairing, further elaborating the results presented in Table 8.
Table 8. Participant Online Activity-language Use Summary
Activity                            Language   Number of participants   Percentage
Work Related Research               English    146                      96%
Work Related Research               Chinese    134                      88%
Shop                                English    141                      93%
Shop                                Chinese    130                      86%
Personal or Recreational Research   English    150                      99%
Personal or Recreational Research   Chinese    146                      96%
News                                English    150                      99%
News                                Chinese    149                      98%
Social Media                        English    150                      99%
Social Media                        Chinese    146                      96%
Table 9. Participant Online Activity-language Use
Frequency (percent) of participants giving each rating, 1 to 5
Activity - Language                           1          2          3          4          5          Total        Avg. Rating
Work Related Research - English               6 (4%)     10 (7%)    37 (24%)   40 (26%)   59 (39%)   152 (100%)   3.89
Work Related Research - Chinese               18 (12%)   39 (26%)   34 (22%)   50 (33%)   11 (7%)    152 (100%)   2.98
Shopping - English                            11 (7%)    26 (17%)   27 (18%)   42 (28%)   46 (30%)   152 (100%)   3.57
Shopping - Chinese                            22 (14%)   49 (32%)   36 (24%)   28 (18%)   17 (11%)   152 (100%)   2.80
Personal or Recreational Research - English   2 (1%)     23 (15%)   35 (23%)   59 (39%)   33 (22%)   152 (100%)   3.64
Personal or Recreational Research - Chinese   6 (4%)     18 (12%)   30 (20%)   74 (49%)   24 (16%)   152 (100%)   3.61
News - English                                2 (1%)     24 (16%)   42 (28%)   54 (36%)   30 (20%)   152 (100%)   3.57
News - Chinese                                3 (2%)     15 (10%)   38 (25%)   64 (42%)   32 (21%)   152 (100%)   3.70
Social Media - English                        2 (1%)     20 (13%)   54 (36%)   48 (32%)   28 (18%)   152 (100%)   3.53
Social Media - Chinese                        6 (4%)     13 (9%)    22 (14%)   79 (52%)   32 (21%)   152 (100%)   3.78
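The average ratings in Table 9 and the participant counts in Table 8 both appear to follow from the rating frequencies. The sketch below, in plain Python, recomputes them for four of the pairs. It assumes that a participant counts as using a pair when the rating is above 1 ("never"); that reading is an inference from the numbers and is not stated in the text, and `summarize` is an illustrative name.

```python
# Rating frequencies (ratings 1 through 5) copied from Table 9.
rating_counts = {
    "Work Related Research - English": [6, 10, 37, 40, 59],
    "Work Related Research - Chinese": [18, 39, 34, 50, 11],
    "Shopping - English": [11, 26, 27, 42, 46],
    "Shopping - Chinese": [22, 49, 36, 28, 17],
}


def summarize(counts):
    """Return (average rating, number of users, share of users).

    Assumption: a "user" is anyone whose rating is above 1 (never).
    """
    total = sum(counts)
    weighted = sum(rating * count for rating, count in enumerate(counts, start=1))
    users = total - counts[0]
    return weighted / total, users, users / total


for pair, counts in rating_counts.items():
    average, users, share = summarize(counts)
    print(f"{pair}: average {average:.2f}, used by {users} ({share:.0%})")
```

Under that assumption the recomputed figures agree with the published ones, for example an average of 3.89 and 146 users for work-related research in English.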
Mann-Whitney tests and Kruskal-Wallis tests were carried out to examine the effects of the user’s
dominant language, preferred language, and English proficiency on the language-activity pair ratings.
The relation between one’s dominant language and the frequency of using English for an online
activity is examined using the Mann-Whitney test because of the non-normality and the differences in
distribution of the data. The results, summarized in Table 10, are significant for every activity.
Participants with English as their dominant language use English more frequently than participants
with Chinese as their dominant language for work, shopping, personal research, news, and social media.
Table 10. Dominant Language and the Frequency of Using English for Online Activities. Mann-Whitney Test Result.
                 Work Related   Shopping   Personal/Recreational   News       Social
                 Research                  Research                           Networking
Mann-Whitney U   1258.000       1046.500   954.000                 856.500    943.500
Wilcoxon W       8398.000       8186.500   8094.000                7996.500   8083.500
Z                -3.308         -4.226     -4.715                  -5.151     -4.765
p                0.001          0.000      0.000                   0.000      0.000

Activity                         Dominant Language   N     Mean Rank
Work Related Research            Chinese             119   70.57
                                 English             33    97.88
                                 Total               152
Shopping                         Chinese             119   68.79
                                 English             33    104.29
                                 Total               152
Personal/Recreational Research   Chinese             119   68.02
                                 English             33    107.09
                                 Total               152
News                             Chinese             119   67.2
                                 English             33    110.05
                                 Total               152
Social Networking                Chinese             119   67.93
                                 English             33    107.41
                                 Total               152
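The three kinds of figures in Table 10 are arithmetically linked: the Wilcoxon W is the rank sum of one group, U is that rank sum minus n(n+1)/2, and each mean rank is a rank sum divided by the group size. The sketch below, in plain Python, rebuilds W and the mean ranks from the reported U values and group sizes. Treating the reported U as belonging to the Chinese dominant group is an assumption that the numbers bear out; `rank_summary` is an illustrative name.

```python
N_CHINESE = 119  # Chinese dominant participants
N_ENGLISH = 33   # English dominant participants

# Mann-Whitney U values copied from Table 10.
reported_u = {
    "Work Related Research": 1258.0,
    "Shopping": 1046.5,
    "Personal/Recreational Research": 954.0,
    "News": 856.5,
    "Social Networking": 943.5,
}


def rank_summary(u_first, n_first, n_second):
    """Return (W for first group, mean rank of first group, mean rank of second)."""
    n_total = n_first + n_second
    w_first = u_first + n_first * (n_first + 1) / 2
    total_rank_sum = n_total * (n_total + 1) / 2  # ranks 1..N always sum to this
    w_second = total_rank_sum - w_first
    return w_first, w_first / n_first, w_second / n_second


for activity, u in reported_u.items():
    w, mean_chinese, mean_english = rank_summary(u, N_CHINESE, N_ENGLISH)
    print(f"{activity}: W = {w:.1f}, mean ranks {mean_chinese:.2f} vs {mean_english:.2f}")
```

A higher mean rank for the English dominant group in every row is what the text summarizes as more frequent use of English for each activity.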
The relation between one’s language preference for online text documents (as obtained through
participants’ survey language choices) and the frequency of using English for an online activity is
examined using the Kruskal-Wallis test, grouping the participants into those who prefer English,
those who prefer Chinese, and those without a language preference. The results,
summarized in Table 11, are significant for every activity. Participants who prefer English use
English more frequently than participants who prefer Chinese for work, shopping, personal research,
news, and social media.
Table 11. Preferred Language and the Frequency of Using English for Online Activities

      Work Related   Shopping   Personal/Recreational   News     Social
      Research                  Research                         Networking
X2    33.574         14.765     41.877                  56.859   43.884
df    2              2          2                       2        2
p     0.000          0.001      0.000                   0.000    0.000

Activity (variable)                          Preferred Language   N     Mean Rank
Work Related Research (WrkFrqEng)            Chinese              83    58.77
                                             English              31    103.18
                                             No Preference        38    93.47
                                             Total                152
Shopping (ShopFrqEng)                        Chinese              83    65.15
                                             English              31    97.87
                                             No Preference        38    83.86
                                             Total                152
Personal/Recreational Research (FunFrqEng)   Chinese              83    56.61
                                             English              31    106.34
                                             No Preference        38    95.61
                                             Total                152
News (News-FrqEng)                           Chinese              83    53.22
                                             English              31    111.27
                                             No Preference        38    98.97
                                             Total                152
Social Networking (SMFrqEng)                 Chinese              83    56.28
                                             English              31    108.24
                                             No Preference        38    94.76
                                             Total                152
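To show how the mean ranks in Table 11 relate to the Kruskal-Wallis statistic, the sketch below computes the textbook H value, without any correction for tied ranks, from the group sizes and mean ranks for work-related research. The function name is hypothetical. The uncorrected value comes out somewhat lower than the 33.574 reported in the table; that gap would be consistent with a tie correction having been applied, although the text does not say which variant was used.

```python
def kruskal_wallis_h_from_mean_ranks(groups):
    """Uncorrected Kruskal-Wallis H from (group size, mean rank) pairs.

    H = 12 / (N (N + 1)) * sum(n_i * (mean_rank_i - (N + 1) / 2) ** 2)
    No correction for tied ranks is applied here.
    """
    total_n = sum(n for n, _ in groups)
    grand_mean_rank = (total_n + 1) / 2
    between_groups = sum(
        n * (mean_rank - grand_mean_rank) ** 2 for n, mean_rank in groups
    )
    return 12 / (total_n * (total_n + 1)) * between_groups


# (N, mean rank) for work-related research, copied from Table 11.
work_research = [
    (83, 58.77),   # prefers Chinese
    (31, 103.18),  # prefers English
    (38, 93.47),   # no preference
]

h_work = kruskal_wallis_h_from_mean_ranks(work_research)
print(f"Uncorrected H for work-related research: {h_work:.2f}")
```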
A Kruskal-Wallis test was conducted to see if language proficiency has any impact on language
choice and online activities. The results, summarized in Table 12. Language Choice for Internet
Activity and Language Proficiency, are significant for every activity. Participants with higher English
proficiency use English more frequently for their online activities. Figure 18. Amount of
Personal/Recreational Research Conducted in English Clustered by English Proficiency gives an
example of how personal/recreational research relates to language proficiency. Participants with lower
English proficiency use English to conduct personal or recreational research only rarely, if ever. This
trend is also seen in the other four Internet activities.
Figure 18. Amount of Personal/Recreational Research Conducted in English Clustered by English
Proficiency (1 – lowest, 5 – highest)
Table 12. Language Choice for Internet Activity and Language Proficiency

      Work Research   Shopping   Personal Research   News     Social Networking
X2    54.048          36.232     49.600              56.066   57.074
df    4               4          4                   4        4
p     .000            .000       .000                .000     .000
The correlation between the number of years a participant has lived in the US and his/her
language preference for each of the Internet activities was sought. The activity-language pairs have
mild, positive relationships with the number of years a participant has lived in the US. The results are
summarized in Table 13. Number of Years Living in the US and Conducting Internet Activity in
English.
Table 13. Number of Years Living in the US and Conducting Internet Activity in English

                      Work Research   Shopping    Personal Research   News        Social Networking
                      - English       - English   - English           - English   - English
Pearson Correlation   .490            .576        .434                .481        .410
p                     .000            .000        .000                .000        .000
Likewise, the correlation between a participant’s amount of daily English use and his/her language
preference for each of the Internet activities was examined. The activity-language pairs have stronger,
positive relationships with the amount of daily English exposure. The results are summarized in Table
14. English Daily Exposure (in Percentage) and Conducting Online Activity in English.
Table 14. English Daily Exposure (in Percentage) and Conducting Online Activity in English

                      Work Research   Shopping    Personal Research   News        Social Networking
                      - English       - English   - English           - English   - English
Pearson Correlation   .525            .565        .560                .661        .571
p                     .000            .000        .000                .000        .000
Language use online and why. The final question before the article selection exercise asks
participants how they decide what language to use online. The answers are reviewed, and the Chinese
answers are translated into English. The answers are then categorized into groups and coded. The coding
process follows the coding framework outlined in Appendix VIII. For example, “Depending on what
my search topic is” and “題材” are coded as “Subject matter”. The coding process was repeated
twice to ensure coder reliability. The results are presented in Table 15. Criteria for online language
choice.
Table 15. Criteria for online language choice
The purpose of the activity: whether it is to shop, to catch up on news, for entertainment, or for professional research.
Subject domain of the desired information, such as science, news, or tabloid.
Communication partner: what language would be used by the partner with whom the user is or may be communicating.
The setting: whether it is social networking, communicating with family, or for work.
Language proficiency.
Personal preference.
Habit of use.
Convenience level judged by ease and speed of use.
The language’s ability to express the user’s thoughts clearly and accurately.
The language used by the target website.
The cultural or geographical source of the information.
Information availability: whether the information would be available in English or Chinese.
Language of the computer system interface.
The credibility and authenticity of information as expressed in one language.
Input ability: the participant’s ability to type in a language.
The level of interest in the presented information.
Article Selection Exercise
Article Selection Results
In this section, participants choose the version they prefer from the Chinese and English versions of
the same story. The results are summarized in Table 16.
Table 16. Article Selection Result

Article Title                                         Article Source   English Version   Subject Matter    Chinese     English
                                                      Language         Location                            Selected    Selected
The 'Coming Home' of Zhang Yimou and Gong Li          Chinese          Left              Entertainment     134 (88%)   18 (12%)
Cultural Revolution Nostalgia                         Chinese          Right             History           112 (74%)   40 (26%)
Home/Work | Alice Temperley's British Country House   English          Right             Interior design   61 (40%)    91 (60%)
China's Monroe Doctrine                               Chinese          Left              Political         88 (58%)    64 (42%)
Your 'Game of Thrones' Questions, Answered            English          Left              Entertainment     56 (37%)    96 (63%)
"Pinocchio Rex," China's New Dinosaur                 English          Right             Science           82 (54%)    70 (46%)
China Further Restricts Foreign Dairy Brands          Chinese          Right             Local news        102 (67%)   50 (33%)
Searching for Meaningful Markers of Aging             English          Right             Science           59 (39%)    92 (61%)
The article selection results are examined for the possible impact of the location of the language
versions, the subject matter, and the spoken language of the location in which the story originated
(article source language).
Article presentation order. An independent samples t test was performed comparing the mean
language selection results of articles with the English version presented on the right or on the left. The
article location has no effect on the Chinese version being selected, t (6) = 0.435, p = 0.679, or on the
English version being selected, t (6) = 0.816, p = 0.446. There is no relation between a participant’s
selection and the order in which the language versions are presented.
Subject and story source. The results from the two entertainment stories show no strong
relation between the subject of the article and the participant’s language choice. The first story is about a
Chinese movie by the Chinese director Zhang Yimou, for which 134 of the participants (88%) selected
the Chinese version, and 18 (12%) the English version. The second story is about the American TV
series Game of Thrones, for which 56 participants (37%) selected the Chinese version and 96 (63%)
selected the English version. The same can be said of the science articles: 82 participants (54%) selected
the Chinese version for the first article, and 59 (39%) for the second article. The results appear to be
more closely linked to the language spoken in the geographic origin (source language) of the story, as
demonstrated in Figure 19. Article Selection Result and Information Source Language.
[Bar chart: Article Selection Result by Source Language. X-axis: Source Language (four Chinese-source articles, then four English-source articles). Y-axis: 0 to 160. Series: Number Select Chinese, Number Select English.]
Figure 19. Article Selection Result and Information Source Language.
Note: Articles are reordered here to highlight the possible influence of article source language. Articles with Chinese source are grouped to the left, and articles with English source are grouped to the right.
Independent samples t tests were performed to see if the information source language has any
influence on the article selection outcome. The result shows an effect of the source language on the
number of Chinese versions selected, t (6) = 3.922, p = 0.008. For stories with Chinese origins,
such as the article about Chinese milk powder, the Chinese version was chosen more often than the
English version. The result also shows an effect of the source language on the number of English versions
selected, t (6) = 2.481, p = 0.048. For stories with English origins, such as the article about the British
interior designer, the English version was chosen more often (M = 87.25, SD = 11.7) than the Chinese
version.
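As an illustration of the source language comparison, the following sketch recomputes the pooled-variance (equal variances assumed) t statistic for the number of Chinese versions selected, using the per-article counts in Table 16 grouped by article source language. The function name is hypothetical, and the choice of the pooled-variance form is an assumption suggested by the reported 6 degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Independent samples t statistic with pooled variance, plus its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df


# Number of participants selecting the Chinese version, per article (Table 16).
chinese_source_articles = [134, 112, 88, 102]
english_source_articles = [61, 56, 82, 59]

t_stat, df = pooled_t_statistic(chinese_source_articles, english_source_articles)
print(f"t({df}) = {t_stat:.3f}")
```

With these eight counts the sketch yields a statistic on 6 degrees of freedom that agrees with the value reported above for the Chinese versions.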
Language use, exposure, and cultural identification. The effects of the following variables
on the article selection results are also examined: history of language use, as measured through the
length of time a participant has been speaking English daily; the length of time the participant has lived
in an English-speaking country; and cultural identification.
Pearson’s correlations found a moderate, positive relationship between the number of years a
participant has lived in the US and the number of English version articles he/she prefers,
r(150) = 0.358, p<0.001.
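The significance of a Pearson correlation can be checked from r and its degrees of freedom alone. The sketch below applies the standard conversion t = r * sqrt(df / (1 - r^2)) to r(150) = 0.358; the function name is hypothetical.

```python
import math


def t_from_pearson_r(r, df):
    """Convert a Pearson correlation and its degrees of freedom (N - 2) into a t statistic."""
    return r * math.sqrt(df / (1 - r ** 2))


# Years lived in the US vs. number of English versions selected: r(150) = 0.358.
t_years_in_us = t_from_pearson_r(0.358, 150)
print(f"t(150) = {t_years_in_us:.3f}")
```

The resulting t is well above the value of roughly 3.4 needed for a two-tailed p of 0.001 at this sample size, which is consistent with the reported p<0.001.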
Pearson’s correlations also indicate moderate, positive relationships between the amount of
daily English exposure and the number of English articles selected (Figure 20. Amount of Daily English
Exposure and the Number of English Articles Selected), r(150) = 0.370, p<0.001, and between the
length of time a participant has been using English daily and the number of English articles selected,
r(150) = 0.373, p<0.001. In general, the longer a participant has used English on a daily basis, and the
more he/she uses English every day, the more English version articles he/she is likely to choose.
Figure 20. Amount of Daily English Exposure and the Number of English Articles Selected
However, a Kruskal-Wallis test shows that cultural identification with English-based,
Chinese-based, or religion-based cultures does not have a significant impact on the language versions
participants choose, X2 (2, N = 152) = 2.267, p = 0.322. To test the impact of cultural identification, the
language each culture corresponds to (English for American culture, for example) is used to run the
tests.
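The Kruskal-Wallis statistic is conventionally referred to a chi-square distribution with (number of groups - 1) degrees of freedom. For even degrees of freedom the upper-tail probability has a closed form, which the following sketch uses to check the p value reported above. The function name is hypothetical, and the sketch deliberately covers only even degrees of freedom.

```python
import math


def chi_square_p_even_df(statistic, df):
    """Upper-tail p value of a chi-square statistic for even df.

    For df = 2m: p = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive, even df")
    half = statistic / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


# Cultural identification and article selection: X2(2, N = 152) = 2.267.
p_cultural = chi_square_p_even_df(2.267, 2)
print(f"p = {p_cultural:.3f}")
```

The same function can be applied to the other two- and four-degree-of-freedom results in this chapter, for example the statistics in Table 11 and Table 12.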
Dominant language, language preference and language proficiency. The relations between the
number of English versions chosen and the participant’s English proficiency and dominant language
are examined.
A Mann-Whitney test shows that the participant’s dominant language has a significant impact on
the participant’s article selection outcome, U = 950, p <0.001. Participants who identify English as
their dominant language selected more English version articles in the article selection exercise (Mdn
= 5) than Chinese-dominant participants (Mdn = 3).
A Kruskal-Wallis test shows that one’s English proficiency has a significant influence on the
person’s article selection outcome, X2 (4, n = 152) = 31.561, p <0.001. As Figure 21 and Figure 22
demonstrate, participants who selected a higher number of English version articles are more likely to
have higher English proficiency levels.
Figure 21. Number of English Articles Selected and English Proficiency Level
Figure 22. Average Number of English Version Articles Selected and English Proficiency Level
Language preference. The survey used in this study collects participants’ language preferences
from three sources: the participant’s chosen language to continue the survey (survey language), the
language the participant uses to answer the questions (answer language), and the participant’s answers.
A Mann-Whitney test found a significant relationship between the survey language and the
number of English version articles chosen in the article selection exercise, U = 1003.5, p<0.001.
Participants who chose English for the survey language chose more English versions (Mdn = 5) than
participants who chose Chinese for the survey language (Mdn = 3). A Mann-Whitney test also found a
significant relationship between the answer language and the number of English version articles chosen
in the article selection exercise, U = 1594.5, p<0.001. Participants who use English to answer questions
on average select more English versions (Mdn = 5) than participants who use Chinese to answer
questions (Mdn = 3).
Participants are asked about their language preferences twice in the survey. In the language
profile section, participants are asked which language they prefer in general. The second time is after
the article selection exercise and concerns the language in which participants prefer to read online,
text-based documents. The two language preferences differ in the context in which the languages are
used; however, a chi-square test shows that the two are related, X2(1, N=152) = 30.099, p<0.001.
Participants who prefer to use English in general are more likely to prefer English for the articles.
A Kruskal-Wallis test shows that participants’ language preference for general use has a
significant impact on the number of English articles they choose, X2(2, N=152) = 31.169, p<0.001.
Participants who prefer English choose the most English version articles (Mdn = 6), followed by
those who have no language preference (Mdn = 4) and then by participants who prefer
Chinese (Mdn = 3).
After the article selection exercise, participants are asked about their language preference for
text-based, online documents in general (online language preference). Participant answers are reviewed
and categorized into four groups: English, Chinese, depending on situation, and no preferences. A
chi-square test shows that the online language preference and general language preference are related,
X2(4, N=152) = 51.096, p<0.001. Participants who prefer to use English online are more likely to also
prefer English in general; participants who have no general language preference are more likely to also
have no language preference for online use. On close examination, however, it appears that though the
two types of language preferences are closely related, online language preference does not completely
align with general language preference, as illustrated in Table 17. Slightly fewer participants have no
general language preference (25%) than have no online language preference (30%).
Table 17. A Cross Comparison of General Language Preference and Online Language Preference

General Language                                        Online Language Preference
Preference                                              Chinese   Depends   English   Total
Chinese                   Count                         60        17        6         83
                          % within General Preference   72.3%     20.5%     7.2%      100.0%
                          % within Online Preference    75.9%     37.0%     22.2%     54.6%
                          % of Total                    39.5%     11.2%     3.9%      54.6%
English                   Count                         7         8         16        31
                          % within General Preference   22.6%     25.8%     51.6%     100.0%
                          % within Online Preference    8.9%      17.4%     59.3%     20.4%
                          % of Total                    4.6%      5.3%      10.5%     20.4%
No Preference / Depends   Count                         12        21        5         38
                          % within General Preference   31.6%     55.3%     13.2%     100.0%
                          % within Online Preference    15.2%     45.7%     18.5%     25.0%
                          % of Total                    7.9%      13.8%     3.3%      25.0%
Total                     Count                         79        46        27        152
                          % within General Preference   52.0%     30.3%     17.8%     100.0%
                          % within Online Preference    100.0%    100.0%    100.0%    100.0%
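The chi-square statistic relating general and online language preference can be recomputed directly from the counts in Table 17. The sketch below is a minimal Pearson chi-square test of independence in plain Python; the function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: general preference (Chinese, English, No Preference / Depends).
# Columns: online preference (Chinese, Depends, English). Counts from Table 17.
observed_counts = [
    [60, 17, 6],
    [7, 8, 16],
    [12, 21, 5],
]

chi_square, df = chi_square_independence(observed_counts)
print(f"X2({df}, N={sum(map(sum, observed_counts))}) = {chi_square:.3f}")
```

The recomputed statistic and degrees of freedom agree with the X2(4, N=152) = 51.096 reported in the text.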
Post Article Selection Survey
Difficulty selecting language. After the article selection exercise, participants are asked a series
of open-ended questions that ask them to reflect upon the article selection and decision making process.
The first question asks if it was easy for the participants to choose between the two languages when
they were presented with the excerpts and why. 132 of the participants (86.8%) thought it was easy, 14
(9.2%) thought it was hard, and 6 (3.9%) thought it was neither easy nor hard.
For the open-ended part of the question, after the English answers were translated, the complete
set of answers was reviewed and clustered by whether the participant thinks selecting a language
version of the same article was easy or difficult. The answers were then coded following the coding
framework in Appendix VIII, and are presented in Table 18. The coding process was repeated twice to
ensure coder reliability.
Table 18. Is it easy or hard to choose between different language excerpts?

Easy
  I use my mother tongue.
  I have a strong language preference
  I am more used to one of the languages.
  I am intuitively drawn to one language.
  Depends on my comfort level with the language
  I can read both language
  Depends on the topic of the article.
  Whichever is faster to read.
  Based on effort required to read the article
  When the subject is unfamiliar or there are difficult terms, I use the language that I am more better at.
  The language in which I learned the technical terms.
  Whichever catches my eye.
  Whichever was on the left.
  I read whichever version seems shorter.
  I prefer the original version of the story, not the translation
  Whichever has the easier sentence structure and vocabulary
  Based on the logic, structure, and clarity of the article.
  Depends on the quality of the writing/translation
  Based on which version seems more accurate and/or precise

Difficult
  Both version tell the same story so it is difficult to choose.
  The story is neither Chinese or American culture related.
  I can read both language.
Language preference for news excerpts. The participants were asked if they had a preferred
language in general when they were presented with the news articles. Of the 152 participants, one
participant did not answer all of the questions and is therefore not included in the data analysis for this
section. The majority of the remaining 151 participants have a clear preference between the Chinese and
English versions: 79 (52.3%) participants prefer the Chinese versions, and 26 (17.2%) participants prefer
the English versions. There is also a group of participants who either have no strong preference or whose
selection process involves more than the consideration of the language in use. The result is presented in
Table 19. Language preference for the news article excerpts, and in Figure 23.
Table 19. Language preference for the news article excerpts.

                Frequency   Percent   Cumulative Percent
Chinese         79          52.3%     52.3
English         26          17.2%     69.5
Depends         33          21.9%     91.4
No preference   13          8.6%      100.0
Total           151         100.0%
[Pie chart: Chinese 52%, English 17%, Depends 22%, No preference 9%]
Figure 23. Pie chart - language preference for the news article excerpts.
A chi-square test of independence was performed to examine the relation between a
participant’s dominant language and language preference for the news article excerpts. The relation
between these variables was significant, X2(3, N=151) = 31.336, p<0.001. Participants who view
English as their dominant language are more likely to prefer English for the news article excerpts.
Kruskal-Wallis tests were completed to investigate the possible effects of daily language use
and the duration participants have been using English daily on the article language preferences. The
results are statistically significant. Participants who use English more every day are more likely to
prefer English for online documents, X2(3, N=152) = 34.094, p<0.001. Participants who have been
using English daily for a longer period of time are also more likely to prefer English for online text
documents, X2(3, N=151) = 15.827, p = 0.001.
Finally, the impact of language proficiency is examined. A Kruskal-Wallis test found that
English proficiency level and language preferences for Internet use are statistically related, X2(4,
N=152) = 31.561, p<0.001. A closer look at the two variables using the pie chart in Figure 24 shows
that the impact of language proficiency is not evident until the highest proficiency level.
Figure 24. Pie Chart - Preferred Language and English Proficiency
Note: English proficiency level is measured by a five-point Likert scale: 1 is the lowest level: able to recognize words and phrases; 5 is the highest level: equivalent to native born speakers.
The participants provided their thoughts on language preferences in the open-ended portion of
the question. Their answers were reviewed, clustered, and coded following the coding framework
outlined in Appendix VIII. The coding procedure was completed twice to ensure coder reliability. The
result is shown in Table 20. Language preference reasons.
Table 20. Language preference reasons.

English                                                 Chinese                                    Depends
Higher language skill/fluency                           Higher language skill/fluency              Original, not translated, version is better
Faster to read                                          Faster to read                             Quality of the writing
Higher amount of use and exposure                       Mother tongue                              Whichever is easier to "absorb"
Exposure and familiarity to the subject in English      Higher reading comprehension               The original language for proper nouns and technical terms
Accustomed to the use of English                        Higher comfort level                       Subject and content
Better understanding of unfamiliar terms and concepts   Use it for unfamiliar terms and concepts   On screen location of the article (left or right)
To avoid translation error                              Less mental effort                         The geographical location of the story
                                                                                                   Convenience.
                                                                                                   Difficulty of and/or familiarity with the subject.
                                                                                                   Who the information will be shared with
“Which language draws you first.” The participants are asked if they are drawn to one
language more than the other when the articles are presented to them, and if so, for what reasons.
The participants’ answers were reviewed, clustered and coded following the coding framework
presented in Appendix VIII. The coding process was repeated once after the initial effort to ensure
coder reliability. The result is presented in Table 21.
Table 21. Why one language appeals to you first.

Instinct-Chinese                                                     Instinct-English                                                 Neither
Faster to read                                                       Faster to read                                                   Scan the title and make judgement
More frequently used                                                 More frequently used                                             Look for jargons
Used to using it                                                     Used to using it                                                 Judge by subject matter
Lives in a Chinese speaking environment                              Lives in an English-speaking environment for a long time         Read the one on the left first
Mother tongue                                                        To improve English reading skill                                 Article difficulty
More proficient in Chinese                                           More proficient in English                                       Subject
More effortless to use                                               Easier to grasp meaning                                          Font size
Spots Chinese proper name                                            Spots English proper name                                        Knowledge and familiarity of a subject was acquired in English
Leads to better reading comprehension                                Knowledge and familiarity of a subject was acquired in English
Scan the Chinese version title for clue on which version to peruse
The Chinese version appears shorter
Additional Thoughts
Finally, the participants were asked to share any additional thoughts they have regarding the
two languages and their uses. Their responses are reviewed and excerpted below:
I was unaware that I actually prefer English articles over Chinese.
The language in which one learns a new subject heavily influences one’s language preference when encountering the subject in the future.
I lean towards using English because I need to improve my English proficiency.
I use Chinese to help me learn English.
If I need to memorize something, I use Chinese.
Language selection is about social setting and interaction. But if the other person I am communicating with is someone who can speak both Chinese and English, I would use my dominant language.
It is harder for me to type in Chinese so I will use English on the Internet.
Culture is an important part of language use.
Some concepts can be better expressed in one language.
Different languages sometimes embed different viewpoints.
Frequently, the original writing is better than the translated version. Translations are sometimes awkward and inaccurate.
Mother tongue always dominates.
Language is a tool. When to use it and which one to use depends on the situation and the language’s ease of use.
Pronouns and names are better in their original language.
There is different beauty in different languages.
The longer one lives in the US, the higher the possibility of preferring English.
My Chinese has degraded but my English is not perfect. I am stuck in between.

In the next chapter, the results of the data analysis are used to answer the research question.
Chapter 6. Discussion
In this chapter, the data presented and analyzed in the previous chapter are examined and
discussed in reply to the research questions introduced in chapter 2. Before the analysis is a brief
review of the research question and method. The rest of the chapter is structured to address one
research question at a time, followed by additional thoughts and insights from the data.
Research Question and Method Review
This research is built upon the fields of cross language information retrieval (CLIR),
information seeking behavior, and bilingualism. Reviewing existing CLIR and multilingual user
information seeking studies, it is evident that although efforts are made to understand bilingual
information users, there is still much we do not know. Most of the previous research on bilingual users
draws participants from academia and relates to the use of current information retrieval systems or
CLIR systems in testing. These studies found that language choice for online information seeking is
mostly influenced by language proficiency; the nature of the search task, such as doing academic
research or looking up local transportation information; and subject matter, such as history or science.
While enlightening, it bears pointing out that the existing literature is rooted in current systems
and available information resources. Most of the studies center on the information seeking behavior of
bilingual or multilingual speakers; other impact factors recognized in bilingualism, namely the purpose
of the language use, language attitude, and the amount of language use, have not been examined in the
information seeking context. This dissertation examines how information seekers interact with
language and digital information resources and asks the question: “What elements within a user’s
language profile influence his/her language choice for digital text documents?” Five variables are
observed for potential impact on the user’s language choice, and five assumptions are made:
1. Language attitude: The likelihood of a bilingual speaker choosing L2 increases when he/she
indicates a preference for L2.
2. Language exposure: The longer a bilingual speaker is exposed to an L2 environment, and the
longer he/she has been actively using the language, the more likely he/she would choose
L2.
3. History of language use: The longer a bilingual speaker has been using L2, the more likely
he/she would choose the L2 versions.
4. Language proficiency: The more proficient a bilingual speaker is with L2, the more likely
he/she will choose L2 for digital information resources.
5. Subject matter: The less familiar a bilingual speaker is with a subject matter, the more likely
he/she will choose L1 for information regarding that subject.
Data were collected using a modified LEAP-Q survey developed at Northwestern University to
build a language profile for the participant, and an article selection tool developed by the current
researcher to observe the participant’s reaction to different languages. Unlike other studies, the current
research did not ask users to perform information seeking tasks. The article selection exercise is used to
stimulate users’ thoughts and observe the results. Participants’ language choice is obtained through
behavior: Which language was chosen as dominant language? Which was used for the survey? What
language did the participants type their answers in? How many Chinese vs. English versions of the
articles did the participants select?
In total, 152 complete surveys were collected, and the data were analyzed in the previous
chapter. The results are discussed below to examine the effect of the five potential impact factors.
Language Attitude
The first assumption states that the likelihood of a bilingual speaker choosing L2 increases when
he/she has a favorable attitude towards L2.
Result
The results of the survey indicate strong relations between the number of English versions of
the articles selected and a participant’s (a) survey language choice, (b) answer language choice, (c)
general language preference, (d) language preference for text-based, online documents, and (e)
dominant language. Participants who selected more English version articles than Chinese version
articles are found to be more likely to have English as their dominant language, to prefer to use English
on a daily basis and online, to have chosen English as their survey language and/or answer language, and
to have indicated a preference for English for online text documents.
There was no significant relation between cultural identification and the article selection results.
The language spoken within a participant’s most identified culture has no impact on the article
selection outcome.
Discussions and Implications
As the previous paragraph states, the survey result finds language attitude, specifically
dominant language and language preference, and language choice to be related. All other conditions
being equal, when a person prefers a language, the person is more likely to choose content written in
that language. However, the relation is not absolute. Before we continue, let us first examine the impact
of and interplay among dominant language and the various iterations of language preference.
Dominant language. Dominant language is defined in this study as the language that a person
intuitively uses first in his/her daily life. Although highly influenced by one’s first language, dominant
language can be a language acquired later in life. In fact, 25% of the participants in this study have a
dominant language that is not their first language. Dominant language is often the language that a
person is more exposed to on a daily basis. This could be a result of the person’s living environment,
such as living in an English-speaking country, or for work, such as working as a translator. The longer a
person is exposed to the language daily and consistently, the more likely the person would adopt it as
the dominant language.
While the dominant language is usually the language a person is most fluent in, it becomes
more unpredictable when the person is equally fluent in more than one language. In these cases, other
factors become more influential. Factors beyond language fluency include: length and amount of
language exposure, frequency and amount of language use, and language preferences (discussed in
detail in the next section). This study has found several observable trends relating the above listed
factors to participants’ language use behavior. For example, participants with English as dominant
language are more likely to have lived in an English-speaking country longer and to have higher English
proficiency. In the article selection exercise, we see that participants with English as dominant language
are more likely to choose more English articles than participants with Chinese as dominant language. In
general, we see that when a participant is English dominant, he/she tends to have been exposed to
English more, use English more, and prefer English in different situations. The only exceptions occur
with online activities. Data analysis in the previous chapter showed that having English as dominant
language or having more English exposure does not increase or decrease a participant’s likelihood of
using English more than using Chinese online. This could imply a conscious choice of language when
faced with the use of an online information resource. Other research showed that information seekers
often favor English as the search language (Artiles, et al., 2006; Steichen, et al., 2014). English might
also be viewed by the current study’s participants as the overall dominant language online, with the
most information resources available, leading many of them to intentionally choose English despite their
attitude towards or history with English.
Language preferences. Another way to examine the relationship between a person and
languages is to view it through language preferences. Steichen et al. (2014) found personal preference
to be a major impact factor to language use, but did not elaborate on the nature of personal preference. I
define language preference as a more conscious language choice whereas dominant language is an
intuitive one. A preferred language is one that a person likes to use better. It may not be a language that
is easy for the person, but he or she would deliberately choose to use it when the situation allows. As
with dominant language, language preferences are related to many different variables, including
language proficiency, dominant language, cultural identification, and finally, the number of English
version articles a participant chooses. The relations between language preferences and these variables are
neither causal nor absolute, but they point to trends and possibilities. For example, a bilingual speaker with
English as dominant language is more likely to prefer English; a person who identifies more with
English-speaking cultures is more likely to prefer to use English. English preference, at least in the
written form, is moderately related to participants’ language choices for online activities (Table 11.
Preferred Language and the Frequency of Using English for Online Activities). For this study, we also
see that participants who prefer to use English online are more likely to choose more English version
articles.
Another difference between dominant language and preferred language is that language
preference is a multi-layered construct. In this study, six language preferences were identified and
examined: spoken language, written language, a passive language preference for reading the survey
(survey language), an active language preference for answering the questions (answer language), a
language preference for daily language use (general language preference), and a language preference
for online activities (online language preference).
Although the different preferences may be related to each other, they are each distinct and not
always strongly correlated. For example, although the chi-square test performed in the previous chapter
found survey language and answer language to be related to each other, they do not correlate completely.
As a matter of fact, over half (56%) of the participants who chose Chinese as the survey language used
English to answer the questions. The difference between the survey language, used for reading text, and
the answer language, used for writing answers, is worth contemplating. Because reading is a passive
language skill and writing an active one (Baker, 1998), reading requires less cognitive load and less mastery
of active vocabulary. In other words, it is easier for someone to read a document written in a second
language than to write in it. Yet many native Chinese speakers who preferred Chinese as the
survey language used English to answer the questions. While language proficiency does play a role in
deciding which language to use, three participants’ comments on the issue shed some light
on the discrepancy. These participants chose English as the answer language because, as residents of
the US, they do not type in Chinese as frequently or as efficiently. Furthermore, similar to what Steichen
et al. (2014) found with their subjects, some of the participants’ US-bought computer keyboards do not
come with Chinese input symbols, making it difficult for them to use Chinese as the input language.
Consequently, even though they chose Chinese as the reading/survey language, they resort to English
as the input/answer language.
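The kind of association between survey language and answer language described above is what a chi-square test of independence examines. The following is a minimal Python sketch of how the statistic is computed. The counts are hypothetical, chosen only so that 56% of the Chinese-survey row answered in English; they are not this study's data, and the function name chi_square_independence is illustrative.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# HYPOTHETICAL counts, not the study's data.
# Rows: survey language (Chinese, English).
# Columns: answer language (Chinese, English).
observed_counts = [
    [22, 28],  # Chinese survey: 28 of 50 (56%) answered in English
    [2, 48],   # English survey
]

chi2_stat, dof = chi_square_independence(observed_counts)
print(f"chi-square = {chi2_stat:.3f}, df = {dof}")
```

In a table like this one, the statistic can be large, indicating that the two choices are related, while the off-diagonal cell remains well populated. That is the pattern the paragraph describes: related, but not completely correlated.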
The discrepancy between preferences for active and passive languages exists in participants’
choices for spoken and written languages as well. In the survey, participants are asked to choose one
language to converse with a partner whose language skills are equal to theirs. In a separate question,
the participants are asked to choose one language for a written document. For both questions,
participants chose Chinese, English, or no preference. Most of the participants (89.5%) who
chose Chinese as their written language also prefer Chinese as their spoken language. Yet of the
participants who prefer English as their written language, only a little less than half (48.9%) also prefer
English as their spoken language. As with the difference between answer and survey language,
participants have provided some ideas as to why we are seeing this difference. Some participants find it
easier to fully communicate their thoughts with one of the languages. Some participants like to
challenge themselves by reading materials in the language they are less proficient in. One
participant pointed to cultural assimilation, or rather an inadequate degree of cultural assimilation, as what
pushes him to speak in his native language rather than a second language. Indeed, there are
many variables that influence language use and language preferences, many of which are not easily
foreseen.
Dominant language and the different language preferences are similarly related but not
perfectly aligned. Although the likelihood of someone
preferring one language increases if that language is also their dominant language, almost a third of the
participants did not follow the pattern. Most of these participants (83.33%, 25% of all participants)
have a dominant language, but could not decide on a preferred language. For the participants without a
strong preference over languages, the context, purpose, and communication partners are impact factors
to what language they will choose. For them, while dominant language is a personal choice, language
choice is complicated, and often not a solitary decision. It depends on the purpose and context of the
language use, as well as the social situation. Who are they communicating with? What will they
communicate about? Is there a specific audience they should prepare to share the information with?
Dominant language and language preferences are not straightforward conceptual constructs.
They are nuanced, distinct, and influence or are influenced by many different variables. While language
proficiency and the purpose of the language use are two of these variables that impact language
dominance and preference, they do not account for all.
Summary
This study has found that dominant language and preferred languages are different, but each can
be used to predict the general outcome of a participant’s article selection. Participants who are
English dominant and prefer English are more likely to conduct online activities in English and to select
more English version articles. However, dominant language and preferred language are complicated
constructs with many facets. Participants have noted how different languages can elicit different
responses, and that language has an affective aspect that they respond to.
In CLIR and MLIR, a person’s language choice is often viewed as a reactive decision: an
information seeker chooses a language based on the given conditions of his/her language proficiencies,
the nature and purpose of the information seeking tasks, and the domain in which the research is
conducted (e.g., Steichen et al., 2014). It is never treated as the user’s own choice, based on the user’s likes or
dislikes, of what language to use. I propose that MLIR and CLIR should be expanded to accommodate
users’ dominant language and language preferences, because the dominant language may be the most
comfortable and efficient for the information seeker to use, and the preferred language may offer the most
affective reward. Over and over, participants cited “convenience”, “familiarity”, and “better article
flow” as their reasons to choose one language version of the same information content over another
version. Some of the reasons that were given, “a more beautiful language” for example, are subjective
and sentimental, different from the practical considerations of search task purpose and language
proficiency but should nevertheless be considered.
Deliberate consideration should be given to the different types of language preferences for
various language use. Spoken and written, active and passive, people respond to these language choices
differently. If we are to truly understand how information seekers use language in information seeking
situations, dominant language and language preferences should be viewed as important factors of a user
profile, and be taken into account when studying bi- or multi-lingual speakers’ information seeking
behaviors.
Language Exposure and the History of Language Use
The second assumption states that the longer a bilingual speaker is exposed to an L2
environment the more likely he/she would choose L2 versions. The third assumption states that the
longer a bilingual speaker has been using L2, the more likely he/she would choose the L2 versions.
Result
The results show that the number of years living in an English-speaking country, the amount of
daily English exposure, and the length of daily English use have a positive impact on the number of
English articles that a participant chooses.
Discussion and Implications
Language exposure and history of language use are discussed here together because although
they are different concepts, they are statistically correlated, strongly and positively. Viewing the two
variables side by side provides a more complete picture of a person’s language profile.
Language exposure. The two indicators of language exposure examined in this study, the
number of years one has lived in the US and the amount of English exposure one experiences in one’s daily
environs, are moderately strongly and positively related.
Language exposure is the amount of time that a person is exposed to a language, either through
active involvement in conversations or passively through the surrounding environment. Someone who
does not live in an English-speaking environment may still be exposed to English through the media or
the use of the Internet. On the other hand, a person who lives in an English-speaking environment
might insulate him-/herself with Chinese material and only converse in Chinese. Even so, it is difficult
not to come into contact with English material outside of his/her immediate surroundings. Language
exposure is therefore observed through both the number of years a participant has been living in the US,
an English-speaking country, and the amount of daily English exposure they experience.
How long a person has been living in an English-speaking country and the amount of English
he/she is exposed to every day are positively related, but the relationship is not linear. A few
participants who do not live in an English-speaking country have also indicated high daily English
exposure. Likewise, some participants who live in the US have indicated low daily English exposure.
The previous paragraphs gave a couple of examples of how this could come to be. Participants
also provided explanations. Some of the participants are often exposed to a language in their home or at
work that is different from the language used in the general environment. For example, a participant
who lives in Taiwan but works as an English translator is exposed to English almost daily and in great
amounts. On the other hand, a participant who lives in the US but works in a Chinese office may have
limited daily English exposure. Many of the US-residing participants also speak Chinese among their
family.
Language use. Language use is related to language exposure but requires more active
involvement from the bilingual speaker. The survey results show a relatively strong, positive
relationship between language exposure and the amount of language use. It is likely, but not
certain, that the longer a participant has been exposed to English, the longer they have been using
English daily. Use and exposure are different variables, however, and are examined separately.
Language use, exposure, and attitude. The survey results demonstrate that the longer a person
has lived in an English-speaking country, the more likely he/she is to view English as the dominant
language and to prefer English. Similar results are found with language exposure and the length of time
a person has been using English.
The statistical analysis in this study serves only to find correlations and possibilities, not
to establish causal relationships. However, participants’ statements in the survey’s open-ended
questions suggest a causal relationship between the variables.
Asked why they prefer one language over the other, several participants used the term “習慣了”, meaning they are used to it. Several participants prefer Chinese because it is the language they
have been using for the longest period of time and the one they are most familiar with. The participants
imply that as language exposure increases, and language use history accumulates, their feelings about a
language and their inclination to choose one language over another begin to shift. A little less than half
(46%) of the participants in this study, all Chinese-English bilingual speakers with Chinese being L1
and English being L2, either have no strong preference between Chinese and English anymore, or
have switched to preferring English. As one participant put it succinctly when asked about her preference for
English, she is “[m]ore used to the language.” A second participant prefers English because she “use it
more on a daily basis”. Another participant prefers English because she has “been living in the US for
27 years, that’s the language I use [every day].” A fourth participant further says “[t]he longer I live in
the US, the higher the possibility I would use English.”
Similar statements were made by participants who have no preference between Chinese and
English. One participant says she could not choose between the languages because “Chinese is native
language; English is the native language of the environment that I live in.” These comments suggest
that language exposure and length of language use have positive and observable impact on language
selection. Exposure and use can lead to familiarity until a participant “didn’t realize that I prefer to read
most articles in English,” or “…is no longer used to the way Chinese articles are worded.” In other
words, with exposure and use come familiarity and habit formation, which in turn lead to the
establishment of preference and language dominance.
There is, however, one exception: a history of daily English exposure is not significantly
related to the choice of English as the survey language (p = 0.058). There are a few possible
explanations for this phenomenon. As previously mentioned, language exposure does not equal
language use. Some participants might feel more comfortable approaching the survey in Chinese
because Chinese is, after all, their mother tongue. The survey language choice is further discussed in
later sections.
Internet use. The survey asks participants to select the settings in which they use English more
than Chinese. One of the settings is “on the Internet for personal or recreational purposes.” The results
of the survey showed no relation between either language exposure or history of use and online language use
(Figure 13 and Figure 14). These results, however, do not agree with the findings of the later set of
questions about the language preference for each individual online activity. That later set of questions
found that language exposure and history of use are correlated with the frequency of Chinese or English
use for individual online activities. The longer a person has been living in an English-speaking
country, the more frequently they would use English for various online activities, including work-
related research, reading news, socializing with friends, and online shopping. The same moderately
positive relationship is also found between the amount of daily English exposure and online activity
language choices.
The discrepancy between language use online in general and for the listed activities could be
caused by the survey not accounting for all possible online activities. For example, the survey did not
ask about online gaming or emailing. Regardless, the survey found that language exposure and history of
language use have no impact on the language choice for general Internet use. English is preferred about equally
by participants with and without extensive exposure to English or history of English use. Is this because
English is still the most common language used on the Internet (“Most Common Languages”, 2016) so
that users are required to use English for certain occasions online regardless of their preference and
language history? If so, it would confirm the argument laid out by Petrelli et al. (2004) that users choose a
language based on the task at hand. Or are users willingly using English in order to be engaged in a
broader range of online activity? Are the users satisfied with their language options? How often do they
require language-related assistance? These are questions that require further research in the future.
Article selection results and digital document language preference. Both living in an
English-speaking country and having daily English exposure have moderate relationships with the
article selection results (see Figure 10). As with language attitude, although the relationship between
exposure and the article selection results is moderate, it shows a possible effect of language
environment on information seekers. Familiarity comes out of exposure. With familiarity comes
comfort that could lead to preference. For people who live in an L2 environment, L2 gradually becomes
the main language they use over time, reducing their need for CLIR or MLIR features.
Another consequence of living in an English-speaking environment is that new concepts, terms,
and subjects are learned in English. One participant pointed out that “when I come into contact with
something new, the language in which the impression is formed is very important. Although Chinese is
my native language, for things I learned after I moved to the US, I am used to using English to search
for relevant information.” One other participant reflected, after the article selection, that she chose to
read the dinosaur article in Chinese because everything she knows about dinosaurs, she learned in
Chinese. This view is perhaps one of the reasons why multiple studies cited in Chapter 3 found users
gravitating toward a specific language when seeking information in specific fields. When a field’s
publications are dominated by a language, it is true that there is more information available in that
language; it is also true that people often acquire the jargon, terms, and concepts in that language.
When the knowledge of a subject is acquired in a specific language, it is natural to use that language for
related information seeking. Therefore, for subjects with a commonly acknowledged dominant
language, it would perhaps be better to focus CLIR and MLIR resources and efforts on assisting
information seekers who are not yet acquainted with the subject or who lack sufficient
knowledge of the proper terms and jargon.
Summary
Language exposure, operationalized as the number of years living in the US and the amount of
daily exposure, and the history of language use are found to have positive relationships with
participants’ language preferences, dominant languages, language proficiency, online activity language
choice, and article selection results. To be sure, it is possible some of these variables have a cause-and-
effect impact on each other. For example, a person’s English proficiency would perhaps improve the
more they are exposed to English, and a person’s preference of English could lead him/her to be more
exposed to English, or vice versa. The interaction between variables and the influence of the interaction
should be explored further in future studies. From this study, participant input shows us that language
exposure and history of language use contribute to language familiarity, which in turn leads to
participants’ language preference and language dominance, as well as to the behavior of choosing to use the
language. If this is the case, information seekers would be more likely to prefer and use the
language of their immediate surroundings, and CLIR and MLIR would be most needed for
non-native users who would benefit from accessing local information.
Language Proficiency
Assumption four is about a bilingual speaker’s language proficiency: The more proficient a
bilingual speaker is with L2, the more likely he/she will choose L2 for digital information resources.
Result
Language proficiency and article selection results are found to be statistically related. The
higher a participant’s English proficiency level, the more likely they are to select more English
versions of the articles.
Discussion and Implications
Language proficiency is found to be intertwined with many factors of a person’s language
profile. It may influence participants’ language attitudes, that is, their dominant language
and their various language preferences.
Language proficiency and language attitude. The findings of this study show that language
proficiency and the variables that represent language attitude are statistically related to each other. The
higher a person’s English proficiency, the more likely English is his/her dominant, preferred, answer,
and survey language. Participants who can only read simple English paragraphs overwhelmingly
view Chinese as their dominant language and preferred language for general use. Once a participant’s
English proficiency improves, however, its impact on the person’s language attitude diminishes, and
the impact of other elements, such as language exposure and use, increases.
To investigate the possible influence of other language profile elements on highly proficient
bilingual users, let us look at the participants with the highest proficiency ratings. There are 40
participants who rate their language proficiency as comparable to that of an educated native speaker.
For these participants, the number of years they have lived in the US is correlated to how long they
have been using English consistently daily (r = 0.404, p = 0.01), but not with their daily English
exposure amount (r = 0.334, p = 0.036). All but one of them live in the US, with the average duration
being 18.5 years. All but one of them have been using English consistently and daily for longer than four
years. Furthermore, English is the language that most of these participants (90%) are exposed to more
than 50% of the time in their daily lives.
Mann-Whitney tests, Kruskal-Wallis tests, and Pearson’s correlations were conducted to examine
the relationships between these participants’ language attitudes and their language exposure and amount of
use. The results show that for participants with high English proficiency, the amount of daily language
exposure is related to dominant language outcome (U = 110.5, p = 0.015) and general language use
preference (χ²[2, N = 40] = 12.311, p = 0.002). The other variables have no bearing on language
attitudes. The result suggests that when participants are fluent with a language, the amount of language
exposure can influence how they perceive and approach languages.
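For readers unfamiliar with the Mann-Whitney test reported above, the following is a minimal Python sketch of how its U statistic is computed when comparing an ordinal measure, such as daily exposure, between two groups. The exposure values are hypothetical and are not this study's data. The function name mann_whitney_u is illustrative, and the sketch computes only the statistic, not the p-value that a statistics package would also report.

```python
from itertools import product


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by direct pairwise comparison; a tie counts as 0.5."""
    u_a = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            u_a += 1.0
        elif a == b:
            u_a += 0.5
    # The two U values always sum to the number of cross-group pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# HYPOTHETICAL daily English exposure (percent of the day), not the study's data.
english_dominant = [80, 90, 70, 85, 60]
chinese_dominant = [50, 60, 40, 55, 65]

u_english, u_chinese = mann_whitney_u(english_dominant, chinese_dominant)
# The smaller of the two values is conventionally reported as U.
u_stat = min(u_english, u_chinese)
print(f"U = {u_stat}")
```

A U close to zero means that nearly every member of one group outranks every member of the other. A U near half the number of cross-group pairs means the two groups overlap heavily.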
This finding corroborates the findings in the previous section, Language Exposure and the
History of Language Use, and highlights the importance of a user’s language environment to their
language use pattern. The influence of language exposure and language environment should be further
examined in the context of information seeking behavior and CLIR.
Language proficiency and language choice for online activities. Language proficiency, like
language exposure and language use history, is found to have no significant impact on language choice
for online activities. There is an even split between participants on what language they prefer to use
online. It is possible that the survey did not account for every online activity, and so the results are not
fully representative of the participants’ language use and preferences. It could also be that, as Steichen
et al. (2014) found, participants have come to expect English as the most commonly used language
online and adopted it as the default language. Furthermore, they are able to conduct online activities
using English to satisfaction, eliminating the need to use other languages.
Language proficiency and article selection results. Let us revisit the relationship between
English proficiency and the number of English version articles selected. When participants are asked to
comment on their article selection process, proficiency is mentioned over and over as one of the main
impact factors. Many of the participants prefer the articles in their native language, the language in which they are most
proficient. With their native language, they are able to “read quickly and grasp the meaning of an
article accurately”. On the other hand, there are also participants who prefer English even though it is
their second language. These participants usually have high English proficiency to the point where they
“can read and write in English effortlessly”.
Whether the participants prefer English or Chinese, when language proficiency is the reason
they choose one language version over the other, it is mostly because with higher proficiency, less
cognitive effort is required and the participants are able to achieve what one participant describes as
“higher efficiency”. There is also a sense of ease that comes with skill and familiarity; many of the
participants describe it as “convenience”.
Many of the existing studies cited in Chapter 2, such as Steichen et al. (2014), found
proficiency to be a major driving factor behind bilingual information seekers’ language selection
processes. Its impact would likely carry over to the information seeking process. However, from Figure
21 (Number of English Articles Selected and English Proficiency Level), we can see that while one can
safely assume a participant who selected a higher percentage of English version articles would have
higher English proficiency, one cannot assume that participants who selected fewer
English versions have lower English proficiency. Slightly less than half (42%) of the participants with level 4 English proficiency (able
to read all styles and forms of documents related to professional needs), and slightly more than a
quarter (27%) of the participants with level 5 English proficiency (rival educated native speakers)
selected three or fewer English versions and five or more Chinese versions. There are clearly other
factors that influence the article selection outcome, some of which are discussed in the Subject Matter
section below.
No language preference. Not all of the participants who have at least moderate English
proficiency level have clear preferences between English and Chinese. In fact, 25% of the participants
said they either have no preference, or that they choose the language based on other conditions such as
translation quality and the subject of the article. These conditions are discussed in later paragraphs.
Some of the participants found it difficult to choose between Chinese and English because they
can read both languages fluently. For these participants, a high English proficiency level negates the
impact of language proficiency and brings other considerations into sharper relief. These considerations
are addressed in the later sections.
Summary
Language proficiency is an impact factor that influences participants’ language choice. Its
impact can usually be seen after a person’s language proficiency level is moderate or better.
Participants who are less fluent in English largely prefer Chinese in various situations, and view
Chinese as their dominant language. As a participant’s language proficiency improves, other variables
begin to take on more influence.
Subject Matter
The last assumption is about subject matter, and states: The less familiar a bilingual speaker is
with a subject matter, the more likely he/she will choose L1 for information regarding that subject.
Result
The survey asks participants to reflect upon the article selection exercise, and comment on the
process. Sixteen participants cited subject as the deciding factor for the article selection result. Several
of them specifically mentioned using L1 for topics that they are not familiar with, confirming this
assumption. Two participants prefer L2 since it is the language in which they learned about the
subject. The rest of the participants view subject topic as an important factor, but not in the way this
current study assumed.
Discussion and Implications
Many of the existing studies view the purpose and subject field of a search task as major impact
factors to an information seeker’s language choice. This study took away the variable of the search
task, and presented the participants with a collection of articles of different topics. Even without the
search tasks, participants find the subject topic of the articles to be influential to their article selection
result. A few of them favored L1 (Chinese) for articles that are more difficult or subject matters that are
unfamiliar. A couple of them favored L2 (English) because they are more familiar with the jargon and
technical terms in English. Others cited other subject-related reasons, such as:
“If it is about China, I would prefer Chinese. If it’s about the West, I would choose English.”
“If it is scientific, I prefer English. Only when it is about Chinese internal [affairs], I read
Chinese.”
“Depending on the content of these articles—if the content is culturally related to Chinese, then
I would prefer to read it in Chinese; if it's a piece related to news or new studies, then the
English version will be my first choice.”
“If it is science related, I prefer the English version. I think it’s because I am trained as a
scientist in English.”
“Parenting + work + kids related articles = English. There are more/better info out there and the
people I share opinions with use English. Personal leisure + cultural/family related articles =
Chinese. So I can communicate and exchange ideas quickly.”
Although all of the statements above are about subject matter, each approached subject matter a
little differently and raised the following issues:
1. Where did the story originate from and what is the official language spoken at the
source?
2. Are the participants familiar with a subject? If they are, what language did they learn
about the subject in?
3. Who will the participants be sharing the information with and what language do they
speak?
4. In what language can the participant find more information about this subject? In what
language would they find more accurate information about this subject?
Let us examine these four issues one by one.
Story origin. The survey results show that some participants consider the original language of
the subject matter important and strongly prefer it.
As summarized in Table 16, four of the eight articles are China or Chinese related. Figure 19
charts the way participants respond to the articles and the perceived source language of the articles. The
chart shows that more than half of the participants chose the Chinese version for the Chinese-sourced
articles. Articles related to Western culture, on the other hand, generally had their English versions
chosen more often. Further statistical testing shows that source language has a significant impact on the
article selection outcome, demonstrating information seekers’ tendency to want to read an article in its
source language.
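This passage does not name the statistical test that was used. As a rough illustration only, the kind of relationship described above (perceived source language versus language version chosen) could be checked with a Pearson chi-square test of independence on a 2x2 table. The sketch below assumes that test; the function name `chi_square_2x2` and the counts are hypothetical and are not the study's data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). Assumes every expected count is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if source language and chosen version were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom; for 1 degree of freedom the
    # chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts for illustration only (not taken from the study).
# Rows: perceived source (Chinese-sourced, Western-sourced).
# Columns: version chosen (Chinese, English).
hypothetical_counts = [[60, 40], [35, 65]]

statistic, p_value = chi_square_2x2(hypothetical_counts)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value in such a test would indicate that the chosen version is not independent of the perceived source language. It would not by itself show why participants prefer the source language, which is what the participant comments address.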
A closer review of participant feedback reveals that the preference for source language is
mentioned over and over again when participants are asked about the article selection process. Many of
them stated they want to read the version that is written in the original language. A participant explains,
“I read English if it is a Western report, and Chinese if it is a Chinese topic. This way, I get to read the
original versions.” Even though there are no indications of which language version is the original and
which is the translation, participants assign original languages to each article by the article’s cultural or
geographical associations. As one participant puts it, “I choose language by guessing if the article is
originally written in the particular language.”
Some participants chose the original language in order to avoid proper name translations. The
transliterations of proper names are viewed as complicated or unwieldy and take “too much work to
keep track of it”. When faced with the articles, some participants scan the title, and then decide on the
language version to choose based on the existence of proper names or technical terms.
Some chose the original language because “the articles sound more natural to me”. As one
participant states, “Chinese first. Unless when I first scanned the article, the Chinese translation is not
good or feels too forced.” Several participants similarly remarked on the translation quality of the
articles, saying that “the sentences are awkward”, “the translation is hard to understand”, and “it is
uncomfortable to read”. One participant pointed out that he does not want to risk content being lost in
translation, and therefore he prefers the original version. For these participants, the original writings are
better than the translations, unless L2 is too difficult and their language proficiency does not support it.
For these participants, although a translation may be enough to help them learn what an article is
about (Orengo & Huyck, 2006), it may not be good enough for them to continue reading it.
Familiarity with the subject. Some participants do lean towards L1 when faced with an
unfamiliar subject, explaining their language choice: “If it is a subject I am not super familiar with I
will choose to read it in Chinese. However, if it is a subject that I am comfortable with I would prefer to
read it in English.” Time and again the decision on whether to continue with L1 or L2 hinges upon the
existence of technical terms and jargon.
Participants are found to base their language selection on whether there are unfamiliar terms in
the document, or because they associate the subject topic with a specific language.
Participants who came across unknown terms often choose to read the articles in the language
they are most fluent in. One participant explains her Chinese article selections by saying “… these
articles mostly include many technical terms, so I want to read the Chinese explanation of these terms”.
Another participant says “if the content is more technical, and in some cases, containing professional
terminologies, I will also read it in Chinese simply because it is easier.” For these participants, the
subject matter is already foreign and requires effort to understand. They use language selection as a
strategy to combat the difficulty of the content and lighten the cognitive load required to comprehend
the document. Similar coping strategies regarding language and task difficulty have been observed in
bilingual studies on language switching in composition (Ramirez, 2012).
Bilingual speakers can sometimes associate a subject topic with a specific language due to
personal experience or exposure to the subject topic. This phenomenon has been observed by
linguists (Saville-Troike, 2003), and it is also seen here. In this survey, participants display such
association by selecting the document version written in the language in which they are more familiar
with the technical terms. One participant prefers English-version documents because she is “no longer
familiar with professional terms in Mandarin”. Information seekers who are well versed in a subject
field that has a commonly acknowledged dominant language are likely able, and may even prefer, to
use the dominant language for information seeking tasks on the subject topic. Acquiring knowledge in
a language and using that language lead to familiarity, which can develop into preference.
For linguists, topic is often entwined with the setting and purpose of the language use. Here, it is likely
so as well. Participants acquired the knowledge of a subject matter for purposes that could be work,
school, or personal entertainment. As the knowledge store builds, the relevant vocabulary and
terminology accumulate, and the language the person used to acquire the knowledge is used more,
which leads to further vocabulary accumulation. The cycle continues, and the language used to pursue
the knowledge becomes dominant and strongly preferred for this subject domain.
From the above discussion, it appears that information seekers who are not familiar with the
technical terms and proper nouns used in the subject field would benefit more from CLIR features.
They would need help understanding the meaning of terminology, forming search terms, and
comprehending search results.
Language is for communicating. Although the act of reading a digital document seems like a
solitary one, it carries a social component that cannot be ignored. The last quote cited at the beginning
of this section demonstrates how one of the participants looks at the articles and considers the language
choices based on whom she would converse with about the subject. It is an example of how information
consumption is often for further information dissemination. It reminds us that language is a tool used
for communication. Even when people read an article by themselves, the future possibility of having to
discuss it with someone is never far from their minds.
When asked about what language they use for online activities and why, twenty participants
cited reasons similar to this: “[the language] choice depends on what I am looking for and with whom I
share info”, or “the language the audience is more inclined to using.” For these participants,
information seeking and information consumption are one step of the communication process. It is not
a standalone act. This is also evident in Figure 12, which shows that participants, regardless of language
proficiency and attitude, overwhelmingly use Chinese to communicate with family and friends. The
language is chosen for the purpose of communication; it does not reflect personal likes or dislikes, nor
personal abilities.
Information availability. The last quote cited at the beginning of this section mentioned
information availability as one of the criteria the participant considered when she was selecting
between Chinese and English. She looked at the subject matter of the article and decided on the
language based on which one would likely yield the most or the best information. Her approach is
echoed by 24 other participants (16%) when they were asked how they select a language for online
activities. They choose the language based on “which language/culture will produce more results of
what I am looking for.” A participant searches for international news in English and Chinese news in
Chinese; another gave the example of using Chinese to search for Chinese-medicine related
information because “I use the language that can bring me more direct and updated information for a
subject well-known by the language.” When information seekers choose language by information
availability, their language options are limited by the resource language. The language choice is defined
by the website they visit. This will become an issue if the information seeker is not proficient in the
website’s language.
Other Findings and Observations
So far, we have evaluated the impacts on language selection by language attitude, language
exposure and the history of use, language proficiency, and subject matter. Participants mentioned other
factors that also influenced their language selection process.
One such factor is cultural identity. One participant who is highly fluent in English and exposed
to English 90% of the day felt duty-bound to choose Chinese. She later regretted her decision for she
uses English a lot more than Chinese in her daily life, so much so that formal Chinese wordings and
structures are harder for her to process. Another participant prefers to use Chinese because he does not
feel completely assimilated into US culture even after living in the US for a prolonged period of time.
For these participants, language selection is not the result of language proficiency or for ease of use,
but it is tied to their self-identity. Theirs are examples of how language cannot be separated from its
social functions. It is used to communicate ideas, and also social structures, identities, and standings.
There are also preconceived notions about the differing quality, accuracy, and credibility of
documents written in different languages, even when the articles are parallel in content. At first glance,
this observation is similar to what subjects in Rieh and Rieh (2005) displayed: a tendency to give more
favorable assessments to information in foreign documents, describing them as better, more credible,
etc. In Rieh and Rieh (2005), however, users were responding to existing information resources whose
availability in different languages is disproportionate. The collections of L2 documents, often English,
are larger than L1 collections. In this study, participants were provided with the same
information in two languages, and yet participants still viewed the documents as being of different
quality. One
participant likes to read Chinese history written in English because “history is subjective, sometimes it
is more interesting to read the English point-of-views”. Regarding the Chinese local news article,
a second participant “feel[s] like I can trust English news more though I know they are
meant to be the same.” Another participant puts it bluntly, “I tend to think there is more fact in English
article.” These comments reflect further complexity in language attitude where participants are
responding to a language beyond what the language is used to convey. These participants assigned
attributes such as trustworthiness and truthfulness to the documents prior to reading the documents.
They read different tones into the same idea from the different wordings and grammatical structures.
They approach the information with preformed judgments and opinions on the quality of the
information that the documents contain, thinking that different languages express one idea differently.
These preconceived notions could skew a person’s concept of information and approach to information
seeking. One of the participants explicitly noted that she might have answered the questions differently
if she had chosen to proceed with the survey in a different language. The language affects the way she
approaches the questions. It would be worthwhile to investigate how these opinions are formed, and
whether the opinions influence an information user’s information seeking approach.
A different type of quality that has been mentioned by participants is the writing quality of the
digital documents. In the previous section on Subject Matter, we examined how participants look for
the source language of a subject matter and comment on the translation quality of the articles as their
language selection criteria. A few participants based their language selection on the quality of the
writing and on the beauty of the language. One of them said his language selection depends upon the
document writer’s literary choices, accuracy of the wording, and literary aesthetics. “It is not just about
statements of facts,” he declares. Another participant noted the different feelings English and Chinese
evoke: “If I don’t find the writing in one language fluid, I will choose the other one.” A third participant
also noted the beauty in the use of words, and also that there should be “strong logic” behind the
writing. What these participants are saying is that language is more than a carrier of facts and is appreciated as art. The
articles presented to them are read not only for the facts they contain, but also for enjoyment. In this,
they choose the language that they feel can bring the most joy. These participants provide another
instance where language serves as more than the medium that conveys an idea. As with invoking
cultural identity, it also appeals to information users aesthetically and emotionally. Do these
emotions change how users see an information resource? Would they change how users search for
information?
Chapter 7. Conclusion, Implications, Limitations, and Future Research
This research began by asking what variables, beyond language proficiency and the subject
domain of the search task, influence a Chinese-English bilingual information seeker’s language choice.
By asking the users to choose from parallel contents in two languages, this study is able to focus on the
effect of elements that make up an information user’s language profile on language selection. These
elements include language attitude (dominant language and language preferences), language exposure,
history of use, and language proficiency level, all of which were found to impact a bilingual speaker’s
language selection result. The impact of the article’s subject matter is also observed, but in a different manner
from other existing studies. Whereas most studies examine subject domain through the information
seeking task, this study examines how information seekers react to the subject of a digital document
and how it influences their perception and use of language.
The findings of this research demonstrate that the makeup of a bilingual user’s language profile,
a construct developed for bilingualism research, is also instrumental in the language selection process
for information seeking, and it is complex in nature. Users’ language profiles influence their approach to
language and, by extension, their acceptance or rejection of a document that is created in the language,
which in turn affects the choice of information resources. Furthermore, participant responses in the
survey demonstrated that language is a multifaceted construct that often carries meanings beyond the
words that were used. Participant comments revealed unexpected preconceptions towards language,
and the expression of self-identity through language selection.
This study set out to explore how a person’s language profile might influence how they use and
react to languages they know. The findings confirmed some former research conclusions, and brought
to light new observations. There is still much to learn about users’ language profiles and their relation to
users’ information seeking behavior, and what this means to CLIR and MLIR researchers. The findings of
this study laid out possible trends and likely relationships among language profile variables and
language selection results. However, as an exploratory study, it is not without limitations.
Limitations
First and foremost, without a known population and an established subject profile for bilingual
information seekers, this study used a purposive sampling approach combined with a convenience
sampling approach, which results in self-selected participants. The recruitment method can potentially
introduce statistical error. As a result, it is crucial to recognize the exploratory nature of this study.
Secondly, this study is designed not only to observe users’ language selection outcome, but also
to prompt them to reflect upon language and information resources through the article selection
outcome. Since the articles are prescribed to the participants by the researcher, the articles are not always
of interest to the participants. There is a possibility that participants do not respond to the selected
articles as they would respond to digital information that they encounter in real life. There is
also a limit to the amount of feedback and observation that can be gained due to the online survey
format.
Lastly, although participants completed the online survey on their own without being observed,
were guaranteed by the researcher that no judgement would be passed on their answers, and were told
that the survey is anonymous, there is still the possibility that some of them tailored their responses
according to what they think the answers should be. For example, one participant chose to do the survey
in Chinese because she thought it reflects the fact that Chinese is her mother tongue, not because she
preferred it. Another participant approached the article selection exercise as if it were a test. These types
of participant bias could affect the research outcome. There is still much to do to further our
understanding of bilingual information seekers.
Future Research
Similar to existing literature on bilingual or multilingual information seeking behavior, this
study has a focused sample frame that is narrow in nature. The sample frame for this study is
limited to Chinese-English bilingual speakers in order to accommodate the limitation of the researcher’s
language abilities. Future studies should include bilingual speakers of different language pairs. Each
language has its own inherent cultural background and history, can instigate different sets of
assumptions, and can possibly incite different user behaviors. It would be interesting to see if bilingual
speakers of different language pairs behave differently in their language selection approach. Similar
investigation should also be applied to monolingual speakers. Are monolingual users also influenced by
elements of their language profile? To what degree?
Participants in the current study treated language as an instrument wielded by the speaker not
only to articulate a thought but also to convey ideas about the speaker’s assessment of the setting,
social association and self-identification. Language is loaded with both cultural and personal
history and perceptions; users do not view each language impartially. How does this influence the way
information seekers approach information? Does it impact how they view information resources? Or
can the act of information seeking be separated from the social construct and exist in a neutral space?
Furthermore, do monolingual speakers also experience similar preconceptions and influences? These
are questions that should be further explored, perhaps borrowing insights from sociolinguists and
bilingualism scholars.
Bilingual speakers are heterogeneous and come from a wide range of experiences and
backgrounds. The type of support and function they need would vary according to their language
profile. How long they have been using a language, what environment they are in, and what they
associate the languages with can all impact their language selection and information seeking
approaches. While this study is able to provide some understanding of information seekers’ motivation
behind language use, it does not, by any means, provide a full picture. However, the results of this
study illustrated the complexity and variety of the constitution of a bilingual speaker’s language profile.
It is important to identify the different types of bilingual speakers, recognize their strengths and
disadvantages, identify the groups that would most require cross language information seeking support,
and determine the type of features that would provide the right type of support. Through this process,
a CLIR system can be designed to meet the demands and requirements of users. This study further
echoes Steichen et al.’s (2014) call for personalized support for cross language information retrieval
systems in order to account for individual users’ particular backgrounds, needs, and skill sets.
This study focuses on the relationship between the observed variables and the language
selection outcome. Although the interactions of a few variables, such as the relationship between dominant
language and survey language, were explored, there are others with relationships worth investigating in
future studies, such as the effect of language exposure at home versus at work. Another aspect that has
not been studied is the strength of the impact. This study found language preference and language
dominance to be influential. How does the strength of their impact compare to that of proficiency?
Does any variable’s influence suppress that of other variables?
Furthermore, many of the variables in this study generated non-normally distributed data. Some
variables have outliers, such as English proficiency mapped by number of years living in the US, that are
worth further exploration. What led these participants to have such unusual language profiles? Does
their background change their language attitude and language use? What is different in their
environment? Future research should also examine whether a larger set of data would be more normally
distributed, diminishing the effect of outliers, or whether the profiles of bilingual information seekers
are non-normal in nature.
Lastly, this study focused on digital text but information seekers use the Internet for many other
reasons, and search for much more than text documents. Do users handle language use for other media
sources differently? Does the role of language diminish or increase for different online activities? These
are questions that would need to be answered by future studies.
References
Agheyisi, R., & Fishman, J. A. (1970). Language attitude studies: A brief survey of methodological approaches. Anthropological linguistics, 137-157.
Airio, E., & Kettunen, K. (2009). Does dictionary-based bilingual retrieval work in a non-normalized index? Information Processing & Management, 45(6), 703-713.
Allan, J., Callan, J., Croft, W. B., Ballesteros, L., Byrd, D., Swann, R., and Xu, J. (1997). INQUERY does battle with TREC-6. In Proceedings of the Sixth Text REtrieval Conference (TREC-6), NIST, 169--206. Retrieved on November 22, 2005, from http://citeseer.ist.psu.edu/broglio94inquery.html.
Androutsopoulos, J. (2006). Multilingualism, diaspora, and the Internet: Codes and identities on German‐based diaspora websites. Journal of Sociolinguistics, 10(4), 520-547.
Androutsopoulos, J. (2013). Code-switching in computer-mediated communication. In S.C. Herring, D. Stein, & T. Virtanen (Eds.), Pragmatics of Computer-Mediated Communication, 659-686. Berlin/New York: Mouton de Gruyter.
Aparicio, X. & Lavaur, J. (2013). Recognising words in three languages: effects of language dominance and language switching. International Journal of Multilingualism, 11(2), 164-181. DOI: 10.1080/14790718.2013.783583.
Artandi, S. (1973). Information concepts and their utility. Journal of the American Society for Information Science, 24(4), 242-245.
Artiles, J., Gonzalo, J., Lopez-Ostenero, F., & Peinado, V. (2006). Are users willing to search cross-language? An experiment with the Flickr image sharing repository. In C. Peters, P. Clough, F. C. Gey, J. Karlgren, & B. Magnini (Eds.), Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval (CLEF'06) (pp. 195-204). Berlin, Heidelberg: Springer-Verlag.
Aula, A., & Kellar, M. (2009, April). Multilingual search strategies. In CHI'09 Extended Abstracts on Human Factors in Computing Systems (pp. 3865-3870). ACM.
Ayers, J.W. (August, 2010). Measuring English proficiency and language preference: Are self-reports valid? American Journal of Public Health, 100(8), 1364-1366.
Azarbonyad, H., Shakery, A., & Faili, H. (2012). Using learning to rank approach for parallel corpora based cross language information retrieval. In ECAI (pp. 79-84).
Backus, A. (2005). Codeswitching and language change: One thing leads to another?. International Journal of Bilingualism, 9(3-4), 307-340.
Bahrick, H. P., Hall, L. K., Goggin, J. P., Bahrick, L. E., & Berger, S. A. (1994). Fifty years of language maintenance and language dominance in bilingual Hispanic immigrants. Journal of Experimental Psychology: General, 123(3), 264-283.
Baker, C. (1992). Attitudes and language (Vol. 83). Tonawanda, NY: Multilingual Matters.
Baker, C. (2011). Foundations of Bilingual Education (5th Ed). Tonawanda, NY: Multilingual Matters.
Baker, C., & Jones, S. P. (Eds.). (1998). Encyclopedia of bilingualism and bilingual education. Multilingual Matters.
Ballesteros, L. and Croft, W. B. (1997). Phrasal translation and query expansion techniques for cross-language information retrieval. In Proceedings of the 20th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Philadelphia, Pennsylvania, United States, July 27 - 31, 1997). N. J. Belkin, A. D. Narasimhalu, P. Willett, and W. Hersh, Eds. SIGIR '97. ACM Press, New York, NY, 84-91. Retrieved: 10/4/05, from ACM Portal.
Ballesteros L., Croft, W.B., (1998). Resolving ambiguity for cross-language retrieval. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM Press, New York, NY, 64-71. Retrieved: 10/4/05, from ACM Portal.
Ballesteros, L., & Sanderson, M. (2003, November). Addressing the lack of direct translation resources for cross-language retrieval. In Proceedings of the twelfth international conference on Information and knowledge management (pp. 147-152). ACM.
Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language resources and evaluation, 43(3), 209-226.
Bates, M. J. (2006). Fundamental forms of information. Journal of the American Society for Information Science and Technology, 57(8), 1033-1045.
Bates, M.J. (2009). Information Behavior. In Encyclopedia of Library and Information Sciences, 3rd Edition (pp. 2381-2391). New York, NY: Taylor and Francis.
Becker, K. R. (1997). Spanish/English bilingual codeswitching: A syncretic model. Bilingual Review, 22(1), 3-30
Belkin, N. J. (1978). Information concepts for information science. Journal of documentation, 34(1), 55-85.
Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information-retrieval. Canadian Journal of Information Science-Revue Canadienne Des Sciences De L Information, 5(May), 133-143.
Belkin, N. J. (2000). Helping people find what they don't know. Communications of the ACM, 43(8), 58-61.
Belkin, N. J., & Robertson, S. E. (1976). Information science and the phenomenon of information. Journal of the American Society for Information Science, 27(4), 197-204.
Belkin, N. J., Oddy, R. N., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and theory. Journal of documentation, 38(2), 61-71.
Bedore, L. M., Pena, E. D., Summers, C. L., Boerger, K. M., Resendiz, M. D., Greene, K., Bohman, T. M. & Gillam, R. B. (2012). The measure matters: Language dominance profiles across measures in Spanish–English bilingual children. Bilingualism: Language and Cognition, 15(03), 616-629.
Ben Romdhane, W., Elayeb, B., Bounhas, I., Evrard, F., & Ben Saoud, N.B. (2013). A possibilistic query translation approach for cross-language information retrieval. Lecture Notes in Computer Science, 7996, 73-82.
Bhatia, T.K. & Ritchie, W.C. (Eds.) (2013). The Handbook of Bilingualism and Multilingualism. Chichester, UK: Blackwell Publishing.
Bialystok, E., Craik, F.I.M., & Luk, G. (2013). Bilingualism: Consequences for mind and brain. Trends in Cognitive Sciences, 16(4), 240-250. DOI: 10.1016/j.tics.2012.03.001.
Bilingualism [Def. 1]. (n.d.). In OED Online. Retrieved Jun 12, 2014, from http://0-www.oed.com.library.simmons.edu/view/Entry/18968.
Bilingual [Def. 3]. (n.d.). In OED Online. Retrieved June 12, 2014, from http://0-www.oed.com.library.simmons.edu/view/Entry/18967.
Birdsong, D. (2006). Dominance, proficiency, and second language grammatical processing. Applied Psycholinguistics, 27, 46–49.
Blom, J, & Gumperz, J. (1972). Social meaning in linguistic structures: Code switching in Northern Norway. In J. Gumperz and D. Hymes (Eds). Directions in Sociolinguistics: The Ethnography of Communication, 407-434. New York, NY: Holt, Rinehart, and Winston.
Bloomfield, L. (1935). Language. London: Allen and Unwin.
Bokset, R. (2006). The Long Story of Short Forms: The Evolution of Simplified Chinese Characters. Stockholm East Asian Monographs, No. 11. Stockholm: Department of Oriental Languages, Stockholm University.
Boslaugh, S. & Watters, P.A. (2008). Statistics in a Nutshell: A Desktop Quick Reference. Sebastopol, CA: O'Reilly Media, Inc.
Broglio, J., Callan, J. P., Croft, W. B. (1994). INQUERY system overview. Project TIPSTER Text Program, Phase I. Retrieved on November 27, 2005, from: http://citeseer.ist.psu.edu/broglio94inquery.html.
Buchweitz, A., & Prat, C. (2013). The bilingual brain: Flexibility and control in the human cortex. Physics of Life Reviews, 10(4), 428-443.
Buckland, M. K. (1991). Information as thing. Journal of the Association for Information Science and Technology, 42(5), 351-360.
Byström, K., & Järvelin, K. (1995). Task complexity affects information seeking and use. Information Processing & Management, 31(2), 191-213.
Caldas, S.J., & Caron-Caldas, S. (2002). A sociolinguistic analysis of the language preferences of adolescent bilinguals: Shifting allegiances and developing identities. Applied Linguistics, 23(4), 490-514.
Capurro, R., & Hjørland, B. (2003). The concept of information. Annual review of information science and technology, 37(1), 343-411.
Cartoni, B., Zufferey, S., & Meyer, T. (2013). Using the Europarl corpus for cross-linguistic research. Belgian Journal of Linguistics, 27(1), 23-42. doi:10.1075/bjl.27.02car
Cashman, H. R. (2005). Identities at play: language preference and group membership in bilingual talk in interaction. Journal of Pragmatics, 37(3), 301-315.
Chau, R. and Yeh, C., (2002). Explorative multilingual text retrieval based on fuzzy multilingual keyword classification. In Proceedings of the 5th International Workshop Information Retrieval with Asian Languages, 33-40. Retrieved 10/4/05, from ACM Portal.
Chen, J. & Bao, Y. (2009, March). Cross-language search: The case of Google Language Tools. First Monday, 14(3).
Chen, J., Ding, R., Jiang, S., & Knudson, R. (2012). A preliminary evaluation of metadata records machine translation. The Electronic Library, 30(2), 264-277.
Cheng, P., Teng, J., Chen, R., Wang, J., Lu, W., and Chien, L. (2004). Translating unknown queries with web corpora for cross-language information retrieval. In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 146-153. Retrieved: 10/17/05, from ACM Portal.
Cherciov, M. (2013). Investigating the impact of attitude on first language attrition and second language acquisition from a Dynamic Systems Theory perspective. International Journal Of Bilingualism, 17(6), 716-733. doi:10.1177/1367006912454622
Chiao, Y. C., & Zweigenbaum, P. (2002, August). Looking for candidate translational equivalents in specialized, comparable corpora. In Proceedings of the 19th international conference on Computational linguistics-Volume 2 (pp. 1-5). Association for Computational Linguistics.
Chung, W. (2008). Web searching in a multilingual world. Communications of the ACM, 51(5), 32-40.
Cimiano, P., Schultz, A., Sizov, S., Sorg, P., & Staab, S. (2009). Explicit versus latent concept models for cross-language information retrieval. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, (pp. 1513-1512).
Clough, P. & Eleta, I. (2010) Investigating language skills and field of knowledge on multilingual information access in digital libraries. International Journal of Digital Library Systems, 1(1). DOI: 10.4018/jdls.2010102705.
Cutting, D., Kupiec, J., Pedersen, J., and Sibun, P. (1992) A practical part-of-speech tagger. Proceedings of the 3rd Conference on Applied Natural Language Processing, 133-140. DOI: 10.3115/9774499.974523.
Davis, M. (1996). New experiments in cross-language text retrieval at NMSU's Computing Research Lab. In the Fifth Text Retrieval Conference (TREC-5), Gaithersburg, MD: National Institute of Standards and Technology, 1996, 447-453. Retrieved November 21, 2005, from http://www.scils.rutgers.edu/~muresan/IR/TREC/Proceedings/t5_proceedings/t5_proceedings.html.
Davis, M. (1998). On the effective use of large parallel corpora in cross-language text retrieval. In G. Grefenstette ed. Cross-Language Information Retrieval. Kluwer Academic Publisher, 11-22.
Davis, M. W. and Ogden, W. C. (1997). QUILT: implementing a large-scale cross-language text retrieval system. In Proceedings of the 20th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Philadelphia, Pennsylvania, United States, July 27 - 31, 1997). N. J. Belkin, A. D. Narasimhalu, P. Willett, and W. Hersh, Eds. SIGIR '97. ACM Press, New York, NY, 92-98. Retrieved: 10/10/05, from ACM Portal.
Demner-Fushman, D. & Oard, D.W. (2003, February) The effect of bilingual term list size on dictionary-based cross-language information retrieval. Paper presented at the Thirty-Sixth Hawaii International Conference on System Sciences (HICSS) (Hawaii, Jan 6-9, 2003).
DePalma, D.A. (July 11, 2012). Microsoft aims to be the machine translation hub of global business. Retrieved from: http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=2908&moduleId=390.
Dervin, B. (1983). An overview of sense-making research: Concepts, methods and results. Paper presented at the Annual Meeting of the International Communication Association. Dallas, TX.
Dewaele, J.M. (2007). Multilinguals' language choice for mental calculation. Intercultural Pragmatics, 4(3), 343-376. doi:10.1515/IP.2007.017.
Diekema, A. R. (2012). Multilinguality in the digital library: a review. The Electronic Library, 30(2), 165-181.
Dyvik, H. (2004). Translations as semantic mirrors: from parallel corpus to wordnet. Language and computers, 49(1), 311-326.
Ecke, P., & Hall, C. J. (2013). Tracking tip-of-the-tongue states in a multilingual speaker: Evidence of attrition or instability in lexical systems?. International Journal of Bilingualism, 17(6), 734-751.
Edmonds, L. A., & Oetting, J. (2013). Correlates and Cross-Linguistic Comparisons of Informativeness and Efficiency on Nicholas and Brookshire Discourse Stimuli in Spanish/English Bilingual Adults. Journal Of Speech, Language & Hearing Research, 56(4), 1298-1313. doi:10.1044/1092-4388(2012/12-0065)
Ellis, D. (1989). A behavioural approach to information retrieval system design. Journal of Documentation, 45(3), 171-212.
Ellis, R. (1994). The study of second language acquisition. Oxford University Press.
Erdelez, S. (1999). Information encountering: It's more than just bumping into information. Bulletin of the American Society for Information Science, 25(3). Accessed from: http://www.asis.org/Bulletin/Feb-99/erdelez.html.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. (2nd ed). Cambridge, MA: MIT press.
Evans, D. A., Handerson, S. K., Monarch, I. A., Pereiro, J., Delon, L., and Hersh, W. R., (1998). Mapping vocabularies using latent semantics. In G. Grefenstette ed. Cross-Language Information Retrieval. Kluwer Academic Publisher, 63-80.
Farradane, J. (1980). Knowledge, information, and information science. Journal of Information Science, 2(2), 75-80.
Fishman, J.A. (1965). Who speaks what language to whom and when? La Linguistique, 1(2), 67-88.
Franz, M., McCarley, J. S., Roukos, S., (1999). Ad Hoc and Multilingual information Retrieval at IBM. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST. 157-168. Retrieved November 23, 2005, from http://trec.nist.gov/pubs/trec7/t7_proceedings.html.
Francis, N. (2012). Bilingual Competence and Bilingual Proficiency in Child Development. Cambridge, Mass: The MIT Press.
Fujii, A., Ishikawa, T. (2000). Applying machine translation to two-stage cross-language information retrieval. Proceedings of the 4th Conference of the Association for Machine Translation in the Americas (AMTA-2000), Oct. 2000, 13-24. Retrieved November 21, 2005, from http://arxiv.org/abs/cs.CL/0011003.
Gao, J. and Nie, J.Y. (2006). A study of statistical models for query translation: finding a good unit of translation. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, USA, 194–201.
Gao, J., Nie, J. Y., Xun, E., Zhang, J., Zhou, M., & Huang, C. (2001, September). Improving query translation for cross-language information retrieval using statistical models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 96-104). ACM.
Gao, J., Zhou, M., Nie, J. Y., He, H., & Chen, W. (2002, August). Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 183-190). ACM.
Gaspari, F. (2004). Online MT services and real users’ needs: An empirical usability evaluation. In Machine Translation: From Real Users to Research (pp. 74-85). Springer Berlin Heidelberg.
Gathercole, V. C. M., & Thomas, E. M. (2009). Bilingual first-language development: Dominant language takeover, threatened minority language take-up. Bilingualism: Language and Cognition, 12(02), 213-237.
Ge, X. (2010). Information-seeking behavior in the digital age: A multidisciplinary study of academic researchers. College & Research Libraries, 71(5), 435-455.
Genesee, F. & Bourhis, R.Y. (1988). Evaluative reactions to language choice strategies: The role of sociostructural factors. Language & Communication, 8(3/4), 229-250.
Georgalidou, M., Kaili, H., & Celtek, A. (2010). Code alternation patterns in bilingual family conversation: A conversation analysis approach. Journal of Greek Linguistics, 10(2), 317-344. doi:10.1163/156658410X531401
Gertken, L. M., Amengual, M., & Birdsong, D. (2014). Assessing language dominance with the Bilingual Language Profile. Measuring L2 proficiency: Perspectives from SLA, 208-225.
Gey, F. C., Kando, N., & Peters, C. (2005). Cross-language information retrieval: The way ahead. Information Processing and Management, 41, 415-431. DOI: 10.1016/j.ipm.2004.06.006.
Gollins, T. and Sanderson, M. (2001). Improving cross language retrieval with triangulated translation. In Proceedings of the 24th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM Press, New York, NY, 90-95. Retrieved: 11/10/05, from ACM Portal.
Gong, Y., Chow, I., & Ahlstrom, D. (2011). Cultural diversity in China: Dialect, job embeddedness, and turnover. Asia Pacific Journal of Management, 28(2), 221. doi:10.1007/s10490-010-9232-6
Goodwin, C. & Heritage, J. (1990). Conversation Analysis. Annual Review of Anthropology, 19, 283-307.
Google (2013, December 10). Google Translator - now in 80 languages. [Blog post]. Retrieved from http://googletranslate.blogspot.com/2013/12/google-translate-now-in-80-languages.html.
Gray, N. J., Klein, J. D., Noyce, P. R., Sesselberg, T. S., & Cantrill, J. A. (2005). Health information-seeking behaviour in adolescence: the place of the internet. Social Science & Medicine, 60(7), 1467-1478.
Greene, K.J., Pena, E.D., & Bedore, L.M. (2012). Lexical choice and language selection in bilingual preschoolers. Child Language Teaching and Therapy, 29(1), 27-39.
Grosjean, F. (1982). Life with two languages: An introduction to bilingualism. Harvard University Press.
Grosjean, F. (1998). Studying bilinguals: Methodological and conceptual issues. Bilingualism: Language and cognition, 1(02), 131-149.
Grosjean, F. (2008). Studying bilinguals. Oxford University Press.
Grosjean, F. (2012). Bilingual: Life and Reality. Cambridge, MA: Harvard University Press.
Hakuta, K., & d'Andrea, D. (1992). Some properties of bilingual maintenance and loss in Mexican background high-school students. Applied Linguistics, 13(1), 72-99.
Hamers, J. F., & Blanc, M. H. (2000). Bilinguality and Bilingualism. Cambridge University Press.
He, D., Oard, D. W., & Plettenberg, L. (2006). Studying the use of interactive multilingual information retrieval. In Proceedings of the Workshop on New Directions in Multilingual Information Access, pp. 53-60. ACM-SIGIR 2006, Seattle, Washington, USA.
He, D., Wang, J., Oard, D. W., & Nossal, M. (2002, September). Comparing user-assisted and automatic query translation. In Workshop of the Cross-Language Evaluation Forum for European Languages (pp. 400-415). Springer Berlin Heidelberg.
He, D. & Wu, D. (2008). Translation enhancement: a new relevance feedback method for cross-language information retrieval. In Proceedings of the 17th ACM conference on Information and knowledge management (CIKM '08). ACM, New York, NY, USA, 729-738. DOI: 10.1145/1458082.1458180
Heller, M. (1992). The politics of codeswitching and language choice. Journal of Multilingual & Multicultural Development, 13(1-2), 123-142.
Hansen, P., Liu, C., & Zhang, P. (2016). I Need More Time!: The Influence of Native Language on Search Behavior and Experience. CLEF.
Herbert, B., Szarvas, G., & Gurevych, I. (2011). Combining query translation techniques to improve cross-language information retrieval. In Advances in Information Retrieval (pp. 712-715). Berlin, Germany: Springer Berlin Heidelberg.
Hiemstra, D. and de Jong, F., (1999). Disambiguation strategies for cross-language information retrieval. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, 274-293. Retrieved November 23, 2005, from http://citeseer.ist.psu.edu/hiemstra99disambiguation.html.
Hiemstra, D. and Kraaij, W. (1999). Twenty-One at TREC-7: Ad-hoc and cross-language track. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST. Retrieved November 27, 2005, from http://citeseer.ist.psu.edu/82362.html
Hiemstra, D., Kraaij, W., Pohlmann, R., and Westerveld, T. (2000). Twenty-One at CLEF-2000: Translation resources, merging strategies and relevance feedback. In Working Notes for CLEF Workshop. Retrieved November 27, from http://clef.isti.cnr.it/DELOS/CLEF/Notes.html.
Hong, W. (2011). A descriptive user study of bilingual information seekers searching for online information to complete four tasks. (Unpublished doctoral dissertation). University of Pittsburgh. Pittsburgh, PA.
Hopp, H., & Schmid, M. S. (2013). Perceived foreign accent in first language attrition and second language acquisition: The impact of age of acquisition and bilingualism. Applied Psycholinguistics, 34(02), 417-417.
Hua, Z. (2008). Duelling languages, duelling values: Codeswitching in bilingual intergenerational conflict talk in diasporic families. Journal of Pragmatics, 40, 1799-1816.
Hughes, H. (2005). Actions and reactions: Exploring international students' use of online information resources. Australian Academic & Research Libraries, 36(4), 169-179.
Hull, D. A. and Grefenstette, G. (1996). Querying across languages: a dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Zurich, Switzerland, August 18 - 22, 1996). SIGIR '96. ACM Press, New York, NY, 49-57. Retrieved: 10/4/05, from ACM Portal.
Hupfer, M. E., & Detlor, B. (2006). Gender and Web information seeking: A self‐concept orientation model. Journal of the American Society for Information Science and Technology, 57(8), 1105-1115.
Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: elements of a cognitive IR theory. Journal of Documentation, 52(1), 3-50.
Ianos, M. A., Huguet, À., Janés, J., & Lapresta, C. (2017). Can language attitudes be improved? A longitudinal study of immigrant students in Catalonia (Spain). International Journal of Bilingual Education and Bilingualism, 20(3), 331-345.
Ide, N., Erjavec, T., & Tufis, D. (2002, July). Sense discrimination with parallel corpora. In Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, vol. 8 (pp. 61-66). Association for Computational Linguistics.
Inside Google Translate (n.d.). Retrieved from http://translate.google.com/about/intl/en_ALL/.
Jansen, B.J., Booth, D.L., & Spink, A. (2008). Determining the informational, navigational, and transactional intent of Web queries. Information Processing and Management, 44, 1251-1266.
Jansen, B. & Spink, A. (2006). How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing and Management, 42(1), 248-263.
Johansson, S. (2007). On the role of corpora in cross-linguistic research. In S. Johansson (ed.), Seeing through multilingual corpora, 3–24. Amsterdam/Philadelphia: John Benjamins.
Kaushanskaya, M., Gross, M., & Buac, M. (2014). Effects of classroom bilingualism on task‐shifting, verbal memory, and word learning in children. Developmental science, 17(4), 564-583.
Kasatkina, N. (2010). Analyzing language choice among Russian-speaking immigrants to the United States. (Doctoral dissertation). Retrieved from The University of Arizona Campus Repository. (http://hdl.handle.net/10150/193622)
Keegan, T. T., & Cunningham, S. J. (2008). What a difference a default setting makes. In Research and Advanced Technology for Digital Libraries (pp. 264-267). Springer Berlin Heidelberg.
Kelly, D. (2006). Measuring online information seeking context. Part 1: Background and method. Journal of the American Society for Information Science and Technology, 57(13), 1729-1739.
Kimura, F., Maeda, A., Uemura, S. (2004). CLIR using Web directory at NTCIR4. In Working Notes of the Fourth NTCIR Workshop Meeting (Tokyo, Japan, June 2-4, 2004). Retrieved November 27, 2005, from http://research.nii.ac.jp/ntcir-ws4/NTCIR4-WN/CLIR/NTCIR4WN-CLIR-KimuraF.pdf.
Kishida, K. (2005). Technical issues of cross-language information retrieval: A review. Information Processing & Management: an International Journal, 41(3), 433-455.
Kishida, K. & Ishita, E. (2009). Translation disambiguation for cross-language information retrieval using context-based translation probability. Journal of Information Science, 35(4), 481-495.
Kishida, K. and Kando, N. (2005). Hybrid approach of query and document translation with pivot for cross-language information retrieval. In Working Notes for the CLEF 2005 Workshop (Vienna, Austria, September 21-23, 2005). Retrieved November 27, 2005, from http://www.clef-campaign.org/2005/working_notes/CLEF2005WN-Contents1.htm.
Klavans, J., Hovy, E., Fluhr, C., Frederking, R., Oard, D., Okumura, A., Ishikawa, K., & Satoh, K. (2001). Multilingual (or cross-lingual) information retrieval. In E. Hovy, N. Ide, R. Frederking, J. Mariani, & A. Zampolli (Eds.) Multilingual Information Management: Current Levels and Future Abilities (pp. 35-56). http://www.cs.cmu.edu/~ref/mlim/chapter2.html.
Klein, D., Mok, K., Chen, J. K., & Watkins, K. E. (2013). Age of language learning shapes brain structure: A cortical thickness study of bilingual and monolingual individuals. Brain and language. DOI: 10.1016/j.bandl.2013.05.014.
Knight, C., & Studdert-Kennedy, M. (2000). The evolutionary emergence of language: social function and the origins of linguistic form. Cambridge University Press.
Kraaij, W. (2001). Comparing translation resources. In Proceedings of the CLEF-2001 Workshop. Retrieved November 22 2005, from http://citeseer.ist.psu.edu/kraaij01tno.html.
Kraaij, W., Nie, J., & Simard, M. (2003). Embedding Web-based statistical translation models in cross-language information retrieval. Computational Linguistics, 29(3), 281-419.
Kralisch, A. (2005) The impact of culture and language on the use of the Internet: Empirical analysis of behaviour and attitudes. (Doctoral dissertation). Retrieved from http://edoc.hu-berlin.de/dissertationen/kralisch-anett-2005-12-16/PDF/kralisch.pdf .
Kralisch, A., & Berendt, B. (2005). Language-sensitive search behaviour and the role of domain knowledge. New Review of Hypermedia and Multimedia, 11(2), 221-246.
Kuhlthau, C.C. (1991). Inside the search process: Information seeking from the user's perspective. Journal of the American Society for Information Science, 42(5), 361-371.
Kuhlthau, C.C. (1993). A principle of uncertainty for information seeking. Journal of Documentation, 49(4), 339–355.
Kuhlthau, C.C. (2005). Kuhlthau's information search process. In K.E. Fisher, S. Erdelez, & L.E.F. McKechnie (Eds), Theories of Information Behavior, 230-234.
Larkey, L. S. and Connell, M. E. (2005). Structured queries, language modeling, and relevance modeling in cross-language information retrieval. Information Processing and Management: an International Journal, vol. 41, no. 3, 457-473. Retrieved 11/4/05, from Elsevier.
Larson, R. R., Gey, F., and Chen, A. 2002. Harvesting translingual vocabulary mappings for multilingual digital libraries. In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (Portland, Oregon, USA, July 14 - 18, 2002). JCDL '02. ACM Press, New York, NY, 185-190.
Lavrenko, V., Choquette, M., and Croft, W. B. (2002). Cross-lingual relevance models. In Proceedings of the 25th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Tampere, Finland, August 11 - 15, 2002). SIGIR '02. ACM Press, New York, NY, 175-182. Retrieved: 11/4/05, from ACM Portal.
Lehtokangas, R., Airio, E., and Järvelin, K. (2004). Transitive dictionary translation challenges direct dictionary translation in CLIR. Information Processing and Management: an International Journal, vol. 40, no. 6, 973-988. Retrieved 10/17/05, from Elsevier.
Levow, G., Oard, D., & Resnik, P. (2005). Dictionary-based techniques for cross-language information retrieval. Information Processing & Management, 41, 523-547.
Li, H. (2008). The role of L1 use in L2 writing process by Chinese EFL students: Six cases of non-English majors. Journal of Cambridge Studies, 3(2), 25-28.
Li, Y., & Belkin, N. J. (2010). An exploration of the relationships between work task and interactive information search behavior. Journal of the American Society for information Science and Technology, 61(9), 1771-1789.
Lim, V. P., Liow, S. J. R., Lincoln, M., Chan, Y. H., & Onslow, M. (2008). Determining language dominance in English–Mandarin bilinguals: Development of a self-report classification tool for clinical use. Applied Psycholinguistics, 29(03), 389-412.
Littman, M.L., Dumais, S.T. and Landauer, T.K, (1998). Automatic Cross-language Information Retrieval using Latent Semantic Indexing. In Grefenstette, G. ed. Cross Language Information Retrieval, Kluwer Academic Publishers, 51-62.
Longman Dictionary of Language Teaching and Applied Linguistics (4th ed.). (2010). Great Britain: Pearson Education Limited.
Lu, W., Chien, L., and Lee, H. (2004). Anchor text mining for translation of Web queries: A transitive translation approach. ACM Transaction on Information Systems, vol. 22, no. 2, 242-269. Retrieved at: 10/14/05, from ACM Portal.
Macnamara, J. (1967). The bilingual's linguistic performance—a psychological overview. Journal of Social Issues, 23(2), 58-77.
MacSwan, J. (2012). Code-Switching and Grammatical Theory. In T.K. Bhatia & W.C. Ritchie (Eds). The Handbook of Bilingualism and Multilingualism (pp. 323-350). Chichester, UK: Blackwell Publishing.
Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing Language Profiles in Bilinguals and Multilinguals. Journal of Speech Language and Hearing Research, 50(4), 940-967. doi:10.1044/1092-4388(2007/067)
Marlow, J., Clough, P., Recuero, J. C., & Artiles, J. (2008). Exploring the Effects of Language Skills on multilingual Web search. Proceedings of the IR research, 30th European conference on Advances in information retrieval (ECIR'08) (pp. 126-137). Berlin, Heidelberg: Springer-Verlag.
McCarley, J. S. (1999). Should we translate the documents or the queries in cross-language information retrieval? In Proceedings of the 37th Annual Meeting of the Association For Computational Linguistics on Computational Linguistics (College Park, Maryland, June 20 - 26, 1999). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 208-214. Retrieved November 10, 2005, from http://acl.ldc.upenn.edu//P/P99/P99-1027.pdf.
McNamee, P. and Mayfield, J. (2002). Comparing cross-language query expansion techniques by degrading translation resources. In Proceedings of the 25th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Tampere, Finland, August 11 - 15, 2002). SIGIR '02. ACM Press, New York, NY, 159-166. Retrieved 11/22/05, from ACM Portal.
McNamee, P. and Mayfield, J. (2004). Cross-language retrieval using HAIRCUT for CLEF 2004. In Working Notes for the CLEF 2004 Workshop (Bath, United Kingdom, September, 2004). CLEF-2004. Retrieved November 23, 2005, from http://www.clef-campaign.org/2004/working_notes/WorkingNotes2004/04.pdf.
Meuter, R. F., & Allport, A. (1999). Bilingual language switching in naming: Asymmetrical costs of language selection. Journal of Memory and Language, 40(1), 25-40.
Mills, J. (2001). Being bilingual: Perspectives of third generation Asian children on language, culture and identity. International Journal of Bilingual Education and Bilingualism, 4(6), 383-402.
Moore, D. S., McCabe, G. P., & Craig, B. A. (2009). Introduction to the practice of statistics. New York: WH Freeman.
Morel, E., Bucher, C., Pekarek-Doehler, S., & Siebenhaar, B. (2012). SMS communication as plurilingual communication: Hybrid language use as a challenge for classical code-switching categories. Lingvisticae Investigationes, 35(2), 260-288.
Moschkovich, J. (2005). Using two languages when learning mathematics. Educational Studies in Mathematics, 64, 121-144. DOI: 10.1007/s10649-005-9005-1.
Most common languages used on the internet as of June 2016, by share of internet users (June, 2016). Retrieved from: https://www.statista.com/statistics/262946/share-of-the-most-common-languages-on-the-internet/.
Myers-Scotton, C. (1998). A theoretical introduction to the markedness model. Codes and consequences: Choosing linguistic varieties, 18-38.
Ng, H. T., Wang, B., & Chan, Y. S. (2003, July). Exploiting parallel texts for word sense disambiguation: An empirical study. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol 1 (pp. 455-462). Association for Computational Linguistics.
Nie, J. Y. (2010). Cross-language information retrieval. Synthesis Lectures on Human Language Technologies, 3(1), 1-125.
Nie, J. Y., and Cai, J. (2001). Filtering noisy parallel corpora of Web pages. In IEEE Symposium on Natural Language Processing and Knowledge Engineering (Tucson, AZ, October), 453-458.
Nie, J. Y., Simard, M., Isabelle, P., and Durand, R. (1999). Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Berkeley, California, United States, August 15 - 19, 1999). SIGIR '99. ACM Press, New York, NY, 74-81. Retrieved: 10/17/05, from ACM Portal.
Nilep, C. (2006). Code switching in sociocultural linguistics. Colorado Research in Linguistics, 19(1), 1-22.
Nzomo, P., Rubin, V. L., & Ajiferuke, I. (2012, February). Multi-lingual information access tools: user survey. In Proceedings of the 2012 iConference (pp. 530-532). ACM.
Qahfarokhi, A.S. & Biria, R. (2012). The impact of task difficulty and language proficiency on Iranian EFL learner's code-switching in writing. Theory and Practice in Language Studies, 2(3), 572-578.
Oard, D. W. (1998). A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval. In Proceedings of the Third Conference of the Association For Machine Translation in the Americas on Machine Translation and the information Soup (October 28 - 31, 1998). D. Farwell, L. Gerber, and E. H. Hovy, Eds. Lecture Notes in Computer Science, vol. 1529. Springer-Verlag, London, 472-483. Retrieved November 23, 2005, from http://citeseer.ist.psu.edu/oard98comparative.html.
Oard, D. W. (2009). Multilingual information access. In Bates, M. J., and Maack, M. N. (Eds), Encyclopedia of Library and Information Sciences, 3rd Ed., Taylor and Francis.
Och, F. (2007, May 23). Search without boundaries. [Blog post]. Retrieved from: http://googleblog.blogspot.com/2007/05/search-without-boundaries.html.
Pecina, P., Dušek, O., Goeuriot, L., Hajič, J., Hlaváčová, J., Jones, G. J., ... & Popel, M. (2014). Adaptation of machine translation for multilingual information retrieval in the medical domain. Artificial intelligence in medicine, 61(3), 165-185.
Peinado, V., Artiles, J., Gonzalo, J., Barker, E., & López-Ostenero, F. (2008, September). FlickLing: a multilingual search interface for Flickr. In Working Notes for the CLEF 2008 Workshop, Aarhus, Denmark.
Peinado, V., Rodrigo, Á., & López-Ostenero, F. (2013). Multilingual Information Access. Emerging Applications of Natural Language Processing: Concepts and New Research, 203. DOI: 10.4018/978-1-4666-2169-5.ch009
Peters, C., Braschler, M., & Clough, P. (2012). Multilingual Information Retrieval: From Research to Practice. Heidelberg, Germany: Springer.
Peters, C., Clough, P., Gey, F. C., Karlgren, J., & Magnini, B. (Eds.). (2007). Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006, Revised Selected Papers. Springer.
Peters, C., & Sheridan, P. (2001). Multilingual information access. In Lectures on Information Retrieval (pp. 51-80). Springer Berlin Heidelberg.
Petrelli, D., Beaulieu, M., Sanderson, M., Demetriou, G., Herring, P., & Hansen, P. (2004). Observing users, designing clarity: A case study on the user-centered design of a cross-language information retrieval system. Journal of the American Society for Information Science and Technology, 55(10), 923-934.
Picchi, E., Peters, C. (1998). Cross-language information retrieval: A system for comparable corpus querying. In G. Grefenstette ed. Cross-Language Information Retrieval. Kluwer Academic Publishers, Norwell, MA, 81-92.
Pirkola, A. (1998). The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM Press, New York, NY, 55-63. Retrieved: 10/4/05, from ACM Portal.
Pirkola, A., Hedlund, T., Keskustalo, H., and Järvelin, K. (2001). Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings. Information Retrieval 4, 3-4 (Sep. 2001), 209-230. Retrieved November 21, 2005, from http://www.info.uta.fi/tutkimus/fire/archive/dictionary_based.pdf.
Pirkola, A., Cosijn, E., Bothma, T., Nel, J. (2002). Cross-lingual information access in indigenous languages: a case study in Zulu language. In Emerging frameworks and methods, Proceedings of the Fourth International Conference on Conceptions of Library and Information Science, CoLIS4, Seattle, USA, 21 - 25 July 2002. Retrieved November 29, 2005, from http://ucdata.berkeley.edu:7101/sigir-2002/sigir2002CLIR-10-pirkola.pdf.
Ponte, J. M. and Croft, W. B. (1998). A language modeling approach to information retrieval. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM Press, New York, NY, 275-281. Retrieved 11/10/05, from ACM Portal.
Potthast, M., Stein, B., & Anderka, M. (2008). A Wikipedia-based multilingual retrieval model. Lecture Notes in Computer Science, 4956, 522-530.
Prochasson, E. & Fung, P. (2011) Rare word translation extraction from aligned comparable documents. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 1327-1335.
Purwarianti, A., Tsuchiya, M., & Nakagawa, S. (2007). Indonesian-Japanese transitive translation using English for CLIR. Information and Media Technologies, 2(2), 612-640.
Qi, D. S. (1998). An inquiry into language-switching in second language composing processes. Canadian Modern Language Review/La Revue Canadienne des Langues Vivantes, 54(3), 413-435.
Ramirez, J.M.P. (2012). Language switching: A qualitative clinical study of four second language learners' composing processes. (Unpublished doctoral dissertation). University of Iowa.
Rao, V. S. and Varma, V. (2010) User behavior in a multilingual information access task. Centre for Search and Information Extraction Lab, International Institute of Information Technology, Report No: IIIT/TR/2010/30. Hyderabad: India.
Reid, S. A., & Wood, V. V. (2013). An Empirical Examination of the Relationship between Bilingual Acculturation, Cultural Heritage to Identity, and Self-Esteem. National Social Science Journal, 40(2), 94-99.
Resnik, P. and Smith, N. A. (2003). The Web as a parallel corpus. Computational Linguistics, vol. 29, no. 3, 349-380. Retrieved October 14, 2005, from http://nlp.cs.jhu.edu/~nasmith/webascorpus.pdf.
Rezaei, S. H. S., & Gheitanchian, M. (2008, December). Code Mixing or Code Switching? A case study: Native Speakers of Turkish in Farsi Production. In Global Practices of Language Teaching: Proceedings of the 2008 International Online Language Conference (IOLC 2008) (p. 61-67). Universal-Publishers.
Rieh, H. Y., & Rieh, S. Y. (2005). Web searching across languages: Preference and behavior of bilingual academic users in Korea. Library & Information Science Research, 27(2), 249-263.
Rieh, S. Y. (2004). On the Web at home: Information seeking and Web searching in the home environment. Journal of the American Society for Information Science and Technology, 55(8), 743-753.
Rogati, M. and Yang, Y. (2004). Resource selection for domain-specific cross-lingual IR. In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 154-161. Retrieved on 10/17/05, from ACM Portal.
Ruiz, M. E., & Chin, P. (2010). Users’ image seeking behavior in a multilingual tag environment. In Multilingual Information Access Evaluation II. Multimedia Experiments (pp. 37-44). Springer Berlin Heidelberg.
Russell, D.M. (April 23, 2013). When to use "Translated foreign pages"? Retrieved from: http://searchresearch1.blogspot.com/2013/04/ramong-writes-in-with-great-question-i.html.
Russell, D. M., & Grimes, C. (2007, January). Assigned tasks are not the same as self-chosen Web search tasks. In System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference (pp. 83-83). IEEE.
Sadat, F. (2010). Using comparable corpora to improve the effectiveness of cross-language information retrieval. In H. Loftsson, E. Rognvaldsson, & S. Helgadottir (Eds). 7th International Conference on NLP, Paper presented at IceTAL 2010, Reykjavik, Iceland, August 16-18, 2010.
Sabahat, P. (2013). A study on reasons for code-switching in Facebook by Pakistani Urdu English bilinguals. Language in India, 13(11), 564-590.
Saracevic, T. (1997). The stratified model of information retrieval interaction: Extension and applications. Proceedings of the American Society for Information Science, 34, 313-327.
Saralegi, X. & de Lacalle, M.L. (2010). Estimating translation probabilities from the Web for structured queries on CLIR. In C. Gurrin, Y. He, G. Kazai, U. Kruschwitz, S. Little, T. Roelleke, S. Ruger, and K. van Rijsbergen (Eds) Advances in Information Retrieval: 32nd European Conference on IR Research, 586-589, presented at ECIR 2010, Milton Keynes, UK, March 28-31, 2010.
Saville-Troike, M. (2008). The Ethnography of Communication: An Introduction (3rd Ed). Oxford, UK: John Wiley & Sons.
Savolainen, R. (1995). Everyday life information seeking: Approaching information seeking in the context of “way of life”. Library & Information Science Research, 17(3), 259-294.
Savoy, J. and Dolamic, L. (2009) How effective is Google's translation service in search? Communications of the ACM, 52(10), 139-145.
Schäfer, R., & Bildhauer, F. (2012). Building large corpora from the Web using a new efficient tool chain. In LREC (pp. 486-493).
Schmid, M. S. (2010). Languages at play: The relevance of L1 attrition to the study of bilingualism. Bilingualism: Language and Cognition, 13(1), 1-7.
Schwartz, B. (2013, May 23). Google Drops “Translated Foreign Pages” Search Option Due to Lack of Use. Search Engine Land.
Scotton, C. M., & Ury, W. (1977). Bilingual strategies: The social functions of code-switching. International Journal of the Sociology of Language, 1977(13), 5-20.
Shakery, A., & Zhai, C. (2013). Leveraging comparable corpora for cross-lingual information retrieval in resource-lean language pairs. Information Retrieval, 16(1), 1-29.
Shannon, C. (1948). A Mathematical Theory of Communication. Bell System Technical Journal 27, 379-423, 623-656.
Sheridan, P. and Ballerini, J. P. (1996). Experiments in multilingual information retrieval using the SPIDER system. In Proceedings of the 19th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Zurich, Switzerland, August 18 - 22, 1996). SIGIR '96. ACM Press, New York, NY, 58-65. Retrieved on 10/4/05, from ACM Portal.
Smith-Christmas, C. (2012). I've lost it here dè a bh' agam: Language shift, maintenance, and code-switching in a bilingual family. (Unpublished doctoral dissertation). Retrieved from Glasgow Theses Service. (glathesis:2012-3798)
Soliman, A. (2008). The changing role of Arabic in religious discourse: A sociolinguistic study of Egyptian Arabic. (Unpublished doctoral dissertation). Indiana University of Pennsylvania, PA.
Somers, H. (1999). Review article: Example-based machine translation. Machine Translation, 14(2), 113-157.
Sperlich, W. B. (2005). Will Cyberforums Save Endangered Languages? A Niuean Case Study. International Journal Of The Sociology Of Language, 2005(172), 51-77.
Srinivasarao, V. (2010) Mining the behaviour of users in a multilingual information access task. In: CLEF 2008 Workshop Notes, Aarhus, Denmark, September 17-19 (2008).
Sterling, G. (2007, May 24). Google Launches ‘Cross-Language Information Retrieval (CLIR)’. Retrieved May 29, 2013, from Search Engine Land: http://searchengineland.com/google-launches-cross-language-information-retrieval-clir-11296
Suarez, D. (2002). The paradox of linguistic hegemony and the maintenance of Spanish as a heritage language in the United States. Journal of Multilingual and Multicultural Development, 23(6), 512-530.
Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3), 178-194.
Tomala, A. M. (2016). The Taiwanese linguistic mosaic. Multilingual, 27(7), 39-43.
Ture, F., Lin, J., & Oard, D. W. (2012). Combining statistical translation techniques for cross-language information retrieval. Proceedings of COLING 2012: Technical Papers (Mumbai, December 2012), 2685-2702.
Van Heuven, W. J., & Dijkstra, T. (2010). Language comprehension in the bilingual brain: fMRI and ERP support for psycholinguistic models. Brain research reviews, 64(1), 104-122.
Wang, L. (2003). Switching to first language among writers with differing second-language proficiency. Journal of Second Language Writing, 12(4), 347-375. DOI: 10.1016/j.jslw.2003.08.003.
White, R. W., & Drucker, S. M. (2007). Investigating behavioral variability in Web search. Paper presented at WWW2007. Banff, Alberta, Canada. May 8–12, 2007 (pp. 21-30).
Wilson, T. D. (1997). Information behaviour: an interdisciplinary perspective. Information Processing & Management, 33(4), 551-572.
Wilson, T.D. (2005). Evolution in information behavior modeling: Wilson's model. In K.E. Fisher, S. Erdelez, & L.E.F. McKechnie (Eds), Theories of Information Behavior, 31-36.
Wilson, T.D., Ford, N., Ellis, D., & Foster, A. (2002). Information seeking and mediated searching: Part 2. Uncertainty and its correlates. Journal of the American Society for Information Science and Technology, 53(9), 704-715.
Woodall, B.R. (2002). Language-switching: Using the first language when writing in a second language. Journal of Second Language Writing, 11(1), 7-28.
World Internet Users Statistics and 2016 World Population Stats. (2016, June 30). Retrieved February 02, 2017, from http://www.internetworldstats.com/stats.htm
Wu, D., He, D., & Luo, B. (2012). Multilingual needs and expectations in digital libraries: A survey of academic users with different languages. The Electronic Library, 30(2), 182-197.
Xu, J. and Croft, W. B. (2000). Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems. 18, 1 (Jan. 2000), 79-112. Retrieved: 11/4/05, from ACM Portal.
Xu, J., and Weischedel, R. (2000). Cross-lingual information retrieval using Hidden Markov models. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong, October 7-8, 2000. Retrieved November 23, 2005, from http://acl.ldc.upenn.edu/W/W00/W00-1312.pdf.
Xu, J. and Weischedel, R. (2005). Empirical studies on the impact of lexical resources on CLIR performance. Information Processing and Management, vol. 41, no. 3, 475-487. Retrieved 11/17/05, from Elsevier.
Xu, J., Weischedel, R., and Nguyen, C. (2001). Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and Development in Information Retrieval, 105-110. Retrieved: 10/17/05, from ACM Portal.
Ye, Z., Huang, J. X., He, B., & Lin, H. (2012). Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval. Journal of the American Society for Information Science and Technology, 63(12), 2474-2487.
Yin, R. K. (2014). Case study research: Design and methods. Sage Publications.
Yip, V., & Matthews, S. (2006). Assessing language dominance in bilingual acquisition: A case for mean length utterance differentials. Language Assessment Quarterly: An International Journal, 3(2), 97-116.
Yuexiao, Z. (1988). Definitions and sciences of information. Information Processing & Management, 24(4), 479-491.
Zarei, G.R. & Amiryousefi, M. (2011). A study of L2 composing task: An analysis of conceptual and linguistic activities and text quality. Procedia – Social and Behavioral Sciences, 30, 437-441.
Zhang, Y. and Vines, P. (2004). Using the web for automated translation extraction in cross-language information retrieval. In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 162-169.
Zhou, D., Truran, M., Brailsford, T., & Ashman, H. (2008). A hybrid technique for English-Chinese cross language information retrieval. ACM Transactions on Asian Language Information Processing, 7(2).
Zhou, D., Truran, M., Brailsford, T., Wade, V., & Ashman, H. (2012). Translation techniques in cross-language information retrieval. ACM Computing Surveys (CSUR), 45(1), 1-51.
Appendix I. Participant Recruitment Letter
Hello,
My name is Peishan Bartley, and I am a PhD student in the Graduate School of Library and Information Science at Simmons College. I would like to invite you to participate in my research study about the language preferences of Chinese-English bilingual speakers when they search for information online. Participants will be asked to choose between Chinese and English versions of articles and to answer a few interview questions afterward. Participants will remain anonymous, and no personally identifying information will be collected. The study will take 10 to 15 minutes. If you use both Chinese and English every day and would like to participate, please contact me at: [email protected].
Thank you for your consideration.
Sincerely,
Peishan Tsai Bartley
您好, 我是 Simmons College 图书资讯科学系的博士生. 在此邀请您填写一份研究问卷. 研究题材是中英双语使用者在使用网路时如何在中英文之间做选择. 我一直对自己如何在双语之间做取舍十分好奇. 您呢? 本问卷应该十至十五分钟之内即可完成. 您能提供的任何帮助我都由衷的感谢. 若您知道有其他会有兴趣参与研究的中英双语使用者, 请将邀请他们也填写问卷. 谢谢您的帮忙. 问卷网址: http://web.simmons.edu/~tsai.
蔡佩珊 (Peishan Tsai Bartley)
Simmons College School of Library and Information Science
Appendix II. Informed Consent Form
Informed Consent Form English Version
Chinese-English Bilingual Speaker Language Preference Study
Before the study begins, please carefully read the following overview of the study and what is required of you. If you are 18 years old or older and agree to continue, please provide your name at the bottom of the page. Your name will be used only as a digital signature to signify your agreement to participate in the study. It will not be used in any part of the study. If you have any questions, please contact Peishan Bartley at [email protected]. If you have questions about your rights as a research subject, please contact Valerie Beaudrault, Human Protections Administrator in the Office of Sponsored Programs of Simmons College, at 617-521-2415. Thank you!
Purpose
The purpose of this study is to explore how and why users select between Chinese and English when they are looking for information on the World Wide Web.
Procedure
The study contains three sections:
1. Demographic and language background survey: a survey that collects information on your language history and use.
2. Article language selection exercise: eight articles will be presented to you one at a time in both English and Chinese. You will be asked to select the language version that you prefer.
3. Language preference and selection process survey: a questionnaire that asks about your reasoning and motivation during the article selection exercise.
The study should not take more than 20 minutes to complete.
The objective of this study is to collect your views and responses. There are no right or wrong answers. Please answer truthfully and respond intuitively. No judgments are made on your answers.
Confidentiality
Your participation is voluntary. If any of the questions make you uncomfortable, you can withdraw from the study at any time.
Furthermore, your participation in this research is confidential. Every precaution will be taken to protect your privacy and the confidentiality of the records and data pertaining to you. In the event of a publication or presentation resulting from the research, no personally identifiable information will be shared. Any recordings made during the study are accessible to the researcher only. Once the study is complete, the recordings will be destroyed.
If you have any questions during the study or experience any problems with the survey, please don't hesitate to contact the researcher: Peishan Bartley at [email protected].
Clicking on the continue button would signify that you have read and agreed to the above statements:
continue to the study
Informed Consent Form Chinese Version
中英雙語使用者語文選擇研究
謝謝您的參與. 在開始之前, 請仔細閱讀本研究大綱和對參與者的要求. 若您已年滿 18 歲並願意繼續進行, 請在本頁下方提供您的姓名. 您的名字僅代表您閱讀了本文並同意參與, 不會被包含於研究內容之內. 任何可辨識您身分之紀錄與您個人隱私資料皆被視為機密處理, 不會公開. 若您有任何疑問, 請聯絡蔡佩珊 (Peishan Bartley): [email protected]. 若您對身為參與者的權利有任何疑問, 請聯絡 Simmons College, Human Protections Administrator in the Office of Sponsored Programs, Valerie Beaudrault (617-521-2415). 謝謝.
目的
本研究目的在探索中英雙語使用者在網路上如何選用語言, 在選擇時有哪些考量.
程序
本研究有三部分:
1. 使用者語文背景調查: 基本語文學習及使用習慣.
2. 中英雙語對照文章選擇: 八篇文章將一一以中英對照方式呈現. 請您在中文和英文版本中選擇偏好閱讀的版本.
3. 語文選擇考量調查: 此部分將問您在文章選擇時的考量和原因.
整個過程可在二十分鐘內完成.
本研究用意在於收集您的意見及想法, 沒有所謂正確答案, 因此請您忠實的依直覺反應回答問題. 您的答案和其他參與者的答案會被綜合起來一併分析, 不會被單獨評斷.
保密原則
您對本研究是自願參與. 在問卷過程中若有任何疑慮, 您可以隨時退出.
如上文所列, 您的參與及資料皆會被視為機密處理. 所有的原始資料將被審慎保管. 在本研究論文完成後, 所有收集的資訊將被銷毀.
若問卷過程中有任何疑問或發生甚麼問題, 請聯絡研究生蔡佩珊 [email protected].
若您閱讀上列相關資訊, 經過考量後同意參與本研究, 請按鍵:
繼續前往問卷調查
Appendix III. Demographic and Language Skill Questionnaire
English Version
2 The two languages listed here correspond to the user's entry of the languages they know in question 4.
Chinese Version
3 The two languages listed here correspond to the user's entry of the languages they know in question 4.
Appendix V. Interview Script
1. Was it difficult or was it easy to select between the two language versions? Why?
在中文與英文中做選擇, 對您來說很困難還是很簡單? 為什麼?
2. When you looked at the articles in general, is there a language version you prefer? Why?
一般說來, 當您在看這些文章時, 您比較偏好中文版或英文版? 為什麼?
3. Are there any articles that are especially interesting to you? Which ones and why?
回顧剛才文章的題材, 有哪些是您特別感興趣的?
4. Reviewing the results, there are articles for which you chose the Chinese version over the English version. Please explain why.
回顧您剛才做的選擇, 這些文章您選擇閱讀中文版本. 請您解釋一下您在兩語言之間做選擇時的考量?
5. Reviewing the results, there are articles for which you chose the English version over the Chinese version. Please explain why.
回顧您剛才做的選擇, 這些文章您選擇閱讀英文版本. 請您解釋一下您在兩語言之間做選擇時的考量?
6. Are there other thoughts or considerations on language selection that you care to share with this researcher?
請問您對語言選擇有沒有其它的想法及考量可以分享?
7. Reflecting on the survey process, were there any thoughts and suggestions you can share?
回顧整個研究過程, 您有甚麼建議及想法可以跟研究者分享的呢?
Appendix VI. Post Article Selection Questionnaire
English Version with Simulated Article Selection Result
Chinese Version with Simulated Article Selection Result
Appendix VII. Variables represented in the survey items
The survey is based on the LEAP-Q developed by Marian et al. (2007). Questions were added about the user's Internet use frequency and patterns. The table below lists each survey question and the research variable it addresses: language exposure (exposure), language attitude (attitude, which includes dominance and preference), language fluency (fluency), frequency of language use (frequency), and familiarity with subject matter (subject).
Item | Survey Question | Research Question and Variable
1 | What is your gender | Basic demographic information
2 | In what year were you born | Basic demographic information
3 | In what year did you move to the US | Language history: exposure
4 | Please list all the languages you know and use. | Language profile: dominance, attitude
5 | What is the order in which you learned the languages? | Language history: attitude, frequency
6 | Please list the percentage of time you are exposed to each language on a daily basis. | Language profile: exposure
7 | When you can choose a language to speak with another person who is fluent in every language that you speak, what are the percentages that you would choose to speak in each language? | Language preference: attitude
8 | If you are presented with a document with unknown content written in a language that you do not know, what are the percentages that you would choose to translate it into each of the following languages? | Language preference: attitude
9 | Please name the cultures that you identify with and rate the extent to which you identify with each of them. | Language profile: attitude
10 | Please roughly estimate the amount of time you have spent in the following environments for as long as you have lived. | Language history: exposure
11 | How long have you been using English daily? | Language history: exposure, frequency
12 | How would you describe your English reading ability? | Language profile: fluency
13 | In general, when do you use English more than Chinese? | Language preference: attitude
14 | How long have you been using Chinese daily? | Language history: exposure, frequency
15 | How would you describe your Chinese reading ability? | Language profile: fluency
16 | In general, when do you use Chinese more than English? | Language preference: attitude
17 | Generally speaking, which language do you prefer and why. | Language preference: attitude
18 | On average, how many hours do you spend on the Internet each day? | Internet use: frequency
19 | How frequently do you use the Internet for the following activities in each language? | Language use on the Internet: frequency, attitude
20 | When you are online, how do you decide what language to use? What are your decision-making criteria? | Language use on the Internet: attitude
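The item-to-variable mapping in the table above can also be held as a small lookup structure, which makes it easy to list the survey items that bear on each variable. The Python sketch below is illustrative only: it transcribes a subset of the rows, and the names ITEM_VARIABLES and items_by_variable are hypothetical, not part of the study's instruments or analysis.

```python
from collections import defaultdict

# Item number -> variables, transcribed from a subset of the table rows.
# The dictionary name and the choice of rows are assumptions for illustration.
ITEM_VARIABLES = {
    3: ["exposure"],
    4: ["dominance", "attitude"],
    5: ["attitude", "frequency"],
    6: ["exposure"],
    11: ["exposure", "frequency"],
    12: ["fluency"],
    18: ["frequency"],
}


def items_by_variable(item_variables):
    """Invert an item -> variables mapping into variable -> list of items."""
    index = defaultdict(list)
    # Visit items in ascending order so each variable's list is sorted.
    for item, variables in sorted(item_variables.items()):
        for variable in variables:
            index[variable].append(item)
    return dict(index)


INDEX = items_by_variable(ITEM_VARIABLES)
```

Looking up INDEX["exposure"] then returns the item numbers in this subset that were tagged with the exposure variable.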
Appendix VIII Coding Framework
Initial Coding Example
* Eng: 在美国工作生活环境都需要英语
* Ch:
* None:
Initial code: Environment. Frequency of use.
* Eng:
* Ch: 母语
* None:
Initial code: Mother tongue.
* Eng: I need to use it in my work
* Ch:
* None:
Initial code: Frequency of use.
* Eng:
* Ch: 易懂
* None:
Initial code: Easier to understand.
* Eng:
* Ch:
* None: Depending on the situation, or my environment.
Initial code: Depends on situation.
Secondary Coding Example
Language Preference | Initial Coding | Secondary Coding
Chinese | I am more fluent in this language. | Fluency
 | It is more convenient to use. |
 | It is my native language. |
 | It is easier for me to use. |
 | I can express myself more accurately in this language. | Better for expressing thoughts
 | It is easier to understand. | Better comprehension
 | I am more familiar with it. | Accustomed
 | I am more used to it. |
 | It is a beautiful language. | Personal preference
 | I am not fully assimilated into American culture or environment. | Cultural acclimatization
 | I use it more in my work environment. | Frequency of use
 | It is the language used in my social network. |
 | It is the language most used by people around me. | Amount of exposure
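The step from initial to secondary codes amounts to a lookup followed by a tally. The Python sketch below is a minimal illustration under stated assumptions: it includes only the initial codes whose secondary code sits directly beside them in the table above, and the names SECONDARY_CODE, tally_secondary, and the "Uncoded" label are hypothetical rather than part of the study's procedure.

```python
from collections import Counter

# Initial code -> secondary code, limited to the pairs that appear side by
# side in the table. Any other pairing would be a guess, so it is left out.
SECONDARY_CODE = {
    "I am more fluent in this language.": "Fluency",
    "I can express myself more accurately in this language.": "Better for expressing thoughts",
    "It is easier to understand.": "Better comprehension",
    "I am more familiar with it.": "Accustomed",
    "It is a beautiful language.": "Personal preference",
    "I am not fully assimilated into American culture or environment.": "Cultural acclimatization",
    "I use it more in my work environment.": "Frequency of use",
    "It is the language most used by people around me.": "Amount of exposure",
}


def tally_secondary(initial_codes, mapping=SECONDARY_CODE):
    """Count secondary codes; initial codes missing from the mapping count as 'Uncoded'."""
    return Counter(mapping.get(code, "Uncoded") for code in initial_codes)


# Invented sample input, used only to show the call; it is not study data.
SAMPLE = [
    "I am more fluent in this language.",
    "It is easier to understand.",
    "I am more fluent in this language.",
    "An initial code that is not in the mapping.",
]
TALLY = tally_secondary(SAMPLE)
```

Keeping unmapped codes under a separate label, rather than dropping them, leaves a visible record of responses that still need a coding decision.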
Appendix IX Literature Review Source
The literature reviewed in this study was gathered from three electronic databases (ACM; Library and Information Science; and Library, Information Science & Technology Abstracts) as well as Google Scholar. The documents were retrieved using the following search terms:
1. “cross-lingual information retrieval”
2. “cross language information seeking”
3. “multilingual information retrieval”
4. “multilingual information seeking”
5. “cross-language information retrieval”
6. “multilingual information access”
7. “information seeking behavior and the Web”
8. “information seeking behavior and Internet”
9. “Digital libraries” or “electronic libraries”
10. Bilingual or multilingualism
11. “Bilingual speaker” and Internet
12. “Language selection”
13. “Language choice”
14. “Language preference”
15. Bilingualism (“language choice” OR “language preference” OR “code switching”)
and various Boolean combinations thereof. Some articles were found through citations in
reviewed literature as well as serendipitously during the search process.
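As an illustration of what "Boolean combinations" of the search terms can look like, the Python sketch below builds AND pairs and an OR group from a few of the phrases listed above. It is a hedged example only: the grouping of terms into RETRIEVAL and CHOICE, and the helper names or_group and and_queries, are assumptions, and the source does not state which combinations were actually run.

```python
from itertools import product


def or_group(terms):
    """Join quoted phrases with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def and_queries(group_a, group_b):
    """Pair every phrase in group_a with every phrase in group_b using AND."""
    return [f'"{a}" AND "{b}"' for a, b in product(group_a, group_b)]


# Phrases taken from the search-term list; splitting them into two groups
# is an assumption made for this example.
RETRIEVAL = ["cross-language information retrieval", "multilingual information access"]
CHOICE = ["language choice", "language preference", "code switching"]

QUERIES = and_queries(RETRIEVAL, CHOICE)
CHOICE_GROUP = or_group(CHOICE)
```

The OR group mirrors the shape of search term 15, where several phrases for language choice are combined inside one set of parentheses.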