
Which Language to Use? Chinese-English Bilingual Speakers’ Language Selection Criteria for Digital Information Resources

Peishan Bartley

Simmons College

Abstract

The World Wide Web has made it easy to access information resources in different languages, and

efforts have been put into developing information systems that can search for information across

languages in the field of cross language information retrieval. At the same time, a wealth of studies

have delved into how people search for information in the field of information seeking behavior. How users

currently search for and consume information across languages, on the other hand, has received

comparatively little attention. This research explores the variables that influence a bilingual user’s

language choice for digital information. More specifically, this research focuses on the influence of

variables that constitute a bilingual speaker’s language profile, such as language history and language

exposure. Using a mixed-methods approach that includes surveys and an article selection exercise, this

research explores the variables that cause a Chinese-English bilingual user to prefer one language over

the other when given parallel digital content in Chinese and English. The results show that the

user’s language profile and background have a statistically significant impact on the user’s language

choice. The results also show that concerns over social interaction and personal biases carry over into

information seeking behavior.

Keywords: information seeking behavior, cross-language information retrieval, multilingual

information access, language choice, language preference, bilingual speakers


Contents

Peishan Bartley
Simmons College
Abstract
Chapter 1. Introduction and Problem Area Overview
Chapter 2. Definition of Terms
Languages
Information and Information Seeking
Bilingual and Multilingual Speakers
Language proficiency, dominance, preference, and attitude
Chapter 3: Literature Review
Overview
Cross Language Information Retrieval and Multilingual Information Access
Machine-Based Approaches
Dictionary-Based Approaches
Parallel Corpora Based Disambiguation Methods
Probabilistic-Based and Statistical Approaches
Latent Semantic Indexing (LSI) and Language Models
Transitive and Triangulation Methods
Other Design Issues
Summary of CLIR Literature Review
CLIR and Bilingual Users
The Occasions for CLIR
Multilingual Information Seeking Behavior, Language Proficiency and Domain Knowledge
The Use of Existing CLIR Features and Systems
Summary of Literature Review on CLIR and Bilingual Users
Information Seeking Behavior
Information Seeking Behavior Models
Summary of Literature Review on Information Seeking Behavior Studies
Bilingual User's Language Choice and Language Use
Code Switching
The Impact Factors of Language Choices
First and Second Language Uses in Composition
First and Second Language Uses in Composition Computer-Mediated-Communication
Language Exposure and Language Dominance
Summary of Literature Review on Bilingualism
Conclusions of the Literature Review
Chapter 4. Research Question and Research Methodology
Research Question
Research Method
Overview
Measurements
Language Attitude
Language Exposure and Experience
Language Proficiency
Subject Matter
Putting it Together
Material
Language Proficiency and Internet Usage and Experience Survey
Article Selection Software
Think Aloud Protocol
Article Selection Follow-Up Questionnaire
Population
Procedures
Pilot study
General User Survey
Scope and Limitations of the Study
Chapter 5. Data Analysis
Pilot Study Results
General Survey Results
Language Profile
Language Preference in General
Language Use Scenarios
Article Selection Exercise
Article Selection Results
Post Article Selection Survey
Additional Thoughts
Chapter 6. Discussion
Research Question and Method Review
Language Attitude
Result
Discussions and Implications
Summary
Language Exposure and the History of Language Use
Result
Discussion and Implications
Summary
Language Proficiency
Result
Discussion and Implications
Summary
Subject Matter
Result
Discussions and Implications
Other Findings and Observations
Chapter 7. Conclusion, Implications, Limitations, and Future Research
Limitations
Future Research
References
Appendix I. Participant Recruitment Letter
Appendix II. Informed Consent Form
Informed Consent Form English Version
Informed Consent Form Chinese Version
Appendix III. Demographic and Language Skill Questionnaire
English version
Chinese Version
Appendix IV. User Study Article Selection Samples
Appendix V. Interview Script
Appendix VI. Post Article Selection Questionnaire
English Version with Simulated Article Selection Result
Chinese Version with Simulated Article Selection Result
Appendix VII. Variables represented in the survey items
Appendix VIII. Coding Framework
Initial Coding Example
Secondary Coding Example
Appendix IX. Literature Review Source



Table of Figures

Table 1. Use of Language in Different Situations
Table 2. Participant English Reading Proficiency
Table 3. Participant’s Chinese Reading Proficiency
Table 4. Language Preference and Dominant Language Comparison
Table 5. Participant’s reasons for preferring one language
Table 6. Dominant language and the corresponding language to dominant culture
Table 7. Daily Activity and Language Use
Table 8. Participant Online Activity-language Use Summary
Table 9. Participant Online Activity-language Use
Table 10. Dominant Language and the Frequency of Using English for Online Activities. Mann-Whitney Test Result
Table 11. Preferred Language and the Frequency of Using English for Online Activities
Table 12. Language Choice for Internet Activity and Language Proficiency
Table 13. Number of Years Living in the US and Conducting Internet Activity in English
Table 14. English Daily Exposure (in Percentage) and Conducting Online Activity in English
Table 15. Criteria for online language choice
Table 16. Article Selection Result
Table 17. A Cross Comparison of General Language Preference and Online Language Preference
Table 18. Is it easy or hard to choose between different language excerpts?
Table 19. Language preference for the news article excerpts
Table 20. Language preference reasons
Table 21. Why one language appeals to you first

Figure 1. MLIR process
Figure 2. Language Choice Variables and Information Seeking Behavior
Figure 3. Daily English Exposure in Percentage vs. Number of Years Residing in the US
Figure 4. Survey Language and Amount of Daily English Exposure
Figure 5. Scatter Plot – English as Spoken Language vs. English as Text Language
Figure 6. Daily use of language
Figure 7. Amount of Daily English Exposure and the Length of Daily English Use
Figure 8. English proficiency and number of years residing in US
Figure 9. English Proficiency Level and Dominant Language
Figure 10. English Proficiency and Survey Language
Figure 11. English Proficiency and Answer Language
Figure 12. Language Choice for Different Situations
Figure 13. Language Use Online and the Number of Years Living in US
Figure 14. Language Use Online and Daily Language Exposure
Figure 15. Domain Language and Online Language Preference Comparison
Figure 16. Language Preference in General and Online Language Preference Comparison
Figure 17. Daily Internet use
Figure 18. Amount of Personal/Recreational Research Conducted in English Clustered by English Proficiency (1 – lowest, 5 – highest)
Figure 19. Article Selection Result and Information Source Language
Figure 20. Amount of Daily English Exposure and the Number of English Articles Selected
Figure 21. Number of English Articles Selected and English Proficiency Level
Figure 22. Average Number of English Version Articles Selected and English Proficiency Level
Figure 23. Pie chart - language preference for the news article excerpts
Figure 24. Pie Chart - Preferred Language and English Proficiency


Chapter 1. Introduction and Problem Area Overview

Human language is rich in diversity. This is evident in the Maryland Language Science Center’s

Langscape project (http://langscape.umd.edu), which mapped 6,300 languages from 175 countries, and

in the European Union’s 24 official and working languages

(http://ec.europa.eu/education/official-languages-eu-0_en). The wide linguistic diversity in both text and speech is reflected in online

resources as well. A simple search on Google brings up websites created in a vast number of languages,

from Afrikaans to Kongo to Yiddish. A bounty of information is expressed in all these different

languages, ready for anyone with an Internet connection to tap into from anywhere in the world. Yet

even with this easy access, information seekers may never find the information because they do not

know the language in which it is written. They may not be able to understand it if they

happen upon it, and may not even be able to use the language to form a search term and search for it.

The discrepancy between a user’s known languages and the language in which the relevant information is

written (henceforth referred to as the document language) is a barrier to information access.

Bridging this language gap is the primary goal of the fields of multilingual information access (MLIA)

and cross language information retrieval (CLIR).

MLIA and CLIR are subfields of information retrieval (IR), a field that strives to solve the issues

of storage, retrieval, and display of information resources (Baeza-Yates & Ribeiro-Neto, 1999). MLIA

emphasizes the discovery, access, querying, and retrieval of multilingual digital documents. It is IR

further complicated by the addition of multiple languages. The multilingual nature lies within the

collection itself, and also between the user and the documents (Oard, 2009; Peters & Sheridan, 2001;

Peinado, Rodrigo, & Lopez-Ostenero, 2013). Three components work together in MLIA,

as described in Figure 1:

(1) a digital collection of multilingual documents,


(2) a computer system that supports information retrieval of multilingual documents, and

(3) users who can or would make use of a multilingual collection.

Figure 1. MLIR process.

On one side of the system is the collection. A multilingual collection could contain

documents that are each composed in only one language (monolingual documents), or documents that each

contain multiple languages (multilingual documents). An example of a collection of monolingual documents

is the Parliament of Canada website (www.parl.gc.ca), which provides online access to digital documents

written either completely in French or completely in English. Examples of digital collections whose

documents contain a mix of languages include language teaching websites such as www.guidetojapanese.org.

This language teaching website uses a mix of languages within a single sentence to provide vocabulary

definitions or usage demonstrations.

On the other side of the system are the users. Internet users may or may not be fluent in the

document languages present in the collection. With disparate language skills, some users may be able

to come up with the search terms that would retrieve information relevant to their information need,

and some would not be able to do so without help. The latter situation can potentially be solved by a

cross language information retrieval (CLIR) system. CLIR systems are the computer systems that sit

between the users and the document collections, and act as a bridge. CLIR systems are designed to

accept search terms in one language (the query language) and retrieve relevant documents in other

languages (the document languages).


CLIR is viewed by some researchers as an integral part of MLIA (Oard, 1999; Peters &

Sheridan, 2001). Not only does it address language identification, character encoding, and multilingual

indexing issues that occur in the processing of a multilingual collection, it also aims to provide a way to

cross the language gap (Gey, Kando, & Peters, 2005; Peters & Sheridan,

2001). The process of CLIR begins when a user composes a query in a query language of his/her

choice. Once the search terms are entered, the system proceeds to match them to potentially relevant

documents through translations, statistical algorithms, or other matching methods (Herbert, Szarvas, &

Gurevych, 2011; Ye, Huang, He, & Lin, 2012; Nie, 2010). Using a CLIR system, users are no

longer confined by their language ability; they can search foreign-language collections and

retrieve documents in different languages. A user's access to multilingual digital information resources

would therefore be broadened.

With the rapid spread of the Internet, and the development of a Web structure on which

multilingual content can be hosted and accessed by users during the 1990s, the potential and

importance of CLIR were recognized (Peters, Braschler, & Clough, 2012; Gey, Kando, & Peters, 2005).

Since then, there have been many significant developments within the field, yet the translation of

technology developments into a comprehensive CLIR system for common users appears to be slow in

coming (Gey et al., 2005; Diekema, 2012; Peters, Clough, Gey, Karlgren, & Magnini, 2007). No CLIR

application was widely available to the general public until Google launched Google

Translated Search, also known as the Translated Foreign Pages search option, in May 2007 (Sterling,

2007).

Google Translated Search combined statistical machine translation (Inside Google Translate, n.d.;

Russell, April 23, 2013) and Web searching. It offered a way to search for digital information resources

across languages. Google Translated Search took a user's query, replaced it with search terms in the

user's intended document languages, and used the new set of search terms to retrieve results. The process

was similar to query reformulation (Belkin, 2000) but with the added complexity of reformulating the

query into a different language. The search results could be viewed in the original document languages,

or be translated into the query language. The search feature was seen as a breakthrough for

transitioning CLIR research into a publicly available, real-life application (Chen & Bao, 2009).
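
The query-translation steps described above can be illustrated with a minimal sketch. It is not Google's implementation: a toy bilingual lookup table stands in for statistical machine translation, and the dictionary, the three-document collection, and all function names are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy bilingual dictionary (English -> Chinese). A real system
# would use machine translation rather than a lookup table.
EN_TO_ZH = {
    "train": ["火车"],
    "schedule": ["时刻表"],
    "movie": ["电影"],
}

# Hypothetical collection written in the document language (Chinese),
# pre-segmented with spaces to keep tokenization trivial.
DOCUMENTS = {
    "doc1": "北京 火车 时刻表 查询",
    "doc2": "本周 电影 放映 时间",
    "doc3": "火车 票价 信息",
}


def translate_query(query, dictionary):
    """Replace each query term with its document-language equivalents."""
    translated = []
    for term in query.lower().split():
        # Terms missing from the dictionary are passed through unchanged.
        translated.extend(dictionary.get(term, [term]))
    return translated


def retrieve(terms, documents):
    """Rank documents by the number of translated query terms they contain."""
    scores = Counter()
    for doc_id, text in documents.items():
        tokens = text.split()
        overlap = sum(1 for term in terms if term in tokens)
        if overlap:
            scores[doc_id] = overlap
    return [doc_id for doc_id, _ in scores.most_common()]


def cross_language_search(query, dictionary, documents):
    """Translate the query, then search the document-language collection."""
    return retrieve(translate_query(query, dictionary), documents)


print(cross_language_search("train schedule", EN_TO_ZH, DOCUMENTS))
```

In this sketch an English query retrieves Chinese documents that the user could not have searched for directly. Translating the retrieved documents back into the query language, the final optional step described above, is omitted.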

However, this breakthrough was not widely adopted by end-users even though researchers foresaw

many benefits of using such a system, such as planning a foreign trip or broadening research scope

(Artiles, Gonzalo, Lopez-Ostenero, & Peinado, 2007; Marlow, Clough, Recuero, & Artiles, 2008). In a

small study, Web users who were introduced to the feature expressed doubt about its usefulness and

practicality (Marlow et al., 2008). Citing lack of use, Google Translated Search was disabled in 2013

(Schwartz, 2013). To date, there has not been another generally available Web-based CLIR system.

There are many possible reasons for Google Translated Search's lack of use, such as insufficient

user awareness (e.g., Marlow et al., 2008) and degradation of retrieval effectiveness (Savoy & Dolamic,

2009). From subject responses collected by Marlow et al. (2008), it appears that the lack of use

possibly results from too little understanding of the system's potential users and how they approach multilingual

digital information resources.

Ruminating on the applications of CLIR systems, Oard (2009) envisioned two types of users: (1)

multilingual speakers who may be able to formulate queries and read documents in different languages,

and (2) monolingual speakers who need translations to help bridge the language differences between

their known language and the collection language. The biggest difference between the two types of

users is language proficiency. A number of existing MLIA studies hold a similar view and treat users’

language proficiency as one of the major MLIA impact factors. For example, Marlow et al. (2008)

studied the impact of language proficiency on users' multilingual search task results, and Hong (2011)

examined how bilingual users adjust their search strategies according to their language proficiency.

Petrelli, Hansen, Beaulieu, Sanderson, Demetriou, and Herring (2004) inspected the amount of detail

and information needed for users of different language proficiency to judge a document’s relevance.

Other studies observed users’ attitudes toward using a less familiar language with existing

CLIR systems. Examples include the studies by Artiles et al. (2006) and Marlow et al. (2008) using

Google Translate and FlickrLing, and Petrelli, Beaulieu, Sanderson, Demetriou, Herring, and Hansen's

(2004) study of the test system CLARITY. Yet is language proficiency the only variable that impacts a

user's CLIR experience?

Some researchers propose search task as the other impact factor on the information seeker’s

CLIR behavior (Petrelli et al., 2004, Rieh and Rieh, 2005, Hong, 2011, Steichen, Ghorab, O’Conner,

Lawless, & Wade, 2014). Research have found that users do not begin an information seeking process

blindly. When an information need rises, users decide upon the language and information resource to

use based on past search experience or speculation of where they can most likely find relevant

information. For example, travelers needing train time table and students looking for movie show time

choose to search for the information using the native language (Aula & Kellar, 2007; Hong, 2011).

When the same students need to do scholarly research, they search in academic databases using English

as their query language (Hong, 2011). From these observations, it appears that users’ language selection

is task based. I would argue that it is task based because information resources are currently segregated

by languages. Information seekers are knowledgeable enough to know that if they are looking for local

information, they need to use the native language. If they are looking for information within a field that

has a dominant language, they need to search within the dominant language. This phenomenon has

been observed by Steichen, et al. (2014), and has more to do with the current availability of

information, and less to do with the user. Whereas language proficiency is an innate capacity and

contributes to a person’s language preference, task based decision is made based on learned experience,

and not a personal choice. This research is focused on the impact factors on users’ personal preference.

The emphasis is on the variables brought about by the user, such as language proficiency,


which is but one of a complicated set of factors that, altogether, form a person’s language profile.

A person’s language profile is a composition of the person’s language history (exposure, length of

use, etc.), current language environment, language attitude (preference and cultural identity) and

language dominance (concepts of language preference and dominance are discussed in detail in the

term definition section that follows) (Marian, Blumenfeld, & Kaushanskaya, 2007). Research in the

field of bilingualism has found that several language profile elements are more influential to the

language choice of a bi- or multilingual speaker in different situations, and demonstrated how language

choice is complex and involves many different factors (Dewaele, 2007; Bahrick, Hall, Goggin,

Bahrick, & Berger, 1994; Hakuta & D'Andrea, 1992). This is reflected in a dated but still relevant

statement by linguist Fishman, “habitual language choice is far from being a random matter of

momentary inclination, even under those circumstances when it could very well function as such from

a purely probabilistic point of view” (1965, p. 67). Linguists have been studying bilingual speakers and

identifying different features and functions relating to language choice. Their findings, however, have

not been examined in the light of information seeking or CLIR. This study intends to fill this gap by

examining bilingual users’ information behavior through the user’s language profile.

In summary, cross lingual information retrieval systems have expanded into real world

applications but have not gained traction among users. I propose that part of the reason is that there

has not been enough understanding of CLIR users. There are studies on how users handle language

skill deficiencies when they need to search for information across languages, and on how language

proficiency level impacts users’ information seeking behavior, but there are likely other impact factors

that have not yet been explored. This study looks beyond language skills and incorporates other

language profile elements into the investigation of users’ cross language information behaviors. This

study further differs from existing research by examining potential impact factors of the language

selection process without the interference of search tasks and different system interface designs. By


doing so, this study gains a deeper look into bilingual users and deepens the knowledge of what is needed to

help users navigate the multilingual World Wide Web. The results are valuable across several subject

areas including CLIR system designs, web-based information seeking behavior, and bilingualism.

For CLIR, the findings of this research could enhance system designers' understanding of

bilingual users' use of multilingual information resources, and of what type of metadata or search features

could better assist users. Understanding how users identify content with language and how language

factors into information consumption can help decide whether full text translation is needed, or if

supportive features, such as query term suggestions, would be enough. This study's findings will also

contribute to the research into information seeking behavior, where researchers continue to examine

the phenomenon of information seeking on the Web.

Web-based information seeking behavior has been examined from many different angles, such as

the general behavioral patterns of Web users (Jansen & Spink, 2006); the intents of Web searching

(Jansen, Booth, & Spink, 2008); the variability of a user's Web search patterns (White & Drucker,

2007); and the different behaviors of users by subject field (Ge, 2010). The field has so far focused on

monolingual speakers, who represent only a part of the highly heterogeneous Internet-using population.

There are few discussions about information seeking behaviors of bilingual and multilingual users, or

about cross-language information seeking behaviors. This study examines the information seeking

behaviors of bilingual and multilingual users in the hopes that it would enrich our understanding of

information users, and add additional layers to the as yet undefined bilingual user profile. This research

also sheds light on how users perceive the availability of information.

For bilingualism, the findings of this study provide additional data on how bilingual speakers

think about multilingual digital information resources, and how they approach them. Bilingualism

research often focuses on language switching in social contexts, for educational purposes, or on its effects

on mental processing. For example, Androutsopoulos' (2009) examination of language use on an online


forum observed that language choice is used to alter the setting of a discussion, such as from formal to

informal. Another example is Kaushanskaya, Gross, and Buac's (2014) study on the effect of children's

bilingual experience on cognitive skills. Studies of language use online, such as Lam (2004), have

largely been confined to social media, where language is used for communication between people. This

research explores users’ language choice when the resource is text-based information.

The next chapter provides definitions of crucial concepts relevant to this study, followed by

the literature review.


Chapter 2: Definition of Terms

Before we continue any further, important terms and concepts used in this paper need to be

defined. At the center of the discussion are the concepts: information, information seeking, bilingual

and multilingual speakers, and language proficiency, dominance, attitude, and preference.

Languages

English. English refers to modern English with Americanized grammar and spelling, spoken

and used in the United States.

Chinese. Chinese refers to Mandarin Chinese, the common, standard, working language in

Taiwan and mainland China. Other dialects will be referred to by name, such as Hokkien, a dialect

used in Taiwan, and Cantonese, a major language used in Hong Kong.

Information and Information Seeking

Information. “Information” is not an easy concept to define. Researchers from different fields

have extensively discussed what the term “information” means (e.g. Shannon, 1948; Artandi, 1973;

Belkin & Robertson, 1976; Belkin, 1978; Farradane, 1980; Zhang, 1988; Buckland, 1991; Capurro &

Hjorland, 2003; Bates, 2006; Hjorland, 2007). Information has been described as a message issued

from a source as signals to a receiver (Shannon, 1948); as physical representations of knowledge, such

as books (Buckland, 1991); in relation to a communicated and transformed state of knowledge (Belkin,

1978); or as a pattern or organization of matter and energy (Bates, 2006). This study follows Bates’

(2009) line of thought and defines information through its role in general conversation (Bates, 2009)

such as: students search for information about the American Civil War for a school project; a person

looks for public transportation information for an upcoming trip in a foreign city; reporters search for

information on different products in order to write a review. In the above cases, “information” refers to facts

and statistics that enter a person's cognitive space either through active pursuit or passive encounter,


and alters the person's knowledge store. Encountering information leaves an impact on the person in

the forms of emotional responses, deeper knowledge, a reaffirmation of pre-existing understandings, or

new ideas and thoughts. In this context, “information”: (1) is fact or data transmitted through a medium such as

text, image, or sound; (2) needs to be received by a person; and (3) changes the emotional and/or

knowledge state of the receiver. The physical vehicles (such as documents, images, or collections of

both) that carry the information are referred to as information resources.

This study is about users’ language choices regarding digital, text-based documents. Henceforth

in this research, the term “information” refers to digital, text-based documents, and “information

resources” refers to databases or document collections that can be accessed through the Internet.

Information seeking. Information seeking is the action of looking for relevant information that

would satisfy one’s information need. The action can be a purposeful search for a specific fact, or

intentional browsing for interesting information resources. It is a deliberate behavior, different from

information encountering in which unexpected discoveries are made through passive, unintended

exposure to information (Erdelez, 1999). The process of online information seeking involves: (1)

recognition of the information need, (2) an initial strategy of browsing or searching, (3) formulation of

search terms, and (4) examination and evaluation of the retrieved document set (Holscher & Strube, 2000).

Bilingual and Multilingual Speakers

The Oxford English Dictionary (OED) defines “monolingual” as “a person who speaks only one language”

(“Monolingual”, 2013). More specifically, monolinguals are people who can speak, read, write, and

comprehend only one language.

The OED's definition of “bilingual” refers to “one who can speak two languages” (“Bilingual” [Def.

3], n.d.). In the same spirit, multilingual describes a person who can speak more than two languages.

While there are differences in the number of languages involved in the concepts of multilingualism and

bilingualism, the complexity of both concepts stems from the involvement of more than one language.


As a result, the following discussion groups multilingual speakers with bilingual speakers, and focuses

on the concept of bilingual speakers and bilingualism.

For many scholars, “able to speak two languages” is an oversimplification of bilingualism (Baker

& Jones, 1998). “The ability to speak” is overly vague and does not cover the many levels of

proficiency, frequency of use, and ways of language use. Bloomfield (1935), for example, strictly sees

only speakers with “native-like control of two languages” (p.56) as bilingual. Macnamara (1967), on

the other hand, only requires one to have a minimal competency in any of the four language skills

(listening comprehension, speaking, reading, and writing) in one non-native language to qualify as

bilingual. Grosjean (2012) approaches bilingualism by emphasizing the frequency of language use.

Bilingual speakers are defined as “those who use two or more languages (or dialects) in their everyday

lives” (Grosjean, 2012, p.4). Hamers and Blanc (2000) argue that bilingualism needs to be defined

through societal and cultural context. From their point of view, the definition of bilingualism needs to

account for the psychological and social functions of language.

The definition for “bilingual” in this dissertation falls somewhere between Bloomfield’s (1935)

stringent language skills requirement and Macnamara’s (1967) relaxed condition. A bilingual speaker is

defined in this paper as a person who can read, write, speak, and has adequate listening

comprehension ability to carry on a conversation in a language in addition to their native language.

Some of the subjects in this study are exposed to both languages regularly or habitually in their daily

lives either at home or in professional settings. Others only use their second language sporadically. All

of them are able to communicate to some degree in both languages, in both oral and written form.

The social, cultural, and psychological dimensions as well as language proficiency are not treated

as critical criteria, but would be accounted for as potential impact factors for this study.

Language proficiency, dominance, preference, and attitude

Language proficiency. A person’s use of language is a complicated matter that can be viewed


and measured in various ways. Language proficiency is one such measure, and it is one that is often

examined as an impact factor in cross-lingual information seeking.

Language proficiency is a person's ability to express their thoughts in and comprehend a

language (Francis, 2012). Lim, Liow, Lincoln, Chan, and Onslow (2008) quote Birdsong (2006) and

describe proficiency as related to “the mastery of syntax, vocabulary, and pronunciation of a

language” (p.39). Proficiency is often measured by educators, researchers, or institutions through

language tests such as those offered by the American Council on the Teaching of Foreign Languages

(www.actfl.org) and Cambridge English Language Assessment (www.cambridgeenglish.org). It can be

tested and the result expressed quantitatively. For this study, participants are asked to self-rate their

language proficiency level using the LEAP-Q survey questions (Marian, et al., 2007).

Language dominance and language preference. The meaning of “language dominance” is not

as clear cut as language proficiency. Bedore et al. (2012) view dominance as “a measure of relative

performance” (p. 4). Measurements of it can be taken through the actions of reading and writing. In

another point of view, dominance results from the difference in one's mental processing ability between

first and second languages (Birdsong, 2006; Aparicio & Lavaur, 2013). The dominant language is the one

that a speaker can process faster and more accurately (Aparicio & Lavaur, 2013). Flege, Mackay, and

Piske (2002) included other measurements into consideration, such as self-ratings of a person’s ability

to read, write, speak and understand his/her known languages; and the speed, or “automaticity”, of a

person's language processing capability.

This research uses Grosjean’s (1982) definition of language dominance: a person's inclination to

use one language over other known languages. This is different from language preference, which is a

person’s more favorable attitude toward one language over other languages. Language dominance is a

confluence of a host of variables, including the person’s language proficiency, the degree of ease they

feel when they are mentally processing the language, their cultural identification, the frequency of



language use, and their exposure to the language (Gertken, Amengual, & Birdsong, 2014; Grosjean,

1982). A person who is most proficient in their native tongue could view their second language as the

dominant one because of the environment they are in, the domain in which the language is used, or the

amount of exposure one is subjected to (Lim et al., 2008; Birdsong, 2006; Grosjean, 1982). A person

might prefer one language, but have a different dominant language due to the frequency of use. The

above two examples demonstrate how multifaceted and nuanced the formation of language dominance is.

In this research, participants are asked about their dominant languages in the survey. They are also

asked about language preferences. Moreover, language preference is observed through the participants’

choice of language for the survey questions (survey language), and for answering the questions (answer

language).

Language attitude. Language attitude has long been associated with the acquisition and

maintenance of language, language use, identity construction, and other language related issues (Ianos,

Huguet, Janes, & Lapresta, 2015). It is:

The attitudes which speakers of different languages or language varieties have towards each

other’s languages or to their own languages. Expressions of positive or negative feelings

towards a language may reflect impressions of linguistic difficulty or simplicity, ease or

difficulty of learning, degree of importance, elegance, social status, etc. Attitudes towards a

language may also show what people feel about the speakers of that language. Language

attitudes may have an effect on second language or foreign language learning. (Longman

Dictionary of Language Teaching and Applied Linguistics, 2010, p.314).

A classical view of attitudes breaks them into three components: cognitive (thoughts and

beliefs), affective (feelings), and readiness for action (behavioral intention or plan of action) (Baker,

1992). For example, a Chinese-English bilingual speaker’s attitude towards English can be seen as the

composition of: (1) a belief that it is important to be able to speak English (cognitive), (2) anxiety about


having to speak English (affective), and (3) avoidance of using English when Chinese is available

and/or accepted (readiness for action). Consequently, the current study measures language attitude

by collecting participants’ thoughts about languages through open-ended questions, and by observing

their language choice for the survey and for question answering.

Now that the terms have been defined, the next chapter reviews literature of relevance.


Chapter 3: Literature Review

Overview

The information seeking behavior of a bilingual user is an interdisciplinary issue. It involves

information seeking, accessing multilingual information resources (MLIA), the use of different

languages, and the possible employment of a cross-language information retrieval (CLIR) system. As a

result, at least three fields are of concern: information seeking behavior, bilingualism, and MLIA and

CLIR. Relevant studies from these three fields are reviewed below (see Appendix VI for resources and

search terms used) beginning with CLIR and MLIA, followed by information seeking behavior, and

bilingualism. The following paragraphs will provide an introduction to and define the scope of the

literature review.

CLIR has been an active and productive field since the 1980s. For this study, an overview of the

major developments in the field will be given to illustrate how the main focus of the field has been on the

development of more efficient CLIR technology. The review of MLIA literature will focus on the

studies of MLIA user behaviors that mostly focus on how and what resources are used by users; how

users think about and use CLIR systems; how they conduct cross-language information seeking; and

how language related variables impact system use.

Information seeking behavior is multifaceted and complicated. There is the motive that spurred

the action, the selection of the information source, the formulation of the query, and finally the act of

reviewing the retrieved documents and judging their relevancy to one's information need. The focal

point of the literature review is on establishing the importance of considering language choice

in the overall information seeking behavior.

Bilingualism is an interdisciplinary subject involving many fields as well, including linguistics,

psychology, neuroscience, education and sociology. Here, particular interest is paid to research


studying the variables that impact language preference, language choice, and language switching.

As will be shown in the rest of this chapter, literature in CLIR seems to focus on technology and

system development, literature in information seeking behavior focuses on monolingual information

seeking patterns, and literature in bilingualism emphasizes the cultural and social aspects of bilingual

speakers' language use in oral and sometimes written communications. While there is MLIA research

on how bilingual users are using existing CLIR systems, not much has been done to explore how they

select the language to begin with. This review will demonstrate that still more needs to be done to

understand how bilingual Web users associate with languages with regard to digital information

resources.

Cross Language Information Retrieval and Multilingual Information Access

Cross-Language Information Retrieval (CLIR) addresses the situation in which a user submits a

query in one language to retrieve documents written in other languages. As a recognized sub-field of

information retrieval (IR) (Gey, Kando & Peters, 2005), CLIR shares many characteristics of the

general IR which deals with the representation, storage, retrieval, and access of a document collection

(Baeza-Yates & Ribeiro-Neto, 1999), but has the added complexity of language differences between

the query language and the document language to contend with. The major challenge is to represent and

store documents of multiple languages in a way that can facilitate effective access and retrieval using a

different query language.

A review of CLIR literature suggests that the major trend in the field is in system and technology

development, especially in improving the recall (the percentage of all relevant documents that are retrieved) and

precision (the percentage of relevant results among all retrieved results) of the system. More recently,

topics of image and multimedia file retrieval, and user's experience interacting with the system have

also garnered interest in the field (Gey et al., 2005). Even with the emergence of studies on human

interaction with CLIR systems, the focus of the field appears to be on technological advancements (Gey,


Kando, & Peters, 2005). While user behavior, interface designs, and users' information needs are

identified as major issues, they have received little attention and still present research opportunities (Gey,

Kando, & Peters, 2005; Petrelli et al., 2004).
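
The two evaluation measures named above can be illustrated with a small worked example. The following is only a minimal sketch; the function name and the document identifiers are hypothetical and are not drawn from any of the cited systems.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the system returns four documents, two of which are
# among the three documents that are actually relevant to the query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved, relevant)
```

In this example half of the retrieved documents are relevant (precision), while two of the three relevant documents were found (recall).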

Given the scope of this study (see Chapter 4 for the complete outline of the scope) and the use

of text documents in this research, the following review of CLIR literature identifies and

covers the major approaches to CLIR systems, which include: machine translation that combines existing

machine translation systems with monolingual IR systems; query and/or document translation using

lexical resources such as dictionaries, corpora, and Web pages; sense disambiguation and query

expansion techniques that augment the accuracy or recall of retrieval results; statistical models, such as

latent semantic indexing, that map the relationships among words; and language modeling, triangulation,

and other alternative methods that can be used when there is a dearth of direct language-to-language

translation resources. The next section begins with machine-based approaches.

Machine-Based Approaches

Machine-based translation (MT) is the use of existing computer software to translate text

documents from one language to another via statistical algorithms or linguistic resources (Somers,

1999; Pecina et al., 2014). The translation software, such as Google Translate, is designed to produce

translations that are as accurate and fluent as possible for human readers. With the ability to translate either the

documents or the queries into the same language, MT should effectively remove the language barrier

and change a cross lingual situation into a monolingual one (Oard, 1998; McCarley, 1999; Zhou,

Truran, Brailsford, and Ashman, 2008). Yet for a long time, MT was not able to match the results of

other CLIR approaches (Oard, 1997; Fujii and Ishikawa, 2000). On one hand, queries, often in the form

of brief, sequential words, do not usually provide enough contextual clues for accurate translation

(Pirkola, 1998). On the other hand, while lengthier documents often do lead to better translation quality,

the translated results are still not good enough to produce a satisfactory CLIR result (Oard, 1998; He,


Wang, Oard, & Nossal, 2002). Early efforts to improve machine translation quality for CLIR purposes

were judged as not effective enough to justify the required cost (Ballesteros and Croft, 1997), but

improvements have been made as technology has advanced. Statistical machine translation-based

approaches have come to dominate CLIR efforts as query translation quality improves (Pecina et al., 2014;

Wu, He, & Grishman, 2008), document translations become good enough to identify a document's

relevancy to a query (Orengo & Huyck, 2006), and machine translation results come closer to

simulating what a non-native speaker might be able to produce (Chen, Ding, Jiang, and Knudson,

2012).

Google and Microsoft Bing, two major English search engines, both provide translation features

through MT (Russell, May 24, 2007; DePalma, July 11, 2012). Google Translate

(http://translate.google.com/) supports 80 languages (Google, 2013, December 10), and Bing

(http://www.bing.com/translator) lets users choose among 44 (http://www.bing.com/translator/help/).

In May 2007, Google combined its translation feature with search, and rolled out Google Translated

Search (Sterling, May 23, 2007). Users were able to use Google Translated Search to retrieve results

from multiple languages using one query. The search feature, however, was discontinued in 2013

(Sterling, October 7, 2013), although its machine translation feature continues to serve many CLIR projects as

the basis for statistical machine translation (Pecina et al., 2014).

Dictionary-Based Approaches

Dictionary-based approaches use machine readable dictionaries, bilingual word lists, or other

lexical resources to translate the query terms by replacing them with their target language equivalents

(e.g. Hull and Grefenstette, 1996; Croft, 1998; Oard, 1998; Pirkola et al., 2001; Zhou et al., 2008; Airio

and Kettunen, 2009; Kishida and Ishita, 2009). These types of lexical resources may be different from

conventional dictionaries and thesauri in that they are not used to provide a precise description of the

meaning of the word or examples of uses for human readers, but to provide connections among words



such as synonyms and acronyms that can be read by software. In general, machine readable lexical

resources are easier to construct for different language pairs than an effective MT system that requires

the development of statistical algorithms on top of the lexical resources. Dictionary-based approaches

are therefore viewed as easier to implement (Levow, Oard, and Resnik, 2005; Oard, 1998).

Furthermore, researchers have found that by using language resources with comprehensive coverage

and accurate relationships among words, they are able to produce CLIR results that rival or even

surpass monolingual IR system performances (Zhou, Truran, Brailsford, Wade, & Ashman, 2012). The

quality and coverage of the language resources can have significant impact on the CLIR system’s

translation performance. A poorly constructed dictionary that fails to identify phrases, does not cover

newly coined terms and compound words, lacks in the coverage of multi-word expressions and

common out-of-vocabulary terms such as proper names and jargon, or has narrow coverage can

greatly hinder system performance (Hull & Grefenstette, 1996; Xu & Weischedel, 2000; Demner-Fushman

& Oard, 2003; Zhou et al., 2012).

There is also the issue of translation ambiguity that can cause translation errors. Translation

ambiguity occurs because words often carry multiple meanings that can lead to several different

translations. Not all of the translated meanings may be intended in the query. The situation can be

handled either by including all variations of translations, or by discerning, by some means, which translations

best represent the original query. The latter process is referred to as disambiguation.

Using all the possible translations of every word in the query can inadvertently add noise to the

retrieval process (Hull & Grefenstette, 1996; Ballesteros & Croft, 1998). This is because the approach

can lead to words with the most possible translations receiving more weight than words with fewer

translations, devaluing the latter, and thus degrading the retrieval outcome. The approach does not

appear to be used in more recent projects.
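
The weighting problem described above can be seen in a toy example. The sketch below is illustrative only: the English-to-French lexicon is a hypothetical three-sense entry, and the helper names are assumptions rather than part of any cited system.

```python
# Hypothetical toy English-to-French lexicon, for illustration only.
LEXICON = {
    "spring": ["printemps", "ressort", "source"],  # season, coil, water source
    "water": ["eau"],
}


def translate_all(query_terms, lexicon):
    """Replace each query term with every translation the lexicon lists."""
    translated = []
    for term in query_terms:
        # Terms missing from the lexicon are kept untranslated.
        translated.extend(lexicon.get(term, [term]))
    return translated


def source_term_share(query_terms, lexicon):
    """Fraction of the translated query contributed by each source term."""
    counts = {t: len(lexicon.get(t, [t])) for t in query_terms}
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


query = ["spring", "water"]
translated_query = translate_all(query, LEXICON)
share = source_term_share(query, LEXICON)
```

Here the ambiguous term "spring" supplies three of the four translated query terms, so an unweighted retrieval model would give it three times the influence of "water", even though both terms were equally important in the original query.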

The alternative approach of selecting one translation for each word in the query can be done in


different ways. One way is to assume that the first definition listed in the dictionary is the most

frequently used, and the most likely to reflect the concepts expressed in the original query term.

Therefore, the system selects the terms corresponding to the first sense, or just the first term to use (e.g.

Oard, 1998; Ballesteros & Croft, 1998; Zhou et al., 2012). Or, as every term has some probability of being

the correct translation, one can randomly select a term from the potential translations as the new query

term. Oard (1998) showed that selecting a random translation from multiple translations can be as

effective as retaining every possible translation for a query, although both are far below the

performance of monolingual retrieval.
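
The two single-translation strategies described above, first listed sense and random selection, can be sketched as follows. This is a simplified illustration under the assumption that the lexicon lists senses in dictionary order; the lexicon contents and function names are hypothetical.

```python
import random

# Hypothetical toy lexicon; senses are assumed to be listed in dictionary order.
LEXICON = {
    "spring": ["printemps", "ressort", "source"],
    "water": ["eau"],
}


def first_sense(query_terms, lexicon):
    """Pick the first listed translation of every query term."""
    return [lexicon.get(term, [term])[0] for term in query_terms]


def random_sense(query_terms, lexicon, rng):
    """Pick one translation of every query term uniformly at random."""
    return [rng.choice(lexicon.get(term, [term])) for term in query_terms]


query = ["spring", "water"]
first = first_sense(query, LEXICON)
# A seeded generator keeps the random choice repeatable between runs.
rand = random_sense(query, LEXICON, random.Random(0))
```

Both strategies return exactly one target-language term per source term, which avoids the weighting imbalance of retaining every translation but offers no guarantee that the chosen sense is the intended one.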

A better approach is to use a statistical measure, such as co-occurrence rate, as the basis to decide

which translation is most likely to be appropriate. For example, Reddy and Hanumanthappa (2012)

devised an approach based on the assumption that the correct translations of the words that form the

query have a higher likelihood to co-occur in the target document. Other methods used to disambiguate

the word sense include part-of-speech tagging (Cutting, Kupiec, Pedersen, & Sibun, 1992; Davis and

Ogden, 1997) and the employment of parallel corpora.
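
The co-occurrence assumption can be made concrete with a small sketch. The code below is a generic illustration of the idea, not a reproduction of the method in any cited study; the lexicon, the target-language documents, and the scoring function are all hypothetical.

```python
from itertools import combinations, product

# Hypothetical toy lexicon and target-language documents (as sets of terms).
LEXICON = {
    "spring": ["printemps", "ressort", "source"],
    "water": ["eau"],
}
TARGET_DOCS = [
    {"source", "eau", "montagne"},
    {"eau", "source", "potable"},
    {"printemps", "fleur"},
    {"ressort", "acier"},
]


def cooccurrence(a, b, docs):
    """Number of documents in which both terms appear."""
    return sum(1 for doc in docs if a in doc and b in doc)


def disambiguate(query_terms, lexicon, docs):
    """Choose one translation per term so that the chosen translations
    co-occur as often as possible in the target-language documents."""
    candidates = [lexicon.get(term, [term]) for term in query_terms]

    def score(combo):
        return sum(cooccurrence(a, b, docs) for a, b in combinations(combo, 2))

    return list(max(product(*candidates), key=score))


best = disambiguate(["spring", "water"], LEXICON, TARGET_DOCS)
```

Because "source" and "eau" appear together in two documents while the other senses of "spring" never appear alongside "eau", the water-source sense is selected. Enumerating every combination is only practical for short queries; it is used here for clarity.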

Parallel Corpora Based Disambiguation Methods

Parallel corpora, also referred to as translation corpora, are sets of translation-equivalent texts in

which the corpus in language A mirrors the corpus in language B in both content and structure (Cartoni,

Zufferey, & Meyer, 2013; Johansson, 2007; Dyvik, 2004). Parallel corpora can be used as a direct

translation source through side-by-side analysis of text (e.g. Zhou, Truran, Brailsford, Wade, &

Ashman, 2012), as training text for statistical machine translation systems, (e.g. Cartoni, Zufferey, &

Meyer, 2013), to obtain co-occurrence statistics for sense disambiguation (e.g. Ballesteros & Croft,

1998; Ide, Erjavec, & Tufis, 2002, July; Ng, Wang, & Chan, 2003, July), or for linear disambiguation

(e.g. Davis, 1996; Davis, 1998; Davis & Ogden, 1997). These uses of parallel corpora are

discussed further in the following paragraphs.


The premise for using the co-occurrence statistics in parallel corpora for sense disambiguation is

that the correct translations would be used together in the document language corpora as the original

terms would be in the query language corpora. Therefore, the correct translations would co-occur in the

document language in frequency and distance the way the original terms co-occur in the query

language (e.g. Gao, Nie, He, Chen, & Zhou, 2002). The co-occurrence statistics of the potential

translations are then used as the foundation to select the correct word sense as translations.

The parallel nature of the corpora is used differently in the linear disambiguation approach. This

approach is based on the assumption that a term and its translation would retrieve similar sets of

documents in their respective collections. Therefore, systems using linear disambiguation methods

retrieve documents from both language sets using the original query and all its translations. The

retrieved document sets are matched against each other. The translation that produced the most similar

set of documents to the original query is selected as the correct translation (Hiemstra & Jong, 1999).
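
The selection step can be illustrated with a small Python sketch. It assumes a toy parallel corpus in which a document and its translation share the same identifier, and it measures the similarity of the retrieved sets with Jaccard overlap. Both the data and the choice of Jaccard overlap are assumptions made for illustration; the cited work may compare the sets differently.

```python
def retrieve(term, collection):
    """Return the IDs of documents whose token list contains the term."""
    return {doc_id for doc_id, tokens in collection.items() if term in tokens}

def jaccard(a, b):
    """Overlap between two sets of document IDs (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_translation(term, candidates, source_docs, target_docs):
    """Pick the candidate whose retrieved set best matches the original term's set."""
    source_hits = retrieve(term, source_docs)
    return max(candidates, key=lambda c: jaccard(source_hits, retrieve(c, target_docs)))

# Toy parallel corpus: the same ID identifies a document and its translation.
french = {1: ["la", "banque", "prête"], 2: ["banque", "centrale"], 3: ["la", "rive"]}
english = {1: ["the", "bank", "lends"], 2: ["central", "bank"], 3: ["the", "shore"]}

best = select_translation("banque", ["bank", "shore"], french, english)
print(best)
```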

While the translation process may produce excessive translations that are not related to the user's

original search, there is also the possibility that certain meanings are lost in translation. This situation is

often handled by query expansion methods, including local feedback and local context analysis.

Query expansion methods. Local feedback and local context analysis are two popular query

expansion methods used by IR systems to solve word sense mismatch problems that occur when the

same idea is expressed in different ways in the query and the document, making it difficult for the

system to associate one with the other. Query expansion methods are used in CLIR to reduce this type

of dictionary-based translation error (Ballesteros & Croft, 1997; Ballesteros & Croft, 1998; McNamee

& Mayfield, 2002; McNamee & Mayfield, 2004). Whereas disambiguation approaches are used to

eliminate incorrect translations, query expansion methods are used to make sure the correct sense is

included in the final translation set.

Local feedback is a common query expansion technique that retrieves documents in two steps. A


first set of documents is retrieved using the original query terms. The documents that were ranked

highest in relevancy are used by the system to extract additional query terms to expand the user's query

and retrieve the final set for the user (Xu and Croft, 2000; Wu & He, 2008). Local context analysis is a

method proposed by Xu and Croft (2000) that employs co-occurrence analysis for query expansion.

Concepts, instead of terms, are extracted from the top-retrieved documents. Both methods have

been shown to improve CLIR results (Zhou et al., 2012).
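
The two-step procedure of local feedback can be sketched in a few lines of Python. The scoring function below is a plain term-count ranking and the expansion terms are simply the most frequent words in the top-ranked documents; both are simplifying assumptions for illustration and not the weighting schemes used in the cited studies.

```python
from collections import Counter

def rank(query_terms, docs):
    """Rank document IDs by how many query-term occurrences each document contains."""
    scores = {d: sum(tokens.count(t) for t in query_terms) for d, tokens in docs.items()}
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))

def local_feedback(query_terms, docs, top_docs=2, new_terms=2, stopwords=frozenset()):
    """Two-pass retrieval: expand the query with frequent terms from the top-ranked documents."""
    first_pass = rank(query_terms, docs)[:top_docs]
    pool = Counter(
        t for d in first_pass for t in docs[d]
        if t not in query_terms and t not in stopwords
    )
    # Sort by count (descending), then alphabetically so that ties are deterministic.
    extra = [t for t, _ in sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))[:new_terms]]
    expanded = list(query_terms) + extra
    return expanded, rank(expanded, docs)

docs = {
    "d1": ["car", "engine", "repair", "engine"],
    "d2": ["car", "engine", "oil"],
    "d3": ["automobile", "engine", "repair"],
    "d4": ["flower", "garden"],
}
expanded, results = local_feedback(["car"], docs, top_docs=2, new_terms=2)
print(expanded, results)
```

In this toy collection the document that says "automobile" rather than "car" is missed by the original query but retrieved by the expanded one, which is the kind of word mismatch that query expansion is meant to address.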

In addition to using lexical resources to look up translations, statistical and probabilistic-based

methods, reviewed in the next section, are also used to evaluate possible translations.

Probabilistic-Based and Statistical Approaches

Probabilistic-based approaches in monolingual IR use algorithms to predict the probability of a

document matching a query (Baeza-Yates & Ribeiro-Neto, 1999). In CLIR, rather than seeking direct

translation, probabilistic-based approaches estimate the probability of a term in the document language

being the translation of a term in the query language (e.g. Romdhane, Elayeb, Bounhas, Evrard, &

Saoud, 2013).

There are two great advantages to probabilistic-based approaches. One is that, once developed,

the systems are able to handle the languages in both directions: they can translate from language A

to language B, and from language B to language A without modification. The method was found to be

most effective if used among languages with similar structures, such as among European languages, or

among certain Asian languages (McNamee & Mayfield, 2004). Another advantage is that they are not

language dependent. In other words, once the algorithm has been developed, it can be used on any

language pair as long as there are sufficient linguistic training materials such as parallel corpora.

Unfortunately, not every language has sufficient lexical resources for such use. Furthermore, with the

varying sizes and quality of the linguistic resources, different language pairs usually require

individual systems (Franz, McCarley, & Roukos, 1999). These two disadvantages showcase


the importance of lexical resources to probabilistic-based approaches. Many different types of

resources have been explored for their potential use in probabilistic-based approaches as well as in

statistical-based approaches. These lexical resources include parallel corpora, comparable corpora,

webpages, and other Web resources. The use of these resources in probabilistic- and statistical-based

approaches is discussed next.

Lexical resources for probabilistic and statistical approaches. Parallel corpora are often

viewed as suitable resources for matching terms in one language to those in another because of the

translation-equivalent text they contain (e.g. Cartoni, Zufferey, & Meyer, 2013; Dyvik, 2004). The

aligned texts provide a foundation to construct the correlations between words, and are often used as

training material for statistical machine translation systems such as IBM's fast document translation

system (Franz et al., 1999), which built bilingual dictionaries and translation models using algorithms

automatically learned from aligned texts of parallel corpora, and HAIRCUT (McNamee and Mayfield,

2004). HAIRCUT uses parallel corpora as a basis for an n-gram based statistical model. The model

relies on language similarity instead of direct translation for query term mapping, and was shown to be

highly effective in CLIR testing (McNamee and Mayfield, 2004). Parallel corpora are also used to

augment coverage of existing bilingual dictionaries (Gao, Nie, Xun, Zhang, Zhou, and Huang, 2001),

or combined with other techniques for improved performance (e.g. Ture, Lin, & Oard, 2012;

Azarbonyad, Shakery, & Faili, 2012). Researchers are able to use parallel corpora to explore the

relationship among words within the same language, and across languages without resorting to the use

of dictionaries.

But parallel corpora are not without shortcomings. The quality of the translation obtained through

parallel corpora is highly dependent on the scope and domain of the corpora, the vocabulary used in the

text, and the frequencies of word use (Nie et al., 1999; Maeda et al., 2000; McNamee & Mayfield, 2002;

Kraaij, 2001; Rogati & Yang, 2004). The same word may express different concepts when used in


different domains. The word “model”, for example, carries different connotations in the fields of math,

physics, and fashion. Parallel corpora in one subject domain may not be an effective translation

resource for documents written for another domain. Furthermore, parallel corpora are hard to come by

and difficult to develop (Ballesteros & Croft, 1998; Zhou et al., 2011; Pirkola et al., 2001; Gao et al.,

2001).

In addition to parallel corpora, there are comparable corpora. Comparable corpora are collections

of texts that contain similar content, but are not structurally aligned (Shakery & Zhai, 2013). The

contents of each corpus are written independently and are not direct translations of each other. The

corpora provide a data source to map natural language lexical equivalents among languages due to the

fact that the texts are written in their individual languages for their respective readers. There are several

projects that use content on the World Wide Web as a basis to create comparable corpora (e.g. Baroni,

Bernardini, Ferraresi, & Zanchetta, 2009; Schäfer & Bildhauer, 2012). Comparable corpora can be used

to generate multilingual thesauri (e.g. Sheridan & Ballerini, 1996), or to extract the translative

relationships of words (e.g. Picchi & Peters, 1998; Franz, McCarley, & Roukos, 1999; Sadat, 2010;

Prochasson & Fung, 2011).

Be it parallel or comparable, the nature of the corpora makes it so that CLIR systems constructed

using corpora are multi-directional. That is, the systems can translate the words in both languages to

and from each other. However, comparable corpora share the disadvantages of parallel corpora. Though

it is presumed that comparable corpora are easier to obtain than parallel corpora, they are still hard to

develop or to acquire in large enough sets. They are also domain specific, with words carrying meanings that are

applicable in one field but not necessarily in another. The sensitivity to domain is an advantage when the

lexical sources used to train the statistical translation system and the document collection share the

same domain (Pecina et al., 2014), but not when the training corpora and the collection vary in topic.

Because of these shortcomings, alternative resources are tested for their suitability to replace


corpora as linguistic resources. For example, the World Wide Web has been treated as a potential

resource by many researchers for parallel or comparable texts. Efforts using the Web as the training

resource for a probabilistic-based CLIR system include the STRAND system developed by Resnik and

Smith (2003), PTMiner developed by Nie and Cai (2001), and the work of Chiao and Zweigenbaum (2002). In

these systems, software identifies and harvests Web pages of similar content, and uses them as a parallel

or comparable corpus to train a probabilistic model. Chiao and Zweigenbaum (2002), for example,

collected French and English websites on the same topic through the use of a medical thesaurus

(MeSH) and two Internet catalogs of medical websites (CISMeF for French medical websites and

CliniWeb for English medical websites) as the comparable corpus to use in a translation system. In

addition to webpages, other Web resources have also been explored for their potential as translation

material, such as anchor text and link structures (Lu, Chien, & Lee, 2004); search engine results (Zhang

& Vines, 2004; Chen et al., 2004); Web directories (Kimura, Maeda, & Uemura, 2004); library online

public access catalogs (OPAC) (Larson, Gey, & Chen, 2003); news archives and blogs (Saralegi & de

Lacalle, 2010), and Wikipedia (Herbert, Szarvas, & Gurevych, 2011).

Instead of finding translations and estimating the probability of a term being the translation of

another, other approaches, namely latent semantic indexing and language models, examine and

calculate the relationship among words. Research on latent semantic indexing and language models is

reviewed in the next section.

Latent Semantic Indexing (LSI) and Language Models

Latent semantic indexing. LSI is a variant of the vector-space model that constructs the word-

word inter-relationship in a vector-space through the use of a set of multilingual documents (Littman,

Dumais and Landauer, 1998). LSI does not rely on external lexicon resources to determine word

relationships. The relationships are derived from a numerical analysis of the initial training data, such

as a set of multilingual documents. The method examines the contexts in which words appear, and


creates a multidimensional space with each term represented by a vector. In this lexicon dimension,

words used in similar contexts are located close together. Documents are also represented in the same

vector space; therefore, similarities and dissimilarities between words and/or documents can be

determined by the distances among their representations in the multidimensional space.

By exploring the relationships among words and documents, within and across languages, LSI

models are able to retrieve relevant documents even when the documents and the queries do not contain

the same words. Once the vector space is established, new materials can be added without

re-establishment or adjustment. The method is entirely algorithmic, and does not need other lexical

resources besides the initial training data, which can be quickly developed. But as with probabilistic

models, the system's performance highly depends on the scope, quality, and domain of the training

material. Words with multiple meanings can cause semantic distortions. LSI is also computationally

expensive, and may be quite costly when dealing with a larger data set (Evans et al., 1998; Mori et al.,

2001).

Language models. Language modeling is used in information retrieval to predict the occurrences

of terms, regardless of their order, in a document (Ponte and Croft, 1998). Where traditional

probabilistic models estimate the probability of a document being relevant to a query, language models

assume that users have a general idea of what terms are likely to be found in their target documents.

Given a query, the language model estimates the probability that the query is generated to search for

each of the documents in the collection. The documents with the highest probabilities are presented to

the users as the search result (Larkey & Connell, 2005; Xu & Weischedel, 2000; Xu & Weischedel,

2001; Xu, Weischedel, & Nguyen, 2001; Lavrenko, Choquette and Croft, 2002). The language models

can be established through the use of lexical resources such as parallel corpora or bilingual dictionaries.
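
To make the ranking idea concrete, the following Python sketch scores each document by the probability that a unigram model of the document would generate the query, and mixes in collection-wide statistics so that a missing term does not zero out the score. The linear mixing weight of 0.5, the toy documents, and the function name are assumptions made for illustration. The sketch is monolingual; the cross-language models cited above also involve a translation component, which is omitted here.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_counts, collection_size, lam=0.5):
    """log P(query | doc) under a unigram model mixed with collection statistics."""
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)
        p_coll = collection_counts[term] / collection_size
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term never seen anywhere in the collection
        score += math.log(p)
    return score

docs = {
    "d1": ["language", "choice", "of", "bilingual", "users"],
    "d2": ["search", "engine", "coverage"],
    "d3": ["bilingual", "search", "behavior"],
}
collection_counts = Counter(t for tokens in docs.values() for t in tokens)
collection_size = sum(collection_counts.values())

query = ["bilingual", "search"]
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection_counts, collection_size),
    reverse=True,
)
print(ranking)
```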

Language model approaches are based on statistical theories. They are language independent, and

can incorporate the use of additional enhancements, such as document expansion and stemming


alternatives, into the system (Larkey and Connell, 2005). However, as with other statistical approaches,

lexicon coverage is extremely important for accurate translation probability estimations (Lavrenko,

Choquette, & Croft, 2002); the model still relies on a parallel corpus with comprehensive lexical coverage

for effective retrieval results.

However sophisticated the aforementioned CLIR approaches are, they all share one weakness:

the reliance on some kind of lexical resource, such as machine-readable bilingual dictionaries or

corpora. Not all languages have such resources readily available. Researchers have come up with

different alternative methods for languages without sufficient lexical resources. Two of these methods

are transitive and triangulation methods.

Transitive and Triangulation Methods

Transitive and triangulation methods are used when there are not enough lexical resources to

establish a system that can directly map language A to and from language B. In this instance, a third

language that has established lexical resources for translations into both languages is used to facilitate

the process (Ballesteros & Sanderson, 2003; Purwarianti, Tsuchiya, & Nakagawa, 2007). These

methods not only allow for CLIR between languages that do not share translation resources; they are

also able to reduce the number of translations that need to be done when a large number of languages

are involved.

Transitive methods. Transitive methods use a medium language to bridge the translation gap

between two languages. For example, one may wish to build a CLIR system that takes query terms in

Indonesian to retrieve documents in Japanese, yet there is no available machine readable Indonesian-

Japanese dictionary, parallel corpora, or comparable corpora for this language pair. Fortunately, there

are existing lexical resources between each of the languages and English. Therefore, an Indonesian-

Japanese CLIR system may be built using English as a pivot language (Purwarianti et al., 2007).

Terms in Indonesian are mapped to English terms that are then mapped to Japanese terms. Another


example uses machine translation systems to translate both queries and documents to an intermediary

language (Kishida & Kando, 2005). In the hybrid system developed by Kishida and Kando (2005), the

query is translated into the intermediary language (query set A), and from the intermediary language to

the document language (query set B). In the meantime, the documents are roughly translated into the

intermediary language. Query set A is used to retrieve a set of documents also translated into the

intermediary language. Query set B is used to retrieve a set of documents in the original document

language. The two sets are merged to form the end result.
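
The dictionary-based form of the transitive approach can be sketched as a two-hop lookup. The Python fragment below uses two tiny, hypothetical dictionaries (Indonesian to English, English to Japanese) that were written for this illustration and are not drawn from the cited systems.

```python
def pivot_translate(term, source_to_pivot, pivot_to_target):
    """Translate a term in two hops; returns every target term reachable via the pivot."""
    targets = []
    for pivot_term in source_to_pivot.get(term, []):
        for target_term in pivot_to_target.get(pivot_term, []):
            if target_term not in targets:
                targets.append(target_term)
    return targets

# Tiny hypothetical dictionaries; real lexical resources would be far larger.
indonesian_to_english = {"buku": ["book"], "bisa": ["can", "venom"]}
english_to_japanese = {"book": ["本"], "can": ["できる", "缶"], "venom": ["毒"]}

book = pivot_translate("buku", indonesian_to_english, english_to_japanese)
ambiguous = pivot_translate("bisa", indonesian_to_english, english_to_japanese)
print(book, ambiguous)
```

The second lookup shows how ambiguity in the pivot language can add candidates: the English word "can" brings in the Japanese word for a tin can, which neither sense of the Indonesian term supports. This is one reason transitive translation is usually paired with a disambiguation step.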

Triangulation methods are based on a similar concept. Languages with more lexical and

translation resources are used to provide the connection for language pairs without direct translation

resources. For example, assume language X and language Y have no direct translation resources. Query

terms in X are translated into two intermediary languages, A and B. The translations in A and B are

then translated into Y. Translations of A are used to retrieve one set of documents, and translations of B

are used to retrieve a second set of documents. The union of the two sets is kept as the final result

(Gollins & Sanderson, 2001).

The use of transitive and triangulation methods makes it possible to include language pairs

without direct translation resources in a CLIR system. Used in combination with other techniques, such

as query structuring (Ballesteros & Sanderson, 2003), they can be effective. But as with all other

methods, there are weaknesses to these methods as well. The approaches depend on the translation

quality and contents of the intermediary languages. Errors could be introduced if no common words

were found in the translation sets or if non-intersecting but essential query words were dropped from

the translation (Gollins and Sanderson, 2001; Ballesteros and Sanderson, 2003).

Other Design Issues

As retrieval technology continues to improve, more attention has been given to interface design

and supporting features. Studies have found that although users still demand improvement on translation


quality, they also found support features helpful. Such features include phrasal recognition and

translation features, assistance in query formulation, the ability to edit or choose translation terms, and

translated summaries of search results (Marlo et al., 2008; Petrelli et al., 2004; Wu, He, & Luo, 2012).

Wu et al. (2012), especially, argue for the importance of understanding the users and making the

functions and interfaces of a CLIR system the central part of design. A sample of research on CLIR and

user interaction is included in the next segment of the literature review.

Summary of CLIR Literature Review

While the review above does not provide a comprehensive coverage of CLIR technologies (see

Zhou et al., 2012 for a detailed discussion), it covers the main strands of research and demonstrates the

efforts invested into improving the effectiveness of the systems. Most of the research involves the

development of new technologies, the combination of existing technologies, the construction or use of

resources, and the refinement of systems. The act of retrieval is confined to the process of matching a

query to a set of documents. With the emphasis on systems, users' profiles, needs, actions, and decisions

are treated as fixed. Yet “if we are to design useful machines, we must understand the process(es) by

which those machines will be used” (He, Oard, & Plenttenberg, 2006, p.2). In recent years, system

designers and researchers have begun to see the importance of understanding user behavior in system

design, and research has been conducted to see exactly how users would use CLIR systems and what

factors influence their behaviors.

CLIR and Bilingual Users

As the last section demonstrates, much of the research in CLIR has been focused on the

performance of systems, which is often measured by precision, recall, and translation quality. It cannot be

forgotten, however, that systems are designed for users. “To be effective, an information system has to

be faithful to a real context and in keeping with the use the end-user will make of it” (Petrelli et al.,

2004). To build an efficient CLIR system, it is important to ask: Who are the users? What do they think


about CLIR? How would they use a CLIR system? Do the users come from a homogenous group, with

the same need for certain technical support? Or are they heterogeneous, with different user groups

requiring different CLIR features?

Knowledge about the users would better guide system design by identifying the features and

supports that users need, whether it is full text translation, query formulation assistance, or better

interface design, and funnel resources to where they are most required. The following paragraphs

review current studies that seek to understand the users and their interaction with CLIR systems. The

first segment establishes that users search for information in different languages due to practical

reasons. The second segment summarizes studies on CLIR users' information seeking behaviors and

strategies. A concluding segment summarizes the literature review findings.

The Occasions for CLIR

There are noticeably fewer studies about users and CLIR than about CLIR technology. While each

study has its own focus and research questions, viewed together, the use of different sample groups

in existing research points to the fact that CLIR users are heterogeneous in nature. For example,

Petrelli, Hansen, Beaulieu, Demetriou, and Herring (2004) involved ten participants from four different

professions (journalists, translators, business analysts, and librarians) in the user-centered design

process of a CLIR system to observe what multilingual support they need. He, Oard, and Plenttenberg

(2006) synthesized three user studies involving 20 native English-speaking academics as participants

conducted during the design of a CLIR system. Rieh and Rieh (2005) interviewed English-Korean

bilingual academic users of a Korean university on their professional and personal Web use. Marlo,

Clough, Recuero, and Artiles (2008) recruited 12 computer science postgraduates or researchers of

different nationalities, and observed how they used Google Translate's search function for a prescribed

task. Artiles, Gonzalo, Lopez-Ostenero, and Peinado (2006) observed 22 native Spanish speakers’

attitudes and responses when they were asked to search for images using FlickLing, an image database


with CLIR search modes. Aula and Kellar (2009) recruited ten participants who use at least two

languages to search the Web, and described the decisions involved in the search sessions. Kralisch

(2005) studied the logs of an international health database and surveyed international students based in

Germany and Malaysia using an online questionnaire to study the cultural and linguistic impact on users.

Brazier and Harvey (2017) asked 10 PhD students who are non-native English speakers to search for

government services.

There are also a few studies that stood out for having larger sample sizes. Wu et al. (2011)

collected 358 survey results, Clough and Eleta (2010) obtained 514 questionnaire responses, and

Steichen et al. (2014) surveyed 448 participants. The participants of these three studies were largely

recruited from within academia. Nevertheless, combined with the previously mentioned studies,

they show that potential CLIR users could be from different nations, speak different

languages, come from different fields and professions, have different information needs, encounter

various difficulties during information seeking, and use different strategies to search the World Wide

Web. As varied as the populations and information seeking tasks are, these users share the recognition that

they need to search beyond their native language for information resources that would fulfill an

information need. Sometimes, a foreign language is treated as a default search language for certain

tasks. Research has found non-English users resorting to English as their search language in the belief

that it would yield the most results (Aula & Kellar, 2009; Nzomo, Rubin, & Ajiferuke, 2012; Steichen

et al., 2014) or “find everything” (Artiles et al., 2007, p.10). In some cases, a language may be viewed

as the dominant language for a specific profession or subject field. For example, English is seen as the

dominant language in technology and the sciences (Petrelli et al., 2004; Clough & Eleta, 2010; Artiles et

al., 2007; Rieh & Rieh, 2005; Aula & Kellar, 2009; Steichen, Ghorab, O’Connor, Lawless, & Wade,

2014; Steichen et al., 2014); German and French were found to be the dominant search languages for

philosophy, and French for museum studies (Clough & Eleta, 2010).


And yet, users are not rigid in their language use. Polyglots use multiple languages when

browsing or surfing on the Internet. Their language of choice changes depending on the situation and

the nature of the search tasks (Steichen et al., 2014). This language and information source switching

occurs because “...collections searched by the search engines are often region-specific and lack a

comprehensive understanding of the environment in which they operate” (Chung, 2008, p.36).

Although major search engines such as Google, Bing, and Yahoo let users search non-English

information resources, research has found that they do not have adequate coverage of domain- or

region-specific resources, and users' search results suffer as a consequence (Chung, 2008; Aula &

Kellar, 2009). Users, either through experience or presumption, understand the limits of each search

engine and would switch language and/or search engine based on the nature and purpose of their

information seeking session (Aula & Kellar, 2009; Clough & Eleta, 2010; Rieh & Rieh, 2005; Hong,

2011; Steichen et al., 2014). For instance, academic researchers are observed to search in databases in

non-native languages to increase recall (Rieh & Rieh, 2005; Wu, He, & Luo, 2012; Clough & Eleta,

2010). Outside of the academic field, users are found to prefer local languages for cultural, historical,

linguistic, sight-seeing, and geographical information searches, even if they are not native speakers.

People living abroad have been observed to search for information resources, such as local news, in the

local language of their work place or school (Aula & Kellar, 2009; Rieh & Rieh, 2005; Hughes, 2005;

Hong, 2011; Nzomo, Rubin, & Ajiferuke, 2012; Steichen et al., 2014). For example, international

students living in the United States are found to prefer English as their query language and search on

English-based search engines for information pertaining to daily lives (Hong, 2011).

In these cases, users are aware that they need to use different languages to search in different

regional search engines in order to find the information they need. They choose the search engine and

query language based on their knowledge or assumptions of the information resources. These users see

the Web as segmented by languages, and deal with the fact by searching in each necessary segment. For


them, information seeking involves not only recognizing an information need and formulating the

query term, but also making assumptions about the origin of the information resources and

deciding where and in what language to search for them.

A user's preferred language may differ from the language they need to use in an information

searching session if they wish to obtain optimal search results. If a user does not know which regional

search engine or language to search in, they may not be able to find the most relevant information

resource. Although the World Wide Web has largely reduced the geographic boundaries of information

resources and made it possible for users to access information resources from around the world,

language boundaries still exist. How, then, are users searching for information resources in or across

various languages once they recognize the need to do so?

Multilingual Information Seeking Behavior, Language Proficiency and Domain Knowledge

Searching in a non-native language is a more involved process than searching in a native

language. For one, after deciding on a resource and the language to employ, users need to translate the

search terms into the document language before they enter it into the search engine. It is an extra step in

the query formulation process and could be challenging for users who are not proficient in the

document language. Kralisch (2005), for example, hypothesized that using unfamiliar languages to

search requires more cognitive effort. The requisite cognitive effort might lead users to avoid using

information resources of a language they are less versed in, thereby missing out on potentially relevant

information.

When a user persists regardless of the language challenges, deficiency in language skills can

hamper their efforts to come up with a query term. Some users were observed to use creative methods

to counteract this situation. For example, users use advanced features of search engines, such as

preferred language settings, to restrict search results to web pages constructed in their target languages

(Hughes, 2005; Aula & Kellar, 2009). They may use search engines as a language tool to look up words


or the correct phrases in their non-native language (Aula & Kellar, 2009), or they may make use of

online translation tools, such as Google Translate, to translate the query terms into the document

language (Hughes, 2005; Nzomo et al., 2012; Ruiz & Chin, 2010).

Furthermore, the ability to read does not necessarily translate to the ability to actively come up

with a search term. Users who are able to read documents written in a foreign language may not be able

to formulate queries in that language unaided (Clough & Eleta, 2010) and still depend upon translation

assistance that some systems provide (Artiles et al., 2007). Lack of language proficiency not only impedes a

user's query formulation ability, it also erodes the user's confidence, hinders information seeking efficiency,

and requires extra effort from users to discern relevant information resources (Rao & Varma, 2010;

Ruiz & Chin, 2010; Marlo et al., 2008; Petrelli et al., 2004; Clough & Eleta, 2010; Artiles et al., 2007;

Srinivasarao, 2010; Nzomo et al., 2012). For example, Petrelli et al. (2004) found that users who are

less familiar with the document language would open the document and read its content in order to judge

it, whereas users who are more fluent in the language can make a similar judgment from the

title alone. The effort involved in searching in an unfamiliar language may be too demanding for some,

and deter them from accessing information resources of non-native languages (Kralisch & Berendt,

2005; Wu, He, & Luo, 2011). Information seekers may rely on online translation tools, such as Google

Translate, to translate a document into a language they are more fluent in, even though they are not

happy with the machine translation result (Wu, He, & Luo, 2011). It is evident that language

proficiency is a factor that influences users' CLIR behaviors and capacity.

The effect of insufficient language skills can sometimes be mitigated by the depth of a user's

domain knowledge. Users who are familiar with a subject are usually more conversant with the technical

terms and jargon used in a field. They are able to more effectively come up with query terms in the

document language and peruse the search results (Kralisch & Berendt, 2005; Kralisch, 2005; He et al.,

2003; Gaspari, 2004). Kralisch (2005) found that users with low language proficiency but high domain


knowledge were more likely to achieve success rates similar to those of native speakers when seeking

information in the subject domain.

From these cited studies, it appears that without sufficient domain knowledge or language

proficiency, users are likely to be limited to searching within a language they know, and overlook

potentially helpful information resources. For these users, search engines that support CLIR features or

CLIR systems might be able to help them bypass the language barrier on the Web. Though there have

been test systems such as Clarity (Petrelli et al., 2004), and test features such as Flickling (Peinado,

Artiles, Gonzalo, Barker, & Lopez-Ostenero, 2008) that support CLIR on the online photo-sharing

repository Flickr (www.flickr.com), such features are hard to come by in publicly available search

engines. One exception is Google with its Google Translated Search that was launched in 2007 and

disabled in 2013. The next section reviews studies on how users react to and use CLIR features

and systems.

The Use of Existing CLIR Features and Systems

Google Translated Search was designed to help users of any language proficiency levels search

across information resources in the supported languages (full description of Google Translated Search

is provided in Chapter 1). Marlo et al. (2008) found that users are largely unaware of it; participants

learned about Google Translated Search only after they were enrolled into the study. Most participants

in Marlo et al. (2008) were not able to envision using CLIR features on their own, even though

international users in other studies have indicated that they would appreciate the ability to read

documents or search for information in their native or active languages (Nzomo, Clough, & Dance,

2011; Srinivasarao, 2010). The researchers wondered if such response was a result of the experimental

setting, and whether users would find uses for the feature in real-life information tasks. There have yet

to be any studies that confirm the supposition, and the disabling of Google Translated Search due to

low usage seems to suggest otherwise. It appears that even when users are presented with a CLIR tool

that would streamline the search process by making it possible to search across languages, they do not

immediately incorporate it into their information seeking tool kit.

Summary of Literature Review on CLIR and Bilingual Users

To briefly summarize, existing studies on users' CLIR behaviors are often conducted with smaller

samples of different user groups. Viewing them together, even without the statistical power for

generalization, there are common observations about users' general behaviors. Yet even with data

collected from the different groups of participants, I propose that there are still large groups of bilingual

users, especially ones not within academia, that have not been studied and might show different

information seeking patterns.

The reviewed literature shows that users do indeed search for information across languages and

that it is a rather common practice among bilingual users. There are several ways a user approaches the

task of selecting language to use when information seeking on the Web:

1. Confined to using certain language(s) because of one’s language skill limitations,

although language skill can sometimes be supplemented by domain knowledge.

2. Recognize the fact that the Web is fragmented by language, and that each fragment

contains information resources of varying quality on different subjects. Identify the

information need and the purpose of the search, choose the information resource

accordingly and use the language of the information resource for searching, even if it is

a second language.

3. Use a language that is most suitable for the subject matter at hand.

When a user has sufficient language skill to choose among different information resources, he or she

chooses the language based on what he or she thinks would be most effective in retrieving relevant

information. The language choice is not a personal preference, but a decision arrived at after weighing

various constraints.


What would happen if it is no longer necessary to pre-select a language based on what is

available and how well one speaks a language? What language would a person use if they are able to

access information resources in any language they want? How would they choose among the

languages? What are the impact factors? In the conclusion of their study, Wu, He, and Luo (2011) listed

as their most important finding that:

…many attributes of users can impact users’ needs and expectations with regard to multilingual

information processing in digital libraries… For example, the languages that the academic users

speak and their countries (thus their environments) can significantly change their motivations,

behaviors, and expectations of multilingual information in digital libraries. (p.194)

This dissertation set out to explore some of these user attributes to expand upon current

understandings of CLIR users' behaviors and preferences. Kralisch (2005) found that using a less

familiar language results in higher cognitive load. This study would observe whether the higher

cognitive load affects users' language preference, and if there are other impact factors that would sway

users' language preferences for online information resources.

Furthermore, this study would continue the investigations on how language preference shapes the

users' information seeking behavior. Aula and Kellar (2009) showed that users choose search language

based on the quality of search results. This study takes away the differences in search result contents in

order to observe users' reaction and response to language alone.

The design of this proposed study is also different from previous research. Researchers have

studied users' CLIR behavior through the use of CLIR systems and features (e.g., Petrelli et al., 2004, and

Marlo et al., 2008), prescribed tasks (Hong, 2011, and Artiles et al., 2007), and interviews about users'

Web searching activities (Rieh & Rieh, 2005, and Aula & Kellar, 2009). These studies observe a range

of factors that impact user behaviors, from reactions to search task (Rieh & Rieh, 2005; Hong, 2011;

Aula & Kellar, 2009), to language proficiency (Aula & Kellar, 2009; Marlo et al., 2008). This study


extends this research, but with a narrow focus on the act of language selection. Although the

sample size of this study follows the examples of previous studies and is kept small, the diversity of

the participants would add to the rich narrative of users and their language selection during an

information seeking process.

Information Seeking Behavior

Information seeking behavior (ISB) is the other main thread in this research as users’ language

choices are examined in the context of an information seeking session.

ISB has been a major subject of interest in the library and information science field. At first

glance, the act of information seeking appears to be straightforward: a person has a need for

information, looks for the information from different sources, gets a list of results and is either satisfied

or is disappointed. Yet look at it closely, and one would find it to be a complex process that requires

deliberate thought and decision making. Throughout the process, users are often contending with

a myriad of variables that could be psychological, sociological, or cognitive.

The complexity of ISB is demonstrated in Wilson's (1997) attempt to provide a thorough model

that accounts for all aspects of information seeking behavior. Wilson based his model on theories in

the information science field and expanded it outward to include “the study of personality in psychology; the

study of consumer behavior; innovation research; health communication studies; organizational

decision-making; and information requirements in information systems design” (Wilson, 1997, p.551),

and noted the possibility of applying mass media and communication studies into the model as well.

Not all models are so encompassing. The more common research approach to tackle a subject as

intricate as ISB is to focus on one type of the behavior or on a specific aspect of it. The following

literature review provides an overview of the classic ISB models. The purpose of this review is to

establish the importance of examining language preference as an integral part of ISB. The review

shows that ISB is usually viewed as a monolingual activity. Although language is one of the important


factors that influence a user's information seeking pattern, it has not been studied for its impact on the

behavior. This proposed study will fill this knowledge gap.

Information Seeking Behavior Models

In the 1980s and 1990s, several influential models were developed to describe what is involved

in ISB. The models can be categorized into three groups. The first group focuses on the nature of

information need and how it leads to information seeking behavior. Such models include Taylor's

(1967) four stages of information need, Belkin's ASK (Belkin, Oddy, & Brooks, 1982), and Dervin's

(1983) sensemaking theory. The second group describes the information seeking process which

involves the recognition of the information need, the formulation of the query, and the resulting act of

finding potentially relevant information. Well-cited models within this category include Ellis (1989),

Kuhlthau (1991), and Wilson (1996). The third group incorporates users' interaction with the

information retrieval systems into the model. Examples include Ingwersen's (1996) cognitive

information retrieval model and Saracevic's (1996) information retrieval process model. The following

paragraphs provide an overview of the models in each group.

Models exploring information need. Models in the first group describe how an information

need emerges, what propels a person to act upon it, and what is generally involved in the act. Noted

studies and models in this area include Taylor's (1967) four levels of information need, Belkin's ASK

(Belkin, Oddy, & Brooks, 1982), and Dervin's (1983) sensemaking theory.

Dervin's (1983) sensemaking theory posits that as individuals move through time and space,

they encounter discontinuities, or “gaps”, in their reality. The gaps lead to questions, confusion,

and angst. A person needs to “make sense” of the situation and construct for themselves the uses of

the new sense in order to move on. In this model, information is highly subjective - it is a product of

human observation colored by the observer's existing knowledge and experiences. Information need

arises from a person's desire to resolve the discontinuities he or she experiences in reality.


Belkin, Oddy, and Brooks (1982) view the occurrence of an information need as an anomalous

state of knowledge (ASK). In ASK, information need is the result of a person recognizing an anomaly

in his/her state of knowledge concerning a subject or a situation. Initially, the disruption in the

knowledge store is difficult for the user to specify. Users are often unable to describe the information

they need to resolve the anomaly. Throughout the information seeking process, the information need is

recognized, disambiguated, formulated into a statement, and presented to an information system in

accordance with what the user anticipates the system can deliver. The evolution of the information need is

similar to the four levels of information need identified by Taylor (1967):

Q1 – the actual, but unexpressed need for information (the visceral need);

Q2 – the conscious, within-brain description of the need (the conscious need);

Q3 – the formal statement of the need (the formalized need);

Q4 – the question as presented to the information system (the compromised need). (p.182)

Visceral need is the first level in the question formation process in which the user senses, consciously

or unconsciously, a need for more information. It is similar to the “non-specifiability of need” (Belkin,

1980) described in ASK. The need changes in “form, quality, concreteness, and criteria as information

is added” (Taylor, 1967, p.182) and arrives at the second level, the conscious need. At the second level,

the person develops a mental description of the need, but it is still ambiguous and not well defined. At

the third level, the formalized need, the person is able to formulate and describe the information need in

concrete terms. At the fourth level, the compromised need, the information need is modified in

anticipation of the potential information resources that the information system in use can deliver.

Sensemaking theory, ASK, and Taylor's four stages of information needs exemplify models that

focus on the cognitive development of an information need. Once the hazy sense of something being

amiss has been turned into an expressible statement and question, the user is likely to engage in a series

of actions that hopefully resolves the situation. The second group of models describes the behaviors


involved in this process.

Information seeking behavior models. Information seeking behavior models distill the steps

that are taken in general information seeking processes to resolve the information need.

Ellis (1989) observed academic social scientists' information seeking patterns and identified six

features between the start and end of the process: starting, chaining, browsing, differentiating,

monitoring, extracting, verifying, and ending. The order of the features is not fixed but changes

according to the user and the circumstances of the information seeking activities. In this view, the

process of information seeking is flexible. It reacts and adapts to specific situations. With the sequence

and combination of the features being changeable, the model accounts for tasks with a fixed beginning

and an end, as well as ongoing monitoring situations. This model focuses on user behaviors; external

variables, such as context and situation, are discussed but do not receive extra attention. It also does

not cover the cognitive or affective aspects of ISB. In comparison is Kuhlthau's (1991) model that is

known for its inclusion of the cognitive (thoughts) and affective (feelings) realms alongside the

physical (actions) realm.

Kuhlthau's model “describes common patterns in users' experience in the process of information

seeking for a complex task that has a discrete beginning and ending” (Kuhlthau, 2005, p.230). The

model identifies common stages that users go through: initiation, selection, exploration, formulation,

collection, and presentation. The often recursive and iterative process hinges upon four criteria: the

amount of time the user has, the nature of the task, the amount of personal interest, and the information

resource that is available. The model accounts for the cognitive and affective consequences of the

physical actions. As the users continue, their cognitive states evolve from vague to focused, their

thoughts develop from full of ambiguity to specificity and increased interest, and their feelings change

from confusion and frustration to confidence and satisfaction or disappointment.

From the model, Kuhlthau (2005) proposes the conceptual premise of an uncertainty principle for


library and information services and systems, drawing attention to uncertainty “as a natural, essential

characteristic of information seeking as a sign of the beginning of learning and creativity” (p.233).

Uncertainty is first experienced when users notice the gap in their knowledge or understanding, and

then when users are unable to express clearly the information they seek. Uncertainty often manifests

as frustration and doubt. Another model that factors in the affective realm was developed by Bystrom

and Jarvelin (1995).

Noting the personal factors of attitude, motivation, mood, etc., Bystrom and Jarvelin (1995)

developed their model based on how task complexity impacts user behavior. The researchers identified

five task categories that range from automatic information processing tasks to genuine decision tasks.

At one end are automatic information processing tasks. These are tasks that are structured, with defined

perimeters, and have no case-based arbitration. These types of tasks are routine and can be automated.

At the other end of the spectrum are genuine decision tasks. These tasks are unfamiliar, unexpected, less

defined, and unstructured. For genuine decision tasks, neither the process nor the information

requirements can be defined beforehand. Between these two task categories are, in order, normal

information processing task, normal decision task, and known, genuine decision task. From automatic

information processing tasks to genuine decision tasks, each category requires more case-based

arbitration than the category before.

Corresponding to the different task types are two types of information needs: information need in

problem formulation, and information need in problem solving. To satisfy the information needs, there

are three kinds of information: problem information that describes the problem; domain information

that consists of facts and data in the subject area; and problem-solving information that describes how a

problem should be formulated, as well as what and in which manner domain information should be

used to solve the problem.

In Bystrom and Jarvelin's model, the process of information seeking and the type of information


resources sought are decided by the nature of the task, the situational factors, and the user's personal

factors. In the model, the information need arises when a lack in certain knowledge is encountered

during a task performance. The person decides upon an action based on “the needs, the perceived

accessibility (whether cognitive, economic or physical) of information channels and sources and the

personal information seeking style which evolves on the basis of successfulness of attempted actions”

(Bystrom & Jarvelin, 1995, p.8). The researchers found that task complexity has a direct impact on the

complexity of information needed, the needs for domain and problem-solving information, and the

number of sources needed.

Rather than focusing on tasks, Savolainen (1995) provided a framework to study information

seeking that occurs in non-work related everyday life. In everyday life, people are confronted with way

of life (the order of things) and the need for mastery of life (keeping things in order). In this context,

users may seek information in order to maintain, restore, or construct way of life or mastery of life. As

a result, people often seek information for three reasons: health, consumption, and leisure. Like

Bystrom and Jarvelin's (1995) model, the specific projects (tasks), situational factors, and personal

factors are all important in Savolainen's model. Personal factors, including values, attitudes, material

capital (e.g. financial resources), social capital (e.g. contact networks), cultural and cognitive capital,

and current situation of life form a person's basic equipment to seek and use information. Savolainen's

(1995) everyday life information seeking framework reveals the complexity of information seeking

behavior, and the influence of each variable (such as income and education level) alone and in

convergence.

The complexity of ISB is demonstrated in Wilson's model (Wilson, 1997; Wilson, 2005) that

includes many theoretical bases from various fields. Just as Kuhlthau (2005) based her model on

uncertainty, Wilson's (1996) model uses stress and coping as its theoretical basis. There are five types

of information needed: new information, information to clarify, information to confirm the information


held, and information to confirm beliefs and values held (Wilson, 1996). Information is sought out for

the goals of: orientation (discover what is happening), reorientation, construction (to form an opinion or

solve a problem), and extension (to build knowledge on a subject). A major part of Wilson's model is

the adoption and incorporation of intervening variables from various fields. These

variables that may pose a barrier to the activating mechanism of ISB include psychological,

demographic, interpersonal, environmental, and source related ones. The variables are rooted in health

information, psychology, marketing, and other fields. Through the construction of the model, Wilson

encourages information scientists to explore other disciplines for research ideas. He states that there are

“analytical concepts, models, and theories that need to be absorbed into information science as a matter

of urgency” (Wilson, 1997, p.570). Such is the angle of this study in examining information seeking

behavior from the view of language choice.

Information seeking and retrieval models. The third type of model extends from the user's

information seeking process to include the design and features of an information system. Rather than

focusing on information seeking behavior, these models are developed to place users within the

information retrieval (IR) process.

Information seeking involves a person's explicit effort to locate information. There are many

different types of information seeking behavior that may engage many different types of information

resources such as other people, books, archives, or the Web. Information retrieval is a type of

information seeking behavior and it describes the process of retrieving potentially relevant information

from a digital document collection using an information retrieval (IR) system. The process is initiated

when the user enters a query term into the system, and ends with the system presenting a set of

documents (usually text, but could be other types of media) ordered by relevancy as judged by the

system to the user. The system is usually the critical element and the focus of IR discussions. Models

developed under the tradition of IR show the shift of emphasis from users in the ISB tradition to the


system. Examples of this type of model include Ingwersen's integrative framework (Ingwersen, 1996),

and Saracevic's (1997) stratified model.

Ingwersen's (1996) model is an information seeking and information retrieval model that reflects not

just the cognitive process of the information seeker, but also the technology behind the information

system, including the information space of the information retrieval system, the information retrieving

algorithms and the interface through which users interact with the system. Alongside the user's cognitive

space and the social and environmental variables, Ingwersen included in the model the nature of the

information objects (such as the way knowledge is represented), the setting of the information retrieval

system (including search language, IR techniques, etc), and the interface design as important elements

in the information seeking and retrieval process.

Similarly, in Saracevic's (1997) model, user and computer are the two entities that interact

through an interface. Users come to the interaction with an existing knowledge store and an understanding of

the situation and environment in which the task originated. The computer system is equipped with

system-specific resources that make up the engineering and processing levels, and informational

resources that make up the content level. Interaction between user and computer is initiated through the

interface, and ripples through the typology, occurring sequentially in connected levels.

Summary of Literature Review on Information Seeking Behavior Studies

Three types of ISB models are discussed above. The first group focuses on the nature of

information need, the second group describes the process of information seeking, and the third group

incorporates IR systems into the process. Each of the models has its own perspectives and approaches.

Each model differs in complexity, topology, and terminology. All of them, however,

recognize the impact of situational, personal, and other contextual variables on a user's information

seeking and retrieval process. These variables may be implicit as in Ellis' (1989) model, or featured

prominently as in Wilson's (2005). The influence and importance of these variables are undeniable.


Researchers have isolated individual variables to study their impact on users and the ISB process.

These variables include task (Li & Belkin, 2010), context (Kelly, 2006), gender (Hupfer & Detlor,

2006), demography (Gray, Klein, Noyce, Sesselberg, & Cantrill, 2005), and situational factors (Rieh,

2004).

Although no language related variables were mentioned in the models, language proficiency,

language preference, and other language related variables may very well be a part of the personal factor

discussed by Ellis (1989), have an impact on the information resources available to the users as discussed

by Kuhlthau (1991), be related to the mastery of life in Savolainen's (1995) everyday life information

seeking, be an element of the psychological and demographic intervening variables in Wilson's model

(1997), and occupy the cognitive space in Ingwersen's interactive model. Multiple studies in the

context of MLIA or CLIR have shown that language use is, indeed, a part of the information seeking

process. Its impact is seen to manifest in many ways, such as resource availability, or user's query

construction capability (Aula & Kellar, 2009; Kralisch & Berendt, 2005). However, these studies and

the ISB models appear to stand apart. MLIA studies seldom reflect ISB findings, and ISB models have

not discussed the impact of bilingualism or multilingualism.

As Wilson (1997) argues, information seeking behavior is an interdisciplinary subject and should

be studied as such. Bilingual users' information seeking is a complicated matter that stretches across many

different fields, including bilingualism, information seeking behavior, and information retrieval. There

are many elements that should be examined in depth in order to understand bilingual users' ISB in full

and how to best support them. This proposed study focuses on users' language choice and explores the

impact of language related variables on a bilingual user's information seeking pattern. It takes on but one

small corner of a large and complex issue. However, it is the beginning of deeper understanding. This

exploratory study does not seek to fit user's language choice into any model, but to understand the

shaping of the choice. Potential frameworks and models are explored but would need to be studied


further in the future for applicability.

Bilingual User's Language Choice and Language Use

Although bilingual users have received little attention from CLIR or information behavior fields,

bilingualism and multilingualism have received extensive attention in fields including linguistics,

cognitive science, neuroscience, psycholinguistics, and education. As varied as the fields that study it,

the subject is examined from various perspectives that range from how to educate a bilingual child

(Baker, 2011), how languages influence and change the speaker's language patterns (Backus, 2005), to

how policy influences language choice (Heller, 1992). The wide breadth of bilingualism and

multilingualism research is exemplified in the four parts of The Handbook of Bilingualism and

Multilingualism (Bhatia & Ritchie, 2013). This study looks at the use of language as a part of the

information seeking process. The emphasis of this review on bilingualism studies is therefore placed on

bilingual speakers' language use, language choice, and preferences.

It has been established that:

…at any given point in time and based on numerous psychosocial and linguistic factors, the

bilingual has to decide, usually quite unconsciously, which language to use and how much of the

other language is needed – from not at all to a lot. (Grosjean, 2008, p.2)

At any given point of time, a language is chosen as the base language for main use, and when needed,

other languages are brought in to use as the guest language in the form of code-switching or borrowing

(Grosjean, 2008). In code-switching, the speaker shifts to the guest language for a word, a sentence, or

more, whereas in borrowing, a phrase or sentence from a guest language is adapted

morphosyntactically to the base language. For this study, it is the complete switch to a different

language (the act of code-switching) and the factors influencing the language selection process that are

of relevance.

This literature review begins by introducing the term code switching and identifying its


applicability to this study. The following segments provide brief overviews and some example studies

of selected subject areas that pertain to the issue of why and how language choices are made. The

focus is on the user's approach to language choice. Potential impact factors are extracted for closer

examination in this study. The mixing of multiple languages in literature, as seen in War and Peace, or

other media, such as advertisements or news articles, is not included.

Code Switching

Before the discussion of bilingual speakers' language preference and use, it is prudent to

introduce the concept of code switching (also written as codeswitching or code-switching). The term

“code” is used to refer to a human language such as English, or linguistic style used within a

communication session, such as regional vernaculars (Nilep, 2006). Code switching occurs when

multiple languages are purposely used by one person within one conversation. The alternation of

languages could be within a sentence (intrasentential switching, sometimes referred to as code mixing)

or between sentences (intersentential switching). Observations of bilingual users' information seeking

behavior in the CLIR and MLIR fields show that the alternate use of languages in information seeking

situations is more similar to intersentential switching than to intrasentential switching. In other words,

users are observed to use one language at a time for a search, and words used in one search are

composed in the same language. However, users may conduct several searches each using a different

language within a search session (Peinado, Artiles, Gonzalo, Barker, & Lopez-Ostenero, 2008; Nzomo,

Rubin, & Ajiferuke, 2012; Hong, 2011).

Code switching could occur in writing, but more often than not, the phenomenon is observed

during verbal communication. Code switching, like bilingualism, is studied from many different

perspectives such as the “syntactic or morphosyntactic constraints on language alteration” (Nilep, 2006,

p.1), the effectiveness of using multiple languages in the classroom (Moschkovich, 2007), the

grammatical properties when code switching occurs (MacSwan, 2012), the neurological undertaking of


switching languages (Meuter & Allport, 1999), or the social-cultural relationships between speakers

that trigger language switches (Saville-Troike, 2003). For this study, it is the why that matters.

Code switching can occur to either fill a linguistic/conceptual gap (Greene, Pena, & Bedore,

2012), or for more intricate reasons that are usually contextual to the social situation in which a person

finds himself or herself. Grosjean (1998) listed language proficiency, language mixing habits and attitudes, usual

mode of interaction, kinship relation, socioeconomic status, the nature of the message, and the function of

the language act as among the factors that would make a person more susceptible to using multiple

languages at the same time. Many other linguistic studies approach code switching from the

perspectives of pragmatics and sociolinguistics (Androutsopoulos, 2013). For example, many studies

investigating the cause for code switching point to the use of language to establish social status, self-

pride, and prestige (Rezai & Cheitanchian, 2008). The social aspect is not applicable to an online

information seeking act when the user is interacting with a database or a search engine. Lacking the

intrinsic social dimension, the alternate use of language in an online information seeking session cannot

be described as code switching, but should be viewed as a straightforward switch of language.

However, research on why code switching occurs is still included in the literature review for the

insights it offers into a bilingual user's tendency to choose one language over another in a given

situation.

The Impact Factors of Language Choices

Of the many questions asked by bilingualism researchers, when and why one chooses to

speak a language are the most relevant to this study. When people need to communicate or express their

thoughts, how do they choose which language to use? The decision is complex and involves many

linguistic, psychological, and social factors such as a person's language skill, identity and purpose, as

well as the social setting and the situation in which a conversation takes place. This section

reviews research that explores and describes these factors through the lens of models that were


developed to examine the factors that influence a person's language choice. Included are Blom and

Gumperz's (1972) categorization of language switching occurrences, Fishman's (1965) construct of

domain, and Hakuta and D'Andrea's (1992) discussion of language attitude.

Situational switching and metaphorical switching. One often-cited framework for studying

language choice was developed by Blom and Gumperz (1972), who categorized the occurrence of

language alternation into two types: (a) situational switching, and (b) metaphorical switching.

Situational switching happens when a societal consensus has been reached on what languages to use based

on the topics, situation, participants, and location of the conversation, which Scotton and Ury (1977)

call the cluster. In situational switching, the language serves to define the existing context; therefore,

any change in the cluster might trigger a language switch. Metaphorical switching takes place without

changes in the cluster. The speaker uses a language that is outside of the social situation's agreed upon

languages in order to draw attention, emphasize, or add connotative meanings to parts of the dialog.

Another influential framework is the markedness model of Myers-Scotton (1983) that views the choice

of code as a way for the speaker to evaluate and index the rights-and-obligations sets that exist between

the speaker and addressee. Both frameworks view the use of language as a social function that changes,

sends a message about, or confirms a social construct.

Social interaction is also an important element in Fishman’s model (Fishman, 1965). His

construct of domain describes the social interaction in four aspects: purpose, topic, role-relationships,

and setting.

Fishman's Domain. Not to be confused with “subject domain,” a term often used in IR and in library

and information science to denote a topic, domain is a concept that refers to the social-cultural

construct in which a conversation takes place (Saville-Troike, 2008). The construct is abstracted from

the setting, such as church, home, and school; purpose, such as official business and small talk; topic,

such as work and religion; and the role-relationships of the interlocutors, such as priest-parishioner,


mother-child, and teacher-student. Although individually defined, the four factors are often intertwined

and hard to untangle. For example, the topic of a conversation (e.g. sales transaction) can be highly

related to its purpose (e.g. business) which can also determine the setting (e.g. a formal business

interaction in an office). In another example, a boss and his employees may use one language to discuss

work related topics in the office as employer and employee, and another to ask about each other's

family as friends. Therefore, their role-relationship is dependent upon the setting and topic of the

conversation. As with the framework proposed by Blom and Gumperz (1972), domain is a social-

cultural construct. Of the four facets, only topic and purpose are germane to this study.

Purpose. In information seeking and CLIR research, purpose refers to the goal of an information

seeking task (e.g. personal, professional, etc). In bilingualism, purpose refers to the goal of the

conversation. The purpose of a conversation may be to conduct business, to build relationships, to

exchange pleasantries, etc. Sometimes, a language is chosen to assist the progression towards the

purpose. For instance, a sales clerk may use the customer's native tongue to provide better service

during a transaction. One can also view purpose through Saville-Troike's “genre,” which categorizes

talk by why it occurs (e.g., negotiation, war talk) (Saville-Troike, 2008). Sometimes, the choice of

a language may serve a purpose that is independent from the content of the conversation, such as

expressing an affinity with a population, a heritage, or culture. Mills (2001), for example, interviewed

ten mother and child pairs who are of Eastern Asian heritage and reside in England over the span of

two years regarding language use in daily conversations. She found that languages are crucial in

conveying the core values of religion as well as cultural and community affiliation as they are passed

down from parent to child. Language is used to help form the child's sense of ethnic identity. Similar

observations were made of Russian families who immigrated to England (Kasatkina, 2010), and Scottish

families that use both Gaelic and English (Smith-Christmas, 2012). Language choices can even be used

by children of diaspora families as a way to challenge parental authorities, and, through it, forge new


relationships and values within the family (Hua, 2008). In these cases, language is used as a medium to

impart a set of values and signify cultural heritage of the family.

Beyond family, language is also shown to be instrumental in constructing a child's personal and

social identity among peers (Fuller, 2008; Caldas & Caron-Caldas, 2002), and in constructing and

reflecting the social construct to which the speaker belongs (Cashman, 2001). The purpose of language

choice in these situations has more to do with showing an association with an identity or a culture

than with expressing the message that was uttered. Beyond the social-cultural relationship among

interlocutors, language can also be used to indicate a change in topic, to structure a conversation (such

as indicating a side-sequence), or to enhance an expression or to link an element to a specific domain,

experience, or social-cultural setting (Morel, Bucher, Doehler, & Siebenhaar, 2012).

Regardless of the purpose, one can see the difference between how the fields of bilingualism and

information science treat “purpose”. In bilingualism, the purpose of a language choice is in the message

it sends and the social-cultural construct it conveys. For information science researchers, “purpose” is

tied to the anticipated outcome of the information seeking task. Language is chosen as a part of the

user's information seeking strategy. As of yet, there has not been any indication of language carrying

social meanings when used in an information seeking act.

Topic. Topic is about the content of the conversation. It is perhaps the most recognized factor in

CLIR, multilingual information access, and information behavior fields that influences information

seeking behavior. It is often cited as the motivating factor for language choice during an information

seeking task (e.g., Aula & Kellar, 2009; Hong, 2011). Similarly, topic has been accepted by

bilingualism researchers as one of the major determinants of language choice (Becker, 1997; Saville-Troike,

2003). There are times when a person associates a topic specifically with a language regardless of the

setting or language skills. This may occur because it is the language in which the person learned about

a subject, so that their store of necessary vocabulary and terminology is in that language. Even though


they are not fluent in the language, they would choose to use it when the topic comes up (Saville-

Troike, 2003).

As mentioned previously, topic is often enmeshed with the purpose and settings of a

conversation. For example, Soliman (2008) studied the use of Classical and Egyptian Arabic by

Egyptian scholars during a sermon and found that the topic of the lecture is a strong factor in language

choice. During a sermon, Classical Arabic is used for reciting and quoting from religious texts while

Egyptian Arabic is used for lectures and discussions. Furthermore, Classical Arabic conveys

seriousness and formality while Egyptian Arabic expresses warmth and intimacy (setting). Difficult to

separate from social-cultural setting and purpose, topic itself is seldom studied alone as a determinant

for language switching in bilingualism. Some researchers also suggest that language choice is not

driven by semantics, but by the interactional function of the conversation (Nilep, 2006), further

diminishing the importance of topic. Even so, topic has been listed as an important factor that

influences information seeking behavior, and is hence treated so in this study.

Language attitude. Another attribute that has been found to impact language choice more than

proficiency is the speaker's language attitude (Hakuta & D'Andrea, 1992).

Attitude “is a hypothetical construct used to explain the direction and persistence of human

behavior” that is hard to observe and assess (Baker, 1992, p.10). In their study of the maintenance and

loss of Spanish/English bilingualism, Hakuta and D'Andrea (1992) measured a subject's language

attitude through statements such as “It's O.K. if a person grows up speaking Spanish, and later forgets

it”, and “It is possible to learn English well without forgetting Spanish” (p.80). Similarly, Cherciov

(2013) measured a person's language attitude through questions about his/her cultural preference,

feelings of homesickness, and importance attached to a language as medium of contact with friends and

family. Language attitude has been examined for its impact on heritage language maintenance

(Nesteruk, 2010), language acquisition (Lasagabaster, 2015), language attrition (Cherciov, 2012;


Bahrick et al., 1994), and language preference (Hakuta & D'Andrea, 1992). In fact, Hakuta and

D'Andrea (1992) saw that language attitude acts as a predictor of language preference more so than

language proficiency. In their study, language use in seven different settings is explored: 1. with adults,

2. with siblings, 3. in school, 4. with peers, 5. in digital media, 6. alone, and 7. in church. It is unclear

whether this language choice covers digital information resources. This study explores the

relationship between language attitude and digital information resources through the questionnaire.

The literature cited thus far mostly concerns verbal communication, which has been the focal

point of bilingualism research. In more recent years, the alternating use of first (L1) and other

languages (referred to as L2 for simplicity) has also been observed in non-verbal communication, such as

in composition and in digital media.

First and Second Language Uses in Composition

Though language switching in writing has not been studied as extensively as in verbal communication, it

has received attention from the field of foreign language education, albeit in a different light. In this

capacity, language switching refers to the use of L1 in the composing of an L2 text. The thought

process may be conducted in both L1 and L2, but the outcome is always in L2. The situation is

different from the information seeking setting studied in this research in that, during the information

seeking process, users can decide which language to use. Furthermore, writing requires the ability to

actively recall the vocabulary and grammatical rules of a language. While information seeking also

requires active vocabulary recall, queries are often very short, and are not required to have grammatical

structure (Jansen & Booth, 2006). As a result, users who have passive language ability and can read but

not write in L2 might be able to successfully complete an information seeking task. Even with the

inherent differences, research on language switching in composition provides insights into users'

language use that may apply to information seeking situations.

Language switching between L1 and L2 during L2 composition is often studied for how the


behavior might assist and influence the writing process. Language switching was found to help with the

thought and strategy development process, and to improve the quality of the L2 text (Qi, 1998;

Woodall, 2002; Li, 2008; Zarei & Amiryousefi, 2011; Qahfarokhi & Biria, 2012). Researchers suggest

that people might switch between L1 and L2 naturally as “an implicit or explicit problem-solving

strategy” (Qi, 1998, p.429) to facilitate their writing, especially for conceptual activities (Zarei &

Amiryousefi, 2011; Ramirez, 2012). Language proficiency and the difficulty of a task are suggested by

several studies to strongly affect the amount of language switching that occurs. The

more difficult the task is and the higher the level of knowledge the task demands, the more language

switching is observed (Qi, 1998; Wang, 2003; Li, 2008; Qahfarokhi & Biria, 2012).

Although this type of language switching has not been studied in the context of information

seeking, it likely occurs. If the use of L1 can facilitate a person's conceptual activities and help

produce better L2 writing, it might also contribute to forming a better query term in L2. When a user

decides to execute a search in L2, they may engage L1 to process an abstract concept and put it into

words.

The possibility of users incorporating language switching in the query term formulation process

has not been explored in CLIR or information seeking. Though this proposed study focuses on how a

user's L2 proficiency and thought process influence the selection of information resource language,

whether L1 is involved in the process is also of interest.

First and Second Language Uses in Computer-Mediated-Communication

Computer-mediated-communication (CMC) is the act of communicating through digital media

such as mailing lists, instant messengers, chat rooms, online discussion forums, or social media sites. It

is an emerging subject within code switching research (Morel, Bucher, Doehler, & Siebenhaar, 2012;

Androutsopoulos, 2006; Sabahat, 2013). CMC is similar to verbal communication except that it is

conducted through written words and computers. The participants are not face-to-face, and the


communication could happen asynchronously. The differences between CMC and verbal

communication are large enough that there are calls to treat CMC in its own right, separate from verbal

or written communications (Androutsopoulos, 2013). Nevertheless, there are similarities in why people

choose to use one language over the other in digital media and in verbal communication.

Researchers have found that, when communicating online, users choose languages based on their

language proficiency, the specific setting (e.g., formal vs. informal), their purpose (e.g., to express

sarcasm, to emphasize a point, or to express identity), the specific topic and subject domain, and the

common understanding of the group (Morel et al., 2012; Siebenhaar, 2006; Sabahat, 2013; Sperlich,

2005). The online media use examined involves interacting with other people and carries social-

cultural subtexts that are not fully applicable to information seeking situations. It is the purpose of this

research to examine these factors' applicability within an information seeking session.

Language Exposure and Language Dominance

Language exposure concerns the amount of contact a person has with a language. It can be

measured by the length of residency in the second language country, or a person's reported use of the

second language in various forms (Krashen, 1982). Language exposure has been found to have

significant impact on a person's language acquisition (Love, Mass, & Swinney, 2003; Bahrick, et al.,

1994) and language processing abilities (Morford, 2002). Several studies have found that the earlier a

person is exposed to a language, the better he or she learns the language (Kovelman, Baker, & Petitto,

2008; Mayberry, Lock, & Kazmi, 2002; Morford, 2002). Yet other studies have refuted the significance

of age of exposure, and point to the length and degree of exposure as more influential (Bahrick et al.,

1994; Bedore, Pena, Summers, Boerger, Resendiz, Greene, Bohman, & Gillan, 2012). These studies

find that current language use is a more important predictor of language performance. Higher exposure

to a language also leads to better retention and less language loss throughout time (Rott, 1999).

Furthermore, it leads to a change in language dominance. Though language dominance


and preference are most often found to be task specific, for general tasks, the language a person is most exposed to

emerges as the one that is more dominant (Bahrick, et al, 1994).

As discussed in a previous section, some researchers view language dominance as language

proficiency. For example, in Aparcio and Lavaur's (2014) study of language proficiency and language

processing speed, the researchers use “dominant language” to refer to the language in which a person is

most proficient. I adopt another point of view and see a dominant language as the one that a person

chooses to use over other learned languages. A language may gain dominance due to a person's

language proficiency, but also because of language exposure, the task at hand, and other variables

(Bahrick, et al, 1994; Grosjean, 2008; Lim, Liow, Lincoln, Chan, & Onslow, 2008).

Language dominance is observed as a variable in the maintenance and use of heritage language

in immigrant families (Suarez, 2002), in the acquisition of second language in a bilingual community

(Gathercole & Thomas, 2009), and for clinical assessment and intervention (Lim et al., 2008). Yet the

effect of a person's dominant language on their online information seeking language choice has not been

explored. This study aims to fill this knowledge gap.

Summary of Literature Review on Bilingualism

As the previous examples demonstrate, aside from studies of the cognitive functions of the brain and of

pedagogical strategies, motivations behind language choice and language use are often studied from

social-cultural perspectives in bilingualism. Most studies focus on communications in which language

is used for people to relate to each other, and to relay ideas. Language choice is often indicative of the

speaker's perspective on his/her social identity and carries cultural significance. A language is selected

not only as a medium of communication, but often serves as a message itself.

During an online information seeking session, language is used to compose a search term, not to

communicate with another person. Language serves, strictly, the role of the messenger. It is the tool with

which a Web user can express an information need. It is mostly used to construct short queries for the


purpose of retrieving relevant information resources. This type of language use and the dominant

factors that impact it have not been discussed much. This study investigates the impact of topic,

purpose, language attitude, language proficiency, and setting on users' language choice in an

information seeking setting.

Conclusions of the Literature Review

Literature from three fields is reviewed in this section: cross language information retrieval

(CLIR), information seeking behavior, and bilingualism. The three fields have different emphases. The

focus of cross language information retrieval is on the development of the technology and the design of

the system. A subgroup of the studies examined users' interactions with and thoughts on cross language

information retrieval systems. The number of studies on CLIR users may be small, but they illustrate

the heterogeneity of the users and the multiple variables that may impact their behaviors. Most CLIR

user studies examined the overall CLIR process, and observed behavioral patterns as well as

variables that may influence the behavior. Language proficiency and subject domain have been

identified as the most notable impact factors for CLIR users on selecting information resources of

different languages. This proposed study takes an alternative viewpoint from traditional CLIR research

by treating language choice as an outcome instead of an impact factor. Language proficiency and

subject domain would be observed as potential impact factors for users' language choice. This proposed

study would contribute to the knowledge of bilingual users' CLIR behaviors by understanding bilingual

users' language choice for digital information resources through a user study approach.

The field of information seeking behavior emphasizes users' behavioral patterns and the cognitive

processes that initiate and facilitate an information seeking process. The process is influenced by

several external variables. This research examines language as one such variable and proposes that the

examination of the information seeking process should start with an understanding of a user's language

use. Instead of looking at the information seeking process, this research focuses on how users associate


with languages in an information seeking context. Once a user's language selection process is

understood, one can continue to examine the difference between language preference and actual

language use and the cause of this discrepancy.

Language use is a core subject in bilingualism. For bilingualism researchers, language is often

seen as a part of the message that is being imparted. This study sees language as the medium to

compose search terms and retrieve potentially relevant information resources. Within the context of this

study, the social-cultural aspects of language are not an emphasis. At the core of this study is how users

use the languages that they know. However, studies on bilingual users have shown that language choice

is often tied to elements beyond language proficiency, including a person's language exposure, and

language dominance. These are factors that have yet to be explored for their impact on a user's

information seeking behavior.

CLIR, information seeking behavior, and bilingualism each contribute to the understanding of an

aspect of multilingual users' information seeking behavior. This research is positioned at the

intersection of these three fields. Instead of focusing on CLIR systems and technology, this study

focuses on users. Instead of asking users how they would use an existing system, this study asks what

language they would prefer to use, and argues that system design should be based on users' language

use. Instead of treating language as a message, this study views language as a medium and its

selection as a decision made by the user under the influence of other factors. As Wilson (1997) argues,

information seeking behavior is an interdisciplinary subject. As Fishman (1965) posits, there are many

motivators behind a person's language use. The literature reviewed in the previous sections points to

language proficiency, language exposure, subject domain knowledge, language attitude and

information availability as possible impact factors to a person's language choice. The research method

of this study (see later chapter for detail) lets the researcher remove information availability as a

variable. In its place, a user's language profile is studied in closer detail for its bearing on a user's


habitual use of language and the user's language selection process during an information seeking act.

This research examines whether these variables influence a user's choice of language when using digital

information resources, and explores other possible impact factors. By drawing on existing literature from

the three fields, this study aims to establish a better understanding of CLIR behaviors in relation to

language use.


Chapter 4. Research Question and Research Methodology

Research Question

The previous chapter’s review of CLIR development, user research, and what we know of

bilingual users, demonstrated that there is a need for more understanding of bilingual speakers and how

they look at digital documents made up of different languages. Existing studies on bilingual users

focused on language proficiency and search task as the impact factors for information seeking behavior.

The present research expands upon them and asks: “What elements within a user’s language profile

influence his/her language choice for digital text documents?”

Hypothesizing that linguistic and psychological factors including the user's language exposure,

attitude, proficiency, preference, and domain of use have an impact on a bilingual user’s language choice,

the following assumptions are examined:

1. A bilingual speaker’s language attitude influences his/her language choice.

2. The length and breadth of language exposure impact a multilingual person’s language

choice.

3. The history of language use impacts a multilingual person’s language choice.

4. A person’s language fluency impacts his/her language choice.

5. The subject matter has an effect on a person's language choice for the information resource.

The assumptions are operationalized by limiting bilingual speakers to Chinese-English bilingual

speakers for whom Chinese is the first language (L1) and English is the second language (L2), and

examining the relationship of the above listed assumptions with their language selection results:

1. Language attitude: The likelihood that a bilingual speaker chooses L2 increases when he/she

indicates a preference for L2.

2. Language exposure: The longer a bilingual speaker is exposed to an L2 environment, the


longer he/she has been actively using the language, and the more likely he/she is to choose

L2.

3. History of language use: The longer a bilingual speaker has been using L2, the more likely

he/she is to choose the L2 versions.

4. Language proficiency: The more proficient a bilingual speaker is with L2, the more likely

he/she will choose L2 for digital information resources.

5. Subject matter: The less familiar a bilingual speaker is with a subject matter, the more likely

he/she will choose L1 for information regarding that subject.

The goal for this study is not to produce generalized and conclusive tests on the hypotheses, but

to observe trends and see whether the hypotheses warrant further studies and verification.

Research Method

Overview

This section provides a brief literature review on research methods to explain the study design.

A detailed description of the measurements and research procedure begins in the next section.

Research method. Many approaches have been used to study a person’s information seeking

behavior. The first type observes users in-situ, and gathers data about users' natural behaviors as they

encounter information seeking tasks in life or work. Such designs make use of diaries (Kuhlthau, 1991;

Kelly, 2006), browser history (Aula & Kellar, 2009), search engine query logs (Keegan &

Cunningham, 2008; Kralisch & Berendt, 2005; Rao & Varma, 2010; Wu, He, & Luo, 2012), or surveys

and interviews about users' real-life experiences (Rieh & Rieh, 2005; Clough & Eleta, 2010; Nzomo,

Rubin, & Ajiferuke, 2012).

The second type conducts user studies in laboratory settings with prescribed tasks in order to

control the context in which the search occurs. The tasks could be designed to simulate information

seeking situations that users encounter in life (Hong, 2011), or designed to expose users to unfamiliar


situations and tools (Petrelli et al., 2004; Ruiz & Chin, 2010; Artiles, Gonzalo, Lopez-Ostenero, &

Peinado, 2007). This research uses a combination of both types of methods: user background and

actual information seeking habits are gathered by a survey; users' language preference is observed

through a prescribed document selection exercise.

Bilingualism researchers have used various subjective measurements to assess a person's

language skill and dominance, such as pronunciation (Hopp & Schmid, 2011), tip-of-

the-tongue experiences (Ecke & Hall, 2013), and language tests (He, Wang, Oard, & Nossal, 2002).

More often than not, researchers rely on participants' self-report (Artiles, Gonzalo, Lopez-Ostenero, &

Peinado, 2007; Clough & Eleta, 2010), which has been shown to produce reliable and valid

measurements, and to correlate highly with standardized tests and judged ratings (Lim, Liow, Lincoln,

Chan, & Onslow, 2008). This study relies on participant self-report through online surveys to gather

data on the participant’s language history, use, proficiency, and other profile elements.

However, to counter any occurrences of over- or under-estimation of one’s language skills or

language use (Ayers, 2010), the present study also gathers behavioral measurements as additional

evidence by observing participants' language choice and article language selection results. The two groups of

data, observations of actual language selection for the survey and for the article selection exercise, and

the verbal or written input from the surveys, are combined to develop what Yin (2009) calls the

“converging lines of inquiry” (p.115) that more fully describe users’ use of and attitude towards a

language.

Population and sample size. There are no established standard profiles for CLIR users, nor is there a

known population from which to draw a sample for a randomized study. Users who search for

information across languages could be of different nationalities, speak different languages, have different

levels of language proficiency, work in different fields, and have different information needs. As a

response to the immensely heterogeneous demography, MLIA and CLIR researchers have largely


adopted the purposive sampling or convenience sampling approaches. Using purposive sampling,

“[m]embers of a sample are chosen with a ‘purpose’ to represent a location or type in relation to a key

criterion” (Ritchie, 2003, p.79). Some studies use a small sample size with purposive sampling

approach to examine a specific segment of potential CLIR users: Hong (2011) interviewed 21 Chinese-

speaking graduate students; and Petrelli et al. (2004) conducted field studies with 10 journalists and

librarians. There are also studies that use a convenience sampling approach for first-hand observation of

multilingual users: Aula and Kellar (2009) conducted reflective interviews with 10 IT professionals working in

foreign countries. Though findings from such studies cannot be applied to all bilingual Web users, they

nonetheless contribute to the understanding of the greater population.

There are larger scale studies that reached out to a wider population. Clough and Eleta (2010)

recruited 514 participants through online invitations for their study on academics’ use of multilingual

digital libraries. Similarly, Wu, He, and Luo (2012) targeted potential digital library users within

academia and recruited 358 participants for their study, also on multilingual digital library usage,

through an email list. Steichen et al. (2014) used academic mailing lists to recruit 385 users for their study

on multilingual user behaviors.

This study is designed as exploratory research. The purpose is not to produce a generalizable

result, but to observe trends and explore possible impact factors; therefore, the selected sample size is

not large. The participants are recruited from deliberately selected sample pools through convenience

sampling, snowball sampling, and online invitations. The participant recruiting process loosely follows

the heterogeneous sampling approach (Ritchie, et al., 2003) in that the invitations are sent to potential

participants who reflect the diversity of Chinese-English bilingual speakers. More specifically,

invitations to participate in the survey are sent to Chinese-English speakers living in the US and in

Taiwan in order to gather user response from bilingual speakers with different language proficiency,

language background, and language exposure.


The sample frame for this study is set to Chinese-English bilingual speakers to account for this

study’s goal of examining bilingual users’ language preference and use, and for the researcher’s ability to

communicate with participants and accommodate a bilingual user’s language preference during the study. The

participants are recruited from diverse backgrounds in order to ensure the inclusion of different

language profiles. The main group of participants consists of Chinese speakers with English as their second

language who reside in the United States. A second group of Chinese speakers who know English

was recruited from Taiwan as a comparison group for the impact of language environment factors.

Research process. The study procedure includes a pilot study and a larger scale user survey.

The pilot studies are conducted through video conferences so that the researcher can directly

interact with and observe participants. The participants are asked to follow the Think Aloud protocol

during the session and to take part in a semi-structured interview (see Appendix IV for interview script). The final

survey was modified based on pilot study participants' input so that the questions are clearer and more precise.

The research procedure is detailed in a later section.

Measurements

The present study observes four variables that have potential influence on a person's language

choice: attitude, exposure and use, fluency, and the subject matter of the information

content. Data are collected through a language profile survey (Appendix III) and a user study

(Appendix IV) to help evaluate the existence of any observable trends that support the theses

established in the previous section. A detailed description of the relationships between the variables is

given in the Materials section. The impact of the variables can be expressed as 2x2 or n x m contingency

tables. The variables and how they are operationalized are discussed in the following sections.
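As an illustration of the contingency-table form mentioned above, the following minimal Python sketch cross-tabulates two categorical variables and computes the Pearson chi-square statistic. The response counts, the function names, and the pairing of dominant language with survey language are hypothetical; they are not data or software from this study. No continuity correction is applied, and the closed-form p-value is computed only for the 2x2 case.

```python
from collections import Counter
from math import erfc, sqrt


def contingency_table(pairs):
    """Cross-tabulate (row_label, column_label) pairs into a table of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an n x m table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical responses: (dominant language, survey language).
responses = (
    [("Chinese", "Chinese")] * 30 + [("Chinese", "English")] * 10
    + [("English", "Chinese")] * 5 + [("English", "English")] * 15
)
rows, cols, table = contingency_table(responses)
stat, dof = chi_square(table)
# With one degree of freedom the chi-square p-value has a closed form.
p_value = erfc(sqrt(stat / 2)) if dof == 1 else None
print(rows, cols, table, round(stat, 3), dof, p_value)
```

A larger n x m table passes through the same two functions unchanged; only the p-value step would need a chi-square distribution function from a statistics package.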

Language Attitude

As defined in Chapter 2, language attitude is the feeling a person has towards languages that are

either known or not known to the person. Language attitude is viewed as a composition of three

components: cognitive, affective, and readiness for action. In this research, language attitude is

measured through participants’ conscious and subconscious actions: the language they choose for the

survey questions (survey language) and the language they answer in (answer language); their cognitive reflections on survey

questions about dominant language, language preference, and cultural identification; and the

open-ended questions that collect participants’ affective reactions toward languages.

Language Exposure and Experience

In this study, language exposure refers to coming into contact with a language either passively

in one’s surroundings or through active communication. To be exposed to a language does not require

active participation; hearing English spoken by others and being surrounded by English signage are

forms of exposure. In the survey, language exposure is measured by: (a) how long a participant has

been living in the US, an English-speaking country, and (b) the amount of English exposure in daily

life.

Unlike exposure, daily use of a language requires employment of language skills such as

comprehension and reading. It involves actively using the language on the person’s part. Participants of

this study are asked how long they have been consistently using English daily as a measurement of how

much experience they have with using English.

Language Proficiency

Language proficiency refers to how well a person can speak, write, read, and comprehend a

language. There are multiple ways to measure a person’s language proficiency. One of the data

collection materials used in this study is a modified LEAP-Q survey (see next section, Material, for

more discussion). In the survey, participants are asked to self-rate their language proficiency level as:

(a) able to read place names, signs, some words and phrases; (b) able to read simple paragraphs on

familiar subjects; (c) able to read general news articles as well as reports and technical materials on

familiar subjects; (d) able to read all styles and forms of documents related to professional needs; and (e)

equivalent to the language proficiency level of an educated native speaker.

Subject Matter

Subject matter refers to the topic of the article. It is operationalized using the different news

articles included in the article selection exercise. The subjects of the articles include entertainment

(movie and TV series), cultural commentary, interior design, politics, science, and local news.

Putting it Together

This research assumes that each of the variables listed above has potential influence on how

people select the language to use, which in turn might influence their information seeking behavior (Figure 2).

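Figure 2. Language Choice Variables and Information Seeking Behavior

[Figure 2 diagram. Boxes: Language Attitude, Language Exposure, Language History, Language Proficiency, Subject Matter; these influence Language Choice, which influences Information Seeking Behavior.]

Material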

Language Proficiency and Internet Usage and Experience Survey

At the start of the session, after the participant has gone through the informed consent, they are

asked to fill out a language proficiency and Internet usage survey (Appendix III) that collects data on

their language skills, language use, language preferences, language exposure and Internet use

frequency. The survey is based on the Language Experience and Proficiency Questionnaire (LEAP-Q)

developed by Marian, Blumenfeld, and Kaushanskaya (2007). LEAP-Q is designed to create a language

profile of a participant through a series of language use impact factors, including self-rated language

proficiency, history, exposure, attitude, as well as amount of and domain of use

(http://www.bilingualism.northwestern.edu/leapq/). LEAP-Q's reliability and internal validity have been

established through a comparison of survey results and standardized language tests results (Marian et

al., 2007).

For this research, LEAP-Q is converted from paper to digital form. The questions include

multiple choices, single choices, and open-ended questions. The single-choice questions are asked in

the formats of dropdown lists, radio boxes, and sliding scales. The multiple-choice questions are asked

in the formats of checkboxes. LEAP-Q’s original questions are adjusted to include language use in the

context of online information seeking sessions: the World Wide Web is added as one of the possible

sources of exposure and language use in the questionnaire; additional questions about Internet use are

added to the end of the survey to collect users' regular Web use behavior and information seeking

habits.

The survey is available in two languages: Chinese and English. The Chinese version is available

in two forms: Simplified Chinese and Traditional Chinese. Chinese script has evolved over

millennia into the Traditional Chinese form, which can be cumbersome and time-consuming to write.

People have come up with different variants of the traditional form to make the writing process easier.

In the 1950s, in order to regularize simplified forms of the Traditional Chinese script, Simplified

Chinese was developed and promoted for common use in the People’s Republic of China (Bokset,

2006). As a result, Traditional Chinese is the standardized form used in Taiwan, and Simplified Chinese

in Mainland China. Though the two scripts are different, the meaning, pronunciation, and syntax of the

words and phrases remain the same. Both scripts are offered to accommodate the reading habits and

preferences of participants.

The users’ selections of survey language and their subsequent use of languages within the session

are recorded and included in the data analysis as indications of users' language preference and language

fluency.


Article Selection Software

The article selection exercise is conducted through a Web application that I created specifically

for this research. The application contains a database of ten news story titles and excerpts in English,

and the same article titles and excerpts in simplified and traditional Chinese. News stories are used in

this exercise because they are written for the general public and cover a wide range of topics.

The articles are collected from the BBC and the New York Times during the period of April 1st, 2014 to

April 15th, 2014. Both news services have a main English site (www.bbc.com and www.nytimes.com)

and Chinese site (http://www.bbc.com/zhongwen/simp and cn.nytimes.com). The sites contain

independent stories as well as articles of comparable contents. Only articles that are available in both

English and Chinese are selected. Subject matters of the articles selected include: entertainment (movie

and TV series), cultural commentary, interior design, politics, science, and local news. Each story

contains proper nouns and/or technical jargon: some stories, such as the ones about entertainment and

local news, contain proper nouns, and some, such as the science articles, contain technical terms.

The article selection exercise begins with brief instructions asking the user to select between

a Chinese and an English version of the same article. The participants are not asked to search for

information, but to react to the information that is presented to them. They are asked to choose a

language intuitively without giving it much thought. Once started, the participants are presented with

eight articles in English and the Chinese script of their choice. The articles are shown individually, with

the two language versions side by side (see Appendix IV for the application). The articles and the order

in which the language versions are presented are randomized but consistent from session to session so

that each user sees the articles in the same order and presentation.
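One way to obtain an order that is randomized but identical across sessions is to shuffle with a fixed seed. The Python sketch below is illustrative only: the article identifiers, the seed value, and the function name are hypothetical, and it does not reproduce the Web application actually used in this research.

```python
import random

# Eight hypothetical article identifiers and a hypothetical constant seed.
ARTICLE_IDS = [f"article_{i}" for i in range(1, 9)]
FIXED_SEED = 2014


def presentation_plan(article_ids, seed):
    """Return [(article_id, left_language, right_language), ...] in a fixed random order."""
    rng = random.Random(seed)  # private generator, independent of global random state
    order = list(article_ids)
    rng.shuffle(order)  # randomize the article order once
    plan = []
    for article_id in order:
        sides = ["Chinese", "English"]
        rng.shuffle(sides)  # randomize which language version appears on the left
        plan.append((article_id, sides[0], sides[1]))
    return plan


# Two separate sessions built from the same seed receive the same plan.
plan_session_1 = presentation_plan(ARTICLE_IDS, FIXED_SEED)
plan_session_2 = presentation_plan(ARTICLE_IDS, FIXED_SEED)
print(plan_session_1 == plan_session_2)
```

Holding the seed constant keeps presentation effects identical for every participant, at the cost of not counterbalancing order or side across participants.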

Think Aloud Protocol

The pilot study incorporates the Think Aloud protocol (Ericsson and Simon, 1993) in order to gather as much data as possible

while minimizing possible influences on the participant's thought process. At the beginning of the study, the users are introduced to the protocol and asked to verbalize

their thoughts as if they are talking to themselves while they perform the article selection tasks. They

may describe what is going through their minds or explicate their thought processes, but they are

asked not to provide explanations for their behavior. The purpose is to gather thoughts that would

normally occur during the task instead of asking them to be introspective of their actions. Asking the

users to examine their own motives and logic would likely alter their thought process, and change their

behavior. During the think aloud protocol, the researcher is present but limits interaction with the

participant to prompts to continue verbalizing so as to not affect the user's articulation. The Think

Aloud protocol is only used during the pilot study.

Article Selection Follow-Up Questionnaire

The final part of the research session is a questionnaire consisting of open-ended questions that

ask users to reflect upon the article selection exercise. The participants are shown all the article titles

they have just seen in a list, with the two language versions side by side, and the version they selected

in bold font. The article selection results are presented as prompts to help participants retrospectively

answer questions about how they chose the articles and what they thought of the exercise.

Population

The subjects of this study are Chinese-English bilingual, Internet-using adults. The subjects are

able to use both languages in reading, writing, or conversing. Although there are US census data

on languages spoken at home, there are no collaborative data on bilingual speakers who are also

Internet users. With the population unknown, and no established user profile, this study recruits

deliberately within designated sample frames that reflect the diversity of the bilingual

population. Five subjects were recruited for the pilot study. The subjects are Chinese-English bilinguals

who reside in the US and use both languages daily.


Two sets of participants were recruited for the general study. The first consists of Chinese speakers who

are able to speak English and reside in the United States. The second consists of Chinese speakers who have

studied English and reside in Taiwan. The reason for having two groups of participants is to make certain

that diverse language profiles are reflected in the samples. Invitations for the general study were sent to

Chinese-English bilingual speakers residing in the US as well as in Taiwan through email lists, message

boards, and personal email invitations. Participants are also encouraged to forward the invitation to

others who meet the study criteria. The purpose of this approach is to collect enough responses to show

trends and patterns. Data is gathered during the month of March 2017.

Procedures

Pilot study

Invite Chinese-English speakers of different backgrounds who currently reside in the US and are

eighteen years or older to participate in the pilot study. Schedule an individual session with each. Ask for

their preferred video conference format and set up a time for the study session.

At the time of the study, call the participant. Greet him or her in both Chinese and English, and

ask which language they would like the researcher to use for the remainder of the session. This

will be the language in which the instructions, assistance, and interview questions are given. If they

choose Chinese, ask if they prefer traditional or simplified Chinese for the written materials. Continue

the briefing and the rest of the session in the selected language.

Direct the participant to the survey website and begin the briefing. Introduce the

participant to the research, go over the research agenda, and obtain informed consent (Appendix I).

Give the participant opportunities to ask questions and decline participation if they so wish. If they

wish to continue, introduce the participant to the Think Aloud protocol and ask them to verbalize their

thoughts during the session henceforth. Encourage them to ask questions during the session, then ask them

to click on the “Begin” button to start the survey.


Administer the modified LEAP-Q questionnaire (Appendix II). If the participant seems stuck on a

question or shows signs of confusion, ask them what they are thinking. Remind them to think aloud

while answering the questions.

After the survey, let the participant begin the article selection exercise. Observe quietly and only

prompt the participant to express their thoughts if he or she is silent for an extended period of time (10

seconds).

After the article selection exercise, proceed to the follow-up questionnaire. After the participants

finish the questionnaire, conduct the final semi-structured interview (Appendix VIII Interview Script).

General User Survey

Revise survey questions and instructions based on pilot study participant inputs.

Send personal email invitations to Chinese-English bilingual speakers residing in the United

States. Post invitations to area Chinese schools and social media lists. Include survey URL in the

invitations.

Continue to send out invitations to the survey and collect data for the duration of two months.

Scope and Limitations of the Study

This proposed project examines the language use of Chinese-English bilingual speakers from the

Chinese-English communities that the researcher has contact with using materials selected and

designed for this proposed research. The focus on user’s reaction to languages places the system

development, interface design, and other technical aspects of CLIR out of the scope of this study.

Without a clearly defined bilingual Web user profile, this study, like others, uses non-random

sampling approaches; consequently, the participants cannot be viewed as representative of all bilingual

speakers. Furthermore, participants of this study are self-selected: they are those who responded to the

online or email invitations and volunteered their time to contribute to this research. As a result, the

participants may be more interested in the subject or more aware of their own language use to begin with.


The use of a digital survey accessible online also led to the exclusion of Chinese-English speakers who

are not comfortable using digital information resources, prefer physical documents, or have no easy

Internet access. Because of these concerns and the small sample size, results of this study cannot be

used to describe all bilingual Web users. It contributes to the literature on MLIA and bilingual users in a

more confined way.

The proposed research is limited by the procedure it uses to explore how participants select between

languages they know. The materials used in this research are designed by the researcher and tailored to

this study. The materials may not reflect participants’ preferences for digital information resource or

real life information needs, and may not perfectly simulate real life language choice. Furthermore, the

proposed procedure of this study examines users' attitudes and behavior towards receptive and passive

language use. Users need to decode the language when the article is presented to them, but they are not

required to actively recall words or compose queries. Users might demonstrate different behaviors and

have different preferences when they are asked to formulate search terms themselves and need to

engage in productive language use.

In short, the scope of this study is defined by the use of a specific group of bilingual users and

prescribed materials, the involvement of reading skills, and the selection of reading material between

two languages. The small sample size and the sampling method prevent the results from being generalized.

Regardless, the findings would be valuable to CLIR, information seeking behavior, bilingualism, and

other fields that take interest in bilingual user's information seeking behavior and language use.


Chapter 5. Data Analysis

In this chapter, the survey results are presented and analyzed following the direction of

the research questions detailed in chapter 2. The majority of the statistical analysis in this

study uses nonparametric statistics such as the Kruskal-Wallis and Mann-Whitney tests. This is

because many of the variables, such as English proficiency level and daily English exposure, are not

normally distributed, have outliers, and/or are ordinal variables. Categorical variables are compared

using chi-square tests (Moore, McCabe, & Craig, 2009).

Pilot Study Results

The researcher was able to recruit five participants for the pilot study. The pilot studies were

conducted through timed video conferencing, and audio recorded. The participants were asked to think

aloud during the sessions. After the survey, they were asked to comment on the study process and

survey questions. The sessions ranged from 25 to 45 minutes.

From the pilot study results, instructions were added at the beginning of the survey to let

participants know that they can choose to answer the survey in any language, even if it is a language

different from the one they chose to have the survey be presented in. They are also assured that they

can switch languages between questions. A few survey questions were reworded for clarity. A

redundant question about the article selection process was replaced with one that asks about language

dominance. The article selection exercise was shortened from twenty articles to eight to avoid

participant fatigue. Finally, a special version for Internet Explorer was created due to the browser’s

inability to display certain HTML5 scripts.

General Survey Results

Forty email invitations were sent to Chinese-English speakers currently residing in the US. Five

email invitations were sent to Chinese-English speakers currently residing in Taiwan. The invitation was

further posted to four Chinese community email lists in the US, at three universities with international

student populations (one on the East Coast, one in the South, and one in the Midwest), and at one

university in Taiwan. In total, 184 responses were received, 32 of which were only partially completed,

leaving 152 (82.6%) completed surveys. Three participants reached out to the researcher after their

survey was submitted to convey further thoughts on language and digital resources.

The results are examined following the sections of the survey: language profile, daily

language use, use of language online, and the article selection exercise. The data are also analyzed

across sections in correspondence to the research questions. The goal is to identify variables that may be

associated with a bilingual user’s language choice for digital information resources.

Language Profile

Basic demographic information. The 152 participants are made up of 116 females (76.3%)

and 36 males (23.7%). They range from 19 years old to 65 years old (M = 35.7, SD = 12.218). The

participants have on average lived, or have been living in, the US for 11.3 years (SD = 9.92): 106 of the

participants (69.7%) have lived or are still living in the United States, and 46 (27.6%) may have visited

but have never lived in the US. Of those who live in the United States, eight (6.9%) only stayed for a

year, 12 (11.3%) are here for under five years, and 94 (88.7%) have been in the country for six to 42

years. In total, the 116 participants have lived in the United States for an average of 15.58 years (SD =

8.349), with the longest duration being 42 years.

Regarding the participants’ daily environment, the majority (140, 92.1%) of the participants

speak mainly Chinese within their family; only a minority (12, 7.9%) use mainly English in their

family. Outside of the home, 106 participants (69.7%) have spent more than one year in an English-

speaking work/school environment.
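The reported shares can be checked with simple arithmetic. The short Python sketch below recomputes several percentages from counts stated in this chapter; the helper name `share` is introduced here only for illustration.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


received, completed = 184, 152  # responses received and completed surveys

completion_rate = share(completed, received)    # completed surveys among all responses
female_share = share(116, completed)            # female participants
chinese_family_share = share(140, completed)    # mainly Chinese spoken within the family
english_env_share = share(106, completed)       # over one year in an English-speaking work/school setting

print(completion_rate, female_share, chinese_family_share, english_env_share)
```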

In contrast, all of the participants have lived in a Chinese-speaking country at some point in

their lives. On average, participants lived, or have been living, in a Chinese-speaking country for 23.56


years (SD = 9.21). Furthermore, more of them (124, 81.6%) have lived in a Chinese-speaking country

longer than in an English-speaking country, as well as having a Chinese-speaking family (140, 92.1%).

The majority of the participants (138, 90.8%) have spent more than one year in a Chinese-speaking

work/school environment.

First, dominant, and preferred languages. Almost all of the participants (151, 99.3%) speak

Chinese as their first language, and one (0.7%) speaks English as his/her first language. Not everyone

views their first language as the dominant language however (see Chapter 2 section 3 for a detailed

discussion of the definition of dominant language). When the participants were asked to list the

languages that they most intuitively use in life, the majority of them listed Chinese (113 participants,

74.3%), a second group listed English (33 participants, 21.7%), and the rest listed Cantonese (3

participants, 2%) and Taiwanese[1] (3 participants, 2%). The final count shows that 36 participants

(23.7%) have a dominant language that is not their first language.

All of the participants (100%) know a second language (Cantonese, Chinese, English, French,

Japanese, and Taiwanese), and 96 (63.2%) of them listed English as their second most dominant

language. Thirty-five of them (23%) speak a third language (Chinese, English, French, Italian, Japanese,

Korean, Portuguese, Russian, Shanghainese, and Taiwanese); and four of them (2.6%) speak a fourth

language (Japanese and Taiwanese).

Although Cantonese, Shanghainese, and Taiwanese are languages spoken in specific locales

(Hong Kong and Macau, Shanghai, and Taiwan, respectively), each with its own distinct tonality,

pronunciations, grammar, and lexicon, they are used in areas that deploy Mandarin Chinese, or

Putonghua, as the standard language and script (Gong, Chow, & Ahlstrom, 2011). Participants who

listed Cantonese and Taiwanese as their dominant language all listed Mandarin Chinese as their second

most dominant language. These participants are all well versed in Mandarin Chinese, and are counted

as Chinese-English speakers.

[1] Here, Taiwanese refers to Taiwanese Hokkien (臺灣閩南語), a local language brought to Taiwan in the 1600s by emigrants from Southern China. It is spoken by approximately 70% of the population in Taiwan (Tomala, 2016). Taiwanese has its own tones and pronunciations, terms, and syntax.

A chi-square test of independence was conducted to observe the relation between dominant

language and whether the participants have lived in the US. The result was significant, χ²(1, N = 152)

= 13.081, p < 0.001. Participants who live in the US have a higher chance of having English as their

dominant language than participants who live in a non-English speaking country.

A Mann-Whitney U test was conducted to determine whether there was a difference in the

number of years living in the US between participants who are English dominant, and participants who

are Chinese dominant. A Mann-Whitney U test was used because the data do not meet the assumptions

for an independent-samples t test, even though the sample size is adequate: (a) the number of

Chinese dominant participants is about 3.5 times the number of English dominant

participants, (b) the samples failed the normality test and have two outliers, and (c) the two groups have

different distributions (Moore, McCabe, & Craig, 2009). The result indicates that English dominant

participants have lived longer in the US (Mdn = 17) than Chinese dominant participants (Mdn = 10), U =

1101.5, p < 0.001.
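For readers unfamiliar with the statistic, the Mann-Whitney U value is obtained by ranking the pooled observations and comparing rank sums. The Python sketch below shows the computation on hypothetical data; the values are not drawn from this study, the function names are illustrative, and the sketch stops at U and the group medians without computing a p-value.

```python
from collections import Counter
from statistics import median


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the two values always sum to len(group_a) * len(group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical years lived in the US for two small groups.
english_dominant = [17, 20, 12, 25, 15]
chinese_dominant = [10, 3, 8, 12, 1, 6]

u_english, u_chinese = mann_whitney_u(english_dominant, chinese_dominant)
print(median(english_dominant), median(chinese_dominant), u_english, u_chinese)
```

U_a counts the pairs in which a value from the first group exceeds a value from the second, with ties counted as one half, which is why no normality assumption is needed.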

When the subjects decide to participate in the study, they are first asked to choose between

Chinese and English as a survey language - the language in which the survey instructions and questions

are presented. They are also given the instruction to answer in whichever language they are most

comfortable with. Of the 119 (78.3%) participants who chose Chinese, Taiwanese, or Cantonese as

their dominant language, 98 (93.2%) of them selected Chinese as their survey language. Out of the 98,

71 (72.4%) used Chinese as their answer language. On the other hand, 12 out of the 33 participants

(36.4%) who chose English as their most dominant language chose English as the survey language, and

27 (81.8%) answered questions in English.

Chi-square tests of independence were performed to examine the relations between dominant

language, survey language, and answer language. The relation between dominant language and survey

language was significant, χ²(1, N = 152) = 5.325, p = 0.021. From the data, participants who view

Chinese as their dominant language are more likely to select Chinese as the survey language. Dominant

language and answer language also have a significant relation, χ²(1, N = 152) = 17.786, p < 0.001.

Participants whose dominant language is Chinese are more likely to use Chinese as the answer

language, and participants whose dominant language is English are more likely to use English as the

answer language. The relation between survey language and answer language was significant as well,

χ²(1, N = 152) = 43.275, p < 0.001. Participants who choose English as their survey language are more

likely to use English as their answer language, and participants who choose Chinese as their survey

language are more likely to use Chinese as their answer language.

Language exposure. The participants are asked to assign a percentage to the amount of time

they are exposed to each of their known languages in their daily lives. Of them, 77 (50.7%) assigned

more percentage points to Chinese than to other languages. This includes 42 participants currently

residing in Taiwan. Three participants (2%) ranked Cantonese, and 52 (34.2%) indicated English as the

language they are most exposed to. The remaining 19 participants indicated that they are exposed to

Chinese and English equally (50% and 50%) on a daily basis.

A Pearson’s correlation was computed to assess the relationship between number of years living

in the US and the amount of language exposure. The result shows a moderate, positive correlation

between the two variables, r(152) = 0.627, p < 0.001. The longer a participant has lived in the US, the

more they are exposed to English on a daily basis (Figure 3).
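The Pearson coefficient and its associated t statistic can be computed directly from paired observations. The Python sketch below uses hypothetical data points and illustrative function names. Treating the reported 152 as the number of participants, so that the test has 150 degrees of freedom, is an assumption made here for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical paired observations (not data from this study).
years_in_us = [0, 2, 5, 10, 20, 30]
daily_english_exposure = [10, 30, 35, 50, 70, 90]  # percent

r_example = pearson_r(years_in_us, daily_english_exposure)
# t value implied by the reported coefficient, assuming n = 152 participants.
t_reported = t_statistic(0.627, 152)
print(round(r_example, 3), round(t_reported, 2))
```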


Figure 3. Daily English Exposure in Percentage vs. Number of Years Residing in the US

A Mann-Whitney U test was conducted to determine whether there was a difference in the

amount of daily English exposure between participants who are English dominant and participants who

are Chinese dominant. A Mann-Whitney U test is used because of the difference in group size, non-

normal and different distributions of the data, and the existence of outliers. The result shows that

English dominant participants are exposed to a higher percentage of English daily use (Mdn = 80%)

than Chinese dominant participants (Mdn = 35%), U = 361, p < 0.001.

Furthermore, Kruskal-Wallis tests show that although the amount of daily English use is

positively related to the choice of answer language (χ²[4, 152] = 24.232, p < 0.001), it has no statistically significant relation

to the choice of survey language (χ²[4, 152] = 9.135, p = 0.058). As Figure 4 shows, the amount of language exposure does not have a clear

relation with the selection of survey language. There is a slight downward trend that

suggests fewer and fewer participants selected Chinese as the survey language as the

amount of daily English exposure increases, and a slight upward trend for participants who

selected English as their survey language. The trend is not obvious, however.
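The Kruskal-Wallis H statistic extends the rank-sum idea to more than two groups and is compared against a chi-square distribution with k - 1 degrees of freedom. The Python sketch below computes H with the usual tie correction; the three groups of exposure percentages are hypothetical, and the function names are illustrative rather than taken from the software used in this study.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of groups of observations."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied clusters.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)
    return h, len(groups) - 1


# Hypothetical daily English exposure (percent) for three groups of participants.
groups = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
h_value, dof = kruskal_wallis_h(groups)
print(round(h_value, 3), dof)
```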


Figure 4. Survey Language and Amount of Daily English Exposure

Language preference. Another aspect of the language profile is the participant’s preference to

speak or read in a language if they are given full control of the situation. Participants are asked which

language they would choose to use if they were to communicate with a person who has the same language abilities.

The three Cantonese and three Taiwanese speakers strongly prefer their dominant

language, Cantonese and Taiwanese respectively, over other languages. Of the rest, 109 (71.7%) chose

Chinese, 21 (13.8%) chose English, and 16 (10.5%) indicated an equal chance of choosing English or

Chinese.

A chi-square test of independence was conducted to examine the relation between dominant

language and the preferred spoken language for the Chinese or English dominant participants who

clearly favor one over the other for communication. The relation between these variables was

significant, χ²(1, N = 152) = 24.2, p < 0.001. Chinese dominant participants are more likely to choose

Chinese as the spoken language, and English dominant participants are more likely to choose English

as the spoken language. A Mann-Whitney U test was conducted to determine if there is a difference in

spoken language preferences between English dominant participants and Chinese dominant

participants. The results show that English dominant participants have a higher preference for English

as the spoken language (Mdn = 50%) than Chinese dominant participants (Mdn = 10%), U = 1070.5, p


< 0.001.

For text language preference, participants were asked what language they would choose to have

a document in an unknown language translated into. Chinese was chosen by 93 (61.2%) participants,

followed by English by 47 (30.9%) participants. Ten of the participants (6.6%) indicated that there are

equal chances (50% and 50%) they would choose either English or Chinese, and two of the Cantonese

speakers (1.3%) chose Cantonese as their preferred text language. For the English or Chinese dominant

users who have clear preferences over which language for translation, a chi-square test of independence

was performed to examine the relation between dominant language and preferred text language. The

relation between these variables was significant, χ²(1, N = 139) = 31.4, p < 0.001. A participant is more

likely to choose their dominant language as the translation language. A Mann-Whitney test was

conducted to determine if there is a difference in translation language preferences between English

dominant participants and Chinese dominant participants. The result shows that English dominant

participants have a higher preference for English as the translation language (Mdn = 65%) than Chinese

dominant participants (Mdn = 15%), U = 746, p<0.001.
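The Mann-Whitney U statistic reported above can be illustrated with a minimal sketch. The two lists below are hypothetical preference percentages invented for illustration and are not the study data; the function names are likewise assumptions. The sketch ranks the pooled values, averages the ranks of ties, and returns the smaller of the two U values, which is one common reporting convention.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller Mann-Whitney U of two independent samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical "chance of choosing English" percentages (NOT the survey data).
english_dominant = [50, 70, 65, 90]
chinese_dominant = [10, 15, 50, 5, 20]
u_statistic = mann_whitney_u(english_dominant, chinese_dominant)
print(u_statistic)
```

A small U relative to the product of the two group sizes indicates that one group's values sit consistently above the other's, which is the pattern the reported medians suggest.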

A Pearson’s correlation coefficient was calculated to estimate the relationship between the

spoken and the text language preferences. The results, r = 0.420, n = 152, p<0.001, indicate a

weak to moderate positive relation between the two, as illustrated in Figure 5.


Figure 5. Scatter Plot – English as Spoken Language vs. English as Text Language
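As a rough check on how a reported Pearson coefficient relates to its significance level, the sketch below converts r and n into the t statistic with n - 2 degrees of freedom. The function name is an assumption made for illustration; the inputs are the values reported above (r = 0.420, n = 152). The degrees of freedom it returns, 150, match the bracketed value in the r[150] notation used later in this section.

```python
import math


def pearson_t_statistic(r, n):
    """Return (t, df) for testing whether a Pearson r differs from zero."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df


# Reported correlation between spoken and text language preference.
t_value, degrees_of_freedom = pearson_t_statistic(0.420, 152)
print(round(t_value, 2), degrees_of_freedom)
```

With a sample of this size, even a coefficient in the weak to moderate range yields a large t value, so statistical significance here says little about the strength of the relationship.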

Kruskal-Wallis tests were performed to see if language proficiency has any influence on the

spoken and reading language preferences. The nonparametric Kruskal-Wallis test was used because the

sample failed the normality assumption for ANOVA tests (Moore, McCabe, & Craig, 2009). The results

show that the likelihood a participant assigns to using English as the spoken language is higher when

he/she has a higher rated English proficiency, X2(4, N=152) = 26.215, p<0.001. Similarly, a participant

assigns a higher likelihood to using English as the reading language when he/she has a higher rated English

proficiency, X2(4, N=152) = 47.132, p<0.001.

Pearson’s correlation was calculated to examine the relationships between the amount of daily

English use (in percentage) and the participant’s preference of spoken and translation languages.

The results indicate a mild, positive relationship between amount of daily English use and

English as the preferred spoken language (r[150] = 0.409, p<0.001), and a stronger, positive

relationship between amount of daily English use and English as the preferred translation language

(r[150] = 0.543, p<0.001).


Culture identification. To explore the impact environmental language exposure has on a

participant’s language attitude, participants were asked to list and rank the cultures that they most identify with.

Most of the participants express the strongest identification with the culture of their birth country: 66

participants (43.4%) identified most with Taiwanese culture, 42 (27.6%) with Chinese culture, 22

(14.5%) with North American culture. A smaller number of participants identify most with religions: 14

(9.2%) with Christianity and 2 (1.3%) with Taoism. The remaining participants each (0.7%) identify

most with one of the following: Asian culture, Chinese American, Taiwanese American, Japanese culture, and Western

culture (a combination of North American and European).

Daily language use. When it comes to language use, 114 participants (75%) have been using
English daily for five years or more, 10 participants (6.6%) for three to four years, three (2%) for two to
three years, four (2.6%) for under a year, and 21 participants (13.8%) have not had a chance to use English
daily consistently. In comparison, 146 participants (96.1%) have been using Chinese daily for five
years or more, one (0.7%) for two to three years, one (0.7%) for under a year, and four (2.6%) have
never used Chinese daily consistently (Figure 6).

Pearson’s correlation was calculated to observe the relationship between the number of years a

participant has lived in the US and the number of years they have been using English daily. The two

variables are moderately related (r[150] = 0.513, p<0.001). It is likely a person who has lived longer in

the US would also have been using English daily longer. It is, however, not a strong correlation, and

one variable is not an indication of the other.


[Bar chart: Daily Use of Language. Categories: Never, 1-2 Years, 2-3 Years, 3-4 Years, 5 Years or More. Series: Use English Daily, Use Chinese Daily.]

Figure 6. Daily use of language.

Interestingly enough, Pearson’s correlation finds that a participant’s accumulated experience of using

English, expressed as the length of time a person has been using English daily consistently, is only

mildly correlated with the amount of daily English exposure the person experiences (r[150] = 0.48, p <

0.001).

Figure 7. Amount of Daily English Exposure and the Length of Daily English Use

Note: For English daily use, 0 – Never consistently used English daily, 1 – Have been using English daily for under a year, 2 – for 1-2 years, 3 – 3-4 years, 4 – 5 years and more.

A Mann-Whitney test result shows that participants with English as dominant language are


likely to have used English daily longer (U = 1470.5, p = 0.004).

Pearson correlation was calculated to see if the number of years participants have used English daily

correlates with the participant’s preferred spoken and translation languages. There is a moderate, positive

relationship between the duration of daily English use and English as the preferred spoken language

(r[150] = 0.384, p<0.001), and a weaker, positive relationship between duration of daily English use

and English as the preferred translation language (r[150] = 0.259, p<0.001).

The participants were asked about their language use in eight settings: at work, communicating
with family, watching TV, communicating with friends, learning new subjects, listening to radio, using
the Internet, and reading. They were asked to indicate which language they use more in each of the
settings. The results are summarized in Table 1.

Table 1. Use of Language in Different Situations

Type of Environment   Use More English           Use More Chinese
                      Frequency   Percentage     Frequency   Percentage
Work                  98          64%            49          32%
Family                22          14%            130         86%
TV                    72          47%            59          39%
Friends               38          25%            113         74%
Learn                 108         71%            41          27%
Radio                 73          48%            41          27%
Internet              78          51%            54          36%
Reading               72          47%            75          49%

Note: Bold font highlights the higher percentage.

Language fluency. Participants were asked to rate their ability to read in each language. They were
asked if they can: (1) only understand words and phrases; (2) read simple paragraphs on familiar
subjects; (3) read general news articles as well as reports on familiar subjects; (4) read all styles and
forms on familiar subjects; and (5) read as well as native speakers. Most of the participants (133,
87.5%) rated their English proficiency at a level where they can at least read and comprehend general
articles written in English. Even more participants (147, 96.7%) rated their Chinese proficiency at a
level where they can at least read and comprehend general articles written in Chinese. The breakdown
is presented in Tables 2 and 3.

Table 2. Participant English Reading Proficiency

Proficiency                                            Frequency   Percent   Cumulative Percent
Words and phrases                                      2           1.3%      1.3%
Simple paragraphs                                      17          11.2%     12.5%
General articles                                       57          37.5%     50.0%
All style and forms of writings on familiar subjects   36          23.7%     73.7%
Equivalent to educated native speakers                 40          26.3%     100.0%
Total                                                  152         100.0%

Table 3. Participant’s Chinese Reading Proficiency

Proficiency                                            Frequency   Percent   Cumulative Percent
Words and phrases                                      3           2.0%      2.0%
Simple paragraphs                                      2           1.3%      3.3%
General articles                                       11          7.2%      10.5%
All style and forms of writings on familiar subjects   2           1.3%      11.8%
Equivalent to educated native speakers                 134         88.2%     100.0%
Total                                                  152         100.0%
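The percent and cumulative percent columns of the proficiency tables follow directly from the frequency counts. The sketch below recomputes them for the English reading proficiency counts reported in Table 2; the function name and the one-decimal rounding are assumptions chosen to match the table's presentation.

```python
def frequency_table(counts):
    """Return (label, count, percent, cumulative percent) rows, rounded to 0.1."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, count in counts.items():
        running += count
        rows.append((
            label,
            count,
            round(100 * count / total, 1),
            round(100 * running / total, 1),
        ))
    return rows


# Frequencies as reported in Table 2 (English reading proficiency).
english_proficiency = {
    "Words and phrases": 2,
    "Simple paragraphs": 17,
    "General articles": 57,
    "All style and forms of writings on familiar subjects": 36,
    "Equivalent to educated native speakers": 40,
}
english_rows = frequency_table(english_proficiency)
for row in english_rows:
    print(row)

# Participants who can at least read general articles (levels 3 to 5).
at_least_general = sum(list(english_proficiency.values())[2:])
```

The last line reproduces the 133 participants cited in the text as able to read at least general English articles.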

In total, 105 participants (69%) rated their Chinese reading ability higher than their English
reading ability, 10 (7%) rated their English reading ability higher than their Chinese reading
ability, and 37 (24%) rated their Chinese and English reading abilities as equal.

A chi-square test was performed to examine the relation between a person’s living in an English-speaking
country and his/her English proficiency. The relation between the variables was significant, X2(2,
N=152) = 42.246, p<0.001. Participants who reside in an English-speaking country are more likely to
have a higher self-rated English proficiency level. A Kruskal-Wallis test was performed to examine the
relation between the length of time a person has lived in an English-speaking country and his/her English
proficiency. The test excludes participants who have not lived in an English-speaking country. The result
shows that the relation between the two variables is not significant at the 0.05 level, X2(4, N=116) = 9.430,
p=0.051. Participants with higher English proficiency have not necessarily lived longer in an English-speaking
country. A Pearson correlation coefficient was computed to assess the relationship between the number
of years a participant has lived in the US and the participant’s English proficiency. The result shows a
mild, positive correlation between the two variables, r=0.545, n=152, p<0.001. A scatterplot in Figure 8
summarizes the result. It is likely that the longer a participant has lived in the US, the higher his/her
English proficiency.

Figure 8. English proficiency and number of years residing in US

Note: English proficiency level is measured by a five-point Likert scale: 1 is the lowest level: able to recognize words and phrases; 5 is the highest level: equivalent to native born speakers.

However, a Kruskal-Wallis test performed on participants with higher English proficiency
shows that, among participants who can at least read and comprehend English articles written for
general audiences, the ones with a higher English proficiency level are likely to have lived in an
English-speaking country longer, X2(2, N=133) = 19.645, p<0.001.


To examine the effect of daily use on English proficiency, a Kruskal-Wallis test was performed.
The result was significant, X2(4, N=152) = 41.234, p<0.001. Participants who have been using
English for longer periods of time are more likely to report a higher English proficiency rating. However,
as with the duration one has spent in an English-speaking country, this influence of daily use is neither
exclusive nor absolute. There are participants who have not been using English daily or have not lived in
English-speaking countries who have high English proficiency ratings.

Pearson’s correlations were calculated to examine the effects of language use and language exposure
on language proficiency. The results show that there is a strong, positive relationship between
the amount of daily English exposure and a person’s English proficiency (r[150] = 0.600, p<0.001),
and a strong, positive relationship between the duration in which a person has been using English daily
and the person’s English proficiency level (r[150] = 0.495, p<0.001).

A cross examination of the fluency ranking with dominant language choices shows that none of
the 19 participants who rated themselves at the lower English proficiency levels (only know words and
phrases, or can read simple paragraphs) has English as their dominant language. A chi-square test was
administered on the 133 participants with middle to higher English proficiency ratings (can read general
articles, is able to understand all styles and forms of writing on familiar subjects, and equivalent to
educated native speakers) to examine the relation between English fluency and the choice of English as
dominant language. The relation between the variables was significant, X2(2, N=133) = 28.1, p<0.001.
Participants with higher English proficiency are more likely to choose English as their dominant language.

The result was confirmed by a Mann-Whitney U test conducted to determine whether there was

a difference in the participants’ English proficiency level between English dominant participants and

Chinese dominant participants. The results indicate that English dominant participants have a higher

English proficiency level (Mdn = 5) than Chinese dominant participants (Mdn = 3), U = 711, p <0.001.


However, Chinese dominant participants do not necessarily have lower English proficiencies. A
majority of them (86%) rated their English proficiency as moderate (able to read general articles) or
higher. In comparison, all of the English dominant speakers (100%) rated their English proficiency as
moderate or higher (Figure 9).

Figure 9. English Proficiency Level and Dominant Language

Furthermore, chi-square tests were performed on these participants to examine the relation between
English proficiency level and the selection of English as survey language and answer language.
The relation between English proficiency level and the choice of survey language was significant, X2(2,
N=133) = 18.933, p<0.001. Participants with higher English proficiency are more likely to choose English
as the survey language. The relation between English proficiency level and the use of English as
the answer language was significant, X2(2, N=133) = 22.25, p<0.001. Participants with a higher English
fluency level are more likely to use English to answer the survey questions.

Bar charts provide a better look at the relationships between English proficiency level and survey
and answer languages (Figure 10 and Figure 11). Although the number of participants choosing English
as the survey language increases as English proficiency level increases, Chinese is in general
still the preferred survey language. As for answer language, at the two highest English proficiency
levels, there are more participants who prefer English than participants who prefer Chinese.


Figure 10. English Proficiency and Survey Language

Figure 11. English Proficiency and Answer Language

Language Preference in General

The final question about the participant’s language profile asks which language they prefer to

use, and why: 83 (54.6%) of the participants prefer Chinese, 31 (20.4%) prefer English, and 38 (25%)

indicate that they do not have a preference. Participants with the lowest two English proficiency ratings

prefer non-English languages, including Mandarin Chinese, Cantonese, and Taiwanese. A chi-square

test was performed on the participants with higher English proficiency ratings to examine the relation

between English proficiency and the selection of English as their preferred language. The result was

significant, X2(4, N=133) = 31.176, p<0.001. A participant with higher English proficiency is more

likely to prefer English.

Looking at the data, it appears that participants’ language preferences can differ from their
dominant language (Table 4). Although the two do not always align, a chi-square test shows that
participants who have English as their dominant language are more likely to
prefer English as well, X2(2, N=152) = 40.675, p<0.001.

Table 4. Language Preference and Dominant Language Comparison

                                 Language Preference
Dominant Language                Not English   English   No Preference   Total
English       Count              5             19        9               33
              % of Total         3.3%          12.5%     5.9%            21.7%
Not English   Count              78            12        29              119
              % of Total         51.3%         7.9%      19.1%           78.3%
Total         Count              83            31        38              152
              % of Total         54.6%         20.4%     25.0%           100.0%
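The chi-square statistic reported for dominant language and language preference, X2(2, N=152) = 40.675, can be recomputed from the cell counts in Table 4. The sketch below is a plain implementation of the Pearson chi-square test of independence without any continuity correction; the function name is an assumption, and the p-value is left out because it would require a chi-square distribution function beyond the standard library.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: English dominant, not English dominant.
# Columns: prefers non-English, prefers English, no preference (Table 4 counts).
table4_counts = [
    [5, 19, 9],
    [78, 12, 29],
]
chi2, df = chi_square_independence(table4_counts)
print(round(chi2, 2), df)
```

The recomputed statistic agrees with the reported value to within rounding, and the two degrees of freedom follow from the 2 by 3 layout of the table.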

Kruskal-Wallis tests were conducted to determine whether a participant’s language exposure,
use, and amount of use are related to his/her preference toward English. The first Kruskal-Wallis test
was conducted to evaluate the effect of the number of years a participant has lived in the US. The result
shows that the longer a participant has lived in the US, the more likely the participant would lean towards
English as their preference.

A Kruskal-Wallis test was performed to determine whether the amount of time a participant has been
using English daily differs by language preference. The result shows that
the amount of time a participant has been using English daily has a statistically significant impact on a
participant’s language preference, X2(2) = 22.336, p < 0.001.

A Kruskal-Wallis test was performed to examine whether there are differences in the percentage
of daily exposure to language between participants who prefer English, participants who prefer Chinese,
and participants who have no language preference. The result shows that the amount of daily language
exposure, expressed as a percentage, differs significantly among participants with different language
preferences, X2(2) = 52.828, p < 0.001.


When asked why they prefer one language over the other, participants cited reasons such
as “convenience”, “work environment”, and “easy to use”. The results are clustered by the participant’s
choice of preferred language: Chinese, English, or Neither. Their statements were reviewed and categorized.
Statements with similar meanings were grouped together for clarity. An example of the coding
process is displayed in Appendix VIII, Coding Framework. To ensure coder reliability, the coding
process was repeated twice: after the results were coded for the first time, the researcher reviewed the
raw data after a few days and coded them again without referring back to the first set of results. The
two sets of codes were compared and, besides minor wording changes (“Integrating into culture” to
“Cultural assimilation”), are in accordance, indicating coding reliability. The results are shown in Table 5.

Table 5. Participant’s reasons for preferring one language.

Chinese as preferred language    English as preferred language    No specific preferences
Fluency                          Fluency                          Depends on purpose
Accustomed                       Accustomed                       Depends on context
Better for expressing thoughts   Better for expressing thoughts   Depends on audience
Frequency of use and exposure    Frequency of use and exposure
Familiarity                      Easier to type
Easier to comprehend             Personal preference
Identify with culture            Language skill maintenance
Personal preference              History of use

For participants who prefer one language over another, fluency, the habit of using a language,
the feeling that one language is better for expressing their thoughts, and the amount of use of and
exposure to a language are important factors in language choice.


A chi-square analysis was performed to see if cultural identification has any influence on the
participant’s language preference. A list of the cultures that the participants most identify with was
extracted from the dataset. A second list of languages that each culture corresponds with, such as Chinese
for Taiwanese culture and English for North American culture, was composed. Cultures that do not
directly correspond to a language, such as Christianity and Western culture, are omitted. Participants
without a preferred language are also omitted in this analysis. The results show that participants who
identify most with a Chinese-based culture are more likely to prefer Chinese, X2(2, N=99) = 19.130,
p<0.001, although, as summarized in Table 8, more than half (56%) of the participants who identify
with an English-based culture still prefer Chinese.

Table 6. Dominant language and the corresponding language to dominant culture

                             Dominant Culture Corresponding Language
Dominant Language            Chinese   English   Total
Chinese            Count     96        10        106
                   %         90.6%     9.4%      100.0%
English            Count     15        15        30
                   %         50.0%     50.0%     100.0%
Total              Count     111       25        136
                   %         81.6%     18.4%     100.0%

Language Use Scenarios

Language uses in daily life. Participants were asked if they use English more than Chinese and
vice versa for the following tasks every day: at work (Work), watching TV (TV), on the Internet for
personal or recreational purposes (Internet), to communicate with friends (Friends), reading for fun
(Reading), to learn new things (Learn), to communicate with family (Family), and listening to the radio
(Radio). The results are shown in Table 7 and Figure 12. Chinese is more often used in personal
communications with family and friends: 130 of the participants (86%) use Chinese more than English
to communicate with family, and 113 of the participants (74%) do so with friends. There are also
slightly more participants (75, 49%) who read Chinese materials more than English ones. For the rest
of the activities, more participants prefer English over Chinese. The strongest English
preference appears at work (98 participants, 64%), learning new subjects (108 participants, 71%), and
Internet use (78 participants, 51%).

Table 7. Daily Activity and Language Use

            Use English More              Use Chinese More
Activity    No            Yes             No            Yes
Work        54 (36%)      98 (64%)        103 (68%)     49 (32%)
TV          80 (53%)      72 (47%)        93 (61%)      59 (39%)
Friends     114 (75%)     38 (25%)        39 (26%)      113 (74%)
Reading     80 (53%)      72 (47%)        77 (51%)      75 (49%)
Family      130 (86%)     22 (14%)        22 (14%)      130 (86%)
Radio       79 (52%)      73 (48%)        111 (73%)     41 (27%)
Internet    74 (49%)      78 (51%)        134 (88%)     18 (12%)
Learn       44 (29%)      108 (71%)       111 (73%)     41 (27%)

Note: Cells show count (percent). For every activity, the No and Yes counts for each language total 152 (100%).

[Bar chart: Language Use in Different Settings and for Different Purposes. Categories: Work, TV, Internet, Friends, Reading, Learn, Family, Radio. Series: English, Chinese.]

Figure 12. Language Choice for Different Situations

The research questions asked in chapter 2 are concerned mostly with Internet use. Statistical tests were
therefore completed to examine the relation between Internet language choice and variables including
length of consistent daily English use (history of language use), language proficiency, and dominant
language choice.

Mann-Whitney tests were conducted to examine the relation between a participant’s using English
more than Chinese on the Internet and the amount of English he/she is exposed to on a daily basis,
as well as his/her history of language use. The results show that neither variable has a statistically
significant impact on the participant’s Internet use language: neither the amount of daily English exposure
(U = 2824, p = 0.819) nor the number of years a participant has been exposed to English daily (U =
2475.5, p = 0.371) influences the participant’s Internet language choice. As Figure 13 (Language Use
Online and the Number of Years Living in US) and Figure 14 (Language Use Online and Daily Language
Exposure) demonstrate, there is no clear pattern between the language exposure variables and the
preference for English as the online language.

Figure 13. Language Use Online and the Number of Years Living in US

Figure 14. Language Use Online and Daily Language Exposure

Similarly, a Mann-Whitney test also shows English proficiency as non-influential to online language
use (U = 2475.5, p = 0.114).

Users’ attitudes toward and perceptions of languages are examined next. A chi-square test performed on
dominant language and the comparison of English and Chinese use online shows that the relation between the
two variables was not significant, X2(1, N=152) = 1.456, p=0.228. The likelihood of a participant using
English more than Chinese online does not increase or decrease based on their dominant language. It is
in fact quite even between the English dominant and the Chinese dominant participants, as Figure 15
(Domain Language and Online Language Preference Comparison) illustrates.


However, there is a significant relation between a participant’s language preference for the Internet
and his/her actual language use online, X2(2, N=152) = 8.393, p=0.015. The relation is not entirely
clear cut, however. Figure 16 (Language Preference in General and Online Language Preference Comparison)
shows that participants who prefer Chinese in general are more likely to prefer Chinese online.
Participants who do not have a preference between Chinese and English for general use are more likely to
prefer English for Internet use. Participants who prefer English for general use are evenly split between
preferring English online and preferring Chinese online.

Figure 15. Domain Language and Online Language Preference Comparison

Figure 16. Language Preference in General and Online Language Preference Comparison

Language use and the Internet. All the participants reported spending at least an hour on the
Internet every day, and 46 (30.3%) reported spending more than six hours (Figure 17).


[Bar chart: Daily Internet Use. Categories: Less than 1 hour, 1-2 hours, 3-4 hours, 5-6 hours, More than 6 hours.]

Figure 17. Daily Internet use

Participants were asked about their language use for the following online activities: doing work-related
research, shopping, doing leisure and personal interest research, catching up on news, and using social
media. They were asked to rate how frequently they engage in each of the activities in Chinese and/or
English on a five-point Likert scale: 1 - never, 2 - rarely, 3 - occasionally, 4 - frequently, and 5 - always.

Every participant uses a mix of both languages at certain points; none of them used one

language exclusively for all of the listed activities. Most of the participants (106, 69.7%) use both

Chinese and English in every activity, although not equally. A smaller group (46, 30.3%) uses one

language for specific occasions, such as using only Chinese on social media, or using only English at

work. Table 8 is a count of the number of times an activity-language pair was used by a participant.

Each activity-language pair is used by over 85% of the participants. The pairs with the least
use, and with the lowest frequency ratings, are online shopping-Chinese (average rating 2.8) and
work-related research-Chinese (average rating 2.98). The other
activity-language pairings are used by 93% to 99% of the participants. Table 9 contains the complete
count of ratings for each activity-language pairing, further elaborating the results presented in Table 8.


Table 8. Participant Online Activity-language Use Summary

Activity                            Language   Number of participants   Percentage
Work Related Research               English    146                      96%
                                    Chinese    134                      88%
Shop                                English    141                      93%
                                    Chinese    130                      86%
Personal or Recreational Research   English    150                      99%
                                    Chinese    146                      96%
News                                English    150                      99%
                                    Chinese    149                      98%
Social Media                        English    150                      99%
                                    Chinese    146                      96%

Table 9. Participant Online Activity-language Use

Cells show frequency (percent) for each rating.

Rating        Work Related Research   Work Related Research   Shopping     Shopping
              - English               - Chinese               - English    - Chinese
1             6 (4%)                  18 (12%)                11 (7%)      22 (14%)
2             10 (7%)                 39 (26%)                26 (17%)     49 (32%)
3             37 (24%)                34 (22%)                27 (18%)     36 (24%)
4             40 (26%)                50 (33%)                42 (28%)     28 (18%)
5             59 (39%)                11 (7%)                 46 (30%)     17 (11%)
Total         152 (100%)              152 (100%)              152 (100%)   152 (100%)
Avg. Rating   3.89                    2.98                    3.57         2.80

Rating        Personal or Recreational   Personal or Recreational   News         News
              Research - English         Research - Chinese         - English    - Chinese
1             2 (1%)                     6 (4%)                     2 (1%)       3 (2%)
2             23 (15%)                   18 (12%)                   24 (16%)     15 (10%)
3             35 (23%)                   30 (20%)                   42 (28%)     38 (25%)
4             59 (39%)                   74 (49%)                   54 (36%)     64 (42%)
5             33 (22%)                   24 (16%)                   30 (20%)     32 (21%)
Total         152 (100%)                 152 (100%)                 152 (100%)   152 (100%)
Avg. Rating   3.64                       3.61                       3.57         3.70

Rating        Social Media   Social Media
              - English      - Chinese
1             2 (1%)         6 (4%)
2             20 (13%)       13 (9%)
3             54 (36%)       22 (14%)
4             48 (32%)       79 (52%)
5             28 (18%)       32 (21%)
Total         152 (100%)     152 (100%)
Avg. Rating   3.53           3.78
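The average ratings in Table 9 and the participant counts in Table 8 can both be derived from the rating frequencies. The sketch below assumes that a participant counts as using an activity-language pair when the rating is anything other than 1 (never), an assumption that reproduces the Table 8 counts; the function name is hypothetical.

```python
def summarize_ratings(frequencies):
    """Return (average rating, number of users) from counts for ratings 1..5."""
    total = sum(frequencies)
    weighted_sum = sum(rating * count for rating, count in enumerate(frequencies, start=1))
    average = round(weighted_sum / total, 2)
    # Assumption: everyone who did not answer "1 - never" uses the pair.
    users = total - frequencies[0]
    return average, users


# Rating frequencies (ratings 1 through 5) as reported in Table 9.
rating_counts = {
    "Work Related Research - English": [6, 10, 37, 40, 59],
    "Work Related Research - Chinese": [18, 39, 34, 50, 11],
    "Shopping - English": [11, 26, 27, 42, 46],
    "Shopping - Chinese": [22, 49, 36, 28, 17],
    "Social Media - Chinese": [6, 13, 22, 79, 32],
}
summary = {pair: summarize_ratings(freqs) for pair, freqs in rating_counts.items()}
for pair, (average, users) in summary.items():
    print(pair, average, users)
```

For example, work-related research in English averages 3.89 with 146 users, matching both tables.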

Mann-Whitney tests and Kruskal-Wallis tests were carried out to examine the effects of the
user’s dominant language, preferred language, and English proficiency on language-activity pair ratings.

The relation between one’s dominant language and the frequency of using English for an online
activity is examined using the Mann-Whitney test because of the non-normality and the differences in
distribution of the data. The results, summarized in Table 10 (Dominant Language and the Frequency of
Using English for Online Activities), are significant for every activity.
Participants with English as their dominant language use English more frequently than participants
with Chinese as their dominant language for work, shopping, personal reasons, news, and social media.

Table 10. Dominant Language and the Frequency of Using English for Online Activities. Mann-Whitney Test Result.

                 Work Related   Shopping   Personal/Recreational   News       Social
                 Research                  Research                           Networking
Mann-Whitney U   1258.000       1046.500   954.000                 856.500    943.500
Wilcoxon W       8398.000       8186.500   8094.000                7996.500   8083.500
Z                -3.308         -4.226     -4.715                  -5.151     -4.765
p                0.001          0.000      0.000                   0.000      0.000

Activity                         Dominant Language   N     Mean Rank
Work Related Research            Chinese             119   70.57
                                 English             33    97.88
                                 Total               152
Shopping                         Chinese             119   68.79
                                 English             33    104.29
                                 Total               152
Personal/Recreational Research   Chinese             119   68.02
                                 English             33    107.09
                                 Total               152
News                             Chinese             119   67.2
                                 English             33    110.05
                                 Total               152
Social Networking                Chinese             119   67.93
                                 English             33    107.41
                                 Total               152
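The U, W, and mean rank values in Table 10 are arithmetically linked, which offers a quick consistency check. The sketch below assumes that the reported U and Wilcoxon W both belong to the Chinese dominant group (n = 119), an assumption consistent with W - U = 119 × 120 / 2 for every column; the function name is hypothetical.

```python
def ranks_from_u(u, n_first, n_second):
    """Return (W, mean rank of first group, mean rank of second group)."""
    # Rank sum of the first group: its U plus the minimum possible rank sum.
    w = u + n_first * (n_first + 1) / 2
    n_total = n_first + n_second
    total_rank_sum = n_total * (n_total + 1) / 2
    mean_rank_first = w / n_first
    mean_rank_second = (total_rank_sum - w) / n_second
    return w, mean_rank_first, mean_rank_second


# Mann-Whitney U values from Table 10; group sizes 119 (Chinese) and 33 (English).
reported_u = {
    "Work Related Research": 1258.0,
    "Shopping": 1046.5,
    "News": 856.5,
}
recomputed = {
    activity: ranks_from_u(u, 119, 33) for activity, u in reported_u.items()
}
for activity, (w, chinese_rank, english_rank) in recomputed.items():
    print(activity, w, round(chinese_rank, 2), round(english_rank, 2))
```

For work-related research this yields W = 8398 and mean ranks of 70.57 and 97.88, in line with the table.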

The relation between one’s language preference for online text documents (as obtained through
participants’ survey language choices) and the frequency of using English for an online activity is
examined using the Kruskal-Wallis test, grouping the participants into those who prefer English,
those who prefer Chinese, and those without a language preference. The results,
summarized in Table 11 (Preferred Language and the Frequency of Using English for Online Activities),
are significant for every activity. Participants who prefer English use
English more frequently than participants who prefer Chinese for work, shopping, personal reasons, news,
and social media.

Table 11. Preferred Language and the Frequency of Using English for Online Activities

       Work Related   Shopping   Personal/Recreational   News     Social
       Research                  Research                         Networking
X2     33.574         14.765     41.877                  56.859   43.884
df     2              2          2                       2        2
p      0.000          0.001      0.000                   0.000    0.000

Variable     Preferred Language   N     Mean Rank
WrkFrqEng    Chinese              83    58.77
             English              31    103.18
             No Preference        38    93.47
             Total                152
ShopFrqEng
FunFrqEng
NewsFrqEng
SMFrqEng
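A Kruskal-Wallis statistic can be approximated from group sizes and mean ranks alone. The sketch below applies the standard formula without a tie correction to the work-related research mean ranks in Table 11; the function name is an assumption. The uncorrected value comes out somewhat below the reported 33.574. A tie correction, which is common with Likert-scale data, would raise it, although the source does not state which form was used.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H (no tie correction) from (n, mean rank) pairs."""
    n_total = sum(n for n, _ in groups)
    grand_mean_rank = (n_total + 1) / 2
    # Weighted squared deviation of each group's mean rank from the grand mean.
    spread = sum(n * (mean_rank - grand_mean_rank) ** 2 for n, mean_rank in groups)
    return 12 / (n_total * (n_total + 1)) * spread


# (group size, mean rank) for WrkFrqEng as reported in Table 11.
work_groups = [
    (83, 58.77),   # prefers Chinese
    (31, 103.18),  # prefers English
    (38, 93.47),   # no preference
]
h_uncorrected = kruskal_wallis_h(work_groups)
print(round(h_uncorrected, 1))
```

The degrees of freedom equal the number of groups minus one, which gives the df of 2 shown in the table.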



A Kruskal-Wallis test was conducted to see if language proficiency has any impact on language
choice for online activities. The results, summarized in Table 12 (Language Choice for Internet
Activity and Language Proficiency), are significant for every activity. Participants with higher English
proficiency use English more frequently for their online activities. Figure 18 (Amount of
Personal/Recreational Research Conducted in English Clustered by English Proficiency) gives an
example of how personal/recreational research relates to language proficiency. Participants with lower
English proficiency use English to conduct personal or recreational research only rarely, if ever. This
trend is also seen in the other four Internet activities.


Figure 18. Amount of Personal/Recreational Research Conducted in English Clustered by English

Proficiency (1 – lowest, 5 – highest)

Table 12. Language Choice for Internet Activity and Language Proficiency

       Work Research   Shopping   Personal Research   News     Social Networking
X2     54.048          36.232     49.600              56.066   57.074
df     4               4          4                   4        4
p      .000            .000       .000                .000     .000

The correlation between the number of years a participant has lived in the US and his/her
language preference for each of the Internet activities was examined. The activity-language pairs have mild,
positive relationships with the number of years a participant has lived in the US. The results are
summarized in Table 13 (Number of Years Living in the US and Conducting Internet Activity in
English).

Table 13. Number of Years Living in the US and Conducting Internet Activity in English

                      Work Research   Shopping    Personal Research   News        Social Networking
                      - English       - English   - English           - English   - English
Pearson Correlation   .490            .576        .434                .481        .410
p                     .000            .000        .000                .000        .000

Likewise, the correlation between a participant’s amount of daily English use and his/her language
preference for each of the Internet activities was examined. The activity-language pairs have stronger,
positive relationships with the amount of daily English exposure. The results are summarized in Table
14 (English Daily Exposure (in Percentage) and Conducting Online Activity in English).

Table 14. English Daily Exposure (in Percentage) and Conducting Online Activity in English

                      Work Research -   Shopping -   Personal Research -   News -    Social Networking -
                      English           English      English               English   English
Pearson Correlation   .525              .565         .560                  .661      .571
p                     .000              .000         .000                  .000      .000

Language use online and why. The final question before the article selection exercise asks

participants how they decide what language to use online. The answers were reviewed, and the Chinese

answers translated into English. The answers were then categorized into groups and coded. The coding

process follows the coding framework outlined in Appendix VIII. For example, “Depending on what

my search topic is” and “題材” are coded into “Subject matter”. The coding process was repeated

twice to ensure coder reliability. The results are presented in Table 15. Criteria for online language

choice.

Table 15. Criteria for online language choice

The purpose of the activity: whether it is to shop, to catch up on news, for entertainment, or for professional research.

Subject domain of the desired information, such as science, news, or tabloid.

Communication partner: what language would be used by the partner with whom the user is or may be communicating.

The setting: whether it is social networking, communicating with family, or for work.

Language proficiency.

Personal preference.


Habit of use.

Convenience level judged by ease and speed of use.

The language’s ability to express the user’s thoughts clearly and accurately.

The language used by the target website.

The cultural or geographical source of the information.

Information availability: whether the information would be available in English or Chinese.

Language of the computer system interface.

The credibility and authenticity of information as expressed in one language.

Input ability: the participant’s ability to type in a language.

The level of interest in the presented information.

Article Selection Exercise

Article Selection Results

In this section, participants are shown the Chinese and English versions of the same story and choose

the version they prefer. The results are summarized in Table 16.

Table 16. Article Selection Result

Article Title                                         Article Source Language   English Version Location   Subject Matter    Chinese Selected   English Selected
The 'Coming Home' of Zhang Yimou and Gong Li          Chinese                   Left                       Entertainment     134 (88%)          18 (12%)
Cultural Revolution Nostalgia                         Chinese                   Right                      History           112 (74%)          40 (26%)
Home/Work | Alice Temperley's British Country House   English                   Right                      Interior design   61 (40%)           91 (60%)
China's Monroe Doctrine                               Chinese                   Left                       Political         88 (58%)           64 (42%)
Your 'Game of Thrones' Questions, Answered            English                   Left                       Entertainment     56 (37%)           96 (63%)
"Pinocchio Rex," China's New Dinosaur                 English                   Right                      Science           82 (54%)           70 (46%)
China Further Restricts Foreign Dairy Brands          Chinese                   Right                      Local news        102 (67%)          50 (33%)
Searching for Meaningful Markers of Aging             English                   Right                      Science           59 (39%)           92 (61%)

The article selection results and the possible impact of the location of the language versions,

subject matter, and the spoken language of the location in which the story originated (article source

language) are examined.

Article presentation order. An independent samples t test was performed comparing the mean

language selection results of articles with the English version presented on the right or on the left. The

article location has no effect on the Chinese version being selected, t(6) = 0.435, p = 0.679, or on the

English version being selected, t(6) = 0.816, p = 0.446. There is no relation between a participant’s

selection and the order in which the language versions are presented.
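
As an illustration of the comparison described above, the following is a minimal sketch of an independent samples t statistic computed from the per-article counts in Table 16. It assumes the classic pooled-variance (equal variances) form of the test applied to the number of participants selecting the Chinese version of each article; the text does not state which software or variant was used, and the helper name `pooled_t` is hypothetical. Under these assumptions the statistic for the Chinese selections comes out at about 0.435 on 6 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent samples t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Number of participants who selected the Chinese version (Table 16),
# grouped by where the English version was displayed.
chinese_when_english_left = [134, 88, 56]
chinese_when_english_right = [112, 61, 82, 102, 59]

t_stat, df = pooled_t(chinese_when_english_left, chinese_when_english_right)
print(f"t({df}) = {t_stat:.3f}")
```

The p value is omitted here because it requires the t distribution, which the Python standard library does not provide.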

Subject and story source. The results from the two entertainment stories show no strong

relation between the subject of the article and the participant’s language choice. The first story is about a

Chinese movie by the Chinese director Zhang Yimou, for which 134 of the participants (88%) selected

the Chinese version, and 18 (12%) the English version. The second story is about the American TV

series Game of Thrones, for which 56 participants (37%) selected the Chinese version and 96 (63%)

selected the English version. The same can be said of the science articles; 82 participants (54%) selected

the Chinese version for the first article, and 59 (39%) for the second article. The results appear to be

more closely linked to the language spoken in the geographic origin (source language) of the story, as

demonstrated in Figure 19. Article Selection Result and Information Source Language.


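[Figure 19: clustered bar chart titled "Article Selection Result by Source Language". Horizontal axis: Source Language (four Chinese-source articles followed by four English-source articles). Vertical axis: number of participants, 0 to 160. Series: Number Select Chinese (134, 112, 88, 102, 61, 56, 82, 59) and Number Select English (18, 40, 64, 50, 91, 96, 70, 92).]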

Figure 19. Article Selection Result and Information Source Language.

Note: Articles are reordered here to highlight the possible influence of article source language. Articles with Chinese source are grouped to the left, and articles with English source are grouped to the right.

Independent samples t tests were performed to see if the information source language has any

influence on the article selection outcome. The result shows an effect of the source language on the

number of Chinese articles being selected, t(6) = 3.922, p = 0.008. For stories with Chinese origins,

such as the article about Chinese milk powder, the Chinese version was chosen more often than the English

version. The result also shows an effect of the source language on the number of English articles being

selected, t(6) = 2.481, p = 0.048. For stories with English origins, such as the article about the British

interior designer, the English version was chosen more often (M = 87.25, SD = 11.7) than the Chinese version.
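
The source-language comparison can be sketched the same way from the counts in Table 16. This is an illustrative calculation, not the author's procedure: it assumes a pooled-variance t test on the per-article counts of Chinese selections, and the function and variable names are hypothetical. Under that assumption the statistic for the Chinese selections is close to the reported 3.922, and the mean and standard deviation of English selections for the four English-source articles are close to the reported M = 87.25 and SD = 11.7.

```python
from math import sqrt
from statistics import mean, stdev

# Counts of participants selecting the Chinese version (Table 16),
# grouped by the article's source language.
chinese_picks_chinese_source = [134, 112, 88, 102]
chinese_picks_english_source = [61, 56, 82, 59]

# Counts of participants selecting the English version of the
# four English-source articles.
english_picks_english_source = [91, 96, 70, 92]


def pooled_t(a, b):
    """Equal-variance t statistic for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


t_chinese = pooled_t(chinese_picks_chinese_source, chinese_picks_english_source)
m_english = mean(english_picks_english_source)
sd_english = stdev(english_picks_english_source)

print(f"t(6) = {t_chinese:.3f}")
print(f"M = {m_english:.2f}, SD = {sd_english:.1f}")
```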

Language use, exposure, and cultural identification. The effects the following variables have

on the article selection results are also examined: history of language use, as measured through the

length of time a participant has been speaking English daily, the length of time the participant has lived

in an English-speaking country, and cultural identification.

Pearson’s correlations found a moderate, positive relationship between the number of years a

participant has lived in the US and the number of English version articles he/she selected, r(150) = 0.358,

p<0.001.

Pearson’s correlations also indicate moderate, positive relationships between the amount of

daily English exposure and number of English articles selected (Figure 20. Amount of Daily English

Exposure and the Number of English Articles Selected), r(150) = 0.370, p<0.001, and the length of

time a participant has been using English daily and the number of English articles selected, r(150) =

0.373, p<0.001. In general, the longer a participant has used English on a daily basis, and the more

English he/she uses every day, the more English version articles he/she is likely to choose.

Figure 20. Amount of Daily English Exposure and the Number of English Articles Selected

However, a Kruskal-Wallis test shows that cultural identification with English-based,

Chinese-based, or religion-based cultures does not have a significant impact on the language versions

participants choose, X2 (2, N = 152) = 2.267, p = 0.322. To test the impact of cultural identification, the

language each culture corresponds to (English for American culture, for example) is used to run the

tests.

Dominant language, language preference and language proficiency. The relations between the

number of English versions chosen and the participant’s English proficiency and dominant language

are examined.

A Mann-Whitney test shows that the participant’s dominant language has significant impact on

the participant’s article selection outcome, U = 950, p <0.001. A participant who identifies English as

his/her dominant language selected more English version articles in the article selection exercise (Mdn

= 5) than Chinese dominant participants (Mdn = 3).

A Kruskal-Wallis test shows that one’s English proficiency has a significant influence on the

person’s article selection outcome, X2 (4, n = 152) = 31.561, p <0.001. As Figure 21 and Figure 22

demonstrate, participants who selected a higher number of English version articles are more likely to

have higher English proficiency levels.

Figure 21. Number of English Articles Selected and English Proficiency Level

Figure 22. Average Number of English Version Articles Selected and English Proficiency Level

Language preference. The survey used in this study collects participants’ language preferences

from three sources: the participant’s chosen language to continue the survey (survey language), the


language the participant uses to answer the questions (answer language), and participant’s answers.

A Mann-Whitney test found a significant relationship between the survey language and the

number of English version articles chosen in the article selection exercise, U = 1003.5, p<0.001.

Participants who chose English for the survey language chose more English versions (Mdn = 5) than

participants who chose Chinese for the survey language (Mdn = 3). A Mann-Whitney test also found a

significant relationship between the answer language and the number of English version articles chosen

in the article selection exercise, U = 1594.5, p<0.001. Participants who use English to answer questions

on average select more English versions (Mdn = 5) than participants who use Chinese to answer

questions (Mdn = 3).

Participants are asked about their language preferences twice in the survey. In the language

profile section, participants are asked which language they prefer in general. The second time is after

the article selection exercise, and concerns the language participants prefer for reading online, text-based

documents. The two language preferences differ in the context in which the languages are used;

however, a chi-square test shows that the two are related, X2(1, N=152) = 30.099, p<0.001. Participants

who prefer to use English in general are more likely to prefer English for the articles.

A Kruskal-Wallis test shows that participants’ language preference for general use has a

significant impact on the number of English articles they choose, X2(2, N=152) = 31.169, p<0.001.

Participants who prefer English choose more English version articles (Mdn = 6)

than those who have no language preference (Mdn = 4), and even more than participants who prefer

Chinese (Mdn = 3).

After the article selection exercise, participants are asked about their language preference for

text-based, online documents in general (online language preference). Participant answers are reviewed and

categorized into four groups: English, Chinese, depending on situation, and no preference. A

chi-square test shows that the online language preference and general language preference are related, X2(4,

N=152) = 51.096, p<0.001. Participants who prefer to use English online are more likely to also prefer

English in general; participants who have no general language preference are more likely to also have no

language preference for online use. On close examination, however, it appears that though the two

types of language preferences are closely related, online language preference does not completely align

with general language preference, as illustrated in Table 17. Slightly fewer participants

have no general language preference (25%) than have no online language preference (30%).

Table 17. A Cross Comparison of General Language Preference and Online Language Preference

                                                            Online Language Preference
General Language Preference                                 Chinese   Depends   English   Total
Chinese                 Count                               60        17        6         83
                        % within General Preference         72.3%     20.5%     7.2%      100.0%
                        % within Online Preference          75.9%     37.0%     22.2%     54.6%
                        % of Total                          39.5%     11.2%     3.9%      54.6%
English                 Count                               7         8         16        31
                        % within General Preference         22.6%     25.8%     51.6%     100.0%
                        % within Online Preference          8.9%      17.4%     59.3%     20.4%
                        % of Total                          4.6%      5.3%      10.5%     20.4%
No Preference/Depends   Count                               12        21        5         38
                        % within General Preference         31.6%     55.3%     13.2%     100.0%
                        % within Online Preference          15.2%     45.7%     18.5%     25.0%
                        % of Total                          7.9%      13.8%     3.3%      25.0%
Total                   Count                               79        46        27        152
                        % within General Preference         52.0%     30.3%     17.8%     100.0%
                        % within Online Preference          100.0%    100.0%    100.0%    100.0%
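
The chi-square statistic for this cross comparison can be recomputed directly from the counts in Table 17. The sketch below assumes a standard Pearson chi-square test of independence on the 3 x 3 table of counts, with expected counts taken from the row and column totals; the variable names are illustrative only. Under that assumption the result is approximately 51.1 with 4 degrees of freedom, in line with the value reported above.

```python
# Observed counts from Table 17.
# Rows: general preference (Chinese, English, No Preference/Depends).
# Columns: online preference (Chinese, Depends, English).
observed = [
    [60, 17, 6],
    [7, 8, 16],
    [12, 21, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if the two preferences were independent.
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"X2({df}, N={n}) = {chi_square:.3f}")
```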

Post Article Selection Survey

Difficulty selecting language. After the article selection exercise, participants are asked a series

of open-ended questions that ask them to reflect upon the article selection and decision making process.

The first question asks if it was easy for the participants to choose between the two languages when

they were presented with the excerpts and why. 132 of the participants (86.8%) thought it was easy, 14

(9.2%) thought it was hard, and 6 (3.9%) thought it was neither easy nor hard.


For the open-ended part of the question, after the English answers were translated, the complete

set of answers was reviewed and clustered by whether the participant thinks selecting a language

version of the same article was easy or difficult. The answers are then coded following the coding

framework in Appendix VIII, and presented in Table 18. The coding process was repeated twice to

ensure coder reliability.

Table 18. Is it easy or hard to choose between different language excerpts?

Easy

I use my mother tongue.

I have a strong language preference.

I am more used to one of the languages.

I am intuitively drawn to one language.

Depends on my comfort level with the language.

I can read both languages.

Depends on the topic of the article.

Whichever is faster to read.

Based on effort required to read the article.

When the subject is unfamiliar or there are difficult terms, I use the language that I am better at.

The language in which I learned the technical terms.

Whichever catches my eye.

Whichever was on the left.

I read whichever version seems shorter.

I prefer the original version of the story, not the translation.

Whichever has the easier sentence structure and vocabulary.

Based on the logic, structure, and clarity of the article.

Depends on the quality of the writing/translation.

Based on which version seems more accurate and/or precise.

Difficult

Both versions tell the same story so it is difficult to choose.

The story is neither Chinese nor American culture related.

I can read both languages.

Language preference for news excerpts. The participants were asked if they have a preferred

language in general when they were presented with the news articles. Of the 152 participants, one

participant did not answer all of the questions and is therefore not included in the data analysis for this

section. The majority of the remaining 151 participants have a clear preference between the Chinese and English

versions: 79 (52.3%) participants prefer the Chinese version, and 26 (17.2%) participants prefer the

English versions. There is also a group of participants who either have no strong preference, or whose

selection process involves more than the consideration of language in use. The result is presented in

Table 19. Language preference for the news article excerpts, and in Figure 23.

Table 19. Language preference for the news article excerpts.

                Frequency   Percent   Cumulative Percent
Chinese         79          52.3%     52.3
English         26          17.2%     69.5
Depends         33          21.9%     91.4
No preference   13          8.6%      100.0
Total           151         100.0%

[Figure 23: pie chart with four slices: Chinese 52%, English 17%, Depends 22%, No preference 9%.]

Figure 23. Pie chart - language preference for the news article excerpts.
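
The percentages in Table 19 and Figure 23 follow from the frequencies and the 151 participants included in this section. A small sketch of that arithmetic, with illustrative variable names, is:

```python
from itertools import accumulate

# Frequencies from Table 19 (one of the 152 participants is excluded).
counts = {"Chinese": 79, "English": 26, "Depends": 33, "No preference": 13}

total = sum(counts.values())
percents = {label: 100 * count / total for label, count in counts.items()}
cumulative = list(accumulate(percents.values()))

for (label, pct), cum in zip(percents.items(), cumulative):
    print(f"{label:<14}{counts[label]:>4}{pct:>7.1f}%{cum:>7.1f}")
```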

A chi-square test of independence was performed to examine the relation between a

participant’s dominant language and language preference for the news article excerpts. The relation

between these variables was significant, X2(3, N=151) = 31.336, p<0.001. Participants who view

English as their dominant language are more likely to prefer English for the news article excerpts.


Kruskal-Wallis tests were completed to investigate the possible effects of daily language use

and the duration participants have been using English daily on the article language preferences. The

results are statistically significant. Participants who use English more every day are more likely to

prefer English for online documents, X2(3, N=152) = 34.094, p<0.001. Participants who have been

using English daily for a longer period of time are also more likely to prefer English for online text

documents, X2(3, N=151) = 15.827, p = 0.001.

Finally, the impact of language proficiency is examined. A Kruskal-Wallis test found that

English proficiency level and language preferences for Internet use are statistically related, X2(4,

N=152) = 31.561, p<0.001. A closer look at the two variables using the pie chart in Figure 24 shows

that the impact of language proficiency is not evident until the highest proficiency level.

Figure 24. Pie Chart - Preferred Language and English Proficiency

Note: English proficiency level is measured by a five-point Likert scale: 1 is the lowest level: able to recognize words and phrases; 5 is the highest level: equivalent to native born speakers.

The participants provided their thoughts on language preferences in the open-ended portion of

the question. Their answers were reviewed, clustered, and coded following the coding framework

outlined in Appendix VIII. The coding procedure was completed twice to ensure coder reliability. The

result is shown in Table 20. Language preference reasons.

Table 20. Language preference reasons.

English | Chinese | Depends
Higher language skill/fluency | Higher language skill/fluency | Original, not translated, version is better
Faster to read | Faster to read | Quality of the writing
Higher amount of use and exposure | Mother tongue | Whichever is easier to "absorb"
Exposure and familiarity to the subject in English | Higher reading comprehension | The original language for proper nouns and technical terms
Accustomed to the use of English | Higher comfort level | Subject and content
Better understanding of unfamiliar terms and concepts | Use it for unfamiliar terms and concepts | On screen location of the article (left or right)
To avoid translation error | Less mental effort | The geographical location of the story
Convenience. | Difficulty of and/or familiarity with the subject. | Who the information will be shared with

“Which language draws you first.” The participants are asked if they are drawn to one

language more so than the other when the articles are presented to them, and if so, what the reasons are.

The participants’ answers were reviewed, clustered and coded following the coding framework

presented in Appendix VIII. The coding process was repeated once after the initial effort to ensure

coder reliability. The result is presented in Table 21.

Table 21. Why one language appeals to you first.

Instinct-Chinese | Instinct-English | Neither
Faster to read | Faster to read | Scan the title and make judgement
More frequently used | More frequently used | Look for jargons
Used to using it | Used to using it | Judge by subject matter
Lives in a Chinese speaking environment | Lives in an English-speaking environment for a long time | Read the one on the left first
Mother tongue | To improve English reading skill | Article difficulty
More proficient in Chinese | More proficient in English | Subject
More effortless to use | Easier to grasp meaning | Font size
Spots Chinese proper name | Spots English proper name | Knowledge and familiarity of a subject was acquired in English
Knowledge and familiarity of a subject was acquired in English
Leads to better reading comprehension
Scan the Chinese version title for clue on which version to peruse
The Chinese version appears shorter

Additional Thoughts

Finally, the participants were asked to share any additional thoughts they have regarding the

two languages and their uses. Their responses are reviewed and excerpted below:

I was unaware that I actually prefer English articles over Chinese.

The language in which one learns a new subject heavily influences one’s language preference when encountering the subject in the future.

I lean towards using English because I need to improve my English proficiency.

I use Chinese to help me learn English.

If I need to memorize something, I use Chinese.

Language selection is about social setting and interaction. But if the other person I am communicating with someone who can speak both Chinese and English, I would use my dominant language.

It is harder for me to type in Chinese so I will use English on the Internet.

Culture is an important part of language use.

Some concepts can be better expressed in one language.

Different languages sometimes embed different viewpoints.

Frequently, the original writing is better than the translated version. Translations are sometimes awkward and inaccurate.

Mother tongue always dominates.

Language is a tool. When to use it and which one to use depends on the situation and the language’s ease of use.

Pronouns and names are better in their original language.

There is different beauty in different languages.

The longer one lives in the US, the higher the possibility of preferring English.

My Chinese has degraded but my English is not perfect. I am stuck in between.

In the next chapter, the results of the data analysis are used to answer the research question.


Chapter 6. Discussion

In this chapter, the data presented and analyzed in the previous chapter are examined and

discussed in reply to the research questions introduced in chapter 2. Before the analysis is a brief

review of the research question and method. The rest of the chapter is structured to address one

research question at a time, followed by additional thoughts and insights from the data.

Research Question and Method Review

This research is built upon the fields of cross language information retrieval (CLIR),

information seeking behavior, and bilingualism. Reviewing existing CLIR and multilingual user

information seeking studies, it is evident that although efforts are made to understand bilingual

information users, there is still much we do not know. Most of the previous research on bilingual users

draws participants from academia and is related to uses of current information retrieval systems or

CLIR systems in testing. These studies found that language choice for information seeking online is

mostly influenced by language proficiency; the nature of the search task, such as doing academic research

or looking up local transportation information; and subject matter, such as history or science.

While enlightening, it bears pointing out that the existing literature is rooted in current systems

and available information resources. Most of the studies center on the information seeking behavior of

bilingual or multilingual speakers; other impact factors recognized in bilingualism, namely the purpose

of the language use, language attitude, and the amount of language use, have not been examined in the

information seeking context. This dissertation examines how information seekers interact with

language and digital information resources and asks the question: “What elements within a user’s

language profile influences his/her language choice for digital text documents?” Five variables are

observed for potential impact on user’s language choice, and five assumptions are made:

1. Language attitude: The likelihood of a bilingual speaker choosing L2 increases when he/she

indicates a preference for L2.

2. Language exposure: The longer a bilingual speaker is exposed to an L2 environment, the

longer they have been actively using the language, and the more likely he/she would choose

L2.

3. History of language use: The longer a bilingual speaker has been using L2, the more likely

he/she would choose the L2 versions.

4. Language proficiency: The more proficient a bilingual speaker is with L2, the more likely

he/she will choose L2 for digital information resources.

5. Subject matter: The less familiar a bilingual speaker is with a subject matter, the more likely

he/she will choose L1 for information regarding that subject.

Data were collected using a modified LEAP-Q survey developed at Northwestern University to

build a language profile for the participant, and an article selection tool developed by the current

researcher to observe the participant’s reaction to different languages. Unlike other studies, the current

research did not ask users to perform an information seeking task. The article selection exercise is used to

stimulate users’ thoughts and observe the results. Participants’ language choice is obtained through

behavior: Which language was chosen as dominant language? Which was used for the survey? What

language did the participants type their answers in? How many Chinese vs. English versions of the articles

did the participants select?

In total, 152 complete surveys were collected, and the data were analyzed in the previous

chapter. The results are discussed below to examine the effect of the five potential impact factors.

Language Attitude

The first assumption states that the likelihood of a bilingual speaker choosing L2 increases when

he/she has a favorable attitude towards L2.


Result

The results of the survey indicate strong relations between the number of English versions of

the articles selected and a participant’s (a) survey language choice, (b) answer language choice, (c)

general language preference, (d) language preference for text-based, online documents, and (e)

dominant language. Participants who selected more English version articles than Chinese version

articles are found to be more likely to: have English as their dominant language, prefer to use English

on a daily basis and online, choose English as their survey language and/or answer language, and

indicate a preference for English for the online text documents.

There was no significant relation between cultural identification and the article selection results.

The language spoken within a participant’s most identified culture has no impact on the article

selection outcome.

Discussions and Implications

As the previous paragraph states, the survey result finds language attitude, specifically

dominant language and language preference, and language choice to be related. All other conditions being

equal, when a person prefers a language, the person is more likely to choose content written in that

language. However, the relation is not absolute. Before we continue, let us first examine the impact

and interplay among dominant language and the various iterations of language preferences.

Dominant language. Dominant language is defined in this study as the language that a person

intuitively uses first in his/her daily life. Although highly influenced by one’s first language, dominant

language can be a language acquired later in life. In fact, 25% of the participants in this study have a

dominant language that is not their first language. Dominant language is often the language that a

person is more exposed to on a daily basis. This could be a result of the person’s living environment,

such as living in an English-speaking country, or for work, such as working as a translator. The longer a

person is exposed to the language daily and consistently, the more likely the person would adopt it as


the dominant language.

While the dominant language is usually the language a person is most fluent in, it becomes

more unpredictable when the person is equally fluent in more than one language. In these cases, other

factors become more influential. Factors beyond language fluency include: length and amount of

language exposure, frequency and amount of language use, and language preferences (discussed in

detail in the next section). This study has found several observable trends relating the above listed

factors to participant’s language using behavior. For example, participants with English as dominant

language are more likely to have lived in an English-speaking country longer, and to have higher English

proficiency. In the article selection exercise, we see that participants with English as dominant language

are more likely to choose more English articles than participants with Chinese as dominant language. In

general, we see that when a participant is English dominant, he/she tends to have been exposed to

English more, use English more, and prefer English in different situations. The only exceptions occur

with online activities. Data analysis in the previous chapter showed that having English as dominant

language or having more English exposure does not increase or decrease a participant’s likelihood of

using English more than using Chinese online. This could imply a conscious choice of language when

faced with the use of an online information resource. Other research showed that information seekers

often favor English as the search language (Artiles, et al., 2006; Steichen, et al., 2014). The current

study’s participants might also view English as the overall dominant language online, with the most

information resources available, leading many of them to intentionally choose English despite their

attitude towards or history with English.

Language preferences. Another way to examine the relationship between a person and

languages is to view it through language preferences. Steichen et al. (2014) found personal preference

to be a major impact factor to language use, but did not elaborate on the nature of personal preference. I

define language preference as a more conscious language choice whereas dominant language is an


intuitive one. A preferred language is one that a person likes to use better. It may not be a language that

is easy for the person, but he or she would deliberately choose to use it when the situation allows. As

with dominant language, language preferences are related to many different variables, including

language proficiency, dominant language, cultural identification, and finally, the number of English

version articles a participant chooses. The relations between language preferences and the variables are

neither causal nor absolute, but point to trends and possibilities. For example, a bilingual speaker with

English as dominant language is more likely to prefer English; a person who identifies more with

English-speaking cultures is more likely to prefer to use English. English preference, at least in the

written form, is moderately related to participants’ language choices for online activities (Table 11.

Preferred Language and the Frequency of Using English for Online Activities). For this study, we also

see that participants who prefer to use English online are more likely to choose more English version

articles.

Another difference between dominant language and preferred language is that language

preference is a multi-layered construct. In this study, six language preferences were identified and

examined: spoken language, written language, a passive language preference for reading the survey

(survey language), an active language preference for answering the questions (answer language), a

language preference for daily language use (general language preference), and a language preference

for online activities (online language preference).

Although the different preferences may be related to each other, they are each distinct and not

always strongly correlated. For example, although the chi-square test performed in the previous chapter

finds survey language and answer language to be related to each other, they do not correlate completely.

As a matter of fact, over half (56%) of the participants who chose Chinese as the survey language used

English to answer the questions. The difference between survey language, used for reading written text, and

answer language, used to answer questions, is worth contemplating. With reading a passive


language skill and writing an active one (Baker, 1998), reading requires less cognitive load and mastery

of active vocabulary. In other words, it is easier for someone to read a document written in a second

language than to write in it. Yet here are many native Chinese speakers who prefer Chinese as the

survey language using English to answer questions. While language proficiency does play a role in the

deciding which language to use, three participants’ comments on the issue shed some light

on the discrepancy. These participants chose English as the answer language because, as residents of

the US, they do not type in Chinese as frequently and efficiently. Furthermore, similar to what Steichen

et al. (2014) found with their subjects, some of the participants’ US-bought computer keyboards do not

come with Chinese input symbols, making it difficult for them to use Chinese as the input language.

Consequently, even though they chose Chinese as the reading/survey language, they resort to English

as the input/answer language.

The discrepancy between preferences for active and passive languages exists in participants’

choices for spoken and written languages as well. In the survey, participants are asked to choose one

language to converse with a partner whose language skills are equal to theirs. In a separate question,

the participants are asked to choose one language for a written document. For both questions,

participants either chose Chinese, English, or no preference. Most of the participants (89.5%) who

chose Chinese as their written language also prefer Chinese as their spoken language. Yet of the

participants who prefer English as their written language, only a little less than half (48.9%) also prefer

English as their spoken language. As with the difference between answer and survey language,

participants have provided some ideas as to why we are seeing this difference. Some participants find it

easier to fully communicate their thoughts with one of the languages. Some participants like to

challenge themselves with the language they are least proficient in for reading materials. One

participant points to cultural assimilation, or rather the inadequate degree of cultural assimilation, that

pushes him to choose to speak in his native language rather than a second language. Indeed, there are


many variables that influence language use and language preferences, many of which are not easily

foreseen.

The relationship between dominant language and the different language preferences is

similarly not in perfect coordination albeit somewhat related. Although the likelihood of someone

preferring one language increases if that language is also their dominant language, almost a third of the

participants did not follow the pattern. Most of these participants (83.33%, 25% of all participants)

have a dominant language, but could not decide on a preferred language. For the participants without a

strong preference between languages, the context, purpose, and communication partners all influence

what language they will choose. For them, while dominant language is a personal choice, language

choice is complicated, and often not a solitary decision. It depends on the purpose and context of the

language use, as well as the social situation. Who are they communicating with? What will they

communicate about? Is there a specific audience they should prepare to share the information with?
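The two percentages reported above can be reconciled with simple arithmetic. The following is a minimal illustrative sketch, not part of the study's analysis, and the function name is hypothetical. It shows that if the undecided participants make up 83.33% of those who did not follow the pattern and 25% of all participants, then the participants who did not follow the pattern account for roughly 30% of the sample, which agrees with "almost a third."

```python
def implied_subgroup_share(share_of_all, share_within_subgroup):
    """Return the size of a subgroup as a share of the whole sample.

    share_of_all: share of ALL participants in the nested category (0.25 here).
    share_within_subgroup: share of the SUBGROUP in that same category (0.8333 here).
    """
    return share_of_all / share_within_subgroup


# Undecided participants: 25% of everyone, 83.33% of those off-pattern.
off_pattern_share = implied_subgroup_share(0.25, 0.8333)
print(f"Participants not following the pattern: {off_pattern_share:.1%}")
```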

Dominant language and language preferences are not straightforward conceptual constructs.

They are nuanced, distinct, and influence or are influenced by many different variables. While language

proficiency and the purpose of the language use are two of these variables that impact language

dominance and preference, they do not account for all.

Summary

This study has found that dominant language and preferred languages are different but each can

be used to predict the general outcome of a participant’s article selection result. Participants who are

English dominant and prefer English are more likely to conduct online activities in English, and select

more English version articles. However, dominant language and preferred language are complicated

constructs with many facets. Participants have noted how different languages can elicit different

responses, and that language has an affective aspect that they respond to.

In CLIR and MLIR, a person’s language choice is often viewed as a reactive decision:


Information seekers choose a language based on the given conditions of their language proficiencies,

the nature and purpose of the information seeking tasks, and the domain in which the research is

conducted (e.g., Steichen et al., 2014). It is never the user’s choice to decide, based on the user’s likes or

dislikes, what language to use. I propose that MLIR and CLIR should be expanded to accommodate

users’ dominant language and language preferences, because the dominant language may be the most

comfortable and efficient for the information seeker to use, and the preferred language may offer the most

affective reward. Over and over, participants cited “convenience”, “familiarity”, and “better article

flow” as their reasons to choose one language version of the same information content over another

version. Some of the reasons that were given, “a more beautiful language” for example, are subjective

and sentimental, different from the practical considerations of search task purpose and language

proficiency but should nevertheless be considered.

Deliberate consideration should be given to the different types of language preferences for

various language use. Spoken and written, active and passive, people respond to these language choices

differently. If we are to truly understand how information seekers use language in information seeking

situations, dominant language and language preferences should be viewed as important factors of a user

profile, and be taken into account when studying bi- or multi-lingual speakers’ information seeking

behaviors.

Language Exposure and the History of Language Use

The second assumption states that the longer a bilingual speaker is exposed to an L2

environment the more likely he/she would choose L2 versions. The third assumption states that the

longer a bilingual speaker has been using L2, the more likely he/she would choose the L2 versions.

Result

The results found that the number of years living in an English-speaking country, the amount of

daily English exposure, and the length of daily English use have a positive impact on the number of


English articles that a participant chooses.

Discussion and Implications

Language exposure and history of language use are discussed here together because although

they are different concepts, they are statistically correlated, strongly and positively. Viewing the two

variables side by side provides a more complete picture of a person’s language profile.

Language exposure. The two indicators of language exposure examined in this study, the

number of years one has lived in the US and the amount of English exposure they experience in their daily

environs, are moderately strongly and positively related.

Language exposure is the amount of time that a person is exposed to a language, either by

active involvement in conversations or passively through the surrounding environment. Someone who

does not live in an English-speaking environment may still be exposed to English through the media or

the use of the Internet. On the other hand, a person who lives in an English-speaking environment

might insulate him-/herself with Chinese material and only converse in Chinese. Even so, it is difficult

to not come into contact with English material outside of his/her immediate surroundings. Language

exposure is therefore observed through both the number of years a participant has been living in the US,

an English-speaking country, and the amount of daily English exposure they experience.

How long a person has been living in an English-speaking country and the amount of English

he/she is exposed to every day are positively related, but the relationship is not linear. A few

participants who do not live in an English-speaking country have also indicated high daily English

exposure. Likewise, participants who live in the US have indicated low daily English exposure.

The previous paragraphs gave a couple of examples of how this could come to be. Participants

also provided explanations. Some of the participants are often exposed to a language in their home or at

work that is different from the language used in the general environment. For example, a participant

who lives in Taiwan but works as an English translator is exposed to English almost daily and in great


amount. On the other hand, a participant who lives in the US but works in a Chinese office may have

limited daily English exposure. Many of the US-residing participants also speak Chinese among their

family.

Language use. Language use is related to language exposure but requires more active

involvement from the bilingual speaker. The survey results show a relatively strong, positive

relationship between language exposure and the amount of language use. It is likely, but not

certain, that the longer a participant has been exposed to English, the longer they have been using

English daily. Use and exposure are different variables, however, and are examined separately.

Language use, exposure, and attitude. The survey results demonstrate that the longer a person

has lived in an English-speaking country, the more likely he/she views English as the dominant

language, and prefers English. Similar results are found with language exposure and the length of time

a person has been using English.

The statistical analysis used in this study is used only to find correlations and possibilities, not

to establish causal relationships. However, a causal relationship between the variables can be derived

from participants’ statements in the survey’s open-ended questions.

Asked about why they prefer one language over the other, several participants used the term “習慣了”, meaning they are used to it. Several participants prefer Chinese because it is the language they

have been using for the longest period of time, and that they are most familiar with it. The participants

imply that as language exposure increases, and language use history accumulates, their feelings about a

language and the inclination to choose one language over another begin to shift. A little less than half

(46%) of the participants in this study, all Chinese-English bilingual speakers with Chinese being L1

and English being L2, either have no strong preference between Chinese and English anymore, or

have switched to preferring English. As one participant puts it succinctly when asked about her preference of


English, she is: “[m]ore used to the language.” A second participant prefers English because she “use it

more on a daily basis”. Another participant prefers English because she has “been living in the US for

27 years, that’s the language I use [every day].” A fourth participant further says “[t]he longer I live in

the US, the higher the possibility I would use English.”

Similar statements were made by participants who have no preference between Chinese and

English. One participant says she could not choose between the languages because “Chinese is native

language; English is the native language of the environment that I live in.” These comments suggest

that language exposure and length of language use have positive and observable impact on language

selection. Exposure and use can lead to familiarity until a participant “didn’t realize that I prefer to read

most articles in English,” or “…is no longer used to the way Chinese articles are worded.” In other

words, with exposure and use come familiarity and habit formation, which in turn leads to the

establishment of preference and language dominance.

There is, however, one exception: A history of daily English exposure is not statistically

significant to the choice of English for survey language (p = 0.058). There are a few possible

explanations for this phenomenon. As previously mentioned, language exposure does not equal

language use. Some participants might feel more comfortable approaching the survey in Chinese

because Chinese is, after all, their mother language. The survey language choice is further discussed in

later sections.

Internet use. The survey asks participants to select the settings in which they use more English

over Chinese. One of the settings is “on the Internet for personal or recreational purposes.” The results

of the survey showed no relation between language exposure, history of use, and online language use

(Figure 13 and Figure 14). The results, however, do not agree with the findings of the later set of

questions about the language preference for each individual online activity. A later set of questions

found that language exposure and history of use are correlated to the frequency of Chinese or English


uses for individual online activities. The longer a person has been living in an English-speaking

country, the more frequently they would use English for various online activities, including work-

related research, reading news, socializing with friends, and online shopping. The same moderately

positive relationship is also found between the amount of daily English exposure and online activity

language choices.

The discrepancy between language use online in general and for listed activities could be

caused by the survey not accounting for all possible online activities. For example, the survey did not

ask about online gaming or emailing. Regardless, it is clear that language exposure and history of

language use have no impact on the language choice for general Internet use. English is evenly preferred

by participants with or without extensive exposure to English or history of English use. Is this because

English is still the most common language used on the Internet (“Most Common Languages”, 2016) so

that users are required to use English for certain occasions online regardless of their preference and

language history? If so, it confirms the argument laid out by Petrelli et al. (2004) that users choose a

language based on the task at hand. Or are users willingly using English in order to be engaged in a

broader range of online activity? Are the users satisfied with their language options? How often do they

require language-related assistance? These are questions that require further research.

Article selection results and digital document language preference. Both living in an

English-speaking country and having daily English exposure have moderate relationships with the

article selection results (see Figure 10). As with language attitude, although the relationship between

exposure and the article selection results is moderate, it shows a possible effect of language

environment on information seekers. Familiarity comes out of exposure. With familiarity comes

comfort that could lead to preference. For people who live in an L2 environment, L2 gradually becomes

the main language they use over time, reducing their need for CLIR or MLIR features.

Another consequence of living in an English-speaking environment is that new concepts, terms,


and subjects are learned in English. One participant pointed out that “when I come into contact with

something new, the language in which the impression is formed is very important. Although Chinese is

my native language, for things I learned after I moved to the US, I am used to using English to search

for relevant information.” One other participant reflected, after the article selection, that she chooses to

read the dinosaur article in Chinese because everything she knows about dinosaurs, she learned in

Chinese. This view is perhaps one of the reasons why multiple studies cited in Chapter 3 found users

gravitating toward a specific language when seeking information in specific fields. When a field’s

publication is dominated by a language, it is true that there is more information available in that

language; it is also true that people often acquire the jargon, terms, and concepts in that language.

When the knowledge of a subject is acquired in a specific language, it is natural to use that language for

related information seeking. Therefore, for subjects with a commonly acknowledged dominant

language, it would perhaps be better to focus CLIR and MLIR resources and efforts on assisting

information access for information seekers who are not yet acquainted with the subject or lack sufficient

knowledge of the proper terms and jargon.

Summary

Language exposure, operationalized as number of years living in the US and the amount of

daily exposure, and the history of language use are found to have positive relationships with

participants’ language preferences, dominant languages, language proficiency, online activity language

choice, and article selection results. To be sure, it is possible some of these variables have a cause-and-

effect impact on each other. For example, a person’s English proficiency would perhaps improve the

more they are exposed to English, and a person’s preference of English could lead him/her to be more

exposed to English, or vice versa. The interaction between variables and the influence of the interaction

should be explored further in future studies. From this study, participant input shows us that language

exposure and history of language use contribute to language familiarity, which in turn leads to


participant’s language preference, language dominance, as well as the behavior of choosing to use the

language. If this is the case, information seeking users would be more likely to prefer and use the

language of their immediate surroundings, and CLIR and MLIR would be more needed for non-

native users who would benefit from accessing local information.

Language Proficiency

Assumption four is about a bilingual speaker’s language proficiency: The more proficient a

bilingual speaker is with L2, the more likely he/she will choose L2 for digital information resources.

Result

Language proficiency and article selection results are found to be statistically related. The

higher a participant’s language proficiency level, the more likely they would select more English

versions of the articles.

Discussion and Implications

Language proficiency is found to be intertwined with many factors of a person’s language

profile. It is a possible factor in participants’ language attitude regarding dominant language,

and the various language preferences.

Language proficiency and language attitude. The findings of this study show that language

proficiency and the variables that represent language attitude are statistically related to each other. The

higher a person’s English proficiency, the more likely English is his/her dominant, preferred, answer,

and survey language. Participants who can only read simple English paragraphs overwhelmingly

view Chinese as their dominant language and preferred language for general use. Once participants’

English proficiency improves, however, its impact on the person’s language attitude diminishes, and

the impact of other language exposure and use elements increases.

To investigate the possible influence of other language profile elements on highly proficient

bilingual users, let us examine the participants with the highest proficiency ratings. There are 40


participants who rate their language proficiency as comparable to that of an educated native speaker’s.

For these participants, the number of years they have lived in the US is correlated to how long they

have been using English consistently daily (r = 0.404, p = 0.01), but not with their daily English

exposure amount (r = 0.334, p = 0.036). All of them but one lives in the US, with the average duration

being 18.5 years. All of them but one has been using English consistently, daily for longer than four

years. Furthermore, English is the language that most of these participants (90%) are exposed to more

than 50% of the time in their daily lives.
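For readers who wish to relate the reported correlation coefficients to their significance values, the following minimal sketch may be helpful. It is an illustration under stated assumptions, not the procedure used in this study: the function names are hypothetical, and it assumes the standard conversion of Pearson's r to a t statistic with n - 2 degrees of freedom. The p value would then be read from a t distribution with those degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Coefficients reported above for the 40 highly proficient participants.
for r in (0.404, 0.334):
    print(f"r = {r}: t = {t_from_r(r, 40):.2f} with {40 - 2} degrees of freedom")
```

The first function shows how r itself would be computed from paired observations, such as years lived in the US and years of daily English use; the raw observations are not reproduced here.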

Mann-Whitney test, Kruskal-Wallis test, and Pearson’s correlations were conducted to examine

the relationships between these participants’ language attitude with language exposure and amount of

use. The results show that for participants with high English proficiency, the amount of daily language

exposure is related to dominant language outcome (U = 110.5, p = 0.015), and general language use

preference (χ²[2, N = 40] = 12.311, p = 0.002). The other variables have no bearing on language

attitudes. The result suggests that when participants are fluent with a language, the amount of language

exposure can influence how they perceive and approach languages.
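The rank-based tests named above can be sketched briefly. The following is a minimal illustration, not the software procedure used in this study. The function names are hypothetical, ties are given averaged ranks, and the smaller of the two U values is returned, which is one common reporting convention. The last function assumes that the reported chi-square value with 2 degrees of freedom is the Kruskal-Wallis statistic; for exactly 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Smaller of the two Mann-Whitney U statistics for two independent groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


def chi_square_p_df2(statistic):
    """Upper-tail p value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# The reported statistic of 12.311 corresponds to p of about 0.002.
print(round(chi_square_p_df2(12.311), 3))
```

In the study's setting, the two groups passed to the U computation would be the daily exposure ratings of participants grouped by dominant language; those ratings are not reproduced here.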

This finding corroborates the findings in the previous section, Language Exposure and the

History of Language Use and highlights the importance of a user’s language environment with their

language use pattern. The influence of language exposure and language environment should be further

examined in the context of information seeking behavior and CLIR.

Language proficiency and language choice for online activities. Language proficiency, like

language exposure and language use history, is found to have no significant impact on language choice

for online activities. There is an even split between participants on what language they prefer to use

online. It is possible that the survey did not account for every online activity, and so the results are not

fully representative of the participants’ language use and preferences. It could also be that, as Steichen

et al. (2014) found, participants have come to expect English as the most commonly used language


online and adopted it as the default language. Furthermore, they are able to conduct online activities

using English to satisfaction, eliminating the need to use other languages.

Language proficiency and article selection results. Let us revisit the relationship between

English proficiency and the number of English version articles selected. When participants are asked to

comment on their article selection process, proficiency is mentioned over and over as one of the main

impact factors. Many of the participants prefer the articles in their native language, in which they are most

proficient. With their native language, they are able to “read quickly and grasp the meaning of an

article accurately”. On the other hand, there are also participants who prefer English even though it is

their second language. These participants usually have high English proficiency to the point where they

“can read and write in English effortlessly”.

Whether the participants prefer English or Chinese, when language proficiency is the reason

they choose one language version over the other, it is mostly because with higher proficiency, less

cognitive effort is required and the participants are able to obtain what one participant describes as

“higher efficiency”. There is also a sense of ease that comes with skill and familiarity; many of the

participants describe it as “convenience”.

Many of the existing studies cited in Chapter 2, such as Steichen et al. (2014), found

proficiency to be a major driving factor behind bilingual information seekers’ language selection

processes. Its impact would likely carry over to the information seeking process. However, from Figure

21. Number of English Articles Selected and English Proficiency Level, we can see that while one can

safely assume a participant who selected a higher percentage of English version articles would have

higher English proficiency, the same assumption cannot be made about participants who selected fewer

English versions. Slightly less than half (42%) of the participants with level 4 English proficiency (able

to read all styles and forms of documents related to professional needs), and slightly more than a

quarter (27%) of the participants with level 5 English proficiency (rival educated native speakers)


selected three or fewer English versions and five or more Chinese versions. There are clearly other

factors that influence the article selection outcome, some of which are discussed in the Subject Matter

section below.

No language preference. Not all of the participants who have at least moderate English

proficiency level have clear preferences between English and Chinese. In fact, 25% of the participants

said they either have no preference, or that they choose the language based on other conditions such as

translation quality and the subject of the article. These conditions are discussed in later paragraphs.

Some of the participants found it difficult to choose between Chinese and English because they

can read both languages fluently. For these participants, their English proficiency level negates the

impact of language proficiency, and brings other considerations into sharper relief. These considerations

are addressed in the later sections.

Summary

Language proficiency is an impact factor that influences participants’ language choice. Its

impact can usually be seen after a person’s language proficiency level is moderate or better.

Participants who are less fluent in English largely prefer Chinese in various situations, and view

Chinese as their dominant language. As the participant’s language proficiency improves, other variables

begin to take on more influence.

Subject Matter

The last assumption is about subject matter, and states: The less familiar a bilingual speaker is

with a subject matter, the more likely he/she will choose L1 for information regarding that subject.

Result

The survey asks participants to reflect upon the article selection exercise, and comment on the

process. Sixteen participants cited subject as the deciding factor for the article selection result. Several

of them specifically mentioned using L1 for topics that they are not familiar with, confirming this


assumption. Two participants prefer L2 since it is the language in which they learned about the

subject. The rest of the participants view subject topic as an important factor, but not in the way this

current study assumed.

Discussion and Implications

Many of the existing studies view the purpose and subject field of a search task as major impact

factors to an information seeker’s language choice. This study took away the variable of the search

task, and presented the participants with a collection of articles of different topics. Even without the

search tasks, participants find the subject topic of the articles to be influential to their article selection

result. A few of them favored L1 (Chinese) for articles that are more difficult or subject matters that are

unfamiliar. A couple of them favored L2 (English) because they are more familiar with the jargon and

technical terms worded in English. Others cited other subject field related reasons, such as:

“If it is about China, I would prefer Chinese. If it’s about the West, I would choose English.”

“If it is scientific, I prefer English. Only when it is about Chinese internal [affairs], I read

Chinese.”

“Depending on the content of these articles—if the content is culturally related to Chinese, then

I would prefer to read it in Chinese; if it's a piece related to news or new studies, then the

English version will be my first choice.”

“If it is science related, I prefer the English version. I think it’s because I am trained as a

scientist in English.”

“Parenting + work + kids related articles = English. There are more/better info out there and the

people I share opinions with use English. Personal leisure + cultural/family related articles =

Chinese. So I can communicate and exchange ideas quickly.”

Although all of the statements above are about subject matter, each approached subject matter a

little differently and raised the following issues:


1. Where did the story originate from and what is the official language spoken at the

source?

2. Are the participants familiar with a subject? If they are, what language did they learn

about the subject in?

3. Who will the participants be sharing the information with and what language do they

speak?

4. In what language can the participant find more information about this subject? In what

language would they find more accurate information about this subject?

Let us examine these four issues one by one.

Story origin. The survey results show that to some participants, the original language

concerning the subject matter is important and much preferred.

As summarized in Table 16, four of the eight articles are China or Chinese related. Figure 19

charts the way participants respond to the articles and the perceived source language of the articles. The

chart shows that more than half of the participants chose the Chinese version for the Chinese-sourced

articles. On the other hand, Western culture related articles in general have more English versions

chosen. A further statistical test shows that source language is indeed a significant impact factor to the

article selection outcome, demonstrating information seekers’ tendency to want to read an article in its

source language.

A closer review of participant feedback reveals that the preference for source language is

mentioned over and over again when participants are asked about the article selection process. Many of

them stated they want to read the version that is written in the original language. A participant explains

“I read English if it is a Western report, and Chinese if it is a Chinese topic. This way, I get to read the

original versions.” Even though there are no indications which language version is the original, and

which is the translation, participants assign original languages to each article by the article’s cultural or


geographical associations. As one participant puts it, “I choose language by guessing if the article is

originally written in the particular language.”

Some participants chose the original language in order to avoid proper name translations. The

transliterations of proper names are viewed as complicated or unwieldy and take “too much work to

keep track of it”. When faced with the articles, some participants scan the title, and then decide on the

language version to choose based on the existence of proper names or technical terms.

Some chose the original language because “the articles sound more natural to me”. As one

participant states, “Chinese first. Unless when I first scanned the article, the Chinese translation is not

good or feels too forced.” Several participants similarly remarked on the translation quality of the

article, saying that “the sentences are awkward”, “the translation is hard to understand”, and “it is

uncomfortable to read”. One participant pointed out that he does not want to risk contents being lost in

translation, therefore he prefers the original version. For these participants, the original writings are

better than the translations, unless L2 is too difficult and their language proficiency does not support it.

For these participants, while translations of an article may be enough to help them learn what an article is

about (Orengo & Huyck, 2006), they may not be good enough for the information seeker to continue to

read it.

Familiarity with the subject. Some participants do lean towards L1 when faced with an

unfamiliar subject, explaining their language choice: “If it is a subject I am not super familiar with I

will choose to read it in Chinese. However, if it is a subject that I am comfortable with I would prefer to

read it in English.” Time and again the decision on whether to continue with L1 or L2 hinges upon the

existence of technical terms and jargon.

Participants are found to base their language selection on whether there are unfamiliar terms in

the document, or because they associate the subject topic with a specific language.

Participants who came across unknown terms often choose to read the articles in the language


they are most fluent in. One participant explains her Chinese article selections by saying “… these

articles mostly include many technical terms, so I want to read the Chinese explanation of these terms”.

Another participant says “if the content is more technical, and in some cases, containing professional

terminologies, I will also read it in Chinese simply because it is easier.” For these participants, the

subject matter is already foreign and requires effort to understand. They use language selection as a

strategy to combat the difficulty of the content and lighten the cognitive load required to comprehend

the document. Similar coping strategies regarding language and task difficulty have been observed in

bilingual studies on language switching in composition (Ramirez, 2012).

Bilingual speakers can sometimes associate a subject topic with a specific language due to

personal experience or exposure to the subject topic. This is a phenomenon that has been observed by

linguists (Saville-Troike, 2003), and it is also seen here. In this survey, participants display such association

by selecting the document version that is written in the language in which they are also more familiar

with the technical terms. One participant prefers English version documents because she is “no longer

familiar with professional terms in Mandarin”. Information seekers who are well versed in a subject

field that has a commonly acknowledged dominant language are likely able, and perhaps even prefer, to

use the dominant language for information seeking tasks on the subject topic. The acquisition of the

knowledge in the language, and the usage of it lead to familiarity which can develop into preference.

For linguists, topic is often entwined with the setting and purpose of the language use. Here, it is likely

so as well. Participants acquired the knowledge of a subject matter for purposes that could be work,

school, or for personal entertainment. The knowledge store begins to build, the accumulation of

relevant vocabulary and terminology grows, and the language the person used to acquire the knowledge

becomes dominant, and is used more which leads to further vocabulary accumulation. The cycle

continues, and the language used to pursue the knowledge becomes dominant and strongly preferred

for this subject domain.


From the above discussion, it appears that information seekers who are not familiar with the

technical terms and proper nouns used in the subject field would benefit more from CLIR features.

They would need help understanding the meaning of terminologies, forming search terms, and

comprehending search results.

Language is for communicating. Although the act of reading a digital document seems like a

solitary one, it carries a social component that cannot be ignored. The quote cited above demonstrated

how one of the participants looks at the articles and considers the language choices based on whom she

would converse with about the subject. It is an example of how information consumption is often for

further information dissemination. It reminds us that language is a tool used for communication. Even

when a person is reading an article by him- or herself, the future possibility of having to discuss it with

someone is never far from his/her mind.

When asked about what language they use for online activities and why, twenty participants

cited reasons similar to this: “[the language] choice depends on what I am looking for and with whom I

share info”, or “the language the audience is more inclined to using.” For these participants,

information seeking and information consumption are one step of the communication process. It is not

a standalone act. This is also evident in Figure 12 which shows participants, despite language

proficiency and attitude, overwhelmingly use Chinese to communicate with family and friends. The

language is chosen for the purpose of communication; it does not reflect personal likes or dislikes, nor

personal abilities.

Information availability. The last quote cited in the beginning of this section mentioned

information availability as one of the criteria the participant considered when she was selecting

between Chinese and English. She looked at the subject matter of the article, and decided on the

language based on which one would likely yield the most or the best information. Her approach is

echoed by 24 other participants (16%) when they were asked how they select a language for online


activities. They choose the language based on “which language/culture will produce more results of

what I am looking for.” A participant searches for international news in English and Chinese news in

Chinese; another gave the example of using Chinese to search for Chinese-medicine related

information because “I use the language that can bring me more direct and updated information for a

subject well-known by the language.” When information seekers choose language by information

availability, their language options are limited by the resource language. The language choice is defined

by the website they visit. This will become an issue if the information seeker is not proficient in the

website’s language.

Other Findings and Observations

So far, we have evaluated the impacts on language selection by language attitude, language

exposure and the history of use, language proficiency, and subject matter. Participants mentioned other

factors that also influenced their language selection process.

One such factor is cultural identity. One participant who is highly fluent in English and exposed

to English 90% of the day felt duty-bound to choose Chinese. She later regretted her decision for she

uses English a lot more than Chinese in her daily life, so much so that formal Chinese wordings and

structures are harder for her to process. Another participant prefers to use Chinese because he does not

feel completely assimilated into US culture even after living in the US for a prolonged period of time.

For these participants, language selection is not the result of language proficiency or for ease of use,

but it is tied to their self-identity. Theirs are examples of how language cannot be separated from its

social functions. It is used to communicate ideas, and also social structures, identities, and standings.

There are also preconceived notions on the different quality, accuracy, and credibility of

documents written in different languages even when the articles are parallel in content. At first glance, this

observation is similar to what subjects in Rieh and Rieh (2005) displayed: a tendency to give more

favorable assessment to information in foreign documents, describing them as better, more credible,


etc. In Rieh and Rieh (2005), however, users were responding to existing information resources where

their availability in different languages is disproportionate. The collections of L2, often English,

documents are larger than L1 collections. In this study, participants were provided with the same

information in two languages, and yet participants still view the documents as of different quality. One

participant likes to read Chinese history written in English because “history is subjective, sometimes it

is more interesting to read the English point-of-views”. A participant has this to say regarding the

Chinese local news article: she “feel[s] like I can trust English news more though I know they are

meant to be the same.” Another participant puts it bluntly, “I tend to think there is more fact in English

article.” These comments reflect further complexity in language attitude where participants are

responding to a language beyond what the language is used to convey. These participants assigned

attributes such as trustworthiness and truthfulness to the documents prior to reading the documents.

They read different tones into the same idea from the different wording and grammatical structure. They

approach the information with preformed judgments and opinions on the quality of the information that

the documents contain, thinking that different languages express one idea differently. These

preconceived notions could skew a person’s concept of information and approach to information

seeking. One of the participants explicitly noted that she might have answered the questions differently

if she chose to proceed with the survey in a different language. The language affects the way she

approaches the questions. It would be worthwhile to investigate how these opinions are formed, and if

the opinions influence an information user’s information seeking approach.

A different type of quality that has been mentioned by participants is the writing quality of the

digital documents. In the previous section on Subject Matter, we examined how participants look for

the source language of a subject matter and comment on the translation quality of the articles as their

language selection criteria. A few participants based their language selection on the quality of the

writing and on the beauty of language. One of them said his language selection depends upon the


document writer’s literary choices, accuracy of the wording, and literary aesthetics. “It is not just about

statements of facts,” he declares. Another participant noted the different feelings English and Chinese

evoke: “If I don’t find the writing in one language fluid, I will choose the other one.” A third participant

also noted the beauty in the use of words, and also that there should be “strong logic” behind the

writing. What these participants are saying is that language is objective and is appreciated as art. The

articles presented to them are read not only for the facts they contain, but also for enjoyment. In this,

they choose the language that they feel can bring the most joy. These participants provide another

instance where language serves as more than the medium that conveys an idea. As with invoking

cultural identity, it also appeals to information users aesthetically and emotionally. Do these

emotions change how users see an information resource? Would they change how users search for

information?


Chapter 7. Conclusion, Implications, Limitations, and Future Research

This research began by asking what other variables beyond language proficiency and subject

domain of the search task influence a Chinese-English bilingual information seeker’s language choice.

By asking the users to choose from parallel contents in two languages, this study is able to focus on the

effect of elements that make up an information user’s language profile on language selection. These

elements, namely language attitude (dominant language and language preferences), language exposure, history

of use, and language proficiency level, were all found to impact a bilingual speaker’s language

selection result. The impact of the article’s subject matter is also observed but in a different manner

from other existing studies. Whereas most studies examine subject domain through the information

seeking task, this study examines how information seekers react to the subject of a digital document

and how it influences their perception and use of language.

The findings of this research demonstrate that the makeup of a bilingual user’s language profile

developed for bilingualism research is also instrumental in the language selection process for

information seeking, and it is complex in nature. Users’ language profile influences their approach to

language and by extension their acceptance or rejection of the document that is created in the language,

which then in turn affects the choice of information resources. Furthermore, participant responses in the

survey demonstrated that language is a multifaceted construct that often carries meanings beyond the

words that were used. Participant comments revealed unexpected preconceptions towards language,

and the expression of self-identity through language selection.

This study set out to explore how a person’s language profile might influence how they use and

react to languages they know. The findings confirmed some former research conclusions, and brought

to light new observations. There is still much to learn about users’ language profiles and their relation to

users’ information seeking behavior and what it means to CLIR and MLIR researchers. The findings of


this study laid out possible trends and likely relationships among language profile variables and

language selection results. However, as an exploratory study, it is not without limitations.

Limitations

First and foremost, without a known population and an established subject profile for bilingual

information seekers, this study used a purposive sampling approach combined with a convenience

sampling approach, which results in self-selected participants. The recruitment method can potentially

introduce statistical error. As a result, it is crucial to recognize the exploratory nature of this study.

Secondly, this study is designed not only to observe users’ language selection outcome, but also

to prompt them to reflect upon language and information resources through the article selection

outcome. Since the articles are prescribed to the participants by the researcher, the articles are not always

of interest to the participants. There is a possibility that a participant does not respond to the selected

articles as he/she would respond to digital information that he/she encounters in real life. There is

also a limit to the amount of feedback and observation that can be gained due to the online survey

format.

Lastly, although participants completed the online survey on their own without being observed,

were guaranteed by the researcher that no judgement would be passed on their answers, and were told

that the survey is anonymous, there is still the possibility that some of them tailored their responses

according to what they think the answers should be. For example, one participant chose to do the survey

in Chinese because she thought it reflects the fact that Chinese is her mother tongue, not because she

preferred it. Another participant approached the article selection exercise as if it were a test. These types of

participant bias could affect research outcome. There is still much to do to further our understanding of

bilingual information seekers.


Future Research

Similar to existing literature on bilingual or multilingual information seeking behavior, this

study has a focused sample frame that is narrow in nature. The sample frame for this study is

limited to Chinese-English bilingual speakers in order to accommodate the limitation of the researcher’s

language abilities. Future studies should include bilingual speakers of different language pairs. Each

language has its own inherent cultural background and history, can instigate different sets of

assumptions, and can possibly incite different user behaviors. It would be interesting to see if bilingual

speakers of different language pairs behave differently in their language selection approach. Similar

investigation should also be applied to monolingual speakers. Are monolingual users also influenced by

elements of their language profile? To what degree?

Participants in the current study treated language as an instrument wielded by the speaker not

only to articulate a thought but also to convey ideas about the speaker’s assessment of the setting,

social association and self-identification. Language is loaded with culture and the speaker’s personal

history and perceptions; users do not view each language impartially. How does this influence the way

information seekers approach information? Does it impact how they view information resources? Or

can the act of information seeking be separated from the social construct and exist in a neutral space?

Furthermore, do monolingual speakers also experience similar preconceptions and influences? These

are questions that should be further explored, perhaps borrowing insights from social linguists, and

bilingual scholars.

Bilingual speakers are heterogeneous and come from a wide range of experiences and

backgrounds. The type of support and function they need would vary according to their language

profile. How long they have been using a language, what environment they are in, and what they

associate the languages with can all impact their language selection and information seeking

approaches. While this study is able to provide some understanding of information seekers’ motivation


behind language use, it does not, by any means, provide a full picture. However, the results of this

study illustrated the complexity and variety of the constitution of a bilingual speaker’s language profile.

It is important to identify the different types of bilingual speakers, recognize their strengths and

disadvantages, identify the groups that would most require cross language information seeking support,

and determine the type of features that would provide the right type of support. Only through this process

can a CLIR system be designed to meet the demands and requirements of users. This study further

echoes the call by Steichen et al. (2014) for personalized support for cross language information retrieval

systems in order to account for individual users’ particular backgrounds, needs, and skill sets.

This study focuses on the relationship between the observed variables and the language

selection outcome. Although the interactions of a few variables, such as the relationship between dominant

language and survey language, were explored, there are others with relationships worth investigating in

future studies, such as the effect of language exposure at home versus at work. Another aspect that has

not been studied is the strength of the impact. This study found language preference and language

dominance to be influential. How does the strength of their impact compare to that of the impact of

proficiency? Does any variable’s influence suppress other variables’?

Furthermore, many of the variables in this study generated non-normally distributed data. Some

variables have outliers, such as English proficiency mapped by number of years living in the US, that are

worth further exploration. What led these participants to have such atypical language profiles? Does

their background change their language attitude and language use? What is different in their

environment? Future research should also examine whether a larger set of data would be more normally distributed,

diminishing the effect of outliers, or whether the profiles of bilingual information seekers are non-normal in nature.

Lastly, this study focused on digital text but information seekers use the Internet for many other

reasons, and search for much more than text documents. Do users handle language use for other media

sources differently? Does the role of language diminish or increase for different online activities? These


are questions that would need to be answered by future studies.


References

Agheyisi, R., & Fishman, J. A. (1970). Language attitude studies: A brief survey of methodological approaches. Anthropological linguistics, 137-157.

Airio, E., & Kettunen, K. (2009). Does dictionary-based bilingual retrieval work in a non-normalized index? Information Processing & Management, 45(6), 703-713.

Allan, J., Callan, J., Croft, W. B., Ballesteros, L., Byrd, D., Swann, R., and Xu, J. (1997). INQUERY does battle with TREC-6. In Proceedings of the Sixth Text REtrieval Conference (TREC-6), NIST, 169--206. Retrieved on November 22, 2005, from http://citeseer.ist.psu.edu/broglio94inquery.html.

Androutsopoulos, J. (2006). Multilingualism, diaspora, and the Internet: Codes and identities on German‐based diaspora websites. Journal of Sociolinguistics, 10(4), 520-547.

Androutsopoulos, J. (2013). Code-switching in computer-mediated communication. In S. C. Herring, D. Stein, & T. Virtanen (Eds.), Pragmatics of Computer-Mediated Communication, 659-686. Berlin/New York: Mouton de Gruyter.

Aparicio, X. & Lavaur, J. (2013). Recognising words in three languages: effects of language dominance and language switching. International Journal of Multilingualism, 11(2), 164-181. DOI: 10.1080/1479-718.2013.783583.

Artandi, S. (1973). Information concepts and their utility. Journal of the American Society for Information Science, 24(4), 242-245.

Artiles, J., Gonzalo, J., Lopez-Ostenero, F., & Peinado, V. (2006). Are users willing to search cross-language? An experiment with the Flickr image sharing repository. In C. Peters, P. Clough, F. C. Gey, J. Karlgren, P. C. Carol Peters, & B. Magnini (Ed.), In Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval (CLEF'06) (pp. 195-204). Berlin, Heidelberg: Springer-Verlag.

Aula, A., & Kellar, M. (2009, April). Multilingual search strategies. In CHI'09 Extended Abstracts on Human Factors in Computing Systems (pp. 3865-3870). ACM.

Ayers, J.W. (August, 2010). Measuring English proficiency and language preference: Are self-reports valid? American Journal of Public Health, 100(8), 1364-1366.

Azarbonyad, H., Shakery, A., & Faili, H. (2012). Using learning to rank approach for parallel corpora based cross language information retrieval. In ECAI (pp. 79-84).

Backus, A. (2005). Codeswitching and language change: One thing leads to another?. International Journal of Bilingualism, 9(3-4), 307-340.

Bahrick, H. P., Hall, L. K., Goggin, J. P., Bahrick, L. E., & Berger, S. A. (1994). Fifty years of language maintenance and language dominance in bilingual Hispanic immigrants. Journal of Experimental Psychology: General, 123(3), 264-283.

Baker, C. (1992). Attitudes and language (Vol. 83). Tonawanda, NY: Multilingual Matters.

Baker, C. (2011). Foundations of Bilingual Education (5th Ed). Tonawanda, NY: Multilingual Matters.

Baker, C., & Jones, S. P. (Eds.). (1998). Encyclopedia of bilingualism and bilingual education. Multilingual Matters.

Ballesteros, L. and Croft, W. B. (1997). Phrasal translation and query expansion techniques for cross-language information retrieval. In Proceedings of the 20th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Philadelphia, Pennsylvania, United States, July 27 - 31, 1997). N. J. Belkin, A. D. Narasimhalu, P. Willett, and W. Hersh, Eds. SIGIR '97. ACM Press, New York, NY, 84-91. Retrieved: 10/4/05, from ACM Portal.

Ballesteros L., Croft, W.B., (1998). Resolving ambiguity for cross-language retrieval. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM Press, New York, NY, 64-71. Retrieved: 10/4/05, from ACM Portal.

Ballesteros, L., & Sanderson, M. (2003, November). Addressing the lack of direct translation resources for cross-language retrieval. In Proceedings of the twelfth international conference on Information and knowledge management (pp. 147-152). ACM.

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language resources and evaluation, 43(3), 209-226.

Bates, M. J. (2006). Fundamental forms of information. Journal of the American Society for Information Science and Technology, 57(8), 1033-1045.

Bates, M.J. (2009). Information Behavior. In Encyclopedia of Library and Information Sciences, 3rd Edition (pp. 2381-2391). New York, NY: Taylor and Francis.

Becker, K. R. (1997). Spanish/English bilingual codeswitching: A syncretic model. Bilingual Review, 22(1), 3-30

Belkin, N. J. (1978). Information concepts for information science. Journal of documentation, 34(1), 55-85.

Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information-retrieval. Canadian Journal of Information Science-Revue Canadienne Des Sciences De L Information, 5(May), 133-143.

Belkin, N. J. (2000). Helping people find what they don't know. Communications of the ACM, 43(8), 58-61.

Belkin, N. J., & Robertson, S. E. (1976). Information science and the phenomenon of information. Journal of the American Society for Information Science, 27(4), 197-204.

Belkin, N. J., Oddy, R. N., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and theory. Journal of documentation, 38(2), 61-71.


Bedore, L. M., Pena, E. D., Summers, C. L., Boerger, K. M., Resendiz, M. D., Greene, K., Bohman, T. M. & Gillam, R. B. (2012). The measure matters: Language dominance profiles across measures in Spanish–English bilingual children. Bilingualism: Language and Cognition, 15(03), 616-629.

Ben Romdhane, W., Elayeb, B., Bounhas, I., Evrard, F., & Ben Saoud, N.B. (2013). A possibilistic query translation approach for cross-language information retrieval. Lecture Notes in Computer Science, 7996, 73-82.

Bhatia, T.K. & Ritchie, W.C. (Eds.) (2013). The Handbook of Bilingualism and Multilingualism. Chichester, UK: Blackwell Publishing.

Bialystok, E., Craik, F.I.M., & Luk, G. (2013). Bilingualism: Consequences for mind and brain. Trends in Cognitive Sciences, 16(4), 240-250. DOI: 10.1016/j.tics.2012.03.001.

Bilingualism [Def. 1]. (n.d.). In OED Online. Retrieved Jun 12, 2014, from http://0-www.oed.com.library.simmons.edu/view/Entry/18968.

Bilingual [Def. 3]. (n.d.). In OED Online. Retrieved June 12, 2014, from http://0-www.oed.com.library.simmons.edu/view/Entry/18967.

Birdsong, D. (2006). Dominance, proficiency, and second language grammatical processing. Applied Psycholinguistics, 27, 46–49.

Blom, J, & Gumperz, J. (1972). Social meaning in linguistic structures: Code switching in Northern Norway. In J. Gumperz and D. Hymes (Eds). Directions in Sociolinguistics: The Ethnography of Communication, 407-434. New York, NY: Holt, Rinehart, and Winston.

Bloomfield, L. (1935). Language. London: Allen and Unwin.

Bokset, R (2006). The Long Story of Short Forms: The Evolution of Simplified Chinese Characters. Stockholm East Asian Monographs, No. 11. Stockholm: Department of Oriental Languages, Stockholm University.

Boslaugh, S. & Watters, P.A. (2008). Statistics in a Nutshell: A Desktop Quick Reference. Sebastopol, CA: O'Reilly Media, Inc.

Broglio, J., Callan, J. P., Croft, W. B. (1994). INQUERY system overview. Project TIPSTER Text Program, Phase I. Retrieved on November 27, 2005, from: http://citeseer.ist.psu.edu/broglio94inquery.html.

Buchweitz, A., & Prat, C. (2013). The bilingual brain: Flexibility and control in the human cortex. Physics of Life Reviews, 10(4), 428-443.

Buckland, M. K. (1991). Information as thing. Journal of the Association for Information Science and Technology, 42(5), 351-360.

Byström, K., & Järvelin, K. (1995). Task complexity affects information seeking and use. Information Processing & Management, 31(2), 191-213.

Caldas, S.J., & Caron-Caldas, S. (2002). A sociolinguistic analysis of the language preferences of adolescent bilinguals: Shifting allegiances and developing identities. Applied Linguistics, 23(4), 490-514.

Capurro, R., & Hjørland, B. (2003). The concept of information. Annual review of information science and technology, 37(1), 343-411.

Cartoni, B., Zufferey, S., & Meyer, T. (2013). Using the Europarl corpus for cross-linguistic research. Belgian Journal of Linguistics, 27(1), 23-42. doi:10.1075/bjl.27.02car

Cashman, H. R. (2005). Identities at play: language preference and group membership in bilingual talk in interaction. Journal of Pragmatics, 37(3), 301-315.

Chau, R. and Yeh, C., (2002). Explorative multilingual text retrieval based on fuzzy multilingual keyword classification. In Proceedings of the 5th International Workshop Information Retrieval with Asian Languages, 33-40. Retrieved 10/4/05, from ACM Portal.

Chen, J. & Bao, Y. (2009, March). Cross-language search: The case of Google Language Tools. First Monday, 14(3).

Chen, J., Ding, R., Jiang, S., & Knudson, R. (2012). A preliminary evaluation of metadata records machine translation. The Electronic Library, 30(2), 264-277.

Cheng, P., Teng, J., Chen, R., Wang, J., Lu, W., and Chien, L. (2004). Translating unknown queries with web corpora for cross-language information retrieval. In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 146-153. Retrieved: 10/17/05, from ACM Portal.

Cherciov, M. (2013). Investigating the impact of attitude on first language attrition and second language acquisition from a Dynamic Systems Theory perspective. International Journal Of Bilingualism, 17(6), 716-733. doi:10.1177/1367006912454622

Chiao, Y. C., & Zweigenbaum, P. (2002, August). Looking for candidate translational equivalents in specialized, comparable corpora. In Proceedings of the 19th international conference on Computational linguistics-Volume 2 (pp. 1-5). Association for Computational Linguistics.

Chung, W. (2008). Web searching in a multilingual world. Communications of the ACM, 51(5), 32-40.

Cimiano, P., Schultz, A., Sizov, S., Sorg, P., & Staab, S. (2009). Explicit versus latent concept models for cross-language information retrieval. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, (pp. 1513-1512).

Clough, P. & Eleta, I. (2010) Investigating language skills and field of knowledge on multilingual information access in digital libraries. International Journal of Digital Library Systems, 1(1). DOI: 10.4018/jdls.2010102705.

Cutting, D., Kupiec, J., Pedersen, J., and Sibun, P. (1992) A practical part-of-speech tagger. Proceedings of the 3rd Conference on Applied Natural Language Processing, 133-140. DOI: 10.3115/9774499.974523.


Davis, M. (1996). New experiments in cross-language text retrieval at NMSU's Computing Research Lab. In the Fifth Text Retrieval Conference (TREC-5), Gaithersburg, MD: National Institute of Standards and Technology, 1996, 447-453. Retrieved November 21, 2005, from http://www.scils.rutgers.edu/~muresan/IR/TREC/Proceedings/t5_proceedings/t5_proceedings.html.

Davis, M. (1998). On the effective use of large parallel corpora in cross-language text retrieval. In G. Grefenstette ed. Cross-Language Information Retrieval. Kluwer Academic Publisher, 11-22.

Davis, M. W. and Ogden, W. C. (1997). QUILT: implementing a large-scale cross-language text retrieval system. In Proceedings of the 20th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Philadelphia, Pennsylvania, United States, July 27 - 31, 1997). N. J. Belkin, A. D. Narasimhalu, P. Willett, and W. Hersh, Eds. SIGIR '97. ACM Press, New York, NY, 92-98. Retrieved: 10/10/05, from ACM Portal.

Demner-Fushman, D. & Oard, D.W. (2003, February) The effect of bilingual term list size on dictionary-based cross-language information retrieval. Paper presented at the Thirty-Sixth Hawaii International Conference on System Sciences (HICSS) (Hawaii, Jan 6-9, 2003).

DePalma, D.A. (July 11, 2012). Microsoft aims to be the machine translation hub of global business. Retrieved from: http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=2908&moduleId=390.

Dervin, B. (1983). An overview of sense-making research: Concepts, methods and results. Paper presented at the Annual Meeting of the International Communication Association. Dallas, TX.

Dewaele, J.M. (2007). Multilinguals' language choice for mental calculation. Intercultural Pragmatics, 4(3), 343-376. doi:10.1515/IP.2007.017.

Diekema, A. R. (2012). Multilinguality in the digital library: a review. The Electronic Library, 30(2), 165-181.

Dyvik, H. (2004). Translations as semantic mirrors: from parallel corpus to wordnet. Language and computers, 49(1), 311-326.

Ecke, P., & Hall, C. J. (2013). Tracking tip-of-the-tongue states in a multilingual speaker: Evidence of attrition or instability in lexical systems?. International Journal of Bilingualism, 17(6), 734-751.

Edmonds, L. A., & Oetting, J. (2013). Correlates and Cross-Linguistic Comparisons of Informativeness and Efficiency on Nicholas and Brookshire Discourse Stimuli in Spanish/English Bilingual Adults. Journal Of Speech, Language & Hearing Research, 56(4), 1298-1313. doi:10.1044/1092-4388(2012/12-0065)

Ellis, D. (1989). A behavioural approach to information retrieval system design. Journal of Documentation, 45(3), 171-212.

Ellis, R. (1994). The study of second language acquisition. Oxford University Press.

Erdelez, S. (1999). Information encountering: It's more than just bumping into information. Bulletin of the American Society for Information Science, 25(3). Accessed from: http://www.asis.org/Bulletin/Feb-99/erdelez.html.

Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. (2nd ed). Cambridge, MA: MIT press.

Evans, D. A., Handerson, S. K., Monarch, I. A., Pereiro, J., Delon, L., and Hersh, W. R., (1998). Mapping vocabularies using latent semantics. In G. Grefenstette ed. Cross-Language Information Retrieval. Kluwer Academic Publisher, 63-80.

Farradane, J. (1980). Knowledge, information, and information science. Journal of Information Science, 2(2), 75-80.

Fishman, J.A. (1965). Who speaks what language to whom and when? La Linguistique, 1(2), 67-88.

Franz, M., McCarley, J. S., Roukos, S. (1999). Ad Hoc and Multilingual Information Retrieval at IBM. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST. 157-168. Retrieved November 23, 2005, from http://trec.nist.gov/pubs/trec7/t7_proceedings.html.

Francis, N. (2012). Bilingual Competence and Bilingual Proficiency in Child Development. Cambridge, Mass: The MIT Press.

Fujii, A., Ishikawa, T. (2000). Applying machine translation to two-stage cross-language information retrieval. Proceedings of the 4th Conference of the Association for Machine Translation in the Americas (AMTA-2000), Oct. 2000, 13-24. Retrieved November 21, 2005, from http://arxiv.org/abs/cs.CL/0011003.

Gao, J. and Nie, J.Y. (2006). A study of statistical models for query translation: finding a good unit of translation. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, USA, 194–201.

Gao, J., Nie, J. Y., Xun, E., Zhang, J., Zhou, M., & Huang, C. (2001, September). Improving query translation for cross-language information retrieval using statistical models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 96-104). ACM.

Gao, J., Zhou, M., Nie, J. Y., He, H., & Chen, W. (2002, August). Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 183-190). ACM.

Gaspari, F. (2004). Online MT services and real users’ needs: An empirical usability evaluation. In Machine Translation: From Real Users to Research (pp. 74-85). Springer Berlin Heidelberg.

Gathercole, V. C. M., & Thomas, E. M. (2009). Bilingual first-language development: Dominant language takeover, threatened minority language take-up. Bilingualism: Language and Cognition, 12(02), 213-237.

Ge, X. (2010). Information-seeking behavior in the digital age: A multidisciplinary study of academic researchers. College & Research Libraries, 71(5), 435-455.

Genesee, F. & Bourhis, R.Y. (1988). Evaluative reactions to language choice strategies: The role of sociostructural factors. Language & Communication, 8(3/4), 229-250.

Georgalidou, M., Kaili, H., & Celtek, A. (2010). Code alternation patterns in bilingual family conversation: A conversation analysis approach. Journal of Greek Linguistics, 10(2), 317-344. doi:10.1163/156658410X531401

Gertken, L. M., Amengual, M., & Birdsong, D. (2014). Assessing language dominance with the Bilingual Language Profile. Measuring L2 proficiency: Perspectives from SLA, 208-225.

Gey, F. C., Kando, N., & Peters, C. (2005). Cross-language information retrieval: The way ahead. Information Processing and Management, 41, 415-431. DOI: 10.1016/j.ipm.2004.06.006.

Gollins, T. and Sanderson, M. (2001). Improving cross language retrieval with triangulated translation. In Proceedings of the 24th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM Press, New York, NY, 90-95. Retrieved: 11/10/05, from ACM Portal.

Gong, Y., Chow, I., & Ahlstrom, D. (2011). Cultural diversity in China: Dialect, job embeddedness, and turnover. Asia Pacific Journal of Management, 28(2), 221. doi:10.1007/s10490-010-9232-6

Goodwin, C. & Heritage, J. (1990). Conversation Analysis. Annual Review of Anthropology, 19, 283-307.

Google (2013, December 10). Google Translate - now in 80 languages. [Blog post]. Retrieved from http://googletranslate.blogspot.com/2013/12/google-translate-now-in-80-languages.html.

Gray, N. J., Klein, J. D., Noyce, P. R., Sesselberg, T. S., & Cantrill, J. A. (2005). Health information-seeking behaviour in adolescence: the place of the internet. Social Science & Medicine, 60(7), 1467-1478.

Greene, K.J., Pena, E.D., & Bedore, L.M. (2012). Lexical choice and language selection in bilingual preschoolers. Child Language Teaching and Therapy, 29(1), 27-39.

Grosjean, F. (1982). Life with two languages: An introduction to bilingualism. Harvard University Press.

Grosjean, F. (1998). Studying bilinguals: Methodological and conceptual issues. Bilingualism: Language and cognition, 1(02), 131-149.

Grosjean, F. (2008). Studying bilinguals. Oxford University Press.

Grosjean, F. (2012). Bilingual: Life and Reality. Cambridge, MA: Harvard University Press.

Hakuta, K., & d'Andrea, D. (1992). Some properties of bilingual maintenance and loss in Mexican background high-school students. Applied Linguistics, 13(1), 72-99.

Hamers, J. F., & Blanc, M. H. (2000). Bilinguality and Bilingualism. Cambridge University Press.

He, D., Oard, D. W., & Plettenberg, L. (2006). Studying the use of interactive multilingual information retrieval. In Proceedings of the Workshop on New Directions in Multilingual Information Access, pp. 53-60. ACM-SIGIR 2006, Seattle, Washington, USA.

He, D., Wang, J., Oard, D. W., & Nossal, M. (2002, September). Comparing user-assisted and automatic query translation. In Workshop of the Cross-Language Evaluation Forum for European Languages (pp. 400-415). Springer Berlin Heidelberg.

He, D. & Wu, D. (2008). Translation enhancement: a new relevance feedback method for cross-language information retrieval. In Proceedings of the 17th ACM conference on Information and knowledge management (CIKM '08). ACM, New York, NY, USA, 729-738. doi:10.1145/1458082.1458180

Heller, M. (1992). The politics of codeswitching and language choice. Journal of Multilingual & Multicultural Development, 13(1-2), 123-142.

Hansen, P., Liu, C., & Zhang, P. (2016). I Need More Time!: The Influence of Native Language on Search Behavior and Experience. CLEF.

Herbert, B., Szarvas, G., & Gurevych, I. (2011). Combining query translation techniques to improve cross-language information retrieval. In Advances in Information Retrieval (pp. 712-715). Berlin, Germany: Springer Berlin Heidelberg.

Hiemstra, D. and de Jong, F., (1999). Disambiguation strategies for cross-language information retrieval. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, 274-293. Retrieved November 23, 2005, from http://citeseer.ist.psu.edu/hiemstra99disambiguation.html.

Hiemstra, D. and Kraaij, W. (1999). Twenty-One at TREC-7: Ad-hoc and cross-language track. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST. Retrieved November 27, 2005, from http://citeseer.ist.psu.edu/82362.html

Hiemstra, D., Kraaij, W., Pohlmann, R., and Westerveld, T. (2000). Twenty-One at CLEF-2000: Translation resources, merging strategies and relevance feedback. In Working Notes for CLEF Workshop. Retrieved November 27, from http://clef.isti.cnr.it/DELOS/CLEF/Notes.html.

Hong, W. (2011). A descriptive user study of bilingual information seekers searching for online information to complete four tasks. (Unpublished doctoral dissertation). University of Pittsburgh. Pittsburgh, PA.

Hopp, H., & Schmid, M. S. (2013). Perceived foreign accent in first language attrition and second language acquisition: The impact of age of acquisition and bilingualism. Applied Psycholinguistics, 34(02), 417-417.

Hua, Z. (2008). Duelling languages, duelling values: Codeswitching in bilingual intergenerational conflict talk in diasporic families. Journal of Pragmatics, 40, 1799-1816.

Hughes, H. (2005). Actions and reactions: Exploring international students' use of online information resources. Australian Academic & Research Libraries, 36(4), 169-179.

Hull, D. A. and Grefenstette, G. (1996). Querying across languages: a dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Zurich, Switzerland, August 18 - 22, 1996). SIGIR '96. ACM Press, New York, NY, 49-57. Retrieved: 10/4/05, from ACM Portal.

Hupfer, M. E., & Detlor, B. (2006). Gender and Web information seeking: A self‐concept orientation model. Journal of the American Society for Information Science and Technology, 57(8), 1105-1115.

Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: elements of a cognitive IR theory. Journal of Documentation, 52(1), 3-50.

Ianos, M. A., Huguet, À., Janés, J., & Lapresta, C. (2017). Can language attitudes be improved? A longitudinal study of immigrant students in Catalonia (Spain). International Journal of Bilingual Education and Bilingualism, 20(3), 331-345.

Ide, N., Erjavec, T., & Tufis, D. (2002, July). Sense discrimination with parallel corpora. In Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, vol. 8 (pp. 61-66). Association for Computational Linguistics.

Inside Google Translate (n.d.). Retrieved from http://translate.google.com/about/intl/en_ALL/.

Jansen, B.J., Booth, D.L., & Spink, A. (2008). Determining the informational, navigational, and transactional intent of Web queries. Information Processing and Management, 44, 1251-1266.

Jansen, B. & Spink, A. (2006). How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing and Management, 42(1), 248-263.

Johansson, S. (2007). On the role of corpora in cross-linguistic research. In S. Johansson (ed.), Seeing through multilingual corpora, 3–24. Amsterdam/Philadelphia: John Benjamins.

Kaushanskaya, M., Gross, M., & Buac, M. (2014). Effects of classroom bilingualism on task‐shifting, verbal memory, and word learning in children. Developmental science, 17(4), 564-583.

Kasatkina, N. (2010). Analyzing language choice among Russian-speaking immigrants to the United States. (Doctoral dissertation). Retrieved from The University of Arizona Campus Repository. (http://hdl.handle.net/10150/193622)

Keegan, T. T., & Cunningham, S. J. (2008). What a difference a default setting makes. In Research and Advanced Technology for Digital Libraries (pp. 264-267). Springer Berlin Heidelberg.

Kelly, D. (2006). Measuring online information seeking context. Part 1: Background and method. Journal of the American Society for Information Science and Technology, 57(13), 1729-1739.

Kimura, F., Maeda, A., Uemura, S. (2004). CLIR using Web directory at NTCIR4. In Working Notes of the Fourth NTCIR Workshop Meeting (Tokyo, Japan, June 2-4, 2004). Retrieved November 27, 2005, from http://research.nii.ac.jp/ntcir-ws4/NTCIR4-WN/CLIR/NTCIR4WN-CLIR-KimuraF.pdf.

Kishida, K. (2005). Technical issues of cross-language information retrieval: A review. Information Processing & Management: an International Journal, 41(3), 433-455.

Kishida, K. & Ishita, E. (2009). Translation disambiguation for cross-language information retrieval using context-based translation probability. Journal of Information Science, 35(4), 481-495.

Kishida, K. and Kando, N. (2005). Hybrid approach of query and document translation with pivot for cross-language information retrieval. In Working Notes for the CLEF 2005 Workshop (Vienna, Austria, September 21-23, 2005). Retrieved November 27, 2005, from http://www.clef-campaign.org/2005/working_notes/CLEF2005WN-Contents1.htm.

Klavans, J., Hovy, E., Fluhr, C., Frederking, R., Oard, D., Okumura, A., Ishikawa, K., & Satoh, K. (2001). Multilingual (or cross-lingual) information retrieval. In E. Hovy, N. Ide, R. Frederking, J. Mariani, & A. Zampolli (Eds.) Multilingual Information Management: Current Levels and Future Abilities (pp. 35-56). http://www.cs.cmu.edu/~ref/mlim/chapter2.html.

Klein, D., Mok, K., Chen, J. K., & Watkins, K. E. (2013). Age of language learning shapes brain structure: A cortical thickness study of bilingual and monolingual individuals. Brain and language. DOI: 10.1016/j.bandl.2013.05.014.

Knight, C., & Studdert-Kennedy, M. (2000). The evolutionary emergence of language: social function and the origins of linguistic form. Cambridge University Press.

Kraaij, W. (2001). Comparing translation resources. In Proceedings of the CLEF-2001 Workshop. Retrieved November 22 2005, from http://citeseer.ist.psu.edu/kraaij01tno.html.

Kraaij, W., Nie, J., & Simard, M. (2003). Embedding Web-based statistical translation models in cross-language information retrieval. Computational Linguistics, 29(3), 381-419.

Kralisch, A. (2005). The impact of culture and language on the use of the Internet: Empirical analysis of behaviour and attitudes. (Doctoral dissertation). Retrieved from http://edoc.hu-berlin.de/dissertationen/kralisch-anett-2005-12-16/PDF/kralisch.pdf.

Kralisch, A., & Berendt, B. (2005). Language-sensitive search behaviour and the role of domain knowledge. New Review of Hypermedia and Multimedia, 11(2), 221-246.

Kuhlthau, C.C. (1991). Inside the search process: Information seeking from the user's perspective. Journal of the American Society for Information Science, 42(5), 361-371.

Kuhlthau, C.C. (1993). A principle of uncertainty for information seeking. Journal of Documentation, 49(4), 339–355.

Kuhlthau, C.C. (2005). Kuhlthau's information search process. In K.E. Fisher, S. Erdelez, & L.E.F. McKechnie (Eds), Theories of Information Behavior, 230-234.

Larkey, L. S. and Connell, M. E. (2005). Structured queries, language modeling, and relevance modeling in cross-language information retrieval. Information Processing and Management: an International Journal, vol. 41, no. 3, 457-473. Retrieved 11/4/05, from Elsevier.

Larson, R. R., Gey, F., and Chen, A. (2002). Harvesting translingual vocabulary mappings for multilingual digital libraries. In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (Portland, Oregon, USA, July 14 - 18, 2002). JCDL '02. ACM Press, New York, NY, 185-190.

Lavrenko, V., Choquette, M., and Croft, W. B. (2002). Cross-lingual relevance models. In Proceedings of the 25th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Tampere, Finland, August 11 - 15, 2002). SIGIR '02. ACM Press, New York, NY, 175-182. Retrieved: 11/4/05, from ACM Portal.

Lehtokangas, R., Airio, E., and Järvelin, K. (2004). Transitive dictionary translation challenges direct dictionary translation in CLIR. Information Processing and Management: an International Journal, vol. 40, no. 6, 973-988. Retrieved 10/17/05, from Elsevier.

Levow, G., Oard, D., & Resnik, P. (2005). Dictionary-based techniques for cross-language information retrieval. Information Processing & Management, 41, 523-547.

Li, H. (2008). The role of L1 use in L2 writing process by Chinese EFL students: Six cases of non-English majors. Journal of Cambridge Studies, 3(2), 25-28.

Li, Y., & Belkin, N. J. (2010). An exploration of the relationships between work task and interactive information search behavior. Journal of the American Society for information Science and Technology, 61(9), 1771-1789.

Lim, V. P., Liow, S. J. R., Lincoln, M., Chan, Y. H., & Onslow, M. (2008). Determining language dominance in English–Mandarin bilinguals: Development of a self-report classification tool for clinical use. Applied Psycholinguistics, 29(03), 389-412.

Littman, M.L., Dumais, S.T. and Landauer, T.K, (1998). Automatic Cross-language Information Retrieval using Latent Semantic Indexing. In Grefenstette, G. ed. Cross Language Information Retrieval, Kluwer Academic Publishers, 51-62.

Longman Dictionary of Language Teaching and Applied Linguistics (4th ed.). (2010). Great Britain: Pearson Education Limited.

Lu, W., Chien, L., and Lee, H. (2004). Anchor text mining for translation of Web queries: A transitive translation approach. ACM Transactions on Information Systems, vol. 22, no. 2, 242-269. Retrieved at: 10/14/05, from ACM Portal.

Macnamara, J. (1967). The bilingual's linguistic performance—a psychological overview. Journal of Social Issues, 23(2), 58-77.

MacSwan, J. (2012). Code-Switching and Grammatical Theory. In T.K. Bhatia & W.C. Ritchie (Eds). The Handbook of Bilingualism and Multilingualism (pp. 323-350). Chichester, UK: Blackwell Publishing.

Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing Language Profiles in Bilinguals and Multilinguals. Journal of Speech Language and Hearing Research, 50(4), 940-967. doi:10.1044/1092-4388(2007/067)

Marlow, J., Clough, P., Recuero, J. C., & Artiles, J. (2008). Exploring the Effects of Language Skills on multilingual Web search. Proceedings of the IR research, 30th European conference on Advances in information retrieval (ECIR'08) (pp. 126-137). Berlin, Heidelberg: Springer-Verlag.


McCarley, J. S. (1999). Should we translate the documents or the queries in cross-language information retrieval? In Proceedings of the 37th Annual Meeting of the Association For Computational Linguistics on Computational Linguistics (College Park, Maryland, June 20 - 26, 1999). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 208-214. Retrieved November 10, 2005, from http://acl.ldc.upenn.edu//P/P99/P99-1027.pdf.

McNamee, P. and Mayfield, J. (2002). Comparing cross-language query expansion techniques by degrading translation resources. In Proceedings of the 25th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Tampere, Finland, August 11 - 15, 2002). SIGIR '02. ACM Press, New York, NY, 159-166. Retrieved 11/22/05, from ACM Portal.

McNamee, P. and Mayfield, J. (2004). Cross-language retrieval using HAIRCUT for CLEF 2004. In Working Notes for the CLEF 2004 Workshop (Bath, United Kingdom, September, 2004). CLEF-2004. Retrieved November 23, 2005, from http://www.clef-campaign.org/2004/working_notes/WorkingNotes2004/04.pdf.

Meuter, R. F., & Allport, A. (1999). Bilingual language switching in naming: Asymmetrical costs of language selection. Journal of Memory and Language, 40(1), 25-40.

Mills, J. (2001). Being bilingual: Perspectives of third generation Asian children on language, culture and identity. International Journal of Bilingual Education and Bilingualism, 4(6), 383-402.

Moore, D. S., McCabe, G. P., & Craig, B. A. (2009). Introduction to the practice of statistics. New York: WH Freeman.

Morel, E., Bucher, C., Pekarek-Doehler, S., & Siebenhaar, B. (2012). SMS communication as plurilingual communication: Hybrid language use as a challenge for classical code-switching categories. Lingvisticae Investigationes, 35(2), 260-288.

Moschkovich, J. (2005). Using two languages when learning mathematics. Educational Studies in Mathematics, 64, 121-144. DOI: 10.1007/s10649-005-9005-1.

Most common languages used on the internet as of June 2016, by share of internet users (June, 2016). Retrieved from: https://www.statista.com/statistics/262946/share-of-the-most-common-languages-on-the-internet/.

Myers-Scotton, C. (1998). A theoretical introduction to the markedness model. Codes and consequences: Choosing linguistic varieties, 18-38.

Ng, H. T., Wang, B., & Chan, Y. S. (2003, July). Exploiting parallel texts for word sense disambiguation: An empirical study. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol 1 (pp. 455-462). Association for Computational Linguistics.

Nie, J. Y. (2010). Cross-language information retrieval. Synthesis Lectures on Human Language Technologies, 3(1), 1-125.

Nie, J. Y., and Cai, J. (2001). Filtering noisy parallel corpora of Web pages. In IEEE Symposium on Natural Language Processing and Knowledge Engineering (Tucson, AZ, October), 453-458.

Nie, J. Y., Simard, M., Isabelle, P., and Durand, R. (1999). Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Berkeley, California, United States, August 15 - 19, 1999). SIGIR '99. ACM Press, New York, NY, 74-81. Retrieved: 10/17/05, from ACM Portal.

Nilep, C. (2006). Code switching in sociocultural linguistics. Colorado Research in Linguistics, 19(1), 1-22.

Nzomo, P., Rubin, V. L., & Ajiferuke, I. (2012, February). Multi-lingual information access tools: user survey. In Proceedings of the 2012 iConference (pp. 530-532). ACM.

Qahfarokhi, A.S. & Biria, R. (2012). The impact of task difficulty and language proficiency on Iranian EFL learner's code-switching in writing. Theory and Practice in Language Studies, 2(3), 572-578.

Oard, D. W. (1998). A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval. In Proceedings of the Third Conference of the Association For Machine Translation in the Americas on Machine Translation and the information Soup (October 28 - 31, 1998). D. Farwell, L. Gerber, and E. H. Hovy, Eds. Lecture Notes in Computer Science, vol. 1529. Springer-Verlag, London, 472-483. Retrieved November 23, 2005, from http://citeseer.ist.psu.edu/oard98comparative.html.

Oard, D. W. (2009). Multilingual information access. In Bates, M. J., and Maack, M. N. (Eds), Encyclopedia of Library and Information Sciences, 3rd Ed., Taylor and Francis.

Och, F. (2007, May 23). Search without boundaries. [Blog post]. Retrieved from: http://googleblog.blogspot.com/2007/05/search-without-boundaries.html.

Pecina, P., Dušek, O., Goeuriot, L., Hajič, J., Hlaváčová, J., Jones, G. J., ... & Popel, M. (2014). Adaptation of machine translation for multilingual information retrieval in the medical domain. Artificial intelligence in medicine, 61(3), 165-185.

Peinado, V., Artiles, J., Gonzalo, J., Barker, E., & López-Ostenero, F. (2008, September). FlickLing: a multilingual search interface for Flickr. In Working Notes for the CLEF 2008 Workshop, Aarhus, Denmark.

Peinado, V., Rodrigo, Á., & López-Ostenero, F. (2013). Multilingual Information Access. Emerging Applications of Natural Language Processing: Concepts and New Research, 203. DOI: 10.4018/978-1-4666-2169-5.ch009

Peters, C., Braschler, M., & Clough, P. (2012). Multilingual Information Retrieval: From Research to Practice. Heidelberg, Germany: Springer.

Peters, C., Clough, P., Gey, F. C., Karlgren, J., & Magnini, B. (Eds.). (2007). Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006, Revised Selected Papers. Springer.

Peters, C., & Sheridan, P. (2001). Multilingual information access. In Lectures on Information Retrieval (pp. 51-80). Springer Berlin Heidelberg.

Petrelli, D., Beaulieu, M., Sanderson, M., Demetriou, G., Herring, P., & Hansen, P. (2004). Observing users, designing clarity: A case study on the user-centered design of a cross-language information retrieval system. Journal of the American Society for Information Science and Technology, 55(10), 923-934.

Picchi, E., Peters, C. (1998). Cross-language information retrieval: A system for comparable corpus querying. In G. Grefenstette ed. Cross-Language Information Retrieval. Kluwer Academic Publishers, Norwell, MA, 81-92.

Pirkola, A. (1998). The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM Press, New York, NY, 55-63. Retrieved: 10/4/05, from ACM Portal.

Pirkola, A., Hedlund, T., Keskustalo, H., and Järvelin, K. (2001). Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings. Information Retrieval 4, 3-4 (Sep. 2001), 209-230. Retrieved November 21, 2005, from http://www.info.uta.fi/tutkimus/fire/archive/dictionary_based.pdf.

Pirkola, A., Cosijn, E., Bothma, T., Nel, J. (2002). Cross-lingual information access in indigenous languages: a case study in Zulu language. In Emerging frameworks and methods, Proceedings of the Fourth International Conference on Conceptions of Library and Information Science, CoLIS4, Seattle, USA, 21 - 25 July 2002. Retrieved November 29, 2005, from http://ucdata.berkeley.edu:7101/sigir-2002/sigir2002CLIR-10-pirkola.pdf.

Ponte, J. M. and Croft, W. B. (1998). A language modeling approach to information retrieval. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM Press, New York, NY, 275-281. Retrieved 11/10/05, from ACM Portal.

Potthast, M., Stein, B., & Anderka, M. (2008). A Wikipedia-based multilingual retrieval model. Lecture Notes in Computer Science, 4956, 522-530.

Prochasson, E. & Fung, P. (2011). Rare word translation extraction from aligned comparable documents. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 1327-1335.

Purwarianti, A., Tsuchiya, M., & Nakagawa, S. (2007). Indonesian-Japanese transitive translation using English for CLIR. Information and Media Technologies, 2(2), 612-640.

Qi, D. S. (1998). An inquiry into language-switching in second language composing processes. Canadian Modern Language Review/La Revue Canadienne des Langues Vivantes, 54(3), 413-435.

Ramirez, J.M.P. (2012). Language switching: A qualitative clinical study of four second language learners' composing processes. (Unpublished doctoral dissertation). University of Iowa.

Rao, V. S. and Varma, V. (2010) User behavior in a multilingual information access task. Centre for Search and Information Extraction Lab, International Institute of Information Technology, Report No: IIIT/TR/2010/30. Hyderabad: India.

Reid, S. A., & Wood, V. V. (2013). An Empirical Examination of the Relationship between Bilingual Acculturation, Cultural Heritage to Identity, and Self-Esteem. National Social Science Journal, 40(2), 94-99.

Resnik, P. and Smith, N. A. (2003). The Web as a parallel corpus. Computational Linguistics, vol. 29, no. 3, 349-380. Retrieved October 14, 2005, from http://nlp.cs.jhu.edu/~nasmith/webascorpus.pdf.

Rezaei, S. H. S., & Gheitanchian, M. (2008, December). Code Mixing or Code Switching? A case study: Native Speakers of Turkish in Farsi Production. In Global Practices of Language Teaching: Proceedings of the 2008 International Online Language Conference (IOLC 2008) (p. 61-67). Universal-Publishers.

Rieh, H. Y., & Rieh, S. Y. (2005). Web searching across languages: Preference and behavior of bilingual academic users in Korea. Library & Information Science Research, 27(2), 249-263.

Rieh, S. Y. (2004). On the Web at home: Information seeking and Web searching in the home environment. Journal of the American Society for Information Science and Technology, 55(8), 743-753.

Rogati, M. and Yang, Y. (2004). Resource selection for domain-specific cross-lingual IR. In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 154-161. Retrieved on 10/17/05, from ACM Portal.

Ruiz, M. E., & Chin, P. (2010). Users’ image seeking behavior in a multilingual tag environment. In Multilingual Information Access Evaluation II. Multimedia Experiments (pp. 37-44). Springer Berlin Heidelberg.

Russell, D.M. (April 23, 2013). When to use "Translated foreign pages"? Retrieved from: http://searchresearch1.blogspot.com/2013/04/ramong-writes-in-with-great-question-i.html.

Russell, D. M., & Grimes, C. (2007, January). Assigned tasks are not the same as self-chosen Web search tasks. In System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference (pp. 83-83). IEEE.

Sadat, F. (2010). Using comparable corpora to improve the effectiveness of cross-language information retrieval. In H. Loftsson, E. Rognvaldsson, & S. Helgadottir (Eds). 7th International Conference on NLP, Paper presented at IceTAL 2010, Reykjavik, Iceland, August 16-18, 2010.

Sabahat, P. (2013). A study on reasons for code-switching in Facebook by Pakistani Urdu English bilinguals. Language in India, 13(11), 564-590.

Saracevic, T. (1997). The stratified model of information retrieval interaction: Extension and applications. Proceedings of the American Society for Information Science, 34, 313-327.

Saralegi, X. & de Lacalle, M.L. (2010). Estimating translation probabilities from the Web for structured queries on CLIR. In C. Gurrin, Y. He, G. Kazai, U. Kruschwitz, S. Little, T. Roelleke, S. Ruger, and K. van Rijsbergen (Eds) Advances in Information Retrieval: 32nd European Conference on IR Research, 586-589, presented at ECIR 2010, Milton Keynes, UK, March 28-31, 2010.

Saville-Troike, M. (2008). The Ethnography of Communication: An Introduction (3rd Ed). Oxford, UK: John Wiley & Sons.

Savolainen, R. (1995). Everyday life information seeking: Approaching information seeking in the context of “way of life”. Library & Information Science Research, 17(3), 259-294.

Savoy, J. and Dolamic, L. (2009) How effective is Google's translation service in search? Communications of the ACM, 52(10), 139-145.

Schäfer, R., & Bildhauer, F. (2012). Building large corpora from the Web using a new efficient tool chain. In LREC (pp. 486-493).

Schmid, M. S. (2010). Languages at play: The relevance of L1 attrition to the study of bilingualism. Bilingualism: Language and Cognition, 13(1), 1-7.

Schwartz, B. (2013, May 23). Google Drops “Translated Foreign Pages” Search Option Due to Lack of Use. Search Engine Land.

Scotton, C. M., & Ury, W. (1977). Bilingual strategies: The social functions of code-switching. International Journal of the Sociology of Language, 1977(13), 5-20.

Shakery, A., & Zhai, C. (2013). Leveraging comparable corpora for cross-lingual information retrieval in resource-lean language pairs. Information Retrieval, 16(1), 1-29.

Shannon, C. (1948). A Mathematical Theory of Communication. Bell System Technical Journal 27, 379-423, 623-656.

Sheridan, P. and Ballerini, J. P. (1996). Experiments in multilingual information retrieval using the SPIDER system. In Proceedings of the 19th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Zurich, Switzerland, August 18 - 22, 1996). SIGIR '96. ACM Press, New York, NY, 58-65. Retrieved on 10/4/05, from ACM Portal.

Smith-Christmas, Cassie (2012) I've lost it here dè a bh' agam: Language shift, maintenance, and code-switching in a bilingual family. (Unpublished doctoral dissertation). Retrieved from Glasgow Theses Service. (glathesis:2012-3798)


Soliman, A. (2008). The changing role of Arabic in religious discourse: A sociolinguistic study of Egyptian Arabic. (Unpublished doctoral dissertation). Indiana University of Pennsylvania, PA.

Somers, H. (1999). Review article: Example-based machine translation. Machine Translation, 14(2), 113-157.

Sperlich, W. B. (2005). Will Cyberforums Save Endangered Languages? A Niuean Case Study. International Journal Of The Sociology Of Language, 2005(172), 51-77.

Srinivasarao, V. (2010) Mining the behaviour of users in a multilingual information access task. In: CLEF 2008 Workshop Notes, Aarhus, Denmark, September 17-19 (2008).

Sterling, G. (2007, May 24). Google Launches ‘Cross-Language Information Retrieval (CLIR)’. Retrieved May 29, 2013, from Search Engine Land: http://searchengineland.com/google-launches-cross-language-information-retrieval-clir-11296

Suarez, D. (2002). The paradox of linguistic hegemony and the maintenance of Spanish as a heritage language in the United States. Journal of Multilingual and Multicultural Development, 23(6), 512-530.

Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & research libraries, 29(3), 178-194.

Tomala, A. M. (2016). The Taiwanese linguistic mosaic. Multilingual, 27(7), 39-43.

Ture, F., Lin, J., & Oard, D.W. (2012). Combining statistical translation techniques for cross-language information retrieval. Proceedings of COLING 2012: Technical Papers (Mumbai, December 2012), 2685-2702.

Van Heuven, W. J., & Dijkstra, T. (2010). Language comprehension in the bilingual brain: fMRI and ERP support for psycholinguistic models. Brain research reviews, 64(1), 104-122.

Wang, L. (2003). Switching to first language among writers with differing second-language proficiency. Journal of Second Language Writing, 12(4), 347-375. DOI: 10.1016/j.jslw.2003.08.003.

White, R. W. & Drucker, S.M., (2007). Investigating behavioral variability in Web search. Paper presented at WWW2007. Banff, Alberta, Canada. May 8–12, 2007 (pp. 21-30).

Wilson, T. D. (1997). Information behaviour: an interdisciplinary perspective. Information Processing & Management, 33(4), 551-572.

Wilson, T.D. (2005). Evolution in information behavior modeling: Wilson's model. In K.E. Fisher, S. Erdelez, & L.E.F. McKechnie (Eds), Theories of Information Behavior, 31-36.

Wilson, T.D., Ford, N., Ellis, D., & Foster, A. (2002). Information seeking and mediated searching: Part 2. Uncertainty and its correlates. Journal of the American Society for Information Science and Technology, 53(9), 704-715.

Woodall, B.R. (2002) Language-switching: Using the first language when writing in a second language. Journal of Second Language Writing 11 (1), 7-28

World Internet Users Statistics and 2016 World Population Stats. (2016, June 30). Retrieved February 02, 2017, from http://www.internetworldstats.com/stats.htm

Wu, D., He, D., & Luo, B. (2012). Multilingual needs and expectations in digital libraries: A survey of academic users with different languages. The Electronic Library, 30(2), 182-197.

Xu, J. and Croft, W. B. (2000). Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems. 18, 1 (Jan. 2000), 79-112. Retrieved: 11/4/05, from ACM Portal.

Xu, J., and Weischedel, R. (2000). Cross-lingual information retrieval using Hidden Markov models. In Proceeding of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong, October 7-8, (2000). Retrieved November 23, 2005, from http://acl.ldc.upenn.edu/W/W00/W00-1312.pdf.

Xu, J. and Weischedel, R. (2005). Empirical studies on the impact of lexical resources on CLIR performance. Information Processing and Management, vol. 41, no. 3, 475-487. Retrieved 11/17/05, from Elsevier.

Xu, J., Weischedel, R., and Nguyen, C. (2001). Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and Development in Information Retrieval, 105-110. Retrieved: 10/17/05, from ACM Portal.

Ye, Z., Huang, J. X., He, B., & Lin, H. (2012). Mining a multilingual association dictionary from Wikipedia for cross‐language information retrieval. Journal of the American Society for Information Science and Technology, 63(12), 2474-2487.

Yin, R. K. (2014). Case study research: Design and methods. Sage publications.

Yip, V., & Matthews, S. (2006). Assessing language dominance in bilingual acquisition: A case for mean length utterance differentials. Language Assessment Quarterly: An International Journal, 3(2), 97-116.

Yuexiao, Z. (1988). Definitions and sciences of information. Information Processing & Management, 24(4), 479-491.

Zarei, G.R. & Amiryousefi, M. (2011). A study of L2 composing task: An analysis of conceptual and linguistic activities and text quality. Procedia – Social and Behavioral Sciences, 30, 437-441.

Zhang, Y. and Vines, P. (2004). Using the web for automated translation extraction in cross-language information retrieval. In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 162-169.

Zhou, D., Truran, M., Brailsford, T., & Ashman, H. (2008). A hybrid technique for English-Chinese cross language information retrieval. ACM Transactions on Asian Language Information Processing, 7(2).

Zhou, D., Truran, M., Brailsford, T., Wade, V., & Ashman, H. (2012). Translation techniques in cross-language information retrieval. ACM Computing Surveys (CSUR), 45(1), 1-51.


Appendix I. Participant Recruitment Letter

Hello,

My name is Peishan Bartley, and I am a PhD student in the Graduate School of Library and Information Science at Simmons College. I would like to invite you to participate in my research study about the language preference of Chinese-English bilingual speakers when they search for information online. Participants in the study will be asked to choose between Chinese and English versions of articles and to answer a few interview questions afterward. Participants will remain anonymous. No personal identification information will be collected for this study. The study will take 10 to 15 minutes. If you use both Chinese and English every day and would like to participate in the study, please contact me at: [email protected].

Thank you for your consideration.

Sincerely,
Peishan Tsai Bartley

您好, 我是 Simmons College 图书资讯科学系的博士生. 在此邀请您填写一份研究问卷. 研究题材是中英双语使用者在使用网路时如何在中英文之间做选择. 我一直对自己如何在双语之间做取舍十分好奇. 您呢? 本问卷应该十至十五分钟之内即可完成. 您能提供的任何帮助我都由衷的感谢. 若您知道有其他会有兴趣参与研究的中英双语使用者, 请将邀请他们也填写问卷. 谢谢您的帮忙. 问卷网址: http://web.simmons.edu/~tsai.

蔡佩珊 (Peishan Tsai Bartley)
Simmons College School of Library and Information Science



Appendix II. Informed Consent Form

Informed Consent Form English Version

Chinese-English Bilingual Speaker Language Preference Study

Before the study begins, please read the following overview of the study and what is required of you. Please read it carefully. If you are 18 years old or older and agree to continue, please provide your name at the bottom of the page. Your name will only be used as a digital signature to signify your agreement to participate in the study. It will not be used in any part of the study. If you have any questions, please contact Peishan Bartley at [email protected]. If you have questions about your rights as a research subject, please contact Valerie Beaudrault, Human Protections Administrator in the Office of Sponsored Programs of Simmons College, at 617-521-2415. Thank you!

Purpose

The purpose of this study is to explore how and why users select between Chinese and English when they are looking for information on the World Wide Web.

Procedure

The study contains three sections:

1. Demography and language background survey: a survey that collects information on your language history and use.

2. Article language selection exercise: eight articles will be presented to you one at a time in both English and Chinese. You will be asked to select the language version that you prefer.

3. Language preference and selection process survey: a questionnaire that asks about your reasoning and motivation during the article selection exercise.

The study should not take more than 20 minutes to complete.

The objective of this study is to collect your views and responses. There are no right or wrong answers. Please answer truthfully and respond intuitively. No judgments are made on your answers.

Confidentiality

Your participation is voluntary. If any of the questions make you uncomfortable, you can withdraw from the study at any time.

Furthermore, your participation in this research is confidential. Every precaution will be taken to protect your privacy and the confidentiality of the records and data pertaining to you. In the event of a publication or presentation resulting from the research, no personally identifiable information will be shared. Any recordings made during the study are accessible to the researcher only. Once the study is complete, the recordings will be destroyed.

If you have any questions during the study or experience any problems with the survey, please don't hesitate to contact the researcher: Peishan Bartley at [email protected].

Clicking on the continue button signifies that you have read and agreed to the above statements:

continue to the study


Informed Consent Form Chinese Version

中英雙語使用者語文選擇研究

謝謝您的參與. 在開始之前, 請仔細閱讀本研究大綱和對參與者的要求. 若您已年滿 18 歲並願意繼續進行, 請在本頁下方提供您的姓名. 您的名字僅代表您閱讀了本文並同意參與, 不會被包含於研究內容之內. 任何可辨識您身分之紀錄與您個人隱私資料皆被視為機密處理, 不會公開. 若您有任何疑問, 請聯絡蔡佩珊 (Peishan Bartley): [email protected]. 若您對身為參與者的權利有任何疑問, 請聯絡 Simmons College, Human Protections Administrator in the Office of Sponsored Programs, Valerie Beaudrault (617-521-2415). 謝謝.

目的

本研究目的在探索中英雙語使用者在網路上如何選用語言, 在選擇時有哪些考量.

程序

本研究有三部分:

1. 使用者語文背景調查: 基本語文學習及使用習慣.

2. 中英雙語對照文章選擇: 八篇文章將一一以中英對照方式呈現. 請您在中文和英文版本中選擇偏好閱讀的版本.

3. 語文選擇考量調查: 此部分將問您在文章選擇時的考量和原因.

整個過程可在二十分鐘內完成.

本研究用意在於收集您的意見及想法, 沒有所謂正確答案, 因此請您忠實的依直覺反應回答問題. 您的答案和其他參與者的答案會被綜合起來一併分析, 不會被單獨評斷.

保密原則

您對本研究是自願參與. 在問卷過程中若有任何疑慮, 您可以隨時退出.

如上文所列, 您的參與及資料皆會被視為機密處理. 所有的原始資料將被審慎保管. 在本研究論文完成後, 所有收集的資訊將被銷毀.

若問卷過程中有任何疑問或發生甚麼問題, 請聯絡研究生蔡佩珊 [email protected].

若您閱讀上列相關資訊, 經過考量後同意參與本研究, 請按鍵:

繼續前往問卷調查




Appendix III. Demographic and Language Skill Questionnaire

English version

Note: The two languages listed here correspond to the languages the user entered in question 4.






Chinese Version

Note: The two languages listed here correspond to the languages the user entered in question 4.





Appendix IV. User Study Article Selection Samples



Appendix V. Interview Script

1. Was it difficult or was it easy to select between the two language versions? Why?

在中文與英文中做選擇, 對您來說很困難還是很簡單? 為什麼?

2. When you looked at the articles in general, is there a language version you prefer? Why?

一般說來, 當您在看這些文章時, 您比較偏好中文版或英文版? 為什麼?

3. Are there any articles that are especially interesting to you? Which ones and why?

回顧剛才文章的題材, 有哪些是您特別感興趣的?

4. Reviewing the results, there are articles for which you chose the Chinese version over the English version. Please explain why.

回顧您剛才做的選擇, 這些文章您選擇閱讀中文版本. 請您解釋一下您在兩語言之間做選擇時的考量?

5. Reviewing the results, there are articles for which you chose the English version over the Chinese version. Please explain why.

回顧您剛才做的選擇, 這些文章您選擇閱讀英文版本. 請您解釋一下您在兩語言之間做選擇時的考量?

6. Are there other thoughts or considerations on language selection that you care to share with this researcher?

請問您對語言選擇有沒有其它的想法及考量可以分享?

7. Reflecting on the survey process, were there any thoughts and suggestions you can share?

回顧整個研究過程, 您有甚麼建議及想法可以跟研究者分享的呢?


Appendix VI. Post Article Selection Questionnaire

English Version with Simulated Article Selection Result


Chinese Version with Simulated Article Selection Result


Appendix VII. Variables represented in the survey items

The survey is based on the LEAP-Q developed by Marian et al. (2007), with added questions about the user's Internet use frequency and patterns. The table below lists each survey question and the research variable it addresses: language exposure (exposure), language attitude (attitude, which includes dominance and preference), language fluency (fluency), frequency of language use (frequency), and familiarity with subject matter (subject).

| Item | Survey Question | Research Question and Variable |
| 1 | What is your gender | Basic demographic information |
| 2 | In what year were you born | Basic demographic information |
| 3 | In what year did you move to the US | Language history: exposure |
| 4 | Please list all the languages you know and use. | Language profile: dominance, attitude |
| 5 | What is the order in which you learned the languages? | Language history: attitude, frequency |
| 6 | Please list the percentage of time you are exposed to each language on a daily basis. | Language profile: exposure |
| 7 | When you can choose a language to speak with another person who is fluent in every language that you speak, what are the percentages that you would choose to speak in each language? | Language preference: attitude |
| 8 | If you are presented with a document with unknown content written in a language that you do not know, what are the percentages that you would choose to translate it into each of the following languages? | Language preference: attitude |
| 9 | Please name the cultures that you identify with and rate the extent to which you identify with each of them. | Language profile: attitude |
| 10 | Please roughly estimate the amount of time you have spent in the following environments for as long as you have lived. | Language history: exposure |
| 11 | How long have you been using English daily? | Language history: exposure, frequency |
| 12 | How would you describe your English reading ability? | Language profile: fluency |
| 13 | In general, when do you use English more than Chinese? | Language preference: attitude |
| 14 | How long have you been using Chinese daily? | Language history: exposure, frequency |
| 15 | How would you describe your Chinese reading ability? | Language profile: fluency |
| 16 | In general, when do you use Chinese more than English? | Language preference: attitude |
| 17 | Generally speaking, which language do you prefer and why. | Language preference: attitude |
| 18 | On average, how many hours do you spend on the Internet each day? | Internet use: frequency |
| 19 | How frequently do you use the Internet for the following activities in each language? | Language use on the Internet: frequency, attitude |
| 20 | When you are online, how do you decide what language to use? What are your decision-making criteria? | Language use on the Internet: attitude |
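For readers who want to work with this mapping programmatically, the item-to-variable assignments in the table above can be encoded as a lookup and inverted to list the survey items that bear on each variable. The following Python sketch is illustrative only and was not part of the study's analysis; the names ITEM_VARIABLES and items_by_variable are hypothetical, and the variable labels are transcribed from the table's third column.

```python
from collections import defaultdict

# Item number -> variables, transcribed from the table above.
# Items 1 and 2 are basic demographics and carry no language variable.
ITEM_VARIABLES = {
    1: [], 2: [],
    3: ["exposure"],
    4: ["dominance", "attitude"],
    5: ["attitude", "frequency"],
    6: ["exposure"],
    7: ["attitude"],
    8: ["attitude"],
    9: ["attitude"],
    10: ["exposure"],
    11: ["exposure", "frequency"],
    12: ["fluency"],
    13: ["attitude"],
    14: ["exposure", "frequency"],
    15: ["fluency"],
    16: ["attitude"],
    17: ["attitude"],
    18: ["frequency"],
    19: ["frequency", "attitude"],
    20: ["attitude"],
}


def items_by_variable(item_variables):
    """Invert the mapping: variable -> sorted list of item numbers."""
    index = defaultdict(list)
    for item in sorted(item_variables):
        for variable in item_variables[item]:
            index[variable].append(item)
    return dict(index)


if __name__ == "__main__":
    for variable, items in sorted(items_by_variable(ITEM_VARIABLES).items()):
        print(f"{variable}: {items}")
```

Inverting the table this way makes it easy to check coverage, for example that fluency is addressed only by the two self-rated reading ability items.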


Appendix VIII. Coding Framework

Initial Coding Example

| Eng | Ch | None | Initial coding |
| 在美国工作生活环境都需要英语 | | | Environment. Frequency of use. |
| | 母语 | | Mother tongue. |
| I need to use it in my work | | | Frequency of use. |
| | 易懂 | | Easier to understand. |
| | | Depending on the situation, or my environment. | Depends on situation. |

Secondary Coding Example

| Language Preference | Initial Coding | Secondary coding |
| Chinese | I am more fluent in this language. | Fluency |
| | It is more convenient to use. | |
| | It is my native language. | |
| | It is easier for me to use. | |
| | I can express myself more accurately in this language. | Better for expressing thoughts |
| | It is easier to understand. | Better comprehension |
| | I am more familiar with it. | Accustomed |
| | I am more used to it. | |
| | It is a beautiful language. | Personal preference |
| | I am not fully assimilated into American culture or environment. | Cultural acclimatization |
| | I use it more in my work environment. | Frequency of use |
| | It is the language used in my social network. | |
| | It is the language most used by people around me. | Amount of exposure |


Appendix IX. Literature Review Source

The literature reviewed in this study was gathered from three electronic databases (ACM; Library and Information Science; and Library, Information Science & Technology Abstracts) as well as Google Scholar. The documents were retrieved using the following search terms:

1. “cross-lingual information retrieval”

2. “cross language information seeking”

3. “multilingual information retrieval”

4. “multilingual information seeking”

5. “cross-language information retrieval”

6. “multilingual information access”

7. “information seeking behavior and the Web”

8. “information seeking behavior and Internet”

9. “Digital libraries” or “electronic libraries”

10. Bilingual or multilingualism

11. “Bilingual speaker” and Internet

12. “Language selection”

13. “Language choice”

14. “Language preference”

15. Bilingualism (“language choice” OR “language preference” OR “code switching”)

and various Boolean combinations thereof. Some articles were found through citations in reviewed literature as well as serendipitously during the search process.
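As an illustration of what such a Boolean combination looks like, search term 15 above pairs a topic word with a group of alternative phrases. The Python sketch below assembles a query of that shape. It is a hypothetical helper, not the procedure used in the study; in particular, the explicit AND between the topic and the group is an assumption, since the appendix does not state which operators each database required.

```python
def or_group(phrases):
    """Quote each phrase, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{phrase}"' for phrase in phrases) + ")"


def and_query(topic, phrases):
    """Combine a topic term with an OR group (AND is assumed, see lead-in)."""
    return f"{topic} AND {or_group(phrases)}"


if __name__ == "__main__":
    query = and_query(
        "Bilingualism",
        ["language choice", "language preference", "code switching"],
    )
    print(query)
```

Keeping the OR group in parentheses ensures the alternatives are evaluated together before being intersected with the topic term, which is how most bibliographic databases interpret grouped queries.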