Attribution-NonCommercial-NoDerivs 2.0 Korea

You are free to copy, distribute, transmit, display, perform, and broadcast this work only under the following conditions:

- When reusing or distributing the work, you must clearly indicate the license terms applied to it.
- These conditions do not apply if you obtain separate permission from the copyright holder.

Your rights under copyright law are not affected by the above. This is an easy-to-understand summary of the Legal Code.

Disclaimer

- Attribution: You must credit the original author.
- NonCommercial: You may not use this work for commercial purposes.
- NoDerivs: You may not alter, transform, or build upon this work.
A Dissertation for the Degree of Doctor of Philosophy in Engineering
Data Driven Approaches in User Experience
Analysis: Customer-voice classification, User
segmentation and Design elements selection
데이터 분석 방법론 기반 사용자 경험 디자인: 사용자 요구사항
분류, 사용자 세그멘테이션 및 디자인 요소 선정
February 2019

Department of Industrial Engineering
The Graduate School
Seoul National University

이 영 훈
Data Driven Approaches in User Experience
Analysis: Customer-voice classification, User
segmentation and Design elements selection
데이터 분석 방법론 기반 사용자 경험 디자인: 사용자
요구사항 분류, 사용자 세그멘테이션 및 디자인 요소 선정
Advisor: Professor 조 성 준

Submitted as a Ph.D. dissertation in engineering
November 2018

Department of Industrial Engineering
The Graduate School
Seoul National University

이 영 훈

The Ph.D. dissertation of 이영훈 is hereby approved.
December 2018

Committee Chair: 윤 명 환 (Seal)
Vice Chair: 조 성 준 (Seal)
Member: 박 우 진 (Seal)
Member: 정 재 윤 (Seal)
Member: 홍 지 영 (Seal)
Abstract
Data Driven Approaches in User Experience Analysis: Customer-voice classification, User segmentation and Design elements selection
Younghoon Lee
Department of Industrial Engineering
The Graduate School
Seoul National University
In this thesis, data-driven approaches to user experience analysis are proposed. Although many studies from both academia and industry have proposed various techniques to improve the user experience of smartphones, several problems remain because the work is usually performed heuristically by user experience designers. The objective of this study is to address those problems effectively, and it focuses on three subjects in the overall user experience design process: 1) customer-voice classification, 2) user segmentation, and 3) design elements selection.

First, this study proposes an advanced document de-noising method and document representation for an effective document classification task appropriate for customer-voice data, to address the inefficiency of the previous manual classification. Second, this study proposes a novel user segmentation method that utilizes the app usage sequences of real users, to address the problem of the limited data sources used previously. Last, this study proposes two design elements selection methods, for help contents re-organization and for product attribute prioritization, based on recent deep learning techniques, to deal with the previous limitation of not considering users' needs and characteristics.

Based on the meaningful results of this thesis, it is concluded that data-driven approaches effectively address the problems caused by heuristic approaches. They can also provide meaningful insights to UI designers regarding customer-voice analysis, user segmentation, product development, and layout design. Future studies can extend the scope of this research to other tasks in the overall user experience design process.
The contents of this thesis have been published in the following SCI/SCIE/SSCI journals:
Lee, Y., Cho, S., Choi, J. (2018). De-noising documents with a novelty detection method utilizing class vectors. Intelligent Data Analysis, 22(4), 717-733.

Lee, Y., Park, I., Cho, S., Choi, J. (2018). Smartphone user segmentation based on app usage sequence with neural networks. Telematics and Informatics, 35(2), 329-339.

Lee, Y., Im, J., Cho, S., Choi, J. (2018). Applying convolution filter to matrix of word-clustering based document representation. Neurocomputing, 315, 210-220.

Lee, Y., Chung, M., Cho, S., Choi, J. (2019). Extraction of Product Evaluation Factors with a Convolutional Neural Network and Transfer Learning. Neural Processing Letters, 1-16.

Lee, Y., Song, S., Cho, S., Choi, J. (2019). Document representation based on probabilistic word clustering in customer-voice classification. Pattern Analysis and Applications (accepted).

Lee, Y., Cho, S., Choi, J. (2019). Smartphone help contents re-organization considering user specification via conditional GAN. International Journal of Human-Computer Studies (accepted).
Keywords: User experience, Data analysis, Document classification, User segmen-
tation, Design elements selection
Student Number: 2016-30254
Contents
Abstract i
Contents viii
List of Tables x
List of Figures xii
Chapter 1 Introduction 1
Chapter 2 Literature Review 7
2.1 Traditional approaches for analysis of user experience design . . . . . 7
2.1.1 Focus group discussion . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Personal interview . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Quantitative approaches . . . . . . . . . . . . . . . . . . . . . 9
2.2 Related studies on document classification . . . . . . . . . . . . . . . 11
2.2.1 Document classification method . . . . . . . . . . . . . . . . . 11
2.2.2 Word-clustering based document representation method . . . 12
2.2.3 Novelty detection in the textual domain . . . . . . . . . . . . 13
2.3 Related studies on user segmentation . . . . . . . . . . . . . . . . . 14
2.4 Related studies on product attributes prioritization . . . . . . . . . . 16
2.5 Related studies on help system improvements . . . . . . . . . . . . . 17
2.5.1 Help system user interface . . . . . . . . . . . . . . . . . . . . 17
2.5.2 User specification . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Review on related architecture . . . . . . . . . . . . . . . . . . . . . 19
2.6.1 Probabilistic clustering method . . . . . . . . . . . . . . . . . 19
2.6.2 Neural embedding architecture . . . . . . . . . . . . . . . . . 21
2.6.3 Variational auto-encoder and Neural variational document model 22
2.6.4 t-distributed stochastic neighbor embedding (t-SNE) . . . . . 23
2.6.5 Seq2seq architecture . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.6 Louvain method . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.7 Explainable machine learning algorithms . . . . . . . . . . . . 26
2.6.8 Transfer learning . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.6.9 Conditional GAN . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 3 Customer-voice classification 31
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.1 De-noising documents . . . . . . . . . . . . . . . . . . . . . 33
3.2.2 Probabilistic word clustering based document representation 38
3.2.3 Word-clustering based document representation with VAE and
its probabilistic version . . . . . . . . . . . . . . . . . . . . . 44
3.2.4 Matrix representation of word-clustering based document rep-
resentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.5 Applying convolution filter to matrix representation . . . . . 47
3.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.1 Data description . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.2 Experiments setup . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.3 Experiments results . . . . . . . . . . . . . . . . . . . . . . . 54
Chapter 4 User segmentation 63
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1 Variant of the seq2seq based approach . . . . . . . . . . . . . 65
4.2.2 App clustering and relative similarity-based segmentation . . 69
4.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.1 Data description . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.2 Experiments setup . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.3 Experiments results . . . . . . . . . . . . . . . . . . . . . . . 78
Chapter 5 Design elements selection 83
5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2.1 Prioritization of product attributes . . . . . . . . . . . . . . . 85
5.2.2 Help contents re-organization . . . . . . . . . . . . . . . . . . 90
5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.1 Data description . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.2 Experiments setup . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3.3 Experiments results . . . . . . . . . . . . . . . . . . . . . . . 100
Chapter 6 Conclusion 105
Bibliography 109
Abstract in Korean (국문초록) 129
Acknowledgements (감사의 글) 131
List of Tables
Table 3.1 Word list located closest to the centroid . . . . . . . . . . . . 39
Table 3.2 Word list located far from the centroid . . . . . . . . . . . . . 39
Table 3.3 Customer-voice dataset . . . . . . . . . . . . . . . . . . . . . 51
Table 3.4 Words with lowest novelty score . . . . . . . . . . . . . . . . . 55
Table 3.5 Words with highest novelty score . . . . . . . . . . . . . . . . 55
Table 3.6 Accuracy of classification performance (*: Proposed method) 57
Table 3.7 Accuracy of classification performance of customer-voice data 60
Table 3.8 Example of representation interpretation . . . . . . . . . . . . 61
Table 4.1 User segmentation results obtained by domain experts. . . . . 77
Table 4.2 Comparison of the similarities between the segmentations ob-
tained by each method and the answer set (*: proposed method,
(c): utilizing cosine distance, (m): utilizing mahalanobis dis-
tance). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Table 4.3 Example of representation interpretation. . . . . . . . . . . . 82
Table 5.1 Structure of convolutional neural network for aspect extraction 87
Table 5.2 Examples of keywords in the same cluster . . . . . . . . . . . 90
Table 5.3 Baselines utilized in first experiment . . . . . . . . . . . . . . 98
Table 5.4 Performance of attributes extraction and prioritization (NDCG) . 101
Table 5.5 Examples of extracted attributes . . . . . . . . . . . . . . . . 102
Table 5.6 Result of effectiveness comparison . . . . . . . . . . . . . . . 102
Table 5.7 Confusion matrix of help contents usage prediction . . . . . . 103
Table 5.8 Average of help contents selection for top-k prediction . . . . 103
List of Figures
Figure 1.1 Process of smartphone user experience design . . . . . . . . 2
Figure 2.1 Original seq2seq architecture . . . . . . . . . . . . . . . . . . 25
Figure 2.2 Example of Grad CAM image . . . . . . . . . . . . . . . . . 27
Figure 3.1 Summary of customer-voice data analysis process . . . . . . 32
Figure 3.2 Scope of proposed approaches . . . . . . . . . . . . . . . . . 32
Figure 3.3 Limitation of the previously stated novelty detection method 34
Figure 3.4 Advantage of proposed novelty detection method . . . . . . 35
Figure 3.5 Document representation based on probabilistic word clustering 40
Figure 3.6 The reason for rearranging each representation . . . . . . . . . 48
Figure 3.7 Preserve semantic distance . . . . . . . . . . . . . . . . . . . 50
Figure 3.8 One-to-one correspondence . . . . . . . . . . . . . . . . . . . 50
Figure 3.9 Rearrange the elements . . . . . . . . . . . . . . . . . . . . . 50
Figure 3.10 TF-IDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 3.11 Neural embedding based word clustering [61, 127] . . . . . . 58
Figure 3.12 Probabilistic word clustering based approach [72] . . . . . . 58
Figure 3.13 Topic vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 3.14 LSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 3.15 Accuracy of classification performance . . . . . . . . . . . . 58
Figure 3.16 Accuracy of classification performance of customer-voice data 61
Figure 4.1 Summary of our proposed method . . . . . . . . . . . . . . . 65
Figure 4.2 Variant of the seq2seq architecture (our proposed architecture) 67
Figure 4.3 Determination of user segmentation . . . . . . . . . . . . . . 69
Figure 4.4 Summary of app clustering-based user representation. . . . . 70
Figure 4.5 Comparison between actual and predicted segmentation results 72
Figure 4.6 Summary of our proposed method for considering relative
similarity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Figure 4.7 Example of the app usage sequence. . . . . . . . . . . . . . . 75
Figure 4.8 Example of user segmentation by domain experts . . . . . . 76
Figure 4.9 Example of app clustering. . . . . . . . . . . . . . . . . . . . 78
Figure 4.10 User network construction. . . . . . . . . . . . . . . . . . . . 79
Figure 5.1 Example of smartphone help system . . . . . . . . . . . . . 85
Figure 5.2 Summary of our proposed method . . . . . . . . . . . . . . . 86
Figure 5.3 Example of weight visualization . . . . . . . . . . . . . . . . 89
Figure 5.4 Summary of our proposed method . . . . . . . . . . . . . . . 91
Figure 5.5 Preprocessing of help usage data . . . . . . . . . . . . . . . . 92
Figure 5.6 CGAN architecture for help usage prediction . . . . . . . . . 93
Figure 5.7 Example of help contents re-organization . . . . . . . . . . . 94
Figure 5.8 Spec sheet of LG V30 (Resource: GSM Arena) . . . . . . . . 95
Chapter 1
Introduction
User experience (UX) consists of all aspects of users' interactions with a certain product or service [98]. Since the revolutionary success of Apple, the competitive advantage of most Information & Communication Technology (ICT) products and services in the contemporary market is gained from the domain of UX, beyond functionality and efficiency, especially for smartphones [20].

Researchers from both academia and industry have proposed many techniques to improve the user experience of smartphones at each design step (Figure 1.1). In the concept building step, 1) trend research to answer the questions 'Which trends are there or will there be in the future? And which of these are relevant to us?', 2) user segmentation to maximize the value of each customer to the business, and 3) focus group interviews are performed to derive new insights and concepts for the future UX. In the validation step, 1) prototyping and 2) acceptance testing, which evaluates the concept's compliance with the user requirements and assesses whether it is acceptable for delivery, are carried out to validate the derived concept. In the design step, 1) defining a common guideline to govern the UX design of each application, 2) element selection for layout and flow design, and 3) graphic design to apply the graphic elements are performed for the final UX design of the device [67, 35].
Figure 1.1: Process of smartphone user experience design
However, in most companies, user experience design is performed heuristically by individual designers, so there are a few problems associated with it. The first problem relates to the lack of consistency: the work is performed by several individuals, and thus the results of the tasks vary across individuals, requiring additional steps to correct inconsistencies. The second problem relates to resource management: these tasks need to be carried out by a domain expert with a great amount of background knowledge in the field. Not only is this highly time-consuming, but it is also expensive to find an adequate domain expert.
The objective of this study is to address the issues listed above effectively by applying data-driven approaches to user experience design. In detail, this study focuses on three research scopes within the overall user experience design process: customer-voice classification, user segmentation, and design elements selection, as illustrated in Figure 1.1.
With respect to customer-voice analysis, because the classification of customer-voice data is performed manually, there are a few problems associated with it. The first problem relates to the lack of consistency. Classification tasks are performed by several individuals, and thus the results vary across individuals, so additional steps may be required to correct inconsistencies. The second problem relates to the time-consuming nature of the classification. In some cases, it may be necessary to respond to customer voice urgently, especially when the issue is related to quality assurance; the time consumed by the classification task delays customer-voice analysis in exactly the cases that require an immediate response. The last problem relates to resource management. Unnecessary allocation of human resources to a classification task may lead to a shortage of human resources for more important tasks, which is not helpful for optimizing human resource management.
Thus, this study focuses on building an automatic classifier for customer-voice data and newly proposes an advanced document representation method that is appropriate for customer-voice data. The customer-voice data used in this study were obtained from various channels, including phone, e-mail, and websites, and were stored as text documents. Customer-voice analysis therefore starts with document classification, which allows each document to be delivered to the relevant department and also provides overall information on the customer-voice distribution by function. In detail, this study proposes 1) a document de-noising method to clean the raw documents, 2) a probabilistic word-clustering based document representation method to provide interpretability of documents, and 3) another novel method of applying a convolution filter to the document representation to increase classification performance.
In the user segmentation step, there are several limitations in previous approaches, which are based on demographics and reported usage. First, they are inherently subjective and prone to skewing by observers and participants. Second, these studies were predominantly performed heuristically, with user segmentation carried out on limited information by people who already have extensive domain knowledge and background information about the smartphone industry. Therefore, user segmentation tasks based on previous studies are costly and time consuming because they require participants to be gathered to report their usage data and domain experts to be invited to analyze the participants' reported usage [95, 27, 144].
Thus, this study proposes novel ways of segmenting smartphone users based on app usage sequences collected from real smartphone logs. Hundreds of applications are often installed on a user's smartphone, and a log of their usage is a powerful resource for user segmentation because it contains meaningful information regarding the user's preferences, behaviors, and interests. In detail, we propose two novel ways to segment users: 1) an approach based on a variant of the seq2seq architecture, and 2) an app clustering and relative similarity based approach that provides interpretability of the user segmentation results.
Finally, regarding element selection, most of the previous studies on developing evaluation or purchasing factors were performed heuristically by people who already have comprehensive domain knowledge and background information about the product industry, and were expensive and time-consuming. These studies were mostly based on existing literature and focus group interviews with a few participants. Thus, they were likely to be skewed and could not capture the latest improvements in smartphone products, which are among the most rapidly changing devices in the industry. With respect to content selection, such as in the help system, the same help contents are often provided to all users without considering each user's persona and characteristics. This causes users to question the effectiveness of the help system and results in reduced frequency of its use.
Thus, this study deals with two subjects: 1) prioritization of product attributes with a Convolutional Neural Network (CNN) based aspect extraction method, and 2) a content re-organization method based on a conditional Generative Adversarial Network (GAN). For product attribute prioritization, this study newly proposes an aspect extraction method that combines a Convolutional Neural Network with transfer learning; additionally, we utilize an explainable neural network to calculate the relative importance of each product attribute. For content re-organization, this study proposes a new method of re-organizing help content by considering each user's interests and preferences using their app usage sequences.
The remainder of this thesis is structured as follows. Chapter 2 discusses various studies on each subject and the other algorithms utilized herein; Chapter 3 proposes several algorithms for document classification; Chapter 4 presents two new user segmentation methods; Chapter 5 proposes two subjects regarding design elements selection; and Chapter 6 provides the conclusions and discussion, as well as directions for future work.
Chapter 2
Literature Review
2.1 Traditional approaches for analysis of user experience
design
2.1.1 Focus group discussion
A focus group discussion (FGD) is a good way to gather together people from
similar backgrounds or experiences to discuss a specific topic of interest. The group
of participants is guided by a moderator or facilitator who introduces topics for
discussion and helps the group to participate in a lively and natural discussion
amongst themselves.
It is utilized in various steps of the UX design process, such as user segmentation or ideation. The strength of FGD lies in allowing the participants to agree or disagree with each other, so that it provides insight into how a group thinks about an issue, the range of opinions and ideas, and the inconsistencies and variations that exist in a particular community in terms of beliefs, experiences, and practices.
FGD can be used to explore the meanings of survey findings that cannot be explained statistically and the range of opinions and views on a topic of interest, and to collect a wide variety of local terms. In bridging research and policy, FGD can be useful in providing insight into the different opinions among the parties involved in the change process, thus enabling the process to be managed more smoothly. It is also a good method to employ prior to designing questionnaires.
FGD sessions need to be prepared carefully through identifying the main objec-
tive of the meeting, developing key questions, developing an agenda, and planning
how to record the session. The next step is to identify and invite suitable discussion
participants; the ideal number is between six and eight.
The crucial element of FGD is the facilitation. Some important points to bear in
mind in facilitating FGDs are to ensure even participation, careful wording of the
key questions, maintaining a neutral attitude and appearance, and summarizing the
session to reflect the opinions evenly and fairly. A detailed report should be prepared
after the session is finished. Any observations during the session should be noted and
included in the report.
2.1.2 Personal interview
A personal interview survey, also called a face-to-face survey, is a survey method that is utilized when a specific target population is involved. The purpose of conducting a personal interview survey is to explore the responses of people in order to gather more, and deeper, information.

Personal interview surveys are used to probe the answers of the respondents and, at the same time, to observe their behavior, either individually or as a group. The personal interview method is preferred by researchers for several advantages, although its disadvantages and the different types of personal or face-to-face surveys should also be understood before choosing it.
It is also utilized in various steps of the UX design process, similar to FGD. One of the main reasons why researchers achieve good response rates through this method is the face-to-face nature of the personal interview survey. Unlike administering questionnaires, people are more likely to readily answer live questions about the subject simply because they can actually see, touch, feel, or even taste the product. If designers wish to probe the answers of the respondents, they may do so using a personal interview approach. Open-ended questions are better tolerated in interviews because respondents find it more convenient to express long answers orally than in writing.
2.1.3 Quantitative approaches
There are various quantitative approaches utilized in the UX design process, such as usability testing, A/B testing, eyetracking, or questionnaires. Although not used as often, quantitative usability testing is a lot like qualitative usability testing: users are asked to perform realistic tasks using a product. The primary difference between the two is that qualitative usability testing prioritizes observations, such as identifying usability issues, whereas quantitative usability testing focuses on collecting metrics such as time on task or success rate. Once designers have collected those metrics with a relatively large sample size, they can use them to track the progress of a product's usability over time or compare it to the usability of competitors' products. The type of usability testing chosen (in-person, remote moderated, or remote unmoderated) will affect the cost and difficulty associated with this method. Since the goals of quantitative and qualitative usability studies are different, the structure of the test and the tasks used need to be different as well.
While designers can use analytics metrics to monitor a product's performance, they can also create experiments that detect how different UI designs change those metrics, either through A/B testing or multivariate testing. In A/B testing, teams create two different live versions of the same UI and then show each version to different users to see which version performs best. Multivariate testing is similar but involves testing several design elements at once; for example, the test could involve different button labels, typography, and placement on the page. Both of these analytics-based experiments are useful for deciding among different variations of the same design and can put an end to team disputes about which version is best. A major downside to this methodology is that it is often abused: some teams fail to run the tests as long as they should and make risky decisions based on small numbers.
Eyetracking studies require special equipment that tracks users' eyes as they move across an interface. When many participants perform the same task on the same interface, meaningful trends start to emerge, and designers can tell, with some reliability, which elements of the page will attract people's attention. Eyetracking can help identify which interface and content elements need to be emphasized or de-emphasized to enable users to reach their goals. A major obstacle to running eyetracking studies is the highly specialized, prohibitively expensive, and somewhat unstable equipment, which requires extensive training to use.
2.2 Related studies on document classification
2.2.1 Document classification method
Document representation is a key step in the document classification problem. This section reviews the major document representation methods. Many text and sentiment classifiers are still based solely on the different sets of words contained in documents, such as the bag-of-words or bag-of-n-grams approaches, and do not consider sentence and discourse structure or meaning. These are straightforward methods and provide an intuitive interpretation. However, they are limited when a large number of documents are involved, since the resulting representations have high dimensionality and sparsity when measuring the proximity between documents [68, 142].

Latent Semantic Analysis (LSA) [30], probabilistic Latent Semantic Analysis (pLSA) [15], and a more comprehensive method based on Latent Dirichlet Allocation (LDA) [9] were suggested to reduce dimensionality and select more discriminative features. However, these techniques lose the innate interpretability and suffer from several disadvantages because they are still based on word co-occurrences: they ignore the semantic relevance among words and consider context information to a lesser extent than the bag-of-words method. Furthermore, the inference process is too sensitive to the initial conditions, especially in the LDA-based model.
Additionally, word2vec, one of the neural embedding approaches, is based on the distributional hypothesis, which implies that words occurring in a similar context tend to have similar meanings [46]. Based on this assumption, word2vec uses a neural network model such as skip-gram or continuous bag of words (CBOW) that predicts the neighboring words of input words [70, 85]. The most important aspect of word2vec is that words with similar meanings are located close to each other in the vector space. The word2vec model can be utilized to construct dense document vectors of reasonable dimensionality, in contrast to the bag-of-words approach, in which the dimensionality and sparsity of a document vector can increase significantly. Various document representation methods have been suggested based on the word2vec model; even a simple representation that averages the word vectors contained in a document shows good representation performance [142]. A promising representation method based on the word2vec model is the doc2vec model, which utilizes the contextual information of words and documents to represent a document.
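To make this concrete, the short sketch below builds dense document vectors by averaging word2vec vectors, in the spirit of the simple averaging representation mentioned above. It uses the gensim library; the toy corpus, hyperparameters, and the helper `doc_vector` are illustrative assumptions rather than the settings used in this thesis.

```python
import numpy as np
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for customer-voice documents.
docs = [["screen", "flickers", "after", "update"],
        ["battery", "drains", "too", "fast"],
        ["camera", "app", "crashes", "on", "launch"]]

model = Word2Vec(sentences=docs, vector_size=100, window=5, min_count=1, sg=1)  # skip-gram

def doc_vector(tokens, model):
    """Average the word vectors of a document's in-vocabulary tokens."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([doc_vector(d, model) for d in docs])  # dense document matrix
```

Such averaged vectors can then be fed to any standard classifier in place of sparse bag-of-words features.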
2.2.2 Word-clustering based document representation method
This section reviews the major document representation methods based on word clustering. The bag-of-concepts approach, one of the word-clustering based document representation methods, combines the advantages of previous studies. Semantically similar terms are clustered into a common concept by clustering the word vectors generated from a neural embedding architecture, thereby incorporating the impact of semantically similar words and preserving document proximity. Document vectors are subsequently represented by the frequencies of these concepts [61]. Similarly, Paniagua et al. utilized word vectors and word clusters generated by the neural embedding architecture to add the word-clustering result to the feature set of documents [127].

In Mitrofanova et al., a set of keywords describing the major topics of the plot is assigned to each text, and clusters of words with similar distributions are created for each keyword based on a word vector model utilizing a co-occurrence matrix [86, 112]. Moreover, Saha et al. construct word-clustering based cosine similarity for a named entity recognition task [111], and Bekkerman et al. more directly compared the simple bag-of-words approach with the word-clustering based document representation approach to demonstrate the effectiveness of the latter.
2.2.3 Novelty detection in the textual domain
Novelty detection can be defined as the task of recognizing that data differ in some
respects from the data that are considered as normal. Novelty detection methods
are commonly classified into five categories, namely probabilistic approach, dis-
tance/density based approach, reconstruction based approach, domain based ap-
proach, and information theoretic techniques. The probabilistic approach and dis-
tance/density based approach are commonly used among the fore-mentioned ap-
proaches [99, 122]. The probabilistic approach uses probability density estimation and assumes that low-density areas correspond to low probabilities of containing normal data. The distance/density based approach assumes that normal data are tightly clustered and located close to each other, in contrast to novel data. This study includes an improvement of these novelty detection methods that combines a Gaussian mixture model, a probabilistic approach, with k-means clustering, a distance/density based approach.
Novelty detection in the textual domain aims to detect novel documents, sentences, words, or interesting topics. There are many examples of novelty detection methods in the textual domain, and these studies apply various methods, including statistical approaches, mixtures of models, neural network based approaches, support vector machine based approaches, and clustering based approaches [3, 124, 147, 6, 79, 80]. However, these studies focused on novelty detection at the document or sentence level, mainly because various features can easily be extracted from a document or sentence, such as word frequency, frequent POS lists, and average length [42, 41, 43]. Meanwhile, novelty detection studies at the word level are mostly based on a dictionary or a corpus only, due to the lack of suitable methods for representing words in a vector space [47, 17].
2.3 Related studies on user segmentation
According to Kotler, user segmentation refers to the classification of users into groups depending on their characteristics and behaviors in order to identify those who may require separate products [66].
element of product development. With user segmentation, product developers can
develop differentiated and personalized products for each segment, and marketing
personnel can create segmented advertisements and marketing communications for
each segment [25].
As mentioned earlier, many studies have focused on mobile internet services
based on their usage pattern. Cheng and Sun used messages, entertainment, and
micro-payment services to segment users with an improved segmentation model,
which is called the TFM (time, frequency, money) model [18]. Wu and Chou devel-
oped a soft clustering method that uses a latent mixed-class membership clustering
approach to classify online users based on their purchasing data across categories.
Bose and Chen selected internet usage, revenue, services, and user categories as re-
search indicators that were employed to cluster users [12]. Shafig et al. provided
a fine-grained characterization of the geospatial dynamics of application usage in
cellular networks [118].
However, these studies focus on the sequential pattern of mobile internet service usage, which is only one aspect of overall smartphone usage, so the clustering results do not fully reflect the variety of smartphone usage behaviors.
Several studies have tried to collect additional data sources and consider the ef-
fects of other aspects on users’ smartphone behavior, unlike previous mobile internet
service-based methods. Uronen, Falaki, and Lin obtained mobile usage data using
call detail records collected by an operator, and segmented users using those voice
call usage data [134, 31, 74]. Walsh and Plaza utilized demographics: their results
show that younger users are most likely to be extensively involved with their mobile
phone [137], and the other finds that elderly people utilize mobile phones primar-
ily to communicate with relatives, as memory and daily-life aids, as enjoyment, for
self-actualization, and as tools to feel safe and secure [100].
In addition, Sell, Tao, Bouwman, and de Reuver and Bouwman utilized psy-
chology by combining those sets of information with demographics and behavioral
segmentation, and they found that each group has different motivations and product
attributes [116, 130]. In particular, Bouwman presents a psychographic segmentation that is based on sociological factors to understand how people deal with their social lives and on the psychological factors of the person [87]. De Reuver and Bouwman
found that each segment moderates the effect on the context-use of mobile phones
towards a user’s intention to use products and services [28].
The smartphone industry stands to benefit from user segmentation more than
other industries because of the following reasons: 1) smartphones have the capability
to collect and store various types of information, 2) several hundreds of applications
are often installed on a user’s smartphone, and 3) a log of their application usage is a
powerful resource for user segmentation because it contains meaningful information
regarding the user’s preferences, behavioral patterns, and interests. However, these
studies were mainly based on reported usage and limited sources, such as voice calls
and data usage.
One recent study utilized the smart log data that is stored in each device to
segment users in objective and quantitative ways [45]. It utilized the average number
of calls and messages, average amount of data used, average number of URLs visited,
and the average number of applications that are installed and run daily. The use of
smartphone log data to segment users is meaningful, but it is also limited in terms
of its ability to use data from the apps that are used by each user as well as the
sequence in which the apps are used, even if app usage sequences are key elements
for effective user segmentation, as mentioned earlier.
2.4 Related studies on product attributes prioritization
The previous works on aspect extraction are categorized into supervised and unsu-
pervised approaches. However, our discussion here focuses on supervised approaches,
which are utilized in our method. Supervised learning methods are mostly based on
standard sequence labeling approaches, such as Conditional Random Field (CRF)
and Hidden Markov Model (HMM). Huang et al. treated product feature extraction as a sequence labeling task and employed a discriminative learning model using CRF [49]. In comparison, Choi et al. applied a hierarchical parameter sharing technique using CRF for fine-grained opinion analysis, jointly detecting the boundaries of the opinion expressions [21]. Moreover, Yang et al. proposed a joint inference model that leveraged knowledge from predictors optimizing the subtasks of opinion analysis [145], and many other studies are also based on HMMs [53, 73, 133].
Meanwhile, Jin et al. extracted highly specific product-related entities based on
lexicalized HMMs [58]. Furthermore, a few domain-knowledge-based methods [139,
52] have been utilized in supervised approaches. CNN-based approaches [101] have
recently been suggested, and they show state-of-the-art performance compared to
those used in the previous studies. The authors of that study utilized Amazon em-
beddings for word representation and constructed a seven-layer CNN architecture.
The present study basically utilizes this CNN structure in the first phase and in-
troduces variations to address the limitations of the previous study considering the
rapidly changing smartphone industry.
However, previous studies mostly focused only on the extraction of aspects and not on the relative importance of the extracted aspects. Although a few recent studies deal with the relative importance of aspects [8], they are simply based on the frequency of each aspect in the textual reviews. Thus, we focus on deriving the relative importance of the extracted aspects utilizing an explainable neural network.
2.5 Related studies on help system improvements
2.5.1 Help system user interface
Several studies have been conducted on help systems during the past decade. As previously stated, however, those studies mainly focused on the design aspect or on common usability guidelines, and not on the problem of organizing content in consideration of users' specifications.
In the design aspect, those studies focused on graphical user interface (GUI) to
ensure that users did not find it difficult to locate information, or find it confusing,
time-consuming, or frustrating [1]. Baker et al. provided tips and practical advice
for using colors, such as avoiding reserved colors for on-line help systems [7]. Al-
berts and Geest also recommended using a maximum of three colors in on-line help
documentation, and argued for functional use of colors [2].
In the usability aspect, most studies focus on providing general guidelines or
tips for designing help systems [89]. Ellison et al. proposed 7 golden rules of on-line
help design, and Crane et al. presented 12 techniques for improving on-line help.
Moreover, Roy et al. proposed a guide for appropriately choosing and designing
task support tools based on tasks and characteristics of help tools [108], and Corbin
et al. presented the design attributes of on-line help systems in a series of design
checklists [24].
2.5.2 User specification
As previously mentioned, item recommendation studies considering users' characteristics and preferences in the on-line commerce field are primarily based on collaborative filtering. In the smartphone industry, however, these approaches are not appropriate, as the required information, such as a user's purchasing history or metadata, is insufficient for smartphone users.
Thus, previous user specification studies in the context of the smartphone have typically been based on demographics and reported usage, which are inherently subjective and prone to being skewed by the observers and participants. Furthermore, those studies were predominantly performed by domain experts who already have comprehensive domain knowledge and background information regarding the smartphone industry.
These can further be classified into several types as follows: (1) geographic seg-
mentation based on dividing the market into different geographical areas, such as
nations, regions, and cities; (2) demographic segmentation based on age, gender,
family size, etc.; (3) psychographic segmentation based on social class, lifestyle,
and/or personality characteristics; and (4) behavior segmentation based on occasion
segmentation, benefit segmentation, service usage, and intention to use [115, 22].
Thus, this study uses the app usage sequences collected from each user, which are the most meaningful and interesting source for effectively identifying a user's preferences and characteristics [51].
2.6 Review on related architecture
2.6.1 Probabilistic clustering method
The studies mentioned in the previous section, however, utilized hard clustering
methods such as K-means, K-medoids, or spherical K-means clustering and did not
consider the membership strength of each word with respect to each cluster. There-
fore, in the present study, an advanced document representation method utilizing
neural embedding architecture based on the probabilistic clustering method was pro-
posed to capture the membership strength of each word. The utilized probabilistic
clustering method included the fuzzy C-means (FCM) clustering method [55] and
the Gaussian mixture model (GMM) clustering method [33].
The FCM algorithm attempts to partition a finite collection of $n$ elements $X = \{x_1, \ldots, x_n\}$ into a collection of $c$ fuzzy clusters with respect to a specified criterion. Given a finite set of data, the algorithm returns a list of $c$ cluster centers $C = \{c_1, \ldots, c_c\}$ and a partition matrix $W = (w_{ij}) \in [0, 1]$, $i = 1, \ldots, n$, $j = 1, \ldots, c$, where each element $w_{ij}$ specifies the degree to which element $x_i$ belongs to cluster $c_j$. The FCM algorithm aims to minimize the following objective function:

$$\underset{C}{\arg\min} \sum_{i=1}^{n} \sum_{j=1}^{c} w_{ij}^{m}\, \mathrm{dist}^2(x_i, c_j),$$

where

$$w_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\mathrm{dist}(x_i, c_j)}{\mathrm{dist}(x_i, c_k)} \right)^{\frac{2}{m-1}}}.$$
A GMM is a parametric probability density function that is represented as a weighted sum of Gaussian component densities. In a multivariate setting, $p(x\mid\theta)$ is defined as a finite mixture model with $J$ components, and each component is a multivariate Gaussian density defined with parameters $\theta_j = \{\mu_j, \Sigma_j\}$ as follows:

$$p(x\mid\theta) = \sum_{j=1}^{J} \alpha_j\, p_j(x\mid z_j, \theta_j), \qquad p_j(x\mid\theta_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_j)^{T}\Sigma_j^{-1}(x-\mu_j)},$$

where $\alpha_j = p(z_j)$ denotes the mixture weight representing the probability that a randomly selected $x$ was generated by component $j$, and $\sum_{j=1}^{J}\alpha_j = 1$. After each parameter has been estimated using the expectation-maximization (EM) algorithm, the membership weight of a data point is computed as follows:

$$w_{ij} = p(z_{ij} = 1 \mid x_i, \theta) = \frac{p_j(x_i\mid z_j, \theta_j)\cdot\alpha_j}{\sum_{m=1}^{J} p_m(x_i\mid z_m, \theta_m)\cdot\alpha_m}$$
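As a concrete illustration of this soft-membership idea, the following sketch fits a Gaussian mixture to a matrix of word vectors with scikit-learn and reads off the membership weights $w_{ij}$ from `predict_proba`. The random vectors, the number of clusters, and the way a document vector is formed from the memberships are assumptions made only for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# word_vectors: one embedding per vocabulary word (random placeholder values here).
word_vectors = np.random.randn(1000, 50)

gmm = GaussianMixture(n_components=20, covariance_type="full", random_state=0)
gmm.fit(word_vectors)
membership = gmm.predict_proba(word_vectors)   # w_ij: soft membership of word i in cluster j

# A soft document vector can then sum the membership rows of the words it contains.
doc_token_indices = [3, 17, 42]                 # hypothetical word indices of one document
doc_repr = membership[doc_token_indices].sum(axis=0)
```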
2.6.2 Neural embedding architecture
As mentioned earlier, the neural embedding architecture is based on the distributional hypothesis, which implies that words occurring in a similar context tend to have similar meanings [46]. Based on this assumption, word2vec, which is one
of the neural embedding architectures, uses a neural network model, such as skip-
gram or a continuous bag of words (CBOW) that predicts the neighboring words of
input words [70, 85]. The neural network model in that particular architecture is first
trained with respect to the optimization function $\frac{1}{T}\sum_{t=k}^{T-k} \log p(\omega_t \mid \omega_{t-k}, \ldots, \omega_{t+k})$ in CBOW, or $\frac{1}{T}\sum_{t=k}^{T-k} \log p(\omega_{t-k}, \ldots, \omega_{t+k} \mid \omega_t)$ in skip-gram, where $T$ denotes the number
of words, and k denotes the window size of the neighboring words. Hidden nodes
can then be used as representations of words wt. The most important aspect of
word2vec is that words with similar meaning are located close to each other in the
vector space.
A class vector is trained from a neural network similar to simple neural embed-
ding model. Sachan and Kumar suggested architecture to embed word vectors in
conjunction with a class vector by incorporating both into a neural network [110].
In a manner similar to the simple neural embedding model, the neural network model is trained with the optimization function $\sum_{i=1}^{V} \log p(w_i \mid w_{context}) + \sum_{j=1}^{k}\sum_{i=1}^{V} \log p(w_i \mid c_j)$, where $V$ denotes the number of words and $k$ denotes the number of classes. The calculation of a class vector $c_j$ as well as word vectors $w_i$ leads to class vectors with high cosine similarity to words that discriminate between classes. For instance,
with respect to the IMDB dataset, there are two classes of words, namely positive
words and negative words. Negative words, such as 'awful', are located close to the negative class vector, while positive words, such as 'wonderful' or 'lovely', are located close to the positive class vector [97].
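The sketch below gives a rough approximation of the class-vector idea using gensim's Doc2Vec: each training document is tagged with its class label so that a vector for the class is learned jointly with the word vectors. This is not the exact architecture of Sachan and Kumar; it only illustrates, under these assumptions, how a class vector can end up close to the words that discriminate that class.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Tag each training document with its class label so a "class vector" is learned
# jointly with the word vectors (a rough stand-in for the class-vector architecture).
corpus = [TaggedDocument(words=["awful", "boring", "plot"], tags=["NEG"]),
          TaggedDocument(words=["wonderful", "lovely", "acting"], tags=["POS"])]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=100)

# Words that discriminate a class should end up close to that class's vector.
print(model.wv.most_similar([model.dv["POS"]], topn=3))
```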
2.6.3 Variational auto-encoder and Neural variational document
model
VAE is a directed model that uses learned approximate inference and can be trained
purely with gradient-based methods. To generate a sample from the model, the
VAE first draws a sample z from the code distribution pmodel(z). The sample is
then run through a differentiable generator network g(z). Finally, x is sampled from
a distribution pmodel(x; g(z)) = pmodel(x|z). During the training, the approximate
inference network (or encoder) q(z|x) is used to obtain z, and pmodel(x|z) is then
viewed as a decoder network. It is then trained by maximizing the variational lower
bound $\mathcal{L}(q)$ for a data point $x$:

$$\mathcal{L}(q) = \mathbb{E}_{z\sim q(z|x)} \log p_{model}(z, x) + H(q(z|x)) \tag{2.1}$$
$$= \mathbb{E}_{z\sim q(z|x)} \log p_{model}(x|z) - D_{KL}\left(q(z|x) \,\|\, p_{model}(z)\right) \tag{2.2}$$
$$\leq \log p_{model}(x) \tag{2.3}$$
The VAE usually has Gaussian distribution for pmodel(x; g(z)) and maximizing a
lower bound on the likelihood of such a distribution is similar to training a traditional
auto-encoder [64, 105, 71].
The neural variational document model (NVDM) utilizes this VAE framework to derive document representations [84]. In this process, word representations are also derived from the model. In detail, an encoder network $q(z|x)$ compresses the document representation into a hidden vector $z$, and a softmax decoder $p(x|z) = \prod_{i=1}^{N} p(x_i|z)$ reconstructs the document by independently generating the words, where $N$ is the number of words in the document. Similar to the VAE, the NVDM is trained by maximizing
L(q) = Eq(z|x)
[N∑i=1
log pmodel(xi|z)
]−DKL [q(z|x) ‖ p(z)] (2.4)
In addition, conditional probability over words p(xi|z) is modeled by multinomial
logistic regression and shared across documents:
P (xi|z) =exp(E(xi; z))∑|V |j=1 exp(E(xi; z))
(2.5)
E(xi; z) = −zTRxi − bxi (2.6)
where R is the word representation matrix(RK×|V |) derived from the VAE architec-
ture.
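A minimal PyTorch sketch of an NVDM-style model is shown below: a feed-forward encoder produces the Gaussian parameters of $q(z|x)$, and a softmax decoder over the vocabulary reconstructs the bag-of-words input, with the loss combining the reconstruction term and the KL term of Eq. (2.4). All dimensions are illustrative assumptions, and the decoder logits follow Eq. (2.6) only up to sign convention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NVDM(nn.Module):
    """Minimal NVDM-style sketch (dimensions are illustrative assumptions)."""
    def __init__(self, vocab_size=2000, hidden=500, latent=50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.R = nn.Parameter(torch.randn(latent, vocab_size) * 0.01)  # word matrix R
        self.b = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, bow):                       # bow: (batch, vocab) word counts
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)       # reparameterization
        logits = z @ self.R + self.b              # decoder scores over the vocabulary
        recon = -(bow * F.log_softmax(logits, dim=-1)).sum(-1)        # reconstruction term
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return (recon + kl).mean()                # negative variational lower bound
```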
2.6.4 t-distributed stochastic neighbor embedding (t-SNE)
t-SNE [77] is a nonlinear dimensionality reduction technique that is particularly
well-suited for embedding high-dimensional data into a space of low dimensions
while preserving the distances between data points. Specifically, it models each high-dimensional object by a low-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects by distant points.
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a prob-
ability distribution over pairs of high-dimensional objects in such a way that similar
objects have a high probability of being picked, while dissimilar points have an
extremely small probability of being picked. Second, t-SNE defines a similar prob-
ability distribution over the points in the low-dimensional map, and it minimizes
the Kullback–Leibler divergence between the two distributions with respect to the
locations of the points in the map.
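In practice, this projection is available off the shelf; the brief sketch below uses scikit-learn's TSNE to embed a (placeholder) word-vector matrix into two dimensions for visualization. The data and perplexity value are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# word_vectors: (n_words, 100) array from a trained embedding model (placeholder data here).
word_vectors = np.random.rand(500, 100)
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(word_vectors)
```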
2.6.5 Seq2seq architecture
This study proposes herein variants to the previously established seq2seq architec-
ture to represent each app usage sequence in vector space. The seq2seq architecture
is based on recurrent neural networks (RNN), which is a family of neural networks for
processing sequential data [109]. The RNN creates an internal state of the network,
which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural
networks, RNNs can use their internal memory to process arbitrary sequences of
inputs [39].
The seq2seq architecture was first proposed by Cho (2014) and Sutskever (2014),
as illustrated in Figure 2.1 [19, 128]. An encoder or input RNN is processed as the
input sequence, and the encoder emits the context C usually as a simple function of
its final hidden state. A decoder or output RNN is conditioned on that fixed-length
vector to generate an output sequence. In the seq2seq architecture, the two RNNs are jointly trained to maximize the average of $\log P(y^{(1)}, \ldots, y^{(n_y)} \mid x^{(1)}, \ldots, x^{(n_x)})$ over all the pairs of $x$ and $y$ sequences in the training set.
Figure 2.1: Original seq2seq architecture
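For reference, a generic GRU-based encoder-decoder over app-id sequences might look like the following PyTorch sketch. This is not the thesis's variant (which is described in Chapter 4); the vocabulary size, dimensions, and random input batches are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder sketch (dimensions are illustrative assumptions)."""
    def __init__(self, vocab_size=500, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        _, context = self.encoder(self.embed(src))       # context: final encoder hidden state
        dec_out, _ = self.decoder(self.embed(tgt), context)
        return self.out(dec_out)                          # logits over the output tokens

# src/tgt: batches of app-id sequences (random stand-ins here).
src = torch.randint(0, 500, (8, 20))
tgt = torch.randint(0, 500, (8, 20))
logits = Seq2Seq()(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 500), tgt.reshape(-1))
```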
2.6.6 Louvain method
The Louvain method is a network-clustering algorithm that optimizes the modularity
to detect nodes that are more densely connected [11]. This technique is a greedy
optimization method that does not always assure a globally optimal result; however,
the method’s time complexity is O(n log n). The modularity function to be optimized
in the Louvain method is presented as follows:
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
where m represents the edge weight sum of all of the edges in the graph, Aij denotes
the edge weight of nodes i and j, ki and kj are the sums of all edge weights connected
to nodes i and j, respectively, ci and cj represent the communities of the given nodes
i and j, respectively, and δ denotes the delta function.
The Louvain method consists of two phases that are iterated to optimize the modularity and detect communities accordingly. In the first step, every node is initially assigned to its own small community. Each node $i$ is then removed from its own community and tentatively moved to the community of each of its neighbors $j$, and the resulting change in modularity, denoted $\Delta Q$, is calculated:
$$\Delta Q = \left[\frac{\Sigma_{in} + k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^{2}\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^{2} - \left(\frac{k_i}{2m}\right)^{2}\right]$$
Once $\Delta Q$ has been calculated for all communities connected to $i$, node $i$ is then moved to the community for which the increase in modularity is largest. The above-mentioned steps are repeated until the value of $\Delta Q$ can no longer be improved.
For the second step, each community that is formed in the first step is expressed
as a node upon the completion of the first step. The links within the same commu-
nity are expressed as self-loops, while those between different community nodes are
expressed as weighted edges. The first step is then executed on the newly constructed
networks.
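In practice, the Louvain method is available in the python-louvain package (an assumed dependency); the sketch below runs it on a toy networkx graph. In the thesis's setting, the nodes would be users and the edge weights their similarities; the example graph is only a stand-in.

```python
import networkx as nx
import community as community_louvain   # the python-louvain package (assumed installed)

# Toy graph; in the user-segmentation setting, nodes are users and edges carry similarity weights.
G = nx.karate_club_graph()
partition = community_louvain.best_partition(G, weight="weight")   # node -> community id
print(sorted(set(partition.values())))                             # detected communities
```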
2.6.7 Explainable machine learning algorithms
The concept of explainable algorithms has been proposed to explain how machine learning algorithms arrive at a specific decision, in contrast to the black-box characteristic of existing machine learning algorithms. In this study, we introduce variations of Grad-CAM (Gradient-weighted Class Activation Mapping) [117], one of the explainable machine learning algorithms developed for image classification, to calculate the relative importance of the extracted aspects. Grad-CAM uses the gradient information flowing into the last convolutional layer of a CNN to understand the importance of each neuron for a decision of interest (Figure 2.2). Similarly, we construct a sentiment classification model utilizing the concept of the Grad-CAM algorithm to capture the importance of each aspect.
Figure 2.2: Example of Grad CAM image
Additionally, we utilize other explainable machine learning algorithms as base-
lines of our proposed method in the experiments, such as a sequence model based on
attention mechanism [140] and LIME (local interpretable model-agnostic explana-
tions) [106]. The attention mechanism allows a decoder to consider different parts of
a source sentence at each step of the output generation. Then, the model learns how
to generate a context vector for each output time step and what to focus on based on
the input sentence and what it has produced so far [143]. Moreover, LIME is an algorithm that can explain the prediction of any classifier faithfully by approximating it locally with an interpretable model.
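The following sketch illustrates the Grad-CAM-style computation on a tiny 1D text CNN: gradients of a class score with respect to the last convolutional activations are pooled into per-filter weights, and the weighted, ReLU-ed activations give an importance score per token position. The toy model, dimensions, and random input are assumptions; this is not the exact model used in this thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Tiny 1D CNN classifier used only to illustrate Grad-CAM-style weighting."""
    def __init__(self, vocab=1000, emb=50, filters=32, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(filters, classes)

    def forward(self, x):
        a = F.relu(self.conv(self.embed(x).transpose(1, 2)))   # (batch, filters, seq_len)
        logits = self.fc(a.mean(dim=2))
        return logits, a

model = TextCNN()
tokens = torch.randint(0, 1000, (1, 12))
logits, activations = model(tokens)
activations.retain_grad()
logits[0, 1].backward()                          # gradient of the "positive" class score

weights = activations.grad.mean(dim=2, keepdim=True)           # pooled gradients per filter
cam = F.relu((weights * activations).sum(dim=1)).squeeze(0)    # importance per token position
print((cam / (cam.max() + 1e-8)).detach())
```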
2.6.8 Transfer learning
Transfer learning is a machine learning method where a model developed for a task
is reused as the starting point for a model on a second task. For a classification task in one domain of interest, we may only have sufficient training data in another domain, where the latter data may lie in a different feature space or follow a different data distribution. For example, knowledge gained while learning
to recognize cars could apply when trying to recognize trucks. In such cases, transfer
learning, if done successfully, would significantly improve the performance of learning
by avoiding expensive data-labeling efforts [96]. This study utilizes the off-the-shelf
feature approach of transfer learning herein. In this approach, we use the outputs of
one or more layers of a network trained on a different task as generic feature detectors
and train a new shallow model based on these features for target data [119, 135].
Transfer learning is a popular approach in deep learning, in which pre-trained models are used as the starting point for natural language processing tasks, given the vast compute and time resources required to develop neural network models for these problems and the large gains they provide on related problems in which much training data can be found in one domain but little to none in another [123], such as sentiment classification [37, 10].
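A minimal sketch of the off-the-shelf feature approach is given below: a network (here randomly initialized, standing in for one pre-trained on a large source task) is frozen and used only as a feature extractor, and a shallow logistic regression is trained on its outputs for the target data. All components and dimensions are placeholders for illustration.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

# Hypothetical pre-trained encoder (stands in for a network trained on a source task).
pretrained = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU())
for p in pretrained.parameters():
    p.requires_grad = False                      # freeze: use it only as a feature extractor

target_inputs = torch.randn(200, 300)            # target-domain documents as 300-d vectors
labels = torch.randint(0, 2, (200,)).numpy()

with torch.no_grad():
    features = pretrained(target_inputs).numpy() # off-the-shelf features from frozen layers

clf = LogisticRegression(max_iter=1000).fit(features, labels)   # shallow model on top
print(clf.score(features, labels))
```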
2.6.9 Conditional GAN
Conditional Generative Adversarial Nets (CGAN), an extension of the vanilla GAN, was originally designed to generate artificial images that can scarcely be distinguished from real images under a specific condition given as a continuous vector value.
GAN simultaneously trains two networks: a generator that learns to generate
fake samples from an unknown distribution or noise and a discriminator that learns
to distinguish fake from real samples [38].
In the CGAN, the generator learns to generate a fake sample with a specific condition or characteristic (such as a label associated with an image or a more detailed tag) rather than a generic sample from an unknown noise distribution. To add such a condition to both the generator and the discriminator, a vector $y$ must be fed into both networks. Hence, the discriminator $D(x, y)$ and the generator $G(z, y)$ are jointly conditioned on two variables, $z$ or $x$, and $y$.
The objective function of the CGAN is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z, y), y))]$$
The difference between the GAN loss and the CGAN loss lies in the additional parameter $y$ in both the discriminator and generator functions. The CGAN architecture thus has an additional input layer (in the form of a condition vector) that is fed into both the discriminator and generator networks.
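The following PyTorch sketch shows this structural point: both the generator and the discriminator receive the condition vector y concatenated with their usual inputs, and the adversarial losses are computed on the conditioned outputs. Network sizes and the random stand-in data are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G(z, y): noise plus condition vector -> fake sample (sizes are illustrative)."""
    def __init__(self, z_dim=32, cond_dim=10, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + cond_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))
    def forward(self, z, y):
        return self.net(torch.cat([z, y], dim=1))

class Discriminator(nn.Module):
    """D(x, y): sample plus condition vector -> probability of being real."""
    def __init__(self, in_dim=64, cond_dim=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + cond_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1), nn.Sigmoid())
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

G, D = Generator(), Discriminator()
bce = nn.BCELoss()
real_x = torch.randn(16, 64)                     # stand-in for real conditioned samples
cond = torch.randn(16, 10)                       # condition vector y (e.g., a user representation)
z = torch.randn(16, 32)

d_loss = bce(D(real_x, cond), torch.ones(16, 1)) + \
         bce(D(G(z, cond).detach(), cond), torch.zeros(16, 1))   # discriminator objective
g_loss = bce(D(G(z, cond), cond), torch.ones(16, 1))             # generator objective
```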
Chapter 3
Customer-voice classification
3.1 Background
Customer voice (voice of the customer, VOC) is a term that denotes the feelings of customers regarding their experience with a product, service, or business. Explicit complaints and requirements, as well as the unsatisfied needs of customers and their overall satisfaction, are inherent in customer voice. Thus, by analyzing customer voice, product developers obtain a detailed understanding of customer requirements and appropriate design specifications for a new product. Additionally, it can serve as a common language for a team to move forward during product development and as a highly useful springboard for product innovation [36, 40].
Thus, several companies attempt to identify and respond to customer needs and
expectations through customer-voice analysis [59] [131], and it is important to cat-
egorize customer-voice data for relevant departments and responsible individuals.
For instance, the categorization of customer-voice data of a mobile device into sys-
tem, user interface, design, and appearance categories allows it to be delivered to
relevant departments and also provides overall information on customer-voice dis-
tribution according to function. Therefore, it is necessary for customer-voice data to
be classified into functional categories prior to analyzing the data.
Figure 3.1: Summary of customer-voice data analysis process
The customer-voice data is gleaned across a variety of channels, including phone, e-mail, and the web, and stored as text documents, as shown in Figure 3.1. Customer-voice data consists of extremely unstructured text, since e-mail contents or phone call recordings are stored without any proofreading. Thus, it typically includes mistakes such as typos and other informal terms, including interjections and slang. With respect to the representation and classification of customer voices, these words are considered noisy data since they do not provide significant information on the meaning of a customer voice. Furthermore, noisy data typically exerts a negative effect on the classification task, and even small amounts of noisy data can severely decrease overall performance [78].
Figure 3.2: Scope of proposed approaches
Thus, this study mainly focuses on proposing a document de-noising method to clean the customer-voice data and an advanced document representation method that is appropriate for customer-voice data while building the automatic classifier, because the representation of a document is an essential task in document classification. Moreover, a document representation method for customer-voice data must perform better than previous methods. Further, it must provide representational interpretability, as the representation might be analyzed for various purposes after the classification task. Thus, we also consider the interpretability factor in our proposed document representation method (Figure 3.2). Additionally, this study proposes another novel approach that applies a convolution filter to the document representation to improve the classification performance.
3.2 Methodology
3.2.1 De-noising documents
As described above, customer-voice data involves extremely unstructured text containing mistakes such as typos and other informal terms. It also contains words that are less important for effectively representing each class. The removal of these noisy words by novelty detection improves the representation and classification performance.
First, consider applying the previously described novelty detection method in a vector space of words calculated by the neural embedding model to detect noisy words. Assume a data set such as that shown in Figure 3.3, where each circle refers to a word vector calculated by the neural embedding model. Ideally, the purple and green circles should be clustered into two main clusters, and the yellow circle should be classified as a novelty. However, applying the GMM novelty detection method to these data leads to the detection of both the green and yellow circles as novelties, since these words are located at a distance from other words, as shown in Figure 3.3 (the red '+' and blue '+' indicate the means of each Gaussian distribution). This implies that words that are distant from other words due to their uniqueness and low frequency are classified as novel words by the previously described novelty detection method, even though they constitute meaningful words that explain specific classes or important words with respect to the classification task. Thus, applying the previously stated novelty detection method without modification is not sufficient for the effective detection of novel words.
Figure 3.3: Limitation of the previously stated novelty detection method
The utilization of the class vector addresses this limitation. As described in Section 2, class vectors have high cosine similarity with words that discriminate between classes. Hence, each class vector is assumed to be the mean or centroid of each word distribution, so that words that are close to each class vector, or that have a high PDF value, are considered meaningful words that effectively explain each class. Meanwhile, words that are far from the class vector, or that have a low PDF value, are considered noisy words, such as typos, or words that are less important for discriminating between classes. Figure 3.4 shows the advantage of the proposed novelty detection method that utilizes a class vector. In the proposed method, the class vector is located near the centroid of each word distribution that is composed of words representing each class. Although one word distribution is composed of only a small number of green words, the class vector indicated by a green '+' is still located near the centroid of that word distribution. Therefore, the proposed method effectively separates meaningful words and novel words by utilizing a class vector. Thus, this study proposes an alternative that utilizes a class vector within previous novelty detection methods, namely the Gaussian mixture model and the K-means clustering approach, which are most frequently used in novelty detection tasks.
Figure 3.4: Advantage of proposed novelty detection method
The details of the proposed novelty detection are presented below. Formally, let the set of documents be D = {d_1, ..., d_N}, where N denotes the number of documents. Additionally, let the set of words be W = {w_1, ..., w_V}, where V denotes the total number of words in D, and let C = {c_1, ..., c_k}, where k denotes the total number of classes in D. The word vector w_i and the class vector c_j are h-dimensional vectors that represent each word and each class, where h denotes the number of hidden nodes defined by the user in the neural embedding model. The number of class vectors is equal to the number of data classes.

1) Calculate the vector representation of each word w_i and each class c_j. Specifically, w_i and c_j are calculated by optimizing the function

\sum_{i=1}^{V} \log p(w_i \mid w_{context}) + \sum_{j=1}^{k} \sum_{i=1}^{V} \log p(w_i \mid c_j)
2) Calculate the novelty score with the improved Gaussian mixture model and the improved K-means clustering method utilizing a class vector.

(1) Improved Gaussian mixture model:

Apply the improved GMM method, considering each class vector as the mean of each distribution. Each distribution is assumed to be the distribution of the words of each class. The improved GMM method is also represented as a weighted sum of k component Gaussian densities, as given by the following equation:

p(W \mid \mu, \Sigma) = \sum_{j=1}^{k} m_j \, g(W \mid \mu_j, \Sigma_j)

where m_j, j = 1, ..., k, denotes the mixture weight and g(W \mid \mu_j, \Sigma_j), j = 1, ..., k, denote the component Gaussian densities. Each component density is a Gaussian function of the following form:

g(W \mid \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{h/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(w_i - \mu_j)^T \Sigma_j^{-1} (w_i - \mu_j)}
Then, the mean vector \mu_j is fixed to each class vector c_j, and only m_j and \Sigma_j are calculated and updated by the Expectation-Maximization (EM) algorithm as follows:

m_j = \frac{1}{V} \sum_{i=1}^{V} \frac{m_j \, p(w_i \mid \mu_j, \Sigma_j)}{p(w_i \mid \mu, \Sigma)}

\Sigma_j = \frac{\sum_{i=1}^{V} (w_i - \mu_j)(w_i - \mu_j)^T \, \frac{m_j \, p(w_i \mid \mu_j, \Sigma_j)}{p(w_i \mid \mu, \Sigma)}}{\sum_{i=1}^{V} \frac{m_j \, p(w_i \mid \mu_j, \Sigma_j)}{p(w_i \mid \mu, \Sigma)}}
(2) Improved K-means clustering:

The improved KMC method considers each class vector as the centroid of each cluster. Each cluster is assumed to be the cluster of words of each class. The improved KMC method aims to minimize an objective function J, known as a squared error function, given by the following expression:

J = \sum_{j=1}^{k} \sum_{W \in S_j} dist(W, \mu_j)^2

where S = \{S_1, ..., S_k\} denotes the set of clusters. The centroid vector \mu_j is fixed to each class vector, and each data point is assigned to the cluster center with the minimum distance among all cluster centers. No additional step is required to recalculate and obtain a new centroid. The distance between a word w_i and the centroid of the cluster containing w_i is utilized as the novelty score.
3) Finally, the PDF value, i.e., the weighted sum of the k component Gaussian densities, is utilized as the novelty score in the GMM variation, and the distance between a word w_i and the centroid of the cluster containing w_i is utilized as the novelty score in the KMC variation to detect novel words. This implies that words with a PDF value lower than a user-defined probability are considered novel words in the GMM variation, and words whose distance from the centroid exceeds a user-defined distance are considered novel words in the KMC variation.
In step 1), each word vector w_i and class vector c_j is calculated by the neural embedding model. The number of dimensions of w_i and c_j equals the number of hidden nodes of the neural embedding model as defined by the user. In step 2), a variation of the Gaussian mixture model and of the K-means clustering method that utilizes a class vector is used to calculate the novelty score, which is given by the PDF value or the distance from the centroid in each method. In step 3), novel words are detected by a user-defined threshold on the novelty score in each method.
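The K-means-style variant can be sketched in a few lines of Python (a simplified illustration, assuming that word_vecs and class_vecs come from the neural embedding model described above): the novelty score of a word is its cosine distance to the nearest class vector, and the words with the highest scores are removed.

import numpy as np

def cosine_distance_matrix(A, B):
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A_n @ B_n.T                        # pairwise 1 - cos(a, b)

def novelty_scores(word_vecs, class_vecs):
    # Distance from each word to its closest class vector (fixed centroid).
    return cosine_distance_matrix(word_vecs, class_vecs).min(axis=1)

def detect_novel_words(words, word_vecs, class_vecs, removal_ratio=0.05):
    scores = novelty_scores(word_vecs, class_vecs)
    n_remove = int(len(words) * removal_ratio)
    novel_idx = np.argsort(scores)[-n_remove:]      # largest novelty scores
    return {words[i] for i in novel_idx}

In the GMM variant, the mixture means would instead be fixed to the class vectors, the weights and covariances updated by EM as above, and the (log) mixture density used as the novelty score.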
3.2.2 Probabilistic word clustering based document representation
Consideration of the membership strength
As mentioned above, previous word-clustering-based approaches have a limitation in reflecting the membership strength of words with respect to each cluster. That is, previous approaches represent a document based on a hard clustering method and do not differentiate in the frequency count as to whether a word is located close to the centroid of a cluster or far from it.
To illustrate the limitation of not considering the membership strength of a word with respect to a cluster, words in customer-voice data for a mobile device collected from LG Electronics are clustered using the spherical K-means method [148], in a manner identical to that of a previous study. In the spherical K-means method, data located near each centroid are considered to exhibit strong membership strength with that centroid. Table 3.1 and Table 3.2 show the lists of words in the 7th cluster among 70 clusters, sorted by cosine dissimilarity from the centroid. Cosine dissimilarity, 1 - cos(x, y), is the distance measure used in the spherical K-means method. A close look at the 7th cluster indicates that it may contain words related to water damage or breakage. Words located near the centroid, such as rust, humidity, and LCD, are meaningful keywords that clearly represent the property of the cluster, while words located far from the centroid, such as think, daily, and terminal, appear to be relatively general words that are not strongly related to the water damage or breakage topic. A domain expert of LG Electronics who was involved in the study shared the same opinion as these observations.
Table 3.1: Word list located closest to the centroid

Word     | Dissimilarity | Word           | Dissimilarity | Word       | Dissimilarity
rust     | 0.3053        | careful        | 0.3417        | broken     | 0.3718
mistake  | 0.3252        | carelessness   | 0.3579        | part       | 0.3809
humidity | 0.3317        | LCD            | 0.3662        | dent       | 0.3846
throw    | 0.3405        | tempered glass | 0.3697        | appearance | 0.3895
Table 3.2: Word list located far from the centroid

Word       | Dissimilarity | Word    | Dissimilarity | Word      | Dissimilarity
integrated | 0.9137        | sticker | 0.8922        | do        | 0.8422
just       | 0.9078        | two     | 0.8873        | ambiguous | 0.8314
pay        | 0.9023        | tear    | 0.8713        | grudge    | 0.8076
terminal   | 0.8973        | daily   | 0.8573        | think     | 0.8033
Hence, it is reasonable to differentiate between words in the frequency count. That is, words exhibiting strong membership strength with a cluster should have a higher weight in the frequency count, as these words better represent the property of the cluster. Considering membership strength is expected to increase the impact of meaningful keywords in the document representation. Further, the proposed representation method is expected to be more robust with respect to noisy words, as noisy words receive a lower weight in the frequency count.
Probabilistic document representation
In this study, two soft clustering methods, namely the FCM and GMM clustering methods, are applied to measure the membership strength of each word with respect to the clusters. The soft clustering methods enable the measurement of the membership strength of words by m_ij. In the FCM clustering method, m_ij denotes the degree to which word_i belongs to cluster C_j, and in the GMM clustering method, m_ij denotes the probability that word_i is generated from the distribution of cluster C_j.
Figure 3.5 summarizes the proposed document representation method.
Figure 3.5: Document representation based on probabilistic word clustering
Formally, let the set of documents be D = {d_1, ..., d_N} and the set of words be W = {w_1, ..., w_n}, where n denotes the total number of words in D. c denotes the number of clusters defined by the user, and dist(a, b) denotes the cosine dissimilarity between a and b. Furthermore, cent_j denotes the centroid of cluster_j. The membership strength m_ij is the scalar value representing the membership strength of word_i with respect to the jth cluster, where m_ij ∈ [0, 1]. Additionally, the document vector V_k and its normalized vector \hat{V}_k correspond to the kth document.
Then, the proposed document representation method is calculated as follows:

1) Calculate the h-dimensional vector of each word in W by using the neural embedding model, where h denotes the number of hidden nodes in the model. Each w_i is calculated by optimizing the function

\sum_{i=1}^{n} \log p(w_i \mid w_{i-k}, ..., w_{i+k})

where k denotes the window size of neighboring words. The hidden nodes are then used as the representations of the words w_i.
2) Cluster all words wi and calculate the membership strength mij for all i, j by
using the FCM and the GMM clustering methods.
(1) Apply the FCM clustering method to calculate the membership strength mij
by using the following equation:
m_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{dist(w_i, cent_j)}{dist(w_i, cent_k)} \right)^2}   (3.1)

while minimizing an objective function as follows:

\arg\min_{C} \sum_{i=1}^{n} \sum_{j=1}^{c} m_{ij}^{2} \, dist^{2}(w_i, c_j)
(2) Apply the GMM clustering method to calculate the membership strength
mij by using the following equation:
m_{ij} = \frac{p_j(w_i \mid z_j, \theta_j) \cdot \alpha_j}{\sum_{k=1}^{c} p_k(w_i \mid z_k, \theta_k) \cdot \alpha_k}   (3.2)

where p(w \mid \theta) is defined as a finite mixture model with c components, and each component is a multivariate Gaussian density defined with parameter \theta_j = \{\mu_j, \Sigma_j\} as follows:

p(w \mid \theta) = \sum_{j=1}^{c} \alpha_j \, p_j(w \mid z_j, \theta_j)

p_j(w \mid \theta_j) = \frac{1}{(2\pi)^{h/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(w - \mu_j)^T \Sigma_j^{-1} (w - \mu_j)}

Here, \alpha_j = p(z_j) denotes the mixture weight representing the probability that a randomly selected w is generated by component j, where \sum_{j=1}^{c} \alpha_j = 1. Each parameter is updated by the EM algorithm.
3) Calculate the document vector V_k = [v_{k1}, ..., v_{kj}, ..., v_{kc}] by the following equation:

v_{kj} = \sum_{i} (cf_{ijk} \times m_{ij})   (3.3)

where cf_{ijk} denotes the frequency of w_i included in the jth cluster in d_k.

4) Calculate the normalized document vector \hat{V}_k = [\hat{v}_{k1}, ..., \hat{v}_{kj}, ..., \hat{v}_{kc}] by the following equation:

\hat{v}_{kj} = \frac{v_{kj}}{\sum_{j} v_{kj}} \times \log\frac{N}{df_j}   (3.4)

where j = 1, ..., c, k = 1, ..., N, and df_j denotes the number of documents containing words included in the jth cluster.
In step 1), the word vector wi is calculated by the neural embedding model. As
described previously, the number of dimensions of wi corresponds to the number of
hidden nodes of the neural embedding model that is defined by the user.
The membership strength m_ij for all i, j is calculated in step 2). Two soft clustering methods, namely the fuzzy C-means method and the Gaussian mixture model, are used: Equation (3.1) is used to calculate m_ij with the FCM clustering method, and Equation (3.2) with the GMM clustering method.

In step 3), the document vector prior to normalization is calculated by multiplying the membership strength m_ij and cf_ijk, the frequency of word_i included in the jth cluster of the kth document, based on Equation (3.3).

In step 4), each dimension is first divided by the sum over all dimensions for normalization, based on Equation (3.4). Normalization is applied to make the document representation robust to the length of the document. As mentioned above, customer-voice data are represented as extremely unstructured texts of various lengths, and a longer text often contains a large amount of repetition. Without normalization, customer-voice data of different lengths containing similar contents could be represented differently and classified into different categories. Second, log N/df_j is multiplied with each dimension according to Equation (3.4) to obtain the concept frequency-inverse document frequency (CF-IDF) effect used in the previously specified bag-of-concepts approach. CF-IDF is a weighting scheme that readjusts the count of a concept based on its frequency in the entire corpus: if a certain concept occurs in every document in the corpus, it is considered relatively unimportant, and its weight is reduced.
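A compact sketch of steps 2) to 4) is given below, assuming word_vecs is the n x h array of neural-embedding word vectors, vocab the corresponding word list, and docs a list of token lists; the GMM version of the membership strength is used, and the FCM version could be substituted analogously.

import numpy as np
from sklearn.mixture import GaussianMixture

def represent_documents(docs, vocab, word_vecs, n_clusters=150, seed=0):
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(word_vecs)
    m = gmm.predict_proba(word_vecs)                  # m_ij: membership strengths
    word_idx = {w: i for i, w in enumerate(vocab)}

    V = np.zeros((len(docs), n_clusters))
    for k, doc in enumerate(docs):
        for w in doc:                                 # v_kj = sum_i cf_ijk * m_ij
            if w in word_idx:
                V[k] += m[word_idx[w]]

    df = np.count_nonzero(V > 0, axis=0)              # documents touching each cluster
    V = V / (V.sum(axis=1, keepdims=True) + 1e-12)    # length normalization
    return V * np.log(len(docs) / np.maximum(df, 1))  # CF-IDF re-weighting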
3.2.3 Word-clustering based document representation with VAE
and its probabilistic version
Formally, let the set of documents be D = {d_1, ..., d_N} and the set of words be W = {w_1, ..., w_n}, where N and n respectively denote the number of documents and the total number of words in D. c and dist(a, b) respectively denote the number of clusters defined by the user and the cosine distance between a and b. The word vector w_i is an h-dimensional vector that represents each word, where h denotes the number of hidden nodes defined by the user in the VAE model. The membership strength m_ij is the scalar value representing the membership strength of word_i with cluster_j, where m_ij ∈ [0, 1]. The document vector V_k is the c-dimensional vector of a document, and formally corresponds to V_k = [v_{k1}, ..., v_{kj}, ..., v_{kc}], where k = 1, ..., N.
First, calculate the vector representation of each word w_i using the VAE architecture; w_i is given by R x_i in the energy function E(x_i; z) = -z^T R x_i - b_{x_i}, as explained in Section 2. Second, cluster all words w_i and calculate the membership strength m_ij for all i, j. In the hard clustering version, the membership strength is assigned a binary value of 1 or 0. In the probabilistic clustering version, it is calculated using the following equation:

m_{ij} = \frac{p_j(w_i \mid z_j, \theta_j) \cdot \alpha_j}{\sum_{k=1}^{c} p_k(w_i \mid z_k, \theta_k) \cdot \alpha_k}   (3.5)

where i = 1, ..., n, j = 1, ..., c, and p(w \mid \theta) is defined as a finite mixture model with c components, each of which is a multivariate Gaussian density defined with parameter \theta_j = \{\mu_j, \Sigma_j\}:

p_j(w_i \mid \theta_j) = \frac{1}{(2\pi)^{h/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(w - \mu_j)^T \Sigma_j^{-1} (w - \mu_j)}   (3.6)

The EM algorithm updates each parameter. Finally, calculate the document vector V_k = [v_{k1}, ..., v_{kj}, ..., v_{kc}] by the following equation:

v_{kj} = \sum_{i} (cf_{ijk} \times m_{ij})   (3.7)

where cf_{ijk} is the frequency of w_i included in cluster_j of document d_k.
3.2.4 Matrix representation of word-clustering based document rep-
resentation
As aforementioned, previous document representation studies based on word clustering utilized word representations from a single architecture, such as co-occurrence or neural embedding. Thus, discriminative power, one of the criteria of document representation performance, varies with the kind or attribute of the document [62]. This study therefore proposes a matrix representation approach that concatenates the various word-clustering based document representation methods, as shown in Figure 5.4.

In previous studies, each document is represented by a vector representation derived from an individual architecture. By contrast, our proposed method represents each document in matrix form. Irrespective of the architecture or word representation method, the matrix form can be constructed by solely specifying the number of clusters and easily combining the document representations derived from various word representation or clustering algorithms. For instance, we
easily concatenate an additional document representation derived from another algorithm, such as fuzzy C-means clustering, by calculating m_ij with the following equation:

m_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{dist(w_i, centroid_j)}{dist(w_i, centroid_k)} \right)^2}   (3.8)

and minimizing the objective function \arg\min_C \sum_{i=1}^{n} \sum_{j=1}^{c} m_{ij}^2 \, dist^2(w_i, centroid_j), where centroid_j denotes the centroid of each cluster_j as in Section 3.1.
As a footnote, we devised our matrix representation approach based on the multiple feature extraction approach and on ensemble learning. The multiple feature extraction approach has been used in many document analysis studies [146], and ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. Furthermore, the efficiency of ensembles of classifiers has been proven both theoretically and practically in many studies [82]. In this sense, we expect that the concatenation of document representations will show better discriminative power than an individual representation, similar to the multiple feature extraction approach or an ensemble of classifiers.
In the experiments, we construct the matrix representation by combining seven different word-clustering based representations: the co-occurrence based word-clustering approach, the neural embedding based word-clustering approach and its probabilistic versions with FCM and GMM, and the VAE based word-clustering approach and its probabilistic versions with FCM and GMM.
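The construction itself is straightforward once the individual representations are available; the sketch below (illustrative, assuming the seven representation functions are implemented as in the previous subsections and share the same number of clusters c) simply stacks the c-dimensional vectors column-wise into a c x 7 matrix per document.

import numpy as np

def to_matrix_representation(doc, representation_fns):
    # representation_fns: callables mapping a document to a c-dimensional
    # vector (co-occurrence, neural embedding, FCM/GMM probabilistic,
    # VAE based, ...); their outputs become the columns of the matrix.
    columns = [fn(doc) for fn in representation_fns]
    return np.stack(columns, axis=1)     # shape: (c, number of representations)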
3.2.5 Applying convolution filter to matrix representation
As mentioned previously, the matrix representation is appropriate for word-clustering based document representation and has an effect similar to that of an ensemble approach. In spite of these advantages, we need to deal with the relatively large size of the matrix representation compared to an individual vector representation, which increases the complexity and the possibility of over-fitting in the subsequent classification model. Thus, we apply a convolution filter to the matrix representation to address these limitations.
Rearrange the elements of each document representation vector
To apply a convolution filter to the matrix representation, the elements of each representation vector are rearranged by semantic meaning while preserving the semantic distances among the word clusters.

Figure 3.6 illustrates why the rearrangement process is required. In a word-clustering based document representation, each element carries a semantic meaning; in the customer-voice case, for instance, greyscale elements may contain design-related words and blue-scale elements battery-related words. Without rearrangement, there is no correlation between neighboring elements, as in the left dog image in Figure 3.6, and we cannot extract appropriate local features by applying a convolution filter. Thus, we apply the rearrangement process while preserving the semantic meaning of each document representation to obtain an appropriate matrix representation, like the right dog image.
Figure 3.6: The reason for rearranging each representation

In the rearrangement process, we first apply the t-SNE algorithm to one specific representation vector, used as a benchmark, to determine an order of elements that preserves the semantic distances of the word clusters. We project all word clusters to a 1-dimensional space with the t-SNE algorithm, since it preserves the distances between data points when embedding a high-dimensional space into a low-dimensional one. Then, we use the semantic order determined by the t-SNE algorithm as the order of elements of the representation vector, as shown in Figure 3.7.

In detail, an individual document representation is constructed in one dimension, with each element representing one word cluster. By projecting all word clusters into one dimension, we can easily match the semantic order of the benchmark word clusters with the elements of the other document representations.
Second, we put the elements of the representation vectors into one-to-one correspondence based on the semantic meaning of the benchmark representation vector. To this end, we linearly transform the word-clustering space of each representation vector into that of the benchmark representation while minimizing the squared sum of errors of the distances between data points, namely words. Then, we match the word clusters that are located closest to each other, as shown in Figure 3.8. In this correspondence, we find the closest clusters heuristically in decreasing order of the silhouette index of the benchmark representation, since clusters with a low silhouette index contribute little to the representation.

Once the aforementioned process is done, all elements of each individual document representation are arranged in the same semantic order as the benchmark representation. For instance, if the element order of the benchmark representation is configured as: design-related word cluster, performance-related word cluster, ..., battery-related word cluster, then the other representations are also rearranged in the same semantic order by our proposed method.
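The two rearrangement steps can be sketched as follows (a simplified illustration: the explicit linear transformation of the clustering spaces is omitted and the correspondence is computed directly from cosine distances between centroids; centroid arrays of shape [c, h] are assumed).

import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_distances

def benchmark_order(benchmark_centroids, seed=0):
    # 1-D t-SNE projection of the benchmark cluster centroids fixes the
    # semantic order of the elements (perplexity must stay below c).
    one_d = TSNE(n_components=1, random_state=seed).fit_transform(benchmark_centroids)
    return np.argsort(one_d.ravel())

def match_clusters(benchmark_centroids, other_centroids):
    # Greedy one-to-one correspondence by nearest centroid; in the proposed
    # method the benchmark clusters are visited in decreasing silhouette order.
    dist = cosine_distances(benchmark_centroids, other_centroids)
    mapping, used = {}, set()
    for b in range(len(benchmark_centroids)):
        for o in np.argsort(dist[b]):
            if int(o) not in used:
                mapping[b] = int(o)
                used.add(int(o))
                break
    return mapping        # benchmark cluster index -> matched cluster index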
Applying convolution filter
After rearranging the elements of each representation vector (i.e., the word clusters) according to the semantic order, we apply convolution filters to the matrix representation. We use two levels of convolution filters of size 3x1 (within each representation vector) and 2x2, respectively (Figure 5.4). We use relatively small 2-layered convolution filters instead of a large filter, with reference to the experimental results of the VGG network [121] and Inception-v2 [129]. Finally, we add a fully connected neural network layer as the final classification model in the following experiments section.
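A minimal Keras sketch of this classifier is shown below (the number of filters, layer widths, and optimizer are illustrative assumptions; c, the seven stacked representations, and the 12 customer-voice classes follow the experiments).

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_matrix_cnn(c=150, n_reps=7, n_classes=12):
    x_in = layers.Input(shape=(c, n_reps, 1))
    h = layers.Conv2D(32, kernel_size=(3, 1), activation="relu", padding="same")(x_in)
    h = layers.Conv2D(32, kernel_size=(2, 2), activation="relu", padding="same")(h)
    h = layers.Flatten()(h)
    h = layers.Dense(128, activation="relu")(h)
    out = layers.Dense(n_classes, activation="softmax")(h)
    model = Model(x_in, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model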
Figure 3.7: Preserve semantic distance
Figure 3.8: One-to-one correspondence
Figure 3.9: Rearrange the elements
3.3 Experiments
3.3.1 Data description
In order to verify the discriminative power of our proposed method, we collected customer-voice data, a real business text dataset, as well as the Reuters news and 20 Newsgroups datasets, two public text datasets. The customer-voice data were collected from the Mobile Communication (MC) department of LG Electronics between April 23, 2014 and March 23, 2017 (Table 3.3). The data were manually labeled into 12 classes by domain experts at LG Electronics. In order to avoid a class imbalance problem, 900 customer-voice documents were collected for each class.
Table 3.3: Customer-voice dataset

Classes                  | Number of data | Avg. words per data | Classes              | Number of data | Avg. words per data
OS upgrade               | 900            | 107.62              | Network connection   | 900            | 111.37
Multimedia               | 900            | 121.64              | Call & message       | 900            | 108.49
Hard key & input error   | 900            | 119.57              | Heating & processing | 900            | 110.27
Water-proof & dust-proof | 900            | 108.26              | Battery & power      | 900            | 81.24
Accessory                | 793            | 84.31               | Appearance & display | 900            | 90.34
Security & backup        | 900            | 95.73               | User Interface       | 900            | 93.66
3.3.2 Experiments setup
The first experiment is performed to verify the effectiveness of de-noising. A de-noised customer-voice data representation, composed of only the words that are not determined to be novelties, is constructed to compare the representational effectiveness and classification performance of the customer-voice data representations obtained with the proposed method and the previous method. Either 1%, 2%, 3%, 4%, 5%, 6%, 8%, 10%, 12%, 15%, or 20% of the novel words detected by the proposed method and by the previous method is removed as a means of de-noising prior to constructing the customer-voice data representation. The representational effectiveness and classification performance of each customer-voice data representation after de-noising are then compared.
The customer-voice data representation methods include Term Frequency-Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), the topic vector, the neural embedding based word clustering approach, and the probabilistic word clustering based approach. TF-IDF is the most common document representation method, in which a document is fundamentally represented by the counts of word occurrences within the document [5, 81]. LSA is a technique that applies singular value decomposition (SVD) to the term-frequency matrix to reduce the number of rows while preserving the similarity structure among the columns [69]. The topic vector is an inferred topic proportion that is typically used as a topic feature to represent the document [16]. Additionally, in the neural embedding based word clustering approach [61, 127] and the probabilistic word clustering based approach [72], semantically similar terms are grouped into a common cluster by clustering the words generated from neural embedding, and document vectors are subsequently represented by the frequencies of these clusters. The only difference between these methods is that the probabilistic word clustering based approach additionally considers the membership strength of words by utilizing a soft clustering method. In this experiment, the number of clusters is fixed at 150 for the neural embedding based and probabilistic word clustering based approaches to minimize the impact of the number of clusters on the experiments.
The second experiment is performed to measure the classification performance of our proposed method. A classification result is considered correct if the document is predicted as its actual class by the prediction model. A majority voting ensemble model of neural networks, which has been used in several studies [94, 13, 92], is constructed for the classification task.
In the experiments, the classification performance based on the proposed document representation method is compared to that obtained from the bag-of-words model, the co-occurrence based word-clustering approach, doc2vec, the neural embedding based word-clustering approach and its probabilistic version, and the VAE based word-clustering approach and its probabilistic version.
Additionally, we compared the proposed method to ordered document representation methods such as a convolutional neural network (CNN) based model [29, 63] and a recurrent neural network (RNN) based model [76] to validate its performance more thoroughly. These two methods were configured with the same parameters as the experiments in their original studies.

Moreover, in order to isolate the effect of applying the convolution filter, we experimented with other variations based on matrix factorization: singular value decomposition (SVD) [44] and non-negative matrix factorization (NMF) [23, 32].
We implemented our proposed method and the other benchmark methods with Python and TensorFlow. In the pre-processing stage, we removed stop words with the NLTK library and stemmed words with the Snowball stemmer. The proposed method, the doc2vec method, and the neural-embedding-based word-clustering approach were designed to share the same window size of eight and the same number of hidden nodes for training the word vectors (300) to minimize the influence of hyperparameters on the experiment. In order to observe the impact on the overall experiment, several values of the number of clusters were tested, from 20 to 200 in increments of 10. The CNN and RNN based approaches were implemented following the aforementioned papers [29, 63, 76].
Lastly, we measure classification accuracy by counting the number of correct predictions, which are located on the diagonal of the confusion matrix. A confusion matrix is a specific table layout that allows visualization of the performance of an algorithm: each row of the matrix represents the instances of a predicted class, while each column represents the instances of an actual class.
3.3.3 Experiments results
De-noising documents
Table 3.4 shows the words with the lowest novelty scores as determined by the proposed method and the previous method. The novelty score of the GMM method is calculated as the negative logarithm of the PDF value, and that of the KMC method as the distance from the closest centroid. In the proposed method, the words with the lowest novelty scores are considerably discriminative words for representing each class, such as 'LCD', 'voice', 'security', and 'WiFi'. In the previous method, the words with the lowest novelty scores are general words that do not discriminate between classes, such as 'phone', 'again', and 'after'. In particular, in the previous KMC method, extremely general words such as 'my', 'of', and 'it' are extracted.
Table 3.4: Words with lowest novelty score

Novelty detection method         | Examples of words (Novelty score)
GMM with class vector (Proposed) | LCD(-143.32), breakage(-142.87), Marshmellow(-141.85), break(-141.69), health(-137.66), voice(-133.93), GPS(-133.23), battery(-132.46), volume(-130.04), QWERTY(-127.24)
KMC with class vector (Proposed) | touch(0.1728), restore(0.1947), security(0.2676), Lollipop(0.3927), WiFi(0.3022), ringtone(0.3169), LCD(0.3173), memo(0.3414), message(0.3503), backup(0.3850)
Previous GMM                     | is(-176.90), do(-175.71), again(-175.75), various(-174.43), season(-162.19), after(-160.06), opposite(-154.20), Samsung(-152.29), phone(-149.64), important(-147.67)
Previous KMC                     | and(0.1584), my(0.1606), of(0.1697), it(0.1754), your(0.1808), was(0.1811), have(0.1989), this(0.1990), is(0.2000), no(0.2006)
This implies that the novelty score of the proposed method is a more appropriate measure than that of the previous method for determining whether each word effectively represents each class.
Table 3.5: Words with highest novelty score

Novelty detection method         | Examples of words (Novelty score)
GMM with class vector (Proposed) | aguardo(146.61), de(146.61), suddenly(146.59), regards(146.59), may(146.57), holiday(146.45), sus(146.38), poseedor(146.32), why(146.30), method(146.28)
KMC with class vector (Proposed) | both(0.7652), volkswagen(0.7652), normal(0.7652), blah(0.7651), if(0.7651), time(0.7651), age(0.7651), last(0.7651), uu(0.7647), SIRS(0.7647)
Previous GMM                     | electronic(29.39), statement(29.33), eBay(29.16), YouTube(29.07), native(28.95), connection(28.86), showing(28.65), progress(28.52), VOLTE(28.33), photography(28.33)
Previous KMC                     | premium(0.4109), repair(0.4101), than(0.4101), Media(0.4092), provide(0.4090), open(0.4074), Windows(0.4071), read(0.4070), music(0.4064), GUI(0.4043)
Table 3.5 shows the words with the highest novelty scores determined by each method. Typos including 'de', 'sus', and 'uu', and meaningless words including 'blah', 'last', and 'may', are effectively detected by the proposed method but not by the previous method. From a qualitative viewpoint, these results indicate that the proposed method performs better in the detection of novel words. Additionally, it is intuitively expected that the representational effectiveness and classification performance will improve when such words are detected and removed by the proposed method.
Table 3.6 and Figure 3.15 show the classification performance on the customer-voice data when applying the proposed method and the previous method. Similar to the results for representational effectiveness, the classification performance of the proposed method improves steadily as the removal ratio of novel words increases and outperforms that of the previous method for all representation methods. The better performance of the proposed method is attributed to the fact that it detects novel words more effectively than the previous method by utilizing a class vector.
Document representation
We present the classification performance results for the customer-voice data (Table 3.7) with respect to varying dimensions.

First, the matrix representation with the convolution filter outperforms all other document representation methods in all dimensions. As expected, by concatenating the individual representation vectors, it shows better discriminative power than not only the other word-clustering based representations but also ordered methods such as the RNN or CNN based approaches.
Table 3.6: Accuracy of classification performance (*: Proposed method)

Representation                              | Novelty detection method | No de-noising | 5% de-noising | 10% de-noising | 20% de-noising
TF-IDF                                      | GMM with class vector*   | 0.6403 | 0.6471 | 0.6510 | 0.6523
                                            | KMC with class vector*   | 0.6403 | 0.6506 | 0.6522 | 0.6545
                                            | Previous GMM             | 0.6403 | 0.6311 | 0.6358 | 0.6364
                                            | Previous KMC             | 0.6403 | 0.6329 | 0.6391 | 0.6346
Neural embedding based clustering [61, 127] | GMM with class vector*   | 0.6723 | 0.6902 | 0.6982 | 0.7027
                                            | KMC with class vector*   | 0.6723 | 0.6874 | 0.6918 | 0.7053
                                            | Previous GMM             | 0.6723 | 0.6668 | 0.6498 | 0.6555
                                            | Previous KMC             | 0.6723 | 0.6739 | 0.6700 | 0.6690
Probabilistic clustering based [72]         | GMM with class vector*   | 0.8638 | 0.8808 | 0.8876 | 0.8907
                                            | KMC with class vector*   | 0.8638 | 0.8867 | 0.8856 | 0.8994
                                            | Previous GMM             | 0.8638 | 0.8642 | 0.8661 | 0.8657
                                            | Previous KMC             | 0.8638 | 0.8695 | 0.8605 | 0.8738
Topic vector                                | GMM with class vector*   | 0.6401 | 0.6626 | 0.6651 | 0.6698
                                            | KMC with class vector*   | 0.6401 | 0.6616 | 0.6719 | 0.6758
                                            | Previous GMM             | 0.6401 | 0.3890 | 0.6270 | 0.6419
                                            | Previous KMC             | 0.6401 | 0.3802 | 0.6487 | 0.6497
LSA                                         | GMM with class vector*   | 0.6443 | 0.6532 | 0.6572 | 0.6627
                                            | KMC with class vector*   | 0.6443 | 0.6552 | 0.6631 | 0.6728
                                            | Previous GMM             | 0.6443 | 0.6469 | 0.6514 | 0.6507
                                            | Previous KMC             | 0.6443 | 0.6502 | 0.6467 | 0.6439
Figure 3.15: Accuracy of classification performance (panels: Figure 3.10 TF-IDF; Figure 3.11 Neural embedding based word clustering [61, 127]; Figure 3.12 Probabilistic word clustering based approach [72]; Figure 3.13 Topic vector; Figure 3.14 LSA)
Moreover, our proposed method shows quite stable performance with respect to varying dimensions, while the previous methods show extremely low classification performance at dimensions of 20 or 30. This means that the matrix representation approach is an appropriate representation method for the document classification task, which accords with our expectation.
In the comparison of convolution filters, the 2x2 filter shows rather higher performance than the 3x1 filter. This means that local features across individual representations have a meaningful effect on the classification of a document. We can infer that differences between the word clustering results lead to differences in the semantic order of elements, which act as meaningful features in the classification task.
Regarding the effect of matrix factorization, we cannot find any critical difference after applying it; rather, the classification performance decreases slightly compared to the naive matrix representation. This indicates that matrix factorization has no benefit other than dimension reduction in the matrix representation of documents. Additionally, the VAE based representations show considerably higher results than the other representation vectors. However, they do not outperform the neural embedding based representation, since the VAE based approach cannot capture the contextual information of a word while deriving the word representation.
Furthermore, this study provides an intuitive interpretation of the generated vector; this strength is inherited by the approach proposed in the present study. Table 3.8 shows that the proposed method successfully offers a clear interpretation of the generated vector. The words in the clusters listed in Table 3.8 indicate that each cluster contains words that are closely related to each class.
Table 3.7: Accuracy of classification performance of customer-voice data

Method                                           | 40 clusters | 80 clusters | 120 clusters | 160 clusters | 200 clusters
Matrix representation (2x2 filter)               | 80.64% | 85.90% | 88.22% | 87.45% | 88.73%
Matrix representation (3x1 filter)               | 79.78% | 84.78% | 86.74% | 86.37% | 87.90%
Matrix representation (NMF)                      | 78.42% | 83.79% | 85.28% | 86.42% | 86.59%
Matrix representation (SVD)                      | 78.43% | 83.46% | 85.86% | 86.22% | 86.23%
Matrix representation (Naive)                    | 79.51% | 84.24% | 86.99% | 87.24% | 86.45%
VAE based probabilistic clustering               | 71.51% | 78.51% | 83.23% | 82.10% | 81.45%
VAE based word clustering                        | 72.13% | 77.78% | 76.75% | 78.73% | 78.94%
Neural embedding based probabilistic clustering  | 77.51% | 80.24% | 84.05% | 83.61% | 84.45%
Neural embedding based word clustering [61, 127] | 71.13% | 76.78% | 78.05% | 78.84% | 78.94%
Co-occurrence based word clustering [86]         | 65.40% | 66.42% | 65.81% | 64.46% | 65.75%
CNN based [29, 63]                               | 83.19%
RNN based [76]                                   | 81.64%
VAE based document representation [84]           | 70.67%
Doc2Vec [70]                                     | 72.22%
Bag-of-words                                     | 64.67%
This implies that the customer-voice data in each class are represented by words in frequent clusters, and a name or topic can easily be assigned to each cluster by viewing those keywords.
Figure 3.16: Accuracy of classification performance of customer-voice data
Table 3.8: Example of representation interpretation

Customer-voice example | Class | Most frequent cluster | Words in most frequent cluster (cosine dissimilarity with centroid)
"There are other problems, most of which involve display brightness..." | Display | 3rd / 70 | Screen (0.1649), Brightness (0.1873), Display (0.1947)
"When I took pictures, they were saved in the sd card until now..." | Multimedia | 24th / 70 | Camera (0.2073), Photo (0.2491), Shutter (0.2556)
"After mounting camplus, I press the shutter button. The backup battery is activated..." | Battery & Power | 12th / 70 | Battery (0.2134), Charge (0.2619), Charging (0.2843)
"I got various accessories along with the G5. However, the VR device leaves much to be desired..." | Accessory | 52nd / 70 | Accessory (0.2267), Toneplus (0.2341), VR (0.2682)
Chapter 4
User segmentation
4.1 Background
The term “user segmentation” refers to classifying users into groups depending on
their specific needs, characteristics, or behaviors to identify those who might re-
quire separate products or services [65]. Users can be segmented in different ways.
One way is to characterize the target customers by homogeneous preferences, that
is, grouping together customers that have roughly the same preferences [66]. User
segmentation has been identified as a key element of product development and mar-
keting. For instance, with user segmentation, product/service developers can develop
differentiated and personalized products/services for each segment, and marketing
personnel can create segmented advertisements and marketing communications for
each segment.
Applying user segmentation strategies for information gathering is highly beneficial, particularly in the smartphone industry. First, smartphones have the capability to collect and store various types of information, including the user's location, communications, social networks, and lifestyle, which are effective sources for user segmentation [26]. Second, hundreds of applications are often installed on a user's smartphone, and a log of their application usage is a powerful resource for user segmentation because it contains meaningful information regarding the user's preferences, behaviors, and interests.
In the smartphone industry, user segmentation is typically performed based on the user's preferences, interests, or willingness to use. Furthermore, the applications used by each user are the most meaningful and interesting source for identifying a user's preferences and interests [51]. Therefore, considering which apps a user uses and in what patterns they are used is essential for smartphone user segmentation. In this study, we propose novel ways of segmenting smartphone users based on their app usage logs collected from LG smartphones.
This study proposes a variant of the seq2seq architecture to represent each app usage sequence; it processes the whole sequence rather than limited windows, and it represents the sequence itself rather than a corresponding output sequence. We then calculate the vector representation of each user based on the representations of their app usage sequences and derive the segmentation results by clustering the user representations.
Despite the meaningful results of this first approach, it cannot provide an intuitive interpretation of the user segmentation because users are represented in a continuous vector space generated from the seq2seq architecture. Therefore, it falls short for real business applications that need to determine which app is most critical for user segmentation.

Here, we additionally propose two types of approaches for user segmentation that are able to provide an intuitive interpretation based on the observations in the study: (1) app clustering-based user segmentation and (2) network representation-based segmentation. First, each app is embedded in a vector space by calculating each app's vector representation using the neural embedding architecture, and characteristically similar apps, which are located close to each other in the vector space, are grouped into a cluster. Each user is then represented by the frequencies of these clusters.
4.2 Methodology
4.2.1 Variant of the seq2seq based approach
Figure 4.1: Summary of our proposed method
By thoroughly reviewing the app usage sequences, we could determine that the
usage of each app is closely related to the usage of other neighboring apps. For
instance, similar categories of gallery apps are usually used next to the camera app,
and similar categories of voice call apps are usually used next to call log apps. In
addition, many people have their own habits of running through SNS apps, such
as Facebook, Twitter, and Instagram. In other words, sequential and contextual
information are meaningful in the app usage sequence of each user.
The contextual information of an app usage sequence is meaningful, just as it is for words and documents. Accordingly, the neural embedding architecture would be the first option for representing app usage sequences because it is designed to represent words and documents based on their contextual information. However, the neural embedding architecture considers only words within a window, and not the whole sequence. That is, the neural embedding model is limited in its ability to represent the entire app usage sequence.
Thus, the existing seq2seq architecture would be the second option; it was originally proposed to generate sequences of words by predicting the next word while considering the entire sequence, and it performs very well in machine translation. As mentioned earlier, the sequential and contextual information of app usage sequences is also meaningful, so app usage sequences can be processed with these kinds of architectures. Moreover, the seq2seq architecture contains a context node C. This node is suitable for representing sequences because it is designed to summarize all the encoded information. We therefore utilize the context vector as the representation of the encoded app sequence, which further supports the applicability of these architectures to the representation of app usage sequences. However, the previously developed architectures have different inputs and outputs because they were originally designed to generate corresponding output sequences rather than to reconstruct the input sequence.

Thus, instead of using the conventional seq2seq architecture, we propose a variant that receives an app usage sequence as the input of the encoder and generates the same app usage sequence in the decoder (Figure 4.2). By training the architecture this way, we take advantage of both
the neural embedding architecture, which calculates vector representation by be-
ing trained to predict context words, and the seq2seq architecture, which considers
the entire sequence in the training step. We expect to calculate a more appropriate
context vector C to represent each usage sequence with our proposed method, which combines the advantages of both architectures. A summary of our proposed method is illustrated in Figure 4.1.
Each app usage sequence A = (a1, a2, . . . , aT ) specifically defines the series of
apps used from the time the smartphone screen is turned on to the time it is turned
off. Each user normally has several app sequences per day.
Figure 4.2: Variant of the seq2seq architecture (our proposed architecture)
The hidden state h_t of the encoder at each time step t is updated by the following equation:

h^{(t)} = f(h^{(t-1)}, x_t)
After reading the end of the sequence, the hidden state of the encoder becomes the context vector C of the whole input sequence. The decoder of the proposed model is trained to generate the output sequence by predicting the next app used given its hidden state s_t. The hidden state of the decoder at time t is computed as follows:

s^{(t)} = f(s^{(t-1)}, y_{t-1}, c)

The conditional distribution of the next app used is:

p(y_t \mid y_{t-1}, y_{t-2}, ..., y_1, c) = g(s^{(t)}, y_{t-1}, c)
where f is a sigmoid function and g is a softmax function. The two components of the proposed architecture are jointly trained to maximize the conditional log-likelihood:

\max_{\theta} \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}(y_n \mid x_n)
where θ is the set of model parameters.
As regards the details of our neural network architecture, the network contains three hidden layers for each sequence. The length of the encoder/decoder sequence is set to 15, considering the maximum length of an app usage sequence (Figure 4.2). Moreover, sequences shorter than the maximum length are padded with a constant value, as is done in most real implementations, to handle the variable size of app usage sequences [132, 60].
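A simplified Keras sketch of this variant is given below: an encoder LSTM summarizes the padded app-ID sequence into the context vector C, and a decoder is trained to reproduce the same sequence. (The decoder here is driven only by C; the full architecture additionally feeds back the previously used app, and the vocabulary size and layer widths are illustrative assumptions.)

import tensorflow as tf
from tensorflow.keras import layers, Model

max_len, n_apps, ctx_dim = 15, 5000, 128          # 15 = maximum sequence length

seq_in = layers.Input(shape=(max_len,))
emb = layers.Embedding(n_apps, 64, mask_zero=True)(seq_in)   # 0 is the padding value
context = layers.LSTM(ctx_dim)(emb)                          # context vector C
h = layers.RepeatVector(max_len)(context)
h = layers.LSTM(ctx_dim, return_sequences=True)(h)
out = layers.TimeDistributed(layers.Dense(n_apps, activation="softmax"))(h)

autoencoder = Model(seq_in, out)                  # trained to reconstruct its input
autoencoder.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

encoder = Model(seq_in, context)                  # yields C for each usage sequence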
After training the architecture on the set of app usage sequences, we utilize the context vector C as the vector representation of each app usage sequence. This vector represents a usage sequence, not a user. Thus, an additional step is needed for the final user segmentation result.

First, we segment the app usage sequences using the GMM method, which shows the highest performance among the clustering methods considered, including K-means clustering and fuzzy C-means clustering. We assign each sequence to the cluster with the highest conditional probability p_j(x | θ_j) and fix the number of clusters to ten, which is the same number of clusters identified by the domain experts. Each user is then assigned to the segment in which most of his/her usage sequences are found (Figure 4.3).
Figure 4.3: Determination of user segmentation
4.2.2 App clustering and relative similarity-based segmentation
In this section, we 1) describe an interpretable approach of user representation based
on app clustering, 2) present two novel techniques to normalize and adjust the vector
value of user representation, and 3) propose a novel user segmentation method to
address the inherent limitations arising from absolute similarity by determining the
relative similarities between users.
App clustering-based user representation
In this approach, each app in the usage sequence of each user is represented using the neural embedding architecture. Based on the vector representations of the respective apps derived from the architecture, characteristically similar apps are gathered into neighborhoods, and neighboring apps are then gathered into common clusters. Each user is assigned a vector representing the counts of their total app usage within each cluster, and the users are segmented based on these representations, as shown schematically in Figure 4.4. This approach can also serve as an effective dimensionality reduction method for user representation, addressing the sparsity problem of the N-gram model.
Figure 4.4: Summary of app clustering-based user representation.
Additional techniques for effective user representation
In this sub-section, we propose two novel techniques to make our proposed user
representation method more effective through assessments of relative importance
between app clusters and the significance of each app within a cluster.
Our initial goal is to make our proposed method robust to the effects of app clustering. Under most clustering algorithms, apps are assigned to clusters with obvious differences between apps located close to and distant from the centroid of each cluster, which represents the characteristics of the cluster. We therefore seek to differentiate the calculation of app usage frequency by considering the membership strength of each app in its cluster. To do so, we utilize a probabilistic clustering method based on a Gaussian mixture model (GMM) to assess the membership strength of each app when calculating the app usage frequency. This increases the effect of apps located near the cluster centroids while reducing the effect of apps located far from the centroids.
We then normalize the user representation based on the relative importance
of various app clusters. In other words, we seek to discount app clusters that are
frequently used across most users as ineffective/insignificant clusters for representing
and segmenting users. To this end, we apply our normalizing method to emphasize
significant app clusters while reducing the impact of commonly used clusters.
Formally, this is done as follows. Letting c_j denote the centroid of each app cluster and m_ij the membership strength of app a_i in the jth cluster, we first derive the vector value of each app a_i from the neural embedding architecture, then cluster all a_i, and finally calculate the membership strength m_ij for all i, j using the GMM clustering method.

From this, we calculate the kth user vector U_k = [..., u_kj, ...] using the equation

u_{kj} = \sum_{i} (f_{ijk} \times m_{ij})

where f_ijk is the frequency of a_i within the jth cluster for user u_k and c is the number of clusters defined by the user. In this step, we consider the membership strength derived from the probabilistic clustering method to address the user similarity limitations described above.

Finally, we normalize each user representation vector value using the equation

\hat{u}_{kj} = \frac{u_{kj}}{\sum_{j} u_{kj}} \times \log\frac{N}{uf_j}

where uf_j denotes the number of users using apps included within the jth cluster. Using this equation, it is possible to emphasize significant app clusters while reducing the impact of commonly used clusters.
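The representation can be sketched as follows, assuming app_vecs is the [n_apps x h] array of app embeddings and user_logs maps each user to the list of app indices they used; GMM membership strengths weight each usage, and the final re-weighting applies log(N / uf_j).

import numpy as np
from sklearn.mixture import GaussianMixture

def represent_users(user_logs, app_vecs, n_clusters=50, seed=0):
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(app_vecs)
    m = gmm.predict_proba(app_vecs)                 # m_ij for every app

    U = np.zeros((len(user_logs), n_clusters))
    for k, apps in enumerate(user_logs):
        for a in apps:                              # u_kj = sum_i f_ijk * m_ij
            U[k] += m[a]

    uf = np.count_nonzero(U > 0, axis=0)            # users touching each cluster
    U = U / (U.sum(axis=1, keepdims=True) + 1e-12)  # per-user normalization
    return U * np.log(len(user_logs) / np.maximum(uf, 1))  # discount common clusters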
Relative similarity-based user segmentation approach
By considering the relative importance of app clusters and the significance of apps
within these clusters, our proposed user representation approach provides an effective
method of user representation. However, it is still necessary to consider other aspects
of the user segmentation problem, which normally requires that users be evenly
distributed among various clusters as opposed to mostly belonging to a specific
cluster, as illustrated in Figure 4.5. An examination of the segmentation results
produced on our dataset based on absolute similarity reveals that, in contrast to the
predicted results, several clusters tend to contain many users.
Figure 4.5: Comparison between actual and predicted segmentation results
To address this issue, we propose a novel user segmentation method based on relative similarity between users instead of absolute similarity. Under this approach, pairs of relatively similar users are found based on our app clustering-based user representation, and a network is constructed from these pairs. Users are then segmented using a modularity-based community detection algorithm. This approach is summarized in Figure 4.6.
Figure 4.6: Summary of our proposed method for considering relative similarity.
The segmentation approach is implemented as follows. First, all users in the embedding space learned from the app clustering-based representation are looked up. For each user, the top k users with the greatest pairwise cosine similarity to that user are selected. Once all of the users have been looked up, the overlaps among the returned users are counted.

Using the counting results, a bipartite graph of users is constructed in which the edge weights are determined by the number of edges shared by the corresponding pairs. For example, if user 1 and user 2 each appear among the other's most similar users, the edge weight between user 1 and user 2 becomes two.
Using our experimental set of 540 users (see the Experiments section below) and a parameter k = 10, a nearly fully connected projected network is produced. Because this projected network requires edge pruning, edges with weights of less than 75% of the maximum source-target weight are removed. We assessed the influence of the edge pruning parameter on the overall experimental results by testing several values from 60% to 80% in increments of 5%. Although none of the values within the assessed range produced significantly dominant results, we selected the value of 75%, which exhibited the best results.
Finally, the users represented in the network are segmented using the Lou-
vain method [11], which demonstrated the best performance among all modularity-
maximizing methods, including the smart local moving (SLM) algorithm [138] and
the Infomap algorithm [107]. The Louvain algorithm is a hierarchical agglomerative
method that takes a greedy approach to local optimization using an iterated two-
step procedure. In the first step, it iterates over the nodes in the graph and assigns
each node to a community if the assignment leads to an increase in modularity. In
the second step, it creates super-nodes out of the clusters found in the first step.
The process is iteratively repeated using the base-graph to compute the gains in
modularity.
Finally, the algorithm determines the modularity for 10 clusters, which is the
same number of clusters identified by the domain expert. In the practical application
of our proposed method, the number of clusters varies with the product developer’s
needs or intentions; for evaluation purposes, however, we set the number of clusters
to correspond to the segmentation identified by the domain expert.
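As an illustration of the community detection step on the constructed network, the sketch below uses the `python-louvain` package, one common Louvain implementation, as an assumed stand-in for the implementation used in this thesis.

```python
# pip install python-louvain
import community as community_louvain   # the python-louvain package

def segment_users(G, resolution=1.0):
    """Louvain modularity maximization on the pruned user network.

    Returns a dict mapping node id -> community id. The resolution parameter
    can be adjusted until the desired number of segments (e.g., 10, matching
    the domain experts' segmentation) is obtained.
    """
    return community_louvain.best_partition(G, weight="weight", resolution=resolution)

# Example: partition = segment_users(build_user_network(user_vectors))
#          n_segments = len(set(partition.values()))
```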
4.3 Experiments
4.3.1 Data description
To perform the comparison experiments, we obtained the app usage sequence data
from LG Electronics. Each app usage sequence consisted of a sequence of apps that
the users accessed between the time they turned on the screens of their smartphones
and the time that these were turned off (Figure 4.7).
Figure 4.7: Example of the app usage sequence.
We also obtained the results of the user segmentation performed by domain experts at LG Electronics (Figure 4.8) and used them as the answer set in our experiments. According to LG Electronics, the segmentation results were derived through extensive consultation with 32 domain experts. The demographic information of some of the domain experts is presented in the acknowledgements.
The user segmentation results of the domain experts consisted of 10 segments
presented in Table 4.1. We collected the user segmentation results of 180 people
and 180,000 app usage sequences (1,000 usage sequences were randomly selected per
user) for the experiments. All of the datasets were processed after anonymization
was performed.
4.3.2 Experiments setup
We evaluated the similarity between the segmentation produced by our proposed method and the answer set established by the domain experts to verify the performance of the proposed user segmentation method. As mentioned previously, the answer set consisted of the 10 segments presented in Table 4.1.
Figure 4.8: Example of user segmentation by domain experts
We set the number of clusters for app clustering to 50. Because the proposed method is influenced by this parameter, we tested several values, varying the number of clusters from 10 to 100 in increments of 10, and selected 50, which produced the best results; no significant improvement was observed when the number of clusters exceeded 50.
We also compared the similarities with the answer set for the following benchmark methods: (1) the method proposed by Hamka et al., which was the first to utilize smartphone logs for user segmentation; (2) the neural embedding-based method; (3) the seq2seq-based method; and (4) the N-gram model, which represents sequential data as the frequency of the whole n-gram combination [120, 14]. We also experimented with a few matrix decomposition techniques as additional benchmarks, i.e., singular-value decomposition (SVD) [44] and non-negative matrix factorization
Table 4.1: User segmentation results obtained by domain experts.

Segment | Description | Number of users
Conversationalists | Use smartphones primarily for making calls, sending messages, and chatting, with very low “other app” usage | 14
Utilitarians | Usage is primarily utility driven; they spend the greatest amount of time on apps such as organizers and productivity apps | 16
Social stars | Identified by the greatest engagement on social networking and chat platforms | 15
Photographers | Identified by the greatest engagement on camera-related apps; they usually use several dozen camera apps with different features and post their photos to several types of SNS or their communities | 22
Music lovers | People who discover and listen to music wherever they are; they usually use push and in-app messages that highlight new songs, new playlists, and new artists | 12
News and magazine readers | Identified by the greatest amount of time spent on browsing and reading articles; their data consumption is also very high | 13
Video streamers | Usage is dedicated to watching missed shows and movies when they commute and rest | 13
Gaming buffs | Usage primarily involves playing games on their smartphones | 16
Power users | Identified as spending the most time on their smartphones, regardless of the type of apps; their engagement with shopping apps is greatest | 27
Beginners | They use a very limited number of apps; most users in this segment are senior users | 32
(NMF) [23, 32], to address the sparsity problem of the N-gram model.
In this study, we set the number of clusters to 10, which is the same as the number of segments identified by the domain experts. For the similarity measurements, we utilized (1) the Adjusted Rand Index (ARI), which is defined as the number of pairs that are either in the same group or in different groups in both partitions, divided by the total number of pairs [136, 50]; (2) Normalized Mutual Information (NMI), which is a variation of mutual information [83, 126]; and (3) Homogeneity and Completeness. These metrics do not consider the absolute values of the cluster labels; rather, they measure whether the clustering defines separations of the data similar to those in the answer set of classes.
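For reference, all of these similarity metrics are available in scikit-learn; the label arrays below are made-up placeholders for the expert answer set and a method's output.

```python
from sklearn import metrics

# answer_labels: segment ids from the domain experts (answer set)
# pred_labels:   segment ids produced by a segmentation method
answer_labels = [0, 0, 1, 1, 2, 2]
pred_labels   = [1, 1, 0, 0, 2, 2]

print("ARI         ", metrics.adjusted_rand_score(answer_labels, pred_labels))
print("NMI         ", metrics.normalized_mutual_info_score(answer_labels, pred_labels))
print("Homogeneity ", metrics.homogeneity_score(answer_labels, pred_labels))
print("Completeness", metrics.completeness_score(answer_labels, pred_labels))
# All four scores are invariant to permutations of the cluster labels.
```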
4.3.3 Experiments results
Observation of app clustering and user network construction result
Figure 4.9: Example of app clustering.
Figure 4.9 shows an example of the app clustering results derived from the
neural embedding architecture. Each app was clustered by characteristics in the
case of k = 50. Cluster 6 contained camera-related apps, while cluster 8 contained
social networking apps. Thus, we conclude that each app cluster effectively contained characteristically similar apps, and the user representation based on this clustering can be an effective representation.
Figure 4.10 shows the network construction of users. The user network consisted
of 180 nodes, which is the number of users, and 1800 edges, which represent the top
10 similar users per user. For visualization of this network, we used the Yifan Hu layout, which is a force-directed graph-drawing technique [48]. The nodes that were likely to be in the same community were clearly located together, and similar users were located more closely to one another.
Figure 4.10: User network construction.
Comparison of segmentation results
To validate our proposed method, we compared the similarities between its segmen-
tation results and the results produced by the other assessed approaches to an answer
set established by the domain experts. Table 4.2 lists the results of the similarity analysis.
The proposed app clustering method generally outperformed the other methods
because it successfully captured each app’s semantic characteristic by clustering
apps with contextual similarity. Furthermore, the proposed relative similarity-based
approach outperformed the baseline methods because it tended to evenly cluster
users rather than group closely located users into a specific cluster. It is also a
more straightforward method for capturing usage patterns than the conventional seq2seq-based method, which, as a black-box mechanism, provides no interpretable information about what it learns during training. Finally, there were no significant differences between the relative
and absolute similarity measure results.
The seq2seq-based architecture, which is a state-of-the-art architecture, also re-
turned better similarity results than the other methods, in particular outperform-
ing the other sequence models based on latent representations, such as RNN-AE,
LSTM-AE, and RNN-VAE. This confirms that the seq2seq-based architecture is a
more straightforward method that does not require additional calculation for latent
representations such as AE or VAE, which allowed it to outperform other methods.
The study performed by Hamka et al. was limited in terms of sources of data use
and number of apps used to determine the users’ preferences. Consequently, they
produced less meaningful results, with only the Power and Beginner user groups
effectively segmented.
The N-gram model assessed in this study produced better results than those
obtained by Hamka et al. because it considered app usage data when segmenting
users. However, this model also had a sparsity problem owing to its large matrix size,
which led to a worse similarity result than that produced by the proposed method. In
addition, no significant changes were observed when matrix decomposition methods
such as NMF and SVD were applied.
Additionally, with our proposed user representation method, each user representation is easily understandable because each dimension of the representation
Table 4.2: Comparison of the similarities between the segmentations obtained by each method and the answer set (*: proposed method, (c): utilizing cosine distance, (m): utilizing Mahalanobis distance).

Method | ARI | NMI | Homogeneity | Completeness
Relative similarity-based (c)* | 0.6004 | 0.6996 | 0.7294 | 0.7504
Relative similarity-based (m) | 0.5946 | 0.6841 | 0.7203 | 0.7349
App clustering-based (c)* | 0.5776 | 0.6496 | 0.6783 | 0.7010
App clustering-based (m) | 0.5713 | 0.6311 | 0.6731 | 0.6973
Seq2seq-based approach (Lee et al.) | 0.5671 | 0.6314 | 0.6697 | 0.6927
RNN-AE-based approach | 0.5472 | 0.6148 | 0.6404 | 0.6761
LSTM-AE-based approach | 0.5317 | 0.6031 | 0.6308 | 0.6673
RNN-VAE-based approach | 0.5391 | 0.6079 | 0.7271 | 0.6656
Vanilla neural embedding-based (win: 2) | 0.4946 | 0.5543 | 0.5973 | 0.6075
Vanilla neural embedding-based (win: 4) | 0.5273 | 0.5878 | 0.6343 | 0.6276
Hamka et al. | 0.2298 | 0.3004 | 0.4015 | 0.417
Bi-gram | 0.3873 | 0.4404 | 0.5175 | 0.5137
Tri-gram | 0.3901 | 0.4373 | 0.5137 | 0.5705
Bi-gram (SVD) | 0.3781 | 0.4215 | 0.4735 | 0.5264
Bi-gram (NMF) | 0.3974 | 0.4318 | 0.5157 | 0.5076
intuitively shows the frequency of each app cluster. This allows an analyst to easily
grasp the underlying logic of the derived segmentation results and the characteris-
tics associated with each segment by viewing the representation of users who belong
to it. In other words, as each app cluster represents a specific characteristic, it is
possible to perceive each user as a collection of interests and intuitively understand
the components of the generated user vectors.
An examination of Table 4.3 reveals that the proposed method successfully offers
a clear interpretation of the generated user representation. The apps in the clusters
listed in the table clearly indicate the characteristics of each cluster, implying that
the users in each segment are represented by the apps in frequent clusters and
allowing a name or characteristic to be easily assigned to each segment.
Table 4.3: Example of representation interpretation.

Segment ID | Most frequent app cluster | Apps in the most frequent cluster | Corresponding segment of answer set
#2 | 3rd/50 | Instagram, Facebook, Twitter | Social stars
#4 | 24th/50 | Spotify, SoundCloud, LG Music | Music lovers
#7 | 12th/50 | LG camera, Candy camera, Camera MX | Photographers
Chapter 5
Design elements selection
5.1 Background
In this chapter, we propose two subjects for design elements selection: 1) prioritization of product attributes and 2) help contents selection and re-organization.
Customers generally make purchase decisions based on their evaluation and
knowledge of the attributes of a product [54, 113]. Thus, product developers or
marketers are frequently interested in identifying the product attributes that are
considered most important by the customers during their evaluation and purchase
of products [34]. For instance, they select the attributes identified as the most important ones for product promotion. Another example is a spec sheet (Figure 5.8), which is a list describing the specifications of a product on a commercial site. By identifying the significant product attributes, they can effectively select the specifications contained in the spec sheet.
Recently, with the growing prominence and availability of user-generated reviews,
numerous product attribute extraction studies are being performed based on these
textual reviews [104]. However, most of the previous studies only focused on the extraction of product aspects, considering them as product attributes, and not on the relative importance of the extracted aspects, although this is critical information for promotion or the development of spec sheets, as mentioned previously. For example, the sentence ‘I love the touchscreen of this, but the battery life is too short.’ contains two aspects [102], namely touchscreen and battery life. However, we would not be able to capture the relative importance of touchscreen and battery life with the previous approaches.
Thus, the present study firstly focuses on the development of an attribute set
for a product by considering the relative importance of the extracted attributes. We
select a smartphone as a target product because it is the most frequently purchased
electronic device. Moreover, we utilize thousands of customer reviews collected from
commercial and review sites of LG Electronics.
Second, there are several terms and help systems used in websites or digital devices, such as ‘Help’, ‘FAQ’, and ‘Docs’. These contents are intended to provide assistance to users (Figure 5.1). Thus, help systems should be conveniently accessible so that users can get answers to their questions, for example, when they begin using a device and can benefit from useful information [114].
In smartphones, in particular, help systems are critical because smartphones
constantly add new features and improvements, and a help system is one of the last
places users consult when they have difficulty using a device. Moreover, smartphone
manufacturers explain their major improvements effectively through the help system
and boost user satisfaction [93, 75].
In this study, app usage sequence was used as it is a powerful resource for user
specification because it contains meaningful information regarding the user’s prefer-
ences, behaviors, interests, and even demographic information such as age, gender,
and occupation [26]. Based on user specification derived from app usage sequence
Figure 5.1: Example of smartphone help system
information, a help contents organization reflecting the user's needs and characteristics was generated and predicted. Although there are a few studies utilizing app usage sequences, they are limited to context/pattern modeling [90, 88, 125] or next app prediction [4, 150, 149]. Thus, this is the first study that addresses the complicated user interface problem of content recommendation using app usage sequences.
5.2 Methodology
5.2.1 Prioritization of product attributes
Our proposed method is composed of two phases: 1) attribute extraction using a CNN and transfer learning, and 2) calculation of the relative importance of the extracted attributes by applying variants of Grad-CAM with a sentiment classification model. Additionally, we perform minor refinements such as attribute clustering (Figure 5.2).
Figure 5.2: Summary of our proposed method
Extraction of product aspect
For the first phase, we utilized a CNN approach, which is a state-of-the-art supervised approach, to extract the attributes, following the study of Poria et al. [101]. We also applied the transfer learning concept to capture the latest improvements of a smartphone, one of the most rapidly changing products, for which data quickly becomes outdated.
We first embedded all the customer reviews in a 300-dimensional vector space
before the CNN model was constructed utilizing the word2vec architecture [85].
Amazon and smartphone review datasets collected from LG Electronics were used
for the word embedding task.
We constructed and trained the CNN (Figure 5.4) after the word embedding
tasks using the existing datasets of SemEval 2014 [91] and Qui et al. [103]. We
inputted each word with a window size of 5 into the CNN because the features of
an aspect term depended on its context words.
The network contained one input layer, three convolutional layers, three max-
pooling layers, and two fully connected layers with a softmax output. The convo-
lutional layers are constructed as described in Table 5.1, and the stride in each
convolutional layer was 1 because we wanted to tag each word.
Table 5.1: Structure of convolutional neural network for aspect extraction
Layer Number of feature map Size of filter
1st layer 100 3×3
2nd layer 50 2×2×100
3rd layer 25 2×2×50
The pool size we used in the max-pooling layers was 2×2. The output of each convolutional layer was computed using a hyperbolic tangent. The other parameters of the CNN were based on previous studies [101]. Additionally, we used regularization with dropout on the penultimate layer and an L2-norm constraint on the weight vectors, training for 50 epochs.
We applied the off-the-shelf feature concept after training the basic convolutional network, as sketched below. We maintained the weights of the convolutional layers of the previous model and only re-trained the last two fully connected layers with respect to each product, such as V10, G5, V20, G6, and V30. The dataset used to train the CNN and off-the-shelf approaches is described in the Experiments section. We then extracted the attribute keywords from the entire review dataset of each product with the trained model. Our smartphone review dataset contained 1000 reviews posted between September 23, 2014, and July 23, 2018, and the aspect keywords were labeled by domain experts from the Mobile Communication Department in LG Electronics.
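The following Keras sketch illustrates this freeze-and-retrain idea; the filter counts and sizes follow Table 5.1, but the input window assembly, layer widths of the fully connected part, and optimizer are assumptions for illustration rather than the exact thesis implementation.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_aspect_tagger(window=5, emb_dim=300, n_tags=3):
    """CNN that tags the center word of a 5-word window as B-A, I-A, or O.
    Filter counts and sizes follow Table 5.1."""
    inputs = keras.Input(shape=(window, emb_dim, 1))
    x = layers.Conv2D(100, (3, 3), strides=1, padding="same", activation="tanh")(inputs)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
    x = layers.Conv2D(50, (2, 2), strides=1, padding="same", activation="tanh")(x)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
    x = layers.Conv2D(25, (2, 2), strides=1, padding="same", activation="tanh")(x)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(100, activation="tanh")(x)
    x = layers.Dropout(0.5)(x)                     # dropout on the penultimate layer
    outputs = layers.Dense(n_tags, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_aspect_tagger()
model.compile(optimizer="adam", loss="categorical_crossentropy")
# ... train on labeled word windows from the existing datasets ...

# Off-the-shelf transfer: keep the convolutional filters and retrain only the
# fully connected layers on the reviews of a new product.
for layer in model.layers:
    if isinstance(layer, layers.Conv2D):
        layer.trainable = False
model.compile(optimizer="adam", loss="categorical_crossentropy")   # recompile after freezing
```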
All the above-mentioned datasets were labeled using a widely used coding scheme for representing sequences, illustrated below. In this scheme, the first word of each aspect starts with a B-A tag, the I-A tag denotes the continuation of the aspect, and O is used to tag a word that is not part of an aspect.
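For example, the review sentence quoted earlier in this chapter would be tagged as follows (a hypothetical labeling shown only to illustrate the scheme):

```python
tokens = ["I", "love", "the", "touchscreen", "of", "this", ",", "but",
          "the", "battery", "life", "is", "too", "short", "."]
tags   = ["O", "O", "O", "B-A", "O", "O", "O", "O",
          "O", "B-A", "I-A", "O", "O", "O", "O"]
# "touchscreen" -> B-A (single-word aspect)
# "battery" -> B-A and "life" -> I-A (the multi-word aspect "battery life")
```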
Calculation of relative importance of extracted aspect
Without prioritizing the extracted aspects, as mentioned previously, there are numerous limitations in utilizing them. Moreover, a simple prioritization approach based on raw frequencies introduces a bias in which extremely general aspects are considered the most important. Thus, we provide a novel approach to calculate the relative importance of the extracted aspects based on variants of Grad-CAM.
We assume that an aspect that has a significant effect on the sentiment toward the overall product is also relatively more important than the other attributes. Thus, we utilize the weight with which each aspect affects the overall product sentiment as the importance score of each product attribute.

First, we construct a sentiment classification model utilizing the CNN. To improve the overall efficiency of our proposed method, we reuse part of the aspect extraction model described in the previous section for the sentiment classification model. We retain the parameters of the filters used in each convolution layer and only re-train the final two layers for the sentiment classification model.

Second, we add a weighted layer, similarly to Grad-CAM, to calculate the weight of each aspect influencing the sentiment decision, as shown in Figure 5.3.
Figure 5.3: Example of weight visualization
Further, we add up the weights of all the aspects over a complete textual review to understand the importance of each aspect. Additionally, the weights of the aspects in each review text are normalized to remove the bias caused by the different lengths of the textual reviews.

We then sort the attributes in order of their importance scores to reveal the relative importance of each attribute; this sorting makes it easy to select relevant attributes from the limited number of candidates.
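A minimal sketch of this aggregation step, assuming a per-review list of (aspect, weight) pairs has already been produced by the Grad-CAM-style weighting; the data structures and toy values are illustrative.

```python
from collections import defaultdict

def aggregate_aspect_importance(reviews):
    """reviews: list of reviews, each a list of (aspect, weight) pairs."""
    importance = defaultdict(float)
    for pairs in reviews:
        total = sum(w for _, w in pairs) or 1.0
        for aspect, w in pairs:
            importance[aspect] += w / total        # normalize within each review
    # sort attributes by importance score, highest first
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

reviews = [[("touchscreen", 0.8), ("battery life", 0.3)],
           [("battery life", 0.9), ("camera", 0.2)]]
print(aggregate_aspect_importance(reviews))
```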
Evaluation factor clustering and refinement
Furthermore, we conducted additional minor refinements to achieve better performance. Observation of the extracted attribute factors revealed many typographical errors, incorrect expressions, and different words implying the same attribute, because the user review data were extremely unstructured texts. Thus, we applied a clustering technique to assign synonymous words standing for the same extracted attribute to the same cluster.
We clustered the words based on the embedding vector of the extracted factors
calculated in the first step using the spherical k-means method [148] to make the
silhouette index the lowest. Cosine dissimilarity 1 − cos(x, y) is the distance mea-
sure used in the spherical k-means method. In the clustering result, ‘screen ration’,
‘16:9’, ‘18:9’, and ‘full screen’ were assigned to the same cluster. Table 5.2 provides
examples of the extracted keywords belonging to the same cluster.
Table 5.2: Examples of keywords in the same cluster

Attribute | Synonymous extracted keywords
Screen ratio | screen ration, 16:9, 18:9, full vision
Design | design, look, LG Signature, appearance
OS version | OS, N OS, Nougat, Android
User interface | User interface, UX, UX4.0, GUI
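As a sketch of this clustering step, spherical k-means over the extracted keyword embeddings can be approximated by L2-normalizing the vectors and running standard k-means, so that Euclidean distance on the unit sphere corresponds to the cosine dissimilarity described above; the keyword list and random embeddings below are placeholders, not the thesis data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cluster_keywords(keywords, embeddings, n_clusters):
    """Group synonymous extracted keywords by the cosine similarity of their
    embedding vectors (k-means on L2-normalized vectors)."""
    unit_vectors = normalize(embeddings)           # project onto the unit sphere
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(unit_vectors)
    clusters = {}
    for word, label in zip(keywords, labels):
        clusters.setdefault(label, []).append(word)
    return clusters

rng = np.random.default_rng(0)
keywords = ["screen ration", "16:9", "18:9", "full vision", "design", "look"]
embeddings = rng.normal(size=(len(keywords), 300))  # placeholder word2vec vectors
print(cluster_keywords(keywords, embeddings, n_clusters=2))
```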
After the clustering task, we converted the indirect expressions of users in review comments into appropriate wording representing each attribute. For instance, we converted ‘fast’ and ‘speed’ into ‘Processor’ and ‘Clearance’, respectively, and ‘Screen color’ into ‘display type’ and ‘glass type’.
5.2.2 Help contents re-organization
Our proposed method consists of four steps: 1) Seq2seq architecture training for user
specification; 2) CGAN architecture training for help contents’ usage generation; 3)
Calculation of the new user’s specification and generation of help contents’ usage
prediction and 4) Re-organization of help contents based on those predictions (Figure
5.4).
User specification based on app usage sequence
First, the user specification value is calculated using a seq2seq architecture, which was originally proposed to generate sequences of words by predicting the next word while considering the entire sequence. App usage sequences can be suitably processed
Figure 5.4: Summary of our proposed method
with seq2seq architectures because the sequential and contextual information of app usage sequences is meaningful in a way similar to words and sentences. Moreover, the seq2seq architecture contains a context node C that is suitable for representing sequences because it is originally designed to summarize all the encoded information.
In the proposed method, seq2seq architecture receives an app usage sequence
as the encoder input and generates the same app usage sequence in the decoder.
We utilize context vector C as the vector representation of each app usage sequence
after training the architecture using the set of the app usage sequences.
Each app usage sequence $A = (a_1, a_2, \ldots, a_T)$ specifically defines the series of
apps used from the smartphone screen being turned on to being turned off. Each user
typically has several app sequences per day and user specification is calculated by
averaging those usage sequences. Regarding the details of the proposed architecture,
the network contains three levels of hidden layers in each sequence. The length of
the encoder/decoder sequence corresponds to 15, considering the maximum length
of the app usage sequence.
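A compact Keras sketch of this idea is given below: an encoder-decoder that reconstructs the app sequence, whose final encoder state serves as the context vector C used as the user specification. The vocabulary size, embedding size, hidden width, and use of a single recurrent layer are illustrative assumptions rather than the thesis configuration (which uses three hidden layers).

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, N_APPS, EMB, HIDDEN = 15, 500, 64, 50     # assumed sizes

# Encoder: app-id sequence -> context vector C
enc_in = keras.Input(shape=(SEQ_LEN,), dtype="int32")
enc_emb = layers.Embedding(N_APPS, EMB, mask_zero=True)(enc_in)
_, context = layers.GRU(HIDDEN, return_state=True)(enc_emb)

# Decoder: reconstruct the same app sequence from C
dec_in = keras.Input(shape=(SEQ_LEN,), dtype="int32")
dec_emb = layers.Embedding(N_APPS, EMB, mask_zero=True)(dec_in)
dec_out = layers.GRU(HIDDEN, return_sequences=True)(dec_emb, initial_state=context)
probs = layers.TimeDistributed(layers.Dense(N_APPS, activation="softmax"))(dec_out)

seq2seq = keras.Model([enc_in, dec_in], probs)
seq2seq.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After training, this sub-model maps an app usage sequence to its context
# vector; a user's specification is the average of these vectors over all of
# the user's sequences.
encoder = keras.Model(enc_in, context)
```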
Training of conditional GAN
Next, to train the CGAN for help usage prediction, the usage data was preprocessed into a form appropriate for input to the architecture. The dataset contained information regarding the help contents selected by each user during the first month after purchase. Thus, we converted that data into a binary format that indicates whether each help content was selected during the first month, as shown in Figure 5.5.
Figure 5.5: Preprocessing of help usage data
After preprocessing, the help usage data and user specification data are input into the CGAN architecture, in which the user specification vector is processed as the condition and the help usage data is processed as the real-data input node, as shown in Figure 5.6. The network is then trained to generate help usage predictions for new users as artificial data that can scarcely be distinguished from real help usage data given the user specification.

In the proposed architecture, the help usage data consists of 80 dimensions, corresponding to the number of help contents, and the user specification data consists of 50 dimensions, which showed the highest performance in the experiments.
Figure 5.6: CGAN architecture for help usage prediction
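A condensed Keras sketch of a conditional GAN of this shape: the 50-dimensional user specification is fed as the condition to both the generator and the discriminator, and the generator emits an 80-dimensional help usage vector. The layer widths, noise dimension, and training recipe follow a common CGAN pattern and are assumptions, not the thesis's exact configuration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

HELP_DIM, SPEC_DIM, NOISE_DIM = 80, 50, 20   # 80 help contents, 50-dim user specification

def build_generator():
    noise = keras.Input(shape=(NOISE_DIM,))
    spec = keras.Input(shape=(SPEC_DIM,))            # condition: user specification vector
    x = layers.Concatenate()([noise, spec])
    x = layers.Dense(128, activation="relu")(x)
    usage = layers.Dense(HELP_DIM, activation="sigmoid")(x)   # selection score per help content
    return keras.Model([noise, spec], usage)

def build_discriminator():
    usage = keras.Input(shape=(HELP_DIM,))
    spec = keras.Input(shape=(SPEC_DIM,))
    x = layers.Concatenate()([usage, spec])
    x = layers.Dense(128, activation="relu")(x)
    real = layers.Dense(1, activation="sigmoid")(x)           # real vs. generated usage
    return keras.Model([usage, spec], real)

G, D = build_generator(), build_discriminator()
D.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model used to update the generator while the discriminator is frozen
D.trainable = False
noise_in = keras.Input(shape=(NOISE_DIM,))
spec_in = keras.Input(shape=(SPEC_DIM,))
gan = keras.Model([noise_in, spec_in], D([G([noise_in, spec_in]), spec_in]))
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_usage, real_spec):
    """One adversarial update on a batch of (binary usage vector, user spec) pairs."""
    batch = real_usage.shape[0]
    noise = np.random.normal(size=(batch, NOISE_DIM))
    fake_usage = G.predict([noise, real_spec], verbose=0)
    # Discriminator: real usage labeled 1, generated usage labeled 0
    D.train_on_batch([real_usage, real_spec], np.ones((batch, 1)))
    D.train_on_batch([fake_usage, real_spec], np.zeros((batch, 1)))
    # Generator: try to make the discriminator output 1 for generated usage
    gan.train_on_batch([noise, real_spec], np.ones((batch, 1)))
```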
Generation of help contents usage and its re-organization
After training the seq2seq and CGAN architectures, the help usage prediction for new users can be generated. First, a new user's app usage sequences are input into the seq2seq architecture to calculate the user's specification vector. This specification is then input into the CGAN to generate the help contents usage prediction.

Based on the usage prediction, the help contents can be re-organized according to the value of each content's prediction score. For instance, if the usage prediction score is close to 1, the content is likely to be selected by the user and deserves to be located at the top of the help content list. On the other hand, if the usage prediction score is close to 0, the content does not hold the user's interest and should be hidden lower in the list (Figure 5.7).
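The re-organization step then reduces to sorting contents by their predicted scores; a trivial sketch follows, with placeholder content identifiers and scores.

```python
def reorganize_help_contents(content_ids, predicted_scores, top_k=None):
    """Order help contents by predicted usage score, highest first."""
    ranked = sorted(zip(content_ids, predicted_scores),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k] if top_k else ranked

print(reorganize_help_contents(["wifi", "camera", "battery"], [0.21, 0.93, 0.48], top_k=2))
# -> [('camera', 0.93), ('battery', 0.48)]
```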
Figure 5.7: Example of help contents re-organization
5.3 Experiments
5.3.1 Data description
For the first subject, we acquired survey results from LG Electronics consisting of the product attributes ordered by the importance with which they are considered the most significant purchasing factors. Such surveys are conducted periodically for each device, such as the G4, V10, G5, V20, G6, and V30.

Further, the spec sheet (Figure 5.8), addressed in the second experiment, is a list describing the specifications of a product or property on a commercial site, such as Amazon.com. The spec sheet contains the information that comes uppermost to customers when they collect information about a product, particularly when buying electronic devices. Thus, the selection of the attributes contained in the spec sheet is an important task, considering how frequently customers consult it.
Figure 5.8: Spec sheet of LG V30 (Resource: GSM Arena)
Moreover, below, we list the attributes of the spec sheets presented on commercial or review sites for a smartphone product. We examine six major websites: Amazon, BestBuy, GSM Arena, CNET, PhoneArena, and the official LG website. The attributes are presented in the order in which the websites are listed.
Amazon (17) Screen size, Display type, Color spectrum, Resolution, Glass type,
Network, Storage, RAM, SD slot, First rear camera resolution, Second rear camera
resolution, Front camera resolution, OS version, Processor, Battery, Wireless charg-
ing, In the box
Best Buy (19) Processor, OS version, Network, Screen size, Screen ratio, Res-
olution, Display type, First rear camera resolution, Second rear camera resolution,
Front camera resolution, Camera angle, Network, Storage, SD slot, Mobile hotspot,
QSlide, QuickMemo, Water resistant, Warranty
GSM Arena (30) OS version, Dimensions, Weight, Materials, Fingerprint,
Water resistant, Dust resistant, Colors, Screen size, Resolution, Pixel density, Dis-
play type, Glass type, Sensor, First rear camera resolution, Second rear camera res-
olution, Front camera resolution, Camera angle, Camera feature, Camcorder reso-
lution, Processor, Storage, RAM, SD slot, Battery, Wireless charging, speaker, Mi-
crophone, Network, Voice feature
CNET (34) Weight, Color, Network, Form factor, OS version, User inter-
face, Intelligent assistant, SIM Card, Sensor, Materials, Water resistant, Dust re-
sistant, Messaging, Processor, Wireless interface, Resolution, Pixel density, Screen
size, Screen features, Screen ratio, Audio codec, Video codec, Memory, SD card, Bat-
tery, Wireless charging, Camera feature, Security, RAM, 1st rear camera resolution,
2nd rear camera resolution, Front camera resolution, Warranty, Dimensions
Phone Arena (36) Network, Dimensions, Weight, Materials, Glass type, SIM
card, Display type, Screen size, Resolution, Screen ratio, Multi-touch, Display fea-
ture, User interface, OS version, Processor (CPU), Processor (GPU), Memory, SD
card, SIM card, 1st rear camera resolution, 2nd rear camera resolution, Front camera
resolution, Video resolution, Speaker, Earphone jack, Network, GPS, NFC, Radio,
USB, Sensor, Messaging, Browser, Battery, Colors, Test results
Official site (44) Screen size, Display type, Pixel density, Screen ratio, Cam-
era feature, System features, Display features, 1st rear camera resolution, 2nd rear
camera resolution, Front camera resolution, Front camera angle, Rear camera angle,
Camera feature, Video resolution, Video feature, Voice recording feature hardware,
Voice recording feature, Hi-Fi, DAC, Material, Fingerprint, Dimensions, Weight,
Water resistant, Shock resistant, Glass type, Security features, Productivity features,
Convenience features, Entertainment features, Connectivity features, OS version,
User interface, Processor, Battery, Network, Fast charging, USB, Memory, Micro
SD, RAM, Earphone jack, Accessory
As shown above, many differences exist between the sites. For instance, Best Buy and GSM Arena do not contain the User interface attribute, and only Amazon contains the In the box item. The LG Electronics official site and Phone Arena contain more than double the number of attributes contained on Amazon. Thus, we conclude that studying and selecting reasonable attributes that influence the purchase intention of a customer is essential to creating an effective spec sheet.
For the second subject, the app usage sequence data was collected from LG Electronics to perform the comparison experiments. Each app usage sequence consisted of the sequence of apps that a user accessed from the time the smartphone screen was turned on to the time it was turned off. The results of the users' help contents selection were also acquired for training and verifying the proposed architecture. The help contents selection data of 1,800 people, consisting of 60 help contents per user, was collected. Further, 180,000 app usage sequences of 1,800 people (1,000 usage sequences per user) were collected for the experiments.
5.3.2 Experiments setup
For the first subject, we verify the performance of our proposed method with two experiments. First, we calculate the similarity between our prioritized product attributes and the real survey results conducted internally by LG Electronics, which identify the product attributes considered by real customers as the most important purchasing factors.

The survey results, utilized as the answer set, consist of the product attributes ordered by the importance with which they are considered the most significant purchasing factors.

To compare the order of the product attributes in the answer set and in our proposed method, we measure the results with the normalized discounted cumulative gain (NDCG), which is one of the most well-known evaluation measures in information retrieval for ranking systems [56, 57]. NDCG allows each retrieved result to have a graded relevance, whereas most traditional ranking measures only allow a binary relevance. In addition, it associates a discount function with the rank, whereas many other measures weigh all the positions uniformly [141].

We measure the NDCG value with the top 30 extracted attributes and then compare the results with the other baselines, as presented in Table 5.3. We assign the relevance weight on a scale from 1 to 10 per three attributes, and the discount weights are reduced based on a logarithm function from 1.0 to near zero. For instance, the attributes in the answer set are assigned relevances of [10, 10, 10, 9, 9, 9, 8, 8, ..., 2, 1, 1, 1], and the discount weights are reduced as [1.0, 1.0, 1.0, 0.6309, 0.6309, 0.6309, 0.5, 0.5, 0.5, 0.4307, ...]. Obviously, the NDCG value increases when the most relevant attributes appear earlier in the ranking.
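For reference, a generic NDCG computation over graded relevances looks like the following; the grouped relevance grades follow the description above, while the function itself is the standard textbook definition rather than code from the thesis.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i."""
    relevances = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, len(relevances) + 2))
    return float(np.sum(relevances / discounts))

def ndcg(predicted_relevances, ideal_relevances):
    """NDCG = DCG of the predicted ranking / DCG of the ideal ranking."""
    return dcg(predicted_relevances) / dcg(sorted(ideal_relevances, reverse=True))

ideal = [10, 10, 10, 9, 9, 9, 8, 8, 8]               # relevance grades of the answer set
predicted_order = [10, 9, 10, 8, 10, 9, 9, 8, 8]     # grades in the order a method ranks them
print(round(ndcg(predicted_order, ideal), 4))
```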
Table 5.3: Baselines utilized in first experiment
No. Extraction method Prioritization method
1 CNN-based LSTM attention
2 CNN-based LIME
3 CNN-based [101] Frequency-based
4 HMM-based [58] Frequency-based
5 CRF-based [49] Frequency-based
Second, we compare the effectiveness of our proposed method for the development of spec sheets with that of the existing major commercial sites. In detail, we verify that the proposed method can extract the specialized factors of the LG V30. The extraction of each product's specialized factors is one of the most important considerations in the smartphone industry, which is one of the most rapidly changing industries.

For the experiment, we conducted two five-point Likert scale user surveys with 40 participants, asking 1) how much influence each attribute in the spec sheet exerts on their purchase intention for the LG V30 and 2) their satisfaction with the overall product and with each attribute in the spec sheet of the LG V30. We then constructed a multiple regression model of the satisfaction with the overall product on the satisfaction with each attribute. Subsequently, we compared the coefficients of determination (R²). The regression model demonstrated the completeness of the composition of the attribute set in the spec sheet. In the experiment, we set three different numbers of attributes (i.e., 17, 30, and 44) that corresponded to the minimum, average, and maximum numbers of attributes in the previous spec sheets.
For the second subject, two experiments were performed to verify the performance of the proposed method. In the first experiment, the accuracy of the help contents usage prediction was verified with five-fold cross validation. The seq2seq architecture was trained for user specification and the CGAN architecture was trained for help contents usage prediction with four-fifths of the dataset, and the accuracy of the proposed method was tested with the remaining fifth.

In the second experiment, the effectiveness of the proposed re-organization method was compared with that of other benchmark methods. The top-20, top-30, and top-40 help contents were selected for each user by our proposed method and by each benchmark method. Then, the number of contents actually selected by the user within those top-k contents was compared. The values k = 20, 30, and 40 were chosen because these are the numbers of contents a user can reach within a few scrolls on a smartphone. The results were compared with the following benchmark methods: (1) the average selection of demographically similar users; (2) the average selection of users who have a similar n-gram app usage within a week; (3) the average selection of users who have a similar user specification, without applying the CGAN; and (4) the proposed method with the seq2seq-based user specification replaced by a neural embedding-based user specification.
For the neural embedding-based user specification method, the window size was limited to 4 and 6, considering the average app usage sequence length of 8.13. For the n-gram models, the number of usages of each app during the week was counted, and only bi-grams and tri-grams were used, considering the extensive number of possible n-gram combinations.
5.3.3 Experiments results
Prioritization of product attributes
Table 5.4 lists the NDCG results of the comparison of the extracted product attributes and the answer set acquired from the user survey results conducted by LG Electronics. As mentioned previously, we tested our proposed method and a few baselines for each product: the LG V10, G5, V20, G6, and V30.

Although the NDCG results vary with the product, our proposed method outperforms the other methods for all the considered products. In terms of the prioritization method, our proposed variants of the Grad-CAM approach yield better results than those obtained by the other explainable machine learning approaches of Long Short-Term Memory (LSTM) attention and LIME. By examining the detailed
Table 5.4: Performance of attributes extraction and prioritization (NDGC)
Method V10 G5 V20 G6 V30
Our proposed method 0.9273 0.9046 0.9215 0.9171 0.9013
CNN + LSTM Attention 0.9046 0.8920 0.9103 0.9018 0.8876
CNN + LIME 0.8844 0.8803 0.8961 0.8916 0.8803
CNN + Frequency [101] 0.8013 0.8164 0.7913 0.7813 0.7851
CRF + Frequency [49] 0.7556 0.7418 0.7519 0.7409 0.7491
HMM + Frequency [58] 0.7216 0.7276 0.7137 0.7374 0.7104
results, we conclude that the LSTM attention shows inconsistent weight calculations depending on the length of each textual review, which biases the overall weight calculation. The LIME approach is more appropriate for binary decisions on each aspect than for weight calculation. Nonetheless, all the explainable machine learning-based approaches clearly outperform the simple frequency-based approaches. Based on the experiment results, we conclude that the explainable machine learning-based approaches provide an effective weight score for calculating the relative importance of the product attributes. Moreover, we also conclude that the frequency-based approaches cause general aspects to be irrelevantly ranked highest. Furthermore, the CNN-based method outperforms the other methods, such as the CRF- and HMM-based approaches, in the aspect extraction problem, as verified in previous studies [101].
Table 5.5 summarizes the extracted attributes obtained by the proposed method
for the LG V30 product. The major key features of the product, such as video
features (Cine-video) and camera lens (Crystal clear lens), are effectively extracted
with the proposed method mostly because the transfer learning approach is applied.
These features also have a relevant slope coefficient, β, in the regression model for
the satisfaction score.
Table 5.5: Examples of extracted attributes

Attribute | β
Hi-Fi | 0.1019
Voice features | 0.0846
Camera angle | 0.0785
AI features | 0.0743
Camera lens | 0.0716
Display type | 0.0673
Video features | 0.0654
Finger print | 0.0584
Water resistance | 0.0519
As presented in Table 5.6, our proposed method shows a higher influence score and larger coefficients of determination than the existing spec sheets for all the corresponding numbers of attributes. Thus, our proposed method effectively reflects the interests of customers and identifies the essential elements affecting their purchasing intention. We also capture the recent improvements of the LG V30 by utilizing the transfer learning approach.

Therefore, the previous spec sheets are less effective than the proposed method even though they were constructed by domain experts with extensive background knowledge of the smartphone industry.
Table 5.6: Result of effectiveness comparison

Source of spec sheet | Average influence score | R²
Proposed method (Min / Avg / Max attributes) | 4.13 / 4.01 / 3.84 | 0.6236 / 0.5329 / 0.4829
Amazon | 3.94 | 0.5219
Best Buy | 3.61 | 0.4917
GSM Arena | 3.59 | 0.4532
CNET | 3.46 | 0.4048
Phone Arena | 3.33 | 0.4129
Official site | 3.42 | 0.4483
Table 5.7: Confusion matrix of help contents usage prediction (n = 108,000)

| Predicted Select | Predicted Unselect | Total
Actual Select | 30,473 | 3,839 | 34,312
Actual Unselect | 9,477 | 64,211 | 73,688
Total | 39,950 | 68,050 |
Help contents re-organization
Table 5.7 is the confusion matrix of the help contents usage prediction with a probability threshold of 0.5. It corresponds to a precision of 88.81%, a recall of 87.14%, and an F1 score of 0.8797. The experimental results demonstrate a high absolute performance level, given that a generative prediction problem was addressed rather than a simple classification problem, and that the user specification vector was considered in addition to the usage data.
Table 5.8: Average of help contents selection for top-k prediction

Method | Top-20 | Top-30 | Top-40
Proposed method | 17.91 | 26.57 | 33.16
Average selection of demographically similar users | 10.23 | 15.11 | 19.67
Average selection based on similar n-gram usage | 11.46 | 17.86 | 21.13
Average selection based on similar user specification | 13.49 | 20.01 | 25.63
Neural embedding-based approach with user specification (win=4) | 15.79 | 23.73 | 30.51
Neural embedding-based approach with user specification (win=6) | 15.91 | 25.01 | 31.17
Table 5.8 depicts the effectiveness of the proposed method and the other benchmark methods, comparing the number of help contents selected by the user within the top-k contents. As shown in the table, the proposed method shows the highest effectiveness score compared to the other benchmark approaches, such as the demographics-based approach and the n-gram-based approach.

The proposed method has higher performance because it captures the user specification value based on the app usage sequence, which represents the users' interests and characteristics effectively.

Although the other benchmark methods considering user specification present relatively higher effectiveness than the first two methods, they exhibited lower performance than the proposed approach. The method without the GAN was unable to predict help usage effectively because it only averaged similar users' selection data instead of utilizing a state-of-the-art generative model.

Further, the lower result of the neural embedding-based approach is attributable to its window size: it only considered neighboring apps within windows of four or six, whereas the proposed method considered the entire sequence as the contextual information of the app sequences.
Chapter 6
Conclusion
Previously, various tasks related to user experience design were performed heuristically, and there are several problems associated with this. Thus, in this thesis, data driven UX design approaches are proposed for the whole user experience design process. In detail, this study focuses on three research scopes: customer-voice classification, user segmentation, and design elements selection.
First, for customer-voice classification, this study proposes a document de-noising approach, a probabilistic word-clustering-based document representation, and a novel method for applying a convolution filter to the matrix of the document representation.
The class vector is utilized in a novelty detection method that modifies previous novelty detection methods, and it was observed that the proposed method detects novel words more effectively than the previous methods. In the actual experiments, the classification performance of the customer-voice representation obtained by applying the proposed method outperformed that of the previous method. Therefore, it is concluded that the novelty score of the proposed method is a more proper measure than that of the previous method for determining whether a word effectively represents a class.
Furthermore, a probabilistic word-clustering-based approach considering the membership strength of each word with respect to each cluster was proposed. The proposed method is expected to be robust with respect to customer-voice data consisting of extremely unstructured texts, including typos, by considering the membership strength of those words. The proposed method outperformed all the other document representation methods in the actual experiments with regard to classification performance.
This study also proposes another novel approach that applies a convolution filter to the matrix representation to address the complexity and the number of parameters of the subsequent classification model. To do this, we rearrange the elements in each document representation vector to preserve the semantic distance between the clusters with the t-SNE algorithm and put them in one-to-one correspondence based on their semantic meaning with a linear transformation. This approach outperformed all the other document representation methods in the actual experiments on classification performance. The proposed method performs better because it captures the various aspects of each representation method rather than an individual representation, especially on the customer-voice data, which is extremely unstructured text.
Second, in the user segmentation studies, a variant of a neural network is proposed to effectively utilize the app usage sequence. We represented the app usage sequence and each user via sequences in vector space.
This study also proposes app clustering and relative similarity based approaches for user segmentation that can provide an intuitive interpretation based on the observations in the study. With the app clustering-based user representation, each app was represented as a vector in the vector space generated from the neural embedding architecture, and characteristically similar apps were clustered together. Each user was represented by the frequencies of these clusters. For the relative similarity based user segmentation, we proposed a network representation-based method that utilizes the order of relative similarity. Based on the vector representation of the app clustering based approach, relatively similar users, which are located close to each other, were connected as nodes in the network. The users were then segmented using the Louvain method on the constructed network.
These approaches provided an interpretation for the generated user representa-
tion. Our proposed methods also outperformed all of the other methods in terms of
the similarity metrics.
Third, in the design elements selection studies, an advanced method to develop the attributes of a spec sheet that effectively reflect the user's purchasing intention is proposed. Most of the previous studies focused on developing evaluation or purchasing factors heuristically, relying on those who already have comprehensive domain knowledge and background information about the product industry. The experiment section showed that the major key features of each product were effectively extracted using the proposed method, which showed a better extraction performance than the existing spec sheets for all the corresponding numbers of attributes.
Lastly, user specification was considered in the help contents re-organization by utilizing the app usage sequence, which reflects the user's interests and preferences effectively. The experiments depicted a high absolute performance level of help contents usage prediction, given that a generative prediction problem was considered rather than a simple classification problem. They also demonstrated a better effectiveness in the re-organization of the top-k contents than the existing benchmark methods for all the corresponding values of k.
With these results, it is concluded that data driven approaches effectively address the previous problems caused by heuristic approaches. They can provide meaningful insights to UI designers regarding customer-voice analysis, user segmentation, product development, and layout design. Future research will involve other features such as the duration and absolute time stamp of the app usage sequence. The scope of research based on this study can also be extended to other tasks in the whole UX design process, such as usage pattern analysis, app recommendation, or graphical design. Finally, these studies are expected to aid the widespread application of the proposed data driven UX design approaches in other tasks arising in the context of real business environments.
Bibliography
[1] Y. Al-Raheem, R. Ali, N. Firdaus, and N. Z. Ab Rahim, A systematic
literature review of software help systems limitations, Indian Journal of Science
and Technology, 11 (2018).
[2] W. A. Alberts and T. M. van der Geest, Color matters: Color as trust-
worthiness cue in web sites, Technical communication, 58 (2011), pp. 149–160.
[3] S. Ando, Clustering needles in a haystack: An information theoretic analysis
of minority and outlier detection, in Seventh IEEE International Conference
on Data Mining (ICDM 2007), IEEE, 2007, pp. 13–22.
[4] R. Baeza-Yates, D. Jiang, F. Silvestri, and B. Harrison, Predicting
the next app that you are going to use, in Proceedings of the Eighth ACM In-
ternational Conference on Web Search and Data Mining, ACM, 2015, pp. 285–
294.
[5] R. Baeza-Yates, B. Ribeiro-Neto, et al., Modern information retrieval,
vol. 463, ACM press New York, 1999.
[6] L. D. Baker, T. Hofmann, A. McCallum, and Y. Yang, A hierarchical
probabilistic model for novelty detection in text, in Proceedings of International
Conference on Machine Learning, Citeseer, 1999.
[7] T. T. Barker, Writing software documentation, A Task-oriented Approach,
Neddham, (1998).
[8] L. Bing, T.-L. Wong, and W. Lam, Unsupervised extraction of popular
product attributes from e-commerce web sites by considering customer reviews,
ACM Transactions on Internet Technology (TOIT), 16 (2016), p. 12.
[9] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent dirichlet allocation,
Journal of machine Learning research, 3 (2003), pp. 993–1022.
[10] J. Blitzer, M. Dredze, F. Pereira, et al., Biographies, bollywood, boom-
boxes and blenders: Domain adaptation for sentiment classification, in ACL,
vol. 7, 2007, pp. 440–447.
[11] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast
unfolding of communities in large networks, Journal of statistical mechanics:
theory and experiment, 2008 (2008), p. P10008.
[12] I. Bose and X. Chen, Exploring business opportunities from mobile services
data of customers: An inter-cluster analysis approach, Electronic Commerce
Research and Applications, 9 (2010), pp. 197–208.
[13] H. Bouziane, B. Messabih, and A. Chouarfia, Profiles and majority
voting-based ensemble method for protein secondary structure prediction, Evo-
lutionary bioinformatics online, 7 (2011), p. 171.
[14] M. L. Brocardo, I. Traore, S. Saad, and I. Woungang, Authorship
verification for short messages using stylometry, in Computer, Information
and Telecommunication Systems (CITS), 2013 International Conference on,
IEEE, 2013, pp. 1–6.
[15] L. Cai and T. Hofmann, Text categorization by boosting automatically ex-
tracted concepts, in Proceedings of the 26th annual international ACM SIGIR
conference on Research and development in informaion retrieval, ACM, 2003,
pp. 182–189.
[16] Z. Cai, X. Hu, H. Li, and A. Graesser, Can word probabilities from lda
be simply added up to represent documents?, in Proceedings of the 9th Inter-
national Conference on Educational Data Mining, 2016.
[17] S. Chatterji, D. Chatterjee, and S. Sarkar, An efficient technique for
de-noising sentences using monolingual corpus and synonym dictionary., in
COLING (Demos), Citeseer, 2012, pp. 59–66.
[18] L.-C. Cheng and L.-M. Sun, Exploring consumer adoption of new services
by analyzing the behavior of 3g subscribers: An empirical case study, Electronic
Commerce Research and Applications, 11 (2012), pp. 89–100.
[19] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau,
F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase represen-
tations using rnn encoder-decoder for statistical machine translation, arXiv
preprint arXiv:1406.1078, (2014).
[20] J. Choi, B.-J. Kim, and S. Yoon, Ux and strategic management: A case
study of smartphone (apple vs. samsung) and search engine (google vs. naver)
industry, in International Conference on HCI in Business, Springer, 2014,
pp. 703–710.
[21] Y. Choi and C. Cardie, Hierarchical sequential learning for extracting opin-
ions and their attributes, in Proceedings of the ACL 2010 conference short
papers, Association for Computational Linguistics, 2010, pp. 269–274.
[22] S. G. Chua et al., The mobile ecosystem in asia pacific-steering economic
and social impact through mobile broadband. ATkearney, 2011.
[23] A. Cichocki and A.-H. Phan, Fast local algorithms for large scale nonneg-
ative matrix and tensor factorizations, IEICE transactions on fundamentals of
electronics, communications and computer sciences, 92 (2009), pp. 708–721.
[24] M. Corbin, Design checklists for online help, online publication (http://www.writersua.com/articles/checklist/index.html), 2004.
[25] D. W. Cravens and N. Piercy, Strategic marketing, vol. 7, McGraw-Hill
New York, 2006.
[26] C. d’Alessandro and P. C. Trucco, Business potential and market oppor-
tunities of intelligent lbss for personal mobility–a european case study, Procedia
Computer Science, 5 (2011), pp. 906–911.
[27] C. Davidsson, Mobile application recommender system, 2010.
[28] M. De Reuver, H. Bouwman, and T. De Koning, The mobile context
explored, in Mobile service innovation and business models, Springer, 2008,
pp. 89–114.
[29] C. N. dos Santos and M. Gatti, Deep convolutional neural networks for
sentiment analysis of short texts., in COLING, 2014, pp. 69–78.
[30] S. T. Dumais, Latent semantic analysis, Annual review of information science
and technology, 38 (2004), pp. 188–230.
[31] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govin-
dan, and D. Estrin, Diversity in smartphone usage, in Proceedings of the
8th international conference on Mobile systems, applications, and services,
ACM, 2010, pp. 179–194.
[32] C. Fevotte and J. Idier, Algorithms for nonnegative matrix factorization
with the β-divergence, Neural computation, 23 (2011), pp. 2421–2456.
[33] M. A. T. Figueiredo and A. K. Jain, Unsupervised learning of finite mix-
ture models, IEEE Transactions on pattern analysis and machine intelligence,
24 (2002), pp. 381–396.
[34] C. Fuchs, E. Prandelli, and M. Schreier, The psychological effects of
empowerment strategies on consumers’ product demand, Journal of Marketing,
74 (2010), pp. 65–79.
[35] J. J. Garrett, Elements of user experience, the: user-centered design for the
web and beyond, Pearson Education, 2010.
[36] S. P. Gaskin, A. Griffin, J. R. Hauser, G. M. Katz, and R. L. Klein,
Voice of the customer, Wiley International Encyclopedia of Marketing, (2010).
[37] X. Glorot, A. Bordes, and Y. Bengio, Domain adaptation for large-scale
sentiment classification: A deep learning approach, in Proceedings of the 28th
international conference on machine learning (ICML-11), 2011, pp. 513–520.
[38] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-
Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial
nets, in Advances in neural information processing systems, 2014, pp. 2672–
2680.
[39] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and
J. Schmidhuber, A novel connectionist system for unconstrained handwriting
recognition, IEEE transactions on pattern analysis and machine intelligence,
31 (2009), pp. 855–868.
[40] A. Griffin and J. R. Hauser, The voice of the customer, Marketing science,
12 (1993), pp. 1–27.
[41] D. Guthrie, Unsupervised Detection of Anomalous Text, PhD thesis, Univer-
sity of Sheffield, 2008.
[42] D. Guthrie, L. Guthrie, B. Allison, and Y. Wilks, Unsupervised
anomaly detection., in IJCAI, 2007, pp. 1624–1628.
[43] D. Guthrie, L. Guthrie, and Y. Wilks, An unsupervised approach for the
detection of outliers in corpora, LREC, 2008.
[44] N. Halko, P.-G. Martinsson, and J. A. Tropp, Finding structure with
randomness: Stochastic algorithms for constructing approximate matrix de-
compositions, (2009).
[45] F. Hamka, H. Bouwman, M. De Reuver, and M. Kroesen, Mobile cus-
tomer segmentation based on smartphone measurement, Telematics and Infor-
matics, 31 (2014), pp. 220–227.
[46] Z. S. Harris, Distributional structure, Word, 10 (1954), pp. 146–162.
[47] J. Heymann, O. Walter, R. Haeb-Umbach, and B. Raj, Unsupervised
word segmentation from noisy input, in Automatic Speech Recognition and
Understanding (ASRU), 2013 IEEE Workshop on, IEEE, 2013, pp. 458–463.
[48] Y. Hu, Efficient, high-quality force-directed graph drawing, Mathematica Jour-
nal, 10 (2005), pp. 37–71.
[49] S. Huang, X. Liu, X. Peng, and Z. Niu, Fine-grained product features ex-
traction and categorization in reviews opinion mining, in Data Mining Work-
shops (ICDMW), 2012 IEEE 12th International Conference on, IEEE, 2012,
pp. 680–686.
[50] L. Hubert and P. Arabie, Comparing partitions, Journal of classification,
2 (1985), pp. 193–218.
[51] B. Insights and C. Insights, Customer segmentation. Bain & Company,
2017.
[52] J. Jagarlamudi, H. Daume III, and R. Udupa, Incorporating lexical pri-
ors into topic models, in Proceedings of the 13th Conference of the Euro-
pean Chapter of the Association for Computational Linguistics, Association
for Computational Linguistics, 2012, pp. 204–213.
[53] N. Jakob and I. Gurevych, Extracting opinion targets in a single-and cross-
domain setting with conditional random fields, in Proceedings of the 2010 con-
ference on empirical methods in natural language processing, Association for
Computational Linguistics, 2010, pp. 1035–1045.
[54] A. Jamal and M. Goode, Consumers’ product evaluation: A study of the
primary evaluative criteria in the precious jewellery market in the UK, Journal
of Consumer Behaviour, 1 (2001), pp. 140–155.
[55] J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms,
Kluwer Academic Publishers, 1981.
[56] K. Järvelin and J. Kekäläinen, IR evaluation methods for retrieving highly
relevant documents, in Proceedings of the 23rd annual international ACM
SIGIR conference on Research and development in information retrieval, ACM,
2000, pp. 41–48.
[57] K. Järvelin and J. Kekäläinen, Cumulated gain-based evaluation of IR
techniques, ACM Transactions on Information Systems (TOIS), 20 (2002),
pp. 422–446.
[58] W. Jin, H. H. Ho, and R. K. Srihari, A novel lexicalized HMM-based
learning framework for web opinion mining, in Proceedings of the 26th annual
international conference on machine learning, 2009, pp. 465–472.
[59] G. M. Katz, The “one right way” to gather the voice of the customer, PDMA
Visions Magazine, 25 (2001), pp. 1–6.
[60] Keras, Keras Documentation, 2017.
[61] H. K. Kim, H. Kim, and S. Cho, Bag-of-concepts: Comprehending document
representation through clustering words in distributed representation, Neuro-
computing, (2017).
[62] S. Kim, Novel document representations based on labels and sequential infor-
mation, PhD thesis, Georgia Institute of Technology, 2015.
[63] Y. Kim, Convolutional neural networks for sentence classification, arXiv
preprint arXiv:1408.5882, (2014).
[64] D. P. Kingma and M. Welling, Auto-encoding variational bayes, arXiv
preprint arXiv:1312.6114, (2013).
[65] P. Kotler and G. Armstrong, Principles of marketing, Pearson Education,
2010.
[66] P. Kotler and K. L. Keller, Dirección de marketing, Pearson Educación,
2009.
[67] M. Kuniavsky, Observing the user experience: a practitioner’s guide to user
research, Elsevier, 2003.
[68] S. Lai, L. Xu, K. Liu, and J. Zhao, Recurrent convolutional neural networks
for text classification., in AAAI, 2015, pp. 2267–2273.
[69] T. K. Landauer, P. W. Foltz, and D. Laham, An introduction to latent
semantic analysis, Discourse processes, 25 (1998), pp. 259–284.
[70] Q. V. Le and T. Mikolov, Distributed representations of sentences and
documents., in ICML, vol. 14, 2014, pp. 1188–1196.
[71] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015),
pp. 436–444.
[72] Y. Lee, S. Song, and S. Cho, Document representation based on probabilis-
tic word clustering in customer-voice classification, (2016).
[73] F. Li, C. Han, M. Huang, X. Zhu, Y.-J. Xia, S. Zhang, and H. Yu,
Structure-aware review mining and summarization, in Proceedings of the 23rd
international conference on computational linguistics, Association for Compu-
tational Linguistics, 2010, pp. 653–661.
[74] Q. Lin, Mobile customer clustering analysis based on call detail records, Com-
munications of the IIMA, 7 (2007), p. 95.
[75] J. Linder, How to develop a help system for a communication app, 2015.
[76] P. Liu, X. Qiu, and X. Huang, Recurrent neural network for text classifi-
cation with multi-task learning, arXiv preprint arXiv:1605.05101, (2016).
[77] L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of
Machine Learning Research, 9 (2008), pp. 2579–2605.
[78] A. Mahapatra, N. Srivastava, and J. Srivastava, Contextual anomaly
detection in text data, Algorithms, 5 (2012), pp. 469–489.
[79] L. Manevitz and M. Yousef, Learning from positive data for document
classification using neural networks, in Proceedings of the 2nd Bar-Ilan Work-
shop on Knowledge Discovery and Learning, 2000.
[80] L. M. Manevitz and M. Yousef, One-class svms for document classifica-
tion, Journal of Machine Learning Research, 2 (2001), pp. 139–154.
[81] C. D. Manning and H. Schütze, Foundations of statistical natural language
processing, vol. 999, MIT Press, 1999.
[82] O. Matan, Ensembles for supervised classification learning, PhD thesis,
Stanford University, 1996.
[83] A. F. McDaid, B. T. Murphy, N. Friel, and N. J. Hurley, Model-
based clustering in networks with stochastic community finding, arXiv preprint
arXiv:1205.1997, (2012).
[84] Y. Miao, L. Yu, and P. Blunsom, Neural variational inference for text
processing, in International Conference on Machine Learning, 2016, pp. 1727–
1736.
[85] T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of
word representations in vector space, arXiv preprint arXiv:1301.3781, (2013).
[86] O. Mitrofanova, Automatic word clustering in studying semantic structure
of texts, Advances in Computational Linguistics: Research in Computing Sci-
ence. Mexico, 41 (2009), pp. 27–34.
[87] F. J. Molina-Castillo, C. López-Nicolás, and H. Bouwman, Explaining
mobile commerce services adoption by different type of customers, Journal
of Systemics, Cybernetics and Informatics, 6 (2008), pp. 73–79.
[88] A. Mukherji, V. Srinivasan, and E. Welbourne, Adding intelligence to
your mobile device via on-device sequential pattern mining, in Proceedings of
the 2014 ACM International Joint Conference on Pervasive and Ubiquitous
Computing: Adjunct Publication, ACM, 2014, pp. 1005–1014.
[89] L. Muller, L. Cossio, and M. S. Silveira, Won’t it please, please help me?
the (un) availability and (lack of) necessity of help systems in mobile applica-
tions, in International Conference on Human-Computer Interaction, Springer,
2014, pp. 632–637.
[90] D. Natarajasivan and M. Govindarajan, An overview on mobile data
mining, International Journal of Computer Applications, 99 (2014), pp. 11–
14.
[91] International Workshop on Semantic Evaluation, SemEval-2014 dataset, 2014.
[92] A. Onan, S. Korukoglu, and H. Bulut, A multiobjective weighted voting
ensemble classifier based on differential evolution algorithm for text sentiment
classification, Expert Systems with Applications, 62 (2016), pp. 1–16.
[93] R. Oppermann, Adaptive user support: ergonomic design of manually and
automatically adaptable software, Routledge, 2017.
[94] C. Orrite, M. Rodríguez, F. Martínez, and M. Fairhurst, Classifier
ensemble generation for the majority vote rule, in Iberoamerican Congress on
Pattern Recognition, Springer, 2008, pp. 340–347.
[95] A. Oulasvirta, T. Rattenbury, L. Ma, and E. Raita, Habits make
smartphone use more pervasive, Personal and Ubiquitous Computing, 16
(2012), pp. 105–114.
[96] S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions
on knowledge and data engineering, 22 (2010), pp. 1345–1359.
[97] E. Park, Supervised feature representations for document classification, PhD
thesis, Seoul National University, 2016.
[98] J. Park and S. H. Han, Defining user value: A case study of a smartphone,
International Journal of Industrial Ergonomics, 43 (2013), pp. 274–282.
[99] M. A. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko, A
review of novelty detection, Signal Processing, 99 (2014), pp. 215–249.
[100] I. Plaza, L. Martín, S. Martin, and C. Medrano, Mobile applications
in an aging society: Status and trends, Journal of Systems and Software, 84
(2011), pp. 1977–1988.
[101] S. Poria, E. Cambria, and A. Gelbukh, Aspect extraction for opinion
mining with a deep convolutional neural network, Knowledge-Based Systems,
108 (2016), pp. 42–49.
[102] S. Poria, E. Cambria, L.-W. Ku, C. Gui, and A. Gelbukh, A rule-
based approach to aspect extraction from product reviews, in Proceedings of the
second workshop on natural language processing for social media (SocialNLP),
2014, pp. 28–37.
[103] G. Qiu, B. Liu, J. Bu, and C. Chen, Opinion word expansion and target
extraction through double propagation, Computational linguistics, 37 (2011),
pp. 9–27.
[104] C. Quan and F. Ren, Unsupervised product feature extraction for feature-
oriented opinion determination, Information Sciences, 272 (2014), pp. 16–28.
[105] D. J. Rezende, S. Mohamed, and D. Wierstra, Stochastic backpropa-
gation and approximate inference in deep generative models, arXiv preprint
arXiv:1401.4082, (2014).
[106] M. T. Ribeiro, S. Singh, and C. Guestrin, Why should I trust you?:
Explaining the predictions of any classifier, in Proceedings of the 22nd ACM
SIGKDD international conference on knowledge discovery and data mining,
ACM, 2016, pp. 1135–1144.
[107] M. Rosvall and C. T. Bergstrom, Maps of random walks on complex
networks reveal community structure, Proceedings of the National Academy of
Sciences, 105 (2008), pp. 1118–1123.
[108] M. C. Roy, Y. Rannou, and L. Rivard, The design of effective online help
in web applications, Journal of Knowledge Management Practice, 8 (2007).
[109] D. E. Rumelhart, P. Smolensky, J. L. McClelland, and G. Hinton,
Sequential thought processes in pdp models, Parallel distributed processing:
explorations in the microstructures of cognition, 2 (1986), pp. 3–57.
[110] D. S. Sachan and S. Kumar, Class vectors: Embedding representation of
document classes, arXiv preprint arXiv:1508.00189, (2015).
[111] S. K. Saha, P. Mitra, and S. Sarkar, Word clustering and word selection
based feature reduction for MaxEnt based Hindi NER, in ACL, 2008, pp. 488–495.
[112] M. Sahlgren, The Word-Space Model: Using distributional analysis to
represent syntagmatic and paradigmatic relations between words in high-
dimensional vector spaces, PhD thesis, Institutionen för lingvistik, 2006.
[113] S. Samiee, Customer evaluation of products in a global market, Journal of
International Business Studies, 25 (1994), pp. 579–604.
[114] D. Sato, T. Morimura, T. Katsuki, Y. Toyota, T. Kato, and H. Tak-
agi, Automated help system for novice older users from touchscreen gestures,
in Pattern Recognition (ICPR), 2016 23rd International Conference on, IEEE,
2016, pp. 3073–3078.
[115] A. M. Schejter, A. Serenko, O. Turel, and M. Zahaf, Policy im-
plications of market segmentation as a determinant of fixed-mobile service
substitution: What it means for carriers and policy makers, Telematics and
Informatics, 27 (2010), pp. 90–102.
[116] A. Sell, P. Walden, and C. Carlsson, Are you efficient, trendy or skill-
full? an exploratory segmentation of mobile service users, in Mobile Business
and 2010 Ninth Global Mobility Roundtable (ICMB-GMR), 2010 Ninth Inter-
national Conference on, IEEE, 2010, pp. 116–123.
[117] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and
D. Batra, Grad-CAM: Why did you say that? Visual explanations from deep
networks via gradient-based localization, CoRR, abs/1610.02391, 7 (2016).
[118] M. Z. Shafiq, L. Ji, A. X. Liu, J. Pang, and J. Wang, Characteriz-
ing geospatial dynamics of application usage in a 3g cellular data network, in
INFOCOM, 2012 Proceedings IEEE, IEEE, 2012, pp. 1341–1349.
[119] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, Cnn
features off-the-shelf: an astounding baseline for recognition, in Proceedings of
the IEEE conference on computer vision and pattern recognition workshops,
2014, pp. 806–813.
[120] G. Sidorov, F. Velásquez, E. Stamatatos, A. Gelbukh, and
L. Chanona-Hernández, Syntactic dependency-based n-grams as classification
features, in Mexican International Conference on Artificial Intelligence,
Springer, 2012, pp. 1–11.
[121] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-
scale image recognition, arXiv preprint arXiv:1409.1556, (2014).
[122] K. Singh and S. Upadhyaya, Outlier detection: applications and techniques,
International Journal of Computer Science Issues, 9 (2012), pp. 307–323.
[123] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng, Zero-shot learning
through cross-modal transfer, in Advances in neural information processing
systems, 2013, pp. 935–943.
[124] E. J. Spinosa, A. P. de Leon F. de Carvalho, and J. Gama, Novelty detection
with application to data streams, Intelligent Data Analysis, 13 (2009), pp. 405–
422.
[125] V. Srinivasan, S. Moghaddam, A. Mukherji, K. K. Rachuri, C. Xu,
and E. M. Tapia, Mobileminer: Mining your frequent patterns on your phone,
in Proceedings of the 2014 ACM International Joint Conference on Pervasive
and Ubiquitous Computing, ACM, 2014, pp. 389–400.
[126] A. Strehl and J. Ghosh, Cluster ensembles—a knowledge reuse frame-
work for combining multiple partitions, Journal of machine learning research,
3 (2002), pp. 583–617.
[127] V. Suárez-Paniagua, I. Segura-Bedmar, and P. Martínez, Word embedding
clustering for disease named entity recognition, in Proceedings of the
Fifth BioCreative Challenge Evaluation Workshop, 2015, pp. 299–304.
[128] I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning
with neural networks, in Advances in neural information processing systems,
2014, pp. 3104–3112.
[129] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Re-
thinking the inception architecture for computer vision, in Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, 2016,
pp. 2818–2826.
[130] C.-C. Tao et al., Market segmentation for mobile TV content on public trans-
portation by integrating innovation adoption model and lifestyle theory, Journal
of Service Science and Management, 1 (2008), p. 244.
[131] B. D. Temkin, B. Chatham, and M. Amato, The customer experience
value chain: An enterprisewide approach for meeting customer needs, Forrester
Research. March, 15 (2005).
[132] TensorFlow, TensorFlow Tutorials, 2017.
[133] Z. Toh and W. Wang, DLIREC: Aspect term extraction and term polarity
classification system, in Proceedings of the 8th International Workshop on
Semantic Evaluation (SemEval 2014), 2014, pp. 235–240.
[134] M. Uronen, Market segmentation approaches in the mobile service business,
Master’s thesis, Helsinki University of Technology, 2008.
[135] B. van Ginneken, A. A. Setio, C. Jacobs, and F. Ciompi, Off-the-
shelf convolutional neural network features for pulmonary nodule detection in
computed tomography scans, in Biomedical Imaging (ISBI), 2015 IEEE 12th
International Symposium on, IEEE, 2015, pp. 286–289.
[136] N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for
clusterings comparison: is a correction for chance necessary?, in Proceedings of
the 26th Annual International Conference on Machine Learning, ACM, 2009,
pp. 1073–1080.
[137] S. P. Walsh, K. M. White, and R. McD Young, Needing to connect:
The effect of self and others on young people’s involvement with their mobile
phones, Australian journal of psychology, 62 (2010), pp. 194–203.
[138] L. Waltman and N. J. van Eck, A smart local moving algorithm for large-
scale modularity-based community detection, The European Physical Journal
B, 86 (2013), p. 471.
[139] T. Wang, Y. Cai, H.-f. Leung, R. Y. Lau, Q. Li, and H. Min, Product
aspect extraction supervised with online domain knowledge, Knowledge-Based
Systems, 71 (2014), pp. 86–100.
[140] Y. Wang, M. Huang, L. Zhao, et al., Attention-based LSTM for aspect-level
sentiment classification, in Proceedings of the 2016 conference on empirical
methods in natural language processing, 2016, pp. 606–615.
[141] Y. Wang, L. Wang, Y. Li, D. He, and T.-Y. Liu, A theoretical analysis of
NDCG type ranking measures, in Conference on Learning Theory, 2013, pp. 25–
54.
[142] C. Xing, D. Wang, X. Zhang, and C. Liu, Document classification with
distributions of word vectors, in Signal and Information Processing Association
Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, IEEE, 2014,
pp. 1–5.
[143] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov,
R. Zemel, and Y. Bengio, Show, attend and tell: Neural image caption gen-
eration with visual attention, in International Conference on Machine Learning,
2015, pp. 2048–2057.
[144] B. Yan and G. Chen, AppJoy: personalized mobile application discovery, in
Proceedings of the 9th international conference on Mobile systems, applica-
tions, and services, ACM, 2011, pp. 113–126.
[145] B. Yang and C. Cardie, Joint inference for fine-grained opinion extraction.,
in ACL (1), 2013, pp. 1640–1649.
[146] H. Zhang, T. W. Chow, and M. Rahman, A new dual wing harmonium
model for document retrieval, Pattern Recognition, 42 (2009), pp. 2950–2960.
[147] J. Zhang, Z. Ghahramani, and Y. Yang, A probabilistic model for on-
line document clustering with application to novelty detection, in Advances in
Neural Information Processing Systems, 2004, pp. 1617–1624.
[148] S. Zhong, Efficient online spherical k-means clustering, in Proceedings. 2005
IEEE International Joint Conference on Neural Networks, 2005., vol. 5, IEEE,
2005, pp. 3180–3185.
[149] H. Zhu, E. Chen, H. Xiong, K. Yu, H. Cao, and J. Tian, Mining mobile
user preferences for personalized context-aware recommendation, ACM Trans-
actions on Intelligent Systems and Technology (TIST), 5 (2015), p. 58.
[150] X. Zou, W. Zhang, S. Li, and G. Pan, Prophet: What app you wish to use
next, in Proceedings of the 2013 ACM conference on Pervasive and ubiquitous
computing adjunct publication, ACM, 2013, pp. 167–170.
Abstract (in Korean)

This thesis proposes user experience design methodologies based on data analysis. Many studies in academia and industry have attempted to improve the user experience of smartphones, but most of the existing techniques assume a reliance on the abilities of designers and planners, which gives rise to several related problems. To address these problems, this study focuses on three topics: customer-voice classification, user segmentation, and design element selection. First, for customer-voice classification, document cleaning, representation, and classification methods that outperform previous approaches are proposed. Second, for user segmentation, unlike previous studies, a segmentation method based on the application usage patterns of real users is proposed. Finally, for design element selection, methods for reorganizing help contents and for selecting the items of a specification sheet are proposed. The strong performance and experimental results described in this thesis confirm that the proposed data-driven approaches effectively resolve the problems inherent in previous methodologies, and they are expected to provide meaningful insights to practitioners involved in user requirement analysis, product planning, and design. Future work is expected to extend this research to the entire UX design process not covered here, including usage pattern and behavior analysis, app usage recommendation, and graphic design.

Keywords: User experience, data analysis, document classification, user segmentation, design element selection

Student Number: 2016-30254
Acknowledgements

First of all, I would like to express my deepest gratitude to my advisor, Professor 조성준, who gave me generous academic guidance and showed great consideration in everyday matters until this thesis was completed. I also sincerely thank Professor 강석호, who has always led me with warm encouragement, as well as Professors 윤명환, 박우진, and 정재윤 and Dr. 홍지영, who offered excellent advice during the thesis examination.

Above all, I dedicate this small but precious achievement to my beloved wife 경혜, who quietly supported me for years so that I could complete the doctoral program, to my lovely son 유안, who was born during the program, and to my respected parents, parents-in-law, and brother-in-law.

I would also like to thank the senior and junior members of my laboratory who generously helped with my research. In particular, I thank the seniors who taught me so much: 태훈, 호성, 진원, 태욱, 훈식, 제혁, 용대, 은지, 현창, and 현중; my cohort, from whom I learned a great deal while doing research together: 인범, 석민, 진배, 동영, 혜진, 성환, and 지형; and the juniors who made laboratory life enjoyable: 민기, 도형, 효창, 동민, 노일, and 연국.

Finally, I would like to thank my supervisors at work, 최진해, 안정, 안신희, 조민행, 윤정혁, 김은영, 손주희, and 이지은, and my colleagues 임소연, 이진희, 문윤정, 김진욱, 이주혜, 윤지은, 정주현, 김성민, 나하나, and 채병기, who made it possible for me to study while working, as well as the members of 생산방, S3, 꾸러기, 스크린, 탁구, JB, 대진, 세일, and 금성, and all my other colleagues who supported me in many ways.

Hoping that this thesis will be of some help to someone in the future, I thank everyone once again. Thank you.