A Study on Interpretable Machine Learning and Applications
Nguyen Thanh Phu
1820009
HUYNH LAB
Graduate School of Advanced Science and Technology
Japan Advanced Institute of Science and Technology
Knowledge Science
March, 2021

Keywords: Interpretable Machine Learning, Explainable Artificial Intelligence, K-means-based Clustering, Classification, Dempster–Shafer Theory.

Research Content

Machine learning and data mining techniques have developed rapidly in recent years. In tasks such as classification, machine learning techniques have been shown to equal and even surpass human performance. However, high-performance models are usually complex and opaque and have low interpretability, which makes it difficult to explain the underlying behaviors that lead to their final outcomes. In many domains, such as medicine and healthcare, interpretability is one of the most important factors when considering the adoption of those models. In our research, we aim to develop transparent machine learning models that not only provide users with knowledge about the underlying data but also achieve competitive performance compared with other commonly used techniques.

Specifically, in the field of unsupervised learning, clustering is a fundamental task that has been utilized in many scientific fields. Clustering groups data into clusters such that objects in the same cluster are similar to one another and dissimilar to objects in other clusters. K-means is a popular interpretable method for the clustering task. However, with simple dissimilarity measures, which are a key part of how k-means forms clusters, it tends to underfit the data. In our work, we proposed a new k-means-based clustering method with a novel dissimilarity measure that fits the underlying data better. The effectiveness of the proposed clustering algorithm is demonstrated by a comparative study against popular clustering methods for categorical data.

In the field of supervised learning, we proposed a two-stage binary classification system

Copyright © 2021 by Nguyen Thanh Phu
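For orientation, the k-means-style clustering of categorical data described in this abstract can be illustrated with a plain baseline. The following is a minimal k-modes-style sketch in Python, assuming the simple matching dissimilarity (the count of mismatched attributes). It is not the novel dissimilarity measure proposed in the thesis, and the function names and toy data are hypothetical.

```python
from collections import Counter


def simple_matching(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(rows):
    """Attribute-wise most frequent value of a non-empty list of objects."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_centers, n_iter=20):
    """Cluster categorical rows (tuples); returns (labels, centers).

    Assumption: the caller supplies the initial centers, which keeps the
    sketch deterministic. Real implementations use smarter initialization.
    """
    centers = list(initial_centers)
    k = len(centers)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center under simple matching.
        new_labels = [
            min(range(k), key=lambda j: simple_matching(row, centers[j]))
            for row in data
        ]
        # Update step: each center becomes the mode of its members.
        for j in range(k):
            members = [row for row, lab in zip(data, new_labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mode_of(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Hypothetical toy data: (color, size) pairs.
toy = [
    ("red", "small"), ("red", "small"), ("red", "medium"),
    ("blue", "large"), ("blue", "large"), ("green", "large"),
]
labels, centers = k_modes(toy, [("red", "small"), ("blue", "large")])
```

Each cluster is summarized by its mode, which is what keeps this family of methods interpretable: a center is itself a readable combination of category values.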
Intended Degree: Knowledge Science, Doctoral Degree
Laboratory: Nagai Lab
Student Number: 1820034
Research Content & Research Purpose
Our dissertation focuses on improving collaboration and communication
in an enterprise. Considering two kinds of collaboration, unstructured
collaboration (information collaboration) and structured collaboration
(process collaboration), we focus primarily on two representative
tools: email and the Enterprise Resource Planning (ERP) system.
For enterprises, most current research in the ERP domain struggles to
turn its theoretical findings into specific, practical results. To give
managers a fuller picture of all the messages generated by an ERP
system through the Enterprise Collaboration System (ECS), and to
improve collaboration and communication, we propose a complete method
for developing an artifact, SuccERP, based on the Design Science
approach, to carry out the integration. After exploring multiple ERP
systems, we summarize the tasks that precede the integration into
three aspects: authentication, data initialization, and implementation
of specific procedures. We also explain how data processing and
integration between the ERP and the ECS are carried out.
We see the contribution of the proposed SuccERP as having two parts:
1) We present a complete demonstration of how to obtain the
architecture and database schema of an existing ERP system, taking
internal and external hosting issues into account. 2) Based on a series
of literature reviews, we implement the integration according to the
critical success factors and existing issues presented in previous
studies. In other words, we try to close the gap in communication and
collaboration capabilities by enhancing the integration of the ERP and
ECS systems. In short, we accomplish data processing and data sharing
from an ERP system to external resources. Building on our results,
follow-up research can explore implementations with other external
resources to address different issues. Given the increasing demand for
custom ERP, it is reasonable to provide detailed research as a
guideline for enterprises that plan to upgrade and enhance their ERP
systems.
Next, information collaboration is defined as employees using IT tools
to communicate and to request assistance (answers); email is the most
standard documentation tool for such communication. Although existing
studies use topic models to help users classify emails, they disregard
the fact that a human, unlike a machine, cannot attend to every word in
an email when determining its topic distribution. The Latent Dirichlet
Allocation (LDA) model forms the basis for inferring topics; our work
aims to discover how the visual attention paid to each word influences
topic inference, and it estimates the attention to a word from its
location features.
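To make the notion of a word's location features concrete, the following minimal Python sketch assigns each word a (row, column) position by wrapping the email text at a fixed number of characters per line. The greedy wrapping rule, the function name `word_locations`, and the default `line_length` are assumptions for illustration only; the abstract does not specify how AttLDA lays out the text or how it turns locations into attention.

```python
def word_locations(text, line_length=72):
    """Greedy word wrap; returns a list of (word, row, column) triples.

    Assumption: words are separated by single spaces when displayed and a
    word that would overflow the current line moves to the next line.
    """
    locations = []
    row, col = 0, 0
    for word in text.split():
        # Move to the next line if the word would overflow the current one.
        if col > 0 and col + len(word) > line_length:
            row, col = row + 1, 0
        locations.append((word, row, col))
        col += len(word) + 1  # +1 for the separating space
    return locations


# Hypothetical example with a very narrow display of 14 characters.
example = word_locations("please review the attached report", line_length=14)
rows_used = example[-1][1] + 1  # number of display lines the text occupies
```

The resulting rows and columns could then feed whatever attention estimate is used; that step is deliberately left out here because the source does not describe it.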
After reviewing visual-spatial research and state-of-the-art visual
attention models, we select Bayesian models to estimate attention and
propose a novel model, the Attention orientation Latent Dirichlet
Allocation model (AttLDA). In AttLDA, each email can be regarded as
encoded in a two-dimensional space. We take the line length (the
characters per line in an email) and the window size (the number of
lines in an email) into account to derive the optimal display size as a
visual space and to assign a location to each word in an email. Besides
location, attention estimation also considers the Term Frequency and
Inverse Document Frequency (TFIDF) and the topics inferred in each
iteration. Our motivation is as follows: readers cannot completely
capture all the hidden topics behind each word in an email, especially
the context in a forwarded message. Unlike previous research, our
result shows each email's topic distribution and also includes the
distribution of attention over the related words in each topic. More
precisely, visual attention can be regarded as the significance of an
email's topic distribution. In our experiment, we use the public Enron
email corpus as the dataset and apply the perplexity metric to measure
the performance of AttLDA. AttLDA outperforms previous research on the
perplexity evaluation.
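For readers unfamiliar with the perplexity metric, the sketch below computes it in the standard way for an LDA-style topic model: the exponential of the negative mean log-likelihood per word, where a word's probability in a document is the topic mixture of per-topic word probabilities. The names `theta` (document-topic proportions), `phi` (topic-word probabilities), and the toy values are assumptions for illustration; the abstract does not state how AttLDA's evaluation was implemented.

```python
import math


def word_log_probs(docs, theta, phi):
    """Natural-log probability of every word token under the model.

    docs:  list of documents, each a list of integer word ids
    theta: theta[d][k] = proportion of topic k in document d
    phi:   phi[k][w]   = probability of word w under topic k
    """
    logs = []
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) = sum over topics of theta[d][k] * phi[k][w]
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            logs.append(math.log(p))
    return logs


def perplexity(log_probs):
    """exp(-mean log-likelihood per word); lower is better."""
    if not log_probs:
        raise ValueError("perplexity is undefined for an empty corpus")
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical toy model: one document, two topics, two-word vocabulary.
toy_docs = [[0, 1, 1]]
toy_theta = [[0.5, 0.5]]
toy_phi = [[0.5, 0.5], [0.5, 0.5]]
toy_perplexity = perplexity(word_log_probs(toy_docs, toy_theta, toy_phi))
```

A model that spreads probability uniformly over V words has perplexity V, so a lower value means the model concentrates more probability on the words that actually occur.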
Advanced technology has made the communication distance between people
shorter than ever before, and messages accumulate more and more
quickly. People can easily lose control of their messages through
negligence. Our research proposes SuccERP, which builds a platform for
managing ERP and ECS messages under definite guidelines to keep
communication efficient. In addition, we propose AttLDA to extract
email topics effectively and thus improve email message management;
its output can also serve as a feature for further tasks.
Research Accomplishment
1. Lin, Y., Nagai, Y., Chiang, T., & Chiang, H. SuccERP: The Design
   Science based integration of ECS and ERP in post-implementation
   stage. International Journal of Engineering Business Management,
   Peer review, ijebm-20-0119. (2nd-Review, submitted at 25-Sep-2020).
2. Lin, Y., Nagai, Y., Chiang, T., & Chiang, H. AttLDA: Email Topic
   Identification using Latent Dirichlet Allocation integrated with
   Visual Attention. Information Processing and Management, Peer
   review, IPM-D-20-00545. (1st-Review, submitted at 10-July-2020).
3. Chiang, H. K., Nagai, Y., & Lin, Y. Y. (2020). Link up Industry 4.0
   with the Enterprise Collaboration System to Help Small and Medium
   Enterprises. Mathematical Problems in Engineering, Peer review,
   vol. 2020, 1-13. (Accept, I am not first author).
4. Lin, Y., Nagai, Y., Chiang, T., & Chiang, H. (2020, March). Design
   and Develop Artifact for Integrating with ERP and ECS Based on
   Design Science. In Proceedings of the 2020 The 3rd International
   Conference on Information Science and System (pp. 218-223).
A.
A-1.
Shirasaka, Hajime, Takashi Mikami, Makoto Onizuka, Youji Kohda and
Amna Javed, “Structural Condition of Combinatorial Innovation through
Patent-ability AI analysis”, International Journal of Intellectual
Property Management, Inderscience (2021).
A-2.
,2021 2 Vol.74 No.2 .
B.
Shirasaka, Hajime, Youji Kohda, 2017, “Study on Impact of Evaluation
for Intellectual Property Value using Artificial Intelligence on
Intellectual Property Management”, 5th International Conference on
Serviceology, Vienna: ICServ2017, 207-210.
B-2.
Shirasaka, Hajime, Takashi Mikami, Youji Kohda, Amna Javed and Yosuke
Nara, 2019, “Artificial Intelligence Examiner in Patent Evaluation”,
1st International Conference on Information and Knowledge Management,
Dhaka: i-IKM2019, 67-68.
B-3.
Shirasaka, Hajime, Youji Kohda, 2020, “New Determination of Inventive
Step by Collaboration between AI and Patent Attorneys”, EPIP 2020
Conference, Madrid (EPIP in Madrid postponed to 8-10 September 2021
for COVID-19).
C.
C-1.
11 (3): 533-547.
C-2.
No.3 : 18-21
D.
D-1. ,2017, 452017 8
:108-113.
E.
6185209
E-2. "
E-3. "
E-4. "
” 2017-234732 6457058
E-5. "
” 2018-237699 6531302
E-6. "”
2018-069694 6506439
E-7. "
4
E-8. "
” 2018-112348
” 2018-187804
” 2019-002764
E-11. "
” 2019- 002783
E-12. "
“ 2019-002801
" 2019-519032 6550583
E-14. "
” 2019-526021 6555704
E-15. "
” 2020-23711
E-16. "
” 2020-3571
E-17. "
” 2020-3572
E-18. "
” 2020-21848
” 6653833
2020-26761
E-21. 4 1 Allowable 3 1
1
F.
F-1. 2019 3 4 JEITA
F-2. 2019 10 GOOD DESIGN AWARD 2019
F-3. 2020 11 Matching HUB Business Idea & Plan
Competition