15
Research Article A Mobile-Based Question-Answering and Early Warning System for Assisting Diabetes Management Wenxiu Xie , 1 Ruoyao Ding , 1 Jun Yan , 2 and Yingying Qu 3 1 School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China 2 AI Lab, Yidu Cloud (Beijing) Technology Co. Ltd., Beijing, China 3 School of Business, Guangdong University of Foreign Studies, Guangzhou, China Correspondence should be addressed to Yingying Qu; [email protected] Received 27 March 2018; Accepted 26 April 2018; Published 6 June 2018 Academic Editor: Haoran Xie Copyright © 2018 Wenxiu Xie et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. With increasing demand for preventive management of chronic diseases in real time by using the Internet, interest in developing a convenient device on health management and monitoring has intensified. Unlike other chronic diseases, diabetes particularly type 2 is a lifelong chronic disease and usually requires daily health management by patients themselves. is study is to develop a mobile- based diabetes question-answering (Q&A) and early warning system named Dia-AID, assisting diabetes patients and populations at high risk. e Dia-AID system consists of three modules: a large-scale multilanguage diabetes frequently asked question repository, a multimode fusion Q&A framework, and a health data management module. A list of services including risk assessment and health early warning is provided to users for health condition monitoring. Using the diabetes frequently asked question repository as data, experiments are conducted on answer ranking and answer selection aspects. Results show that two essential methods in the system outperform baseline methods on both aspects. 1. Introduction With the increasing attention of ubiquitous healthcare (U- healthcare) services and the developing of information tech- nology, there has been a great need for preventive man- agement of chronic diseases and management of individ- ual health conditions [1]. Diabetes mellitus, a.k.a. diabetes, as one of the most representative chronic diseases, has become a serious global public health issue and the most challenging health problem in the 21st century [2–4]. e statistics of the number of diabetes patients 20-79 years of age in the past 18 years are shown in Figure 1, according to the latest global estimation from the International Dia- betes Federation (IDF) and the Research2guidance report (http://www.research2guidance.com). Compared to 151 mil- lion in 2000, there is nearly a threefold increase in the number of adults living with diabetes mellitus. Moreover, the number is expected to increase from 425 million in 2017 to 629 million in 2045, which means that one out of 11 adults will suffer from diabetes [5]. In addition, as reported by the World Health Organization (WHO), diabetes was the direct cause of 1.6 million deaths in 2015. However, nearly 50% of diabetes patients are undiagnosed and remain unaware of their conditions. Among the patient population, the majority of diabetes cases are type 2 diabetes mellitus (T2DM) [6]. Unlike type 1 diabetes mellitus which remains unpreventable with current knowledge, 80% of type 2 diabetes mellitus can be prevented by keeping moderate blood sugar and lifestyle [7]. People with diabetes type 2 frequently need counseling on healthy diet and regular physical activity to reduce the risk of complication [8]. us, diabetes management is a crucial and necessary procedure for diabetes patients or people at a high diabetes risk [9–11]. Recently, the focus of healthcare is shiſting from treat- ment to prevention and early diagnosis of disease [12]. Fox et al. [13] addressed that 31% of US smartphone owners used their phones to search for medical information online, 30% of Internet users consulted online reviews of rankings of healthcare services or treatments, and 26% of Internet users read other people’s experiences about health or medical issues. By 2015, nearly 500 million smartphone users used mobile health applications especially for diet and disease Hindawi Wireless Communications and Mobile Computing Volume 2018, Article ID 9163160, 14 pages https://doi.org/10.1155/2018/9163160

A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Research ArticleA Mobile-Based Question-Answering and Early Warning Systemfor Assisting Diabetes Management

Wenxiu Xie 1 Ruoyao Ding 1 Jun Yan 2 and Yingying Qu 3

1School of Information Science and Technology Guangdong University of Foreign Studies Guangzhou China2AI Lab Yidu Cloud (Beijing) Technology Co Ltd Beijing China3School of Business Guangdong University of Foreign Studies Guangzhou China

Correspondence should be addressed to Yingying Qu yingyinqu2gmailcom

Received 27 March 2018 Accepted 26 April 2018 Published 6 June 2018

Academic Editor Haoran Xie

Copyright copy 2018 Wenxiu Xie et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

With increasing demand for preventive management of chronic diseases in real time by using the Internet interest in developing aconvenient device on healthmanagement andmonitoring has intensified Unlike other chronic diseases diabetes particularly type 2is a lifelong chronic disease and usually requires daily health management by patients themselvesThis study is to develop amobile-based diabetes question-answering (QampA) and early warning system namedDia-AID assisting diabetes patients and populations athigh riskTheDia-AID system consists of three modules a large-scale multilanguage diabetes frequently asked question repositoryamultimode fusionQampA framework and a health datamanagementmodule A list of services including risk assessment and healthearly warning is provided to users for health conditionmonitoring Using the diabetes frequently asked question repository as dataexperiments are conducted on answer ranking and answer selection aspects Results show that two essential methods in the systemoutperform baseline methods on both aspects

1 Introduction

With the increasing attention of ubiquitous healthcare (U-healthcare) services and the developing of information tech-nology there has been a great need for preventive man-agement of chronic diseases and management of individ-ual health conditions [1] Diabetes mellitus aka diabetesas one of the most representative chronic diseases hasbecome a serious global public health issue and the mostchallenging health problem in the 21st century [2ndash4] Thestatistics of the number of diabetes patients 20-79 years ofage in the past 18 years are shown in Figure 1 accordingto the latest global estimation from the International Dia-betes Federation (IDF) and the Research2guidance report(httpwwwresearch2guidancecom) Compared to 151 mil-lion in 2000 there is nearly a threefold increase in thenumber of adults living with diabetes mellitus Moreover thenumber is expected to increase from 425 million in 2017 to629 million in 2045 which means that one out of 11 adultswill suffer from diabetes [5] In addition as reported by theWorld Health Organization (WHO) diabetes was the direct

cause of 16 million deaths in 2015 However nearly 50 ofdiabetes patients are undiagnosed and remain unaware oftheir conditions Among the patient population the majorityof diabetes cases are type 2 diabetes mellitus (T2DM) [6]Unlike type 1 diabetes mellitus which remains unpreventablewith current knowledge 80 of type 2 diabetes mellitus canbe prevented by keeping moderate blood sugar and lifestyle[7] People with diabetes type 2 frequently need counselingon healthy diet and regular physical activity to reduce the riskof complication [8] Thus diabetes management is a crucialand necessary procedure for diabetes patients or people at ahigh diabetes risk [9ndash11]

Recently the focus of healthcare is shifting from treat-ment to prevention and early diagnosis of disease [12] Foxet al [13] addressed that 31 of US smartphone ownersused their phones to search for medical information online30 of Internet users consulted online reviews of rankingsof healthcare services or treatments and 26 of Internetusers read other peoplersquos experiences about health or medicalissues By 2015 nearly 500 million smartphone users usedmobile health applications especially for diet and disease

HindawiWireless Communications and Mobile ComputingVolume 2018 Article ID 9163160 14 pageshttpsdoiorg10115520189163160

2 Wireless Communications and Mobile Computing

151194

246285

366 382415 425

050

100150200250300350400450500

Adul

ts w

ith D

iabe

tes (

mill

ions

)

2003 2006 2009 2011 2013 2015 20172000Year

Figure 1 Total number of adults with diabetes (20-79 years old)around the world during 2000 to 2017

management [7] Later Krebs et al [14] showed that 5823(9341604) of mobile phone users downloaded a health-related mobile app and used it at least once per day Asa convenient platform for checking usersrsquo health status ona real-time basis mobile applications have been developedfrom information provision to lifestyle-oriented smart healthmanagement Moreover existing research presented thatcontinuous real-time consulting and monitoring supportedby smartphones is applicable for improving efficiency ofdiabetes self-management [7 15ndash17] Therefore developinga mobile-based system for diabetes patients to assist in theirhealth management is of great importance

Many studies have shown that current medical searchengines eg PubMed Medical Subject Headings (MeSH)and Unified Medical Language System (UMLS) are oftenunable to serve users with clinically relevant answers in atimely manner and thus fail to satisfy patientsrsquo counselingneed [18 19] Hersh et al [20] found that a healthcareprofessional tookmore than 30minutes on average to seek ananswer utilizing information retrieval systems The processneeds about 2 minutes on average to obtain an answereven for experienced doctors [21] Instead based on naturallanguage processing techniques question answering (QampA)aims to provide users with direct precise answers to theirquestions and thus it is more preferred Hence there isan increasing demand to develop convenient and effectivequestion-answering systems for the medical domain [21ndash24] Moreover there is a particularly growing demand ofQampA systems for effectively and efficiently assisting diabetespatients to better utilize ever-accumulating expert knowledge[1 7 15]

To that end this study aims to develop a mobile-basedquestion-answering and early warning system called Dia-AID The system consists of 3 modules a large-scale multi-language diabetes FAQ repository a multimode fusion QampAframework and a health datamanagementmodule with earlywarning functionThe repository captures diabetes questionswith expert-defined answers and stores the question-answerknowledge in an interpretable and extendible form Theframework contains three different QampA resolution strate-gies knowledge-based QampA FAQ-based QampA and web-basedQampAThe health datamanagementmodule containingearly warning provides a convenient counseling service on a

smart health platform to assist diabetes patients in monitor-ing their health conditions

The contributions of this work include the following(1) a large-scale multilanguage diabetes FAQ repository isbuilt with a consistent representation format (2) a novelmultimode fusion QampA framework that integrates threemodes of QampA technologies is proposed to fulfill diabetesinformation seeking need (3) a health data managementmodule containing early warning function is developed tomonitor patient health status

The rest of this paper is organized as follows Section 2introduces related work in biomedical question answeringSection 3 describes themobile-based question-answering andearly warning system Dia-AID in detail Section 4 presentsthe experiment results of ourmethods based on the FAQ datarepository Section 5 addresses the conclusions

2 Related Work

The aim of question answering is to provide precise answersinstead of relevant documents from unstructured datasources to inquirers The research of open domain questionanswering (QampA) started from the prompt and instantiatedwork in the Text REtrieval Conference (TREC) evaluationcampaign [25] Recently with the increasing demand ofdomain-specific applications a growing interest has shiftedfrom open domain QampA to restricted domain QampA [26 27]Molla et al [28] addressed that restricted domain QampA tar-geting domain-specific information was expected to achieveeffective and reliable performance in real-world applicationsFurther as claimed by Mishra et al [29] restricted domainQampA could fulfill the specialized information requirementsof domain experts therefore improving the satisfaction ofusers Similarly Yu et al [30] and Rinaldi et al [31] notedthat restricted domain QampA such as biomedical domain [2432] could exploit domain-specific knowledge resources fordeeper text analysis as well as taking advantage of domain-specific typology formatting conventions to improve theanswer extraction performance

In light of Athenikos et alrsquos research [27] medical domainquestion answering was facing the challenge of highly com-plex domain-specific terminology and lexical and ontologicalresources Also emphasized by Abacha et al [33] the keyprocess was to translate the semantic relations expressed inquestions into a machine-readable representation to analyzethe natural language questions deeply and efficiently Theypresented a complete question analysis approach includingmedical entity recognition semantic relation extraction andautomatic translation to SPARQL queries Result presentedthat 60 of the questions were correctly translated toSPARQL queries via the proposed method Later Anca [34]proposed the GFMed for dealing with the same problem andthe challenge of querying a large number of LinkedData fromvarious domains GFMed was a QampA system for biomedicalinterlinked data aiming to fill the gap between end users andformal languages by introducing a grammatical framework totranslate biomedical information in natural language to thecorresponding SPARQL language The experimental resultsdemonstrated that the proposed methodology for building

Wireless Communications and Mobile Computing 3

Controlled Nature Language for querying Linked Data wasvalid Abacha et al [35] proposed an approach for ldquoAnswerSearchrdquo based on semantic search and query relaxation toresolve the problem of automatic QampA in medical domainThey defined question focuses as medical entities that werethe most closely linked to answers to improve the overallperformance of question answering Terol et al [36] claimeda general QampA system that was capable of working over anyrestricted domain Taking medical domain as an exampleapplication their system answeredmedical questions accord-ing to a generic question taxonomy and gained 944 overallprecision on the task

During question-answering process question represen-tation is an essential step in question analysis and answerretrieval Zhang et al [37] proposed a system based onmultilayer self-organizing map providing an efficient solu-tion to the organization problem of structured data of elec-tronic books A tree-structured representation was proposedto formulate the rich features of an e-book author Theirexperiment results corroborated that the proposed modelsbased on the tree-structured representation outperformedcontent-based models Later in their further research anefficient learning framework Tree2Vector for transformingtree-structured data into vectorial representations was pro-posed [38] By utilizing Tree2Vector framework to map tree-structured book data into vectorial space their continuedexperiments further presented that the mapped vectorialspace could explore term spatial distributions over a bookrather than the traditional documentmodelingmethods [39]

A recent trend among medical QampA systems was toincorporate the organized medical information throughoutQampA process in order to utilize the information for efficienthealthmanagement in various areas such asU-healthcare [4041] Jung et al [42] developed a decision supporting methodmainly for pain management for chronic disease patientsbased on frequent pattern treeminingThe proposedmethodaimed to reduce time and expenses for pain decision-makingof patients who were frequently exposed to pain Chunget al [12] presented a knowledge-based health service byleveraging a hybridwireless fidelity peer-to-peer architectureThe service was proposed to provide patients with efficientand economical healthcare through correct measurement ofvarious biosignals so that users could easily predict andmanage both health and disease Han et al [43] introduced aU-health service system THE MUSS focusing on achievingreusability and resolvability to provide stress and weightmanagement services

In subsequent medical QampA developments diabetesmellitus as one of the top three major worldwide causesof death from noncommunicable diseases has promptednumerous researches investigating the prevention preva-lence and mortality of diabetes mellitus [15 44ndash53] Thereis a great demand for a QampA system that can effectivelyand efficiently provide health consulting services and assistpeople in monitoring and managing their individual healthconditions Jung et al [7] explored a mobile healthcare appli-cation for providing self-diabetes management to patientsBy interoperating with Electronic Medical Record (EMR)the healthcare application provided services such as weight

management cardio-cerebrovascular risk evaluation andexercise management Waki et al [54] developed a real-time interactive system DialBetics to achieve diabetes self-management particularly HbA1c management By an eval-uation strategy the system helped patients improve theirHbA1c significantly by monitoring health data comparedwith continuing self-care regimen patients More recentlyYoo et al [1] proposed a Personal Health Record- (PHR-)based diabetes index service model through a mobile deviceoffering users a management information service for pre-venting diabetes Users were able to check their healthconditions on a real-time basis and receive informationabout desirable health behaviors and dietary habits related todiabetes

Yet the existing diabetes management applications pro-vided general information search and management whileignoring counseling services which were crucial for man-aging the health condition of diabetes patients Besidesas claimed by Mishra et al [29] the cons of restricteddomain QampA included the limited repository of domain-specific questions To overcome these difficulties we builta LMD-FAQ repository to provide users with concise andaccurate answers by physicians or experts from debatesrelated professional websites Moreover we aim to leveragethe LMD-FAQ repository to provide counseling services ofdiet medication and symptoms for diabetes patients Inaddition based on our previous work [55] by analyzingglobal clinical trials of 190 countries provided by the NationalInstitutes of Health (NIH) we discovered 6 representativehealth characteristics that were closely related to diabetesmellitus to better manage usersrsquo health conditions The sixrepresentative health characteristics were Body Mass Index(BMI) Glucose Systolic Hypertension Diastolic Hyperten-sion HbA1c and creatinine Further we defined severalearly warning intervals of health characteristics by referringto the existing international medical standards for healthmanagement and risk warning

3 Methods and Materials

The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown inFigure 2 It consists of 3 modules a large-scale multilanguagediabetes FAQ repository (LMD-FAQ repository) a novelmultimode fusion question-answering framework (MMF-QA) and a diabetes data management module with earlywarning (DM-EW) The LMD-FAQ repository contains alarge number of diabetes question-answer pairs acquiredfrom mainstream diabetes-related professional websites TheMMF-QA framework integrates three strategies knowledge-based QampA FAQ-based QampA and web-based QampA TheDM-EW module records patientsrsquo health data and monitorstheir health conditions in real time Six representative healthcharacteristics that are closely related to diabetes mellitusthat is BMI glucose systolic hypertension diastolic hyper-tension HbA1c and creatinine are applied In case of a rapidcharacteristic change or a predication of deterioration themodule will automatically warn patients and provide themwith dietary guidelines

4 Wireless Communications and Mobile Computing

Dia-AID

LMD-FAQ Repository

Automatic Question Acquisition

Question Target Identification and

Classification

Semantic Pattern Generation and Matching

Terminals

1 A Large-scale Multi-language Diabetes FAQ Repository

2 A Multi-mode FusionQampA Framework

3 Data Management with Early Warning Function

Computer

Smart Device

Mobile Phone

Knowledge-based QampA

FAQ-based QampA

Web-based QampA

Personal Characteristic Health Data

Heath Data Monitoring

Risk Warning

QSem-based Question Matching

LSI-based Answer Ranking

Risk Evaluation

Answer Selection

User profiles

Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID

31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites

As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information

websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic

Wireless Communications and Mobile Computing 5

Figure 3 The visualization of example FAQ data in the LMD-FAQ repository

pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository

Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions

32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows

The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates

a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers

The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers

6 Wireless Communications and Mobile Computing

User Questions

Multi-mode Fusion QampAFramework

Question Target Identification and Classification

Semantic Pattern-based Question Representation

Question Analysis

Knowledge Base

LMD-FAQ Repository

WebPreprocessed

questionsTop N

Answers

Semantic Pattern Generation

Semantic Pattern-based Question Representation

Knowledge-based QampA

Qsem-based Question Matching

LSI-based Answer Ranking

FAQ-based QampA

Answer Selection

Qsem-based Question Matching

Web-based QampA

knowledge elements

AN

Qcad

Qfaq

Qweb

Figure 4 The multimode fusion QampA framework MMF-QA

both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively

119878119894119898119894119876119879 (119902119894 119891119886119902119895)

=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +

10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816

(1)

119878119872119886119905119888ℎ (119908119898 119908119899)

=

0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1

(2)

119878119894119898119894119880119874 (119902119894 119891119886119902119895)

=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))

1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)

10038161003816100381610038161003816

(3)

In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in

119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875

(4)

After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 2: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

2 Wireless Communications and Mobile Computing

151194

246285

366 382415 425

050

100150200250300350400450500

Adul

ts w

ith D

iabe

tes (

mill

ions

)

2003 2006 2009 2011 2013 2015 20172000Year

Figure 1 Total number of adults with diabetes (20-79 years old)around the world during 2000 to 2017

management [7] Later Krebs et al [14] showed that 5823(9341604) of mobile phone users downloaded a health-related mobile app and used it at least once per day Asa convenient platform for checking usersrsquo health status ona real-time basis mobile applications have been developedfrom information provision to lifestyle-oriented smart healthmanagement Moreover existing research presented thatcontinuous real-time consulting and monitoring supportedby smartphones is applicable for improving efficiency ofdiabetes self-management [7 15ndash17] Therefore developinga mobile-based system for diabetes patients to assist in theirhealth management is of great importance

Many studies have shown that current medical searchengines eg PubMed Medical Subject Headings (MeSH)and Unified Medical Language System (UMLS) are oftenunable to serve users with clinically relevant answers in atimely manner and thus fail to satisfy patientsrsquo counselingneed [18 19] Hersh et al [20] found that a healthcareprofessional tookmore than 30minutes on average to seek ananswer utilizing information retrieval systems The processneeds about 2 minutes on average to obtain an answereven for experienced doctors [21] Instead based on naturallanguage processing techniques question answering (QampA)aims to provide users with direct precise answers to theirquestions and thus it is more preferred Hence there isan increasing demand to develop convenient and effectivequestion-answering systems for the medical domain [21ndash24] Moreover there is a particularly growing demand ofQampA systems for effectively and efficiently assisting diabetespatients to better utilize ever-accumulating expert knowledge[1 7 15]

To that end this study aims to develop a mobile-basedquestion-answering and early warning system called Dia-AID The system consists of 3 modules a large-scale multi-language diabetes FAQ repository a multimode fusion QampAframework and a health datamanagementmodule with earlywarning functionThe repository captures diabetes questionswith expert-defined answers and stores the question-answerknowledge in an interpretable and extendible form Theframework contains three different QampA resolution strate-gies knowledge-based QampA FAQ-based QampA and web-basedQampAThe health datamanagementmodule containingearly warning provides a convenient counseling service on a

smart health platform to assist diabetes patients in monitor-ing their health conditions

The contributions of this work include the following(1) a large-scale multilanguage diabetes FAQ repository isbuilt with a consistent representation format (2) a novelmultimode fusion QampA framework that integrates threemodes of QampA technologies is proposed to fulfill diabetesinformation seeking need (3) a health data managementmodule containing early warning function is developed tomonitor patient health status

The rest of this paper is organized as follows Section 2introduces related work in biomedical question answeringSection 3 describes themobile-based question-answering andearly warning system Dia-AID in detail Section 4 presentsthe experiment results of ourmethods based on the FAQ datarepository Section 5 addresses the conclusions

2 Related Work

The aim of question answering is to provide precise answersinstead of relevant documents from unstructured datasources to inquirers The research of open domain questionanswering (QampA) started from the prompt and instantiatedwork in the Text REtrieval Conference (TREC) evaluationcampaign [25] Recently with the increasing demand ofdomain-specific applications a growing interest has shiftedfrom open domain QampA to restricted domain QampA [26 27]Molla et al [28] addressed that restricted domain QampA tar-geting domain-specific information was expected to achieveeffective and reliable performance in real-world applicationsFurther as claimed by Mishra et al [29] restricted domainQampA could fulfill the specialized information requirementsof domain experts therefore improving the satisfaction ofusers Similarly Yu et al [30] and Rinaldi et al [31] notedthat restricted domain QampA such as biomedical domain [2432] could exploit domain-specific knowledge resources fordeeper text analysis as well as taking advantage of domain-specific typology formatting conventions to improve theanswer extraction performance

In light of Athenikos et alrsquos research [27] medical domainquestion answering was facing the challenge of highly com-plex domain-specific terminology and lexical and ontologicalresources Also emphasized by Abacha et al [33] the keyprocess was to translate the semantic relations expressed inquestions into a machine-readable representation to analyzethe natural language questions deeply and efficiently Theypresented a complete question analysis approach includingmedical entity recognition semantic relation extraction andautomatic translation to SPARQL queries Result presentedthat 60 of the questions were correctly translated toSPARQL queries via the proposed method Later Anca [34]proposed the GFMed for dealing with the same problem andthe challenge of querying a large number of LinkedData fromvarious domains GFMed was a QampA system for biomedicalinterlinked data aiming to fill the gap between end users andformal languages by introducing a grammatical framework totranslate biomedical information in natural language to thecorresponding SPARQL language The experimental resultsdemonstrated that the proposed methodology for building

Wireless Communications and Mobile Computing 3

Controlled Nature Language for querying Linked Data wasvalid Abacha et al [35] proposed an approach for ldquoAnswerSearchrdquo based on semantic search and query relaxation toresolve the problem of automatic QampA in medical domainThey defined question focuses as medical entities that werethe most closely linked to answers to improve the overallperformance of question answering Terol et al [36] claimeda general QampA system that was capable of working over anyrestricted domain Taking medical domain as an exampleapplication their system answeredmedical questions accord-ing to a generic question taxonomy and gained 944 overallprecision on the task

During question-answering process question represen-tation is an essential step in question analysis and answerretrieval Zhang et al [37] proposed a system based onmultilayer self-organizing map providing an efficient solu-tion to the organization problem of structured data of elec-tronic books A tree-structured representation was proposedto formulate the rich features of an e-book author Theirexperiment results corroborated that the proposed modelsbased on the tree-structured representation outperformedcontent-based models Later in their further research anefficient learning framework Tree2Vector for transformingtree-structured data into vectorial representations was pro-posed [38] By utilizing Tree2Vector framework to map tree-structured book data into vectorial space their continuedexperiments further presented that the mapped vectorialspace could explore term spatial distributions over a bookrather than the traditional documentmodelingmethods [39]

A recent trend among medical QampA systems was toincorporate the organized medical information throughoutQampA process in order to utilize the information for efficienthealthmanagement in various areas such asU-healthcare [4041] Jung et al [42] developed a decision supporting methodmainly for pain management for chronic disease patientsbased on frequent pattern treeminingThe proposedmethodaimed to reduce time and expenses for pain decision-makingof patients who were frequently exposed to pain Chunget al [12] presented a knowledge-based health service byleveraging a hybridwireless fidelity peer-to-peer architectureThe service was proposed to provide patients with efficientand economical healthcare through correct measurement ofvarious biosignals so that users could easily predict andmanage both health and disease Han et al [43] introduced aU-health service system THE MUSS focusing on achievingreusability and resolvability to provide stress and weightmanagement services

In subsequent medical QampA developments diabetesmellitus as one of the top three major worldwide causesof death from noncommunicable diseases has promptednumerous researches investigating the prevention preva-lence and mortality of diabetes mellitus [15 44ndash53] Thereis a great demand for a QampA system that can effectivelyand efficiently provide health consulting services and assistpeople in monitoring and managing their individual healthconditions Jung et al [7] explored a mobile healthcare appli-cation for providing self-diabetes management to patientsBy interoperating with Electronic Medical Record (EMR)the healthcare application provided services such as weight

management cardio-cerebrovascular risk evaluation andexercise management Waki et al [54] developed a real-time interactive system DialBetics to achieve diabetes self-management particularly HbA1c management By an eval-uation strategy the system helped patients improve theirHbA1c significantly by monitoring health data comparedwith continuing self-care regimen patients More recentlyYoo et al [1] proposed a Personal Health Record- (PHR-)based diabetes index service model through a mobile deviceoffering users a management information service for pre-venting diabetes Users were able to check their healthconditions on a real-time basis and receive informationabout desirable health behaviors and dietary habits related todiabetes

Yet the existing diabetes management applications pro-vided general information search and management whileignoring counseling services which were crucial for man-aging the health condition of diabetes patients Besidesas claimed by Mishra et al [29] the cons of restricteddomain QampA included the limited repository of domain-specific questions To overcome these difficulties we builta LMD-FAQ repository to provide users with concise andaccurate answers by physicians or experts from debatesrelated professional websites Moreover we aim to leveragethe LMD-FAQ repository to provide counseling services ofdiet medication and symptoms for diabetes patients Inaddition based on our previous work [55] by analyzingglobal clinical trials of 190 countries provided by the NationalInstitutes of Health (NIH) we discovered 6 representativehealth characteristics that were closely related to diabetesmellitus to better manage usersrsquo health conditions The sixrepresentative health characteristics were Body Mass Index(BMI) Glucose Systolic Hypertension Diastolic Hyperten-sion HbA1c and creatinine Further we defined severalearly warning intervals of health characteristics by referringto the existing international medical standards for healthmanagement and risk warning

3 Methods and Materials

The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown inFigure 2 It consists of 3 modules a large-scale multilanguagediabetes FAQ repository (LMD-FAQ repository) a novelmultimode fusion question-answering framework (MMF-QA) and a diabetes data management module with earlywarning (DM-EW) The LMD-FAQ repository contains alarge number of diabetes question-answer pairs acquiredfrom mainstream diabetes-related professional websites TheMMF-QA framework integrates three strategies knowledge-based QampA FAQ-based QampA and web-based QampA TheDM-EW module records patientsrsquo health data and monitorstheir health conditions in real time Six representative healthcharacteristics that are closely related to diabetes mellitusthat is BMI glucose systolic hypertension diastolic hyper-tension HbA1c and creatinine are applied In case of a rapidcharacteristic change or a predication of deterioration themodule will automatically warn patients and provide themwith dietary guidelines

4 Wireless Communications and Mobile Computing

Dia-AID

LMD-FAQ Repository

Automatic Question Acquisition

Question Target Identification and

Classification

Semantic Pattern Generation and Matching

Terminals

1 A Large-scale Multi-language Diabetes FAQ Repository

2 A Multi-mode FusionQampA Framework

3 Data Management with Early Warning Function

Computer

Smart Device

Mobile Phone

Knowledge-based QampA

FAQ-based QampA

Web-based QampA

Personal Characteristic Health Data

Heath Data Monitoring

Risk Warning

QSem-based Question Matching

LSI-based Answer Ranking

Risk Evaluation

Answer Selection

User profiles

Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID

31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites

As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information

websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic

Wireless Communications and Mobile Computing 5

Figure 3 The visualization of example FAQ data in the LMD-FAQ repository

pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository

Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions

32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows

The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates

a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers

The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers

6 Wireless Communications and Mobile Computing

User Questions

Multi-mode Fusion QampAFramework

Question Target Identification and Classification

Semantic Pattern-based Question Representation

Question Analysis

Knowledge Base

LMD-FAQ Repository

WebPreprocessed

questionsTop N

Answers

Semantic Pattern Generation

Semantic Pattern-based Question Representation

Knowledge-based QampA

Qsem-based Question Matching

LSI-based Answer Ranking

FAQ-based QampA

Answer Selection

Qsem-based Question Matching

Web-based QampA

knowledge elements

AN

Qcad

Qfaq

Qweb

Figure 4 The multimode fusion QampA framework MMF-QA

both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively

119878119894119898119894119876119879 (119902119894 119891119886119902119895)

=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +

10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816

(1)

119878119872119886119905119888ℎ (119908119898 119908119899)

=

0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1

(2)

119878119894119898119894119880119874 (119902119894 119891119886119902119895)

=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))

1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)

10038161003816100381610038161003816

(3)

In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in

119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875

(4)

After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 3: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Wireless Communications and Mobile Computing 3

Controlled Nature Language for querying Linked Data wasvalid Abacha et al [35] proposed an approach for ldquoAnswerSearchrdquo based on semantic search and query relaxation toresolve the problem of automatic QampA in medical domainThey defined question focuses as medical entities that werethe most closely linked to answers to improve the overallperformance of question answering Terol et al [36] claimeda general QampA system that was capable of working over anyrestricted domain Taking medical domain as an exampleapplication their system answeredmedical questions accord-ing to a generic question taxonomy and gained 944 overallprecision on the task

During question-answering process question represen-tation is an essential step in question analysis and answerretrieval Zhang et al [37] proposed a system based onmultilayer self-organizing map providing an efficient solu-tion to the organization problem of structured data of elec-tronic books A tree-structured representation was proposedto formulate the rich features of an e-book author Theirexperiment results corroborated that the proposed modelsbased on the tree-structured representation outperformedcontent-based models Later in their further research anefficient learning framework Tree2Vector for transformingtree-structured data into vectorial representations was pro-posed [38] By utilizing Tree2Vector framework to map tree-structured book data into vectorial space their continuedexperiments further presented that the mapped vectorialspace could explore term spatial distributions over a bookrather than the traditional documentmodelingmethods [39]

A recent trend among medical QampA systems was toincorporate the organized medical information throughoutQampA process in order to utilize the information for efficienthealthmanagement in various areas such asU-healthcare [4041] Jung et al [42] developed a decision supporting methodmainly for pain management for chronic disease patientsbased on frequent pattern treeminingThe proposedmethodaimed to reduce time and expenses for pain decision-makingof patients who were frequently exposed to pain Chunget al [12] presented a knowledge-based health service byleveraging a hybridwireless fidelity peer-to-peer architectureThe service was proposed to provide patients with efficientand economical healthcare through correct measurement ofvarious biosignals so that users could easily predict andmanage both health and disease Han et al [43] introduced aU-health service system THE MUSS focusing on achievingreusability and resolvability to provide stress and weightmanagement services

In subsequent medical QampA developments diabetesmellitus as one of the top three major worldwide causesof death from noncommunicable diseases has promptednumerous researches investigating the prevention preva-lence and mortality of diabetes mellitus [15 44ndash53] Thereis a great demand for a QampA system that can effectivelyand efficiently provide health consulting services and assistpeople in monitoring and managing their individual healthconditions Jung et al [7] explored a mobile healthcare appli-cation for providing self-diabetes management to patientsBy interoperating with Electronic Medical Record (EMR)the healthcare application provided services such as weight

management cardio-cerebrovascular risk evaluation andexercise management Waki et al [54] developed a real-time interactive system DialBetics to achieve diabetes self-management particularly HbA1c management By an eval-uation strategy the system helped patients improve theirHbA1c significantly by monitoring health data comparedwith continuing self-care regimen patients More recentlyYoo et al [1] proposed a Personal Health Record- (PHR-)based diabetes index service model through a mobile deviceoffering users a management information service for pre-venting diabetes Users were able to check their healthconditions on a real-time basis and receive informationabout desirable health behaviors and dietary habits related todiabetes

Yet the existing diabetes management applications pro-vided general information search and management whileignoring counseling services which were crucial for man-aging the health condition of diabetes patients Besidesas claimed by Mishra et al [29] the cons of restricteddomain QampA included the limited repository of domain-specific questions To overcome these difficulties we builta LMD-FAQ repository to provide users with concise andaccurate answers by physicians or experts from debatesrelated professional websites Moreover we aim to leveragethe LMD-FAQ repository to provide counseling services ofdiet medication and symptoms for diabetes patients Inaddition based on our previous work [55] by analyzingglobal clinical trials of 190 countries provided by the NationalInstitutes of Health (NIH) we discovered 6 representativehealth characteristics that were closely related to diabetesmellitus to better manage usersrsquo health conditions The sixrepresentative health characteristics were Body Mass Index(BMI) Glucose Systolic Hypertension Diastolic Hyperten-sion HbA1c and creatinine Further we defined severalearly warning intervals of health characteristics by referringto the existing international medical standards for healthmanagement and risk warning

3 Methods and Materials

The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown inFigure 2 It consists of 3 modules a large-scale multilanguagediabetes FAQ repository (LMD-FAQ repository) a novelmultimode fusion question-answering framework (MMF-QA) and a diabetes data management module with earlywarning (DM-EW) The LMD-FAQ repository contains alarge number of diabetes question-answer pairs acquiredfrom mainstream diabetes-related professional websites TheMMF-QA framework integrates three strategies knowledge-based QampA FAQ-based QampA and web-based QampA TheDM-EW module records patientsrsquo health data and monitorstheir health conditions in real time Six representative healthcharacteristics that are closely related to diabetes mellitusthat is BMI glucose systolic hypertension diastolic hyper-tension HbA1c and creatinine are applied In case of a rapidcharacteristic change or a predication of deterioration themodule will automatically warn patients and provide themwith dietary guidelines

4 Wireless Communications and Mobile Computing

Dia-AID

LMD-FAQ Repository

Automatic Question Acquisition

Question Target Identification and

Classification

Semantic Pattern Generation and Matching

Terminals

1 A Large-scale Multi-language Diabetes FAQ Repository

2 A Multi-mode FusionQampA Framework

3 Data Management with Early Warning Function

Computer

Smart Device

Mobile Phone

Knowledge-based QampA

FAQ-based QampA

Web-based QampA

Personal Characteristic Health Data

Heath Data Monitoring

Risk Warning

QSem-based Question Matching

LSI-based Answer Ranking

Risk Evaluation

Answer Selection

User profiles

Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID

31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites

As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information

websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic

Wireless Communications and Mobile Computing 5

Figure 3 The visualization of example FAQ data in the LMD-FAQ repository

pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository

Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions

32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows

The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates

a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers

The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers

6 Wireless Communications and Mobile Computing

User Questions

Multi-mode Fusion QampAFramework

Question Target Identification and Classification

Semantic Pattern-based Question Representation

Question Analysis

Knowledge Base

LMD-FAQ Repository

WebPreprocessed

questionsTop N

Answers

Semantic Pattern Generation

Semantic Pattern-based Question Representation

Knowledge-based QampA

Qsem-based Question Matching

LSI-based Answer Ranking

FAQ-based QampA

Answer Selection

Qsem-based Question Matching

Web-based QampA

knowledge elements

AN

Qcad

Qfaq

Qweb

Figure 4 The multimode fusion QampA framework MMF-QA

both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively

119878119894119898119894119876119879 (119902119894 119891119886119902119895)

=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +

10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816

(1)

119878119872119886119905119888ℎ (119908119898 119908119899)

=

0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1

(2)

119878119894119898119894119880119874 (119902119894 119891119886119902119895)

=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))

1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)

10038161003816100381610038161003816

(3)

In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in

119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875

(4)

After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

4 Wireless Communications and Mobile Computing

Dia-AID

LMD-FAQ Repository

Automatic Question Acquisition

Question Target Identification and

Classification

Semantic Pattern Generation and Matching

Terminals

1 A Large-scale Multi-language Diabetes FAQ Repository

2 A Multi-mode FusionQampA Framework

3 Data Management with Early Warning Function

Computer

Smart Device

Mobile Phone

Knowledge-based QampA

FAQ-based QampA

Web-based QampA

Personal Characteristic Health Data

Heath Data Monitoring

Risk Warning

QSem-based Question Matching

LSI-based Answer Ranking

Risk Evaluation

Answer Selection

User profiles

Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID

31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites

As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information

websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic

Wireless Communications and Mobile Computing 5

Figure 3 The visualization of example FAQ data in the LMD-FAQ repository

pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository

Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions

32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows

The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates

a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers

The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers

6 Wireless Communications and Mobile Computing

User Questions

Multi-mode Fusion QampAFramework

Question Target Identification and Classification

Semantic Pattern-based Question Representation

Question Analysis

Knowledge Base

LMD-FAQ Repository

WebPreprocessed

questionsTop N

Answers

Semantic Pattern Generation

Semantic Pattern-based Question Representation

Knowledge-based QampA

Qsem-based Question Matching

LSI-based Answer Ranking

FAQ-based QampA

Answer Selection

Qsem-based Question Matching

Web-based QampA

knowledge elements

AN

Qcad

Qfaq

Qweb

Figure 4 The multimode fusion QampA framework MMF-QA

both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively

119878119894119898119894119876119879 (119902119894 119891119886119902119895)

=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +

10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816

(1)

119878119872119886119905119888ℎ (119908119898 119908119899)

=

0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1

(2)

119878119894119898119894119880119874 (119902119894 119891119886119902119895)

=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))

1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)

10038161003816100381610038161003816

(3)

In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in

119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875

(4)

After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 5: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Wireless Communications and Mobile Computing 5

Figure 3 The visualization of example FAQ data in the LMD-FAQ repository

pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository

Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions

32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows

The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates

a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers

The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers

6 Wireless Communications and Mobile Computing

User Questions

Multi-mode Fusion QampAFramework

Question Target Identification and Classification

Semantic Pattern-based Question Representation

Question Analysis

Knowledge Base

LMD-FAQ Repository

WebPreprocessed

questionsTop N

Answers

Semantic Pattern Generation

Semantic Pattern-based Question Representation

Knowledge-based QampA

Qsem-based Question Matching

LSI-based Answer Ranking

FAQ-based QampA

Answer Selection

Qsem-based Question Matching

Web-based QampA

knowledge elements

AN

Qcad

Qfaq

Qweb

Figure 4 The multimode fusion QampA framework MMF-QA

both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively

119878119894119898119894119876119879 (119902119894 119891119886119902119895)

=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +

10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816

(1)

119878119872119886119905119888ℎ (119908119898 119908119899)

=

0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1

(2)

119878119894119898119894119880119874 (119902119894 119891119886119902119895)

=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))

1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)

10038161003816100381610038161003816

(3)

In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in

119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875

(4)

After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

6 Wireless Communications and Mobile Computing

User Questions

Multi-mode Fusion QampAFramework

Question Target Identification and Classification

Semantic Pattern-based Question Representation

Question Analysis

Knowledge Base

LMD-FAQ Repository

WebPreprocessed

questionsTop N

Answers

Semantic Pattern Generation

Semantic Pattern-based Question Representation

Knowledge-based QampA

Qsem-based Question Matching

LSI-based Answer Ranking

FAQ-based QampA

Answer Selection

Qsem-based Question Matching

Web-based QampA

knowledge elements

AN

Qcad

Qfaq

Qweb

Figure 4 The multimode fusion QampA framework MMF-QA

both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively

119878119894119898119894119876119879 (119902119894 119891119886119902119895)

=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +

10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816

(1)

119878119872119886119905119888ℎ (119908119898 119908119899)

=

0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1

(2)

119878119894119898119894119880119874 (119902119894 119891119886119902119895)

=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))

1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)

10038161003816100381610038161003816

(3)

In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in

119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875

(4)

After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 7: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Wireless Communications and Mobile Computing 7

Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models

merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection

We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned

Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the

screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes

33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management

In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely

With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil

The system records all the reported health data and gener-ates data change curves automatically For example Figure 6

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

8 Wireless Communications and Mobile Computing

Table 1 The reported health data records of the user Cecil

Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113

Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Wireless Communications and Mobile Computing 9

shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages

4 Results

41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2

42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems

(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)

(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)

(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled

(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)

(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)

119872119877119877 = 1|119876||119876|

sum119894=1

1119903119886119899119896119894

(5)

119860119888119888119906119903119886119888119910119873 = 1|119876||119876|

sum119894=1

119862119894 (119873) (6)

119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)

119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)

1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)

43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers

To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 10: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

10 Wireless Communications and Mobile Computing

Table 2 Evaluation results compared to baselines

MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499

Our Method

1009080706050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

MRR

LSHDoc2Vec

docsimLDASynonyms

Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure

the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list

Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy

To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments

1009080706

Acc

1

050403020100

500 750 1000 1250 of question-answer tuples

1500 1750

Our MethodLSHDoc2Vec

docsimLDASynonyms

Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure

The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists

We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 11: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Wireless Communications and Mobile Computing 11

Table 3 The comparison of our answer selection method with baseline methods using different k settings

Setting Methods Accuracy Precision Recall F1

k=5

KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956

RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305

Our method 09222 08859 08657 08753

k=10

KNN 09032 08076 05258 05247RF 09106 07523 07277 07391

GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206

Our method 09651 09415 08559 08929

Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues

Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction

5 Conclusions

Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system

Acc

1000

0975

0950

0925

0900

0875

0850

0825

08002000 3000 4000

The size of training datasets5000 6000

RecallF1Precision

Figure 10 The performance with the increasing size of trainingdatasets when k=5

1000

0975

0950

0925

0900

0875

0850

0825

08004000 5000 6000

The size of training datasets7000 8000

Acc RecallF1Precision

Figure 11 The performance with the increasing size of trainingdatasets when k=10

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 12: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

12 Wireless Communications and Mobile Computing

assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods

Data Availability

The diabetes data is not made publicly available

Conflicts of Interest

There are no conflicts of interest in this paper

Acknowledgments

The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)

References

[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017

[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001

[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013

[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014

[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo

International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016

[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014

[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016

[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017

[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017

[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009

[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016

[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013

[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015

[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017

[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012

[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010

[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006

[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002

[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002

[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999

[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710

[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000

[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007

[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001

[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015

[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010

[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 13: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

Wireless Communications and Mobile Computing 13

[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016

[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005

[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004

[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007

[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012

[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017

[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015

[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007

[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016

[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018

[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018

[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014

[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014

[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015

[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010

[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004

[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009

[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011

[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014

[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015

[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016

[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016

[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016

[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016

[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016

[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014

[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016

[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015

[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017

[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008

[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 14: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

14 Wireless Communications and Mobile Computing

[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016

[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990

[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 15: A Mobile-Based Question-Answering and Early …downloads.hindawi.com/journals/wcmc/2018/9163160.pdfunable to serve users with clinically relevant answers in a timely manner and thus

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom