Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Research ArticleA Mobile-Based Question-Answering and Early Warning Systemfor Assisting Diabetes Management
Wenxiu Xie 1 Ruoyao Ding 1 Jun Yan 2 and Yingying Qu 3
1School of Information Science and Technology Guangdong University of Foreign Studies Guangzhou China2AI Lab Yidu Cloud (Beijing) Technology Co Ltd Beijing China3School of Business Guangdong University of Foreign Studies Guangzhou China
Correspondence should be addressed to Yingying Qu yingyinqu2gmailcom
Received 27 March 2018 Accepted 26 April 2018 Published 6 June 2018
Academic Editor Haoran Xie
Copyright copy 2018 Wenxiu Xie et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
With increasing demand for preventive management of chronic diseases in real time by using the Internet interest in developing aconvenient device on healthmanagement andmonitoring has intensified Unlike other chronic diseases diabetes particularly type 2is a lifelong chronic disease and usually requires daily health management by patients themselvesThis study is to develop amobile-based diabetes question-answering (QampA) and early warning system namedDia-AID assisting diabetes patients and populations athigh riskTheDia-AID system consists of three modules a large-scale multilanguage diabetes frequently asked question repositoryamultimode fusionQampA framework and a health datamanagementmodule A list of services including risk assessment and healthearly warning is provided to users for health conditionmonitoring Using the diabetes frequently asked question repository as dataexperiments are conducted on answer ranking and answer selection aspects Results show that two essential methods in the systemoutperform baseline methods on both aspects
1 Introduction
With the increasing attention of ubiquitous healthcare (U-healthcare) services and the developing of information tech-nology there has been a great need for preventive man-agement of chronic diseases and management of individ-ual health conditions [1] Diabetes mellitus aka diabetesas one of the most representative chronic diseases hasbecome a serious global public health issue and the mostchallenging health problem in the 21st century [2ndash4] Thestatistics of the number of diabetes patients 20-79 years ofage in the past 18 years are shown in Figure 1 accordingto the latest global estimation from the International Dia-betes Federation (IDF) and the Research2guidance report(httpwwwresearch2guidancecom) Compared to 151 mil-lion in 2000 there is nearly a threefold increase in thenumber of adults living with diabetes mellitus Moreover thenumber is expected to increase from 425 million in 2017 to629 million in 2045 which means that one out of 11 adultswill suffer from diabetes [5] In addition as reported by theWorld Health Organization (WHO) diabetes was the direct
cause of 16 million deaths in 2015 However nearly 50 ofdiabetes patients are undiagnosed and remain unaware oftheir conditions Among the patient population the majorityof diabetes cases are type 2 diabetes mellitus (T2DM) [6]Unlike type 1 diabetes mellitus which remains unpreventablewith current knowledge 80 of type 2 diabetes mellitus canbe prevented by keeping moderate blood sugar and lifestyle[7] People with diabetes type 2 frequently need counselingon healthy diet and regular physical activity to reduce the riskof complication [8] Thus diabetes management is a crucialand necessary procedure for diabetes patients or people at ahigh diabetes risk [9ndash11]
Recently the focus of healthcare is shifting from treat-ment to prevention and early diagnosis of disease [12] Foxet al [13] addressed that 31 of US smartphone ownersused their phones to search for medical information online30 of Internet users consulted online reviews of rankingsof healthcare services or treatments and 26 of Internetusers read other peoplersquos experiences about health or medicalissues By 2015 nearly 500 million smartphone users usedmobile health applications especially for diet and disease
HindawiWireless Communications and Mobile ComputingVolume 2018 Article ID 9163160 14 pageshttpsdoiorg10115520189163160
2 Wireless Communications and Mobile Computing
151194
246285
366 382415 425
050
100150200250300350400450500
Adul
ts w
ith D
iabe
tes (
mill
ions
)
2003 2006 2009 2011 2013 2015 20172000Year
Figure 1 Total number of adults with diabetes (20-79 years old)around the world during 2000 to 2017
management [7] Later Krebs et al [14] showed that 5823(9341604) of mobile phone users downloaded a health-related mobile app and used it at least once per day Asa convenient platform for checking usersrsquo health status ona real-time basis mobile applications have been developedfrom information provision to lifestyle-oriented smart healthmanagement Moreover existing research presented thatcontinuous real-time consulting and monitoring supportedby smartphones is applicable for improving efficiency ofdiabetes self-management [7 15ndash17] Therefore developinga mobile-based system for diabetes patients to assist in theirhealth management is of great importance
Many studies have shown that current medical searchengines eg PubMed Medical Subject Headings (MeSH)and Unified Medical Language System (UMLS) are oftenunable to serve users with clinically relevant answers in atimely manner and thus fail to satisfy patientsrsquo counselingneed [18 19] Hersh et al [20] found that a healthcareprofessional tookmore than 30minutes on average to seek ananswer utilizing information retrieval systems The processneeds about 2 minutes on average to obtain an answereven for experienced doctors [21] Instead based on naturallanguage processing techniques question answering (QampA)aims to provide users with direct precise answers to theirquestions and thus it is more preferred Hence there isan increasing demand to develop convenient and effectivequestion-answering systems for the medical domain [21ndash24] Moreover there is a particularly growing demand ofQampA systems for effectively and efficiently assisting diabetespatients to better utilize ever-accumulating expert knowledge[1 7 15]
To that end this study aims to develop a mobile-basedquestion-answering and early warning system called Dia-AID The system consists of 3 modules a large-scale multi-language diabetes FAQ repository a multimode fusion QampAframework and a health datamanagementmodule with earlywarning functionThe repository captures diabetes questionswith expert-defined answers and stores the question-answerknowledge in an interpretable and extendible form Theframework contains three different QampA resolution strate-gies knowledge-based QampA FAQ-based QampA and web-basedQampAThe health datamanagementmodule containingearly warning provides a convenient counseling service on a
smart health platform to assist diabetes patients in monitor-ing their health conditions
The contributions of this work include the following(1) a large-scale multilanguage diabetes FAQ repository isbuilt with a consistent representation format (2) a novelmultimode fusion QampA framework that integrates threemodes of QampA technologies is proposed to fulfill diabetesinformation seeking need (3) a health data managementmodule containing early warning function is developed tomonitor patient health status
The rest of this paper is organized as follows Section 2introduces related work in biomedical question answeringSection 3 describes themobile-based question-answering andearly warning system Dia-AID in detail Section 4 presentsthe experiment results of ourmethods based on the FAQ datarepository Section 5 addresses the conclusions
2 Related Work
The aim of question answering is to provide precise answersinstead of relevant documents from unstructured datasources to inquirers The research of open domain questionanswering (QampA) started from the prompt and instantiatedwork in the Text REtrieval Conference (TREC) evaluationcampaign [25] Recently with the increasing demand ofdomain-specific applications a growing interest has shiftedfrom open domain QampA to restricted domain QampA [26 27]Molla et al [28] addressed that restricted domain QampA tar-geting domain-specific information was expected to achieveeffective and reliable performance in real-world applicationsFurther as claimed by Mishra et al [29] restricted domainQampA could fulfill the specialized information requirementsof domain experts therefore improving the satisfaction ofusers Similarly Yu et al [30] and Rinaldi et al [31] notedthat restricted domain QampA such as biomedical domain [2432] could exploit domain-specific knowledge resources fordeeper text analysis as well as taking advantage of domain-specific typology formatting conventions to improve theanswer extraction performance
In light of Athenikos et alrsquos research [27] medical domainquestion answering was facing the challenge of highly com-plex domain-specific terminology and lexical and ontologicalresources Also emphasized by Abacha et al [33] the keyprocess was to translate the semantic relations expressed inquestions into a machine-readable representation to analyzethe natural language questions deeply and efficiently Theypresented a complete question analysis approach includingmedical entity recognition semantic relation extraction andautomatic translation to SPARQL queries Result presentedthat 60 of the questions were correctly translated toSPARQL queries via the proposed method Later Anca [34]proposed the GFMed for dealing with the same problem andthe challenge of querying a large number of LinkedData fromvarious domains GFMed was a QampA system for biomedicalinterlinked data aiming to fill the gap between end users andformal languages by introducing a grammatical framework totranslate biomedical information in natural language to thecorresponding SPARQL language The experimental resultsdemonstrated that the proposed methodology for building
Wireless Communications and Mobile Computing 3
Controlled Nature Language for querying Linked Data wasvalid Abacha et al [35] proposed an approach for ldquoAnswerSearchrdquo based on semantic search and query relaxation toresolve the problem of automatic QampA in medical domainThey defined question focuses as medical entities that werethe most closely linked to answers to improve the overallperformance of question answering Terol et al [36] claimeda general QampA system that was capable of working over anyrestricted domain Taking medical domain as an exampleapplication their system answeredmedical questions accord-ing to a generic question taxonomy and gained 944 overallprecision on the task
During question-answering process question represen-tation is an essential step in question analysis and answerretrieval Zhang et al [37] proposed a system based onmultilayer self-organizing map providing an efficient solu-tion to the organization problem of structured data of elec-tronic books A tree-structured representation was proposedto formulate the rich features of an e-book author Theirexperiment results corroborated that the proposed modelsbased on the tree-structured representation outperformedcontent-based models Later in their further research anefficient learning framework Tree2Vector for transformingtree-structured data into vectorial representations was pro-posed [38] By utilizing Tree2Vector framework to map tree-structured book data into vectorial space their continuedexperiments further presented that the mapped vectorialspace could explore term spatial distributions over a bookrather than the traditional documentmodelingmethods [39]
A recent trend among medical QampA systems was toincorporate the organized medical information throughoutQampA process in order to utilize the information for efficienthealthmanagement in various areas such asU-healthcare [4041] Jung et al [42] developed a decision supporting methodmainly for pain management for chronic disease patientsbased on frequent pattern treeminingThe proposedmethodaimed to reduce time and expenses for pain decision-makingof patients who were frequently exposed to pain Chunget al [12] presented a knowledge-based health service byleveraging a hybridwireless fidelity peer-to-peer architectureThe service was proposed to provide patients with efficientand economical healthcare through correct measurement ofvarious biosignals so that users could easily predict andmanage both health and disease Han et al [43] introduced aU-health service system THE MUSS focusing on achievingreusability and resolvability to provide stress and weightmanagement services
In subsequent medical QampA developments diabetesmellitus as one of the top three major worldwide causesof death from noncommunicable diseases has promptednumerous researches investigating the prevention preva-lence and mortality of diabetes mellitus [15 44ndash53] Thereis a great demand for a QampA system that can effectivelyand efficiently provide health consulting services and assistpeople in monitoring and managing their individual healthconditions Jung et al [7] explored a mobile healthcare appli-cation for providing self-diabetes management to patientsBy interoperating with Electronic Medical Record (EMR)the healthcare application provided services such as weight
management cardio-cerebrovascular risk evaluation andexercise management Waki et al [54] developed a real-time interactive system DialBetics to achieve diabetes self-management particularly HbA1c management By an eval-uation strategy the system helped patients improve theirHbA1c significantly by monitoring health data comparedwith continuing self-care regimen patients More recentlyYoo et al [1] proposed a Personal Health Record- (PHR-)based diabetes index service model through a mobile deviceoffering users a management information service for pre-venting diabetes Users were able to check their healthconditions on a real-time basis and receive informationabout desirable health behaviors and dietary habits related todiabetes
Yet the existing diabetes management applications pro-vided general information search and management whileignoring counseling services which were crucial for man-aging the health condition of diabetes patients Besidesas claimed by Mishra et al [29] the cons of restricteddomain QampA included the limited repository of domain-specific questions To overcome these difficulties we builta LMD-FAQ repository to provide users with concise andaccurate answers by physicians or experts from debatesrelated professional websites Moreover we aim to leveragethe LMD-FAQ repository to provide counseling services ofdiet medication and symptoms for diabetes patients Inaddition based on our previous work [55] by analyzingglobal clinical trials of 190 countries provided by the NationalInstitutes of Health (NIH) we discovered 6 representativehealth characteristics that were closely related to diabetesmellitus to better manage usersrsquo health conditions The sixrepresentative health characteristics were Body Mass Index(BMI) Glucose Systolic Hypertension Diastolic Hyperten-sion HbA1c and creatinine Further we defined severalearly warning intervals of health characteristics by referringto the existing international medical standards for healthmanagement and risk warning
3 Methods and Materials
The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown inFigure 2 It consists of 3 modules a large-scale multilanguagediabetes FAQ repository (LMD-FAQ repository) a novelmultimode fusion question-answering framework (MMF-QA) and a diabetes data management module with earlywarning (DM-EW) The LMD-FAQ repository contains alarge number of diabetes question-answer pairs acquiredfrom mainstream diabetes-related professional websites TheMMF-QA framework integrates three strategies knowledge-based QampA FAQ-based QampA and web-based QampA TheDM-EW module records patientsrsquo health data and monitorstheir health conditions in real time Six representative healthcharacteristics that are closely related to diabetes mellitusthat is BMI glucose systolic hypertension diastolic hyper-tension HbA1c and creatinine are applied In case of a rapidcharacteristic change or a predication of deterioration themodule will automatically warn patients and provide themwith dietary guidelines
4 Wireless Communications and Mobile Computing
Dia-AID
LMD-FAQ Repository
Automatic Question Acquisition
Question Target Identification and
Classification
Semantic Pattern Generation and Matching
Terminals
1 A Large-scale Multi-language Diabetes FAQ Repository
2 A Multi-mode FusionQampA Framework
3 Data Management with Early Warning Function
Computer
Smart Device
Mobile Phone
Knowledge-based QampA
FAQ-based QampA
Web-based QampA
Personal Characteristic Health Data
Heath Data Monitoring
Risk Warning
QSem-based Question Matching
LSI-based Answer Ranking
Risk Evaluation
Answer Selection
User profiles
Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID
31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites
As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information
websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic
Wireless Communications and Mobile Computing 5
Figure 3 The visualization of example FAQ data in the LMD-FAQ repository
pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository
Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions
32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows
The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates
a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers
The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers
6 Wireless Communications and Mobile Computing
User Questions
Multi-mode Fusion QampAFramework
Question Target Identification and Classification
Semantic Pattern-based Question Representation
Question Analysis
Knowledge Base
LMD-FAQ Repository
WebPreprocessed
questionsTop N
Answers
Semantic Pattern Generation
Semantic Pattern-based Question Representation
Knowledge-based QampA
Qsem-based Question Matching
LSI-based Answer Ranking
FAQ-based QampA
Answer Selection
Qsem-based Question Matching
Web-based QampA
knowledge elements
AN
Qcad
Qfaq
Qweb
Figure 4 The multimode fusion QampA framework MMF-QA
both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively
119878119894119898119894119876119879 (119902119894 119891119886119902119895)
=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +
10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816
(1)
119878119872119886119905119888ℎ (119908119898 119908119899)
=
0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1
(2)
119878119894119898119894119880119874 (119902119894 119891119886119902119895)
=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))
1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)
10038161003816100381610038161003816
(3)
In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in
119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875
(4)
After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
2 Wireless Communications and Mobile Computing
151194
246285
366 382415 425
050
100150200250300350400450500
Adul
ts w
ith D
iabe
tes (
mill
ions
)
2003 2006 2009 2011 2013 2015 20172000Year
Figure 1 Total number of adults with diabetes (20-79 years old)around the world during 2000 to 2017
management [7] Later Krebs et al [14] showed that 5823(9341604) of mobile phone users downloaded a health-related mobile app and used it at least once per day Asa convenient platform for checking usersrsquo health status ona real-time basis mobile applications have been developedfrom information provision to lifestyle-oriented smart healthmanagement Moreover existing research presented thatcontinuous real-time consulting and monitoring supportedby smartphones is applicable for improving efficiency ofdiabetes self-management [7 15ndash17] Therefore developinga mobile-based system for diabetes patients to assist in theirhealth management is of great importance
Many studies have shown that current medical searchengines eg PubMed Medical Subject Headings (MeSH)and Unified Medical Language System (UMLS) are oftenunable to serve users with clinically relevant answers in atimely manner and thus fail to satisfy patientsrsquo counselingneed [18 19] Hersh et al [20] found that a healthcareprofessional tookmore than 30minutes on average to seek ananswer utilizing information retrieval systems The processneeds about 2 minutes on average to obtain an answereven for experienced doctors [21] Instead based on naturallanguage processing techniques question answering (QampA)aims to provide users with direct precise answers to theirquestions and thus it is more preferred Hence there isan increasing demand to develop convenient and effectivequestion-answering systems for the medical domain [21ndash24] Moreover there is a particularly growing demand ofQampA systems for effectively and efficiently assisting diabetespatients to better utilize ever-accumulating expert knowledge[1 7 15]
To that end this study aims to develop a mobile-basedquestion-answering and early warning system called Dia-AID The system consists of 3 modules a large-scale multi-language diabetes FAQ repository a multimode fusion QampAframework and a health datamanagementmodule with earlywarning functionThe repository captures diabetes questionswith expert-defined answers and stores the question-answerknowledge in an interpretable and extendible form Theframework contains three different QampA resolution strate-gies knowledge-based QampA FAQ-based QampA and web-basedQampAThe health datamanagementmodule containingearly warning provides a convenient counseling service on a
smart health platform to assist diabetes patients in monitor-ing their health conditions
The contributions of this work include the following(1) a large-scale multilanguage diabetes FAQ repository isbuilt with a consistent representation format (2) a novelmultimode fusion QampA framework that integrates threemodes of QampA technologies is proposed to fulfill diabetesinformation seeking need (3) a health data managementmodule containing early warning function is developed tomonitor patient health status
The rest of this paper is organized as follows Section 2introduces related work in biomedical question answeringSection 3 describes themobile-based question-answering andearly warning system Dia-AID in detail Section 4 presentsthe experiment results of ourmethods based on the FAQ datarepository Section 5 addresses the conclusions
2 Related Work
The aim of question answering is to provide precise answersinstead of relevant documents from unstructured datasources to inquirers The research of open domain questionanswering (QampA) started from the prompt and instantiatedwork in the Text REtrieval Conference (TREC) evaluationcampaign [25] Recently with the increasing demand ofdomain-specific applications a growing interest has shiftedfrom open domain QampA to restricted domain QampA [26 27]Molla et al [28] addressed that restricted domain QampA tar-geting domain-specific information was expected to achieveeffective and reliable performance in real-world applicationsFurther as claimed by Mishra et al [29] restricted domainQampA could fulfill the specialized information requirementsof domain experts therefore improving the satisfaction ofusers Similarly Yu et al [30] and Rinaldi et al [31] notedthat restricted domain QampA such as biomedical domain [2432] could exploit domain-specific knowledge resources fordeeper text analysis as well as taking advantage of domain-specific typology formatting conventions to improve theanswer extraction performance
In light of Athenikos et alrsquos research [27] medical domainquestion answering was facing the challenge of highly com-plex domain-specific terminology and lexical and ontologicalresources Also emphasized by Abacha et al [33] the keyprocess was to translate the semantic relations expressed inquestions into a machine-readable representation to analyzethe natural language questions deeply and efficiently Theypresented a complete question analysis approach includingmedical entity recognition semantic relation extraction andautomatic translation to SPARQL queries Result presentedthat 60 of the questions were correctly translated toSPARQL queries via the proposed method Later Anca [34]proposed the GFMed for dealing with the same problem andthe challenge of querying a large number of LinkedData fromvarious domains GFMed was a QampA system for biomedicalinterlinked data aiming to fill the gap between end users andformal languages by introducing a grammatical framework totranslate biomedical information in natural language to thecorresponding SPARQL language The experimental resultsdemonstrated that the proposed methodology for building
Wireless Communications and Mobile Computing 3
Controlled Nature Language for querying Linked Data wasvalid Abacha et al [35] proposed an approach for ldquoAnswerSearchrdquo based on semantic search and query relaxation toresolve the problem of automatic QampA in medical domainThey defined question focuses as medical entities that werethe most closely linked to answers to improve the overallperformance of question answering Terol et al [36] claimeda general QampA system that was capable of working over anyrestricted domain Taking medical domain as an exampleapplication their system answeredmedical questions accord-ing to a generic question taxonomy and gained 944 overallprecision on the task
During question-answering process question represen-tation is an essential step in question analysis and answerretrieval Zhang et al [37] proposed a system based onmultilayer self-organizing map providing an efficient solu-tion to the organization problem of structured data of elec-tronic books A tree-structured representation was proposedto formulate the rich features of an e-book author Theirexperiment results corroborated that the proposed modelsbased on the tree-structured representation outperformedcontent-based models Later in their further research anefficient learning framework Tree2Vector for transformingtree-structured data into vectorial representations was pro-posed [38] By utilizing Tree2Vector framework to map tree-structured book data into vectorial space their continuedexperiments further presented that the mapped vectorialspace could explore term spatial distributions over a bookrather than the traditional documentmodelingmethods [39]
A recent trend among medical QampA systems was toincorporate the organized medical information throughoutQampA process in order to utilize the information for efficienthealthmanagement in various areas such asU-healthcare [4041] Jung et al [42] developed a decision supporting methodmainly for pain management for chronic disease patientsbased on frequent pattern treeminingThe proposedmethodaimed to reduce time and expenses for pain decision-makingof patients who were frequently exposed to pain Chunget al [12] presented a knowledge-based health service byleveraging a hybridwireless fidelity peer-to-peer architectureThe service was proposed to provide patients with efficientand economical healthcare through correct measurement ofvarious biosignals so that users could easily predict andmanage both health and disease Han et al [43] introduced aU-health service system THE MUSS focusing on achievingreusability and resolvability to provide stress and weightmanagement services
In subsequent medical QampA developments diabetesmellitus as one of the top three major worldwide causesof death from noncommunicable diseases has promptednumerous researches investigating the prevention preva-lence and mortality of diabetes mellitus [15 44ndash53] Thereis a great demand for a QampA system that can effectivelyand efficiently provide health consulting services and assistpeople in monitoring and managing their individual healthconditions Jung et al [7] explored a mobile healthcare appli-cation for providing self-diabetes management to patientsBy interoperating with Electronic Medical Record (EMR)the healthcare application provided services such as weight
management cardio-cerebrovascular risk evaluation andexercise management Waki et al [54] developed a real-time interactive system DialBetics to achieve diabetes self-management particularly HbA1c management By an eval-uation strategy the system helped patients improve theirHbA1c significantly by monitoring health data comparedwith continuing self-care regimen patients More recentlyYoo et al [1] proposed a Personal Health Record- (PHR-)based diabetes index service model through a mobile deviceoffering users a management information service for pre-venting diabetes Users were able to check their healthconditions on a real-time basis and receive informationabout desirable health behaviors and dietary habits related todiabetes
Yet the existing diabetes management applications pro-vided general information search and management whileignoring counseling services which were crucial for man-aging the health condition of diabetes patients Besidesas claimed by Mishra et al [29] the cons of restricteddomain QampA included the limited repository of domain-specific questions To overcome these difficulties we builta LMD-FAQ repository to provide users with concise andaccurate answers by physicians or experts from debatesrelated professional websites Moreover we aim to leveragethe LMD-FAQ repository to provide counseling services ofdiet medication and symptoms for diabetes patients Inaddition based on our previous work [55] by analyzingglobal clinical trials of 190 countries provided by the NationalInstitutes of Health (NIH) we discovered 6 representativehealth characteristics that were closely related to diabetesmellitus to better manage usersrsquo health conditions The sixrepresentative health characteristics were Body Mass Index(BMI) Glucose Systolic Hypertension Diastolic Hyperten-sion HbA1c and creatinine Further we defined severalearly warning intervals of health characteristics by referringto the existing international medical standards for healthmanagement and risk warning
3 Methods and Materials
The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown inFigure 2 It consists of 3 modules a large-scale multilanguagediabetes FAQ repository (LMD-FAQ repository) a novelmultimode fusion question-answering framework (MMF-QA) and a diabetes data management module with earlywarning (DM-EW) The LMD-FAQ repository contains alarge number of diabetes question-answer pairs acquiredfrom mainstream diabetes-related professional websites TheMMF-QA framework integrates three strategies knowledge-based QampA FAQ-based QampA and web-based QampA TheDM-EW module records patientsrsquo health data and monitorstheir health conditions in real time Six representative healthcharacteristics that are closely related to diabetes mellitusthat is BMI glucose systolic hypertension diastolic hyper-tension HbA1c and creatinine are applied In case of a rapidcharacteristic change or a predication of deterioration themodule will automatically warn patients and provide themwith dietary guidelines
4 Wireless Communications and Mobile Computing
Dia-AID
LMD-FAQ Repository
Automatic Question Acquisition
Question Target Identification and
Classification
Semantic Pattern Generation and Matching
Terminals
1 A Large-scale Multi-language Diabetes FAQ Repository
2 A Multi-mode FusionQampA Framework
3 Data Management with Early Warning Function
Computer
Smart Device
Mobile Phone
Knowledge-based QampA
FAQ-based QampA
Web-based QampA
Personal Characteristic Health Data
Heath Data Monitoring
Risk Warning
QSem-based Question Matching
LSI-based Answer Ranking
Risk Evaluation
Answer Selection
User profiles
Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID
31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites
As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information
websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic
Wireless Communications and Mobile Computing 5
Figure 3 The visualization of example FAQ data in the LMD-FAQ repository
pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository
Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions
32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows
The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates
a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers
The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers
6 Wireless Communications and Mobile Computing
User Questions
Multi-mode Fusion QampAFramework
Question Target Identification and Classification
Semantic Pattern-based Question Representation
Question Analysis
Knowledge Base
LMD-FAQ Repository
WebPreprocessed
questionsTop N
Answers
Semantic Pattern Generation
Semantic Pattern-based Question Representation
Knowledge-based QampA
Qsem-based Question Matching
LSI-based Answer Ranking
FAQ-based QampA
Answer Selection
Qsem-based Question Matching
Web-based QampA
knowledge elements
AN
Qcad
Qfaq
Qweb
Figure 4 The multimode fusion QampA framework MMF-QA
both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively
119878119894119898119894119876119879 (119902119894 119891119886119902119895)
=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +
10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816
(1)
119878119872119886119905119888ℎ (119908119898 119908119899)
=
0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1
(2)
119878119894119898119894119880119874 (119902119894 119891119886119902119895)
=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))
1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)
10038161003816100381610038161003816
(3)
In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in
119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875
(4)
After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 3
Controlled Nature Language for querying Linked Data wasvalid Abacha et al [35] proposed an approach for ldquoAnswerSearchrdquo based on semantic search and query relaxation toresolve the problem of automatic QampA in medical domainThey defined question focuses as medical entities that werethe most closely linked to answers to improve the overallperformance of question answering Terol et al [36] claimeda general QampA system that was capable of working over anyrestricted domain Taking medical domain as an exampleapplication their system answeredmedical questions accord-ing to a generic question taxonomy and gained 944 overallprecision on the task
During question-answering process question represen-tation is an essential step in question analysis and answerretrieval Zhang et al [37] proposed a system based onmultilayer self-organizing map providing an efficient solu-tion to the organization problem of structured data of elec-tronic books A tree-structured representation was proposedto formulate the rich features of an e-book author Theirexperiment results corroborated that the proposed modelsbased on the tree-structured representation outperformedcontent-based models Later in their further research anefficient learning framework Tree2Vector for transformingtree-structured data into vectorial representations was pro-posed [38] By utilizing Tree2Vector framework to map tree-structured book data into vectorial space their continuedexperiments further presented that the mapped vectorialspace could explore term spatial distributions over a bookrather than the traditional documentmodelingmethods [39]
A recent trend among medical QampA systems was toincorporate the organized medical information throughoutQampA process in order to utilize the information for efficienthealthmanagement in various areas such asU-healthcare [4041] Jung et al [42] developed a decision supporting methodmainly for pain management for chronic disease patientsbased on frequent pattern treeminingThe proposedmethodaimed to reduce time and expenses for pain decision-makingof patients who were frequently exposed to pain Chunget al [12] presented a knowledge-based health service byleveraging a hybridwireless fidelity peer-to-peer architectureThe service was proposed to provide patients with efficientand economical healthcare through correct measurement ofvarious biosignals so that users could easily predict andmanage both health and disease Han et al [43] introduced aU-health service system THE MUSS focusing on achievingreusability and resolvability to provide stress and weightmanagement services
In subsequent medical QampA developments diabetesmellitus as one of the top three major worldwide causesof death from noncommunicable diseases has promptednumerous researches investigating the prevention preva-lence and mortality of diabetes mellitus [15 44ndash53] Thereis a great demand for a QampA system that can effectivelyand efficiently provide health consulting services and assistpeople in monitoring and managing their individual healthconditions Jung et al [7] explored a mobile healthcare appli-cation for providing self-diabetes management to patientsBy interoperating with Electronic Medical Record (EMR)the healthcare application provided services such as weight
management cardio-cerebrovascular risk evaluation andexercise management Waki et al [54] developed a real-time interactive system DialBetics to achieve diabetes self-management particularly HbA1c management By an eval-uation strategy the system helped patients improve theirHbA1c significantly by monitoring health data comparedwith continuing self-care regimen patients More recentlyYoo et al [1] proposed a Personal Health Record- (PHR-)based diabetes index service model through a mobile deviceoffering users a management information service for pre-venting diabetes Users were able to check their healthconditions on a real-time basis and receive informationabout desirable health behaviors and dietary habits related todiabetes
Yet the existing diabetes management applications pro-vided general information search and management whileignoring counseling services which were crucial for man-aging the health condition of diabetes patients Besidesas claimed by Mishra et al [29] the cons of restricteddomain QampA included the limited repository of domain-specific questions To overcome these difficulties we builta LMD-FAQ repository to provide users with concise andaccurate answers by physicians or experts from debatesrelated professional websites Moreover we aim to leveragethe LMD-FAQ repository to provide counseling services ofdiet medication and symptoms for diabetes patients Inaddition based on our previous work [55] by analyzingglobal clinical trials of 190 countries provided by the NationalInstitutes of Health (NIH) we discovered 6 representativehealth characteristics that were closely related to diabetesmellitus to better manage usersrsquo health conditions The sixrepresentative health characteristics were Body Mass Index(BMI) Glucose Systolic Hypertension Diastolic Hyperten-sion HbA1c and creatinine Further we defined severalearly warning intervals of health characteristics by referringto the existing international medical standards for healthmanagement and risk warning
3 Methods and Materials
The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown inFigure 2 It consists of 3 modules a large-scale multilanguagediabetes FAQ repository (LMD-FAQ repository) a novelmultimode fusion question-answering framework (MMF-QA) and a diabetes data management module with earlywarning (DM-EW) The LMD-FAQ repository contains alarge number of diabetes question-answer pairs acquiredfrom mainstream diabetes-related professional websites TheMMF-QA framework integrates three strategies knowledge-based QampA FAQ-based QampA and web-based QampA TheDM-EW module records patientsrsquo health data and monitorstheir health conditions in real time Six representative healthcharacteristics that are closely related to diabetes mellitusthat is BMI glucose systolic hypertension diastolic hyper-tension HbA1c and creatinine are applied In case of a rapidcharacteristic change or a predication of deterioration themodule will automatically warn patients and provide themwith dietary guidelines
4 Wireless Communications and Mobile Computing
Dia-AID
LMD-FAQ Repository
Automatic Question Acquisition
Question Target Identification and
Classification
Semantic Pattern Generation and Matching
Terminals
1 A Large-scale Multi-language Diabetes FAQ Repository
2 A Multi-mode FusionQampA Framework
3 Data Management with Early Warning Function
Computer
Smart Device
Mobile Phone
Knowledge-based QampA
FAQ-based QampA
Web-based QampA
Personal Characteristic Health Data
Heath Data Monitoring
Risk Warning
QSem-based Question Matching
LSI-based Answer Ranking
Risk Evaluation
Answer Selection
User profiles
Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID
31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites
As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information
websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic
Wireless Communications and Mobile Computing 5
Figure 3 The visualization of example FAQ data in the LMD-FAQ repository
pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository
Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions
32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows
The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates
a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers
The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers
6 Wireless Communications and Mobile Computing
User Questions
Multi-mode Fusion QampAFramework
Question Target Identification and Classification
Semantic Pattern-based Question Representation
Question Analysis
Knowledge Base
LMD-FAQ Repository
WebPreprocessed
questionsTop N
Answers
Semantic Pattern Generation
Semantic Pattern-based Question Representation
Knowledge-based QampA
Qsem-based Question Matching
LSI-based Answer Ranking
FAQ-based QampA
Answer Selection
Qsem-based Question Matching
Web-based QampA
knowledge elements
AN
Qcad
Qfaq
Qweb
Figure 4 The multimode fusion QampA framework MMF-QA
both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively
119878119894119898119894119876119879 (119902119894 119891119886119902119895)
=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +
10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816
(1)
119878119872119886119905119888ℎ (119908119898 119908119899)
=
0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1
(2)
119878119894119898119894119880119874 (119902119894 119891119886119902119895)
=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))
1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)
10038161003816100381610038161003816
(3)
In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in
119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875
(4)
After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
4 Wireless Communications and Mobile Computing
Dia-AID
LMD-FAQ Repository
Automatic Question Acquisition
Question Target Identification and
Classification
Semantic Pattern Generation and Matching
Terminals
1 A Large-scale Multi-language Diabetes FAQ Repository
2 A Multi-mode FusionQampA Framework
3 Data Management with Early Warning Function
Computer
Smart Device
Mobile Phone
Knowledge-based QampA
FAQ-based QampA
Web-based QampA
Personal Characteristic Health Data
Heath Data Monitoring
Risk Warning
QSem-based Question Matching
LSI-based Answer Ranking
Risk Evaluation
Answer Selection
User profiles
Figure 2 The architecture of the diabetes question-answering and early warning system Dia-AID
31 The Large-Scale Multilanguage Diabetes FAQ RepositoryFrequently Asked Questions (FAQs) provide specific answersto the questions that are frequently asked when users browsespecific websites For example the website Health China(httphealthchinacom) allows users to ask questions infree text and those who are experts in the field answer thequestions freely These questions with professional answersare collected and organized as FAQ data The FAQ datacan dramatically benefit question answering by reusingthe accumulated professional knowledge In this paper wedevelop a method to automatically construct a large-scalemultilanguage diabetes FAQ (LMD-FAQ) repository throughidentifying FAQ data from professional diabetes websites
As illustrated in Figure 2 our method includes four steps(1) The first step is automatic question acquisition We firstanalyze the page structures of specific websites to identifydiabetes questions The websites elaborately selected bydomain experts include Diabetes Clinical Guidelines (Chi-neseMedical Association Diabetes Branch) professional dia-betes websites (International Diabetes Union the AmericanDiabetes Association etc) diabetes professional information
websites (CDC Health Channel) and diabetes interactivequestion-answering websites (Yahoo Knowledge)The ques-tions and associated answers are then extracted using regularexpression matching with the page codes (2) The secondstep is question target identification and classification Basedon our previous work [56 57] an automated answer typeidentification and classification method is applied to extractthe target and intent of questions by utilizing both syntacticand semantic analysis Considering that syntactic structuresvary according to the ways questions are asked four typicalsituations are identified and analyzed with each of themhaving a specific processing strategy During the processquestion target features are extracted via a principle-basedsyntactic parser and then expanded with their hypernymyfeatures and semantic labels Finally the expanded featuresare sent to a trained classifier to predict corresponding answertypes (3) The third step is semantic pattern generation andmatching Semantic pattern is utilized to index the questionswith answers in a more structured and semantic way Withquestion target and answer type extracted by the secondstep the questions are represented by a structural semantic
Wireless Communications and Mobile Computing 5
Figure 3 The visualization of example FAQ data in the LMD-FAQ repository
pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository
Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions
32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows
The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates
a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers
The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers
6 Wireless Communications and Mobile Computing
User Questions
Multi-mode Fusion QampAFramework
Question Target Identification and Classification
Semantic Pattern-based Question Representation
Question Analysis
Knowledge Base
LMD-FAQ Repository
WebPreprocessed
questionsTop N
Answers
Semantic Pattern Generation
Semantic Pattern-based Question Representation
Knowledge-based QampA
Qsem-based Question Matching
LSI-based Answer Ranking
FAQ-based QampA
Answer Selection
Qsem-based Question Matching
Web-based QampA
knowledge elements
AN
Qcad
Qfaq
Qweb
Figure 4 The multimode fusion QampA framework MMF-QA
both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively
119878119894119898119894119876119879 (119902119894 119891119886119902119895)
=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +
10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816
(1)
119878119872119886119905119888ℎ (119908119898 119908119899)
=
0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1
(2)
119878119894119898119894119880119874 (119902119894 119891119886119902119895)
=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))
1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)
10038161003816100381610038161003816
(3)
In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in
119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875
(4)
After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 5
Figure 3 The visualization of example FAQ data in the LMD-FAQ repository
pattern which consists of five components the questiontarget question type concept event and constraint Anentropy-based method proposed in previous work [58] isapplied for automated semantic pattern generation Figure 3shows the visualization of example FAQ data in the LMD-FAQ repository
Based on the above procedure the method extractsFAQ data from professional websites formats them using aconsistent representation and indexes them with semanticpatterns for fast retrievalThrough the automatic process andthe human review on the indexed data the FAQ repositorycan be incrementally maintained Currently the LMD-FAQrepository comprises 19317 English frequently asked QApairs and 6041 Chinese QA pairs The repository providesour QampA system with fundamental data support for answer-ing commonly posted questions
32 The Multimode Fusion Question-Answering Frame-work Themultimode fusion question-answering framework(MMF-QA) integrates three QampA models knowledge-basedQampA FAQ-based QampA and web-based QampA The overallframework is shown in Figure 4The procedure of themodelsis described as follows
The knowledge-based QampA model relies on a diabetesknowledge base to generate concise answers for postedquestions For a new given question the model analyzes thestructure and keywords of the question and then generates
a corresponding semantic pattern Thus the question istransformed from natural language to a structural semanticrepresentation that captures semantic information such asquestion target question type concept event and con-straint The question then is further represented as a tuple([Concept1] Relation [Concept2]) in which ldquoConcept1rdquo andldquoConcept2rdquo are used to label meaningful entities The repre-sented question is used for answer extraction fromknowledgebase For instance ldquoWhatrsquos the symptoms of diabetesrdquo isrepresented as ([symptoms] Rel of [diabetes]) Thereforethe knowledge-based QampA process mainly maps entitiesand their relations to formally represented tuples which arefurther used to match knowledge base to retrieve accuratelymatched knowledge elements as answers
The FAQ-based QampA model computes matching scoresbetween a given question and questions in the FAQ reposi-toryThe questionswithmatching scores larger than a specificthreshold are kept as candidates The candidate questionsthen are ranked and the top 119896 questions with the highestscores are returned The model consists of three main stepsQsem-based question matching LSI-based answer rankingand answer selection As claimed by [59] a major challengeof FAQ-based QampA is to match questions to correspond-ing question-answer pairs Here we apply a QSem-basedquestion matching framework proposed in one of ourprevious works [60] to support answering FAQs throughreusing accumulated QA data The framework considers
6 Wireless Communications and Mobile Computing
User Questions
Multi-mode Fusion QampAFramework
Question Target Identification and Classification
Semantic Pattern-based Question Representation
Question Analysis
Knowledge Base
LMD-FAQ Repository
WebPreprocessed
questionsTop N
Answers
Semantic Pattern Generation
Semantic Pattern-based Question Representation
Knowledge-based QampA
Qsem-based Question Matching
LSI-based Answer Ranking
FAQ-based QampA
Answer Selection
Qsem-based Question Matching
Web-based QampA
knowledge elements
AN
Qcad
Qfaq
Qweb
Figure 4 The multimode fusion QampA framework MMF-QA
both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively
119878119894119898119894119876119879 (119902119894 119891119886119902119895)
=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +
10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816
(1)
119878119872119886119905119888ℎ (119908119898 119908119899)
=
0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1
(2)
119878119894119898119894119880119874 (119902119894 119891119886119902119895)
=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))
1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)
10038161003816100381610038161003816
(3)
In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in
119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875
(4)
After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
6 Wireless Communications and Mobile Computing
User Questions
Multi-mode Fusion QampAFramework
Question Target Identification and Classification
Semantic Pattern-based Question Representation
Question Analysis
Knowledge Base
LMD-FAQ Repository
WebPreprocessed
questionsTop N
Answers
Semantic Pattern Generation
Semantic Pattern-based Question Representation
Knowledge-based QampA
Qsem-based Question Matching
LSI-based Answer Ranking
FAQ-based QampA
Answer Selection
Qsem-based Question Matching
Web-based QampA
knowledge elements
AN
Qcad
Qfaq
Qweb
Figure 4 The multimode fusion QampA framework MMF-QA
both question word types and semantic pattern accordingto their functionalities in question matching The questionword types include question target word user-oriented wordand irrelevant word These three word types are semanticallylabeled by a predefined ontology to enrich the semanticrepresentation of questions For each word type differentsimilarity strategies are applied to calculate the similarity asdescribed in [60] The similarity calculations for questiontarget and user-oriented word type between question 119902119894 and aFAQ candidate faq119895 are shown in (1) (2) and (3) respectively
119878119894119898119894119876119879 (119902119894 119891119886119902119895)
=2 times 10038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast) cap 119891119886119902119895 (119876119879119882 997888rarr 119871lowast)100381610038161003816100381610038161003816100381610038161003816119902119894 (119876119879119882 997888rarr 119871lowast)1003816100381610038161003816 +
10038161003816100381610038161003816119891119886119902119895 (119876119879119882 997888rarr 119871lowast)10038161003816100381610038161003816
(1)
119878119872119886119905119888ℎ (119908119898 119908119899)
=
0 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 = 01 1003816100381610038161003816(119908119898 cup 119908119898 997888rarr 119878 (119908119898)) cap (119908119899 cup 119908119899 997888rarr 119878 (119908119899))1003816100381610038161003816 ge 1
(2)
119878119894119898119894119880119874 (119902119894 119891119886119902119895)
=2 times sum119878119872119886119905119888ℎ (119902119894 (119880119874119882) 119891119886119902119895 (119880119874119882))
1003816100381610038161003816119902119894 (119880119874119882)1003816100381610038161003816 +10038161003816100381610038161003816119891119886119902119895 (119880119874119882)
10038161003816100381610038161003816
(3)
In the equations Simi119876119879 denotes the similarity score ofQT word type between a given new question 119902119894 and an exist-ing FAQ question faq119895 119871lowast denotes the set of semantic labelscorresponding to target words of the question 119902119894(119876119879119882 rarr119871lowast) and 119891119886119902119895(119876119879119882 rarr 119871lowast) represent the semantic labelsof QT words in 119902119894 and faq119895 through semantic labelingrespectively 119908119898 cup 119908119898 rarr 119878(119908119898) denotes synonymy wordsexpansion of word 119908119898 SMatch denotes the synonymy-basedword matching of two words 119908119898 and 119908119899 119908119898 cup 119908119898 rarr 119878(119908119898)is the synonymy extension of word 119908119898 by adding synonymyword collection 119878(119908119898) By integrating the previous three partsof matching the overall matching score 119878119894119898119894119878119862(119902119894 119891119886119902119895) ofthe two questions 119902119894 and faq119895 through balancing the similarityof each part is calculated as shown in
119872119886119905119888ℎ119878119862 (119902119894 119891119886119902119895) = 120572 times 119878119894119898119894119876119879 + 120573 times 119878119894119898119894119880119874+ (1 minus 120572 minus 120573) times 119878119894119898119894119878119875
(4)
After question matching top 119896 FAQs with the highestmatching scores are selected as candidates set 119876119891119886119902 Mean-while the web-based QampA model uses a similar strategy tocompute the matching scores to web question collections Itextracts 119896 answers from websites via the standard question-answering techniques Similarly the web-based QampA returnsa candidate question-answer set Q119908119890119887 Q119908119890119887 and 119876119891119886119902 are
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 7
Figure 5 The screen snapshots of the mobile-based system for providing diabetes information services using the three QampA models
merged as the final answer candidates Q119888119886119889 for answer rank-ing and answer selection
We propose a LSI-based answer ranking method to re-rank the questions in Q119888119886119889 The ranking method consistsof three steps feature extraction Latent Semantic Indexing(LSI) similarity calculation and ranking The extracted fea-tures of Chinese questions are bag-of-words (BOWs) andCharacter while the features for English questions are bag-of-words feature only The LSI approach takes advantage ofimplicit higher-order semantic structure and matches wordsin queries with words in documents [61] Here we treateach candidate answer as a short document and detect themost relevant answers via the LSI-based method After thatthe candidate answers are reranked based on the similarityvalues and the top 119873 answers as candidate list 119860119873 arereturned
Finally there is an answer selection processThe selectionof a candidate answer as correct or incorrect can be treated asa binary classification task The question and correspondingtop 119873 candidate answers in list 119860119873 are transformed to 119873QA pairs We propose an answer selection approach viaa Logistic Regression (LR) classifier which includes foursteps feature extraction parameter tuning model trainingand answer selecting Using the features similar to the LSI-based approach QA pairs are randomly selected from theLMD-FAQ repository as training data The QA pairs withcorrect answers are labeled as ldquo1rdquo and ldquo0rdquo otherwiseWe thentune the parameter ldquoCrdquo (inverse of regularization strength)to avoid overfittingunderfitting issue After parameter opti-mization the best parameter is applied in the LR classifierwhich then is applied to select the best119873 candidate answerswhere the top 1 is the best answer and the remaining N-1answers in list 119860119873 are relevant answers Figure 5 shows the
screen snapshots of the knowledge-based QampA FAQ-basedQampA and web-based QampA modes
33 Diabetes Data Management with Early Warning Sincediabetes patients and people at high risk usually needlong-term health management we develop a real-time datamanagement module incorporating early warning to achievepatient health self-management
In the data management module users are required toregister their basic information After that the users canlog in to report their recent health data related to six maincharacteristics HbA1c BMI glucose systolic hypertension(hypertension S) diastolic hypertension (hypertension D)and creatinine The health data then are stored in server sidesecurely
With the historical health data the module calculatesand monitors the health status in real time For each of thecharacteristics we set an alarm value according to literaturereview on IDF documents and reports Once the health datahas a dramatic change or the characteristics are close totheir corresponding alarm value ranges the system will auto-matically deliver a warning message to the users about thesituation To evaluate the usability of the system a 2-monthrandomized study is designed Thirty people volunteered asinternal test users to monitor their health condition via theDia-AID system During the test users measure and reportthe data of the six characteristics by themselves Based oneach new data report the system calculates the existing dataand newly submitted data to make a summarization of thehealth condition in real time Table 1 shows the reportedhealth data records by a user Cecil
The system records all the reported health data and gener-ates data change curves automatically For example Figure 6
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
8 Wireless Communications and Mobile Computing
Table 1 The reported health data records of the user Cecil
Characteristic Value Time Characteristic Value TimeHBA1C 75 2017-11-01 102203 HBA1C 90 2017-11-04 191518GLUCOSE 120 2017-11-01 102203 BMI 220 2017-11-04 191518BMI 221 2017-11-01 102203 GLUCOSE 150 2017-11-04 191518HYPERTENSION S 1110 2017-11-01 102203 HYPERTENSION S 1130 2017-11-04 191518HYPERTENSION D 820 2017-11-01 102203 HYPERTENSION D 840 2017-11-04 191518CREATININE 118 2017-11-01 102203 CREATININE 124 2017-11-04 191518HBA1C 60 2017-11-02 182439 HBA1C 90 2017-11-05 181025BMI 220 2017-11-02 182439 BMI 240 2017-11-05 181025GLUCOSE 100 2017-11-02 182439 GLUCOSE 150 2017-11-05 181025HYPERTENSION S 1100 2017-11-02 182439 HYPERTENSION S 1130 2017-11-05 181025HYPERTENSION D 800 2017-11-02 182439 HYPERTENSION D 840 2017-11-05 181025CREATININE 12 2017-11-02 182439 CREATININE 124 2017-11-05 181025HBA1C 78 2017-11-03 183028 HBA1C 100 2017-11-06 192113BMI 219 2017-11-03 183028 BMI 240 2017-11-06 192113GLUCOSE 120 2017-11-03 183028 GLUCOSE 180 2017-11-06 192113HYPERTENSION S 1110 2017-11-03 183028 HYPERTENSION S 1150 2017-11-06 192113HYPERTENSION D 820 2017-11-03 183028 HYPERTENSION D 860 2017-11-06 192113CREATININE 12 2017-11-03 183028 CREATININE 115 2017-11-06 192113
Figure 6 The health data management with early warning function and health record visualization by time for the user Cecil
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 9
shows the trend curve of Cecilrsquos diastolic hypertension in thelast 7 days When the current newly submitted health data iswithin safe range and there is no dramatic change comparedwith last report the system shows the user with the healthstatus messages eg ldquoYour health status is goodrdquo in greencolor Once the system identifies current user data exceedingalarm range (either too high or too low) according to thecurrent change trend the system will evaluate how long ittakes to reach the alarm value The system will evaluate howlong it will take to reach the alarm value If the period is tooshort the system will automatically warn the current userFor example the system warns the user Cecil that diastolichypertension is too high and will be in a danger range after 2days if the user does not have any control on it Through thehealth data management incorporating early warning userscan review their health status and take actions to reduce therisk of diabetes according to the warning messages
4 Results
41 Datasets Since there is no available diabetes FAQ datasetfor evaluation the evaluations of the proposed LSI-basedanswer ranking approach and answer selection methodwere based on the constructed LMD-FAQ repository Totest the LSI-based answer ranking approach we randomlyselected 500 750100012501500 and 1750 Chinese question-answer tuples (question ltanswer-setgt) from the repositoryrespectively as six subdatasets of Evaluation dataset-A Foreach question-answer tuple it contains one question and ananswer set which consists of one correct answer and nineincorrect answers randomly generated from the rest of therepositoryThus each question contains 10 candidate answersfor ranking For answer selection evaluation we suppose eachquestion has 119896 candidate answers ie for each questionk-1 incorrect answers are randomly generated as negativesamples In this paper k is set to 5 and 10 For the setting k=56000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs are randomly generated as Testingdataset-C1 For the setting k=10 8000QA pairs are randomlygenerated as Training dataset-B2 and 5000 QA pairs arerandomly generated as Testing dataset-C2
42 Evaluation Metrics The evaluation metrics includeMean Reciprocal Rank (MRR) AccuracyN of the returnedanswers precision recall and F1 measure all of which arecommonly usedmetrics to evaluate the performance of QampAsystems
(i) MRR Mean Reciprocal Rank of the first correctanswer as shown in (5) (ie 1 if a correct answerwas retrieved at rank 1 05 if a correct answer wasretrieved at rank 2 and so on Q is the test set and|119876| denotes the number of questions in Q rank119894represents the position of the first correct answer inanswer ranking candidates to a test question 119876119894)
(ii) AccuracyN proportion of correct answers in top119873 returned answers by the system as shown in (6)(119862119894(119873) = 1 if there is at least one correct answer in top119873 candidates otherwise it is 0)
(iii) Precision for any of the categories is the number oftrue positives (TP) (ie the number of questions cor-rectly labeled as belonging to the positive categories)that are divided by the total number of questionslabeled as belonging to the positive categories asshown in (7) False positive (FP) is the number ofquestions that the system incorrectly labeled
(iv) Recall is defined as the number of true positivesdivided by the total number of questions that actuallybelong to positive categories (ie the sum of truepositive and false negative) as shown in (8)
(v) F1-measure considers both the precision and therecall to compute a balanced score as shown in (9)
119872119877119877 = 1|119876||119876|
sum119894=1
1119903119886119899119896119894
(5)
119860119888119888119906119903119886119888119910119873 = 1|119876||119876|
sum119894=1
119862119894 (119873) (6)
119875119903119890119888119894119904119894119900119899 = 119879119875119879119875 + 119865119875 (7)
119877119890119888119886119897119897 = 119879119875119879119875 + 119865119873 (8)
1198651 = 2 lowast 119875119903119890119888119894119904119894119900119899 lowast 119877119890119888119886119897119897119875119903119890119888119894119904119894119900119899 + 119877119890119888119886119897119897 (9)
43 Results To validate the proposed LSI-based answerranking method we conduct the following two experimentsThe first experiment is to verify the effectiveness of the LSI-based answer ranking method by comparing to five base-lines We adopt Doc2Vec Latent Dirichlet Allocation (LDA)Locality Sensitive Hashing (LSH) docsim and Synonyms[62] as baselines We randomly select 500 questions andmeasure the performance inMRRandAccuracyN (AccN119873 = 1 2 3 4 5) Compared with the baselines our methodachieves the best performance in all evaluation metrics asshown in Table 2 For MRR our method improves by 1780compared to LSH which has the best performance amongbaselines For Acc1 LSH also obtains the best performanceas 06733 among baselines Our method outperformed LSHwith an improvement of 2352 In addition our methodranks 9499 of the correct answers in the top five ofcandidate answers The improvements of MRR and Acc1prove that the proposed method can potentially promote thepositions of correct answers
To assess the stability of the proposed method thesecond experiment is conducted by comparing to the samefive baselines with the measures of MRR and Acc1 Theused dataset is Evaluation dataset-A Figure 7 illustrates theexperiment results measured in MRR while Figure 8 showsthe results measured in Acc1 From the result our methodachieves stable performance over all different sizes of thequestion setsThis result is promising since ourmethod ranksmost of the correct answers in the top of the candidate answerlist Moreover compared to the baselines our method gains
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
10 Wireless Communications and Mobile Computing
Table 2 Evaluation results compared to baselines
MRR Acc1 Acc2 Acc3 Acc4 Acc5Doc2Vec 02811 00721 01923 02905 03967 05250LDA 05536 03326 05811 07294 08016 08597Synonyms 06082 04108 06332 07454 08376 08877docsim 07233 06312 07074 07515 07895 08336LSH 07517 06733 07394 07835 08056 08276Our Method 08855 08317 08918 09259 09459 09499
Our Method
1009080706050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
MRR
LSHDoc2Vec
docsimLDASynonyms
Figure 7 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the MRR measure
the best performance measured in Acc1 on all the questionsets From the results even with the increasing number ofquestions nearly 85 of correct answers are ranked in thetop of the candidate answer list
Since our answer selection approach uses a binary clas-sifier we assess the method by evaluating the effectivenessof answer classification During the evaluation three experi-ments are designed the first is to train optimized parametersthe second aims to assess the stability of classification andthe third aims to evaluate the effectiveness by comparingwithbaseline methods The datasets used for evaluation are fromthe constructed LMD-FAQ repository and the evaluationmetrics are precision recall F1 and accuracy
To avoid the overfittingunderfitting problemwe tune theparameter ldquoCrdquo (inverse of regularization strength) for the LRclassifier as described above 12651 QA pairs are randomlyselected from the LMD-FAQ repository as the dataset Thedataset then is randomly shuffled into two subgroups as train-ing (70) and testing (30) We use k-fold cross-validationto assess the model performance Figure 9 demonstratesthe validation curve where training accuracy represents theresults on testing dataset and validation accuracy denotes the10-fold cross-validation results From the results the methodgains the best performance when ldquoCrdquo is equal to 1 which isthe best parameter applied in the following two experiments
1009080706
Acc
1
050403020100
500 750 1000 1250 of question-answer tuples
1500 1750
Our MethodLSHDoc2Vec
docsimLDASynonyms
Figure 8 The performance comparison between the proposedmethod and baselines with the increasing number of question-answer tuples using the Acc1 measure
The stability of the proposed method is tested withdifferent sizes of training data and different 119896 values Bysetting k=5 the Training dataset-B1 is randomly dividedinto 5 training subsets containing 2000 3000 4000 5000and 6000 question-answer pairs respectively Similarly bysetting k=10 the Training dataset B2 is randomly divided into5 training subsets with 4000 5000 6000 7000 and 8000question-answer pairs The datasets C1 and C2 are used astesting datasets independently The results are measured inaccuracy (Acc) precision recall and F1-measure (F1) Asshown in Figure 10 ourmethod receives a stable performanceon all evaluation metrics with k=5 When the size of trainingdataset is larger than 3000 the performance on all metricsincreases The experiment results indicate that our methodis not affected much by training dataset size As illustratedin Figure 11 the performance measured in accuracy remainsstable on all sizes of training datasets With the increasingof training dataset size the performance measured in F1increases Comparing the performance on the two datasetsettings our method yields a better performance when 119896equals 10 which indicates that the proposed method remainsstable even with more incorrect answers in candidate answerlists
We further compare our method with five commonlyused classificationmethods Support VectorMachine (SVM)
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 11
Table 3 The comparison of our answer selection method with baseline methods using different k settings
Setting Methods Accuracy Precision Recall F1
k=5
KNN 08052 06820 05679 05746GaussianNB 07383 06980 08026 06956
RF 08301 07339 07097 07203SVM 08373 07513 08029 07706PPN 08842 08129 08540 08305
Our method 09222 08859 08657 08753
k=10
KNN 09032 08076 05258 05247RF 09106 07523 07277 07391
GaussianNB 08956 07197 07737 07422SVM 09154 07642 08399 07951PPN 09271 07897 08633 08206
Our method 09651 09415 08559 08929
Figure 9 The validation curve of parameter ldquoCrdquo using differentvalues
Perceptron (PPN) Random Forest (RF) Gaussian NaiveBayes (GaussianNB) and k-Nearest Neighbor (KNN) Thedatasets used are the Training dataset-B1 and Trainingdataset-B2 and the corresponding Testing dataset-C1 andTesting dataset-C2 The evaluation metrics are accuracyprecision recall and F1 Table 3 shows the comparison resultsusing different dataset settings By setting k=5 an accuracy of09222 a precision of 08859 a recall of 08657 and an F1 of08753 are achieved as the best performance compared to fivebaseline methods By setting k=10 our method also obtainsthe highest performance on all evaluation metrics comparedto the baselines Particularly the higher precision and F1 aremore preferable since our expectation is the return of morecorrect answers to users to improve user satisfaction
5 Conclusions
Aimed at assisting diabetes patients or populations at highrisk of diabetes to have long-term health management thispaper designed and developed a mobile-based question-answering and early warning system Dia-AID The system
Acc
1000
0975
0950
0925
0900
0875
0850
0825
08002000 3000 4000
The size of training datasets5000 6000
RecallF1Precision
Figure 10 The performance with the increasing size of trainingdatasets when k=5
1000
0975
0950
0925
0900
0875
0850
0825
08004000 5000 6000
The size of training datasets7000 8000
Acc RecallF1Precision
Figure 11 The performance with the increasing size of trainingdatasets when k=10
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
12 Wireless Communications and Mobile Computing
assists users in providing diabetes information and monitor-ing their health status through diabetes question answeringrisk assessment and health record management We evalu-ated two essential models in our system and compared themwith five baseline methods on various metrics The resultsshowed that our methods achieved the best performancecompared with the baseline methods
Data Availability
The diabetes data is not made publicly available
Conflicts of Interest
There are no conflicts of interest in this paper
Acknowledgments
The work was substantially supported by a grant fromthe National Natural Science Foundation of China (no61772146) the Science and Technology Plan of Guangzhou(no 201804010296) the Innovative School Project in HigherEducation of Guangdong Province (no YQ2015062) andScientific Research Innovation Team in Department of Edu-cation of Guangdong Province (no 2017KCXTD013)
References
[1] H Yoo and K Chung ldquoPHR Based Diabetes Index ServiceModel Using Life Behavior Analysisrdquo Wireless Personal Com-munications vol 93 no 1 pp 161ndash174 2017
[2] P Zimmet KGMMAlberti and J Shaw ldquoGlobal and societalimplications of the diabetes epidemicrdquo Nature vol 414 no6865 pp 782ndash787 2001
[3] F Aguiree and A Brown ldquoIDF Diabetes Atlasrdquo in IDF Dia-betes Atlas - sixth Edition International Diabetes FederationBelgium 2013
[4] P Z Zimmet D J Magliano W H Herman and J E ShawldquoDiabetes a 21st century challengerdquo The Lancet Diabetes ampEndocrinology vol 2 no 1 pp 56ndash64 2014
[5] J G Melton IDF Diabetes Atlas 8th edition 2017[6] G Roglic ldquoWHO Global report on diabetes a summaryrdquo
International Journal of Noncommunicable Diseases vol 1 no1 pp 3ndash8 2016
[7] E-Y Jung J Kim K-Y Chung and D K Park ldquoMobilehealthcare application with EMR interoperability for diabetespatientsrdquo Cluster Computing vol 17 no 3 pp 871ndash880 2014
[8] World Health Organization ldquoGlobal Report on Diabetesrdquo Isbnvol 978 article 88 2016
[9] J Beck D A Greenwood L Blanton et al ldquo2017 National Stan-dards for Diabetes Self-Management Education and SupportrdquoDiabetes Care article dci170025 2017
[10] A D American Diabetes Association ldquo4 Lifestyle Manage-mentrdquo Diabetes Care vol 40 no Suppl 1 pp S33ndashS43 2017
[11] Diabetes Prevention Program Research Group ldquo10-year follow-up of diabetes incidence and weight loss in the DiabetesPrevention Program Outcomes StudyrdquoThe Lancet vol 374 no9702 pp 1677ndash1686 2009
[12] K Chung J-C Kim and R C Park ldquoKnowledge-based healthservice considering user convenience using hybrid Wi-Fi P2PrdquoInformation Technology and Management vol 17 no 1 pp 67ndash80 2016
[13] Susannah Fox and Maeve Duggan ldquoHealth Online 2013 mdashPew Research Centerrdquo National survey by the Pew ResearchCenterrsquos Internet and American Life Project 2013 httpwwwpewinternetorg20130115health-online-2013
[14] P Krebs and D T Duncan ldquoHealth app use among US mobilephone owners a national surveyrdquo JMIR mHealth and uHealthvol 3 no 4 article e101 2015
[15] S Chavez D Fedele Y Guo et al ldquoMobile apps for themanagement of diabetesrdquo Diabetes Care vol 40 no 10 ppe145ndashe146 2017
[16] KWaki H Fujita Y Uchimura et al ldquoDialBetics Smartphone-based self-management for type 2 diabetes patientsrdquo Journal ofDiabetes Science and Technology vol 6 no 4 pp 983ndash985 2012
[17] P P Committee and A Classification ldquoStandards of medicalcare in diabetesmdash2010rdquo Diabetes Care vol 33 no S1 pp S11ndashS61 2010
[18] D Demner-Fushman and J Lin ldquoAnswer extraction semanticclustering and extractive summarization for clinical questionansweringrdquo in Proceedings of the 21st International Conferenceon Computational Linguistics and 44th Annual Meeting of theAssociation for Computational Linguistics COLINGACL 2006pp 841ndash848 July 2006
[19] S Schulz M Honeck and U Hahn ldquoBiomedical text retrievalin languages with a complex morphologyrdquo in Proceedings of themeeting of the association for computational linguistics vol 3 pp61ndash68 July 2002
[20] W R Hersh M Katherine Crabtree D H Hickam et alldquoFactors associated with success in searching MEDLINE andapplying evidence to answer clinical questionsrdquo Journal of theAmericanMedical Informatics Association vol 9 no 3 pp 283ndash293 2002
[21] J W Ely J A Osheroff M H Ebell et al ldquoAnalysis of questionsasked by family doctors regarding patient carerdquo BMJ vol 319no 7206 pp 358ndash361 1999
[22] J W Ely ldquoObstacles to answering doctorsrsquo questions aboutpatient care with evidence qualitative studyrdquo BMJ vol 324 no7339 pp 710-710
[23] G R Bergus C S Randall S D Sinift and D M RosenthalldquoDoes the structure of clinical questions affect the outcome ofcurbside consultations with specialty colleaguesrdquo Archives ofFamily Medicine vol 9 no 6 pp 541ndash547 2000
[24] P Sondhi P Raj V V Kumar and AMittal ldquoQuestion process-ing and clustering in INDOCA biomedical question answeringsystemrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2007 2007
[25] E M Voorhees ldquoThe TREC question answering trackrdquoNaturalLanguage Engineering vol 7 no 4 pp 361ndash378 2001
[26] T Hao W Xie C Chen and Y Shen ldquoSystematic comparisonof question target classification taxonomies towards questionansweringrdquo Communications in Computer and InformationScience vol 568 pp 131ndash143 2015
[27] S J Athenikos and H Han ldquoBiomedical question answering Asurveyrdquo Computer Methods and Programs in Biomedicine vol99 no 1 pp 1ndash24 2010
[28] D Molla and J L Vicedo ldquoQuestion answering in restricteddomains An overviewrdquo Computational Linguistics vol 33 no1 pp 41ndash61 2007
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 13
[29] A Mishra and S K Jain ldquoA survey on question answeringsystems with classificationrdquo Journal of King Saud University -Computer and Information Sciences vol 28 no 3 pp 345ndash3612016
[30] H Yu and C Sable ldquoBeing Erlang Shen Identifying AnswerableQuestionsrdquo in Proceedings of the Nineteenth International JointConference onArtificial Intelligence onKnowledge andReasoningfor Answering Questions 2005
[31] F Rinaldi J Dowdall G Schneider and A Persidis ldquoAnsweringquestions in the genomics domainrdquo in Proceedings of the ACL2004 Workshop on Question Answering in Restricted Domainspp 46ndash53 2004
[32] A M Cohen J Yang S Fisher B Roark andW R Hersh ldquoTheOHSUBiomedical QuestionAnswering System Frameworkrdquo inProceedings of the Sixteenth Text Retrieval Conference 2007
[33] A B Abacha and P Zweigenbaum ldquoMedical question answer-ing Translating medical questions into SPARQL queriesrdquo inProceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium IHIrsquo12 pp 41ndash50 January 2012
[34] A Marginean ldquoQuestion answering over biomedical linkeddata with Grammatical Frameworkrdquo Journal of Web SemanticsScience Services and Agents on the World Wide Web vol 8 no4 pp 565ndash580 2017
[35] A Ben Abacha and P Zweigenbaum ldquoMEANS A medicalquestion-answering system combining NLP techniques andsemanticWeb technologiesrdquo Information Processing ampManage-ment vol 51 no 5 pp 570ndash594 2015
[36] R M Terol P Martınez-Barco and M Palomar ldquoA knowledgebased method for the medical question answering problemrdquoComputers in Biology andMedicine vol 37 no 10 pp 1511ndash15212007
[37] H Zhang T W S Chow and Q M J Wu ldquoOrganizing Booksand Authors by Multilayer SOMrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 27 no 12 pp 2537ndash25502016
[38] H Zhang S Wang X Xu T W Chow and Q M WuldquoTree2Vector Learning a Vectorial Representation for Tree-Structured Datardquo IEEE Transactions on Neural Networks andLearning Systems pp 1ndash15 2018
[39] H Zhang S Wang Z Mingbo X Xu and Y Ye ldquoLocalityReconstruction Models for Book Representationrdquo IEEE Trans-actions on Knowledge and Data Engineering 2018
[40] K-Y Chung J Yoo and K J Kim ldquoRecent trends on mobilecomputing and future networksrdquo Personal and UbiquitousComputing vol 18 no 3 pp 489ndash491 2014
[41] S-K Kang K-Y Chung and J-H Lee ldquoDevelopment of headdetection and tracking systems for visual surveillancerdquo Personaland Ubiquitous Computing vol 18 no 3 pp 515ndash522 2014
[42] H Jung K-Y Chung and Y-H Lee ldquoDecision supportingmethod for chronic disease patients based on mining frequentpattern treerdquoMultimedia Tools and Applications vol 74 no 20pp 8979ndash8991 2015
[43] D HanM Lee and S Park ldquoTHE-MUSSMobile u-health ser-vice systemrdquo Computer Methods and Programs in Biomedicinevol 97 no 2 pp 178ndash188 2010
[44] S Wild G Roglic A Green R Sicree and H King ldquoGlobalprevalence of diabetes estimates for the year 2000 and projec-tions for 2030rdquoDiabetes Care vol 27 no 5 pp 1047ndash1053 2004
[45] M E Cox and D Edelman ldquoTests for screening and diagnosisof type 2 diabetesrdquo Clinical Diabetes vol 27 no 4 pp 132ndash1382009
[46] G Danaei MM Finucane Y Lu et al ldquoNational regional andglobal trends in fasting plasma glucose and diabetes prevalencesince 1980 systematic analysis of health examination surveysand epidemiological studies with 370 country-years and 2sdot7million participantsrdquo The Lancet vol 378 no 9785 pp 31ndash402011
[47] R A Bailey Y Wang V Zhu and M F Rupnow ldquoChronickidney disease in US adults with type 2 diabetes an updatednational estimate of prevalence based on kidney diseaseimproving Global Outcomes (KDIGO) stagingrdquo BMC ResearchNotes vol 7 article 415 2014
[48] P-J Lin D M Kent A Winn J T Cohen and P J NeumannldquoMultiple chronic conditions in type 2 diabetes mellitus preva-lence and consequencesrdquo The American Journal of ManagedCare vol 21 no 1 pp e23ndashe34 2015
[49] M L Tracey M Gilmartin K OrsquoNeill et al ldquoEpidemiologyof diabetes and complications among adults in the Republic ofIreland 1998ndash2015 a systematic review andmeta-analysisrdquoBMCPublic Health vol 16 article 132 no 1 2016
[50] L Yang J Shao Y Bian et al ldquoPrevalence of type 2 diabetesmellitus among inland residents in China (2000ndash2014) Ameta-analysisrdquo Journal of Diabetes Investigation vol 7 no 6 pp 845ndash852 2016
[51] K Iglay H Hannachi P J Howie et al ldquoPrevalence and co-prevalence of comorbidities among patientswith type 2 diabetesmellitusrdquo Current Medical Research and Opinion vol 32 no 7pp 1243ndash1252 2016
[52] P Zimmet K G Alberti D J Magliano and P H BennettldquoDiabetes mellitus statistics on prevalence and mortality Factsand fallaciesrdquo Nature Reviews Endocrinology vol 12 no 10 pp616ndash622 2016
[53] I Dedov M Shestakova M M Benedetti D Simon I Pakho-mov and G Galstyan ldquoPrevalence of type 2 diabetes mellitus(T2DM) in the adult Russian population (NATION study)rdquoDiabetes Research and Clinical Practice vol 115 pp 90ndash95 2016
[54] K Waki H Fujita Y Uchimura et al ldquoDialBetics A novelsmartphone-based self-management support system for type 2diabetes patientsrdquo Journal of Diabetes Science and Technologyvol 8 no 2 pp 209ndash215 2014
[55] T Hao H Liu and CWeng ldquoValx A system for extracting andstructuring numeric lab test comparison statements from textrdquoMethods of Information in Medicine vol 55 no 3 pp 266ndash2752016
[56] THaoWXie andFXu ldquoAwordnet expansion-based approachfor question targets identification and classificationrdquo LectureNotes in Computer Science (including subseries Lecture Notesin Artificial Intelligence and Lecture Notes in Bioinformatics)Preface vol 9427 pp 333ndash344 2015
[57] T Hao W Xie Q Wu H Weng and Y Qu ldquoLeveraging ques-tion target word features through semantic relation expansionfor answer type classificationrdquo Knowledge-Based Systems vol133 pp 43ndash52 2017
[58] T Hao D Hu L Wenyin and Q Zeng ldquoSemantic patterns foruser-interactive question answeringrdquo Concurrency and Compu-tation Practice and Experience vol 20 no 7 pp 783ndash799 2008
[59] Z M Juan ldquoAn effective similarity measurement for FAQquestion answering systemrdquo in Proceedings of the InternationalConference on Electrical and Control Engineering ICECE 2010pp 4638ndash4641 June 2010
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
14 Wireless Communications and Mobile Computing
[60] T Hao and Y Qu ldquoQSem A novel question representationframework for question matching over accumulated question-answer datardquo Journal of Information Science vol 42 no 5 pp583ndash596 2016
[61] S Deerwester S T Dumais G W Furnas T K Landauer andR Harshman ldquoIndexing by latent semantic analysisrdquo Journal ofthe Association for Information Science and Technology vol 41no 6 pp 391ndash407 1990
[62] H L Wang and H Y Xi ldquoChinese synonyms toolkitrdquo 2018httpsgithubcomhuyingxiSynonyms
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom