15
Research Article Composition Analysis and Feature Selection of the Oral Microbiota Associated with Periodontal Disease Wen-Pei Chen, 1 Shih-Hao Chang, 2,3 Chuan-Yi Tang , 4 Ming-Li Liou, 5 Suh-Jen Jane Tsai, 1 and Yaw-Ling Lin 1,4 Department of Applied Chemistry, Providence University, Taichung City, Taiwan Department of Periodontics, Linkou Medical Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan Graduate Institute of Dental and Craniofacial Science, Chang Gung University, Taoyuan, Taiwan Department of Computer Science and Information Engineering, Providence University, Taichung City, Taiwan Department of Medical Laboratory Science and Biotechnology, Yuanpei University of Medical Technology, Hsin-Chu City, Taiwan Correspondence should be addressed to Yaw-Ling Lin; [email protected] Received 30 May 2018; Revised 10 October 2018; Accepted 4 November 2018; Published 15 November 2018 Academic Editor: Momiao Xiong Copyright © 2018 Wen-Pei Chen et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Periodontitis is an inflammatory disease involving complex interactions between oral microorganisms and the host immune response. Understanding the structure of the microbiota community associated with periodontitis is essential for improving classifications and diagnoses of various types of periodontal diseases and will facilitate clinical decision-making. In this study, we used a 16S rRNA metagenomics approach to investigate and compare the compositions of the microbiota communities from 76 subgingival plagues samples, including 26 from healthy individuals and 50 from patients with periodontitis. Furthermore, we propose a novel feature selection algorithm for selecting features with more information from many variables with a combination of these features and machine learning methods were used to construct prediction models for predicting the health status of patients with periodontal disease. We identified a total of 12 phyla, 124 genera, and 355 species and observed differences between health- and periodontitis-associated bacterial communities at all phylogenetic levels. We discovered that the genera Porphyromonas, Treponema, Tannerella, Filifactor, and Aggregatibacter were more abundant in patients with periodontal disease, whereas Streptococcus, Haemophilus, Capnocytophaga, Gemella, Campylobacter, and Granulicatella were found at higher levels in healthy controls. Using our feature selection algorithm, random forests performed better in terms of predictive power than other methods and consumed the least amount of computational time. 1. Introduction e human mouth harbors a complex microbial community, with estimates of up to 700 or more different bacterial species, most of which are commensal and required to maintain the balance of the mouth ecosystem [1]. However, some of the bacteria in the mouth microbiota play important roles in the development of oral diseases, including dental caries and periodontal disease [2]. Periodontal disease and dental caries initiate with the growth of the dental plaque, a biofilm formed by the accumulation of bacteria together with various human salivary glycoproteins and polysaccharides secreted by the microbes [3]. e subgingival plaque, located within the neutral or alkaline subgingival sulcus, is typically inhabited by anaerobic gram-negative bacteria and is responsible for the development of gingivitis and periodontitis. e compo- sition of oral microorganisms depends on multiple factors, including lifestyle (e.g., diet, oral care habits), health (e.g., oral diseases, host immune responses, and genetic susceptibility), and physical location in the oral cavity (tongue or tooth surfaces, as well as supragingival or subgingival sites) [4]. Periodontitis is an inflammatory disease involving a complex interaction between oral microorganisms organized in a biofilm structure and the host immune response. Clinically, periodontitis results in the destruction of tissues that support and protect the tooth and is a major cause of tooth loss in adults [5]. Moreover, periodontitis can also affect systemic health by increasing the risk of atherosclerosis, adverse Hindawi BioMed Research International Volume 2018, Article ID 3130607, 14 pages https://doi.org/10.1155/2018/3130607

Composition Analysis and Feature Selection of the Oral

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Composition Analysis and Feature Selection of the Oral

Research ArticleComposition Analysis and Feature Selection ofthe Oral Microbiota Associated with Periodontal Disease

Wen-Pei Chen1 Shih-Hao Chang23 Chuan-Yi Tang 4 Ming-Li Liou5

Suh-Jen Jane Tsai1 and Yaw-Ling Lin 14

1Department of Applied Chemistry Providence University Taichung City Taiwan2Department of Periodontics Linkou Medical Center Chang Gung Memorial Hospital Taoyuan Taiwan3Graduate Institute of Dental and Craniofacial Science Chang Gung University Taoyuan Taiwan4Department of Computer Science and Information Engineering Providence University Taichung City Taiwan5Department of Medical Laboratory Science and Biotechnology Yuanpei University of Medical Technology Hsin-Chu City Taiwan

Correspondence should be addressed to Yaw-Ling Lin yllinpuedutw

Received 30 May 2018 Revised 10 October 2018 Accepted 4 November 2018 Published 15 November 2018

Academic Editor Momiao Xiong

Copyright copy 2018 Wen-Pei Chen et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Periodontitis is an inflammatory disease involving complex interactions between oral microorganisms and the host immuneresponse Understanding the structure of the microbiota community associated with periodontitis is essential for improvingclassifications and diagnoses of various types of periodontal diseases and will facilitate clinical decision-making In this studywe used a 16S rRNA metagenomics approach to investigate and compare the compositions of the microbiota communities from76 subgingival plagues samples including 26 from healthy individuals and 50 from patients with periodontitis Furthermore wepropose a novel feature selection algorithm for selecting features with more information from many variables with a combinationof these features and machine learning methods were used to construct prediction models for predicting the health statusof patients with periodontal disease We identified a total of 12 phyla 124 genera and 355 species and observed differencesbetween health- and periodontitis-associated bacterial communities at all phylogenetic levels We discovered that the generaPorphyromonas Treponema Tannerella Filifactor and Aggregatibacter were more abundant in patients with periodontal diseasewhereas Streptococcus Haemophilus Capnocytophaga Gemella Campylobacter and Granulicatella were found at higher levels inhealthy controls Using our feature selection algorithm random forests performed better in terms of predictive power than othermethods and consumed the least amount of computational time

1 Introduction

The human mouth harbors a complex microbial communitywith estimates of up to 700ormore different bacterial speciesmost of which are commensal and required to maintain thebalance of the mouth ecosystem [1] However some of thebacteria in the mouth microbiota play important roles inthe development of oral diseases including dental caries andperiodontal disease [2] Periodontal disease and dental cariesinitiate with the growth of the dental plaque a biofilm formedby the accumulation of bacteria together with various humansalivary glycoproteins and polysaccharides secreted by themicrobes [3] The subgingival plaque located within theneutral or alkaline subgingival sulcus is typically inhabited

by anaerobic gram-negative bacteria and is responsible forthe development of gingivitis and periodontitis The compo-sition of oral microorganisms depends on multiple factorsincluding lifestyle (eg diet oral care habits) health (eg oraldiseases host immune responses and genetic susceptibility)and physical location in the oral cavity (tongue or toothsurfaces as well as supragingival or subgingival sites) [4]Periodontitis is an inflammatory disease involving a complexinteraction between oral microorganisms organized in abiofilm structure and the host immune response Clinicallyperiodontitis results in the destruction of tissues that supportand protect the tooth and is a major cause of tooth loss inadults [5] Moreover periodontitis can also affect systemichealth by increasing the risk of atherosclerosis adverse

HindawiBioMed Research InternationalVolume 2018 Article ID 3130607 14 pageshttpsdoiorg10115520183130607

2 BioMed Research International

pregnancy outcomes rheumatoid arthritis aspiration pneu-monia and cancer [6ndash11]

In the past half century numerous studies have char-acterized the community composition of the oral micro-biota and described the association between periodonti-tis and pathogenic microorganisms For example Aggre-gatibacter actinomycetemcomitans Porphyromonas gingivalisTannerella forsythia Treponema denticola Fusobacteriumnucleatum and Prevotella intermedia have traditionally beenconsidered pathogenic bacteria contributing to periodontitis[5 12 13] Socransky et al [14] described the role of 5main microbial complexes in the subgingival biofilm Theyreported that red complex species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with periodontitis Subsequentlyother association and elimination studies have confirmed theinvolvement of the three members of the red complex andsome members of the orange complex such as Prevotellaintermedia Parvimonas micra Fusobacterium nucleatumEubacterium nodatum and Aggregatibacter actinomycetem-comitans in the etiology of different periodontal conditions[15] Additionally during the past decade researchers usingculture-independent molecular techniques have shown thatsome representatives of the generaMegasphaera ParvimonasDesulfobulbus and Filifactor are more abundant in patientswith periodontal diseases whereas members ofAggregatibac-ter Prevotella Selenomonas Streptococcus Actinomyces andRothia are more abundant in healthy patients [16ndash19]

Machine learning is data method that involves findingpatterns and making predictions from data based on multi-variate statistics data mining and pattern recognition Thistechnology had been used to solved many metagenomicproblems such as operational taxonomic unit (out) clustering[20ndash24] binning [25ndash30] taxonomic profiling and assign-ment [31ndash35] comparative metagenomics [36ndash38] and geneprediction [39ndash42] In addition to the learning algorithmand the model the most important component of a learningsystem is how features are extracted from the domain data aprocess known as feature selection The purposes of featureselection include improving the prediction performance ofthe predictors providing faster and more cost-effective pre-dictors and providing a better understanding of the underly-ing process that generated the data [43ndash45] Feature selectionmethodology can be categorized into three classes (filterwrapper and embedded methods) according to how thefeature selection search is combined with the construction ofthe classification mode Filter methods estimate the relevanceof features by analysis of the intrinsic properties of the dataThesemethods are computationally simple and fast can scaleto very high-dimensional datasets easily and are independentof the classification algorithm

Although much is known about individual species asso-ciated with pathogenesis the global structure of the bacterialcommunity and the microbial signatures of periodontaldisease are still poorly understood In this study we exploredthe microbial diversity in the subgingival plaque of healthypatients and patients with periodontal disease using culture-independent molecular methods based on 16S ribosomalDNA cloning We also compared the bacterial community

compositions between healthy patients and patients withperiodontal disease and determined the core microbiomespresent in these patients Furthermore we proposed a novelalgorithm for feature selection and microbes with significantdifferences were extracted as features and provided to gen-erate feature combinations by applying our algorithm Usingmachine learning methods we built prediction models andfound that the health status of patients with periodontal dis-ease could be identified accurately using only a few features

2 Materials and Methods

21 16S rRNA Sequence Dataset In total 76 samples usedfor this study were collected from subgingival plaques of76 unrelated individuals including 10 patients with severeperiodontal disease 40 patients with moderate periodontaldisease and 26 healthy controls This study was approvedby the Institutional Review Board of Chang Gung Memo-rial Hospital Taiwan (approval no 102-4239B) All patientsprovided informed consent prior to their enrolment inthe study The oral health statuses of all individuals weredetermined by a dentist who performed a full-mouth clinicalexamination that included clinical parameters of periodontalpocket depths gingival recession clinical attachment lossbleeding on probing tooth mobility and furcation involve-ment These clinical parameters were measured at 6 sitesper tooth (mesiobuccal buccal distobuccal distolinguallingual and mesiolingual) at all teeth Table 1 summarizesthe parameters of periodontal pocket depths bleeding onprobing and clinical attachment loss for all of the samplesTheclassification of periodontitis as slight moderate or severewas based on the guidelines of the American Academy ofPeriodontology [46] Subjects who had received previousperiodontal therapy within two years and recent history ofantibiotics taking within last 6 months were excluded

After sampling DNA extraction and polymerase chainreaction (PCR) were performed based on methods describedby Tang et al [47] Following extraction barcoded PCRamplification was performed with 382-bp amplicons flankingthe highly variable V1-V2 region of the 16S rRNA genesequence [48] Next-generation sequencing evaluation of oralmicrobial communities was carried out using an IlluminaMiSeq Desktop Sequencer after 30 cycles of PCR to enrichthe adapter-modified DNA fragments

22 Sequence Processing Paired-end reads sequenced by theIllumina Sequencer were assembled with PEAR software[49] Using split librariespy in QIIME with default param-eters [50] assembled reads were demultiplexed and low-quality reads were filtered The GoldG database containingthe ChimeraSlayer reference database in the Broad Micro-biome Utilities [51] was used with UCHIME software [52]for chimera detection and removalThe remaining reads wereclustered into OTUs using a de novo OTU selection protocolat the 97 identity level with a USEARCH algorithm [21]Before clustering sequences we filtered out all reads thatoccurred fewer than three times This reduced the number ofunique sequences to a computationally manageable level and

BioMed Research International 3

Table 1 Clinical characteristics of studied subjects Clinical attachment loss and probing depth were measured in mm and represent themean for all collected sites in the oral cavity of studied subjects

Characteristics Healthy Moderate periodontitis Severe periodontitisProbing depth (mean plusmn sd) 13 plusmn 06 50 plusmn 13 79 plusmn 07Clinical attachment loss (mean plusmn sd) 16 plusmn 07 57 plusmn 15 86 plusmn 11 sites with bleeding on probing (mean plusmn sd) 28 plusmn 18 683 plusmn 232 797 plusmn 175

potentially reduced the number of errors from sequencingand contamination The taxonomy associated with eachOTUwas assigned by blasting a representative sequence of eachOTU against the Human Oral Microbiome Database [53](HOMD)The sequence processing was carried out using ourmetagenomic analysis platforms [45]

23 Diversity and Significance Analysis Sample data storedin the biological observation matrix format were subjectedto statistical analysis using R language We analyzed thesequencing depth of samples prior to downstream analysisusing the Shannon index The main microbes and taxo-nomic composition of the microbiota in each sample werealso estimated Abundance differences of microbes betweensample groups were evaluated using the KruskalminusWallis testFour non-phylogeny-based metrics namely the observerspecies chao 1 metric [54] Ace richness and Shannon indexwere used to evaluate alpha diversity which representedthe amount of diversity contained within communities byapplying the phyloseq R package UniFrac is a distancemetric used for comparing biological communities Principalcoordinate analysis (PCoA) with weighted UniFrac distanceswas applied to evaluate beta diversity which represented theamount of diversity shared among communities Principalcomponent analysis (PCA) was used to characterize theprimary microbes contained within communities

24 Feature Selection and Machine Learning In this studywe proposed a method of feature selection for selectingthe informative microbes to predict whether an individualsuffered fromperiodontal disease First the microbes presentat less than 05 relative abundance in all samples wereignored and nonparametric KruskalminusWallis tests were usedto detectmicroorganisms with significantly differential abun-dance between healthy patients and patients with periodontaldisease Microbes with more significant differential scoreswere considered features with more information Then theprioritized feature combination-generated algorithm shownin Algorithm 1 was adopted to produce the feature combina-tions composed by these more informative features

In prioritized order the feature combinations wereapplied to build classifiers with machine learning algorithmssuch as deep learning support vector machine (SVM)random forests and logistic regression We picked 80 ofsamples from both healthy and disease cases to train thepredictionmodel and the remaining cases were used for test-ing The prediction ability of each feature combination wasevaluated by calculating the average accuracy from 10 pre-dictions with different training and testing sample sets Here

we selected 10 of the most significant features having p valuesbetween 327E-11 and 777E-9 In total 1023 feature combi-nations were evaluated for their prediction ability using deeplearning SVM random forest and logistic regression meth-ods These machine learning algorithms were supported bythe R packages H

2O e1071 randomForest and stats respec-

tively We considered the radial basis function kernel forSVM Parameters for each machine learning algorithm weretuned using grid search and the parameters that obtainedbetter accuracy were adopted for training prediction models

3 Results and Discussion

31 Sample Sequencing and Identification In total 76 sub-gingival plaque samples from 76 unrelated individuals weredivided into three classes according to their periodontalhealth status ie healthy (H) severe periodontitis (SP) andmoderate periodontitis (MP) Following DNA extraction andbarcoded PCR amplification these samples were sequencedgenerating a total of 7530767 sequences After filtering andtrimming 6170984 sequences remained and there were 481OTUs in all samples (481 and 429 in diseased and healthysamples respectively) Due to variations in the number ofsequences among samples the total sequence reads withina sample was normalized to the relative abundance forsubsequent analyses

32 Taxonomic Composition of the Human Oral Micro-biota Table 2 summarizes the dominant microbes in thehuman oral microbial communities In the experimentalresults the microbial communities included 12 differentphyla Bacteroidetes Firmicutes Fusobacteria ProteobacteriaSpirochaetes Actinobacteria Candidate division TM7 Syner-gistetes Fusobacteria Candidate division SR1 Gracilibacteriaand Chloroflexi Bacteroidetes (37) was the most abundantphylum in the human oral microbiota The major generaconsisted of previously characterized oral bacteria includingPrevotella (1356) Fusobacterium (1130) Porphyromonas(1094) Treponema (886) Streptococcus (652) Lep-totrichia (476) and Capnocytophaga (364) In summarythere were 25 classes 40 orders 66 families 124 genera and355 species at each taxonomic level

In comparison of the compositions of microbial commu-nities between healthy patients and patients with periodonti-tis we found that the spectra of microbial communities dif-fered In healthy samples the dominant genera were Strepto-coccus (1309) Prevotella (1243) Fusobacterium (1170)Capnocytophaga (625) Leptotrichia (560)Alloprevotella(426) Campylobacter (394) Porphyromonas (378)

4 BioMed Research International

GenPFC((1198861 1198862 119886119899)) 119862 Generate prioritized feature combinations

Input (1198861 1198862 119886119899) a list with 119899 features in prioritized order

Output a queue 119876 used to store 2119899 minus 1 feature combinations1 119876 larr997888 (0) 119862 Enqueue empty set 0 into queue 1198762 for 119894 larr997888 1 to 119899 do 119862 Generate attribute combinations according to features in the list3 119879 larr997888 119876 119862 Copy 119876 into 119879 which is a temporary queue4 for each 119904 in 119879 do5 Enqueue(119876 119904 cup 119886

119894)

6 Dequeue(119876) 119862 Delete first empty set 0 from queue 1198767 return 119876

Algorithm 1 The prioritized feature combination-generated algorithm was used to generate all combinations of selected features inprioritized order As an example when 119899 equals four the generated list will be (1000 0100 1100 0010 1010 0110 1110 0001 1001 01011101 0011 1011 0111 1111) Each element is a combination and denotes whether the four features were selected in that combination (eg thecombination containing the first and third features is represented as 1010)

Table 2 Dominant microbes of the human oral microbiota at each taxonomic level

Phylum Class OrderBacteroidetes 3741 Bacteroidia 3171 Bacteroidales 3171Firmicutes 2082 Fusobacteria 1606 Fusobacteriales 1606Fusobacteria 1606 Spirochaetia 886 Spirochaetales 886Proteobacteria 930 Bacilli 783 Lactobacillales 706Spirochaetes 886 Clostridia 678 Clostridiales 678Actinobacteria 238 Negativicutes 521 Selenomonadales 521Family Genus SpeciesPrevotellaceae 1639 Prevotella 1356 Porphyromonas gingivalis 730Porphyromonadaceae 1296 Fusobacterium 1130 Fusobacterium nucleatum subsp vincentii 523Fusobacteriaceae 1130 Porphyromonas 1094 Prevotella intermedia 462Spirochaetaceae 886 Treponema 886 Streptococcus sp oral taxon 423 262Streptococcaceae 652 Streptococcus 652 Bacteroidales sp oral taxon 274 218Veillonellaceae 521 Leptotrichia 476 Prevotella loescheii 215

Veillonella (349) and Neisseria (327) however inpatients with periodontal disease the dominant genera werePorphyromonas (1467) Prevotella (1416) Treponema(1190) Fusobacterium (1109) Leptotrichia (432) andStreptococcus (310) At the species level Streptococcus sporal taxon 423 (02-36) was the most abundant species inhealthy patients whereas Porphyromonas gingivalis (0-31)was the most abundant species in patients with periodontitisTable 3 compares the dominant microbes between healthypatients and patients with periodontitis at each taxonomiclevel The genus and species level taxonomic compositionsbetween healthy patients and patients with periodontitis areshown in Figures 1 and 2 Streptococcus was more abun-dant in samples from all healthy individuals but decreasedin samples from patients with periodontitis AdditionallyPorphyromonas and Treponema were more abundant inpatients with periodontitis but decreased significantly insamples from healthy individuals In total 25 species wereidentified with significantly different abundances betweensample groups Porphyromonas gingivalis was the specieswith the most significantly differential abundance betweensamples fromhealthy patients and patients with periodontitis(p value = 241E-9)

Overall our findings were largely comparable to thoseof previous studies [14 55ndash61] indicating that species suchas Porphyromonas gingivalisTreponema denticola Tannerellaforsythia Filifactor alocis Treponema socranskii Aggregat-ibacter actinomycetemcomitans Treponema vincentii andMycoplasma faucium were significantly enriched in samplesfrom patients with periodontitis Furthermore we found aset of species including Streptococcus sanguinisHaemophilusparainfluenzae Capnocytophaga granulosa Gemella morbil-lorum Campylobacter showae and Granulicatella adiacenswere significantly enriched in samples from healthy individ-uals

Several studies have described the bacterial communitiesin patientswith periodontitis and healthy control participantsusing metagenomics [16ndash19 61ndash63] The dominant microor-ganisms associated with periodontitis and the healthy statewere largely consistent in those studies however we observedseveral discrepancies First in addition to common diseased-associated microorganisms such as Porphyromonas gingi-valis Treponema denticola Tannerella forsythia Filifactoralocis and Aggregatibacter actinomycetemcomitans we alsofound that the speciesMycoplasma faucium was significantlyenriched in samples from patients with periodontal disease

BioMed Research International 5

Table 3 Dominant microbes of the oral microbiota between healthy patients and patients with periodontitis at each taxonomic level

Healthy patients Patients with periodontitisPhylum

Bacteroidetes 3193 Bacteroidetes 4026Firmicutes 2690 Firmicutes 1766Fusobacteria 1731 Fusobacteria 1542Proteobacteria 1181 Spirochaetes 1190Actinobacteria 336 Proteobacteria 799Saccharibacteria 320 Synergistetes 250

ClassBacteroidia 2476 Bacteroidia 3532Fusobacteria 1731 Fusobacteria 1542

Bacilli 1523 Spirochaetia 1190Negativicutes 671 Clostridia 793Flavobacteriia 667 Negativicutes 443Clostridia 457 Bacilli 398

OrderBacteroidales 2476 Bacteroidales 3532Fusobacteriales 1731 Fusobacteriales 1542Lactobacillales 1394 Spirochaetales 1190Selenomonadales 671 Clostridiales 793Flavobacteriales 667 Selenomonadales 443Clostridiales 457 Lactobacillales 348

FamilyPrevotellaceae 1669 Porphyromonadaceae 1719Streptococcaceae 1309 Prevotellaceae 1624Fusobacteriaceae 1170 Spirochaetaceae 1190Veillonellaceae 671 Fusobacteriaceae 1109

Flavobacteriaceae 667 Veillonellaceae 443Leptotrichiaceae 561 Leptotrichiaceae 432

GenusStreptococcus 1309 Porphyromonas 1467Prevotella 1243 Prevotella 1416

Fusobacterium 1170 Treponema 1190Capnocytophaga 625 Fusobacterium 1109Leptotrichia 560 Leptotrichia 432Alloprevotella 426 Streptococcus 310

SpeciesStreptococcus sp oral taxon 423 588 Porphyromonas gingivalis 1101

Fusobacterium nucleatum subsp vincentii 422 Prevotella intermedia 602Fusobacterium nucleatum subsp polymorphum 352 Fusobacterium nucleatum subsp vincentii 576

Veillonella parvula 333 Treponema denticola 268Bacteroidales sp oral taxon 274 311 Fusobacterium nucleatum subsp nucleatum 234

Fusobacterium nucleatum subsp animalis 309 Tannerella forsythia 232

There were 26 samples that contained this species at greaterthan 05 abundance and only one of these samples wasderived from a healthy patient The average relative abun-dance of Mycoplasma faucium was 059 in all samples(004 and 087 in samples from healthy patients andpatients with periodontal disease respectively) and was upto 485 in one diseased sample Although this is a rare

bacterium in the normal microbiota of the human orophar-ynx some reports have identified this pathogen in brainabscesses [64 65] Additionally Liu et al [61] characterizedthe genomes of key players in the subgingival microbiotain patients with periodontitis including an unculturableTM7 organism They also demonstrated that TM7 organismswere significantly enriched in samples from patients with

6 BioMed Research International

lowast PorphyromonasPrevotella

lowast Treponemalowast Tannerella

lowast Fretibacterium lowast Filifactor

AggregatibacterBacteroidetes_[Gminus5]

lowast MycoplasmaParvimonas

lowast Bacteroidetes_[Gminus3]Bacteroidetes_[Gminus6]

AnaeroglobusFusobacteriumlowast Streptococcus

lowast CapnocytophagaLeptotrichia

AlloprevotellaCampylobacter

VeillonellaNeisseria

Bacteroidales_[Gminus2]lowast Haemophilus

TM7_[Gminus1]SelenomonasTM7_[Gminus5]lowast Gemella

DialisterCorynebacteriumlowast Granulicatella

ActinomycesSR1_[Gminus1]

Peptostreptococcaceae_[XI][Gminus7]

12 9 6 3 0 3 6 9 12Average abundance

Sample group

Case

Control

Gen

us

Figure 1 Microbial compositions of samples from healthy patients and patients with periodontitis at the genus level The abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only genera with gt 05abundance in at least one sample were included Genera with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

periodontitis In our study 49 of 76 samples contained TM7bacteria at greater than 1 abundance (average abundanceof 21 in all samples) In samples from healthy patientsand patients with periodontitis the average abundanceswere 32 and 149 respectively However significantenrichment was not observed in samples from patients withperiodontitis Furthermore we found that the subspeciesFusobacterium nucleatum subsp polymorphum which isrelated to periodontal disease and is themember of the orangecluster described by Socransky et al [14] is more abundant inhealthy patients In our results the average abundances were352 and 113 in samples fromhealthy patients and patientswith periodontitis respectively This situation also can beobserved in other three species including Campylobactergracilis Campylobacter rectus and Campylobacter showaeThis discrepancy could be explained by geographic variability[66] or by differences in the depths of the pockets sampled[14] as well as the sample size and the DNA analytic bias [67]Finally Spearmanrsquos rank correlation coefficientwas computed

to assess association between each pair of species associatedwith periodontal disease Figure 3 shows that a very strongrelationship exhibited among species Porphyromonas gingi-valis Treponema denticola and Tannerella forsythia

In our study there are 25 bacterial species with signif-icantly different abundances between healthy patients andpatients with periodontitis The relationships of these speciesto pocket depth and clinical attachment loss were examinedFigure 4 shows that three species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with pocket depth and clinicalattachment loss For instance the three species increasedin abundance with increasing pocket depth and clinicalattachment loss The abundances of those species amongdifferent level of pocket depth and clinical attachment losswere different significantly However it should be noted thatnot only oral microorganisms but also others factors suchas supragingival plaque would affect the pocket depth andclinical attachment loss [68]

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 2: Composition Analysis and Feature Selection of the Oral

2 BioMed Research International

pregnancy outcomes rheumatoid arthritis aspiration pneu-monia and cancer [6ndash11]

In the past half century numerous studies have char-acterized the community composition of the oral micro-biota and described the association between periodonti-tis and pathogenic microorganisms For example Aggre-gatibacter actinomycetemcomitans Porphyromonas gingivalisTannerella forsythia Treponema denticola Fusobacteriumnucleatum and Prevotella intermedia have traditionally beenconsidered pathogenic bacteria contributing to periodontitis[5 12 13] Socransky et al [14] described the role of 5main microbial complexes in the subgingival biofilm Theyreported that red complex species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with periodontitis Subsequentlyother association and elimination studies have confirmed theinvolvement of the three members of the red complex andsome members of the orange complex such as Prevotellaintermedia Parvimonas micra Fusobacterium nucleatumEubacterium nodatum and Aggregatibacter actinomycetem-comitans in the etiology of different periodontal conditions[15] Additionally during the past decade researchers usingculture-independent molecular techniques have shown thatsome representatives of the generaMegasphaera ParvimonasDesulfobulbus and Filifactor are more abundant in patientswith periodontal diseases whereas members ofAggregatibac-ter Prevotella Selenomonas Streptococcus Actinomyces andRothia are more abundant in healthy patients [16ndash19]

Machine learning is data method that involves findingpatterns and making predictions from data based on multi-variate statistics data mining and pattern recognition Thistechnology had been used to solved many metagenomicproblems such as operational taxonomic unit (out) clustering[20ndash24] binning [25ndash30] taxonomic profiling and assign-ment [31ndash35] comparative metagenomics [36ndash38] and geneprediction [39ndash42] In addition to the learning algorithmand the model the most important component of a learningsystem is how features are extracted from the domain data aprocess known as feature selection The purposes of featureselection include improving the prediction performance ofthe predictors providing faster and more cost-effective pre-dictors and providing a better understanding of the underly-ing process that generated the data [43ndash45] Feature selectionmethodology can be categorized into three classes (filterwrapper and embedded methods) according to how thefeature selection search is combined with the construction ofthe classification mode Filter methods estimate the relevanceof features by analysis of the intrinsic properties of the dataThesemethods are computationally simple and fast can scaleto very high-dimensional datasets easily and are independentof the classification algorithm

Although much is known about individual species asso-ciated with pathogenesis the global structure of the bacterialcommunity and the microbial signatures of periodontaldisease are still poorly understood In this study we exploredthe microbial diversity in the subgingival plaque of healthypatients and patients with periodontal disease using culture-independent molecular methods based on 16S ribosomalDNA cloning We also compared the bacterial community

compositions between healthy patients and patients withperiodontal disease and determined the core microbiomespresent in these patients Furthermore we proposed a novelalgorithm for feature selection and microbes with significantdifferences were extracted as features and provided to gen-erate feature combinations by applying our algorithm Usingmachine learning methods we built prediction models andfound that the health status of patients with periodontal dis-ease could be identified accurately using only a few features

2 Materials and Methods

21 16S rRNA Sequence Dataset In total 76 samples usedfor this study were collected from subgingival plaques of76 unrelated individuals including 10 patients with severeperiodontal disease 40 patients with moderate periodontaldisease and 26 healthy controls This study was approvedby the Institutional Review Board of Chang Gung Memo-rial Hospital Taiwan (approval no 102-4239B) All patientsprovided informed consent prior to their enrolment inthe study The oral health statuses of all individuals weredetermined by a dentist who performed a full-mouth clinicalexamination that included clinical parameters of periodontalpocket depths gingival recession clinical attachment lossbleeding on probing tooth mobility and furcation involve-ment These clinical parameters were measured at 6 sitesper tooth (mesiobuccal buccal distobuccal distolinguallingual and mesiolingual) at all teeth Table 1 summarizesthe parameters of periodontal pocket depths bleeding onprobing and clinical attachment loss for all of the samplesTheclassification of periodontitis as slight moderate or severewas based on the guidelines of the American Academy ofPeriodontology [46] Subjects who had received previousperiodontal therapy within two years and recent history ofantibiotics taking within last 6 months were excluded

After sampling DNA extraction and polymerase chainreaction (PCR) were performed based on methods describedby Tang et al [47] Following extraction barcoded PCRamplification was performed with 382-bp amplicons flankingthe highly variable V1-V2 region of the 16S rRNA genesequence [48] Next-generation sequencing evaluation of oralmicrobial communities was carried out using an IlluminaMiSeq Desktop Sequencer after 30 cycles of PCR to enrichthe adapter-modified DNA fragments

22 Sequence Processing Paired-end reads sequenced by theIllumina Sequencer were assembled with PEAR software[49] Using split librariespy in QIIME with default param-eters [50] assembled reads were demultiplexed and low-quality reads were filtered The GoldG database containingthe ChimeraSlayer reference database in the Broad Micro-biome Utilities [51] was used with UCHIME software [52]for chimera detection and removalThe remaining reads wereclustered into OTUs using a de novo OTU selection protocolat the 97 identity level with a USEARCH algorithm [21]Before clustering sequences we filtered out all reads thatoccurred fewer than three times This reduced the number ofunique sequences to a computationally manageable level and

BioMed Research International 3

Table 1 Clinical characteristics of studied subjects Clinical attachment loss and probing depth were measured in mm and represent themean for all collected sites in the oral cavity of studied subjects

Characteristics Healthy Moderate periodontitis Severe periodontitisProbing depth (mean plusmn sd) 13 plusmn 06 50 plusmn 13 79 plusmn 07Clinical attachment loss (mean plusmn sd) 16 plusmn 07 57 plusmn 15 86 plusmn 11 sites with bleeding on probing (mean plusmn sd) 28 plusmn 18 683 plusmn 232 797 plusmn 175

potentially reduced the number of errors from sequencingand contamination The taxonomy associated with eachOTUwas assigned by blasting a representative sequence of eachOTU against the Human Oral Microbiome Database [53](HOMD)The sequence processing was carried out using ourmetagenomic analysis platforms [45]

23 Diversity and Significance Analysis Sample data storedin the biological observation matrix format were subjectedto statistical analysis using R language We analyzed thesequencing depth of samples prior to downstream analysisusing the Shannon index The main microbes and taxo-nomic composition of the microbiota in each sample werealso estimated Abundance differences of microbes betweensample groups were evaluated using the KruskalminusWallis testFour non-phylogeny-based metrics namely the observerspecies chao 1 metric [54] Ace richness and Shannon indexwere used to evaluate alpha diversity which representedthe amount of diversity contained within communities byapplying the phyloseq R package UniFrac is a distancemetric used for comparing biological communities Principalcoordinate analysis (PCoA) with weighted UniFrac distanceswas applied to evaluate beta diversity which represented theamount of diversity shared among communities Principalcomponent analysis (PCA) was used to characterize theprimary microbes contained within communities

24 Feature Selection and Machine Learning In this studywe proposed a method of feature selection for selectingthe informative microbes to predict whether an individualsuffered fromperiodontal disease First the microbes presentat less than 05 relative abundance in all samples wereignored and nonparametric KruskalminusWallis tests were usedto detectmicroorganisms with significantly differential abun-dance between healthy patients and patients with periodontaldisease Microbes with more significant differential scoreswere considered features with more information Then theprioritized feature combination-generated algorithm shownin Algorithm 1 was adopted to produce the feature combina-tions composed by these more informative features

In prioritized order the feature combinations wereapplied to build classifiers with machine learning algorithmssuch as deep learning support vector machine (SVM)random forests and logistic regression We picked 80 ofsamples from both healthy and disease cases to train thepredictionmodel and the remaining cases were used for test-ing The prediction ability of each feature combination wasevaluated by calculating the average accuracy from 10 pre-dictions with different training and testing sample sets Here

we selected 10 of the most significant features having p valuesbetween 327E-11 and 777E-9 In total 1023 feature combi-nations were evaluated for their prediction ability using deeplearning SVM random forest and logistic regression meth-ods These machine learning algorithms were supported bythe R packages H

2O e1071 randomForest and stats respec-

tively We considered the radial basis function kernel forSVM Parameters for each machine learning algorithm weretuned using grid search and the parameters that obtainedbetter accuracy were adopted for training prediction models

3 Results and Discussion

31 Sample Sequencing and Identification In total 76 sub-gingival plaque samples from 76 unrelated individuals weredivided into three classes according to their periodontalhealth status ie healthy (H) severe periodontitis (SP) andmoderate periodontitis (MP) Following DNA extraction andbarcoded PCR amplification these samples were sequencedgenerating a total of 7530767 sequences After filtering andtrimming 6170984 sequences remained and there were 481OTUs in all samples (481 and 429 in diseased and healthysamples respectively) Due to variations in the number ofsequences among samples the total sequence reads withina sample was normalized to the relative abundance forsubsequent analyses

32 Taxonomic Composition of the Human Oral Micro-biota Table 2 summarizes the dominant microbes in thehuman oral microbial communities In the experimentalresults the microbial communities included 12 differentphyla Bacteroidetes Firmicutes Fusobacteria ProteobacteriaSpirochaetes Actinobacteria Candidate division TM7 Syner-gistetes Fusobacteria Candidate division SR1 Gracilibacteriaand Chloroflexi Bacteroidetes (37) was the most abundantphylum in the human oral microbiota The major generaconsisted of previously characterized oral bacteria includingPrevotella (1356) Fusobacterium (1130) Porphyromonas(1094) Treponema (886) Streptococcus (652) Lep-totrichia (476) and Capnocytophaga (364) In summarythere were 25 classes 40 orders 66 families 124 genera and355 species at each taxonomic level

In comparison of the compositions of microbial commu-nities between healthy patients and patients with periodonti-tis we found that the spectra of microbial communities dif-fered In healthy samples the dominant genera were Strepto-coccus (1309) Prevotella (1243) Fusobacterium (1170)Capnocytophaga (625) Leptotrichia (560)Alloprevotella(426) Campylobacter (394) Porphyromonas (378)

4 BioMed Research International

GenPFC((1198861 1198862 119886119899)) 119862 Generate prioritized feature combinations

Input (1198861 1198862 119886119899) a list with 119899 features in prioritized order

Output a queue 119876 used to store 2119899 minus 1 feature combinations1 119876 larr997888 (0) 119862 Enqueue empty set 0 into queue 1198762 for 119894 larr997888 1 to 119899 do 119862 Generate attribute combinations according to features in the list3 119879 larr997888 119876 119862 Copy 119876 into 119879 which is a temporary queue4 for each 119904 in 119879 do5 Enqueue(119876 119904 cup 119886

119894)

6 Dequeue(119876) 119862 Delete first empty set 0 from queue 1198767 return 119876

Algorithm 1 The prioritized feature combination-generated algorithm was used to generate all combinations of selected features inprioritized order As an example when 119899 equals four the generated list will be (1000 0100 1100 0010 1010 0110 1110 0001 1001 01011101 0011 1011 0111 1111) Each element is a combination and denotes whether the four features were selected in that combination (eg thecombination containing the first and third features is represented as 1010)

Table 2 Dominant microbes of the human oral microbiota at each taxonomic level

Phylum Class OrderBacteroidetes 3741 Bacteroidia 3171 Bacteroidales 3171Firmicutes 2082 Fusobacteria 1606 Fusobacteriales 1606Fusobacteria 1606 Spirochaetia 886 Spirochaetales 886Proteobacteria 930 Bacilli 783 Lactobacillales 706Spirochaetes 886 Clostridia 678 Clostridiales 678Actinobacteria 238 Negativicutes 521 Selenomonadales 521Family Genus SpeciesPrevotellaceae 1639 Prevotella 1356 Porphyromonas gingivalis 730Porphyromonadaceae 1296 Fusobacterium 1130 Fusobacterium nucleatum subsp vincentii 523Fusobacteriaceae 1130 Porphyromonas 1094 Prevotella intermedia 462Spirochaetaceae 886 Treponema 886 Streptococcus sp oral taxon 423 262Streptococcaceae 652 Streptococcus 652 Bacteroidales sp oral taxon 274 218Veillonellaceae 521 Leptotrichia 476 Prevotella loescheii 215

Veillonella (349) and Neisseria (327) however inpatients with periodontal disease the dominant genera werePorphyromonas (1467) Prevotella (1416) Treponema(1190) Fusobacterium (1109) Leptotrichia (432) andStreptococcus (310) At the species level Streptococcus sporal taxon 423 (02-36) was the most abundant species inhealthy patients whereas Porphyromonas gingivalis (0-31)was the most abundant species in patients with periodontitisTable 3 compares the dominant microbes between healthypatients and patients with periodontitis at each taxonomiclevel The genus and species level taxonomic compositionsbetween healthy patients and patients with periodontitis areshown in Figures 1 and 2 Streptococcus was more abun-dant in samples from all healthy individuals but decreasedin samples from patients with periodontitis AdditionallyPorphyromonas and Treponema were more abundant inpatients with periodontitis but decreased significantly insamples from healthy individuals In total 25 species wereidentified with significantly different abundances betweensample groups Porphyromonas gingivalis was the specieswith the most significantly differential abundance betweensamples fromhealthy patients and patients with periodontitis(p value = 241E-9)

Overall our findings were largely comparable to thoseof previous studies [14 55ndash61] indicating that species suchas Porphyromonas gingivalisTreponema denticola Tannerellaforsythia Filifactor alocis Treponema socranskii Aggregat-ibacter actinomycetemcomitans Treponema vincentii andMycoplasma faucium were significantly enriched in samplesfrom patients with periodontitis Furthermore we found aset of species including Streptococcus sanguinisHaemophilusparainfluenzae Capnocytophaga granulosa Gemella morbil-lorum Campylobacter showae and Granulicatella adiacenswere significantly enriched in samples from healthy individ-uals

Several studies have described the bacterial communitiesin patientswith periodontitis and healthy control participantsusing metagenomics [16ndash19 61ndash63] The dominant microor-ganisms associated with periodontitis and the healthy statewere largely consistent in those studies however we observedseveral discrepancies First in addition to common diseased-associated microorganisms such as Porphyromonas gingi-valis Treponema denticola Tannerella forsythia Filifactoralocis and Aggregatibacter actinomycetemcomitans we alsofound that the speciesMycoplasma faucium was significantlyenriched in samples from patients with periodontal disease

BioMed Research International 5

Table 3 Dominant microbes of the oral microbiota between healthy patients and patients with periodontitis at each taxonomic level

Healthy patients Patients with periodontitisPhylum

Bacteroidetes 3193 Bacteroidetes 4026Firmicutes 2690 Firmicutes 1766Fusobacteria 1731 Fusobacteria 1542Proteobacteria 1181 Spirochaetes 1190Actinobacteria 336 Proteobacteria 799Saccharibacteria 320 Synergistetes 250

ClassBacteroidia 2476 Bacteroidia 3532Fusobacteria 1731 Fusobacteria 1542

Bacilli 1523 Spirochaetia 1190Negativicutes 671 Clostridia 793Flavobacteriia 667 Negativicutes 443Clostridia 457 Bacilli 398

OrderBacteroidales 2476 Bacteroidales 3532Fusobacteriales 1731 Fusobacteriales 1542Lactobacillales 1394 Spirochaetales 1190Selenomonadales 671 Clostridiales 793Flavobacteriales 667 Selenomonadales 443Clostridiales 457 Lactobacillales 348

FamilyPrevotellaceae 1669 Porphyromonadaceae 1719Streptococcaceae 1309 Prevotellaceae 1624Fusobacteriaceae 1170 Spirochaetaceae 1190Veillonellaceae 671 Fusobacteriaceae 1109

Flavobacteriaceae 667 Veillonellaceae 443Leptotrichiaceae 561 Leptotrichiaceae 432

GenusStreptococcus 1309 Porphyromonas 1467Prevotella 1243 Prevotella 1416

Fusobacterium 1170 Treponema 1190Capnocytophaga 625 Fusobacterium 1109Leptotrichia 560 Leptotrichia 432Alloprevotella 426 Streptococcus 310

SpeciesStreptococcus sp oral taxon 423 588 Porphyromonas gingivalis 1101

Fusobacterium nucleatum subsp vincentii 422 Prevotella intermedia 602Fusobacterium nucleatum subsp polymorphum 352 Fusobacterium nucleatum subsp vincentii 576

Veillonella parvula 333 Treponema denticola 268Bacteroidales sp oral taxon 274 311 Fusobacterium nucleatum subsp nucleatum 234

Fusobacterium nucleatum subsp animalis 309 Tannerella forsythia 232

There were 26 samples that contained this species at greaterthan 05 abundance and only one of these samples wasderived from a healthy patient The average relative abun-dance of Mycoplasma faucium was 059 in all samples(004 and 087 in samples from healthy patients andpatients with periodontal disease respectively) and was upto 485 in one diseased sample Although this is a rare

bacterium in the normal microbiota of the human orophar-ynx some reports have identified this pathogen in brainabscesses [64 65] Additionally Liu et al [61] characterizedthe genomes of key players in the subgingival microbiotain patients with periodontitis including an unculturableTM7 organism They also demonstrated that TM7 organismswere significantly enriched in samples from patients with

6 BioMed Research International

lowast PorphyromonasPrevotella

lowast Treponemalowast Tannerella

lowast Fretibacterium lowast Filifactor

AggregatibacterBacteroidetes_[Gminus5]

lowast MycoplasmaParvimonas

lowast Bacteroidetes_[Gminus3]Bacteroidetes_[Gminus6]

AnaeroglobusFusobacteriumlowast Streptococcus

lowast CapnocytophagaLeptotrichia

AlloprevotellaCampylobacter

VeillonellaNeisseria

Bacteroidales_[Gminus2]lowast Haemophilus

TM7_[Gminus1]SelenomonasTM7_[Gminus5]lowast Gemella

DialisterCorynebacteriumlowast Granulicatella

ActinomycesSR1_[Gminus1]

Peptostreptococcaceae_[XI][Gminus7]

12 9 6 3 0 3 6 9 12Average abundance

Sample group

Case

Control

Gen

us

Figure 1 Microbial compositions of samples from healthy patients and patients with periodontitis at the genus level The abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only genera with gt 05abundance in at least one sample were included Genera with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

periodontitis In our study 49 of 76 samples contained TM7bacteria at greater than 1 abundance (average abundanceof 21 in all samples) In samples from healthy patientsand patients with periodontitis the average abundanceswere 32 and 149 respectively However significantenrichment was not observed in samples from patients withperiodontitis Furthermore we found that the subspeciesFusobacterium nucleatum subsp polymorphum which isrelated to periodontal disease and is themember of the orangecluster described by Socransky et al [14] is more abundant inhealthy patients In our results the average abundances were352 and 113 in samples fromhealthy patients and patientswith periodontitis respectively This situation also can beobserved in other three species including Campylobactergracilis Campylobacter rectus and Campylobacter showaeThis discrepancy could be explained by geographic variability[66] or by differences in the depths of the pockets sampled[14] as well as the sample size and the DNA analytic bias [67]Finally Spearmanrsquos rank correlation coefficientwas computed

to assess association between each pair of species associatedwith periodontal disease Figure 3 shows that a very strongrelationship exhibited among species Porphyromonas gingi-valis Treponema denticola and Tannerella forsythia

In our study there are 25 bacterial species with signif-icantly different abundances between healthy patients andpatients with periodontitis The relationships of these speciesto pocket depth and clinical attachment loss were examinedFigure 4 shows that three species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with pocket depth and clinicalattachment loss For instance the three species increasedin abundance with increasing pocket depth and clinicalattachment loss The abundances of those species amongdifferent level of pocket depth and clinical attachment losswere different significantly However it should be noted thatnot only oral microorganisms but also others factors suchas supragingival plaque would affect the pocket depth andclinical attachment loss [68]

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 3: Composition Analysis and Feature Selection of the Oral

BioMed Research International 3

Table 1 Clinical characteristics of studied subjects Clinical attachment loss and probing depth were measured in mm and represent themean for all collected sites in the oral cavity of studied subjects

Characteristics Healthy Moderate periodontitis Severe periodontitisProbing depth (mean plusmn sd) 13 plusmn 06 50 plusmn 13 79 plusmn 07Clinical attachment loss (mean plusmn sd) 16 plusmn 07 57 plusmn 15 86 plusmn 11 sites with bleeding on probing (mean plusmn sd) 28 plusmn 18 683 plusmn 232 797 plusmn 175

potentially reduced the number of errors from sequencingand contamination The taxonomy associated with eachOTUwas assigned by blasting a representative sequence of eachOTU against the Human Oral Microbiome Database [53](HOMD)The sequence processing was carried out using ourmetagenomic analysis platforms [45]

23 Diversity and Significance Analysis Sample data storedin the biological observation matrix format were subjectedto statistical analysis using R language We analyzed thesequencing depth of samples prior to downstream analysisusing the Shannon index The main microbes and taxo-nomic composition of the microbiota in each sample werealso estimated Abundance differences of microbes betweensample groups were evaluated using the KruskalminusWallis testFour non-phylogeny-based metrics namely the observerspecies chao 1 metric [54] Ace richness and Shannon indexwere used to evaluate alpha diversity which representedthe amount of diversity contained within communities byapplying the phyloseq R package UniFrac is a distancemetric used for comparing biological communities Principalcoordinate analysis (PCoA) with weighted UniFrac distanceswas applied to evaluate beta diversity which represented theamount of diversity shared among communities Principalcomponent analysis (PCA) was used to characterize theprimary microbes contained within communities

24 Feature Selection and Machine Learning In this studywe proposed a method of feature selection for selectingthe informative microbes to predict whether an individualsuffered fromperiodontal disease First the microbes presentat less than 05 relative abundance in all samples wereignored and nonparametric KruskalminusWallis tests were usedto detectmicroorganisms with significantly differential abun-dance between healthy patients and patients with periodontaldisease Microbes with more significant differential scoreswere considered features with more information Then theprioritized feature combination-generated algorithm shownin Algorithm 1 was adopted to produce the feature combina-tions composed by these more informative features

In prioritized order the feature combinations wereapplied to build classifiers with machine learning algorithmssuch as deep learning support vector machine (SVM)random forests and logistic regression We picked 80 ofsamples from both healthy and disease cases to train thepredictionmodel and the remaining cases were used for test-ing The prediction ability of each feature combination wasevaluated by calculating the average accuracy from 10 pre-dictions with different training and testing sample sets Here

we selected 10 of the most significant features having p valuesbetween 327E-11 and 777E-9 In total 1023 feature combi-nations were evaluated for their prediction ability using deeplearning SVM random forest and logistic regression meth-ods These machine learning algorithms were supported bythe R packages H

2O e1071 randomForest and stats respec-

tively We considered the radial basis function kernel forSVM Parameters for each machine learning algorithm weretuned using grid search and the parameters that obtainedbetter accuracy were adopted for training prediction models

3 Results and Discussion

31 Sample Sequencing and Identification In total 76 sub-gingival plaque samples from 76 unrelated individuals weredivided into three classes according to their periodontalhealth status ie healthy (H) severe periodontitis (SP) andmoderate periodontitis (MP) Following DNA extraction andbarcoded PCR amplification these samples were sequencedgenerating a total of 7530767 sequences After filtering andtrimming 6170984 sequences remained and there were 481OTUs in all samples (481 and 429 in diseased and healthysamples respectively) Due to variations in the number ofsequences among samples the total sequence reads withina sample was normalized to the relative abundance forsubsequent analyses

32 Taxonomic Composition of the Human Oral Micro-biota Table 2 summarizes the dominant microbes in thehuman oral microbial communities In the experimentalresults the microbial communities included 12 differentphyla Bacteroidetes Firmicutes Fusobacteria ProteobacteriaSpirochaetes Actinobacteria Candidate division TM7 Syner-gistetes Fusobacteria Candidate division SR1 Gracilibacteriaand Chloroflexi Bacteroidetes (37) was the most abundantphylum in the human oral microbiota The major generaconsisted of previously characterized oral bacteria includingPrevotella (1356) Fusobacterium (1130) Porphyromonas(1094) Treponema (886) Streptococcus (652) Lep-totrichia (476) and Capnocytophaga (364) In summarythere were 25 classes 40 orders 66 families 124 genera and355 species at each taxonomic level

In comparison of the compositions of microbial commu-nities between healthy patients and patients with periodonti-tis we found that the spectra of microbial communities dif-fered In healthy samples the dominant genera were Strepto-coccus (1309) Prevotella (1243) Fusobacterium (1170)Capnocytophaga (625) Leptotrichia (560)Alloprevotella(426) Campylobacter (394) Porphyromonas (378)

4 BioMed Research International

GenPFC((1198861 1198862 119886119899)) 119862 Generate prioritized feature combinations

Input (1198861 1198862 119886119899) a list with 119899 features in prioritized order

Output a queue 119876 used to store 2119899 minus 1 feature combinations1 119876 larr997888 (0) 119862 Enqueue empty set 0 into queue 1198762 for 119894 larr997888 1 to 119899 do 119862 Generate attribute combinations according to features in the list3 119879 larr997888 119876 119862 Copy 119876 into 119879 which is a temporary queue4 for each 119904 in 119879 do5 Enqueue(119876 119904 cup 119886

119894)

6 Dequeue(119876) 119862 Delete first empty set 0 from queue 1198767 return 119876

Algorithm 1 The prioritized feature combination-generated algorithm was used to generate all combinations of selected features inprioritized order As an example when 119899 equals four the generated list will be (1000 0100 1100 0010 1010 0110 1110 0001 1001 01011101 0011 1011 0111 1111) Each element is a combination and denotes whether the four features were selected in that combination (eg thecombination containing the first and third features is represented as 1010)

Table 2 Dominant microbes of the human oral microbiota at each taxonomic level

Phylum Class OrderBacteroidetes 3741 Bacteroidia 3171 Bacteroidales 3171Firmicutes 2082 Fusobacteria 1606 Fusobacteriales 1606Fusobacteria 1606 Spirochaetia 886 Spirochaetales 886Proteobacteria 930 Bacilli 783 Lactobacillales 706Spirochaetes 886 Clostridia 678 Clostridiales 678Actinobacteria 238 Negativicutes 521 Selenomonadales 521Family Genus SpeciesPrevotellaceae 1639 Prevotella 1356 Porphyromonas gingivalis 730Porphyromonadaceae 1296 Fusobacterium 1130 Fusobacterium nucleatum subsp vincentii 523Fusobacteriaceae 1130 Porphyromonas 1094 Prevotella intermedia 462Spirochaetaceae 886 Treponema 886 Streptococcus sp oral taxon 423 262Streptococcaceae 652 Streptococcus 652 Bacteroidales sp oral taxon 274 218Veillonellaceae 521 Leptotrichia 476 Prevotella loescheii 215

Veillonella (349) and Neisseria (327) however inpatients with periodontal disease the dominant genera werePorphyromonas (1467) Prevotella (1416) Treponema(1190) Fusobacterium (1109) Leptotrichia (432) andStreptococcus (310) At the species level Streptococcus sporal taxon 423 (02-36) was the most abundant species inhealthy patients whereas Porphyromonas gingivalis (0-31)was the most abundant species in patients with periodontitisTable 3 compares the dominant microbes between healthypatients and patients with periodontitis at each taxonomiclevel The genus and species level taxonomic compositionsbetween healthy patients and patients with periodontitis areshown in Figures 1 and 2 Streptococcus was more abun-dant in samples from all healthy individuals but decreasedin samples from patients with periodontitis AdditionallyPorphyromonas and Treponema were more abundant inpatients with periodontitis but decreased significantly insamples from healthy individuals In total 25 species wereidentified with significantly different abundances betweensample groups Porphyromonas gingivalis was the specieswith the most significantly differential abundance betweensamples fromhealthy patients and patients with periodontitis(p value = 241E-9)

Overall our findings were largely comparable to thoseof previous studies [14 55ndash61] indicating that species suchas Porphyromonas gingivalisTreponema denticola Tannerellaforsythia Filifactor alocis Treponema socranskii Aggregat-ibacter actinomycetemcomitans Treponema vincentii andMycoplasma faucium were significantly enriched in samplesfrom patients with periodontitis Furthermore we found aset of species including Streptococcus sanguinisHaemophilusparainfluenzae Capnocytophaga granulosa Gemella morbil-lorum Campylobacter showae and Granulicatella adiacenswere significantly enriched in samples from healthy individ-uals

Several studies have described the bacterial communitiesin patientswith periodontitis and healthy control participantsusing metagenomics [16ndash19 61ndash63] The dominant microor-ganisms associated with periodontitis and the healthy statewere largely consistent in those studies however we observedseveral discrepancies First in addition to common diseased-associated microorganisms such as Porphyromonas gingi-valis Treponema denticola Tannerella forsythia Filifactoralocis and Aggregatibacter actinomycetemcomitans we alsofound that the speciesMycoplasma faucium was significantlyenriched in samples from patients with periodontal disease

BioMed Research International 5

Table 3 Dominant microbes of the oral microbiota between healthy patients and patients with periodontitis at each taxonomic level

Healthy patients Patients with periodontitisPhylum

Bacteroidetes 3193 Bacteroidetes 4026Firmicutes 2690 Firmicutes 1766Fusobacteria 1731 Fusobacteria 1542Proteobacteria 1181 Spirochaetes 1190Actinobacteria 336 Proteobacteria 799Saccharibacteria 320 Synergistetes 250

ClassBacteroidia 2476 Bacteroidia 3532Fusobacteria 1731 Fusobacteria 1542

Bacilli 1523 Spirochaetia 1190Negativicutes 671 Clostridia 793Flavobacteriia 667 Negativicutes 443Clostridia 457 Bacilli 398

OrderBacteroidales 2476 Bacteroidales 3532Fusobacteriales 1731 Fusobacteriales 1542Lactobacillales 1394 Spirochaetales 1190Selenomonadales 671 Clostridiales 793Flavobacteriales 667 Selenomonadales 443Clostridiales 457 Lactobacillales 348

FamilyPrevotellaceae 1669 Porphyromonadaceae 1719Streptococcaceae 1309 Prevotellaceae 1624Fusobacteriaceae 1170 Spirochaetaceae 1190Veillonellaceae 671 Fusobacteriaceae 1109

Flavobacteriaceae 667 Veillonellaceae 443Leptotrichiaceae 561 Leptotrichiaceae 432

GenusStreptococcus 1309 Porphyromonas 1467Prevotella 1243 Prevotella 1416

Fusobacterium 1170 Treponema 1190Capnocytophaga 625 Fusobacterium 1109Leptotrichia 560 Leptotrichia 432Alloprevotella 426 Streptococcus 310

SpeciesStreptococcus sp oral taxon 423 588 Porphyromonas gingivalis 1101

Fusobacterium nucleatum subsp vincentii 422 Prevotella intermedia 602Fusobacterium nucleatum subsp polymorphum 352 Fusobacterium nucleatum subsp vincentii 576

Veillonella parvula 333 Treponema denticola 268Bacteroidales sp oral taxon 274 311 Fusobacterium nucleatum subsp nucleatum 234

Fusobacterium nucleatum subsp animalis 309 Tannerella forsythia 232

There were 26 samples that contained this species at greaterthan 05 abundance and only one of these samples wasderived from a healthy patient The average relative abun-dance of Mycoplasma faucium was 059 in all samples(004 and 087 in samples from healthy patients andpatients with periodontal disease respectively) and was upto 485 in one diseased sample Although this is a rare

bacterium in the normal microbiota of the human orophar-ynx some reports have identified this pathogen in brainabscesses [64 65] Additionally Liu et al [61] characterizedthe genomes of key players in the subgingival microbiotain patients with periodontitis including an unculturableTM7 organism They also demonstrated that TM7 organismswere significantly enriched in samples from patients with

6 BioMed Research International

lowast PorphyromonasPrevotella

lowast Treponemalowast Tannerella

lowast Fretibacterium lowast Filifactor

AggregatibacterBacteroidetes_[Gminus5]

lowast MycoplasmaParvimonas

lowast Bacteroidetes_[Gminus3]Bacteroidetes_[Gminus6]

AnaeroglobusFusobacteriumlowast Streptococcus

lowast CapnocytophagaLeptotrichia

AlloprevotellaCampylobacter

VeillonellaNeisseria

Bacteroidales_[Gminus2]lowast Haemophilus

TM7_[Gminus1]SelenomonasTM7_[Gminus5]lowast Gemella

DialisterCorynebacteriumlowast Granulicatella

ActinomycesSR1_[Gminus1]

Peptostreptococcaceae_[XI][Gminus7]

12 9 6 3 0 3 6 9 12Average abundance

Sample group

Case

Control

Gen

us

Figure 1 Microbial compositions of samples from healthy patients and patients with periodontitis at the genus level The abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only genera with gt 05abundance in at least one sample were included Genera with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

periodontitis In our study 49 of 76 samples contained TM7bacteria at greater than 1 abundance (average abundanceof 21 in all samples) In samples from healthy patientsand patients with periodontitis the average abundanceswere 32 and 149 respectively However significantenrichment was not observed in samples from patients withperiodontitis Furthermore we found that the subspeciesFusobacterium nucleatum subsp polymorphum which isrelated to periodontal disease and is themember of the orangecluster described by Socransky et al [14] is more abundant inhealthy patients In our results the average abundances were352 and 113 in samples fromhealthy patients and patientswith periodontitis respectively This situation also can beobserved in other three species including Campylobactergracilis Campylobacter rectus and Campylobacter showaeThis discrepancy could be explained by geographic variability[66] or by differences in the depths of the pockets sampled[14] as well as the sample size and the DNA analytic bias [67]Finally Spearmanrsquos rank correlation coefficientwas computed

to assess association between each pair of species associatedwith periodontal disease Figure 3 shows that a very strongrelationship exhibited among species Porphyromonas gingi-valis Treponema denticola and Tannerella forsythia

In our study there are 25 bacterial species with signif-icantly different abundances between healthy patients andpatients with periodontitis The relationships of these speciesto pocket depth and clinical attachment loss were examinedFigure 4 shows that three species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with pocket depth and clinicalattachment loss For instance the three species increasedin abundance with increasing pocket depth and clinicalattachment loss The abundances of those species amongdifferent level of pocket depth and clinical attachment losswere different significantly However it should be noted thatnot only oral microorganisms but also others factors suchas supragingival plaque would affect the pocket depth andclinical attachment loss [68]

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 4: Composition Analysis and Feature Selection of the Oral

4 BioMed Research International

GenPFC((1198861 1198862 119886119899)) 119862 Generate prioritized feature combinations

Input (1198861 1198862 119886119899) a list with 119899 features in prioritized order

Output a queue 119876 used to store 2119899 minus 1 feature combinations1 119876 larr997888 (0) 119862 Enqueue empty set 0 into queue 1198762 for 119894 larr997888 1 to 119899 do 119862 Generate attribute combinations according to features in the list3 119879 larr997888 119876 119862 Copy 119876 into 119879 which is a temporary queue4 for each 119904 in 119879 do5 Enqueue(119876 119904 cup 119886

119894)

6 Dequeue(119876) 119862 Delete first empty set 0 from queue 1198767 return 119876

Algorithm 1 The prioritized feature combination-generated algorithm was used to generate all combinations of selected features inprioritized order As an example when 119899 equals four the generated list will be (1000 0100 1100 0010 1010 0110 1110 0001 1001 01011101 0011 1011 0111 1111) Each element is a combination and denotes whether the four features were selected in that combination (eg thecombination containing the first and third features is represented as 1010)

Table 2 Dominant microbes of the human oral microbiota at each taxonomic level

Phylum Class OrderBacteroidetes 3741 Bacteroidia 3171 Bacteroidales 3171Firmicutes 2082 Fusobacteria 1606 Fusobacteriales 1606Fusobacteria 1606 Spirochaetia 886 Spirochaetales 886Proteobacteria 930 Bacilli 783 Lactobacillales 706Spirochaetes 886 Clostridia 678 Clostridiales 678Actinobacteria 238 Negativicutes 521 Selenomonadales 521Family Genus SpeciesPrevotellaceae 1639 Prevotella 1356 Porphyromonas gingivalis 730Porphyromonadaceae 1296 Fusobacterium 1130 Fusobacterium nucleatum subsp vincentii 523Fusobacteriaceae 1130 Porphyromonas 1094 Prevotella intermedia 462Spirochaetaceae 886 Treponema 886 Streptococcus sp oral taxon 423 262Streptococcaceae 652 Streptococcus 652 Bacteroidales sp oral taxon 274 218Veillonellaceae 521 Leptotrichia 476 Prevotella loescheii 215

Veillonella (349) and Neisseria (327) however inpatients with periodontal disease the dominant genera werePorphyromonas (1467) Prevotella (1416) Treponema(1190) Fusobacterium (1109) Leptotrichia (432) andStreptococcus (310) At the species level Streptococcus sporal taxon 423 (02-36) was the most abundant species inhealthy patients whereas Porphyromonas gingivalis (0-31)was the most abundant species in patients with periodontitisTable 3 compares the dominant microbes between healthypatients and patients with periodontitis at each taxonomiclevel The genus and species level taxonomic compositionsbetween healthy patients and patients with periodontitis areshown in Figures 1 and 2 Streptococcus was more abun-dant in samples from all healthy individuals but decreasedin samples from patients with periodontitis AdditionallyPorphyromonas and Treponema were more abundant inpatients with periodontitis but decreased significantly insamples from healthy individuals In total 25 species wereidentified with significantly different abundances betweensample groups Porphyromonas gingivalis was the specieswith the most significantly differential abundance betweensamples fromhealthy patients and patients with periodontitis(p value = 241E-9)

Overall our findings were largely comparable to thoseof previous studies [14 55ndash61] indicating that species suchas Porphyromonas gingivalisTreponema denticola Tannerellaforsythia Filifactor alocis Treponema socranskii Aggregat-ibacter actinomycetemcomitans Treponema vincentii andMycoplasma faucium were significantly enriched in samplesfrom patients with periodontitis Furthermore we found aset of species including Streptococcus sanguinisHaemophilusparainfluenzae Capnocytophaga granulosa Gemella morbil-lorum Campylobacter showae and Granulicatella adiacenswere significantly enriched in samples from healthy individ-uals

Several studies have described the bacterial communitiesin patientswith periodontitis and healthy control participantsusing metagenomics [16ndash19 61ndash63] The dominant microor-ganisms associated with periodontitis and the healthy statewere largely consistent in those studies however we observedseveral discrepancies First in addition to common diseased-associated microorganisms such as Porphyromonas gingi-valis Treponema denticola Tannerella forsythia Filifactoralocis and Aggregatibacter actinomycetemcomitans we alsofound that the speciesMycoplasma faucium was significantlyenriched in samples from patients with periodontal disease

BioMed Research International 5

Table 3 Dominant microbes of the oral microbiota between healthy patients and patients with periodontitis at each taxonomic level

Healthy patients Patients with periodontitisPhylum

Bacteroidetes 3193 Bacteroidetes 4026Firmicutes 2690 Firmicutes 1766Fusobacteria 1731 Fusobacteria 1542Proteobacteria 1181 Spirochaetes 1190Actinobacteria 336 Proteobacteria 799Saccharibacteria 320 Synergistetes 250

ClassBacteroidia 2476 Bacteroidia 3532Fusobacteria 1731 Fusobacteria 1542

Bacilli 1523 Spirochaetia 1190Negativicutes 671 Clostridia 793Flavobacteriia 667 Negativicutes 443Clostridia 457 Bacilli 398

OrderBacteroidales 2476 Bacteroidales 3532Fusobacteriales 1731 Fusobacteriales 1542Lactobacillales 1394 Spirochaetales 1190Selenomonadales 671 Clostridiales 793Flavobacteriales 667 Selenomonadales 443Clostridiales 457 Lactobacillales 348

FamilyPrevotellaceae 1669 Porphyromonadaceae 1719Streptococcaceae 1309 Prevotellaceae 1624Fusobacteriaceae 1170 Spirochaetaceae 1190Veillonellaceae 671 Fusobacteriaceae 1109

Flavobacteriaceae 667 Veillonellaceae 443Leptotrichiaceae 561 Leptotrichiaceae 432

GenusStreptococcus 1309 Porphyromonas 1467Prevotella 1243 Prevotella 1416

Fusobacterium 1170 Treponema 1190Capnocytophaga 625 Fusobacterium 1109Leptotrichia 560 Leptotrichia 432Alloprevotella 426 Streptococcus 310

SpeciesStreptococcus sp oral taxon 423 588 Porphyromonas gingivalis 1101

Fusobacterium nucleatum subsp vincentii 422 Prevotella intermedia 602Fusobacterium nucleatum subsp polymorphum 352 Fusobacterium nucleatum subsp vincentii 576

Veillonella parvula 333 Treponema denticola 268Bacteroidales sp oral taxon 274 311 Fusobacterium nucleatum subsp nucleatum 234

Fusobacterium nucleatum subsp animalis 309 Tannerella forsythia 232

There were 26 samples that contained this species at greaterthan 05 abundance and only one of these samples wasderived from a healthy patient The average relative abun-dance of Mycoplasma faucium was 059 in all samples(004 and 087 in samples from healthy patients andpatients with periodontal disease respectively) and was upto 485 in one diseased sample Although this is a rare

bacterium in the normal microbiota of the human orophar-ynx some reports have identified this pathogen in brainabscesses [64 65] Additionally Liu et al [61] characterizedthe genomes of key players in the subgingival microbiotain patients with periodontitis including an unculturableTM7 organism They also demonstrated that TM7 organismswere significantly enriched in samples from patients with

6 BioMed Research International

lowast PorphyromonasPrevotella

lowast Treponemalowast Tannerella

lowast Fretibacterium lowast Filifactor

AggregatibacterBacteroidetes_[Gminus5]

lowast MycoplasmaParvimonas

lowast Bacteroidetes_[Gminus3]Bacteroidetes_[Gminus6]

AnaeroglobusFusobacteriumlowast Streptococcus

lowast CapnocytophagaLeptotrichia

AlloprevotellaCampylobacter

VeillonellaNeisseria

Bacteroidales_[Gminus2]lowast Haemophilus

TM7_[Gminus1]SelenomonasTM7_[Gminus5]lowast Gemella

DialisterCorynebacteriumlowast Granulicatella

ActinomycesSR1_[Gminus1]

Peptostreptococcaceae_[XI][Gminus7]

12 9 6 3 0 3 6 9 12Average abundance

Sample group

Case

Control

Gen

us

Figure 1 Microbial compositions of samples from healthy patients and patients with periodontitis at the genus level The abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only genera with gt 05abundance in at least one sample were included Genera with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

periodontitis In our study 49 of 76 samples contained TM7bacteria at greater than 1 abundance (average abundanceof 21 in all samples) In samples from healthy patientsand patients with periodontitis the average abundanceswere 32 and 149 respectively However significantenrichment was not observed in samples from patients withperiodontitis Furthermore we found that the subspeciesFusobacterium nucleatum subsp polymorphum which isrelated to periodontal disease and is themember of the orangecluster described by Socransky et al [14] is more abundant inhealthy patients In our results the average abundances were352 and 113 in samples fromhealthy patients and patientswith periodontitis respectively This situation also can beobserved in other three species including Campylobactergracilis Campylobacter rectus and Campylobacter showaeThis discrepancy could be explained by geographic variability[66] or by differences in the depths of the pockets sampled[14] as well as the sample size and the DNA analytic bias [67]Finally Spearmanrsquos rank correlation coefficientwas computed

to assess association between each pair of species associatedwith periodontal disease Figure 3 shows that a very strongrelationship exhibited among species Porphyromonas gingi-valis Treponema denticola and Tannerella forsythia

In our study there are 25 bacterial species with signif-icantly different abundances between healthy patients andpatients with periodontitis The relationships of these speciesto pocket depth and clinical attachment loss were examinedFigure 4 shows that three species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with pocket depth and clinicalattachment loss For instance the three species increasedin abundance with increasing pocket depth and clinicalattachment loss The abundances of those species amongdifferent level of pocket depth and clinical attachment losswere different significantly However it should be noted thatnot only oral microorganisms but also others factors suchas supragingival plaque would affect the pocket depth andclinical attachment loss [68]

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 5: Composition Analysis and Feature Selection of the Oral

BioMed Research International 5

Table 3 Dominant microbes of the oral microbiota between healthy patients and patients with periodontitis at each taxonomic level

Healthy patients Patients with periodontitisPhylum

Bacteroidetes 3193 Bacteroidetes 4026Firmicutes 2690 Firmicutes 1766Fusobacteria 1731 Fusobacteria 1542Proteobacteria 1181 Spirochaetes 1190Actinobacteria 336 Proteobacteria 799Saccharibacteria 320 Synergistetes 250

ClassBacteroidia 2476 Bacteroidia 3532Fusobacteria 1731 Fusobacteria 1542

Bacilli 1523 Spirochaetia 1190Negativicutes 671 Clostridia 793Flavobacteriia 667 Negativicutes 443Clostridia 457 Bacilli 398

OrderBacteroidales 2476 Bacteroidales 3532Fusobacteriales 1731 Fusobacteriales 1542Lactobacillales 1394 Spirochaetales 1190Selenomonadales 671 Clostridiales 793Flavobacteriales 667 Selenomonadales 443Clostridiales 457 Lactobacillales 348

FamilyPrevotellaceae 1669 Porphyromonadaceae 1719Streptococcaceae 1309 Prevotellaceae 1624Fusobacteriaceae 1170 Spirochaetaceae 1190Veillonellaceae 671 Fusobacteriaceae 1109

Flavobacteriaceae 667 Veillonellaceae 443Leptotrichiaceae 561 Leptotrichiaceae 432

GenusStreptococcus 1309 Porphyromonas 1467Prevotella 1243 Prevotella 1416

Fusobacterium 1170 Treponema 1190Capnocytophaga 625 Fusobacterium 1109Leptotrichia 560 Leptotrichia 432Alloprevotella 426 Streptococcus 310

SpeciesStreptococcus sp oral taxon 423 588 Porphyromonas gingivalis 1101

Fusobacterium nucleatum subsp vincentii 422 Prevotella intermedia 602Fusobacterium nucleatum subsp polymorphum 352 Fusobacterium nucleatum subsp vincentii 576

Veillonella parvula 333 Treponema denticola 268Bacteroidales sp oral taxon 274 311 Fusobacterium nucleatum subsp nucleatum 234

Fusobacterium nucleatum subsp animalis 309 Tannerella forsythia 232

There were 26 samples that contained this species at greaterthan 05 abundance and only one of these samples wasderived from a healthy patient The average relative abun-dance of Mycoplasma faucium was 059 in all samples(004 and 087 in samples from healthy patients andpatients with periodontal disease respectively) and was upto 485 in one diseased sample Although this is a rare

bacterium in the normal microbiota of the human orophar-ynx some reports have identified this pathogen in brainabscesses [64 65] Additionally Liu et al [61] characterizedthe genomes of key players in the subgingival microbiotain patients with periodontitis including an unculturableTM7 organism They also demonstrated that TM7 organismswere significantly enriched in samples from patients with

6 BioMed Research International

lowast PorphyromonasPrevotella

lowast Treponemalowast Tannerella

lowast Fretibacterium lowast Filifactor

AggregatibacterBacteroidetes_[Gminus5]

lowast MycoplasmaParvimonas

lowast Bacteroidetes_[Gminus3]Bacteroidetes_[Gminus6]

AnaeroglobusFusobacteriumlowast Streptococcus

lowast CapnocytophagaLeptotrichia

AlloprevotellaCampylobacter

VeillonellaNeisseria

Bacteroidales_[Gminus2]lowast Haemophilus

TM7_[Gminus1]SelenomonasTM7_[Gminus5]lowast Gemella

DialisterCorynebacteriumlowast Granulicatella

ActinomycesSR1_[Gminus1]

Peptostreptococcaceae_[XI][Gminus7]

12 9 6 3 0 3 6 9 12Average abundance

Sample group

Case

Control

Gen

us

Figure 1 Microbial compositions of samples from healthy patients and patients with periodontitis at the genus level The abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only genera with gt 05abundance in at least one sample were included Genera with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

periodontitis In our study 49 of 76 samples contained TM7bacteria at greater than 1 abundance (average abundanceof 21 in all samples) In samples from healthy patientsand patients with periodontitis the average abundanceswere 32 and 149 respectively However significantenrichment was not observed in samples from patients withperiodontitis Furthermore we found that the subspeciesFusobacterium nucleatum subsp polymorphum which isrelated to periodontal disease and is themember of the orangecluster described by Socransky et al [14] is more abundant inhealthy patients In our results the average abundances were352 and 113 in samples fromhealthy patients and patientswith periodontitis respectively This situation also can beobserved in other three species including Campylobactergracilis Campylobacter rectus and Campylobacter showaeThis discrepancy could be explained by geographic variability[66] or by differences in the depths of the pockets sampled[14] as well as the sample size and the DNA analytic bias [67]Finally Spearmanrsquos rank correlation coefficientwas computed

to assess association between each pair of species associatedwith periodontal disease Figure 3 shows that a very strongrelationship exhibited among species Porphyromonas gingi-valis Treponema denticola and Tannerella forsythia

In our study there are 25 bacterial species with signif-icantly different abundances between healthy patients andpatients with periodontitis The relationships of these speciesto pocket depth and clinical attachment loss were examinedFigure 4 shows that three species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with pocket depth and clinicalattachment loss For instance the three species increasedin abundance with increasing pocket depth and clinicalattachment loss The abundances of those species amongdifferent level of pocket depth and clinical attachment losswere different significantly However it should be noted thatnot only oral microorganisms but also others factors suchas supragingival plaque would affect the pocket depth andclinical attachment loss [68]

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 6: Composition Analysis and Feature Selection of the Oral

6 BioMed Research International

lowast PorphyromonasPrevotella

lowast Treponemalowast Tannerella

lowast Fretibacterium lowast Filifactor

AggregatibacterBacteroidetes_[Gminus5]

lowast MycoplasmaParvimonas

lowast Bacteroidetes_[Gminus3]Bacteroidetes_[Gminus6]

AnaeroglobusFusobacteriumlowast Streptococcus

lowast CapnocytophagaLeptotrichia

AlloprevotellaCampylobacter

VeillonellaNeisseria

Bacteroidales_[Gminus2]lowast Haemophilus

TM7_[Gminus1]SelenomonasTM7_[Gminus5]lowast Gemella

DialisterCorynebacteriumlowast Granulicatella

ActinomycesSR1_[Gminus1]

Peptostreptococcaceae_[XI][Gminus7]

12 9 6 3 0 3 6 9 12Average abundance

Sample group

Case

Control

Gen

us

Figure 1 Microbial compositions of samples from healthy patients and patients with periodontitis at the genus level The abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only genera with gt 05abundance in at least one sample were included Genera with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

periodontitis In our study 49 of 76 samples contained TM7bacteria at greater than 1 abundance (average abundanceof 21 in all samples) In samples from healthy patientsand patients with periodontitis the average abundanceswere 32 and 149 respectively However significantenrichment was not observed in samples from patients withperiodontitis Furthermore we found that the subspeciesFusobacterium nucleatum subsp polymorphum which isrelated to periodontal disease and is themember of the orangecluster described by Socransky et al [14] is more abundant inhealthy patients In our results the average abundances were352 and 113 in samples fromhealthy patients and patientswith periodontitis respectively This situation also can beobserved in other three species including Campylobactergracilis Campylobacter rectus and Campylobacter showaeThis discrepancy could be explained by geographic variability[66] or by differences in the depths of the pockets sampled[14] as well as the sample size and the DNA analytic bias [67]Finally Spearmanrsquos rank correlation coefficientwas computed

to assess association between each pair of species associatedwith periodontal disease Figure 3 shows that a very strongrelationship exhibited among species Porphyromonas gingi-valis Treponema denticola and Tannerella forsythia

In our study there are 25 bacterial species with signif-icantly different abundances between healthy patients andpatients with periodontitis The relationships of these speciesto pocket depth and clinical attachment loss were examinedFigure 4 shows that three species Porphyromonas gingivalisTreponema denticola and Tannerella forsythia exhibited avery strong relationship with pocket depth and clinicalattachment loss For instance the three species increasedin abundance with increasing pocket depth and clinicalattachment loss The abundances of those species amongdifferent level of pocket depth and clinical attachment losswere different significantly However it should be noted thatnot only oral microorganisms but also others factors suchas supragingival plaque would affect the pocket depth andclinical attachment loss [68]

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 7: Composition Analysis and Feature Selection of the Oral

BioMed Research International 7

lowast Porphyromonas gingivalisPrevotella intermedia

Fusobacterium nucleatum_subsp_vincentiilowast Treponema denticola

lowast Fusobacterium nucleatum_subsp_nucleatumlowast Tannerella forsythia

Porphyromonas endodontalislowast Filifactor alocis

lowast Treponema vincentiiTreponema sp_oral_taxon_231

lowast Treponema socranskiilowast Aggregatibacter actinomycetemcomitans

lowast Fretibacterium sp_oral_taxon_359lowast Leptotrichia sp_oral_taxon_218lowast Treponema sp_oral_taxon_237

Prevotella sp_oral_taxon_313Prevotella pleuritidis

Bacteroidetes sp_oral_taxon_511lowast Mycoplasma faucium

lowast Fretibacterium sp_oral_taxon_360Parvimonas micra

Bacteroidetes sp_oral_taxon_516Campylobacter sp_oral_taxon_044

lowast Prevotella sp_oral_taxon_304Anaeroglobus geminatus

9 6 3 0 3 6Average abundance

Sample group

Case

Control

Spec

ies

lowast Streptococcus sp_oral_taxon_423lowast Fusobacterium nucleatum_subsp_polymorphum

Veillonella parvula

Fusobacterium nucleatum_subsp_animalisBacteroidales sp_oral_taxon_274

Prevotella loescheiilowast Streptococcus sanguinis

Alloprevotella tanneraelowast Haemophilus parainfluenzae

Campylobacter gracilislowast Fusobacterium sp_HOT_204

Porphyromonas catoniaelowast Capnocytophaga granulosa

Neisseria flavescensNeisseria sicca

TM7sp_oral_taxon_356Prevotella oris

Streptococcus cristatusCapnocytophaga leadbetteri

Leptotrichia sp_oral_taxon_417lowast Gemella morbillorumStreptococcus dentisani

lowast Campylobacter showaeAlloprevotella sp_oral_taxon_473

Corynebacterium matruchotiiDialister invisus

Leptotrichia sp_oral_taxon_225lowast Capnocytophaga sp_oral_taxon_412

Capnocytophaga sputigenalowast Granulicatella adiacens

Prevotella pallensStreptococcus sp_oral_taxon_058lowast Prevotella sp_oral_taxon_300

Aggregatibacter segnisCapnocytophaga sp_oral_taxon_326

TM7sp_oral_taxon_346Leptotrichia hongkongensis

Campylobacter concisusLeptotrichia sp_oral_taxon_392

Figure 2 Microbial compositions of samples from healthy patients and patients with periodontitis at the species levelThe abundances werecalculated by averaging the relative abundances in samples from healthy patients and patients with periodontitis Only species with gt 05abundance in at least one sample are shown Species with significant differences in abundance between sample groups are indicated withasterisks (lowast) (p value lt 00001)

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 8: Composition Analysis and Feature Selection of the Oral

8 BioMed Research International

F nuc ss nucleatum

C rectus

P intermedia

P gingivalis

T forsythia

T denticolaF nuc ss polymorphum

C showae

P micra

F nuc ss vincentiiC gracilis

P nigrescens

F nuc s

s nucle

atum

C rectu

s

P inter

media

P gingiva

lis

T forsy

thia

T denticola

F nuc s

s polym

orphum

C showae

P micr

aF n

uc ss v

incentii

C gracilis

P nigres

cens

1

08

06

04

02

0

-02

-04

-06

-08

-1

Figure 3 The relationships among species were evaluated using Spearmanrsquos rank correlation coefficient

p lt 00001

Porphyromonas gingivalis Treponema denticola Tannerella forsythia

12

9

6

3

12

9

6

3

12

9

6

3

15

12

lt 3 3 4 5 6 gt 7

Porphyromonas gingivalis

p lt 00001

p lt 00001 p lt 00001 p lt 00001

lt 3 3 4 5 6 gt 7

Treponema denticola

p lt 00001 15

12

lt 3 3 4 5 6 gt 7Pocket depth (mm)

Tannerella forsythia

9 9

6 6

3

15

12

9

6

3 3

lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7 lt 3 3 4 5 6 gt 7Clinical attachment loss (mm)

Aver

age a

bund

ance

Aver

age a

bund

ance

Figure 4 Relationships of the average abundance of three species to selected pocket depths and clinical attachment loss levels Significanceof differences among pocket depth levels was tested using the Kruskal-Wallis test

33 Diversity of Bacterial Community Profiles To evaluatethe alpha diversity of the microbial communities Shannonindex curves scores and richness metrics (Observed Chao1and ACE) were applied as shown in Figure 5 As depictedin Figure 5(a) the Shannon diversity index curves clearlyreached plateau levels after the sequence number exceeded5000 in all three health statuses indicating that the microbial

composition for each health status was well represented bythe sequencing depth As shown in Figure 5(b) the averagerichness measured by Observed Chao1 and Ace indexeswas higher in samples from patients with periodontitis thanin samples from healthy individuals however these resultswere in contrast to the results from the Shannon diversityindex Thus the relative abundance of each microbe was

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 9: Composition Analysis and Feature Selection of the Oral

BioMed Research International 9

6

Health StatusHMP SP

4

2

0

0 5000 10000 15000Sequences Per Sample

Aver

age S

hann

on In

dex

(a)

Observed Chao1 ACE

Case

Control

Case

Control

Case

Control

0

100

200

300

400

500

Sample group

Alp

ha D

iver

sity M

easu

re

Shannon

Case

Control

30

35

40

45

p value = 0166

(b)

Figure 5 (a) The sequencing depths measured by average scores from the Shannon index reached a plateau when the sequence numberexceeded 5000 (b) Alpha-diversity metrics (richness and Shannon index) were employed to measure the microbial communities of samplesfrom healthy patients and patients with periodontitis The average richness of microbes was higher in patients with periodontal disease thanin healthy patients however the microbial communities of healthy patients exhibited higher Shannon indexes

more balanced in samples from healthy individuals than insamples from patients with periodontal disease and therewere more microbes with low relative abundance in samplesfrom patients with periodontitis

To further explore the relationships between bacterialcommunities in healthy patients and patients with peri-odontal disease PCoAwas performed (Figure 6(a)) Analysisof beta diversity based on the weighted UniFrac distancesshowed greater concentration in diseased samples than inhealthy samples In other words the microbial composi-tions of diseased samples were more similar to each otherAs shown in Figure 6(b) PCA of microbial communitiesrevealed that the core genera in healthy samples includedStreptococcus Capnocytophaga Campylobacter VeillonellaAlloprevotella TM7 [G-1] Leptotrichia and Selenomonaswhereas those in samples from patients with periodontitiswere Filifactor Treponema Fretibacterium Porphyromonasand Tannerella

34 Machine Learning and Feature Selection Before applyingthe machine learning algorithm to classify samples it isnecessary to select the features from the samples and trainpredictionmodels Table 4 lists featureswith difference scores

p lt 1E-07 Based on significant differences between healthypatients and patients with periodontitis we selected the top10 microbes with more information as features In total1023 combinations of selected features were generated by ouralgorithm All feature combinations were evaluated by SVMrandom forest logical regression and deep learning machinelearning methods and the average accuracies were 088 093085 and 090 respectively Figure 7 shows the performanceof each machine learning method In general the accuracyof prediction increased slightly with the number of featuresused except in logistic regression From our results wefound that random forests had better predictive ability thanthe other methods Applying combinations consisting ofPeptoniphilaceae sp oral taxon 113 Streptococcus sanguinisMollicutes sp oral taxon 906Aggregatibacter actinomycetem-comitans Porphyromonas gingivalis Peptostreptococcaceae sporal taxon 950 and Lachnospiraceae sp oral taxon 500or Stomatobaculum sp oral taxon 373 Desulfobulbus sporal taxon 041 Peptoniphilaceae sp oral taxon 113 Strep-tococcus sanguinis Aggregatibacter actinomycetemcomitansPorphyromonas gingivalis and Leptotrichia sp oral taxon 218showed that random forests could predict the health status ofsamples accurately The feature combinations having averageaccuracies of more than 094 are reported in Table 5

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 10: Composition Analysis and Feature Selection of the Oral

10 BioMed Research International

minus01

00

01

02

minus02 minus01 00 01 02Axis1 [403]

Axi

s2 [1

35

]

Health statusHMPSP

Sample groupCaseControl

(a)

Prevotella

Fusobacterium

Porphyromonas

Treponema

Streptococcu

s

Leptotri

chia

Capnocytophaga

Alloprevotella

Campylobacter

Neisse

ria

Veillonella

Bacteroidales_[Gminus2]

Tannerella

Fretibacterium

Filifactor

Aggregatibacter

SelenomonasTM7_[Gminus1]minus2

0

2

4

minus4 minus2 0 2PC1 (330 explained var)

PC2

(13

3 ex

plai

ned

var)

Sample group Case Control

(b)

Figure 6 (a) Principal coordinate analysis (PCoA) with weighted UniFrac distance matrixes for bacterial communities associated with thethree health statuses (b) Principal component analysis (PCA) of the dominant genera between samples from healthy patients and patientswith periodontitis Only genera with ge 1 mean relative abundance across all samples are shown

Table 4 Features with significant differences between healthy patients and patients with periodontitis Correlation coefficients and p valueswere determined by Spearmanrsquos rank correlation coefficient and KruskalminusWallis tests respectively Negative correlations indicated that thefeatures were observed more often in patients with periodontitis than in healthy patients

No Feature (Species) Correlation coefficient 119901

1 Stomatobaculum sp oral taxon 373 -0766029754 327E-112 Desulfobulbus sp oral taxon 041 -074877058 890E-113 Peptoniphilaceae sp oral taxon 113 -0723418056 373E-104 Streptococcus sanguinis 071684624 536E-105 Mollicutes sp oral taxon 906 -0709369416 808E-106 Aggregatibacter actinomycetemcomitans -0686608198 274E-097 Porphyromonas gingivalis -0683993685 315E-098 Peptostreptococcaceae sp oral taxon 950 -0681489164 359E-099 Lachnospiraceae sp oral taxon 500 -0670324546 643E-0910 Leptotrichia sp oral taxon 218 -0666642231 777E-0911 Bosea vestrisii 0665468802 826E-0912 Filifactor alocis -0656797473 129E-0813 Mycoplasma faucium -0641322841 279E-0814 Prevotella sp oral taxon 304 -0638587976 320E-0815 Fretibacterium sp oral taxon 359 -0632290825 436E-0816 Bergeyella sp oral taxon 322 0630961524 465E-0817 Tannerella forsythia -0628346704 528E-0818 Peptostreptococcus indolicus -0626504998 577E-0819 Johnsonella sp oral taxon 166 -0622396393 704E-0820 Peptostreptococcaceae [Eubacterium] saphenum -0616735679 924E-08

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 11: Composition Analysis and Feature Selection of the Oral

BioMed Research International 11

Table 5 Feature combinations and their predictive accuracies with different machine learning methods Only feature combinations withmore than 094 average accuracy are shown DL RF and LR represent deep learning random forests and logistic regression respectively

Feature combination DL RF SVM LR Average accuracyStomatobaculum sp oral taxon 373 0967 0973 0960 0933 0958Peptoniphilaceae sp oral taxon 113Desulfobulbaceae sp oral taxon 041

0933 0960 0973 0947 0953Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLachnospiraceae sp oral taxon 500Leptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0933 0973 0960 0947 0953Streptococcus sanguinisAggregatibacter actinomycetemcomitansDesulfobulbaceae sp oral taxon 041

0973 0967 0933 0927 0950Mollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0947 0953 0907 0987 0948Streptococcus sanguinisMollicutes sp oral taxon 906Porphyromonas gingivalisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0960 0967 0947 0913 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Desulfobulbaceae sp oral taxon 041

0933 0973 0933 0947 0947Peptoniphilaceae sp oral taxon 113Aggregatibacter actinomycetemcomitansLeptotrichia sp oral taxon 218Stomatobaculum sp oral taxon 373

0967 0933 0953 0933 0947Peptoniphilaceae sp oral taxon 113Mollicutes sp oral taxon 906Peptoniphilaceae sp oral taxon 113

0960 0987 0867 0967 0945Streptococcus sanguinisAggregatibacter actinomycetemcomitansStomatobaculum sp oral taxon 373

0920 0947 0967 0947 0945Aggregatibacter actinomycetemcomitansPeptostreptococcaceae sp oral taxon 950Stomatobaculum sp oral taxon 373

0967 0967 0953 0893 0945Peptoniphilaceae sp oral taxon 113Porphyromonas gingivalisAggregatibacter actinomycetemcomitans

According to previous studies Caruana et al [69 70] pro-posed that the random forest method showed better accuracyin high-dimensional and large-scale data than neural netsSVM and logistic regression In this study we found that therandom forest method was more suitable for small-scale datathan other methods In contrast deep learning approachesled to good performance but required long computation

times and large amounts of memory particularly when thehidden layer size was increased

4 Conclusions

With the development of high-throughput DNA sequencingtechnology the limitations associated with difficult culture of

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 12: Composition Analysis and Feature Selection of the Oral

12 BioMed Research International

075

080

085

090

095

1 2 3 4 5 6 7 8 9 10Number of features in the combinstion

Aver

age a

ccur

acy

Machine learning algorithmsDeep learningLogistic regressionRandom forestSVM

Figure 7 Average accuracies of different numbers of features

manymicrobes that populate the oral cavity can be overcomefacilitating the analysis of bacterial community compositionUsing 16S rRNA sequencing of subgingival samples from 50individuals with periodontitis and 26 periodontally healthycontrols we determined the diversity of and differences incommunity compositions Moreover we identified microbesassociated with good health and periodontal disease andprovided a machine learning method for finding patternsand making predictions for oral microbiota associated withperiodontal disease

Our results showed that there was a higher diversity ofmicrobes in samples from patients with periodontal diseasethan in samples from healthy patients Importantly the coremicrobes in healthy patients were different significantly fromthose in patients with periodontitis We also found thatbacterial communities associated with healthy and diseasedstates were highly different in PCA and PCoA and thecompositions of microorganisms were more similar to eachother in samples from patients with periodontal disease thanin samples from healthy individuals

We proposed a novel feature selection method andinvestigated the potential of machine learning approachesfor determination of health status based on oral metage-nomics data By using nonparametric KruskalminusWallis teststo assess the significance of each microorganism we selectedsignificant microbes to generate prioritized feature combina-tions by our algorithm The performances of four machinelearning approaches were evaluated with these feature com-binations and random forests showed the best performance(average accuracy of 093 from 1023 feature combinations)followed by deep learning SVM and logistic regressionUsing machine learning methods training models couldaccurately predict the health status of samples by examiningfewer features According to our observations the accuracyof prediction generally increased slightly with the numberof features used except for logistic regression Notablycertain combinations composed of fewer features showedbetter accuracy than combinations composed of all selectedfeatures These combinations of features may only apply toour dataset However the results implied that a few relatedfeatures may have better predictive ability than multiple

independent features Therefore in order to improve theprediction accuracy of the model it is essential to identifythe most informative features Due to limitations in fundingtime and ethical considerations it is not easy to obtain largenumbers of oral samples from patients with periodontitisAlthough insufficient and incomplete samples could easilylead to bias and variance in training models our study stillprovided an important basis for further studies

Periodontitis is a chronic inflammatory disease involvingcomplex interactions between the oral microorganisms andthe host immune response In addition to the individualspecies associated with pathogenesis the system-level mech-anisms underlying the transition from a healthy state to adiseased state are key points for studying periodontal diseaseThus in our future studies we aim to elucidate the globalgenetic metabolic and ecological changes associated withperiodontitis and identify the pathogenic features of con-structing machine learning models Rapid molecular tech-niques and machine learning methods capable of identifyingperiodontal bacteria with great accuracy may eventuallyprovide improved classification and diagnosis of varioustypes of periodontal diseases and aid significantly in clinicaldecision-making

Data Availability

The raw sequences of human oral subgingival plaque sampleswere deposited at theNCBI SequenceReadArchive under theBioproject Accession no PRJNA437129

Conflicts of Interest

The authors declare no conflicts of interest

Authorsrsquo Contributions

Wen-Pei Chen and Shih-Hao Chang contributed equally tothis work

Funding

Thepresent work was partially supported by a grant from theMinistry of Science and Technology [grant number MOST107-2218-E-126-001-] and [grant number NSC 102-2622-E-126-002 CC1]

References

[1] L Gao T Xu G Huang S Jiang Y Gu and F Chen ldquoOral mi-crobiomesmore andmore importance in oral cavity and wholebodyrdquo Protein amp Cell pp 1ndash13 2018

[2] P D Marsh ldquoMicrobiology of dental plaque biofilms and theirrole in oral health and cariesrdquo Dental Clinics of North Americavol 54 no 3 pp 441ndash454 2010

[3] P D Marsh ldquoDental plaque as a biofilm and a microbialcommunity - Implications for health and diseaserdquo BMC OralHealth vol 6 no 1 2006

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 13: Composition Analysis and Feature Selection of the Oral

BioMed Research International 13

[4] R J Palmer Jr ldquoComposition and development of oral bacterialcommunitiesrdquo Periodontology 2000 vol 64 no 1 pp 20ndash392014

[5] B L Pihlstrom B S Michalowicz and N W Johnson ldquoPeri-odontal diseasesrdquoe Lancet vol 366 no 9499 pp 1809ndash18202005

[6] R J Genco and T E Van Dyke ldquoReducing the risk of CVD inpatients with periodontitisrdquo Nature Reviews Cardiology vol 7no 9 pp 479-480 2010

[7] K Lundberg N Wegner T Yucel-Lindberg and P J VenablesldquoPeriodontitis in RA-the citrullinated enolase connectionrdquoNature Reviews Rheumatology vol 6 no 12 pp 727ndash730 2010

[8] M Kebschull R T Demmer and P N Papapanou ldquoldquoGumbug leave my heart alonerdquomdashepidemiologic and mechanisticevidence linking periodontal infections and atherosclerosisrdquoJournal of Dental Research vol 89 no 9 pp 879ndash902 2010

[9] S E Whitmore R J Lamont and W E Goldman ldquoOral Bac-teria and Cancerrdquo PLoS Pathogens vol 10 no 3 p e10039332014

[10] Y W Han and X Wang ldquoMobile microbiome oral bacteria inextra-oral infections and inflammationrdquo Journal of Dental Re-search vol 92 no 6 pp 485ndash491 2013

[11] P N Madianos Y A Bobetsis and S Offenbacher ldquoAdversepregnancy outcomes (APOs) and periodontal disease Patho-genic mechanismsrdquo Journal of Clinical Periodontology vol 40no 14 pp S170ndashS180 2013

[12] S Temoin K LWu VWuM Shoham and YW Han ldquoSignalpeptide of FadA adhesin from Fusobacterium nucleatum playsa novel structural role by modulating the filamentrsquos length andwidthrdquo FEBS Letters vol 586 no 1 pp 1ndash6 2012

[13] S Malm M Jusko S Eick J Potempa K Riesbeck and AM Blom ldquoAcquisition of complement inhibitor serine proteasefactor i and its cofactors C4b-binding protein and factor H byprevotella intermediardquo PLoS ONE vol 7 no 4 2012

[14] S S Socransky A D Haffajee M A Cugini C Smith and R LKent Jr ldquoMicrobial complexes in subgingival plaquerdquo Journal ofClinical Periodontology vol 25 no 2 pp 134ndash144 1998

[15] R Teles F Teles J Frias-Lopez B Paster and A HaffajeeldquoLessons learned and unlearned in periodontal microbiologyrdquoPeriodontology 2000 vol 62 no 1 pp 95ndash162 2013

[16] P Belda-Ferre L D Alcaraz R Cabrera-Rubio et al ldquoThe oralmetagenome in health and diseaserdquo e ISME Journal vol 6no 1 pp 46ndash56 2012

[17] P S Kumar A L Griffen M L Moeschberger and E J LeysldquoIdentification of candidate periodontal pathogens and ben-eficial species by quantitative 16S clonal analysisrdquo Journal ofClinical Microbiology vol 43 no 8 pp 3944ndash3955 2005

[18] A P V Colombo S K Boches S L Cotton et al ldquoComparisonsof subgingival microbial profiles of refractory periodontitissevere periodontitis and periodontal health using the humanoral microbe identification microarrayrdquo Journal of Periodontol-ogy vol 80 no 9 pp 1421ndash1432 2009

[19] S Junemann K Prior R Szczepanowski et al ldquoBacterial com-munity shift in treated periodontitis patients revealed by IonTorrent 16S rRNA gene amplicon sequencingrdquo PLoS ONE vol7 no 8 Article ID e41606 2012

[20] P D Schloss S L Westcott T Ryabin et al ldquoIntroduc-ing mothur open-source platform-independent community-supported software for describing and comparing microbialcommunitiesrdquoApplied and EnvironmentalMicrobiology vol 75no 23 pp 7537ndash7541 2009

[21] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquoBioinformatics vol 26 no 19 pp 2460-2461 2010

[22] F Mahe T Rognes C Quince C de Vargas andM DunthornldquoSwarm Robust and fast clusteringmethod for amplicon-basedstudiesrdquo PeerJ vol 2014 no 1 2014

[23] M Ghodsi B Liu and M Pop ldquoDNACLUST Accurate andefficient clustering of phylogenetic marker genesrdquo BMC Bioin-formatics vol 12 article no 271 2011

[24] L Fu B Niu Z Zhu SWu andW Li ldquoCD-HIT accelerated forclustering the next-generation sequencing datardquoBioinformaticsvol 28 no 23 pp 3150ndash3152 2012

[25] H Teeling J Waldmann T Lombardot M Bauer and F OGlockner ldquoTETRA a web-service and a stand-alone programfor the analysis and comparison of tetranucleotide usage pat-terns inDNA sequencesrdquoBMCBioinformatics vol 5 article 1632004

[26] Y Wang H C M Leung S M Yiu and F Y L Chin ldquoMeta-Cluster 40 A novel binning algorithm for NGS reads and hugenumber of speciesrdquo Journal of Computational Biology vol 19no 2 pp 241ndash249 2012

[27] YWangH CM Leung SM Yiu and F Y L Chin ldquoMetaclus-ter 50 a two-roundbinning approach formetagenomic data forlow-abundance species in a noisy samplerdquo Bioinformatics vol28 no 18 Article ID bts397 pp i356ndashi362 2012

[28] D D Kang J Froula R Egan and Z Wang ldquoMetaBAT anefficient tool for accurately reconstructing single genomes fromcomplex microbial communitiesrdquo PeerJ vol 2015 no 8 2015

[29] J Alneberg B S Bjarnason I De Bruijn et al ldquoBinning meta-genomic contigs by coverage and compositionrdquo Nature Meth-ods vol 11 no 11 pp 1144ndash1146 2014

[30] Y-W Wu Y-H Tang S G Tringe B A Simmons and SW Singer ldquoMaxBin An automated binning method to recoverindividual genomes from metagenomes using an expectation-maximization algorithmrdquoMicrobiome vol 2 no 1 2014

[31] QWangGMGarrity JM Tiedje and J R Cole ldquoNaıve Baye-sian classifier for rapid assignment of rRNA sequences into thenew bacterial taxonomyrdquo Applied and Environmental Microbi-ology vol 73 no 16 pp 5261ndash5267 2007

[32] N Chaudhary A K Sharma P Agarwal A Gupta and V KSharma ldquo16S classifier A tool for fast and accurate taxono-mic classification of 16S rRNAhypervariable regions inmetage-nomic datasetsrdquo PLoS ONE vol 10 no 2 2015

[33] S P Szafranski M L Wos-Oxley R Vilchez-Vargas et alldquoHigh-resolution taxonomic profiling of the subgingival micro-biome for biomarker discovery and periodontitis diagnosisrdquoApplied and EnvironmentalMicrobiology vol 81 no 3 pp 1047ndash1058 2015

[34] K Vervier PMaheM Tournoud J-B Veyrieras and J-P VertldquoLarge-scale machine learning for metagenomics sequenceclassificationrdquo Bioinformatics vol 32 no 7 pp 1023ndash1032 2016

[35] A E Darling G Jospin E Lowe F A Matsen H M Bik andJ A Eisen ldquoPhyloSift Phylogenetic analysis of genomes andmetagenomesrdquo PeerJ vol 2013 no 1 2014

[36] Z Liu W Hsiao B L Cantarel E F Drabek and C Fraser-Liggett ldquoSparse distance-based learning for simultaneous mul-ticlass classification and feature selection of metagenomic datardquoBioinformatics vol 27 no 23 pp 3242ndash3249 2011

[37] O Tanaseichuk J Borneman and T Jiang ldquoPhylogeny-basedclassification ofmicrobial communitiesrdquoBioinformatics vol 30no 4 pp 449ndash456 2014

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 14: Composition Analysis and Feature Selection of the Oral

14 BioMed Research International

[38] H Cui and X Zhang ldquoAlignment-free supervised classificationof metagenomes by recursive SVMrdquo BMCGenomics vol 14 no1 article no 641 2013

[39] E Pasolli D T Truong F Malik L Waldron and N SegataldquoMachine Learning Meta-analysis of Large MetagenomicDatasets Tools and Biological Insightsrdquo PLoS ComputationalBiology vol 12 no 7 2016

[40] M Arumugam E D Harrington K U Foerstner J Raes andP Bork ldquoSmashCommunity A metagenomic annotation andanalysis toolrdquo Bioinformatics vol 26 no 23 pp 2977-29782010

[41] K J Hoff T Lingner P Meinicke and M Tech ldquoOrpheliaPredicting genes in metagenomic sequencing readsrdquo NucleicAcids Research vol 37 no 2 pp W101ndashW105 2009

[42] K J HoffM Tech T Lingner R Daniel BMorgenstern and PMeinicke ldquoGene prediction in metagenomic fragments A largescale machine learning approachrdquo BMC Bioinformatics vol 9article no 217 2008

[43] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[44] Y Saeys I Inza and P Larranaga ldquoA review of feature selectiontechniques in bioinformaticsrdquo Bioinformatics vol 23 no 19 pp2507ndash2517 2007

[45] W-P Chen S-J J Tsai Y-C Hu and Y-L Lin ldquoMetage-nomic analysis and features selection in human oral microbiotaassociated with periodontal diseaserdquo in Proceedings of the 33rdWorkshop on Combinatorial Mathematics and Computationeory pp 58ndash64 2016

[46] G C Armitage ldquoDevelopment of a classification system forperiodontal diseases and conditionsrdquo Annals of Periodontologyvol 4 no 1 pp 1ndash6 1999

[47] C Y Tang S-M Yiu H-Y Kuo et al ldquoApplication of 16S rRNAmetagenomics to analyze bacterial communities at a respiratorycare centre in Taiwanrdquo Applied Microbiology and Biotechnologyvol 99 no 6 pp 2871ndash2881 2015

[48] L Cai L Ye A H Y Tong S Lok and T Zhang ldquoBiased Di-versityMetrics Revealed byBacterial 16S PyrotagsDerived fromDifferent Primer Setsrdquo PLoS ONE vol 8 no 1 2013

[49] J Zhang K Kobert T Flouri and A Stamatakis ldquoPEAR A fastand accurate Illumina Paired-End reAd mergeRrdquo Bioinformat-ics vol 30 no 5 pp 614ndash620 2014

[50] J G Caporaso J Kuczynski J Stombaugh et al ldquoQIIME allowsanalysis of high-throughput community sequencing datardquoNature Methods vol 7 no 5 pp 335-336 2010

[51] B J Haas D Gevers A M Earl et al ldquoChimeric 16S rRNAsequence formation and detection in Sanger and 454-pyro-sequenced PCR ampliconsrdquo Genome Research vol 21 no 3 pp494ndash504 2011

[52] R C Edgar B J Haas J C Clemente C Quince and RKnight ldquoUCHIME improves sensitivity and speed of chimeradetectionrdquo Bioinformatics vol 27 no 16 pp 2194ndash2200 2011

[53] T ChenW-H Yu J Izard OV BaranovaA Lakshmanan andF E Dewhirst ldquoThe Human Oral Microbiome Database a webaccessible resource for investigating oral microbe taxonomicand genomic informationrdquo Database the journal of biologicaldatabases and curation vol 2010 p baq013 2010

[54] T C J Hill K A Walsh J A Harris and B F Moffett ldquoUsingecological diversity measures with bacterial communitiesrdquoFEMS Microbiology Ecology vol 43 no 1 pp 1ndash11 2003

[55] R PDarveau ldquoPeriodontitis a polymicrobial disruption of hosthomeostasisrdquoNature ReviewsMicrobiology vol 8 no 7 pp 481ndash490 2010

[56] J M Albandar L J Brown and H Loe ldquoPutative PeriodontalPathogens in Subgingival Plaque of Young Adults with andWithout Early-Onset Periodontitisrdquo Journal of Periodontologyvol 68 no 10 pp 973ndash981 1997

[57] Y Takeuchi M Umeda M Sakamoto Y Benno Y Huang andI Ishikawa ldquoTreponema socranskii Treponema denticola andPorphyromonas gingivalis are associated with severity of peri-odontal tissue destructionrdquo Journal of Periodontology vol 72no 10 pp 1354ndash1363 2001

[58] J J Zambon ldquoActinobacillus actinomycetemcomitans in hu-man periodontal diseaserdquo Journal of Clinical Periodontologyvol 12 no 1 pp 1ndash20 1985

[59] J Slots and M Ting ldquoActinobacillus actinomycetemcomitansand Porphyromonas gingivalis in human periodontal diseaseOccurrence and treatmentrdquo Periodontology 2000 vol 20 no 1pp 82ndash121 1999

[60] BKChoi B J Paster F EDewhirst andUBGobel ldquoDiversityof cultivable and uncultivable oral spirochetes from a patientwith severe destructive periodontitisrdquo Infection and Immunityvol 62 no 5 pp 1889ndash1895 1994

[61] B Liu L L Faller N Klitgord et al ldquoDeep sequencing of theoral microbiome reveals signatures of periodontal diseaserdquoPLoS ONE vol 7 no 6 Article ID e37919 2012

[62] J Wang J Qi H Zhao et al ldquoMetagenomic sequencingreveals microbiota and its functional potential associated withperiodontal diseaserdquo Scientific Reports vol 3 2013

[63] A L Griffen C J Beall J H Campbell et al ldquoDistinct and com-plex bacterial profiles in human periodontitis and healthrevealed by 16S pyrosequencingrdquo e ISME Journal vol 6 no6 pp 1176ndash1185 2012

[64] M A Masalma F Armougom W Michael Scheld et al ldquoTheexpansion of the microbiological spectrum of brain abscessesWith use of multiple 16S ribosomal DNA Sequencingrdquo ClinicalInfectious Diseases vol 48 no 9 pp 1169ndash1178 2009

[65] MAlMasalmaM LonjonHRichet et al ldquoMetagenomic anal-ysis of brain abscesses identifies specific bacterial associationsrdquoClinical Infectious Diseases vol 54 no 2 pp 202ndash210 2012

[66] I Nasidze J Li DQuinque K Tang andM Stoneking ldquoGlobaldiversity in the human salivary microbiomerdquoGenome Researchvol 19 no 4 pp 636ndash643 2009

[67] F V Wintzingerode U B Gobel and E Stackebrandt ldquoDeter-mination of microbial diversity in environmental samples Pit-falls of PCR-based rRNA analysisrdquo FEMSMicrobiology Reviewsvol 21 no 3 pp 213ndash229 1997

[68] M Tezal F A Scannapieco J Wactawski-Wende S G Grossiand R J Genco ldquoSupragingival plaque may modify the effectsof subgingival bacteria on attachment lossrdquo Journal of Periodon-tology vol 77 no 5 pp 808ndash813 2006

[69] R Caruana and A Niculescu-Mizil ldquoAn empirical comparisonof supervised learning algorithmsrdquo in Proceedings of the ICML2006 23rd International Conference on Machine Learning pp161ndash168 USA June 2006

[70] R Caruana N Karampatziakis and A Yessenalina ldquoAn empir-ical evaluation of supervised learning in high dimensionsrdquo inProceedings of the 25th International Conference on MachineLearning pp 96ndash103 Finland July 2008

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom

Page 15: Composition Analysis and Feature Selection of the Oral

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Journal of Parasitology Research

GenomicsInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Neuroscience Journal

Hindawiwwwhindawicom Volume 2018

BioMed Research International

Cell BiologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Genetics Research International

Hindawiwwwhindawicom Volume 2018

Advances in

Virolog y Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

International Journal of

MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom