Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
A m e t ho dology for on tology r e u s e : t h e c a s e of t h e a b do min al
ul t r a so u n d on tologyZulka r n ain, NZ a n d M ezia n e, F
h t t p://dx.doi.o rg/1 0.40 1 8/IJIIT.201 9 1 0 0 1 0 1
Tit l e A m e t hodology for on tology r e u s e : t h e c a s e of t h e a b do min al ul t r a sou n d on tology
Aut h or s Zulka r n ain, NZ a n d M ezia n e, F
Typ e Article
U RL This ve r sion is available a t : h t t p://usir.s alfor d. ac.uk/id/e p rin t/51 4 3 2/
P u bl i s h e d D a t e 2 0 1 9
U SIR is a digi t al collec tion of t h e r e s e a r c h ou t p u t of t h e U nive r si ty of S alford. Whe r e copyrigh t p e r mi t s, full t ex t m a t e ri al h eld in t h e r e posi to ry is m a d e fre ely availabl e online a n d c a n b e r e a d , dow nloa d e d a n d copied for no n-co m m e rcial p riva t e s t u dy o r r e s e a r c h p u r pos e s . Ple a s e c h e ck t h e m a n u sc rip t for a ny fu r t h e r copyrig h t r e s t ric tions.
For m o r e info r m a tion, including ou r policy a n d s u b mission p roc e d u r e , ple a s econ t ac t t h e Re posi to ry Tea m a t : u si r@s alford. ac.uk .
DOI: 10.4018/IJIIT.2019100101
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
Copyright©2019,IGIGlobal.CopyingordistributinginprintorelectronicformswithoutwrittenpermissionofIGIGlobalisprohibited.
1
A Methodology for Ontology Reuse: The Case of the Abdominal Ultrasound OntologyNur Zareen Zulkarnain, Centre for Advanced Computing Technology, Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
Farid Meziane, University of Salford, Manchester, United Kingdom
https://orcid.org/0000-0001-9811-6914
ABSTRACT
There is an abundance of existing biomedical ontologies such as the National Cancer InstituteThesaurusand theSystematizedNomenclatureofMedicine-ClinicalTerms. Implementing theseontologiesinaparticularsystemhowever,maycauseunnecessaryhighusageofmemoryandslowsdownthesystems’performance.Ontheotherhand,buildinganewontologyfromscratchwillrequireadditionaltimeandefforts.Therefore,thisresearchexplorestheontologyreuseapproachinordertodevelop anAbdominalUltrasoundOntologyby extracting concepts fromexistingbiomedicalontologies.Thisarticlepresentsthereaderwithastepbystepmethodinreusingontologiestogetherwithsuggestionsoftheoff-the-shelftoolsthatcanbeusedtoeasetheprocess.Theresultsshowthatontologyreuseisbeneficialespeciallyinthebiomedicalfieldasitallowsfordevelopersfromthenon-technicalbackgroundtobuildandusedomainspecificontologywithease.Italsoallowsfordeveloperswithtechnicalbackgroundtodevelopontologieswithminimalinvolvementsfromdomainexperts.
KeywORdSBiomedical Ontology, Bioportal, Ontology Reuse, Ontology
INTROdUCTION
Inpreviouswork,Zulkarnainetal.(2015a)proposedthedevelopmentofanarchitecturetosupportultrasoundreportgenerationandstandardisation.Theproposedarchitectureandreportwerevalidatedbyradiologistsandspecialistsatthe2015UKRadiologicalCongressheldintheCityofLiverpool(Zulkarnainetal.,2015b)andthe2016BritishMedicalUltrasoundAnnualScientificMeetingandExhibitionheldintheCityofYork(Zulkarnainetal.,2016a).Inthemedicalandradiographyfields,ultrasoundreportsarethemainmediausedtocommunicatetheresultsofanultrasoundexaminationfromasonographerorradiologisttoareferringclinician.Itwasreportedthatimagesaloneareoflimitedvaluesincetheoutcomesofanyultrasoundinvestigationarebasedonthefindingsduringthescan(Boland,2007).Indeed,manyfeaturesandquantitativedataarecollectedduringtheultrasoundexaminationsuchastissuecharacterisationandvariousmeasurementsanditisthisinformationthatiscommunicatedviathereports.
Theuseofinformationtechnology(IT)inthemedicalfieldsuchaselectronicpatients’recordsand somedecision support systemshasallowed for abetterunderstandingof somepathologies,healthmanagementandpatientcare.However,theintegrationofITintheradiologyfieldislimitedparticularlyinthereportingphase.Duringthedevelopmentofthestandardreportanditsvalidation,itwashighlightedbytheradiologistsandcliniciansthatthemainissueisthevariationsinthereporting
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
2
styles.Thesevariationswerenoticedinthestructureofthereportsaswellasintheterminologiesused.Thesevariationsmayimpactonthewayareportisinterpretedandinturnaffectthedecision-makingprocessandthewayapatientismanaged(Zulkarnainetal.,2015a).Radiologistsandcliniciansbelievethatthesolutiontothisproblemresidesinusingstructuredreportingwiththesupportofanontologyasitsknowledgebase(Kahn,etal.,2009).Theuseofanontologywillallowthestandardisationoftheterminologyusedduringreporting,allowingabetterexploitationofthesereportsbycomputerisedtoolsforknowledgediscovery,classificationandpredictions.
Theworkreportedinthispaper,isthedevelopmentofanontologythatwillcomplementandsupportthecomputerisedstandardreportdevelopedbyZulkarnainetal.,(2015a).Furthermore,thispaperisanextensionandconsolidationoftheworksreportedbyZulkarnainetal.(2015a,2015b,2016a,2016b).
Therearedifferentapproachestodevelopontologies.Wecanuseexistingontologies,developone from scratch or adapt and reuse existing ontologies. Each approach has its advantages anddisadvantages.Forexample,inonehandreusinganexistingontologywillrequirenoresourcesforitsdevelopmentbutmaybetoolargeforaspecificapplicationanditsintegrationtotherestofthesystemcouldbeproblematic.Ontheotherhand,developinganewonewillrequirealotofeffortsduringitsdevelopmentbutmayfitbettertotherequirementsofthenewapplication.Inthisresearch,wearespecificallyinterestedinabdominalultrasoundreporting,andtheexistingontologiesevaluatedwerefoundtobelarge.Forexample,usingtheNationalCancerInstituteThesaurus(NCIT)wouldrequirealargestorage,asitcontainsasmanyas118,941classes,andmoretimetoprocess.InthisresearchweadoptedtheontologyreuseapproachfordevelopingtheAbdominalUltrasoundOntology(AUO)tobeusedintheultrasoundreportingsystem.
Thispaperfirstreviewsthemethodsthathavebeenusedinpastontologyreuseworksandthenassesses thepossibilityofreusingoneof thethreeestablishedbiomedicalontologiesnamelytheFoundational Model of Anatomy (FMA), the Radiology Lexicon (RadLex) or the SystematizedNomenclatureofMedicine-ClinicalTerms(SNOMEDCT).Wethenproposeamethodologytoreusebiomedicalontologiestogetherwiththeexistingtoolsthatcanbeusedtofacilitatethereuseprocess.Themethodologyaidsontologydevelopersin:(i)selectingsuitableontologiesforreusebasedontheircorpus;(ii)selectingtheconceptstoreusefromtheselectedontologies;(iii)evaluatingthedevelopedontologywithminimalhelpfromthedomainexperts.ItisanticipatedthatthedevelopmentoftheAUOwillservetwomainpurposesinthestandardisationoftheultrasoundreportingsystem:(i)itwillbeusedtostandardizethedevelopmentofultrasoundreportsandenforcetheuseofastandardterminologyand(ii)toanalyseultrasoundreportswritteninNaturalLanguage(Englishfree-text)withtheaimofautomaticallytransformingthemintoastructuredformat.
Theremainingofthepaperisorganisedasfollows.Section2reviewssomeofthepreviousworksonontologyreuseanddiscussestheirsuitabilitytobeusedinthecurrentresearch.Insection2,wereviewsomeoftheexistingbiomedicalontologiesandhowwelltheycoverthetermsandlexiconusedinultrasoundreporting.Theproposedreusemethodologywillbedescribedindetailinsection3.WediscusstheresultsofusingtheproposedmethodologyindevelopingtheAUOinsection4andconcludethepaperinsection5.
ReLATed wORK
Ontologyreuseistheprocesswherepartsofexistingontologiesareusedinthedevelopmentofnewones(Bontas,2005).Thereusedpartsareveryoftenmanipulatedtomeettherequirementsofanewapplication(KatsumiandGruninger,2016).Suchanapproachreducesthedevelopmentcostofanontologyastheneedfordomainexpertsisminimal(Alanietal.,2006)andincreasesinteroperability(Simperl,2009)asthenewandoldontologieswillsharefeaturesandconcepts.However,existingtoolsarenotprovidingthenecessarysupportfortheontologyreuseprocess(Maedcheetal,2003;Simperl,2009).Mostontologyreusemethodologiesproposed(Alani,2006;Caldarolaetal.,2015;
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
3
Capellades,1999;Fernández-Lópezetal.,2013;Russetal.,1999;Shahetal.,2013;Simperl,2009,Uscholdetal.,1998)followthefollowingfoursteps:(i)Ontologyselectionforreuse,(ii)Conceptselection,(iii)Conceptcustomizationand(iv)Ontologyintegration.
Thepurposeofthefirststep,ontologyreuse,istoselecttheontologytobereused.Theselectionisbasedonasetofcriteriaaccording to the requirementsof thenewontologyand includes thelanguageusedinthedevelopmentoftheontology,itsreasoningcapabilitiesandhowwellitcoverstheterminologyusedinthespecificapplicationdomain.Wenotethatinthisphase,ifrequiredbytheneedsofthenewontology,severalontologiescanbeselectedforreuse.Theselectionstepisfollowedbytheconceptsselectionphasethatarethentranslatedandcustomisedtosuittheneedsofthenewontologyintheconceptcustomizationstep.Thefinalstepinvolvestheintegrationofthenewlydevelopedontologyintothenewsystemorapplication.
IntheontologyreusesystemproposedbyAlani(2006),allthenewtermsneededforthenewontology are first listed to determine which existing ontology should be selected for reuse. Thesystemthensearchesforrelevantontologiesonlineusingthetermsthathavebeenlistedandfromtheobtainedresult,theontologieswillthenberankedandonlythefirstfewwillbeanalysedandselectedforreuse.Oncetheontologyforreusehasbeenselected,thesystemwilldeterminewhethertoreusethewholeontologyoronlyasegmentofit.Thisstepcouldbeconsideredasconceptselectionwhererelevantconceptsarebeingselected.Inhiswork,Alani(2006)hasdecidedtoreuseseveralontologiesforoneconcept.Thus,eachgroupofrelatedconceptsneedstobecustomizedtoensurethattheconceptshaveastandardisedformatsothattheycanbemergedasoneconcept.Sincetheconceptisreusedfromseveralontologies,eachconceptcontainsdifferentpropertieswhichresultedinadditionalknowledgerepresentation.Finally,theontologyisautomaticallyevaluated.
AnotherexampleofontologyreuseisinthedevelopmentoftheOral-SystemicHealthCross-DomainOntology(OSHCO)byShahetal.(2013).IndevelopingOSCHO,thefirststeptakenbyShahetal.wastodeterminethescopeoftheontologybyrecognizingthedomainofcoverage,theintendeduseandwhatquestionsshouldOSCHObeabletoanswer.Oncethescopehasbeendetermined,Shahetal.usedatoolinBioportal1andsubmittedseveraldomainrelatedtermstotestthedomaincoverageofseveralontologiesthathavethepotentialtobereused.ThishasresultedinSNOMEDCTbeingthebestcandidate.ComparedtoAlani,Shahetal.reusedonlyoneontology,SNOMEDCTwheretheyselected theconceptsneeded thenaddedother relevantconceptsnot included inSNOMEDCT.Sincetheyreusedonlyoneontology,thenewontologydeveloped,OSCHO,closelyfollowsthemodelofSNOMEDCTwherethenewconceptsaddedarecustomizedtofollowthesamemodel.
Russetal.(1999)intheirworkaimtodevelopanaircraftontologythatcanbeusedbyseveralapplicationstoensureknowledgesharingbetweenthem.Duringthedevelopmentoftheirwork,theyrealizedthatthereexisttwoontologiesthatarerelatedtotheaircraftdomain.
Furthermore,someconceptsexistinoneontologybutnotintheother.Thus,Russetal.decidedtomergebothontologiestocreateamorecompleteone.ThefirststeptakenbyRussetal.isselectingtheontologies to reusewhich are the timeontology (which is publicly available) aswell as thetwoexistingaircraftontologies.ThetimeontologyiswritteninOntolinguawhilethetwoaircraftontologieswerewritteninLoom.Afterbothaircraftontologiesweremerged,thetimeontologyistranslatedtoLoomsothatitcouldbeintegratedintotheaircraftontology.ThesameapproachwasusedbyCaldarolaetal.(2015)indevelopinganontologyinthefooddomainwheremetadataweremanuallytranslatedtobetterunderstandtheconcepts.
Polionto(Ortiz,2011)isanotherexampleofanontologydevelopedusingontologyreuse.OrtizinhisworkdevelopedPoliontobyreusingtwoontologiesinthepoliticaldomainwherethefirstontologyisinPortugueseandtheotherisinEnglish.Hefirstselectsrelevantconceptsfrombothontologiestoreuseandcomparesthemtotheircorpus.ThoseselectedconceptsarethentranslatedandintegratedtocreateamultilingualpoliticalontologycalledPolionto.
Comprehendingtheneedforatoolthatwillassisttheprocessofontologyreuse,BontasandhercolleaguedevelopedatoolnamedPROMIthatisabletoperformthestepsrequiredinanontology
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
4
reuseprocess(Bontas,2007)asseeninFigure1.PROMIbeginsbypromptingtheuserstouploadatleasttwoontologiestobereused.Then,thelanguageoftheontologieswillbeexaminedinordertodecidewhetherthetransformationtoOWLlanguageshouldbeperformed.Next,thematchingoftheconceptswasconductedwherePROMIfirstseparatesandnormalisestheconceptnamesbeforecalculating thestringandconceptsimilarity.Finally, theconceptswillbemerged,andusersareallowedtoincludemoreconcepts,propertiesandaxiomsasnecessary.
ThedevelopmentofPROMIisanimmensestepforwardinassistingtheprocessofontologyreuse.However, thereare several limitations inusingPROMIasa tool toassistontology reuse.First,PROMIdoesnotprovidethefacilitytoassessthesuitabilityofanontologytobereusedforaspecificdomain.Instead,itbeginsbypromptingtheuserstouploadontologieswhichtheyhaveselectedbeforehand.Italsodoesnotrecommendanyontologiesforreusebasedonthedomainwhichmeansthattheuserswillneedtousetheirownexpertiseinselectingsuitableontologies.Second,theconceptmatchingbetweenthetwoontologiesdependsheavilyonthesimilaritymeasureschosenbytheusers.Forexample,thesimilaritymeasurefortheword“tournament”and“competition”was0usingtheEuclideanDistancemeasureand0.143usingtheHammingDistancemeasure.Therefore,theusersneedtohavesomeknowledgeaboutthedifferencesbetweenthesemeasuresforthemtobeabletoselectthemostappropriateone.Finally,inordertomergeconceptsinPROMI,userswillneedtoselectfromalistofconceptsanditsequivalentcandidateconceptswhichhaveacertaindegreeofsimilarity.ThisisshowninFigure2.PROMIcouldbeabeneficialtoolinassistingtheprocessofontologyreuse.However,therearestillseveralfeaturesthatcanbeincludedandimprovedsothatitcaneasetheontologyprocessevenmoreandallowstheusageonlargeontologiestobemoreefficient.
Fromtheexamplesmentionedabove,itcanbeseenthattheontologyreusemethodologiesusedinpreviousworksisroughlysimilartothefourstepsmentionedabove.Inthispaper,thesestepswerealsousedasaguidelineindevelopinganontologyreusemethodologyforthebiomedicaldomain.Themethodologypresentedinthispaperwillallowforthedevelopmentofanewontologybyreusingmultipleexistingontologiesandsuggesttoolsthatwouldhelpineachstepofthemethodology.Thedifferencebetweenourapproachand theexistingones is the incorporationofoff-the-shelf toolstoaidintherequiredtasks.Themainmotivationinproposingthismethodologyistoallownovice
Figure 1. The user interface of PROMI
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
5
developersespeciallyfromthenon-technicalbackgroundtocreateanduseontologyintheirfieldbyalleviatingthenotoriouslypainstakingtaskofdevelopingadomainspecificontologyfromscratch.Our proposed methodology would also allow for ontology to be developed with only minimalinvolvementofdomainexperts.
Review of existing Biomedical OntologiesManyontologiesaredevelopedinthebiomedicalfieldasithasalargenumberofterminologiesandconcepts.Inbuildingamedicalultrasoundreportingsystem,thefirststepwouldbetoinvestigateifthereisanexistingontologythatcanbeused,orifthereareontologiesthatcanbereusedinsteadofbuildinganewonefromscratch.Asafirststep,weneedtoreviewrelatedsuitableexistingontologies.Aftercarefulresearch,threeontologieshavebeeninitiallyselectednamely:theFoundationalModelofAnatomy(FMA),theSystematizedNomenclatureofMedicine-ClinicalTerms(SNOMEDCT)andtheRadiologyLexicon(RadLex).
Figure 2. The process of merging concepts in PROMI
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
6
Thisinitialselectionwaspurelybasedontheircoverage,languageandpopularityinthebiomedicalcommunity.Inselectingoneontologytobeadoptedwefirstlookatthedomainofeachontology.FMAcoverstheconceptsandrelationshipinthestructuralorganizationofthehumanbodyfromthemacrocellulartomicroscopiclevels(RosseandMejino,2003)whileSNOMEDCTcoversmoreandincludesclinicalfindings,chemicalsubstancescalesandothermiscellaneoushealthinformation(Benson,2010).RadLexontheotherhandincorporatesmanycomplexradiologyrelateddomainsfrombasicsciencetoimagingtechnology(Rubin,2008).ThisshowsthatbothFMAandSNOMEDCTcoverawiderareacomparedtoRadLex.However,RadLexwouldbemoredomainspecificinrelationtodevelopinganontologythatcoverstheabdominalultrasoundprocess.
Toselectthebestontologytobereused,weadoptedthemetricsprovidedinBioPortalasshowninTable1tocomparethethreeontologiesidentified.Fromhere,wecanseethatallthreeontologiesaretoolargetobeimplementedforonespecificsystemasthiscouldresultinslowcomputingtimeandtakingupalotofmemoryspace.Thus,wehavedecidedtoreuseonlytherelevantconceptsfromallthreeontologiesandmergethemintoanewonewiththeaimofachievingawidercoverageandlessresourcesconsumption.
The Proposed MethodologyThemostimportantpartinreusingontologiesistoselectthemostsuitableones.Fromtheinitialreview,FMA,SNOMEDCTandRadLexhavebeenidentifiedaspotentialontologiestobereused.However, theseontologieswillneed tobeevaluatedbycomparing themwithourcorpusof100medicalultrasoundreportstoensurethattheseontologiesaretherightonesforreuseandprovidetherightcoverage.
Inthissection,thedifferentstepsofthedevelopedontologyreusemethodologywillbedescribedindetail.Thetermsofthecorpusconstructedfromtheultrasoundreportswillbeusedforselectingsuitableontologies.Thesetermsarethenusedtoselecttherelevantconceptsfromtheseontologiesandmergethemintoasingleonebeforebeingevaluatedbydomainexperts.Figure3summarisesthemethodologysteps.
Term extractionThescopeanddomainoftheAbdominalUltrasoundOntology(AUO)istomodelthetaxonomy,pathology, equipment and other terms related to abdominal ultrasound reporting. 100 sampleabdominalultrasoundreportshavebeencollectedandusedtoconstructourcorpus.Thesereportshaveanaveragewordcountof70.82,averagenumberofwordspersentenceof11.15andaveragetokensizeof6.45characters.ThesesamplereportswereobtainedfromtheDirectorateofRadiologyoftheUniversityofSalford.Oncethecorpusisconstructed,thenextstepistoextractalltherelevanttermstogeneratealistofbiomedicalandtechnicaltermstobeincludedintheontology.Twobiomedicalextractionapplicationswereusedduringtheextractionprocessnamely:TerMine2andBioTex3todefineasubsetofthemostsuitabletermsforstandardreportingsystems.49outofthe100samplereportsweresubmittedtobothapplicationsandtheresultsobtainedaresummarisedinTable2.
Table 1. Comparison between FMA, SNOMED CT and RadLex
FMA SNOMED CT RadLex
Numberofclasses 104,258 324,129 46,140
Numberofindividuals 0 0 46,140
Numberofproperties 172 152 96
Format OWL/OBO OWL OWL
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
7
The BioTex application extracted more terms (761 terms) compared to only 241 terms forTerMine.ThesuperiorityofBioTexisduetoitsabilitytoextractbothmulti-wordsandsingle-wordsasTerMineonlyextractsmulti-words.Hence,BioTexwaschosenasthetermextractionapplicationtobeused.Forexample,ifthesentence“Normalliverwithnofocallesionsseen”wassubmittedtobothapplications,TerMinewillonlyextractonemulti-wordtermwhichis“focallesion”whileBioTexwillextractnotonlythemulti-wordtermbutalso“liver”whichisasinglewordterm.Ifsingle-wordtermssuchas“liver”,“kidney”and“spleen”arenotextracted,theontologydevelopedwouldbeincomplete.TermswhichareextractedfromBioTexwerealsovalidatedusingtheUnifiedMedicalLanguageSystem(UMLS)whichisasetofdocumentscontaininghealthandbiomedicalvocabulariesandstandards.UsingBioTexwemanagedtoextract1119termsfromthe100sampleultrasoundreports.
Ontology RecommendationOncethelistoftermsisobtained,thenextstepistoselecttheontology.Threecriteriaweresetfortheselectionoftheontologyandthesecanbesummarisedasfollows:
Figure 3. Ontology reuse methodology
Table 2. Comparison of biomedical term extraction using TerMine and BioTex
TerMine BioTex
Language English English
License Open Open
POSTagger GENIATagger/TreeTagger TreeTagger
TermsFound 241(GENIATagger) 761
232(TreeTagger)
ExtractionType Multi-wordextraction Multi-wordextractionand
Single-wordextraction
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
8
(1) Ontology coverage-Towhichextenddoestheontologycoversthetermsextractedfromthecorpus?
(2) Ontology acceptance-Istheontologybeingacceptedinthebiomedicalfieldandhowoftenisitused?
(3) Ontology language-IstheontologywritteninOWL,OBOorotherontologyformat?Howwidelyistheformatbeingusedintheontologycommunity?
Ontologycoveragehasthehighestweightagewhendeterminingwhetheranontologyissuitableforreuseornot.Anontologythatcontainsmostoftheconceptsneededwillpreservethemodeloftheontology.Thismodelwillthenbefollowedbyconceptstakenfromotherontologies.Thenextimportantcriteriaistheacceptanceoftheontologywithinthebiomedicalcommunityasthelevelofacceptanceindirectlyshowsthequalityoftheontology(McDanieletal.,2016).Thehighertheacceptance score, the more the ontology is being used thus promoting interoperability betweendifferentsystems.Finally,theformatusedtodeveloptheontologyisalsoanimportantcriteriontoavoidtranslatingtheontologytoanotherformat.
Usingthethreedefinedcriteria,FMA,SNOMEDCTandRadLexwereinitiallyselectedasthemostsuitablecandidatesforreuse.Asafurtherstep,wehaveusedtheontologyrecommendersystemprovidedbyBioPortalwhichisanopenontologylibrarythatcontainsontologieswithdomainsthatrangefromanatomy,phenotypeandchemistrytoexperimentalconditions(Noyetal.,2009).ThereareseveralframeworksthatcanbeusedtoselectsuitableontologiesforreusesuchastheoneproposedbyTrokanasandCecelja(2016)whichusessimilaritymeasurestocalculatethecompatibilityofanontologyforreuse(candidateontology)andtheontologytheywouldliketoexpand(primaryontology).However,wedecidedtousetherecommenderprovidedbyBioPortalasitisspecialisedforthebiomedicaldomainandallthreeontologiestobeinvestigatedareavailableonBioPortal.
Onceauserprovidesalistofterms,theontologyrecommenderavailableonBioPortalusesthesetermstosuggestthemostsuitableontologiesforreuse.Therecommenderusesfourcriterianamely:(i)Coverage,(ii)Acceptance,(iii)Detailand(iv)Specialization.BasedonBioPortalanalysis,thesearethecriteriathatarethemostrelevantforrecommendingontologiesforreuse(Jonquetetal.,2010).Thetermscanbesubmittedasaparagraphoralistoftermstotherecommenderandasaresult,itwillreturnalistof25recommendedontologiesrankedfromthehighesttothelowestscores(seeFigure4).TherankingofalltheontologiesavailableintheBioPortalrepositorythatmeetsallthecriteriaiscomputedbygivingscorestofourmetricsnamelycoverage,acceptance,knowledgedetailandspecialization.Thefinalscoreiscalculatedbasedonthefollowingformula:
FinalScore=(CoverageScore*0.55)+(AcceptanceScore*0.15)+(KnowledgeDetailScore*0.15)+(SpecializationScore*0.15)
Thecoveragescoreisgivenbasedonthenumberoftermsintheinputthatarecoveredbytheontology.Itisgiventhehighestweightagewhichis0.55comparedtotheotherthreemetricswhichareallgivenaweightageof0.15becauseontologycoverageisseenasanimportantfactorindeterminingthesuitabilityofreusinganontologyforacertaincorpus.Theacceptancescoreindicateshowwell-knownandtrustedtheontologyisinthebiomedicalfield.ItisgivenbasedonthetotalofwebsitevisitstotheontologyaswellaswhetherornottheontologyexistsinUMLS.Knowledgedetailscoreontheotherhandindicatesthelevelofdetailsintheontology;i.e.doestheontologyhavedefinitions,synonymsorotherdetails.Thisisalsoanimportantmetricindeterminingthequalityofanontology.Thisisbecauseanontologythatonlyhasahierarchyoftermsandcontainsnootherdetailssuchasdefinitionswouldmerelybeseenmoreasataxonomyratherthananontology.Thus,itwouldhavelesspurposeasaknowledgebaseinasystemorapplication.Lastly,thespecializationscoreisgivenbasedonhowwelltheontologycoversthedomainoftheinput.Thisisdifferentcomparedtothecoveragescorebecausethesizeoftheontologyisalsogivenattentionsincethespecialization
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
9
scoreranksdomainspecificontologieshighercomparedtogeneralontologies.Inthisresearch,thespecializationscoreisnotasimportantasthecoveragescoresincetheaimistoachievethehighestcoverageeventhoughwewouldneedtoreusemorethanoneontology.AnexampleofthisscoringsysteminactionisshowninFigure4where21termsextractedfromasampleultrasoundreportweresubmitted.
ThereishoweveralimitationwhenusingBioPortalasitallowssubmissionofonly500words.Toprocessallthe1119termsthatwereextractedfromourreports,wehavetocustomizetheexistingrecommendersystemtocomeupwithanothersystembymanipulatingthedatafromBioPortal’sontology recommender API (BioPortal, n.d.). We first developed the recommender system thatwouldsubmitall1119termstoBioPortal’sAPIandgivearecommendationof25ontologiesrankedaccordingtoitsfinalscorejustlikehowitwouldbeinBioPortal’srecommender.However,itseemsthatthe1119termsweretoobigfortherecommender’sservertohandleandcausestherecommendertocrashmidwaywithoutgivinganyresult.
Toovercomethis,wedecidedtodeveloptherecommendersystemthatwillallowall1119termstobesubmittedbutwillprocessthetermsonebyoneinstead.EachtermwillbesubmittedtotheAPItogetalistofontologyrecommendationsandtheirscores.Theontologywiththehighestscorewillbeselectedastheontologyrecommendedfortheindividualterm.Thefrequencyofallontologiesrecommendedwillthenbecounted.Theontologywiththehighestfrequencywillbereusedfirst,followedbythesecondandthefollowingontologyuntilweachievetheoptimumcoverage.Figure5summarizestheusedprocess.
Figure6showsanexcerptof theresultfromprocessing1119termsusingtherecommendersystemwehavedeveloped.Therecommendersystemallowsustosubmitquiteahugenumberoftermstobeprocessedandtheresultwillthenbestoredforanalysis.Thisisseenasabetteralternativebecauseitisimpossibletostraightawaysubmitall1119termsandgetaresultsincetheAPIserverwouldnotbeabletohandlesuchahugenumber.Figure6(a)showsanexcerptofallthelistoftermssubmittedandtheontologyrecommendedforeachtermbasedonthefinalscoretheygot.ThisscoreiscalculatedbyBioPortalusingthesameformulathattheyusedfortheirrecommender.
Figure 4. BioPortal’s ontology recommender
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
10
After all terms have been submitted to the recommender, the frequency of the ontologyrecommendedforeachtermwillbecountedandsortedfromhighesttolowest.TherecommendersystemhasrankedNCITastheontologywiththehighestfrequency–whichhavebeenrecommendedfor476terms;followedbySNOMEDCT(207)andRadLex(48)asshowninFigure6(b).ThishasproventhatourinitialselectionofreusingFMA,SNOMEDCTandRadLexwasnotveryaccuratesinceFMAdoesnotevenappearinthetop20oftheontologiesrecommendedforthiscorpus.ThereasonforthiscouldbebecauseFMAisverylargeandbroadsoitlosesalotofpointsinthespecializationscore.Itsbroaddomainalsomeansthatitcoversonlythegeneraltermsavailableinthecorpuscausingittoalsoloosepointsinthecoveragescorehenceresultinginitnotbeingrecommendedforreuse.
Figure 5. Algorithm for getting ontology recommendations for term list
Figure 6. (a) Ontology recommendation for each term (b) Ranking of ontology recommended
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
11
Term to Concept MappingTerm(fromthecorpus) toconcept (fromtheontology)mapping is thenextstep inbuilding theAUOandthisisachievedbyusingtheresultacquiredfromtheBioPortal’sSearchAPI(BioPortal,n.d.).TheAPIallowsuserstoinsertseveralparameterstoperformconceptsearchfromaspecificontologyusingseveralparameters(“q”forexampleisusedforsearching).TheAPIwillthenreturnconceptsthatmatchthetermwithsomepropertiessuchasthepreferredlabel,definition,synonym,matchtypeandtheterms’relationshipwithitschildren,descendant,parentsandancestors.Ifseveralconceptswerereturned,theconceptthathastheclosestsemanticmeaningtothetermsubmittedwillbemanuallychosen.Earlierworks(Mejinoetal.,2008;Shahetal.,2014)haveperformedthistaskbygoingthroughalltheconceptsinanexistingontologyanddeletingirrelevantconcepts.However,usingBioPortalprovideduswithamorerigorousapproach.Figure7illustratesanexampleoftheresultsreturnedwhenthetermpancreaswassearched.
TherewasinitiallyanattempttoautomaticallygeneratetheontologyusingProtègè(theOWLeditorthatwasusedindevelopingtheontology)bysimplydownloadingtheconceptsandrelationshipsreturnedbytheAPIinXMLformat.However,thiswasnotperformedatthisstagefortwomainreasons.Thefirst,isthatthedatareturnedbytheAPIdoesnotprovidethecompletepropertiesofaconcept.Someofthepropertieswereprovidedaslinkswherebyitneedstobevisitedfirstbeforewecanaccesstheconcept.Figure7showsascreenshotoftheresultreturnedbytheBioPortalAPIwhentheterm“pancreas”wassearched.Asseeninthefigure,thechildren,parentanddescendantsoftheconceptwasreturnedasalink.Ifthesedatawereusedtoautomaticallygeneratetheontology,itwillnotbemeaningful.Thesecondreasonistheissueofpolysemywheretermscanhavemanydifferentmeaningsandhumaninterventionisneededtoselecttherightmeaningdependingonthecontext.
Figure8isusedasaguidelinetodecidewhetheratermshouldbereusedornot.WeselectatermfromthetermlistandusingtheSearchAPI,querytheontologywiththehighestfrequency(NCITinthiscase).Ifamatchisfound,wecheckifthematchisapreferredlabel(PrefLabel)(theconceptfoundistheexactmatchoftheterm),synonym(thetermisfoundasasynonymoftheconcept)orpartialmatch(thereisnoexactmatchforthetermbutthereareatleasttwoconceptsthatmatchtheterm).IfthematchisaPrefLabelorsynonymmatch,theconceptwillbereused.Ifthematchispartial,theconceptsthatmakeupthetermwillalsobereusedbutitwillremaininthetermlisttobecomparedtotheconceptsofotherontologies.
Onceaconceptisselectedforreuse,thealgorithmsearchesifithasparentsorancestors.Doranetal.(2007)intheirworksuggestthatinordertomaintainthemodularityofanontology,aconcept
Figure 7. The result returned by BioPortal API when the term “pancreas” was searched
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
12
shouldbeextractedtogetherwithitssubclassesorchildreninsteadofparentsandancestors.Theyargue that immediateparentsandancestorsareunimportantandextracting themwould increasetheriskofcreatinganontologythat isequal to theontologybeingreused.However,webelievethatparentsandancestorsareimportantinconnectingconceptssothattheywouldnotbefloating.Ifweweretotake“spleen”anditssubclassasonemodule,“kidney”anditssubclassasanothermodule,aswellas“millimeter”anditssubclassasanothermodule,itwouldbehardtogroupthesemodulesunderthesamecategory.Furthermore,theontologybeingdevelopedisveryspecifictotheabdominalultrasounddomain.Thus,reusingparentsandancestorsofaconceptreducestheriskoftheontologybeingaslargeastheoriginalone.Oncealltermshavebeensearched,thisprocesswillthenberepeatedfortheremainingrecommendedontologieswhichareSNOMEDCTandRadLex.
Awalkthroughexample:Consideralistoftermsthatcontainsthreewordswhichare“gallbladder”,“ductdilation”and“gallstone”.Wefirsttakethefirstword“gallbladder”andqueryiftheconceptexistinNCIT.ThisreturnsanexactmatchwherethereexistsaconceptinNCITwiththepreferredlabel“gallbladder.”Thus,thisconcepttogetherwithallitsknowledgedetailwillbereused.Wethenlooktoseeiftheconcepthasanyparentsorancestors.“Gallbladder”hasaparent“organ”andanancestor“anatomicstructure,system,orsubstance”whichbothwillbereused.Since“gallbladder”hasanexactmatchinNCIT,itisremovedfromthetermlist.
Wethenquerythesecondword“ductdilation”inNCITwhichreturnsapartialmatchwhichconsistoftheword“duct”andanotherword“dilation”.Bothconceptswillbereusedtogetherwiththeirknowledgedetailsaswellastheirparentsandancestors.However,differentto“gallbladder”,“ductdilation”wouldnotberemovedfromthetermlistsinceitisonlyapartialmatch.Thefinalwordinthelist,“gallstone”isthenqueriedwhichgivesasynonymmatchtotheconcept“gallbladderstone”inNCIT.
“Gallbladderstone”togetherwithallitsknowledgedetailincludingsynonymswillbereused.Allitsparentsandancestorswillalsobereused,andthetermwillberemovedfromthetermlist.OncealltermshavebeenqueriedinNCIT,wethenseeifthereareanytermsleftinthetermlist.Thisgiveustheterm“ductdilation”whichisqueriedtoseewhetherthereisanymatchwiththesecondontologyrecommendedwhichisSNOMEDCT.Thisreturnsasynonymmatchwiththeconcept“dilationofduct”.Thus,theconcept“dilationofduct”isreusedandthetermtoconceptmappingprocessisfinallycomplete.
KatsumiandGruninger(2016)intheirworkdiscussedontologymodellinginontologyreusewhere theystressed that foranontology tobeconsideredasbeing reused fromanotherexistingontology,itshouldhaveatleastasmallfragmentoftheexistingontology.Indeterminingwhichmodelanontologyshouldfollow,KatsumiandGruningerstatesthatitdependsonthestrengthoftheexistingontology.Iftheexistingontologyisweakerthanthenewontologythatisbeingdeveloped,thenthenewontologymayonlymapsomepartoftheexistingontology’smodel.However,iftheexistingontologyisstrongerthanthenewontologybeingdeveloped,thenthenewontologymayfollowthemodeloftheexistingontology.
Inthecaseofseveralontologybeingreused,KatsumiandGruningersuggestedthatsomepartofthenewontologyshouldfollowsomepartoftheexistingontologybeingreused.Theyhowever,failtodefinewhatarethecriteriathatclassifiesanontologyasstrongerorweakercomparedtotheother ontology. In this research, threeontologieswere reusednamelyNCIT,SNOMEDCTandRadLex.SinceNCIThasthehighestfrequencybyfar(478),comparedtotheothertwo(207and48respectively),weassumedthatNCITisstrongerthanSNOMEDCTandRadLex.ThishaspromptedusinfollowingthemodellingofNCITinthedevelopmentoftheAbdominalUltrasoundOntology(AUO).
WhenmergingconceptsreusedfromSNOMEDCTandRadLexintotheontologiesreusedfromNCIT,wewouldfirstfindsuitableparentsfortheconcept.Ifnosuchparentexists,theparentandancestorsoftheconceptwillthenbereusedaccordingtothemodellingofNCIT.Ifnomatchisfoundinanyofthesethreeontologies,anewconceptwillthenbecreatedwiththehelpofdomainexperts.
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
13
ThenewconceptscreatedwillbecarefullyintegratedintoAUOandwillcontainthesamelevelofknowledgedetailsasotherconceptsinAUO.Expertisefromthedomainexpertsareneededatthisleveltogivethecorrectdefinitionandaddrelevantsynonymstothesenewconcepts.AllconceptsinAUOwereannotatedwiththeiroriginalontologysothatitiseasiertomakeanyreferenceinthefutureifnecessary.Figure9showsasnapshotoftheAbdominalUltrasoundOntologyasdisplayedinProtégé.
Ontology evaluationThedevelopedAUOwasvalidatedbydomainexpertsmainlytocheckthattherelationshipsbetweenconceptsandtheirdefinitionsarecorrect.Duringtheevaluationphase,thedomainexpertstogetherwiththeontologydevelopers,wentthroughthewholeontology.Someinconsistencieswerehighlightedbutoverallalltheexpertswerehappywiththeinformationgeneratedandstoredintheontology.With regards to vocabulary coverage, the experts believe that 92.7% ontology coverage is goodenoughtocoveralltheimportantconceptsneededforanabdominalultrasoundreportingsystem.Fortheremaining7.3%ofthetermsthathadnomatchintheontology,someofthemwerecausedbyhumanerrorsuchasspellingmistakesmadebythereporters.Therearealsoseveraltermsthatthedomainexpertbelievescouldbeomittedasthesewordsshouldnotbeinanultrasoundreportforgoodpractice.Examplesofsuchwordsare“comettail”,“NAD”,and“hepatopetal”.Thesewordsmightbeunderstoodbytheradiologistbutmightmakenosensetoothers(Edwardsetal.,2014).Themainobjectiveofusingthisontologyreusemethodologyistoachieveasmuchcoverageaspossibleandreducetheneedfordomainexpertsindevelopingtheontology.Ifanontologyweretobebuiltfrom scratch, domain expertswill beneeded from thevery first step indesigning theontology.However,withontologyreuse,domainexpertswereonlyneededattheendoftheprocesstoverifythateverythingiscorrectandtoassistinaddingfewnewconceptsintotheontology.
Figure 8. Term to Concept mapping guide
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
14
ReSULT ANd dISCUSSION
Theevaluationandtestingofthemethodologyweredividedintotwophases.Phase1triestoprovethatontologyreuseprovidesawidercoveragecomparedtousingonlyoneexistingontology.Thetestingwasperformedusingonly49sampleultrasoundreportwhichwasalsousedasthetrainingdataindevelopingAUO.Phase2looksathowwellAUOperformwhenbeingevaluatedwithexistingandnewsampleultrasoundreports.Italsolooksathowdifficultitistoincludeadditionalconceptswhentherearenewrequirements.
Phase 1: 49 Sample Ultrasound ReportsThecoverageofAUOwasfirsttestedonasetof49sampleultrasoundreportswithatotalof761termsincomparisonwithtwoexistingontologieswhichareNCITandSNOMEDCT.Atermtoconceptmatchingwasperformedusingthe761termsextractedfromthesampleultrasoundreportcorpusandtheresultareshowninFigure10.ThedevelopedAUOgavethehighestnumberofconceptmatchcomparedtoreusingonlyoneontology.BetweenNCITandSNOMEDCT,NCIThasthehigherconceptmatchwithatotalof151PrefLabelmatches,79synonymsmatchesand438partialmatches.SNOMEDCTontheotherhandhasonly98PrefLabelmatches,104synonymsmatchesand431partialmatches.ThereasonSNOMEDCThaslowerPrefLabelmatchescomparedtosynonymsisbecauseofitsnamingconvention.Forexample,thepreferredlabelfor“kidney”is“kidneystructure”and“entiregallbladder”for“gallbladder”.Whenwritingreports,radiologistoftenusesimplerwordslike“kidney”and“gallbladder”insteadof“kidneystructure”and“entiregallbladder”thus,whentermtoconceptmatchingwasperformed,SNOMEDCTreturnedmoresynonymmatchesthenPrefLabel.
ComparedtoNCITandSNOMEDCT,AUOreturnsthehighesttotalmatcheswith176PrefLabelmatches,111synonymmatchesand418partialmatches.ThereasonAUOreturnsthemostnumberofmatchesisbecausetheontologyreusemethodologyselectsthebestmatchesfromdifferentontologiesandmergethemintotheAUO.ItsexhaustivemappinginseveralontologiesbasedontheontologyrankhasensuredthatalmostalltermsinthecorpuswerecoveredbyAUO.Wheneverpossible,aPrefLabelmatchwillbeinsertedintheontology.Ifnot,asynonymmatchwillbeaddedthenonlypartialmatchesareincludedtoensuretheontologyhasawidecoverageofthecorpus.
Asexpected,andconfirmedbytheresults,itisbettertoreuseseveralontologiesthenusingasingleoneasthisoffersabettercoverage.Figure11showsthepercentageoftotalmatchandnomatchinallthreeontologies.Ifontologyreusewasdonebymappingthe761termsagainstNCIT,there
Figure 9. Snapshot of the Abdominal Ultrasound Ontology as displayed in Protégé
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
15
willonlybean87.8%ofcoverage.IfthemappingweredoneagainstSNOMEDCT,thepercentageofcoveragewouldbeonly83.2%whichislowerthanNCIT.However,thepercentageofcoverageincreasesto92.6%whenseveralontologieswerereused;whichinthiscaseareNCIT,SNOMEDCT,andRadLex.Thepercentageofnomatchisalsoverysmall(7.4%)whichmeansthattheAUOcoversalmostallthetermsinthecorpus.Thereasonthereisstill7.4%ofnomatchisbecausethereareseveraltermsinthecorpusthatthedomainexpertsbelievearepoorusageoftermstodescribefindingsinanultrasoundreport.ThedomainexpertbelievesthatthisisbadpracticeandthemedicalultrasoundexpertsarenowslowlycuttingdowntheusageofsuchwordsthusmakingitirrelevanttobeintheAUO.Anotherreasonforthe7.4%ofnomatchisspellingerrorsmadebyultrasoundreporters.Thisisnotaconcernfornowbutforfuturework,wecouldconsiderusingtheontologytoalsocorrectandunderstandtheseerrors.
NCIThasatotalof118,941classeswhileSNOMEDCThas324,129classes.However,thereareonly668and633matchesrespectivelyforeachNCITandSNOMEDCTregardingabdominalultrasoundterminology.Ontheotherhand,AUOhasonly509classeswhichislessthan0.5%ofeitherNCITorSNOMEDCTbutstillmanagedtohave705matcheswhichismorethanthematchesNCITandSNOMEDCTeachgets.Thisisbecauseofthespecializationoftheontology.Sincetheontologyhasanintendedpurposeinanapplication,itismuchbetterandmoreefficienttobuildadomainspecificontologythroughreuse.ItdefinitelywouldnotbeefficienttostorealargeontologysuchasNCITandSNOMEDCTanduseonlylessthan0.6%ofit.Thisisbecauseitwouldtakealotofstoragespaceanditwillalsoslowdowntheapplicationsincetheapplicationwillneedtogothroughthewholeontologytofindamatch.Thus,thebetterwaytodevelopanontology-basedapplicationistobuildanewdomainspecificontologythroughontologyreusemethodology.
Figure 10. Breakdown of total match according to type against NCIT, SNOMED CT and Abdominal Ultrasound Ontology (AUO)
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
16
Phase 2: 100 Sample Ultrasound ReportsPhase1ofthetestingandevaluationprocessoftheontologyreusemethodologyhasproventhatontologyreuseprovidesawidercoveragecomparedtousingonlyoneexistingontology.Phase2looksathowwellAUOperformsinrelationtothetermtoconceptmappingwhenbeingevaluatedusing100samplereports;where49ofthesamplereportswereusedfortrainingpurposeandtheother51wereanewsampleultrasoundreportswhichwerenotusedindevelopingAUO.Phase2alsolooksathowmuchworkisneededtoupdatetheontologytoincludeadditionalconceptsbasedontherequirementsofthenewcorpus.
Termsextractionwhichwasperformedonthe100samplereportsreturned1119termswhichis358morethanthetermsextractedinphase1.Thenumberofmatchesforthe761termshasalreadybeenfoundoutinPhase1.Inordertofindoutthenumberofmatchestherestofthetermsgot,wecheckedwhethertheremaining358termsexistinAUO.Figure12showsthepercentageofeachtypeofmatchesaswellasnomatchforall358termswhencomparedwithAUO.
Overall,thereisaquitehighpercentageoftotalmatchwhichis72.1%where237outofthe358termswerefoundaspartialmatch(66.2%),15assynonymmatch(4.2%)and6asprefLabelmatch(1.7%).ThisshowsthattheAUOisadequateenoughtocovermosttermsinthebiomedicaldomainsincethetotalofnomatchisquitesmallwhichis27.9%.Inabdominalultrasoundreporting,thecontentofthereportsusuallycontainswordswhicharemoreorlesssimilarinsemanticeventhoughdifferent termswereusedespecially in thecaseofnormalultrasoundreports.Only inabnormalultrasoundreportswecouldfindahighervarietyofwordsbeingused.WiththeexistenceofAUOasaknowledgebase,theusageofdifferentwordsinwritingultrasoundreportswouldnothavematteredbecausethesemanticofthewordsaremoreimportantthanthewordsthemselvesandbecauseofthesimilarityofthecontents,asmallnumberofthecorpususedastrainingdatastillgivesquiteagoodnumberoftotalmatchpercentage.
Aftercomparingthe358termswithAUO,wefoundthatthereare100termswithoutamatchinAUO.Agoodmethodologyshouldbeabletoallowforittoadapttoanychangeswhenrequired.Thus,inordertoreducethepercentageofnomatch,theontologyreuseprocesswasdoneagainonall358termstoincludenewconceptsintoAUO.Firstly,weneedtocheckwhethertheadditionof51newsamplereportschangestheontologyrecommendedforreuse.Fortunately,NCITstillcomesupasthebestontologytoreusefollowedbySNOMEDCTandRadLex.Next,all358termswerethencomparedwithNCITsothattermtoconceptmappingcouldbeperformed.Oncetheprocesswascompleted,itwasthenrepeatedwithSNOMEDCTandRadLex.Aftertheadditionofthenewconceptsandsynonymssuchas“adnexalmasses”,“incidentalfinding”and“morphology”toAUO,thetotalnumberoftermswithoutamatchwasreducedfrom100to26,whichisa74%reduction.
Figure 11. Percentage of total match and no match in NCIT, SNOMED CT and AUO
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
17
Figure13showsthepercentageoftotalmatchofallthreetypesofmatchesaswellasthepercentageofnomatchforthe1119termsafternewconceptshavebeenaddedtoAUO.
The totalmatchpercentageafter theadditionofnewconceptshas increased from72.1% to92.7%where692outofthe1119termswerefoundaspartialmatch(61.8%),139assynonymmatch(12.4%)and206asprefLabelmatch(18.4%).Beforetheadditionofthenewconcepts,thepercentageofpartialmatchwashigherat66.2%comparedtothecurrentpercentageof61.8%.ThisnumberwasreducedafterthetermtoconceptmappingwasdonebecausesomeofthepartialmatcheswerefoundaseitherasynonymorprefLabelmatchinNCIT,SNOMEDCTorRadLex.Forexample,theword“rightovary”wasfirstlabelledaspartialmatchsinceAUOdoesnothaveaconceptcalled“rightovary”butithastwoconceptswhichare“right”and“ovary”.However,afterperformingtermtoconceptmappingagain,“rightovary”wasfoundasaprefLabelmatchinNCITthustheconceptwasaddedtoAUO.ThiscausesthepercentageofsynonymandprefLabelmatchtoalsoincreasefrom4.2%and1.7%to12.4%and18.4%respectively.Figure14showsthecomparisonofthepercentageofprefLabel,synonym,partialandnomatchforphase1,phase2(beforetheextension)andphase2(aftertheextension).
Phase2ofthetestingandevaluationofAUOshowsthatthenumberofsampleultrasoundreportsusedas testingdatawasadequateinensuringthatAUOwouldbeable tocovermostabdominalultrasoundsamplereports.Thisalsoprovesthatthedomainexpert’sassumptionthatthe92.7%totalmatchesofthefirstversionofAUOisindeedenoughtocovermostabdominalultrasoundsamplereports.Italsoshowsthattheontologyreusemethodologyproposedinthisresearchisadaptablesinceitisfairlyeasyfornewconceptstobeaddedaccordingtonewrequirementsfromadditionalsamplereports.
CONCLUSION
Ontologyreusecanbebeneficialindevelopingdomainspecificontologiesforapplicationsystemwherebyitreducesdevelopmenttimeandredundancy.Thelackofpropermethodologyandtoolsin reusing ontology has hindered this effort. Thus, this paper proposed a methodology to reuseontologytogetherwithsupportingtoolsthatwouldmaketheontologyreuseprocessmucheasier.
Figure 12. Percentage of total prefLabel, synonym, partial and no match for 358 new terms when compared to AUO
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
18
ThedevelopmentofAUOusingthismethodologyhasproventhatontologyreuseisbeneficialindevelopingasmalldomainspecificontologywhichhaswidecoverageoftheterminologyusedintheapplicationsystemcomparedtousingalargegeneraldomainontology.Italsoprovesthatthemethodologyisadaptableasitallowsforchangestobemadeeasilywhentherearenewrequirementsfromthecorpus.Itishopedthattheproposedontologyreusemethodologywouldencouragemoreusageofontology inmedicalsystemwithout thedevelopmentofsimilardomainontologies thatwouldcauseredundancy.
Figure 13. Percentage of total prefLabel, synonym, partial and no match for all 1119 terms when compared to AUO
Figure 14. Comparison of the percentage of prefLabel, synonym, partial and no match for phase 1, phase 2 (before the extension) and phase 2 (after the extension)
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
19
RefeReNCeS
Alani,H.(2006).Positionpaper:Ontologyconstructionfromonlineontologies.InProceedings of the 15th International Conference on World Wide Web, WWW ’06,Edinburgh,Scotland(pp.491–495).NewYork,NY:ACM.doi:10.1145/1135777.1135849
Alani,H.,Brewster,C.,&Shadbolt,N.(2006).RankingOntologieswithAKTiveRank,pages1–15.SpringerBerlinHeidelberg,Berlin,Heidelberg.doi:10.1007/11926078_1
Benson,T.(2010).Principles of Health Interoperability HL7 and SNOMED.London:Springer.
BioPortal(n.d.).APIDocumentation.http://data.bioontology.org/documentation
Boland, G. (2007). Enhancing the radiology product: The value of voice-recognition technology. Clinical Radiology,62(11),1127.doi:10.1016/j.crad.2007.05.014PMID:17920876
Bontas,E.P.(2007).A contextual approach to ontology reuse.PhDthesis,FreieUniversitätBerlin.
Bontas,E.P.,Mochol,M.,&Tolksdorf,R.(2005).Casestudiesonontologyreuse.InProceedings of the 5th International Conference on Knowledge Management, IKNOW05.AcademicPress.
Caldarola,E.G.,Picariello,A.,&Rinaldi,A.M.(2015).Anapproachtoontologyintegrationforontologyreuseinknowledgebaseddigitalecosystems.InProceedings of the 7th International Conference on Management of Computational and Collective intelligence in Digital EcoSystems, MEDES ’15,Caraguatatuba,Brazil(pp.1-8).NewYork,NY:ACM.doi:10.1145/2857218.2857219
Capellades,M.A. (1999).Assessmentof reusabilityof ontologies: a practical example. InProceedings of AAAI1999 Workshop on Ontology Management(pp.74–79).AAAIPress.
Doran, P., Tamma, V., & Iannone, L. (2007). Ontology module extraction for ontology reuse: Anontology engineering perspective. In Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, CIKM ’07,Lisbon,Portugal (pp.61-70).NewYork,NY:ACM.doi:10.1145/1321440.1321451
Edwards,H.,Smith,J.,&Weston,M.(2014).Whatmakesagoodultrasoundreport?Ultrasound,22(1),57–60.doi:10.1177/1742271X13515216PMID:27433194
Fernández-López, M., Gómez-Pérez, A., & Suárez-Figueroa, M. C. (2013). Methodological guidelines forreusinggeneralontologies.Data & Knowledge Engineering,86,242–275.doi:10.1016/j.datak.2013.03.006
Jonquet,C.,Musen,M.A.,&Shah,N.H.(2010).Buildingabiomedicalontologyrecommenderwebservice.Journal of Biomedical Semantics,1(S1),S1.doi:10.1186/2041-1480-1-S1-S1PMID:20626921
Kahn,C.E.Jr,Langlotz,C.P.,Burnside,E.S.,Carrino,J.A.,Channin,D.S.,Hovsepian,D.M.,&Rubin,D.L.(2009).Towardbestpracticesinradiologyreporting.Radiology,252(3),852–856.doi:10.1148/radiol.2523081992PMID:19717755
Katsumi,M.,&Gruninger,M.(2016).Whatisontologyreuse?InFerrario,R.andKuhn,W.,(Eds.),Formal Ontology in Information Systems:Proceedings of the 9th International Conference, FOIS 2016(pp.9-22).IOSPress.
Lossio-Ventura, J. A., Jonquet, C., Roche, M., & Teisseire, M. (2014). Biotex: A system for biomedicalterminologyextraction,ranking,andvalidation.InProceedings of the 2014 International Conference on Posters & Demonstrations TrackISWC-PD’14,RivadelGarda,Italy(pp.157–160).CEUR-WS.org.
Maedche,A.,Motik,B.,Stojanovic,L.,Studer,R.,&Volz,R.(2003).Aninfrastructureforsearching,reusingandevolvingdistributedontologies.InProceedings of the 12th International Conference on World Wide Web, WWW ’03,Budapest,Hungary(pp.439–448).NewYork,NY:ACM.doi:10.1145/775152.775215
Martínez-Romero,M.,Jonquet,C.,Oâ˘,A.́ .Z.,Connor,M.J.,Graybeal,J.,Pazos,A.,&Musen,M.A.(2017).Ncboontologyrecommender2.0:Anenhancedapproachforbiomedicalontologyrecommendation.Journal of Biomedical Semantics,8(1),21.doi:10.1186/s13326-017-0128-yPMID:28592275
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
20
McDaniel,M.,Storey,V.C.,&Sugumaran,V.(2016).Theroleofcommunityacceptanceinassessingontologyquality.InE.Métais,F.Meziane,M.Saraeeetal.(Eds.),Natural Language Processing and Information Systems:21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016,Salford,UK,June22-24(pp.24-36).SpringerInternationalPublishing.doi:10.1007/978-3-319-41754-7_3
Mejino,J.L.Jr,Rubin,D.L.,&Brinkley,J.F.(2008).FMARadLex:Anapplicationontologyofradiologicalanatomyderived from the foundationalmodelofanatomyreferenceontology.AMIA ... Annual Symposium Proceedings - AMIA Symposium.AMIA.PMID:18999035
Noy,N.F.,Shah,N.H.,Whetzel,P.L.,Dai,B.,Dorf,M.,Griffith,N.,&Musen,M.A.et al.(2009).Bioportal:Ontologiesandintegrateddataresourcesattheclickofamouse.Nucleic Acids Research,37,W170–W173.doi:10.1093/nar/gkp440PMID:19483092
Ortiz,A.(2011).Polionto:Ontologyreusewithautomatictextextractionfrompoliticaldocuments.InProceedings of the 6th Doctoral Symposium in Informatics Engineering(pp.309–320).AcademicPress.
Rosse,C.,&Mejino,J.L.V.Jr.(2003).Areferenceontologyforbiomedicalinformatics:TheFoundationalModelofAnatomy.Journal of Biomedical Informatics,36(6),478–500.doi:10.1016/j.jbi.2003.11.007PMID:14759820
Rubin,D.L.(2008).Creatingandcuratingaterminologyforradiology:Ontologymodelingandanalysis.Journal of Digital Imaging,21(4),355–362.doi:10.1007/s10278-007-9073-0PMID:17874267
Russ,T.,Valente,A.,MacGregor,R.,&Swartout,W.(1999).Practicalexperiencesintradingoffontologyusability and reusability. In Proceedings of the 12th Workshop on Knowledge Acquisition, Modeling and Management, KAW’99(pp.16–21).AcademicPress.
Shah,T.,Rabhi,F.,&Ray,P.(2013).OSHCO:Acrossdomainontologyforsemanticinteroperabilityacrossmedical and oral health domains. In Proceedings of the IEEE 15th International Conference on e-Health Networking, Applications Services, HEALTHCOM’13, Lisbon, Portugal,October9-12(pp.460–464).IEEE.doi:10.1109/HealthCom.2013.6720720
Shah,T.,Rabhi,F.,Ray,P.,&Taylor,K.(2014).Aguidingframeworkforontologyreuseinthebiomedicaldomain.InProceedings of the47th Hawaii International Conference on System Sciences, HICSS 2014(pp.2878–2887).IEEE.doi:10.1109/HICSS.2014.360
Simperl,E.(2009).Reusingontologiesonthesemanticweb:Afeasibilitystudy.Data & Knowledge Engineering,68(10),905–925.doi:10.1016/j.datak.2009.02.002
Trokanas,N.,&Cecelja,F.(2016).Ontologyevaluationforreuseinthedomainofprocesssystemsengineering.Computers & Chemical Engineering,85,177–187.doi:10.1016/j.compchemeng.2015.12.003
Uschold,M.,Clark,P.,Healy,M.,Williamson,K.,&Woods,S.(1998).Anexperimentinontologyreuse.InProceedings of the 11th Knowledge Acquisition Workshop, KAW’98.AcademicPress.
Zulkarnain,N.Z.,Crofts,G.,&Meziane,F.(2015a).Anarchitecturetosupportultrasoundreportgenerationandstandardisation.InProceedings of the 8th International Conference on Health Informatics, HEALTHINF 2015,Lisbon,Portugal(pp.508–513).SCITEPRESS.doi:10.5220/0005252505080513
Zulkarnain,N.Z.,Crofts,G.,&Meziane,F.(2015b).Aproposedmodelforthestandardisationofultrasoundreport.Poster presented at the UK Radiological Congress,ACCLiverpool,UK.AcademicPress.
Zulkarnain,N.Z.,Crofts,G.,&Meziane,F.(2016a).Automaticclassificationofabdominalultrasoundfindingsasnormal,abnormalor inconclusive.Poster presented at theBritish Medical Ultrasound Annual Scientific Meeting & Exhibition,York,UK.AcademicPress.
Zulkarnain,N.Z.,Meziane,F.,&Crofts,G.(2016b).Amethodologyforbiomedicalontologyreuse.InE.Métais,F.Meziane,M.Saraeeetal.(Eds.),Natural Language Processing and Information Systems:21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016,Salford,UK,June22-24(pp.3-14).SpringerInternationalPublishing.
International Journal of Intelligent Information TechnologiesVolume 15 • Issue 4 • October-December 2019
21
eNdNOTeS
1. http://bioportal.bioontology.org2. http://www.nactem.ac.uk/software/termine3. http://tubo.lirmm.fr/biotex/
Nur Zareen Zulkarnain received her BTech. (Hons) degree from Universiti Teknologi PETRONAS, Malaysia in 2011, an MSc. from The University of Manchester, UK in 2012 and a PhD. in Computer Science from the University of Salford, UK in 2017. Currently, she is a Senior Lecturer in the Department of Intelligent Computing & Analytics in Universiti Teknikal Malaysia Melaka where her teaching and research mainly involves artificial intelligence in the area of informatics and natural language processing.
Farid Meziane holds a chair in Data and Knowledge Engineering and a PhD in Computer Science from the University of Salford and is the head of the Informatics Research Centre. He is the co-Chair of the International Conference on Application of Natural Language to Information Systems (NLDB) and has served on the programme committees of over 20 conferences. He is on the editorial board of 5 international journals. His research interests are in the broad area of data and knowledge engineering. This includes data mining, information extraction and retrieval, Big Data, and the semantic web.