
A methodology for ontology reuse: the case of the abdominal ultrasound ontology

Zulkarnain, NZ and Meziane, F

http://dx.doi.org/10.4018/IJIIT.2019100101

Title: A methodology for ontology reuse: the case of the abdominal ultrasound ontology

Authors: Zulkarnain, NZ and Meziane, F

Type: Article

URL: This version is available at: http://usir.salford.ac.uk/id/eprint/51432/

Published Date: 2019

USIR is a digital collection of the research output of the University of Salford. Where copyright permits, full text material held in the repository is made freely available online and can be read, downloaded and copied for non-commercial private study or research purposes. Please check the manuscript for any further copyright restrictions.

For more information, including our policy and submission procedure, please contact the Repository Team at: usir@salford.ac.uk.


DOI: 10.4018/IJIIT.2019100101

International Journal of Intelligent Information Technologies
Volume 15 • Issue 4 • October-December 2019

Copyright © 2019, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


A Methodology for Ontology Reuse: The Case of the Abdominal Ultrasound Ontology

Nur Zareen Zulkarnain, Centre for Advanced Computing Technology, Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

Farid Meziane, University of Salford, Manchester, United Kingdom

https://orcid.org/0000-0001-9811-6914

ABSTRACT

There is an abundance of existing biomedical ontologies, such as the National Cancer Institute Thesaurus and the Systematized Nomenclature of Medicine-Clinical Terms. Implementing these ontologies in a particular system, however, may cause unnecessarily high memory usage and slow down the system’s performance. On the other hand, building a new ontology from scratch requires additional time and effort. Therefore, this research explores the ontology reuse approach in order to develop an Abdominal Ultrasound Ontology by extracting concepts from existing biomedical ontologies. This article presents the reader with a step-by-step method for reusing ontologies, together with suggestions of off-the-shelf tools that can be used to ease the process. The results show that ontology reuse is beneficial, especially in the biomedical field, as it allows developers from a non-technical background to build and use domain-specific ontologies with ease. It also allows developers with a technical background to develop ontologies with minimal involvement from domain experts.

KEYWORDS

Biomedical Ontology, BioPortal, Ontology Reuse, Ontology

INTRODUCTION

In previous work, Zulkarnain et al. (2015a) proposed the development of an architecture to support ultrasound report generation and standardisation. The proposed architecture and report were validated by radiologists and specialists at the 2015 UK Radiological Congress held in the City of Liverpool (Zulkarnain et al., 2015b) and the 2016 British Medical Ultrasound Annual Scientific Meeting and Exhibition held in the City of York (Zulkarnain et al., 2016a). In the medical and radiography fields, ultrasound reports are the main medium used to communicate the results of an ultrasound examination from a sonographer or radiologist to a referring clinician. It was reported that images alone are of limited value since the outcomes of any ultrasound investigation are based on the findings during the scan (Boland, 2007). Indeed, many features and quantitative data are collected during the ultrasound examination, such as tissue characterisation and various measurements, and it is this information that is communicated via the reports.

The use of information technology (IT) in the medical field, such as electronic patient records and some decision support systems, has allowed for a better understanding of some pathologies, health management and patient care. However, the integration of IT in the radiology field is limited, particularly in the reporting phase. During the development of the standard report and its validation, it was highlighted by the radiologists and clinicians that the main issue is the variation in reporting styles. These variations were noticed in the structure of the reports as well as in the terminologies used. They may affect the way a report is interpreted and, in turn, the decision-making process and the way a patient is managed (Zulkarnain et al., 2015a). Radiologists and clinicians believe that the solution to this problem resides in using structured reporting with the support of an ontology as its knowledge base (Kahn et al., 2009). The use of an ontology will allow the standardisation of the terminology used during reporting, allowing a better exploitation of these reports by computerised tools for knowledge discovery, classification and prediction.

The work reported in this paper is the development of an ontology that will complement and support the computerised standard report developed by Zulkarnain et al. (2015a). Furthermore, this paper is an extension and consolidation of the work reported by Zulkarnain et al. (2015a, 2015b, 2016a, 2016b).

There are different approaches to developing ontologies. We can use existing ontologies, develop one from scratch, or adapt and reuse existing ontologies. Each approach has its advantages and disadvantages. For example, on the one hand, reusing an existing ontology requires no resources for its development, but the ontology may be too large for a specific application and its integration with the rest of the system could be problematic. On the other hand, developing a new one requires a lot of effort but may better fit the requirements of the new application. In this research, we are specifically interested in abdominal ultrasound reporting, and the existing ontologies evaluated were found to be large. For example, using the National Cancer Institute Thesaurus (NCIT) would require a large amount of storage, as it contains as many as 118,941 classes, and more time to process. In this research we adopted the ontology reuse approach for developing the Abdominal Ultrasound Ontology (AUO) to be used in the ultrasound reporting system.

This paper first reviews the methods that have been used in past ontology reuse works and then assesses the possibility of reusing one of three established biomedical ontologies, namely the Foundational Model of Anatomy (FMA), the Radiology Lexicon (RadLex) and the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT). We then propose a methodology to reuse biomedical ontologies together with the existing tools that can be used to facilitate the reuse process. The methodology aids ontology developers in: (i) selecting suitable ontologies for reuse based on their corpus; (ii) selecting the concepts to reuse from the selected ontologies; (iii) evaluating the developed ontology with minimal help from the domain experts. It is anticipated that the development of the AUO will serve two main purposes in the standardisation of the ultrasound reporting system: (i) it will be used to standardise the development of ultrasound reports and enforce the use of a standard terminology and (ii) it will be used to analyse ultrasound reports written in natural language (English free text) with the aim of automatically transforming them into a structured format.

The remainder of the paper is organised as follows. Section 2 reviews some of the previous works on ontology reuse and discusses their suitability for the current research. The same section also reviews some of the existing biomedical ontologies and how well they cover the terms and lexicon used in ultrasound reporting. The proposed reuse methodology is described in detail in section 3. We discuss the results of using the proposed methodology in developing the AUO in section 4 and conclude the paper in section 5.

RELATED WORK

Ontology reuse is the process where parts of existing ontologies are used in the development of new ones (Bontas, 2005). The reused parts are very often manipulated to meet the requirements of a new application (Katsumi and Gruninger, 2016). Such an approach reduces the development cost of an ontology as the need for domain experts is minimal (Alani et al., 2006) and increases interoperability (Simperl, 2009) as the new and old ontologies will share features and concepts. However, existing tools do not provide the necessary support for the ontology reuse process (Maedche et al., 2003; Simperl, 2009). Most of the ontology reuse methodologies proposed (Alani, 2006; Caldarola et al., 2015; Capellades, 1999; Fernández-López et al., 2013; Russ et al., 1999; Shah et al., 2013; Simperl, 2009; Uschold et al., 1998) follow four steps: (i) ontology selection for reuse, (ii) concept selection, (iii) concept customization and (iv) ontology integration.

The purpose of the first step, ontology selection, is to select the ontology to be reused. The selection is based on a set of criteria derived from the requirements of the new ontology and includes the language used in the development of the ontology, its reasoning capabilities and how well it covers the terminology used in the specific application domain. We note that in this phase, if required by the needs of the new ontology, several ontologies can be selected for reuse. The selection step is followed by the concept selection phase; the selected concepts are then translated and customised to suit the needs of the new ontology in the concept customization step. The final step involves the integration of the newly developed ontology into the new system or application.

In the ontology reuse system proposed by Alani (2006), all the terms needed for the new ontology are first listed to determine which existing ontology should be selected for reuse. The system then searches for relevant ontologies online using the listed terms; the ontologies obtained are ranked and only the first few are analysed and selected for reuse. Once the ontology for reuse has been selected, the system determines whether to reuse the whole ontology or only a segment of it. This step can be considered as concept selection, where relevant concepts are selected. In his work, Alani (2006) decided to reuse several ontologies for one concept. Thus, each group of related concepts needs to be customized to ensure that the concepts have a standardised format so that they can be merged into one concept. Since the concept is reused from several ontologies, each concept contains different properties, which results in additional knowledge representation. Finally, the ontology is automatically evaluated.

Another example of ontology reuse is the development of the Oral-Systemic Health Cross-Domain Ontology (OSHCO) by Shah et al. (2013). In developing OSHCO, the first step taken by Shah et al. was to determine the scope of the ontology by identifying the domain of coverage, the intended use and the questions OSHCO should be able to answer. Once the scope had been determined, Shah et al. used a tool in BioPortal and submitted several domain-related terms to test the domain coverage of several ontologies that had the potential to be reused. This resulted in SNOMED CT being the best candidate. In contrast to Alani, Shah et al. reused only one ontology, SNOMED CT, from which they selected the concepts needed and then added other relevant concepts not included in SNOMED CT. Since they reused only one ontology, the new ontology, OSHCO, closely follows the model of SNOMED CT, and the new concepts added are customized to follow the same model.

Russ et al. (1999) aimed to develop an aircraft ontology that can be used by several applications to ensure knowledge sharing between them. During the development of their work, they realized that there exist two ontologies related to the aircraft domain. Furthermore, some concepts exist in one ontology but not in the other. Thus, Russ et al. decided to merge both ontologies to create a more complete one. The first step taken by Russ et al. was selecting the ontologies to reuse: a time ontology (which is publicly available) as well as the two existing aircraft ontologies. The time ontology was written in Ontolingua while the two aircraft ontologies were written in Loom. After both aircraft ontologies were merged, the time ontology was translated to Loom so that it could be integrated into the aircraft ontology. The same approach was used by Caldarola et al. (2015) in developing an ontology in the food domain, where metadata were manually translated to better understand the concepts.

Polionto (Ortiz, 2011) is another example of an ontology developed using ontology reuse. Ortiz developed Polionto by reusing two ontologies in the political domain, the first in Portuguese and the other in English. He first selected relevant concepts from both ontologies to reuse and compared them to their corpus. The selected concepts were then translated and integrated to create a multilingual political ontology called Polionto.

Comprehending the need for a tool that will assist the process of ontology reuse, Bontas and her colleague developed a tool named PROMI that is able to perform the steps required in an ontology reuse process (Bontas, 2007), as seen in Figure 1. PROMI begins by prompting the users to upload at least two ontologies to be reused. Then, the language of the ontologies is examined in order to decide whether a transformation to the OWL language should be performed. Next, the concepts are matched: PROMI first separates and normalises the concept names before calculating the string and concept similarity. Finally, the concepts are merged, and users are allowed to include more concepts, properties and axioms as necessary.

The development of PROMI is an immense step forward in assisting the process of ontology reuse. However, there are several limitations in using PROMI as a tool to assist ontology reuse. First, PROMI does not provide a facility to assess the suitability of an ontology to be reused for a specific domain. Instead, it begins by prompting the users to upload ontologies which they have selected beforehand. It also does not recommend any ontologies for reuse based on the domain, which means that the users need to use their own expertise in selecting suitable ontologies. Second, the concept matching between the two ontologies depends heavily on the similarity measures chosen by the users. For example, the similarity measure for the words “tournament” and “competition” was 0 using the Euclidean Distance measure and 0.143 using the Hamming Distance measure. Therefore, the users need to have some knowledge about the differences between these measures to be able to select the most appropriate one. Finally, in order to merge concepts in PROMI, users need to select from a list of concepts and their equivalent candidate concepts which have a certain degree of similarity. This is shown in Figure 2. PROMI could be a beneficial tool in assisting the process of ontology reuse. However, there are still several features that can be included and improved so that it eases the ontology reuse process even more and makes its use on large ontologies more efficient.
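The sensitivity of concept matching to the chosen measure can be illustrated with a minimal Python sketch. This is not PROMI's implementation: the function names and the two measures used here (a block-matching ratio from Python's standard `difflib` module and a Jaccard overlap of character bigrams) are our own choices for illustration, and the values they produce are not comparable to the figures quoted above.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case a concept name and replace separators with single spaces."""
    cleaned = name.replace("_", " ").replace("-", " ")
    return " ".join(cleaned.lower().split())


def ratio_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks (difflib)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the sets of character bigrams of the two names."""
    def bigrams(text: str) -> set:
        text = normalise(text)
        return {text[i:i + 2] for i in range(len(text) - 1)}

    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # two empty or one-character names: treat as identical
    return len(x & y) / len(x | y)


# The same pair of names generally receives a different score per measure.
for measure in (ratio_similarity, bigram_jaccard):
    print(measure.__name__, round(measure("tournament", "competition"), 3))
```

Normalising names before comparison means that spelling variants such as "Gall_Bladder" and "gall bladder" are treated as identical by both measures, while the score for genuinely different names still depends on which measure is selected.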

From the examples mentioned above, it can be seen that the ontology reuse methodologies used in previous works are roughly similar to the four steps mentioned above. In this paper, these steps were also used as a guideline in developing an ontology reuse methodology for the biomedical domain. The methodology presented in this paper allows the development of a new ontology by reusing multiple existing ontologies and suggests tools that help in each step of the methodology. The difference between our approach and the existing ones is the incorporation of off-the-shelf tools to aid in the required tasks. The main motivation in proposing this methodology is to allow novice developers, especially those from a non-technical background, to create and use ontologies in their field by alleviating the notoriously painstaking task of developing a domain-specific ontology from scratch. Our proposed methodology also allows an ontology to be developed with only minimal involvement of domain experts.

Figure 1. The user interface of PROMI

Review of existing Biomedical Ontologies

Many ontologies have been developed in the biomedical field as it has a large number of terminologies and concepts. In building a medical ultrasound reporting system, the first step is to investigate whether there is an existing ontology that can be used, or whether there are ontologies that can be reused instead of building a new one from scratch. We therefore began by reviewing suitable existing ontologies. After careful research, three ontologies were initially selected, namely the Foundational Model of Anatomy (FMA), the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and the Radiology Lexicon (RadLex).

Figure 2. The process of merging concepts in PROMI



This initial selection was based purely on their coverage, language and popularity in the biomedical community. In selecting one ontology to be adopted, we first look at the domain of each ontology. FMA covers the concepts and relationships in the structural organization of the human body from the macrocellular to microscopic levels (Rosse and Mejino, 2003), while SNOMED CT covers more and includes clinical findings, chemical substance scales and other miscellaneous health information (Benson, 2010). RadLex, on the other hand, incorporates many complex radiology-related domains from basic science to imaging technology (Rubin, 2008). This shows that both FMA and SNOMED CT cover a wider area compared to RadLex. However, RadLex would be more domain-specific in relation to developing an ontology that covers the abdominal ultrasound process.

To select the best ontology to be reused, we adopted the metrics provided in BioPortal, as shown in Table 1, to compare the three ontologies identified. From here, we can see that all three ontologies are too large to be implemented for one specific system, as this could result in slow computing time and high memory consumption. Thus, we decided to reuse only the relevant concepts from all three ontologies and merge them into a new one with the aim of achieving wider coverage and lower resource consumption.

The Proposed Methodology

The most important part of reusing ontologies is selecting the most suitable ones. From the initial review, FMA, SNOMED CT and RadLex were identified as potential ontologies to be reused. However, these ontologies need to be evaluated by comparing them with our corpus of 100 medical ultrasound reports to ensure that they are the right ones for reuse and provide the right coverage.

In this section, the different steps of the developed ontology reuse methodology are described in detail. The terms of the corpus constructed from the ultrasound reports are used for selecting suitable ontologies. These terms are then used to select the relevant concepts from these ontologies and merge them into a single one before it is evaluated by domain experts. Figure 3 summarises the methodology steps.

Term extraction

The scope and domain of the Abdominal Ultrasound Ontology (AUO) is to model the taxonomy, pathology, equipment and other terms related to abdominal ultrasound reporting. 100 sample abdominal ultrasound reports were collected and used to construct our corpus. These reports have an average word count of 70.82, an average of 11.15 words per sentence and an average token size of 6.45 characters. The sample reports were obtained from the Directorate of Radiology of the University of Salford. Once the corpus is constructed, the next step is to extract all the relevant terms to generate a list of biomedical and technical terms to be included in the ontology. Two biomedical term extraction applications, TerMine and BioTex, were used during the extraction process to define a subset of the most suitable terms for standard reporting systems. 49 out of the 100 sample reports were submitted to both applications and the results obtained are summarised in Table 2.
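Corpus statistics of the kind quoted above can be computed with a few lines of Python. The sketch below is an assumption about how such figures might be obtained, not the procedure used in this work: the function name, the regular expressions and the naive sentence splitting (which would, for instance, break on decimal points in measurements) are all illustrative choices.

```python
import re


def corpus_statistics(reports):
    """Return average words per report, words per sentence and token length."""
    total_words = total_sentences = total_chars = 0
    for text in reports:
        # Naive sentence split on ., ! and ?; empty fragments are ignored.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        # A token is a run of letters/digits, optionally joined by - or '.
        words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
        total_sentences += len(sentences)
        total_words += len(words)
        total_chars += sum(len(word) for word in words)
    return {
        "avg_words_per_report": total_words / len(reports),
        "avg_words_per_sentence": total_words / total_sentences,
        "avg_token_length": total_chars / total_words,
    }


# Two made-up report fragments, used only to exercise the function.
sample = ["Normal liver. No focal lesion seen.", "Spleen normal."]
print(corpus_statistics(sample))
```

A production pipeline would normally rely on a proper tokenizer and sentence splitter; the point here is only to make the three reported quantities concrete.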

Table 1. Comparison between FMA, SNOMED CT and RadLex

                        FMA        SNOMED CT   RadLex
Number of classes       104,258    324,129     46,140
Number of individuals   0          0           46,140
Number of properties    172        152         96
Format                  OWL/OBO    OWL         OWL



The BioTex application extracted more terms (761 terms) compared to only 241 terms for TerMine. The superiority of BioTex is due to its ability to extract both multi-word and single-word terms, whereas TerMine only extracts multi-word terms. Hence, BioTex was chosen as the term extraction application to be used. For example, if the sentence “Normal liver with no focal lesions seen” was submitted to both applications, TerMine would only extract one multi-word term, “focal lesion”, while BioTex would extract not only the multi-word term but also “liver”, which is a single-word term. If single-word terms such as “liver”, “kidney” and “spleen” were not extracted, the ontology developed would be incomplete. Terms extracted by BioTex were also validated using the Unified Medical Language System (UMLS), which is a set of documents containing health and biomedical vocabularies and standards. Using BioTex we managed to extract 1119 terms from the 100 sample ultrasound reports.

Figure 3. Ontology reuse methodology

Table 2. Comparison of biomedical term extraction using TerMine and BioTex

                  TerMine                                 BioTex
Language          English                                 English
License           Open                                    Open
POS tagger        GENIA Tagger / TreeTagger               TreeTagger
Terms found       241 (GENIA Tagger); 232 (TreeTagger)    761
Extraction type   Multi-word extraction                   Multi-word and single-word extraction

Ontology Recommendation

Once the list of terms is obtained, the next step is to select the ontology. Three criteria were set for the selection of the ontology and these can be summarised as follows:

(1) Ontology coverage - To what extent does the ontology cover the terms extracted from the corpus?

(2) Ontology acceptance - Is the ontology accepted in the biomedical field and how often is it used?

(3) Ontology language - Is the ontology written in OWL, OBO or another ontology format? How widely is the format used in the ontology community?

Ontology coverage has the highest weight when determining whether an ontology is suitable for reuse or not. The ontology that contains most of the concepts needed provides the model for the new ontology, and concepts taken from other ontologies then follow this model. The next important criterion is the acceptance of the ontology within the biomedical community, as the level of acceptance indirectly shows the quality of the ontology (McDaniel et al., 2016). The higher the acceptance score, the more the ontology is being used, thus promoting interoperability between different systems. Finally, the format used to develop the ontology is also an important criterion, as a widely used format avoids translating the ontology to another format.

Using the three defined criteria, FMA, SNOMED CT and RadLex were initially selected as the most suitable candidates for reuse. As a further step, we used the ontology recommender system provided by BioPortal, which is an open ontology library that contains ontologies with domains ranging from anatomy, phenotype and chemistry to experimental conditions (Noy et al., 2009). There are several frameworks that can be used to select suitable ontologies for reuse, such as the one proposed by Trokanas and Cecelja (2016), which uses similarity measures to calculate the compatibility of an ontology for reuse (candidate ontology) with the ontology to be expanded (primary ontology). However, we decided to use the recommender provided by BioPortal as it is specialised for the biomedical domain and all three ontologies to be investigated are available on BioPortal.

Once a user provides a list of terms, the ontology recommender available on BioPortal uses these terms to suggest the most suitable ontologies for reuse. The recommender uses four criteria, namely: (i) Coverage, (ii) Acceptance, (iii) Detail and (iv) Specialization. Based on BioPortal’s analysis, these are the criteria that are the most relevant for recommending ontologies for reuse (Jonquet et al., 2010). The terms can be submitted to the recommender as a paragraph or a list of terms and, as a result, it returns a list of 25 recommended ontologies ranked from the highest to the lowest scores (see Figure 4). The ranking of all the ontologies available in the BioPortal repository that meet all the criteria is computed by giving scores to four metrics, namely coverage, acceptance, knowledge detail and specialization. The final score is calculated using the following formula:

FinalScore = (CoverageScore * 0.55) + (AcceptanceScore * 0.15) + (KnowledgeDetailScore * 0.15) + (SpecializationScore * 0.15)
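The weighted sum above can be written as a short Python function. The weights are taken from the formula; the dictionary keys, the function name and the example scores are illustrative assumptions and do not correspond to identifiers or values returned by BioPortal.

```python
# Weights of the four criteria, as given in the formula.
WEIGHTS = {
    "coverage": 0.55,
    "acceptance": 0.15,
    "knowledge_detail": 0.15,
    "specialization": 0.15,
}


def final_score(scores: dict) -> float:
    """Weighted sum of the four criterion scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up criterion scores for a single candidate ontology.
example = {
    "coverage": 80.0,
    "acceptance": 60.0,
    "knowledge_detail": 40.0,
    "specialization": 20.0,
}
print(final_score(example))
```

Because coverage carries more than half of the total weight, an ontology that covers most input terms can outrank a more specialised one that covers fewer of them.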

The coverage score is based on the number of terms in the input that are covered by the ontology. It is given the highest weight, 0.55, compared to the other three metrics, which are each given a weight of 0.15, because ontology coverage is seen as an important factor in determining the suitability of reusing an ontology for a certain corpus. The acceptance score indicates how well known and trusted the ontology is in the biomedical field. It is based on the total number of website visits to the ontology as well as whether or not the ontology exists in UMLS. The knowledge detail score, on the other hand, indicates the level of detail in the ontology, i.e. whether the ontology has definitions, synonyms or other details. This is also an important metric in determining the quality of an ontology, because an ontology that only has a hierarchy of terms and contains no other details such as definitions would be seen more as a taxonomy than an ontology. Thus, it would have less purpose as a knowledge base in a system or application. Lastly, the specialization score is based on how well the ontology covers the domain of the input. This differs from the coverage score because the size of the ontology is also taken into account: the specialization score ranks domain-specific ontologies higher than general ontologies. In this research, the specialization score is not as important as the coverage score, since the aim is to achieve the highest coverage even if we need to reuse more than one ontology. An example of this scoring system in action is shown in Figure 4, where 21 terms extracted from a sample ultrasound report were submitted.

There is, however, a limitation when using BioPortal as it allows submission of only 500 words. To process all the 1119 terms that were extracted from our reports, we had to build our own recommender system on top of the data returned by BioPortal’s ontology recommender API (BioPortal, n.d.). We first developed a recommender system that would submit all 1119 terms to BioPortal’s API and return a recommendation of 25 ontologies ranked according to their final scores, just as BioPortal’s recommender does. However, the 1119 terms appeared to be too many for the recommender’s server to handle and caused the recommender to crash midway without giving any result.

To overcome this, we decided to develop a recommender system that allows all 1119 terms to be submitted but processes the terms one by one. Each term is submitted to the API to get a list of ontology recommendations and their scores. The ontology with the highest score is selected as the ontology recommended for the individual term. The frequency of all ontologies recommended is then counted. The ontology with the highest frequency is reused first, followed by the second and subsequent ontologies until we achieve the optimum coverage. Figure 5 summarizes the process used.
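The term-by-term procedure can be sketched in Python as follows. This is a minimal illustration rather than the system that was built: `rank_ontologies` and `FAKE_SCORES` are hypothetical names, the scores are made up, and the `recommend` argument stands in for whatever code calls the recommender API for one term and returns the final score of each ontology.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def rank_ontologies(
    terms: Iterable[str],
    recommend: Callable[[str], Optional[Dict[str, float]]],
) -> List[Tuple[str, int]]:
    """Count how often each ontology is the top recommendation for a term."""
    frequency = Counter()
    for term in terms:
        scores = recommend(term)      # ontology acronym -> final score
        if not scores:                # no recommendation for this term
            continue
        best = max(scores, key=scores.get)
        frequency[best] += 1
    return frequency.most_common()    # highest frequency first


# Stand-in for the API call, with made-up scores for three terms.
FAKE_SCORES = {
    "liver": {"NCIT": 61.0, "SNOMEDCT": 58.5},
    "focal lesion": {"NCIT": 47.2, "RADLEX": 52.9},
    "spleen": {"NCIT": 60.3, "SNOMEDCT": 59.1},
}

print(rank_ontologies(FAKE_SCORES, FAKE_SCORES.get))
```

Passing the lookup as a function keeps the counting logic independent of the network call, so the same code can be exercised offline with stored results.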

Figure 6 shows an excerpt of the result from processing the 1119 terms using the recommender system we developed. The recommender system allows us to submit a large number of terms to be processed, and the result is then stored for analysis. This is seen as a better alternative because it is impossible to submit all 1119 terms at once and get a result, since the API server would not be able to handle such a large number. Figure 6(a) shows an excerpt of the list of terms submitted and the ontology recommended for each term based on the final score obtained. This score is calculated by BioPortal using the same formula used for its recommender.

Figure 4. BioPortal’s ontology recommender



After all terms have been submitted to the recommender, the frequency of the ontology recommended for each term is counted and sorted from highest to lowest. The recommender system ranked NCIT as the ontology with the highest frequency, having been recommended for 476 terms, followed by SNOMED CT (207) and RadLex (48), as shown in Figure 6(b). This shows that our initial selection of FMA, SNOMED CT and RadLex for reuse was not very accurate, since FMA does not even appear in the top 20 of the ontologies recommended for this corpus. The reason for this could be that FMA is very large and broad, so it loses a lot of points in the specialization score. Its broad domain also means that it covers only the general terms available in the corpus, causing it to also lose points in the coverage score and hence resulting in it not being recommended for reuse.

Figure 5. Algorithm for getting ontology recommendations for term list

Figure 6. (a) Ontology recommendation for each term (b) Ranking of ontology recommended



Term to Concept Mapping

Term (from the corpus) to concept (from the ontology) mapping is the next step in building the AUO, and it is achieved by using the results acquired from BioPortal’s Search API (BioPortal, n.d.). The API allows users to supply several parameters to perform a concept search in a specific ontology (“q”, for example, is used for the search query). The API then returns concepts that match the term together with some properties such as the preferred label, definition, synonyms, match type and the concept’s relationships with its children, descendants, parents and ancestors. If several concepts were returned, the concept that has the closest semantic meaning to the term submitted was manually chosen. Earlier works (Mejino et al., 2008; Shah et al., 2014) performed this task by going through all the concepts in an existing ontology and deleting irrelevant concepts. However, using BioPortal provided us with a more rigorous approach. Figure 7 illustrates an example of the results returned when the term pancreas was searched.
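A search request of this kind might be assembled as in the sketch below. Only the “q” parameter is named in the text; the base URL and the `ontologies` and `apikey` parameter names reflect our understanding of the public BioPortal REST API and are assumptions that should be checked against its documentation. The function name and the placeholder key are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint of the BioPortal Search API (verify before use).
BASE_URL = "https://data.bioontology.org/search"


def build_search_url(term: str, ontology: str, api_key: str) -> str:
    """Build a concept-search URL for one term restricted to one ontology."""
    params = {
        "q": term,               # the search term, as described in the text
        "ontologies": ontology,  # assumed parameter restricting the search
        "apikey": api_key,       # assumed parameter carrying the user's key
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("pancreas", "NCIT", "YOUR_API_KEY"))
```

Building the URL separately from sending the request makes it easy to log or review the queries for a long term list before any of them reach the server.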

There was initially an attempt to automatically generate the ontology using Protégé (the OWL editor that was used in developing the ontology) by simply downloading the concepts and relationships returned by the API in XML format. However, this was not performed at this stage for two main reasons. The first is that the data returned by the API do not provide the complete properties of a concept. Some of the properties are provided as links which need to be visited before the related concepts can be accessed. Figure 7 shows a screenshot of the result returned by the BioPortal API when the term “pancreas” was searched. As seen in the figure, the children, parents and descendants of the concept were returned as links. If these data were used to automatically generate the ontology, the result would not be meaningful. The second reason is the issue of polysemy, where terms can have many different meanings and human intervention is needed to select the right meaning depending on the context.

Figure 8 is used as a guideline to decide whether a term should be reused or not. We select a term from the term list and, using the Search API, query the ontology with the highest frequency (NCIT in this case). If a match is found, we check whether the match is a preferred label (PrefLabel) match (the concept found is the exact match of the term), a synonym match (the term is found as a synonym of the concept) or a partial match (there is no exact match for the term but there are at least two concepts that match the term). If the match is a PrefLabel or synonym match, the concept is reused. If the match is partial, the concepts that make up the term are also reused, but the term remains in the term list to be compared to the concepts of other ontologies.
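The decision rule described in this paragraph can be condensed into a small Python function. It is a sketch of the rule as stated in the text, not a transcription of Figure 8; the constant names, the function name and the string values used for the match types are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for the three kinds of match described in the text.
PREF_LABEL, SYNONYM, PARTIAL = "prefLabel", "synonym", "partial"


def decide(match_type: Optional[str], concepts: List[str]) -> Tuple[List[str], bool]:
    """Return (concepts to reuse, whether the term stays in the term list)."""
    if match_type in (PREF_LABEL, SYNONYM):
        return concepts, False   # reuse the concept and drop the term
    if match_type == PARTIAL:
        return concepts, True    # reuse the parts, keep the term for later
    return [], True              # no match: try the next ontology


print(decide(PREF_LABEL, ["gallbladder"]))
print(decide(PARTIAL, ["duct", "dilation"]))
print(decide(None, []))
```

Keeping a partially matched term in the list is what allows a later ontology to supply a single concept for it, while the component concepts found so far are still retained.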

Once a concept is selected for reuse, the algorithm checks whether it has parents or ancestors. Doran et al. (2007) suggest that in order to maintain the modularity of an ontology, a concept should be extracted together with its subclasses or children instead of its parents and ancestors. They argue that immediate parents and ancestors are unimportant and extracting them would increase the risk of creating an ontology that is equal to the ontology being reused. However, we believe that parents and ancestors are important in connecting concepts so that they are not left floating. If we were to take “spleen” and its subclasses as one module, “kidney” and its subclasses as another module, and “millimeter” and its subclasses as another module, it would be hard to group these modules under the same category. Furthermore, the ontology being developed is very specific to the abdominal ultrasound domain. Thus, the risk that reusing the parents and ancestors of a concept makes the ontology as large as the original one is reduced. Once all terms have been searched, this process is repeated for the remaining recommended ontologies, which are SNOMED CT and RadLex.

Figure 7. The result returned by BioPortal API when the term “pancreas” was searched

A walkthrough example: Consider a list of terms that contains three entries, “gallbladder”, “duct dilation” and “gallstone”. We first take the term “gallbladder” and query whether the concept exists in NCIT. This returns an exact match, as there exists a concept in NCIT with the preferred label “gallbladder”. Thus, this concept together with all its knowledge detail is reused. We then check whether the concept has any parents or ancestors. “Gallbladder” has a parent “organ” and an ancestor “anatomic structure, system, or substance”, both of which are reused. Since “gallbladder” has an exact match in NCIT, it is removed from the term list.

We then query the second term, “duct dilation”, in NCIT, which returns a partial match consisting of the concepts “duct” and “dilation”. Both concepts are reused together with their knowledge details as well as their parents and ancestors. However, unlike “gallbladder”, “duct dilation” is not removed from the term list since it is only a partial match. The final term in the list, “gallstone”, is then queried, which gives a synonym match to the concept “gallbladder stone” in NCIT. “Gallbladder stone” together with all its knowledge detail, including synonyms, is reused. All its parents and ancestors are also reused, and the term is removed from the term list. Once all terms have been queried in NCIT, we check whether there are any terms left in the term list. This gives us the term “duct dilation”, which is queried against the second recommended ontology, SNOMED CT. This returns a synonym match with the concept “dilation of duct”. Thus, the concept “dilation of duct” is reused and the term to concept mapping process is complete.
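The walkthrough can be reproduced with a toy Python sketch. The two dictionaries below are tiny hand-made stand-ins for NCIT and SNOMED CT that contain only the concepts named in the example (parents not mentioned in the text are left empty), and all function names are hypothetical. In the actual process the matches come from the Search API and ambiguous results are resolved manually.

```python
def ancestors(ontology, label):
    """Collect all parents and ancestors of a concept via its parent links."""
    found, stack = [], list(ontology[label]["parents"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(ontology.get(parent, {}).get("parents", []))
    return found


def find_match(ontology, term):
    """Return (match type, matched concept labels) for a term, or (None, [])."""
    if term in ontology:
        return "prefLabel", [term]
    for label, concept in ontology.items():
        if term in concept.get("synonyms", []):
            return "synonym", [label]
    parts = [word for word in term.split() if word in ontology]
    if len(parts) >= 2:               # at least two concepts match the term
        return "partial", parts
    return None, []


def map_terms(terms, ontologies):
    """Map terms to concepts, querying ontologies in order of recommendation."""
    reused = {}                       # concept label -> source ontology
    remaining = list(terms)
    for name, ontology in ontologies:
        still_unmatched = []
        for term in remaining:
            kind, labels = find_match(ontology, term)
            for label in labels:
                for concept in [label] + ancestors(ontology, label):
                    reused.setdefault(concept, name)
            if kind not in ("prefLabel", "synonym"):
                still_unmatched.append(term)   # partial or no match
        remaining = still_unmatched
    return reused, remaining


# Toy fragments containing only what the walkthrough mentions.
NCIT = {
    "anatomic structure, system, or substance": {"parents": []},
    "organ": {"parents": ["anatomic structure, system, or substance"]},
    "gallbladder": {"parents": ["organ"]},
    "duct": {"parents": []},
    "dilation": {"parents": []},
    "gallbladder stone": {"synonyms": ["gallstone"], "parents": []},
}
SNOMEDCT = {"dilation of duct": {"synonyms": ["duct dilation"], "parents": []}}

reused, remaining = map_terms(
    ["gallbladder", "duct dilation", "gallstone"],
    [("NCIT", NCIT), ("SNOMEDCT", SNOMEDCT)],
)
print(reused)
print(remaining)
```

Recording the source ontology alongside each reused concept mirrors the annotation of concepts with their original ontology, which keeps the provenance of every concept traceable after merging.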

Katsumi and Gruninger (2016) discussed ontology modelling in ontology reuse, where they stressed that for an ontology to be considered as being reused from another existing ontology, it should contain at least a small fragment of the existing ontology. In determining which model an ontology should follow, Katsumi and Gruninger state that it depends on the strength of the existing ontology. If the existing ontology is weaker than the new ontology being developed, then the new ontology may only map some part of the existing ontology’s model. However, if the existing ontology is stronger than the new ontology being developed, then the new ontology may follow the model of the existing ontology.

In the case of several ontologies being reused, Katsumi and Gruninger suggested that some part of the new ontology should follow some part of the existing ontologies being reused. They, however, do not define the criteria that classify an ontology as stronger or weaker than another ontology. In this research, three ontologies were reused, namely NCIT, SNOMED CT and RadLex. Since NCIT has the highest frequency by far (478), compared to the other two (207 and 48 respectively), we assumed that NCIT is stronger than SNOMED CT and RadLex. This prompted us to follow the modelling of NCIT in the development of the Abdominal Ultrasound Ontology (AUO).

When merging concepts reused from SNOMED CT and RadLex into the ontology reused from NCIT, we would first find suitable parents for the concept. If no such parent exists, the parent and ancestors of the concept will then be reused according to the modelling of NCIT. If no match is found in any of these three ontologies, a new concept will then be created with the help of domain experts.
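One possible reading of this merging rule is sketched below. It is an assumption-laden illustration rather than the procedure actually used: the ontology is reduced to a child-to-parent dictionary, the function name `merge_concept` is hypothetical, and the hierarchy in the example is invented for demonstration.

```python
from typing import Dict, List, Optional


def merge_concept(auo: Dict[str, Optional[str]], concept: str, ancestors: List[str]) -> List[str]:
    """Attach `concept` under the nearest ancestor already present in `auo`.

    `auo` maps each concept to its parent (None for a root). `ancestors` is the
    concept's ancestor chain, nearest parent first, following the base model.
    Returns the ancestors that had to be added because no suitable parent existed.
    """
    added: List[str] = []
    child = concept
    for ancestor in ancestors:
        auo[child] = ancestor
        if ancestor in auo:
            # A suitable parent already exists; stop climbing.
            return added
        added.append(ancestor)
        child = ancestor
    # Ran out of ancestors: the last node becomes a root.
    auo[child] = None
    return added


# Hypothetical hierarchy, for illustration only.
auo: Dict[str, Optional[str]] = {"Finding": None, "Morphologic Finding": "Finding"}
added = merge_concept(auo, "Dilation of Duct", ["Dilation", "Morphologic Finding", "Finding"])
print("ancestors added:", added)
print(auo)
```

Here only the missing intermediate ancestor is brought in, and the chain stops at the first ancestor that AUO already contains.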



The new concepts created will be carefully integrated into AUO and will contain the same level of knowledge detail as other concepts in AUO. Expertise from the domain experts is needed at this level to give the correct definitions and add relevant synonyms to these new concepts. All concepts in AUO were annotated with their original ontology so that it is easier to refer back to the source in the future if necessary. Figure 9 shows a snapshot of the Abdominal Ultrasound Ontology as displayed in Protégé.

Ontology evaluation

The developed AUO was validated by domain experts, mainly to check that the relationships between concepts and their definitions are correct. During the evaluation phase, the domain experts, together with the ontology developers, went through the whole ontology. Some inconsistencies were highlighted, but overall the experts were satisfied with the information generated and stored in the ontology. With regard to vocabulary coverage, the experts believe that 92.7% ontology coverage is good enough to cover all the important concepts needed for an abdominal ultrasound reporting system. For the remaining 7.3% of the terms that had no match in the ontology, some were caused by human error such as spelling mistakes made by the reporters. There are also several terms that the domain experts believe could be omitted, as good practice dictates that these words should not appear in an ultrasound report. Examples of such words are “comet tail”, “NAD”, and “hepatopetal”. These words might be understood by the radiologist but might make no sense to others (Edwards et al., 2014).

The main objective of using this ontology reuse methodology is to achieve as much coverage as possible and reduce the need for domain experts in developing the ontology. If an ontology were to be built from scratch, domain experts would be needed from the very first step of designing the ontology. However, with ontology reuse, domain experts were only needed at the end of the process to verify that everything is correct and to assist in adding a few new concepts to the ontology.

Figure 8. Term to Concept mapping guide



RESULT AND DISCUSSION

The evaluation and testing of the methodology were divided into two phases. Phase 1 aims to show that ontology reuse provides wider coverage than using only one existing ontology. The testing was performed using only the 49 sample ultrasound reports that were also used as the training data in developing AUO. Phase 2 looks at how well AUO performs when evaluated with existing and new sample ultrasound reports. It also looks at how difficult it is to include additional concepts when there are new requirements.

Phase 1: 49 Sample Ultrasound Reports

The coverage of AUO was first tested on a set of 49 sample ultrasound reports with a total of 761 terms, in comparison with two existing ontologies, NCIT and SNOMED CT. Term to concept matching was performed using the 761 terms extracted from the sample ultrasound report corpus, and the results are shown in Figure 10. The developed AUO gave the highest number of concept matches compared to reusing only one ontology. Between NCIT and SNOMED CT, NCIT has the higher number of concept matches, with a total of 151 prefLabel matches, 79 synonym matches and 438 partial matches. SNOMED CT, on the other hand, has only 98 prefLabel matches, 104 synonym matches and 431 partial matches. The reason SNOMED CT has fewer prefLabel matches than synonym matches is its naming convention. For example, the preferred label for “kidney” is “kidney structure” and for “gallbladder” it is “entire gallbladder”. When writing reports, radiologists often use simpler words like “kidney” and “gallbladder” instead of “kidney structure” and “entire gallbladder”; thus, when term to concept matching was performed, SNOMED CT returned more synonym matches than prefLabel matches.

Compared to NCIT and SNOMED CT, AUO returns the highest total matches with 176 prefLabel matches, 111 synonym matches and 418 partial matches. AUO returns the largest number of matches because the ontology reuse methodology selects the best matches from different ontologies and merges them into the AUO. Its exhaustive mapping across several ontologies based on the ontology rank has ensured that almost all terms in the corpus were covered by AUO. Whenever possible, a prefLabel match will be inserted in the ontology. If not, a synonym match will be added; only then are partial matches included to ensure the ontology has wide coverage of the corpus.

Figure 9. Snapshot of the Abdominal Ultrasound Ontology as displayed in Protégé

As expected, and confirmed by the results, it is better to reuse several ontologies than to use a single one, as this offers better coverage. Figure 11 shows the percentage of total match and no match in all three ontologies. If ontology reuse were done by mapping the 761 terms against NCIT alone, there would only be 87.8% coverage. If the mapping were done against SNOMED CT, the coverage would be only 83.2%, which is lower than NCIT. However, the coverage increases to 92.6% when several ontologies are reused, in this case NCIT, SNOMED CT, and RadLex. The percentage of no match is also very small (7.4%), which means that the AUO covers almost all the terms in the corpus. One reason there is still 7.4% of no match is that several terms in the corpus are, in the view of the domain experts, poor choices of terms to describe findings in an ultrasound report. The domain experts consider this bad practice, and medical ultrasound experts are now slowly cutting down the usage of such words, making them irrelevant to the AUO. Another reason for the 7.4% of no match is spelling errors made by ultrasound reporters. This is not a concern for now, but in future work we could consider using the ontology to also correct and understand these errors.
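The coverage percentages quoted for Phase 1 follow directly from the match counts reported for the 761 terms. The short check below recomputes them; it is only a worked example over the published figures, with illustrative variable names.

```python
TOTAL_TERMS = 761

# Match counts per ontology as reported for Phase 1.
MATCHES = {
    "NCIT": {"prefLabel": 151, "synonym": 79, "partial": 438},
    "SNOMED CT": {"prefLabel": 98, "synonym": 104, "partial": 431},
    "AUO": {"prefLabel": 176, "synonym": 111, "partial": 418},
}


def coverage(counts, total=TOTAL_TERMS):
    """Return (matched, % coverage, % no match) for one ontology."""
    matched = sum(counts.values())
    unmatched = total - matched
    return matched, round(100 * matched / total, 1), round(100 * unmatched / total, 1)


results = {name: coverage(counts) for name, counts in MATCHES.items()}
for name, (matched, covered, missed) in results.items():
    print(f"{name}: {matched} matches, {covered}% coverage, {missed}% no match")
```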

NCIT has a total of 118,941 classes while SNOMED CT has 324,129 classes. However, there are only 668 and 633 matches for NCIT and SNOMED CT respectively regarding abdominal ultrasound terminology. On the other hand, AUO has only 509 classes, which is less than 0.5% of either NCIT or SNOMED CT, but it still managed to have 705 matches, which is more than NCIT or SNOMED CT each gets. This is because of the specialization of the ontology. Since the ontology has an intended purpose in an application, it is much better and more efficient to build a domain specific ontology through reuse. It would not be efficient to store a large ontology such as NCIT or SNOMED CT and use less than 0.6% of it. It would take a lot of storage space and would also slow down the application, since the application would need to go through the whole ontology to find a match. Thus, the better way to develop an ontology-based application is to build a new domain specific ontology through the ontology reuse methodology.
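The size comparison can be checked with simple arithmetic. The sketch below uses only the class and match counts given in the text and, as a simplifying assumption, treats each matched term as touching one class of the large ontology.

```python
# Class counts and Phase 1 match counts as reported in the text.
classes = {"NCIT": 118_941, "SNOMED CT": 324_129}
matches = {"NCIT": 668, "SNOMED CT": 633}
AUO_CLASSES = 509


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


for name, size in classes.items():
    relative_size = percent(AUO_CLASSES, size)   # AUO size relative to this ontology
    share_used = percent(matches[name], size)    # share of the ontology actually matched
    print(f"{name}: AUO is {relative_size:.3f}% of its size; "
          f"matches touch about {share_used:.3f}% of its classes")
```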

Figure 10. Breakdown of total match according to type against NCIT, SNOMED CT and Abdominal Ultrasound Ontology (AUO)



Phase 2: 100 Sample Ultrasound Reports

Phase 1 of the testing and evaluation of the ontology reuse methodology has shown that ontology reuse provides wider coverage than using only one existing ontology. Phase 2 looks at how well AUO performs in term to concept mapping when evaluated using 100 sample reports, of which 49 were used for training purposes and the other 51 were new sample ultrasound reports that were not used in developing AUO. Phase 2 also looks at how much work is needed to update the ontology to include additional concepts based on the requirements of the new corpus.

Term extraction performed on the 100 sample reports returned 1119 terms, which is 358 more than the terms extracted in Phase 1. The number of matches for the 761 terms was already established in Phase 1. To find the number of matches for the rest of the terms, we checked whether the remaining 358 terms exist in AUO. Figure 12 shows the percentage of each type of match as well as no match for all 358 terms when compared with AUO.

Overall, the percentage of total match is quite high at 72.1%: of the 358 terms, 237 were found as partial matches (66.2%), 15 as synonym matches (4.2%) and 6 as prefLabel matches (1.7%). This shows that the AUO is adequate to cover most terms in the biomedical domain, since the total of no match is quite small at 27.9%. In abdominal ultrasound reporting, the reports usually contain words that are more or less semantically similar even though different terms are used, especially in the case of normal ultrasound reports. Only in abnormal ultrasound reports could we find a higher variety of words being used. With AUO as a knowledge base, the usage of different words in writing ultrasound reports would not matter, because the semantics of the words are more important than the words themselves. Because of the similarity of the contents, a small corpus used as training data still gives quite a good total match percentage.

After comparing the 358 terms with AUO, we found that there are 100 terms without a match in AUO. A good methodology should be able to adapt to changes when required. Thus, in order to reduce the percentage of no match, the ontology reuse process was carried out again on all 358 terms to include new concepts in AUO. Firstly, we needed to check whether the addition of the 51 new sample reports changes the ontologies recommended for reuse. Fortunately, NCIT still comes up as the best ontology to reuse, followed by SNOMED CT and RadLex. Next, all 358 terms were compared with NCIT so that term to concept mapping could be performed. Once the process was completed, it was repeated with SNOMED CT and RadLex. After the addition of new concepts and synonyms such as “adnexal masses”, “incidental finding” and “morphology” to AUO, the total number of terms without a match was reduced from 100 to 26, which is a 74% reduction.
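The figures for the 358 new terms can be reproduced from the reported counts. The following is a worked check only; the variable names are illustrative.

```python
NEW_TERMS = 358

# Matches for the 358 new terms before AUO was extended, as reported.
before = {"partial": 237, "synonym": 15, "prefLabel": 6}
UNMATCHED_AFTER = 26  # terms still without a match after the extension


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


matched_before = sum(before.values())
unmatched_before = NEW_TERMS - matched_before
breakdown = {kind: pct(count, NEW_TERMS) for kind, count in before.items()}
reduction = pct(unmatched_before - UNMATCHED_AFTER, unmatched_before)

print("breakdown (%):", breakdown)
print("total match (%):", pct(matched_before, NEW_TERMS))
print("no match before:", unmatched_before, f"({pct(unmatched_before, NEW_TERMS)}%)")
print("reduction in no match (%):", reduction)
```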

Figure 11. Percentage of total match and no match in NCIT, SNOMED CT and AUO



Figure 13 shows the percentage of total match for all three types of matches as well as the percentage of no match for the 1119 terms after new concepts have been added to AUO.

The total match percentage after the addition of new concepts has increased from 72.1% (measured on the 358 new terms before the extension) to 92.7% (measured on all 1119 terms), where 692 out of the 1119 terms were found as partial matches (61.8%), 139 as synonym matches (12.4%) and 206 as prefLabel matches (18.4%). Before the addition of the new concepts, the percentage of partial match was higher at 66.2% compared to the current percentage of 61.8%. This number was reduced after the term to concept mapping was done because some of the partial matches were found as either a synonym or prefLabel match in NCIT, SNOMED CT or RadLex. For example, the term “right ovary” was first labelled as a partial match since AUO did not have a concept called “right ovary” but had two concepts, “right” and “ovary”. However, after performing term to concept mapping again, “right ovary” was found as a prefLabel match in NCIT, so the concept was added to AUO. This causes the percentage of synonym and prefLabel matches to also increase from 4.2% and 1.7% to 12.4% and 18.4% respectively. Figure 14 shows the comparison of the percentage of prefLabel, synonym, partial and no match for phase 1, phase 2 (before the extension) and phase 2 (after the extension).
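As a consistency check, the Phase 2 totals can be recomputed from the reported counts and cross-checked against Phase 1. The sketch below assumes nothing beyond the numbers given in the text: 705 of 761 terms matched in Phase 1, and 26 of the 358 new terms remained unmatched after the extension.

```python
TOTAL_TERMS = 1119

# Matches over all 1119 terms after AUO was extended, as reported.
after = {"partial": 692, "synonym": 139, "prefLabel": 206}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


matched = sum(after.values())
unmatched = TOTAL_TERMS - matched

# Cross-check: unmatched Phase 1 terms plus the new terms still unmatched.
phase1_unmatched = 761 - 705
new_terms_unmatched = 26

print("breakdown (%):", {kind: pct(count, TOTAL_TERMS) for kind, count in after.items()})
print("total match (%):", pct(matched, TOTAL_TERMS))
print("no match:", unmatched, f"({pct(unmatched, TOTAL_TERMS)}%)")
print("consistent with Phase 1:", unmatched == phase1_unmatched + new_terms_unmatched)
```

The unmatched total over all 1119 terms equals the 56 terms left unmatched in Phase 1 plus the 26 new terms still unmatched, so the reported figures agree with one another.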

Phase 2 of the testing and evaluation of AUO shows that the number of sample ultrasound reports used as testing data was adequate in ensuring that AUO would be able to cover most abdominal ultrasound sample reports. This also supports the domain experts’ assumption that the 92.7% total match of the first version of AUO is indeed enough to cover most abdominal ultrasound sample reports. It also shows that the ontology reuse methodology proposed in this research is adaptable, since it is fairly easy for new concepts to be added according to new requirements from additional sample reports.

CONCLUSION

Ontology reuse can be beneficial in developing domain specific ontologies for application systems, as it reduces development time and redundancy. The lack of proper methodology and tools for reusing ontologies has hindered this effort. Thus, this paper proposed a methodology to reuse ontologies together with supporting tools that make the ontology reuse process much easier.

Figure 12. Percentage of total prefLabel, synonym, partial and no match for 358 new terms when compared to AUO



The development of AUO using this methodology has shown that ontology reuse is beneficial in developing a small domain specific ontology which has wide coverage of the terminology used in the application system compared to using a large general domain ontology. It also shows that the methodology is adaptable, as it allows changes to be made easily when there are new requirements from the corpus. It is hoped that the proposed ontology reuse methodology will encourage more usage of ontologies in medical systems without the development of similar domain ontologies that would cause redundancy.

Figure 13. Percentage of total prefLabel, synonym, partial and no match for all 1119 terms when compared to AUO

Figure 14. Comparison of the percentage of prefLabel, synonym, partial and no match for phase 1, phase 2 (before the extension) and phase 2 (after the extension)



REFERENCES

Alani, H. (2006). Position paper: Ontology construction from online ontologies. In Proceedings of the 15th International Conference on World Wide Web, WWW ’06, Edinburgh, Scotland (pp. 491–495). New York, NY: ACM. doi:10.1145/1135777.1135849

Alani, H., Brewster, C., & Shadbolt, N. (2006). Ranking ontologies with AKTiveRank (pp. 1–15). Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/11926078_1

Benson, T. (2010). Principles of Health Interoperability HL7 and SNOMED. London: Springer.

BioPortal (n.d.). API Documentation. http://data.bioontology.org/documentation

Boland, G. (2007). Enhancing the radiology product: The value of voice-recognition technology. Clinical Radiology, 62(11), 1127. doi:10.1016/j.crad.2007.05.014 PMID:17920876

Bontas, E. P. (2007). A contextual approach to ontology reuse. PhD thesis, Freie Universität Berlin.

Bontas, E. P., Mochol, M., & Tolksdorf, R. (2005). Case studies on ontology reuse. In Proceedings of the 5th International Conference on Knowledge Management, IKNOW05. Academic Press.

Caldarola, E. G., Picariello, A., & Rinaldi, A. M. (2015). An approach to ontology integration for ontology reuse in knowledge based digital ecosystems. In Proceedings of the 7th International Conference on Management of Computational and Collective intelligence in Digital EcoSystems, MEDES ’15, Caraguatatuba, Brazil (pp. 1-8). New York, NY: ACM. doi:10.1145/2857218.2857219

Capellades, M. A. (1999). Assessment of reusability of ontologies: A practical example. In Proceedings of AAAI 1999 Workshop on Ontology Management (pp. 74–79). AAAI Press.

Doran, P., Tamma, V., & Iannone, L. (2007). Ontology module extraction for ontology reuse: An ontology engineering perspective. In Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, CIKM ’07, Lisbon, Portugal (pp. 61-70). New York, NY: ACM. doi:10.1145/1321440.1321451

Edwards, H., Smith, J., & Weston, M. (2014). What makes a good ultrasound report? Ultrasound, 22(1), 57–60. doi:10.1177/1742271X13515216 PMID:27433194

Fernández-López, M., Gómez-Pérez, A., & Suárez-Figueroa, M. C. (2013). Methodological guidelines for reusing general ontologies. Data & Knowledge Engineering, 86, 242–275. doi:10.1016/j.datak.2013.03.006

Jonquet, C., Musen, M. A., & Shah, N. H. (2010). Building a biomedical ontology recommender web service. Journal of Biomedical Semantics, 1(S1), S1. doi:10.1186/2041-1480-1-S1-S1 PMID:20626921

Kahn, C. E. Jr, Langlotz, C. P., Burnside, E. S., Carrino, J. A., Channin, D. S., Hovsepian, D. M., & Rubin, D. L. (2009). Toward best practices in radiology reporting. Radiology, 252(3), 852–856. doi:10.1148/radiol.2523081992 PMID:19717755

Katsumi, M., & Gruninger, M. (2016). What is ontology reuse? In Ferrario, R. and Kuhn, W. (Eds.), Formal Ontology in Information Systems: Proceedings of the 9th International Conference, FOIS 2016 (pp. 9-22). IOS Press.

Lossio-Ventura, J. A., Jonquet, C., Roche, M., & Teisseire, M. (2014). Biotex: A system for biomedical terminology extraction, ranking, and validation. In Proceedings of the 2014 International Conference on Posters & Demonstrations Track ISWC-PD’14, Riva del Garda, Italy (pp. 157–160). CEUR-WS.org.

Maedche, A., Motik, B., Stojanovic, L., Studer, R., & Volz, R. (2003). An infrastructure for searching, reusing and evolving distributed ontologies. In Proceedings of the 12th International Conference on World Wide Web, WWW ’03, Budapest, Hungary (pp. 439–448). New York, NY: ACM. doi:10.1145/775152.775215

Martínez-Romero, M., Jonquet, C., O’Connor, M. J., Graybeal, J., Pazos, A., & Musen, M. A. (2017). NCBO ontology recommender 2.0: An enhanced approach for biomedical ontology recommendation. Journal of Biomedical Semantics, 8(1), 21. doi:10.1186/s13326-017-0128-y PMID:28592275


McDaniel, M., Storey, V. C., & Sugumaran, V. (2016). The role of community acceptance in assessing ontology quality. In E. Métais, F. Meziane, M. Saraee et al. (Eds.), Natural Language Processing and Information Systems: 21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Salford, UK, June 22-24 (pp. 24-36). Springer International Publishing. doi:10.1007/978-3-319-41754-7_3

Mejino, J. L. Jr, Rubin, D. L., & Brinkley, J. F. (2008). FMA-RadLex: An application ontology of radiological anatomy derived from the foundational model of anatomy reference ontology. AMIA ... Annual Symposium Proceedings - AMIA Symposium. AMIA. PMID:18999035

Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., & Musen, M. A. et al. (2009). Bioportal: Ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research, 37, W170–W173. doi:10.1093/nar/gkp440 PMID:19483092

Ortiz, A. (2011). Polionto: Ontology reuse with automatic text extraction from political documents. In Proceedings of the 6th Doctoral Symposium in Informatics Engineering (pp. 309–320). Academic Press.

Rosse, C., & Mejino, J. L. V. Jr. (2003). A reference ontology for biomedical informatics: The Foundational Model of Anatomy. Journal of Biomedical Informatics, 36(6), 478–500. doi:10.1016/j.jbi.2003.11.007 PMID:14759820

Rubin, D. L. (2008). Creating and curating a terminology for radiology: Ontology modeling and analysis. Journal of Digital Imaging, 21(4), 355–362. doi:10.1007/s10278-007-9073-0 PMID:17874267

Russ, T., Valente, A., MacGregor, R., & Swartout, W. (1999). Practical experiences in trading off ontology usability and reusability. In Proceedings of the 12th Workshop on Knowledge Acquisition, Modeling and Management, KAW’99 (pp. 16–21). Academic Press.

Shah, T., Rabhi, F., & Ray, P. (2013). OSHCO: A cross domain ontology for semantic interoperability across medical and oral health domains. In Proceedings of the IEEE 15th International Conference on e-Health Networking, Applications Services, HEALTHCOM’13, Lisbon, Portugal, October 9-12 (pp. 460–464). IEEE. doi:10.1109/HealthCom.2013.6720720

Shah, T., Rabhi, F., Ray, P., & Taylor, K. (2014). A guiding framework for ontology reuse in the biomedical domain. In Proceedings of the 47th Hawaii International Conference on System Sciences, HICSS 2014 (pp. 2878–2887). IEEE. doi:10.1109/HICSS.2014.360

Simperl, E. (2009). Reusing ontologies on the semantic web: A feasibility study. Data & Knowledge Engineering, 68(10), 905–925. doi:10.1016/j.datak.2009.02.002

Trokanas, N., & Cecelja, F. (2016). Ontology evaluation for reuse in the domain of process systems engineering. Computers & Chemical Engineering, 85, 177–187. doi:10.1016/j.compchemeng.2015.12.003

Uschold, M., Clark, P., Healy, M., Williamson, K., & Woods, S. (1998). An experiment in ontology reuse. In Proceedings of the 11th Knowledge Acquisition Workshop, KAW’98. Academic Press.

Zulkarnain, N. Z., Crofts, G., & Meziane, F. (2015a). An architecture to support ultrasound report generation and standardisation. In Proceedings of the 8th International Conference on Health Informatics, HEALTHINF 2015, Lisbon, Portugal (pp. 508–513). SCITEPRESS. doi:10.5220/0005252505080513

Zulkarnain, N. Z., Crofts, G., & Meziane, F. (2015b). A proposed model for the standardisation of ultrasound report. Poster presented at the UK Radiological Congress, ACC Liverpool, UK. Academic Press.

Zulkarnain, N. Z., Crofts, G., & Meziane, F. (2016a). Automatic classification of abdominal ultrasound findings as normal, abnormal or inconclusive. Poster presented at the British Medical Ultrasound Annual Scientific Meeting & Exhibition, York, UK. Academic Press.

Zulkarnain, N. Z., Meziane, F., & Crofts, G. (2016b). A methodology for biomedical ontology reuse. In E. Métais, F. Meziane, M. Saraee et al. (Eds.), Natural Language Processing and Information Systems: 21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Salford, UK, June 22-24 (pp. 3-14). Springer International Publishing.


ENDNOTES

1. http://bioportal.bioontology.org
2. http://www.nactem.ac.uk/software/termine
3. http://tubo.lirmm.fr/biotex/

Nur Zareen Zulkarnain received her BTech. (Hons) degree from Universiti Teknologi PETRONAS, Malaysia in 2011, an MSc. from The University of Manchester, UK in 2012 and a PhD. in Computer Science from the University of Salford, UK in 2017. Currently, she is a Senior Lecturer in the Department of Intelligent Computing & Analytics in Universiti Teknikal Malaysia Melaka where her teaching and research mainly involves artificial intelligence in the area of informatics and natural language processing.

Farid Meziane holds a chair in Data and Knowledge Engineering and a PhD in Computer Science from the University of Salford and is the head of the Informatics Research Centre. He is the co-Chair of the International Conference on Application of Natural Language to Information Systems (NLDB) and has served on the programme committees of over 20 conferences. He is on the editorial board of 5 international journals. His research interests are in the broad area of data and knowledge engineering. This includes data mining, information extraction and retrieval, Big Data, and the semantic web.
