17
Huang et al. Journal of Biomedical Semantics (2016) 7:25 DOI 10.1186/s13326-016-0064-2 RESEARCH Open Access OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data Jingshan Huang 1* , Fernando Gutierrez 2 , Harrison J. Strachan 1 , Dejing Dou 2 , Weili Huang 3 , Barry Smith 4 , Judith A. Blake 5 , Karen Eilbeck 6 , Darren A. Natale 7 , Yu Lin 8 , Bin Wu 9 , Nisansa de Silva 2 , Xiaowei Wang 10 , Zixing Liu 11 , Glen M. Borchert 12 , Ming Tan 11 and Alan Ruttenberg 13 Abstract As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version OMIT are summarized as: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software. Keywords: microRNA, Non-coding RNA, Target gene, Biomedical ontology, Ontology development, Data annotation, Data integration, Semantic search, SPARQL query Introduction microRNAs (miRNAs) are a type of non-coding RNA (ncRNA) with important biological, biomedical, and clin- ical impact. Prior research [1, 2] indicates that miRNAs perform significant roles in both biological and patholog- ical processes, thus affecting the control and regulation of various human diseases. miRNAs realize critical functions via binding to their respective target genes. The ability to identify and analyze miRNA-target interactions in an *Correspondence: [email protected] 1 School of Computing, University of South Alabama, Mobile, Alabama 36688-0002, USA Full list of author information is available at the end of the article effective manner is thus a key step in the understanding and delineation of miRNA functions. The conventional method by which the users of data (e.g., biologists, bioinformaticians, and clinical investiga- tors) determine miRNA functions involves: Searching for biologically validated miRNA targets, for example, by querying the PubMed database [3]; and Finding additional potential miRNA targets, for example, by initiating inquiries on various prediction databases or websites such as miRDB [4], TargetScan [5], and miRanda [6]. © 2016 Huang et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

RESEARCH OpenAccess …soc.southalabama.edu/~huang/papers/OMIT-JBMS-InPrint.pdfHuangetal.JournalofBiomedicalSemantics (2016) 7:25 DOI10.1186/s13326-016-0064-2 RESEARCH OpenAccess OmniSearch:asemanticsearchsystem

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 DOI 10.1186/s13326-016-0064-2

    RESEARCH Open Access

    OmniSearch: a semantic search systembased on the Ontology for MIcroRNA Target(OMIT) for microRNA-target gene interactiondataJingshan Huang1*, Fernando Gutierrez2, Harrison J. Strachan1, Dejing Dou2, Weili Huang3, Barry Smith4,Judith A. Blake5, Karen Eilbeck6, Darren A. Natale7, Yu Lin8, Bin Wu9, Nisansa de Silva2, Xiaowei Wang10,Zixing Liu11, Glen M. Borchert12, Ming Tan11 and Alan Ruttenberg13

    Abstract

    As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biologicaland pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specifictarget genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better exploreand delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNAdomain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundationfor semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe ourcontinuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch,designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current versionOMIT are summarized as: (1) following a modularized ontology design (with 2559 terms imported from the NCROontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project sitealong with an issue tracker for more effective community collaboration on the ontology development. The OMITontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch systemis also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.

    Keywords: microRNA, Non-coding RNA, Target gene, Biomedical ontology, Ontology development, Data annotation,Data integration, Semantic search, SPARQL query

    IntroductionmicroRNAs (miRNAs) are a type of non-coding RNA(ncRNA) with important biological, biomedical, and clin-ical impact. Prior research [1, 2] indicates that miRNAsperform significant roles in both biological and patholog-ical processes, thus affecting the control and regulation ofvarious human diseases. miRNAs realize critical functionsvia binding to their respective target genes. The abilityto identify and analyze miRNA-target interactions in an

    *Correspondence: [email protected] of Computing, University of South Alabama, Mobile, Alabama36688-0002, USAFull list of author information is available at the end of the article

    effective manner is thus a key step in the understandingand delineation of miRNA functions.The conventional method by which the users of data

    (e.g., biologists, bioinformaticians, and clinical investiga-tors) determine miRNA functions involves:

    • Searching for biologically validated miRNA targets,for example, by querying the PubMed database [3];and

    • Finding additional potential miRNA targets, forexample, by initiating inquiries on various predictiondatabases or websites such as miRDB [4], TargetScan[5], and miRanda [6].

    © 2016 Huang et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, andreproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to theCreative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

    http://crossmark.crossref.org/dialog/?doi=10.1186/s13326-016-0064-2-x&domain=pdfhttp://purl.obolibrary.org/obo/omit.owlhttp://omnisearch.soc.southalabama.edu/index.php/Softwaremailto: [email protected]://creativecommons.org/licenses/by/4.0/http://creativecommons.org/publicdomain/zero/1.0/

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 2 of 17

    Unfortunately, both steps currently require significantmanual effort because the relevant data sources areboth syntactically and semantically heterogeneous — thatis, the meaning of seemingly similar data from differ-ent sources may be quite different and thus open tomisinterpretation. It is therefore challenging for usersto identify and establish possible links among originaldata sources. As a result, conventional miRNA knowl-edge discovery and acquisition methodologies are time-consuming, labor-intensive, error-prone, and sensitive tolimitations in the prior knowledge of different end users.These barriers are exacerbated by the need to obtainadditional information for each and every miRNA tar-get (whether validated or putative) using existing datasources and analysis tools, including but not limited to:the DAVID Bioinformatics Resources (DAVID) [7], NCBIGene [8], the Medical Subject Headings (MeSH) Database[9], the HUGO Gene Nomenclature Committee (HGNC)Database [10], and NCBI Nucleotide [11].Emerging semantic technologies can help in address-

    ing the aforementioned challenges. The core of cur-rent semantic technologies include specifications such asthe resource description framework (RDF), RDF Schema(RDFS), and Web Ontology Language (OWL), all ofwhich are intended to provide a formal description ofclasses of entities of different types and of the relationsbetween them in such a way as to enable automatic rea-soning (inference). Semantic technologies can be appliedto miRNA knowledge acquisition by transforming dataobtained from heterogeneous miRNA-related databasesinto a common framework by utilizing a single format(such as RDF) and aligning the data through use of anno-tations from common, formally defined ontologies. Bymeans of this transformation we can use the SPARQLProtocol (SPARQL) [11] to query the enhanced data auto-matically.In previous research [12–17], we investigated the con-

    struction of an application ontology for the miRNA field,named Ontology for MIcroRNA Target (OMIT), the firstontology to formally encode miRNA domain knowledge.By providing a standardized metadata model to establishmiRNA data connections among heterogeneous sources,the OMIT is able to fill two gaps: the lack of commondata elements and the lack of data exchange standards formiRNA research, especially with regard to miRNA-targetinteractions.We describe two major scientific contributions in this

    paper: (1) recent improvements to the OMIT ontologyand (2) a semantic search system, which is built upon theontology and enables the capture of miRNA-target inter-action data in a way leading to more effective miRNAknowledge acquisition.The remainder of this paper is organized as follows.

    “Related work” Section summarizes state-of-the-art

    research in biomedical ontologies and semantic search,respectively. “OMIT reconstruction” Section reportsour efforts on reconstructing the OMIT ontology.“OmniSearch: an OMIT-based semantic search system”Section describes technical details of OmniSearch,an OMIT-based semantic search system. “Results anddiscussion” Section reports our experimental results.Finally, “Conclusions” Section summarizes the majorpoints and presents ideas for future research.

    Related workRelated work in biomedical ontologiesThe use of ontologies to describe, define, and inte-grate biological entities has long been embraced by thebiological, biomedical, and clinical research commu-nities. Here we briefly describe some representativebio-ontologies included in both the Open Biologicaland Biomedical Ontologies (OBO) Library [18] and theNational Center for Biomedical Ontology (NCBO) Bio-Portal [19] that are pertinent to the development of thisproject.The Gene Ontology (GO) [20] is by far the most

    successful and widely used ontology for biologicaldata description. It consists of three independent sub-ontologies: biological processes, molecular functions, andcellular components, which describe these aspects of geneproducts: both protein and RNA. The GO has been widelyutilized to annotate gene products of model organisms.By the time of writing this paper, there were GO annota-tions for 36 organisms including Homo sapiens availablefor download.The Sequence Ontology (SO) [21] is an ontology to cap-

    ture genomic features and the relationships that obtainbetween them. This ontology contains the features neces-sary to annotate a genome with structural features such asgene models and also the terms necessary for the anno-tation of genomic variants. SO terms define the kinds ofand parts of ncRNA features, and these terms are usedto identify these features and their location in genomicsequence.The PRotein Ontology (PRO) [22] is a comprehensive

    description of the forms of protein, including isoforms,modifications, and the relationships between them. Pro-teins are functional entities in many processes eventuallyimpacted by the regulatory effect of ncRNAs (e.g., miRNAbindings). The PRO provides an ontological representa-tion of proteins with a particular focus on human proteinsand disease-related variants thereof.The RNA Ontology (RNAO) [23] is a candidate OBO

    foundry reference ontology to catalogue the molecu-lar entities composing primary, secondary, and tertiarycomponents of RNA. The goal of this project is toenable integration and computation over diverse RNAdatasets.

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 3 of 17

    Related work in semantic searchSemantic search is a research field that intends to improvethe access to contents by considering the semanticsbehind the search process [24]. In other words, semanticsearch goes beyond conventional, keyword-based searchby considering the contextual meaning of words, theintent of the user, and the nature of the search space.In general, semantic search requires the use of struc-tured knowledge, such as ontologies, in the modeling andinterpretation of queries. Ontologies can help improvethe search by query expansion. One main idea in manysemantic search systems (e.g., [25–29]) is, the originalset of query keywords can be expanded by drawing onsynonyms and other relationships (e.g., subclass and part-hood) that are not part of the query. For example, inthe work by Chauhan et al. [29], the original query wasfirst expanded by considering synonyms, then terms withhigh semantic similarity were chosen from the ontologyto be integrated to the search query, and the seman-tic similarity used for the query expansion was com-puted by the distance among concepts in the ontology,the position in the hierarchy, and the number of upperclasses.Another way to implement semantic search is to use

    ontologies to translate keyword-based search into formalsemantic queries. For example, Tran et al. [24] useda set of models (mental, user, system, and query) tocapture information, such as thought entities, languageprimitives, knowledge representation (KR) primitives,and query elements. These models were then com-bined with a set of assumptions to redefine originalqueries, filling the gap between terms with structuralinformation from an ontology. That is, each termwithin the query was considered a property of anotherterm.

    OMIT reconstructionModularized ontology designThe OMIT ontology consists of the following modules:

    • omit.owl — Defines all OMIT-specific terms andrelations, for example, prediction_from_miRDB andgene_context_score_in_TargetScan.

    • bfo.owl — Imports upper-level terms from the BasicFormal Ontology (BFO) [30], for example, genericallydependent continuant and material entity.

    • ro-imports.owl — Imports common relations (sharedacross different ontologies) from the RelationOntology (RO) [31], for example, has participant andregulates.

    • ncro.owl — Imports ncRNA-related terms andrelations from the Non-coding RNA Ontology(NCRO) [32], for example, miRNA_target_gene andmiRNA_gene_family.

    • go-imports.owl — Imports gene product terms fromthe GO, for example, RNA binding and regulation ofbiological process.

    • so-imports.owl — Imports sequence structuralfeature terms from the SO, for example,biological_region and insertion_site.

    • obi-imports.owl — Imports life-science and clinicalinvestigation terms from the Ontology for BiomedicalInvestigations (OBI) [33], for example, cultured cellpopulation and organism.

    • chebi-imports.owl — Imports molecular entity(especially small chemical compounds) terms fromthe Chemical Entities of Biological Interest Ontology(ChEBI) [34], for example, ribonucleic acid andribosomal RNA.

    • iao-imports.owl — Imports information entity termsfrom the Information Artifact Ontology (IAO) [35],for example, information content entity.

    • clo-imports.owl — Imports cell line-relevant termsfrom the Cell Line Ontology (CLO) [36], for example,cell line.

    • pr-imports.owl — Imports protein-related entityterms from the PRO, for example, amino acid chainand protein.

    • uberon-imports.owl — Imports cross-speciesanatomy terms from the Uberon multi-speciesanatomy ontology (UBERON) [37], for example,anatomical structure and organ.

    • doid-imports.owl — Imports disease terms from theHuman Disease Ontology (DOID) [38], for example,disease of cellular proliferation and cancer.

    Note that:

    (1) Orthogonality among different ontologies is oneof the important practices proposed by the OBOFoundry Initiative, and has been widely accepted inthe bio-ontology community. As a result, to achievebetter orthogonality, it is a common practice to reusecontents defined in relevant, existing ontologies.(2) The OMIT ontology directly imported the NCROontology (a comprehensive ncRNA domainontology), which in turn, directly imported otherontologies in the above list. Therefore, the OMITontology itself includes two OWL files: “omit.owl”and “ncro.owl.” All other OWL files,“go-imports.owl” and “so-imports.owl” for example,are shown as “indirectly imported” in Protégé.(3) Ontology concepts are referred to as “classes” inProtégé and “terms” in OBO Edit, respectively.Therefore, “classes” and “terms” are interchangeablyused throughout the whole paper.

    Table 1 lists a subset of important terms and relationsimported into the OMIT.

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 4 of 17

    Table 1 A subset of imported terms and relations

    Imported term or relation Source ontology Original ID

    RO:part of Relation Ontology BFO_0000050

    RO:participates in Relation Ontology RO_0000056

    RO:has participant Relation Ontology RO_0000057

    BFO:entity Basic Formal Ontology BFO_0000001

    BFO:continuant Basic Formal Ontology BFO_0000002

    BFO:independent continuant Basic Formal Ontology BFO_0000004

    BFO:occurrent Basic Formal Ontology BFO_0000003

    BFO:material entity Basic Formal Ontology BFO_0000040

    CHEBI:molecular entity Chemical Entities of Biological Interest Ontology CHEBI_23367

    CHEBI:ribonucleic acid Chemical Entities of Biological Interest Ontology CHEBI_23367

    CHEBI:ribosomal RNA Chemical Entities of Biological Interest Ontology CHEBI_18111

    CHEBI:small nuclear RNA Chemical Entities of Biological Interest Ontology CHEBI_74035

    CHEBI:transfer RNA Chemical Entities of Biological Interest Ontology CHEBI_17843

    NCRO:human_miRNA Non-coding RNA Ontology NCRO_0000810

    NCRO:hsa-miR-125b-1-3p Non-coding RNA Ontology NCRO_0003283

    NCRO:hsa-miR-125b-2-3p Non-coding RNA Ontology NCRO_0003284

    NCRO:hsa-miR-125b-5p Non-coding RNA Ontology NCRO_0003282

    NCRO:miRNA_target_gene Non-coding RNA Ontology NCRO_0000025

    NCRO:miRNA_and_target_gene_binding Non-coding RNA Ontology NCRO_0000003

    NCRO:protein_miRNA_promoter_binding Non-coding RNA Ontology NCRO_0000011

    IAO:information content entity Information Artifact Ontology IAO_0000030

    IAO:measurement datum Information Artifact Ontology IAO_0000109

    • The format for the left column (Imported Term orRelation) is PREFIX:human-readable label, forexample, NCRO:miRNA_target_gene and RO:part of.

    • The format for the right column (Original ID) isPREFIX_unique identifier, for example,NCRO_0000025 and BFO_0000001.

    Ontology core designThe core design of the OMIT ontology is shown in Fig. 1.Compared with earlier versions, the current version con-tains many important new terms and relations, and someof which are listed in Tables 2 and 3, respectively.

    • Both terms and relations are represented in theformat of PREFIX:label in Fig. 1.

    • For the purpose of better readability, labels ratherthan identifiers are used in Tables 2 and 3.

    • Relations in Table 3 were either defined in orimported into the OMIT, which can be easilydistinguished from each other by different prefixesused in the first column.

    OmniSearch: an OMIT-based semantic searchsystemBased on the OMIT ontology, we developed a semanticsearch system:OmniSearch. First, the OmniSearch system

    will conduct semantic annotation on various sources thatwere originally heterogeneous in their semantics; follow-ing that, OMIT-annotated data will then be integratedinto a unified and consistent data layer in RDF; and finally,complex semantic queries will be performed to providemeaningful results and clues to system end users (e.g.,biologists, bioinformaticians, and clinical investigators).

    Data sources usedData sources used in the OmniSearch system includethree miRNA target prediction databases (miRDB, Tar-getScan, and miRanda), as well as PubMed, NCBI Gene,GO, RNA Central, DAVID, HGNC, and MeSH termdatabases. These sources contain both structured data(database instances) and unstructured data (free text), andare semantically heterogeneous among each other.

    Software architectureTheOmniSearch system consists of several softwaremod-ules: semantic annotation, data integration, and semanticsearch.Semantic data annotation is the process of tagging

    source files with predefined ontological metadata likenames, entities, attributes, definitions, and descriptions.The annotation provides original data with extrametadata

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 5 of 17

    Fig. 1 The design of core terms and relations in the OMIT ontology (both terms and relations are represented in the format of PREFIX:label)

    Table 2 A subset of new OMIT terms

    OMIT new term Direct parent term Human-readable explanation

    computationally_asserted_evidence IAO:information content entity Evidence obtained from some

    computational methods.

    information_from_miRNA_ OMIT:computationally_asserted_evidence Records obtained from various

    target_prediction_database miRNA target prediction databases.

    prediction_from_miRDB OMIT:information_from_miRNA_ Records specifically obtained

    target_prediction_database from the miRDB database.

    prediction_from_TargetScan OMIT:information_from_miRNA_ Records specifically obtained

    target_prediction_database from the TargetScan database.

    prediction_from_miRanda OMIT:information_from_miRNA_ Records specifically obtained

    target_prediction_database from the miRanda database.

    target_score_in_miRDB IAO:measurement datum The score of some specific

    miRNA-target binding prediction

    from the miRDB database.

    gene_context_score_in_TargetScan IAO:measurement datum The context score of some specific

    miRNA-target binding prediction

    from the TargetScan database.

    mirSVR_score_in_miRanda IAO:measurement datum The mirSVR score of some specific

    miRNA-target binding prediction

    from the miRanda database.

    information_from_NCBI_gene IAO:information content entity Records obtained from NCBI Gene

    according to gene IDs or gene symbols.

    information_from_NCBI_nucleotide IAO:information content entity Records obtained from NCBI Nucleotide

    according to GenBank Accession numbers.

    information_from_PubMed IAO:information content entity Records obtained from the PubMed

    database according to PMIDs.

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 6 of 17

    Table 3 A subset of new OMIT relations

    New relation Domain Range Human-readable explanation

    OMIT:miRNA_target_ NCRO:miRNA_and_ OMIT:computationally_ Specific miRNA-target binding

    assumption_ target_gene_binding asserted_evidence prediction is based on some

    based_on computationally asserted evidence.

    OMIT:is_quality_ IAO:measurement datum OMIT:computationally_ A piece of measurement datum

    measurement_of asserted_evidence (e.g., the target score in miRDB)

    is a quality measurement of

    computationally asserted evidence.

    OMIT:is_gene_ NCRO:miRNA_target_gene OMIT:target_protein A miRNA target gene

    template_of_protein serves as a template

    of relevant protein.

    RO:has participant OMIT:prediction_from_miRDB SO:miRNA Each miRNA-target binding

    prediction record has one

    miRNA as a participant.

    RO:has participant OMIT:prediction_from_miRDB NCRO:miRNA_target_gene Each miRNA-target binding

    prediction record has one

    target as a participant.

    RO:part of OMIT:target_score_in_miRDB OMIT:prediction_from_miRDB Each miRNA-target binding

    prediction record from

    miRDB contains one score.

    Each record from NCBI

    RO:part of OMIT:PubMed_summary_ OMIT:information_from_NCBI_gene Gene contains one or

    in_NCBI_gene more PubMed summaries.

    information formally defined in the OMIT ontology. Theoutput of semantic data annotation is a collection ofRDF triples (from both free text and database instances).These triples will be accumulated into a centralized RDFrepository: OmniStore.We used Python scripts to conduct automated seman-

    tic annotation and data integration. As an example, Fig. 2shows the flowchart of our programs to annotate miRDBdata. We explain below the detailed steps. One miRDBfile, the “miRNA data” file, contains two columns con-sisting of miRNA names and their associated interna-tionalized resource identifiers (IRIs). Another miRDB file,the “gene data” file, contains four columns consistingof miRNA names, gene IDs, gene symbols, and targetscores.

    • Step One: As each miRNA name and its associatedIRI were read in from the miRNA data file, they wereplaced into a dictionary where the miRNA name isthe key and the IRI is the value.

    • Step Two: All lines were read in from the gene datafile, and each line was converted into a total of fourRDF triples. (1) The first triple was generated torepresent a newly created instance of the

    prediction_from_miRDB class, namely, instance_i,and a new OMIT IRI was assigned to instance_i. (2)Next, the miRNA name read from the same line wasused to retrieve its corresponding IRI from thedictionary (generated in Step One). The second triplethen connected this retrieved IRI with instance_i. (3)Two more triples were generated to connectinstance_i with the corresponding gene ID and targetscore read in from the same line, respectively.

    • Step Three: Finally, all generated RDF triples werewritten into a Turtle file.

    Note that:

    (1) “Semantic annotation and data integration”Section exhibits some example triples resulted fromthe above-mentioned annotation process.(2) Mappings between database schemas andontological entities were defined in the OMITontology and can be reused or modified in the future,when needed.(3) Due to our automated annotation and integrationtechniques, only minimum effort will be required tointegrate a new resource in the future.

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 7 of 17

    Fig. 2 Semantic annotation and data integration flowchart in the OmniSearch system

    Because all semantic tags are to be generated from theglobal metadata model defined in the OMIT ontology, theRDF triple repository will provide a unified view over orig-inal data sources at semantic level. Consequently, complexsemantic queries will be enabled. To implement semanticsearch, we made use of Apache HTTP server [39], PHP:Hypertext Preprocessor (PHP) server [40], and ApacheJena Fuseki server [41]. The overall software architec-ture is demonstrated in Fig. 3, with the following workingprotocol:

    • Query parameters are sent from the client’s browserto the Apache server through Ajax requests.

    • SPARQL queries are dynamically generated by theApache server using these query parameters, whichare then sent to the Apache Jena Fuseki server.

    • JSON objects, containing the requested information,are retrieved from the RDF triple store (installed onthe Apache Jena Fuseki server) after running thedynamically generated SPARQL queries.

    • These JSON objects are returned to the Apacheserver, which are used to generate either (1) a list of

    miRNAs and/or MeSH terms or (2) the HTMLMarkup for the search result table.

    • Finally, the Apache server sends the obtained data, oran error message if the search fails, back to theclient’s browser as a JSON object.

    User interface designThe OmniSearch is a Web-based search system that isfree and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software. As shown inFig. 4, the main components of the graphic user inter-face (GUI) are: two search criteria boxes, a search resulttable, a pagination control, a set of result viewing filters,a result download tool, and DAVID analysis functionality.More discussion on our friendly user interface design canbe found in “Search results and discussion” Section.

    Results and discussionThe significantly refactored OMIT ontologyThe updated version of the OMIT ontology containsa total of 3169 terms and 46 relations (besides a totalof 5515 is_a relations). Note that out of 46 relations

    http://omnisearch.soc.southalabama.edu/index.php/Softwarehttp://omnisearch.soc.southalabama.edu/index.php/Software

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 8 of 17

    Fig. 3 Semantic search architecture in the OmniSearch system

    mentioned here, there are 5 data properties, and therest are object properties. Also note that these termsand relations include both OMIT-specific ones and thoseimported ones1.Compared with the previous versions [12–17], impor-

    tant changes in the current version OMIT ontology aresummarized as follows.

    • As discussed earlier in “Modularized ontology design”Section, we have followed amodularized ontologydesign in this new version, which will significantlyfurther facilitate the ontology maintenance andupdate. In particular, a total of 2559 terms in theupdated OMIT have been imported from the NCROontology [32]. Because the NCRO is a comprehensivedomain ontology in the ncRNA field, following theNCRO hierarchy will enhance the interoperabilitybetween the OMIT and future ontologies to bedeveloped in other ncRNA sub-domains.

    • In the previous versions of OMIT, around 300human miRNAs were included. In the currentversion, all 1884 miRNAs appearing in humanshave been encoded, along with the information about

    the gene family group of each and every miRNA.According to miRBase [42], there are a total of 320different gene family groups. This information can behighly valuable because the fact that two or moremiRNAs of interest indeed belong to the same genefamily group can provide biologists,bioinformaticians, and clinical investigators withcritical clues in constructing new hypothesis.

    • In our previous investigations, we established adedicated project website [43], as well as entries inboth the OBO Library [44] and the NCBO BioPortal[45]. To further disseminate the ontology, and, togather feedback from community in a more effectivemanner, we have recently created a GitHub projectsite (https://github.com/OmniSearch/omit) for thisnew version OMIT ontology. We have alsoestablished a tracker [46] for an enhancedmechanism in handling the discussion amonggroups to further improve the ontology. Newconcepts, definitions, and their locations in theOMIT can now be proposed, debated, and approved(or rejected) by an open group of individuals throughthis tracker.

    https://github.com/OmniSearch/omit

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 9 of 17

    Fig. 4 GUI design in the OmniSearch system

    Semantic annotation and data integrationExperimental setupThe OmniStore RDF repository is housed on a server withthe following configuration: Intel(R) Core(TM) i7-3632QM CPU @ 2.80 GHz 2.80 GHz; 32.00 GB memory; andWindows Server 8 Operating System.

    Semantic annotation and data integration resultsOmniStore contains a total of 6,136,514 RDF triples, andthe file size of OmniStore is 369 MB. All triples are rep-resented in RDF 1.1 Turtle: Terse RDF Triple Languageformat [47], for example:

    rdfs:subClassOf

    .

    rdfs:label

    "IRF4" .

    rdf:type

    .

    .

    .

    100 .

    The semantics of the above six example triples is: IRF4(OMIT_0015037) is a subclass of the miRNA_target_geneclass (NCRO_0000025); one miRDB database record(OMIT_0995324), which is an instance of the pre-diction_from_miRDB class (OMIT_0000020), indicatesthat IRF4 is a predicted target of the miRNA hsa-miR-125b-5p (OMIT_0050688); and the prediction score(OMIT_0000108) is 100.

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 10 of 17

    Semantic searchWe use one example in this section to demonstrate indetail how the OmniSearch system assists in end users’knowledge acquisition.

    Experimental setupSemantic search was conducted on a personal computer(PC) with the following configuration: Intel(R) Core(TM)i7-3632 QM CPU @ 2.50 GHz 2.50 GHz; 16.00 GB mem-ory; and Windows 10 64-bit Operating System.

    SPARQL query statementsThe SPARQL statements to generate the miRNA andMeSH term lists in the two search boxes are as fol-lows, where the PHP variable $type is used to determinewhether the client is requesting a miRNA or MeSH term,and the PHP variable $input contains either a partial orexact miRNA or MeSH term. Note that each line of thequery statement has a detailed explanation right above it(the line starting with a pound sign “#”).

    # prefix declarations

    PREFIX rdfs:

    # result clause

    SELECT ?label

    # query pattern

    WHERE {

    # get IRI of either human_miRNA or

    MeSH_Term as parent

    ?parent rdfs:label $type .

    # get all children of parent

    ?child rdfs:subClassOf ?parent .

    # get label for each child

    ?child rdfs:label ?label .

    # filter results to only include label

    that match the user input

    FILTER REGEX(LCASE(?label), LCASE

    ($input))

    }

    # order the result by label

    ORDER BY ?label

    Suppose that the question of interest is: “What is therole of hsa-miR-125b-5p in cancer drug resistance?” TheSPARQL statements are as follows. Similarly, all querystatements have a detailed explanation.

    # prefix declarations

    PREFIX rdfs:

    PREFIX rdf:

    PREFIX obo:

    # result clause

    SELECT ?gene_symbol

    # group the gene ids together

    (GROUP_CONCAT(DISTINCT ?g_id;

    SEPARATOR=",") AS ?gene_id)

    # assign mdb_score to mirdb_score if

    bound, otherwise assign 0

    (MAX(COALESCE((?mdb_score),

    ?mdb_score,0)) AS ?mirdb_score)

    # assign ts_score to targetscan_score

    if bound, otherwise assign 0

    (MAX(COALESCE(?ts_score), ?ts_score, 0))

    AS ?targetscan_score)

    # assign absolute value of mrnd_score

    to miranda_score if bound, otherwise

    assign 0

    (MAX(COALESCE(ABS(?mrnd_score), 0)) AS

    ?miranda_score)

    # concatenate and group all pubmed ids

    together, separated by a comma

    (GROUP_CONCAT(DISTINCT ?pmid;

    SEPARATOR=",") AS ?pubmed_ids)

    # query pattern

    WHERE {

    # get microRNA IRI with label

    "hsa-miR-125b-5p"

    ?mirna rdfs:label "hsa-miR-125b-5p" .

    # get prediction that has_human_miRNA

    microRNA IRI

    ?prediction obo:OMIT_0000159 ?mirna .

    # get target where prediction has_

    miRNA_target_gene

    ?prediction obo:OMIT_0000160 ?target .

    # get gene symbol label of target

    ?target rdfs:label ?gene_symbol .

    # get target gene_id as g_id

    ?target obo:OMIT_0000109 ?g_id .

    OPTIONAL {

    # get prediction of type

    prediction_from_TargetScan

    ?prediction rdf:type obo:

    OMIT_0000019 .

    # get prediction score as ts_score

    ?prediction obo:OMIT_0000108

    ?ts_score

    }.OPTIONAL {

    # get prediction of type

    prediction_from_miRDB

    ?prediction rdf:type obo:OMIT_

    0000020 .

    # get prediction score as mdb_score

    ?prediction obo:OMIT_0000108 ?mdb_

    score

    }.

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 11 of 17

    OPTIONAL {

    # get prediction of type

    prediction_from_miRanda

    ?prediction rdf:type obo:OMIT_

    0000021 .

    # get prediction score as mrnd_

    score

    ?prediction obo:OMIT_0000108 ?mrnd_

    score

    }.

    OPTIONAL {

    # get MeSH Term IRI for "drug

    resistance"

    ?mesh_term rdfs:label "drug

    resistance" .

    ### exact match ###

    # get ’pubmed id’ associated with

    the target gene

    ?target obo:OMIT_0000151

    ?pubmed_id .

    # get ’pubmed id’ associated with

    the mesh term

    ?mesh_term obo:OMIT_0000151

    ?pubmed_id

    ### narrower match ###

    # get each successive child of mesh

    term

    #?child rdfs:subClassOf* ?mesh_

    term .

    # get ’pubmed id’ associated with

    the target gene

    #?target obo:OMIT_0000151

    ?pubmed_id .

    # get ’pubmed id’ associated with

    the child mesh term

    #?child obo:OMIT_0000151

    ?pubmed_id

    ### broader match ###

    # get each successive parent mesh

    term

    #?mesh_term rdfs:subClassOf*?parent .

    # restric parent to be a subclass

    of MeSH_Term

    #?parent rdfs:subClassOf obo:OMIT_

    0000110 .

    # get ’pubmed id’ associated with

    the target gene

    #?target obo:OMIT_0000151

    ?pubmed_id .

    # get ’pubmed id’ associated with

    the parent mesh term

    #?parent obo:OMIT_0000151

    ?pubmed_id

    }.

    }

    # group the results by gene symbol

    GROUP BY ?gene_symbol

    # order the results by mirdb score then by

    targetscan score

    ORDER BY DESC(?mirdb_score)DESC

    (?targetscan_score) DESC(?miranda_score)

    Search results and discussionCorresponding to the aforementioned question of inter-est, Fig. 5 demonstrates the search results from a queryon hsa-miR-125b-5p along with a MeSH-term filter “drugresistance”.

    • The “Candidate Targets” column contains all targetspredicted by at least one target prediction database.The user can choose a prediction database and sortall targets by the scores, in descending order, fromthe selected database.

    • The “Predicted By” column shows that each target ispredicted by which database(s), along with the Weblink(s) to these database(s).

    • The “Publications” column links to all PubMedpublications that are relevant to the search andfiltering criteria. In this example, the criteria for anyline are: the predicted target on that line, the miRNAhsa-miR-125b-5p, and the MeSH-term filter “drugresistance.”

    • The “GO Annotations” column connects to GOannotation results of each predicted target and themiRNA hsa-miR-125b-5p, respectively.

    • Pathway analysis through DAVID can be performedon selected targets, either using the checkboxes to theleft of the table or clicking the “Select All Targets”checkbox. Additionally, the user can select the desiredtool to perform such analysis, “Gene FunctionalClassification,” “Functional Annotation Clustering,”“Functional Annotation Summary,” and so forth.

    • The whole result table can be downloaded in twodifferent formats (tab-delimited text or CSV format);the user is also able to download only the predictedtargets (selected ones or all).

    We examined the search results demonstrated in thisexample, and our observations are summarized below.

    1. Effective querying and accurate search results.

    • Potential targets from all three miRNA targetprediction databases (miRDB, TargetScan, and

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 12 of 17

    Fig. 5 Search results for the question of “What is the role of hsa-miR-125b-5p in cancer drug resistance?”

    miRanda) were correctly retrieved. There were476 and 924 targets from miRDB andTargetScan, respectively; and there were 323common targets. Consequently, a total of 1077distinct targets were retrieved in the table whenthe “Predicted by Any Database” filter waschosen. Note that the miRanda database did notcontain prediction results for the miRNAhsa-miR-125b-5p; therefore, no resultsappeared in the table when the display filter wasset to “Predicted by All Databases.” In fact, thisobservation further verified the effectiveness ofthe OmniSearch system.

    • Relevant papers according to the search criteriawere successfully retrieved. For example, twopublications (PMID: 2497002 and 22808086)were retrieved for the predicted target LIN28A,supporting the conclusion that “Lin28A

    contributes to cancer drug resistance;” and threepublications (PMID: 21823019, 24643683, and19463775) were retrieved for the predictedtarget BAK1, supporting another conclusionthat “BAK1 has an important role in cancer drugresponse and drug resistance.”

    • RNA Central annotations and GO annotationswere correctly obtained. In this example query,a total of five sequences regarding the miRNAhsa-miR-125b-5p were retrieved from RNACentral annotations, and GO annotations for allpredicted targets were retrieved as well. Forexample, a total of 117 GO annotations(GO_REF:0000038, GO_REF:0000033, and soforth) were retrieved regarding a potentialtarget, BAK1.

    • Based on the above knowledge returned in theOmniSearch GUI, regarding the example

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 13 of 17

    question of “What is the role ofhsa-miR-125b-5p in cancer drug resistance?”end users obtained the following answer: It isreasonable to speculate that expression ofthe miRNA hsa-miR-125b-5p contributes tocancer drug resistance, possibly through itssuppression of expression for target genesBAK1 and/or LIN28A.

    Discussion:(1) miRDB, TargetScan, and miRanda databases havequite different meanings among each other in termsof their database entities. Due to the underlyingOMIT and the formally defined semantics in theontology, the OmniSearch system was able toeffectively integrate the prediction results from allthree databases. Note that conventional,database-oriented techniques can also implementsuch integration; however, inflexible, ad-hochard-coding will be required.(2) To retrieve a correct set of relevant papersrequires accessing numerous heterogeneous datasources such as NCBI Gene, PubMed, HGNC, andMeSH. Without the common data elements definedin the OMIT and the thereafter semantictechnologies including semantic annotation and dataintegration, it would have been extremely challengingto effectively integrate data from these sources, whichis the case in database-oriented search and querying.(3) As discussed earlier in “OMITreconstruction” Section, the OMIT is closelyconnected with the GO by importing a set of GOterms. Compared with data integration based ontraditional, relational databases, our approach hasfurther facilitated integrating data about GOannotations.

    2. More efficient querying process.• One-stop visit rather than accessing different

    data sources separately, resulting in about 60 %of time saved for end users.

    • DAVID analysis was performed in a moreefficient manner due to the target gene listautomatically generated by the system. resultingin about 50 % of time saved for end users.

    • It was easier to compare different predictionresults among miRDB, TargetScan, andmiRanda databases, resulting in about 60 % oftime saved for end users.

    • The above percentages of saved time werecalculated as follows: We asked theaforementioned domain experts to perform agiven set of queries using their conventionalmethods; next, they performed the same set ofqueries through the OmniSearch GUI; and

    finally, the saved time for all domain expertswere averaged. Greater details on the systemtime and saved time for end users are containedin Table 4.

    • Applying the MeSH-term filter resulted in amuch smaller number of relevant publicationsreturned. For example, 50 vs. 16 for the targetABCC5, 13 vs. 2 for the target DPH2, and 31 vs.3 for the target FOXQ1. More examples aredemonstrated in Table 5.

    Discussion:(1) The reduced time spent by users was due to bothdata integration and the more accurate semanticsdefined in the ontology.(2) In an non-ontology software system, to filteringon MeSH terms almost unavoidably results inhard-coding some ad-hoc searching rules in sourcecode. On the contrary, semantics-oriented systems,such as OmniSearch, can well handle this issue in amore efficient manner. By decoupling domainknowledge from source code, ontologies and softwareapplications can be developed independently, leadingto more flexible software development.(3) Based on the is_a relation, the OmniSearchsystem can perform logic reasoning over theontology concept hierarchy (that is, both broader andnarrower terms of the ontology term of interest),thus greatly improving the flexibility of search andquery capability. For example, after a MeSH term ischosen by users, they are able to search the exactMeSH term, or its broader terms (i.e., ancestorterms) and narrower terms (i.e., offspring terms)defined in the ontology. Such results would not havebeen obtained without semantic technologiesbecause systems based on relational databases arenot able to perform any logical reasoning. Of course,users can still manually perform numerous queriesand then obtain similar results as obtained from oursystem. However, such manual querying issignificantly more time-consuming andlabor-intensive, and more importantly, error-prone.(4) Cross-referencing among miRDB, TargetScan,and miRanda prediction results was made mucheasier because relevant database entities have alreadybeen formally defined in the OMIT. In other words,unambiguous semantics was accurately encoded withcommon data elements provided by the ontology,resulting in successful data sharing and exchangingamong heterogeneous data sources.(5) We asked the aforementioned domain experts toverify the accuracy of MeSH-term filtering. Becauseall returned publications contained thecorresponding MeSH term, the Precision measure

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 14 of 17

    Table 4 The system time and saved time for end users

    QueryFirst search Second search System time User time Percentage of saved Percentage of saved Percentage of savedcriterion criterion (seconds) (seconds) time for end users time on DAVID analysis time on result comparison

    1 hsa-miR-1231 cell movement 2.51 10 62 % 55 % 61 %

    2 hsa-miR-1288-5p cell proliferation 2.89 9 61 % 51 % 62 %

    3 hsa-miR-143-3p mitosis 5.54 10 61 % 52 % 60 %

    4 hsa-miR-192-5p leukemic infiltration 2.24 8 53 % 53 % 59 %

    5 hsa-miR-216a-5p drug resistance, 4.09 11 65 % 55 % 62 %

    multiple

    6 hsa-miR-29c-3p recurrence 8.99 11 68 % 53 % 63 %

    7 hsa-miR-3155a dna cleavage 1.21 6 53 % 47 % 55 %

    8 hsa-miR-320b drug resistance 17.59 18 73 % 51 % 66 %

    9 hsa-miR-3622a-5p entosis 0.30 6 51 % 43 % 57 %

    10 hsa-miR-371b-5p mitochondrial 3.89 12 66 % 59 % 64 %

    dynamics

    11 hsa-miR-3934-5p dna methylation 0.93 8 61 % 45 % 59 %

    12 hsa-miR-4263 mutagenesis 1.65 6 52 % 46 % 56 %

    13 hsa-miR-4431 mitochondrial 0.17 6 53 % 47 % 55 %

    degradation

    14 hsa-miR-4505 cell transformation, 4.25 10 63 % 55 % 61 %

    neoplastic

    15 hsa-miR-4648 cell polarity 0.71 6 52 % 45 % 57 %

    16 hsa-miR-4700-3p neoplasm regression, 1.56 7 53 % 51 % 59 %

    spontaneous

    17 hsa-miR-4756-5p endocytosis 3.76 10 67 % 53 % 62 %

    18 hsa-miR-4802-3p drug resistance, 1.67 7 55 % 47 % 59 %

    microbial

    19 hsa-miR-501-3p insulin resistance 1.78 8 57 % 43 % 61 %

    20 hsa-miR-520a-3p ubiquitination 13.31 17 75 % 55 % 65 %

    Average ————— ————— 3.95 9.30 60.05 % 50.30 % 60.15 %

    was evaluated as 100 %. As for the Recall measure, ittook a much longer time to evaluate because weneeded to identify all publications that wereincorrectly filtered out by the system. For example,there were three (one, resp.) publications relevant toCSNK2A1 (DVL3, resp.) that should not have beenfiltered out. More such examples are demonstratedin Table 6. Overall, an average Recall of 73 % wasachieved, meaning that while a user is able to obtaindesired knowledge in a much more efficient manner(by reading significantly less publications, as shown inTable 5), the potential information lost is rather low.

    3. Friendly user interface.

    • For both search boxes, a list of partiallymatching terms were presented in a drop-downbox as users typed in the box. Users were also

    allowed to not to type in anything, in which caseall terms will be presented.

    • The “Rows Per Page” drop-down and paginationcontrol helped users to easily navigate among allpredicted targets.

    • A set of display filters were designed to allowusers to conveniently and freely customize theirpreferred way to view retrieved results fromvarious facets. For example, results can besorted by the prediction score from any selectedprediction database; users can choose to viewonly results that have publication evidence, ordoes not have such evidence, or both; and soforth.

    • Flexible download options were provided, andall downloaded documents had self-explanatory,meaningful file names that contain the search

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 15 of 17

    Table 5 Reduced number of publications after applying theMeSH-term filter “drug resistance”

    Target gene Original number Number of papers Percentagesymbol of papers after MeSH filtering reduced

    ABCC5 50 16 68 %

    DPH2 13 2 85 %

    FOXQ1 31 3 90 %

    CIAPIN1 43 4 91 %

    SLC38A9 12 1 92 %

    MCL1 452 31 93 %

    MKNK2 30 2 93 %

    BAG4 32 2 94 %

    ARID3B 18 1 94 %

    HSPB2 79 4 95 %

    THEMIS2 20 1 95 %

    BAK1 266 11 96 %

    SULT4A1 27 1 96 %

    FUT4 57 2 96 %

    GPC6 29 1 97 %

    DDX54 29 1 97 %

    MBD1 58 2 97 %

    PRDM1 118 4 97 %

    DTNB 30 1 97 %

    LIN28A 91 3 97 %

    SIRT7 33 1 97 %

    ZBTB7A 67 2 97 %

    NCOR2 240 7 97 %

    TTPA 35 1 97 %

    MAP3K10 35 1 97 %

    SGPL1 36 1 97 %

    MYO18A 36 1 97 %

    EIF4EBP1 217 6 97 %

    LIMK1 109 3 97 %

    TP53INP1 37 1 97 %

    CYTH1 39 1 97 %

    SLC7A1 41 1 98 %

    date, “Query_Results_for_hsa-miR-125b-5p-2015-12-05.csv” and“Target_List_for_hsa-miR-125b-5p-2015-12-05.txt” for example.

    ConclusionsAs a special class of ncRNAs, miRNAs have been demon-strated to play important roles in various biological andpathological processes. Because miRNAs realize theirfunctions by regulating respective targets, it is critical to

    Table 6 An example set of publications correctly/incorrectlyfiltered by “drug resistance”

    Gene symbol Total number ofpublicationswithout applyingthe “drug resistance”filter

    Total numberof publicationsthat contain theMeSH term “drugresistance”

    Total number ofincorrectly filteredpublications

    IRF4 130 3 0

    ARID3B 18 1 0

    SGPL1 36 1 0

    ESRRA 131 3 0

    PAFAH1B1 129 1 0

    ETS1 287 5 0

    TTPA 35 1 0

    DVL3 60 1 1

    THEMIS2 20 1 0

    VTCN1 66 1 0

    WDR5 128 1 0

    ETV6 198 4 0

    TAZ 74 1 0

    IL6R 300 1 0

    DPH2 13 2 0

    BTG2 84 1 0

    CYP24A1 146 2 0

    LIN28A 91 3 0

    TRPS1 69 1 0

    CSNK2A1 619 5 3

    TP53INP1 37 1 0

    GPC6 29 1 0

    DICER1 291 3 0

    identify and analyze miRNA-target interaction data tobetter explore and delineate miRNA functions. Semantictechnologies and domain ontologies have been utilized toovercome limitations of conventional miRNA knowledgeacquisitionmethods. To this end, we followed the researchdirection identified in our previous investigations regard-ing the establishment of common data elements and dataexchange standards in the miRNA research. Specifically,our major scientific contributions in this paper are:

    • We have significantly improved the OMIT ontologyby: (1) following a modularized ontology design; (2)encoding all 1884 human miRNAs; and (3) setting upa GitHub project site along with an issue tracker formore effective community collaboration on theontology development. The up-to-date ontology file isaccessible at: http://purl.obolibrary.org/obo/omit.owl.

    • Based upon the OMIT, we built the OmniSearchsemantic search system, accessible at: http://

    http://purl.obolibrary.org/obo/omit.owlhttp://omnisearch.soc.southalabama.edu/index.php/Software

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 16 of 17

    omnisearch.soc.southalabama.edu/index.php/Software. Our experimental results demonstratedpromising performance of OmniSearch.Consequently, more effective, more efficientmiRNA-related knowledge capture has beenachieved.

    Finally, some research directions are envisioned as fol-lows for our future work.(1) To investigate a new set of filters to perform a wider

    scope of ontology reasoning. For example, potential filterscan be developed according to different miRNA cate-gories such as: oncogenic or tumor-suppressive miRNAs;individual tissues and/or cell lines in which miRNAs areexpressed; and the gene family group to which miRNAsbelong.(2) To verify the consistency of contents retrieved

    from different data resources is another important futureresearch topic. It is not trivial to resolve conflicting factsamong different sources.(3) It would be terrific for users to have more flexible

    options in further exploiting the semantics of the domain.Note that to construct more flexible queries will involvenatural language processing (NLP) techniques, which arebeyond the scope of this paper. Nevertheless, such aninteresting topic can be considered in the future.

    Endnote1There are 103 and 18 OMIT-specific terms and

    relations, respectively.

    Competing interestsThe authors declare that they have no competing interests.

    Authors’ contributionsAll authors performed requirements analysis. HJS contributed to GUIdevelopment. JH, AR, YL, HJS, and FG contributed to ontology development,term definition, and annotation examples. All authors read and approved thefinal manuscript.

    AcknowledgementsFunding for Huang, J was provided in part by the National Cancer Institute(NCI) at the National Institutes of Health (NIH), under the Award NumberU01CA180982. Funding for Borchert, GM was provided in part by NaturalScience Foundation (NSF) CAREER grant 1350064 (GMB) awarded by Divisionof Molecular and Cellular Biosciences (with co-funding provided by the NSFEPSCoR program) and in part by the Abraham A. Mitchell Cancer ResearchFund. The views contained in this paper are solely the responsibility of theauthors and do not represent the official views, either expressed or implied, ofthe NIH, NSF, the U.S. Government, or the Abraham A. Mitchell CancerResearch Fund.

    Author details1School of Computing, University of South Alabama, Mobile, Alabama36688-0002, USA. 2Computer and Information Science Department, Universityof Oregon, Eugene, Oregon 97403-1202, USA. 3Miracle Query, Inc., Eugene,Oregon 97403-1202, USA. 4Department of Philosophy, University at Buffalo,Buffalo, New York 14260-4150, USA. 5Genome Informatics, The JacksonLaboratory, Bar Harbor, Maine 04609-1523, USA. 6Department of BiomedicalInformatics, University of Utah, Salt Lake City, Utah 84112-5775, USA.7Department of Biochemistry and Molecular and Cellular Biology, GeorgetownUniversity Medical Center, Washington D.C. 20007-1485, USA. 8Center for

    Computational Science, University of Miami, Miami, Florida 33146-2960, U.S.A.9Department of Microbiology and Immunology, First Affiliated Hospital,Kunming Medical University, Kunming, Yunnan 650032, China. 10Departmentof Radiation Oncology, Washington University School of Medicine, St. Louis,Missouri 63110-0001, USA. 11Mitchell Cancer Institute, University of SouthAlabama, Mobile, Alabama 36604-1405, USA. 12Department of Biology,University of South Alabama, Mobile, Alabama 36688-0002, USA. 13School ofDental Medicine, University at Buffalo, Buffalo, New York 14214-8006, USA.

    Received: 15 December 2015 Accepted: 12 April 2016

    References1. Zhao YH, Zhou M, Liu H, Khong HT, Yu DH, Fodstad O, Tan M.

    Upregulation of lactate dehydrogenase-A by ErbB2 through heat shockfactor 1 promotes breast cancer cell glycolysis and growth. Oncogene.2009;28(42):3689–701.

    2. Liu Z, Liu H, Desai S, Schmitt D, Zhou M, Khong HT, Klos KS, McClellanS, Fodstad O, Tan M. MiR-125b functions as a key mediator forsnail-induced stem cell propagation and chemoresistance. J Biol Chem.2013;288(6):4334–4345.

    3. Lu Z. PubMed and beyond: a survey of web tools for searchingbiomedical literature. Database. 2011;2011:1–13.

    4. miRDB . [Online]. Available: http://mirdb.org/miRDB/. Accessed 19 Mar2016.

    5. TargetScan. [Online]. Available: http://www.targetscan.org. Accessed 19Mar 2016.

    6. miRanda. [Online]. Available: http://www.microrna.org. Accessed 19 Mar2016.

    7. DAVID Bioinformatics Resources. [Online]. Available: https://david.ncifcrf.gov/home.jsp. Accessed 19 Mar 2016.

    8. NCBI Gene. [Online]. Available: http://ncbi.nlm.nih.gov/gene. Accessed 19Mar 2016.

    9. Medical Subject Headings Database. [Online]. Available: https://www.nlm.nih.gov/mesh/. Accessed 19 Mar 2016.

    10. HUGO Gene Nomenclature Committee (HGNC) Database. [Online].Available: http://www.genenames.org/. Accessed 19 Mar 2016.

    11. NCBI Nucleotide. [Online]. Available: ncbi.nlm.nih.gov/nucleotide/.12. Huang J, Tan M, Dou D, He L, Townsend C, Rudnick R, Hayes P. MiRNA

    Ontology for Target Prediction in Human Cancer. In: Proc. 1st ACMInternational Conference on Bioinformatics and Computational Biology,ACM-BCB-2010. Niagara Falls, NY: ACM Press; 2010.

    13. Townsend C, Huang J, Dou D, Dalvi S, Hayes P, He L, Lin W, Liu H,Rudnick R, Shah H, Sun H, Wang X, Tan M. OMIT: Domain Ontology andKnowledge Acquisition in MicroRNA Target Prediction. In: Proc. 9th Intl’Conference on Ontologies, DataBases, and Applications of Semantics,ODBASE-2010. Crete, Greece: Springer-Verlag; 2010.

    14. Huang J, Townsend C, Dou D, Liu H, Tan M. OMIT: a domain-specificknowledge base for MicroRNA target prediction. Pharm Res. 2011;28(12):3101–4.

    15. Huang J, Dang J, Lu X, Dou D, Blake J, Gerthoffer W, Tan M. AnOntology-Based MicroRNA Knowledge Sharing and AcquisitionFramework. In: Proc. BHI Workshop at 2012 IEEE International Conferenceon Bioinformatics and Biomedicine, BIBM-2012. Philadelphia, PA: IEEEComputer Society Press; 2012.

    16. Huang J, Dang J, Lu X, Xiong M, Gerthoffer W, Tan M. Semi-AutomatedmicroRNA Ontology Development based on Artificial Neural Networks. In:Proc. 2013 IEEE International Conference on Bioinformatics andBiomedicine, (BIBM-2013). Shanghai, China: IEEE Computer Society Press;2013.

    17. Huang J, Dang J, Borchert GM, Eilbeck K, Zhang H, Xiong M, Jiang W,Wu H, Blake JA, Natale DA, Tan M. OMIT: Dynamic, Semi-AutomatedOntology Development for the microRNA Domain. PLOS ONE. 2014;9(7):1–16.

    18. OBO Library. [Online]. Available: http://obofoundry.org. Accessed 19 Mar2016.

    19. NCBO BioPortal. [Online]. Available: https://bioportal.bioontology.org/.Accessed 19 Mar 2016.

    20. Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A,Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A,Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G. Gene

    http://omnisearch.soc.southalabama.edu/index.php/Softwarehttp://omnisearch.soc.southalabama.edu/index.php/Softwarehttp://mirdb.org/miRDB/http://www.targetscan.orghttp://www.microrna.orghttps://david.ncifcrf.gov/home.jsphttps://david.ncifcrf.gov/home.jsphttp://ncbi.nlm.nih.gov/genehttps://www.nlm.nih.gov/mesh/https://www.nlm.nih.gov/mesh/http://www.genenames.org/ncbi.nlm.nih.gov/nucleotide/http://obofoundry.orghttps://bioportal.bioontology.org/

  • Huang et al. Journal of Biomedical Semantics (2016) 7:25 Page 17 of 17

    ontology: tool for the unification of biology. The gene ontologyconsortium. Nat Genet. 2000;25(1):25–9.

    21. Eilbeck K, Lewis S, Mungall C, Yandell M, Stein L, Durbin R, AshburnerM. The Sequence Ontology: a tool for the unification of genomeannotations. Genome Biol. 2005;6(5).

    22. Natale D, Arighi C, Barker W, Blake J, Bult C, Caudy M, Drabkin H,D’Eustachio P, Evsikov A, Huang H, Nchoutmboube J, Roberts N, SmithB, Zhang J, Wu C. The Protein Ontology: a structured representation ofprotein forms and complexes. Nucleic Acids Res. 2011;39:D539—45.

    23. Hoehndorf R, Batchelor C, Bittner T, Dumontier M, Eilbeck K, Knight R,Mungall C, Richardson J, Stombaugh J, Westhof E, Zirbel C, Leontis N.The RNA Ontology (RNAO): An ontology for integrating RNA sequenceand structure data. Appl Ontol. 2011;6(1):53–89.

    24. Tran T, Cimiano P, Rudolph S, Studer R. Ontology-based interpretationof keywords for semantic search In: Aberer K, Choi K-S, Noy N, AllemangD, Lee K-I, Nixon L, Golbeck J, Mika P, Maynard D, Mizoguchi R,Schreiber G, Cudr-Mauroux P, editors. The Semantic Web. Berlin,Germany: Springer Berlin Heidelberg; 2007 vol. 4825. p. 523–36.

    25. Premerlani WJ, Blaha MR. An approach for reverse engineering ofrelational databases. Commun ACM J. 1994;37(5):42–49, 134.

    26. Stojanovic L, Stojanovic N, Volz R. Migrating data-intensive web sites intothe Semantic Web. In: Proc. ACM symposium on Applied computing.Madrid, Spain: ACM Press; 2002. p. 1100–7.

    27. Verheyden P, Bo JD, Meersman R. Semantically Unlocking DatabaseContent Through Ontology-Based Mediation. Proc. SWDB 2004. Berlin,Germany: Springer-Verlag; 2004, pp. 109–26.

    28. Lubyte L, Tessaris S. Extracting Ontologies from Relational Databases.Proc. Description Logics. Brixen-Bressanone, Italy: Free University ofBozen-Bolzano; 2007, pp. 122–6.

    29. Chauhan R, Goudar R, Sharma R, Chauhan A. Domain ontology basedsemantic search for efficient information retrieval through automaticquery expansion. In: Proc. Intelligent Systems and Signal Processing(ISSP), 2013 International Conference on. Vallabh Vidyanagar, Anand,India: IEEE Press; 2013. p. 397–402.

    30. BFO. [Online]. Available: http://www.ifomis.org/bfo/. Accessed 19 Mar2016.

    31. Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C,Neuhaus F, Rector A, Rosse C. Relations in biomedical ontologies.Genome Biol. 2005;6(5):1–15.

    32. Huang J, Eilbeck K, Blake J, Dou D, Natale D, Ruttenberg A, Smith B,Zimmermann M, Jiang G, Lin Y, Wu B, He Y, Zhang S, Wang X, ZhangH, Liu Z, Tan M. A domain ontology for the non-coding rna field. In: Proc.2015 IEEE International Conference on Bioinformatics and Biomedicine,BIBM 2015. Washington: IEEE; 2015. p. 621–4.

    33. OBI. [Online]. Available: http://obi-ontology.org/. Accessed 19 Mar 2016.34. CHEBI. [Online]. Available: http://obofoundry.org/ontology/chebi.html.

    Accessed 19 Mar 2016.35. IAO. [Online]. Available: http://obofoundry.org/ontology/iao.html.

    Accessed 19 Mar 2016.36. Cell Line Ontology. [Online]. Available: http://obofoundry.org/ontology/

    clo.html. Accessed 19 Mar 2016.37. Uberon multi-species anatomy ontology. [Online]. Available: http://

    obofoundry.org/ontology/uberon.html. Accessed 19 Mar 2016.38. Human Disease Ontology. [Online]. Available: http://obofoundry.org/

    ontology/doid.html. Accessed 19 Mar 2016.39. The Apache Software Foundation. [Online]. Available: http://www.

    apache.org/. Accessed 19 Mar 2016.40. PHP: Hypertext Preprocessor. [Online]. Available: http://php.net/.

    Accessed 19 Mar 2016.41. Apache Jena Fuseki. [Online]. Available: http://jena.apache.org/

    documentation/fuseki2/index.html. Accessed 19 Mar 2016.42. miRBase. [Online]. Available: http://www.mirbase.org/. Accessed 19 Mar

    2016.43. OMIT Project Site. [Online]. Available: http://omnisearch.soc.

    southalabama.edu. Accessed 19 Mar 2016.44. OMIT in OBO Library. [Online]. Available: http://www.obofoundry.org/cgi-

    bin/detail.cgi?id=omit. Accessed 19 Mar 2016.

    45. OMIT in NCBO BioPortal. [Online]. Available: http://bioportal.bioontology.org/ontologies/OMIT. Accessed 19 Mar 2016.

    46. OMIT Tracker. [Online]. Available: https://github.com/OmniSearch/OMIT-ontology-files/issues. Accessed 19 Mar 2016.

    47. RDF 1.1 Turtle: Terse RDF Triple Language. [Online]. Available: http://www.w3.org/TR/turtle/. Accessed 19 Mar 2016.

    • We accept pre-submission inquiries • Our selector tool helps you to find the most relevant journal• We provide round the clock customer support • Convenient online submission• Thorough peer review• Inclusion in PubMed and all major indexing services • Maximum visibility for your research

    Submit your manuscript atwww.biomedcentral.com/submit

    Submit your next manuscript to BioMed Central and we will help you at every step:

    http://www.ifomis.org/bfo/http://obi-ontology.org/http://obofoundry.org/ontology/chebi.htmlhttp://obofoundry.org/ontology/iao.htmlhttp://obofoundry.org/ontology/clo.htmlhttp://obofoundry.org/ontology/clo.htmlhttp://obofoundry.org/ontology/uberon.htmlhttp://obofoundry.org/ontology/uberon.htmlhttp://obofoundry.org/ontology/doid.htmlhttp://obofoundry.org/ontology/doid.htmlhttp://www.apache.org/http://www.apache.org/http://php.net/http://jena.apache.org/documentation/fuseki2/index.htmlhttp://jena.apache.org/documentation/fuseki2/index.htmlhttp://www.mirbase.org/http://omnisearch.soc.southalabama.eduhttp://omnisearch.soc.southalabama.eduhttp://www.obofoundry.org/cgi-bin/detail.cgi?id=omithttp://www.obofoundry.org/cgi-bin/detail.cgi?id=omithttp://bioportal.bioontology.org/ontologies/OMIThttp://bioportal.bioontology.org/ontologies/OMIThttps://github.com/OmniSearch/OMIT-ontology-files/issueshttps://github.com/OmniSearch/OMIT-ontology-files/issueshttp://www.w3.org/TR/turtle/http://www.w3.org/TR/turtle/

    AbstractKeywords

    IntroductionRelated workRelated work in biomedical ontologiesRelated work in semantic search

    OMIT reconstructionModularized ontology designOntology core design

    OmniSearch: an OMIT-based semantic search systemData sources usedSoftware architectureUser interface design

    Results and discussionThe significantly refactored OMIT ontologySemantic annotation and data integrationExperimental setupSemantic annotation and data integration results

    Semantic searchExperimental setupSPARQL query statementsSearch results and discussion

    ConclusionsEndnoteCompeting interestsAuthors' contributionsAcknowledgementsAuthor detailsReferences