Controller
View (web)
Model
THE EUPATHDB / GUS-WDK SEARCH STRATEGY SYSTEM
Cristina Aurrecoechea1, Brian P. Brunk2, Steve Fischer2, Xin Gao2, Omar S. Harb2, Mark Heiges1, Jessica C. Kissinger1, Eileen T. Kraemer1, Cary Pennington1, David S. Roos2, Chris Ross1, Christian J. Stoeckert2 & Charles Treatman2
1Univ. Georgia, Athens GA, & 2Univ. Pennsylvania, Philadelphia PA
User perspectives on Strategies
• Computer-human interaction (CHI) studies during prototyping drove the design, and showed high user enthusiasm.
• Usage stats show 3-fold increase in use of Booleans in two months since release.
• User feedback very positive.
WDK Implementation
• Runs on any relational database schema
• Model: configured by you in XML.
  • Abstracts DB to high level Records (Genes, ORFs, etc)
  • Also specifies queries and returned columns
  • Automated sanity testing
  • Can talk to processes (BLAST) via a WS Framework
• View: Tomcat, JSP, tag library, JavaScript, Ajax, CSS
  • You embed JSP tags in your site and style them w/ CSS
• Controller: Struts
WDK Upcoming features
• Add genes to a “basket” to generate a report, add to a strategy as a step or send to a tool (e.g., multiple sequence alignment)
• Web services access to queries
• Assign weights to results from individual steps for improved filtering
• Transform a set of one type into another type based on genome span relations
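The poster does not say how step weights will be combined, so the following is only one possible reading, sketched under stated assumptions: each step contributes its weight to every record it returns, and records are kept when their summed score reaches a threshold. The function name `weighted_filter`, the `(weight, ids)` input shape, and the gene IDs are all hypothetical and are not part of the WDK.

```python
from collections import defaultdict


def weighted_filter(step_results, min_score):
    """Sum the weight of every step a record appears in; keep records scoring >= min_score.

    step_results: list of (weight, set_of_record_ids) pairs (hypothetical shape).
    Returns a list of (record_id, score), highest score first.
    """
    scores = defaultdict(float)
    for weight, ids in step_results:
        for record_id in ids:
            scores[record_id] += weight
    kept = {rid: score for rid, score in scores.items() if score >= min_score}
    # Sort by descending score; break ties by ID so the order is stable.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


# Toy data: three steps with made-up weights and gene IDs.
ranked = weighted_filter(
    [(2.0, {"gene_a", "gene_b"}), (1.0, {"gene_b", "gene_c"}), (0.5, {"gene_c"})],
    min_score=1.5,
)
```

Under this reading, a record found by several steps ranks above one found by a single step, which is one way weights could sharpen filtering.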
The EuPathDB suite of genome database web sites recently introduced a graphical search interface that motivates users to undertake dynamic computational experiments, exploring relationships across datasets to identify biologically meaningful genes and other entities. For example, users seeking novel therapeutic targets may wish to prioritize putative enzymes that distinguish pathogens from their hosts, and are expressed during appropriate developmental stages. Strategies are initiated by running one of 80+ queries, and extended by adding additional searches, linked via Boolean operators represented graphically as Venn diagrams. Sub-strategies allow modular construction and tree structures, and searches may be extended using filters (e.g. by strain or species) and transforms (e.g. orthologs). A graphical display makes the overall logic obvious, and facilitates revision of individual steps, with changes propagated forward through the strategy. Users may name and save their strategies, creating protocols that can be shared with colleagues. (See, e.g., http://plasmodb.org/plasmo/im.do?s=2aa0454db6a6cca0.)
The strategy system has been subjected to extensive usability studies and deployed on all EuPathDB databases (CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB). Although these sites have offered text-based Boolean operations for many years, usability analysis indicated that most users were not taking full advantage of that feature. Following release of the graphical Search Strategy system, the number of searches per visit dramatically increased. Response from our user community has been extremely positive, as investigators have discovered the power of combining datasets and making dynamic adjustments to define optimal parameters and highlight biologically relevant relationships. With the accelerating growth in diversity and scale of available datasets, the potential for exploiting interrelationships increases dramatically, and we expect this interface to have a significant impact in bringing “genomic thinking” to a broad audience.
This system was developed using the GUS Web Development Kit (WDK), a schema-independent middleware system for generating genomics websites.
The EuPathDB suite of databases covers genomic and functional genomics datasets for a variety of eukaryotic pathogens.
Shown here is PlasmoDB, which covers the genus Plasmodium, including P. falciparum, a malaria parasite.
Use Case: Use data in PlasmoDB to find parasite (Plasmodium) drug target genes
This panel shows a schematic of a strategy, using queries and booleans. The actual strategy is built below.
Transferases (E.C.)
[union] Kinase activity (GO)
[intersect] ---------------------------------------------------------------------------
[intersect] present in Haemosporida, not Mammals
[intersect] not under diversifying selection (SNPs)
[transform] orthology to any Plasmodium genes
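The schematic can be read as a left-to-right fold over sets of gene IDs. A minimal sketch of that reading follows, assuming each query returns a set and a transform is a function from one set to another. The function names, gene IDs, and orthology map are invented for illustration, and the step drawn only as a dashed line is represented by a placeholder set because its content is not given here.

```python
def run_strategy(first_result, steps):
    """Apply (operator, operand) steps to the first result, left to right."""
    result = set(first_result)
    for operator, operand in steps:
        if operator == "union":
            result |= set(operand)
        elif operator == "intersect":
            result &= set(operand)
        elif operator == "transform":
            result = operand(result)  # operand is a function: set -> set
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# Hypothetical orthology map: gene ID -> ortholog IDs in other species.
ORTHOLOGS = {"g3": {"o3a", "o3b"}, "g4": {"o4"}}


def to_orthologs(gene_ids):
    """Replace each gene with its orthologs; genes with none drop out."""
    found = set()
    for gene_id in gene_ids:
        found |= ORTHOLOGS.get(gene_id, set())
    return found


final = run_strategy(
    {"g1", "g2", "g3"},                          # Transferases (E.C.)
    [
        ("union", {"g3", "g4"}),                 # Kinase activity (GO)
        ("intersect", {"g1", "g3", "g4", "g5"}), # placeholder for the dashed step
        ("intersect", {"g1", "g3", "g4"}),       # present in Haemosporida, not Mammals
        ("intersect", {"g3", "g4", "g6"}),       # not under diversifying selection (SNPs)
        ("transform", to_orthologs),             # orthology to any Plasmodium genes
    ],
)
```

With this toy data the union widens the candidate list, each intersect narrows it, and the final transform swaps the surviving genes for their orthologs.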
Build a Strategy
1 Run a query (choose from menu)
2 Add a step (another query)
3 Add more steps…
4 Revise steps at any time… Changes propagate forward.
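One simple way to get forward propagation is to store the steps, not their results, and recompute every cumulative result after a revision. The sketch below illustrates that idea only; it is not the WDK's implementation, and the `Strategy` class, its method names, and the gene IDs are hypothetical.

```python
class Strategy:
    """Ordered steps; each step is (operator, query), where query() returns a set of IDs."""

    def __init__(self, first_query):
        self.steps = [(None, first_query)]  # the first step has no operator

    def add_step(self, operator, query):
        self.steps.append((operator, query))

    def revise_step(self, index, query):
        """Swap the query of one step, keeping its operator."""
        operator, _ = self.steps[index]
        self.steps[index] = (operator, query)

    def results(self):
        """Return the cumulative result after every step, recomputed from step 1."""
        cumulative = []
        current = set()
        for operator, query in self.steps:
            ids = set(query())
            if operator is None:
                current = ids
            elif operator == "union":
                current = current | ids
            elif operator == "intersect":
                current = current & ids
            else:
                raise ValueError(f"unknown operator: {operator}")
            cumulative.append(current)
        return cumulative


strategy = Strategy(lambda: {"g1", "g2"})
strategy.add_step("union", lambda: {"g3"})
strategy.add_step("intersect", lambda: {"g2", "g3", "g9"})
before = strategy.results()

strategy.revise_step(0, lambda: {"g9"})  # revise the first step
after = strategy.results()               # every later result reflects the change
```

Recomputing from the first step is the simplest choice; a real system could instead cache results and invalidate only the steps downstream of the revision.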
A strategy can integrate data from genome annotation, expression, SNPs, proteomics, etc.
Nest strategies to add complexity.
View results from all or any species.
Use orthology to transform results to other species.
Download customized reports of results.
Choose from many available columns. Sort and move columns.
Dynamically revise, add or delete steps.
Email a strategy link to colleagues.
It’s Easy to Build a Strategy…
Genomics Database
WDK Engine Query Cache
Genomics Data Denormalized For Query Speed
Genomics Data
User Login and Search History
WDK Model (Java Objects)
WDK Model (XML)
WDK Query Engine (Java)
Web Services Framework
JavaBeans (JSP compatible)
JSP Tag Library
Struts controller
WDK Sanity Test
…Strategies are Powerful
Save and browse strategies.
Challenge: exploit the power of integrated genome annotation, expression data, proteomics data, SNPs, etc.
Solution: Strategies… A Graphical Query Interface for Genomics Databases
# Nested Strategy
P.f. transcript expr. at 24 hours +/- 8
[union] P.f. transcript expr. in Trophozoites
[union] P.f. protein expr. in Trophozoites
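Because a nested strategy stands in for a single step of its parent, the whole structure can be treated as a tree and evaluated recursively. The following is a minimal sketch under that assumption; the list-based tree encoding, the `evaluate` function, the gene IDs, and the outer strategy are hypothetical.

```python
def evaluate(node):
    """Evaluate a strategy tree.

    A node is either a set of IDs (a query result) or a list of the form
    [first_node, (operator, node), (operator, node), ...] (a strategy).
    """
    if isinstance(node, (set, frozenset)):
        return set(node)
    first, *steps = node
    result = evaluate(first)
    for operator, child in steps:
        child_ids = evaluate(child)  # a child may itself be a nested strategy
        if operator == "union":
            result |= child_ids
        elif operator == "intersect":
            result &= child_ids
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# The nested strategy above, with made-up gene IDs for each query result.
nested = [
    {"g1"},                   # P.f. transcript expr. at 24 hours +/- 8
    ("union", {"g2"}),        # P.f. transcript expr. in Trophozoites
    ("union", {"g2", "g3"}),  # P.f. protein expr. in Trophozoites
]

# A hypothetical outer strategy that uses the nested one as a single step.
outer = [{"g1", "g2", "g4"}, ("intersect", nested)]
result = evaluate(outer)
```

The outer strategy only ever sees the nested strategy's final set, which is what makes sub-strategies modular.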
JSP and CSS
= You provide
= WDK provides
= Optional
Different types of strategies: Genes, Isolates, SNPs, Transcript assemblies, Chromosomes, Array Elements, ORFs, etc.
Strategies Web Dev Kit (WDK) www.gusdb.org/wdk
EuPathDB is an NIAID Bioinformatics Resource Center Supported by NIAID Contract No. HHSN266200400037C and The Bill & Melinda Gates Foundation
Processes (e.g., BLAST)