36
DecisionSite and DiscoveryLink Doug Del Prete IBM Life Sciences [email protected] Visualizing data in novel ways Spotfire User Conference Boston, MA October 28, 2003

DecisionSite and DiscoveryLink Doug Del Prete IBM Life Sciences [email protected] Visualizing data in novel ways Spotfire User Conference Boston, MA October

Embed Size (px)

Citation preview

DecisionSite and DiscoveryLink

Doug Del Prete

IBM Life Sciences

[email protected]

Visualizing data in novel ways

Spotfire User ConferenceBoston, MAOctober 28, 2003

2DecisionSite and DiscoveryLink

What is DiscoveryLink?

DiscoveryLink (DL) is a powerful technology available from IBM that will allow you to view many data sources – even non-relational ones like BLAST and HMMER – as one heterogeneous “virtual relational database”

Basically, a wide variety of data sources are all made to look like SQL-based tables/views, which makes it easy to access them and integrate all of this data together in an optimized way

3DecisionSite and DiscoveryLink

What is DiscoveryLink?

DiscoveryLink is based on the latest Information Integrator middleware technology, now a major IBM software initiative for data federation/integration

All data sources essentially become “SQL aware”, and do so under a cost-based optimizer that works against both relational and non-relational data sources and their associated queries

4DecisionSite and DiscoveryLink

Information Integration: Issues

For Scientists

Data Layer

Application Layer

Web Servers

Application

Multiple, varied sources Multiple queriesData freshness Legacy Databases

Without a Data Integration Layer

DB2DB2

OracleOracle

SQLSQLServerServer

ASCIIASCIIdata filedata file

Excel Excel SpreadsheetSpreadsheet

ApplicationApplication

ApplicationApplication

For IT

5DecisionSite and DiscoveryLink

DiscoveryLinkDiscoveryLink

Web Servers

DB2DB2

OracleOracle

SQLSQLServerServer

ASCIIASCIIdata filedata file

Excel Excel SpreadsheetSpreadsheet

™™

Multiple, varied sources Multiple queriesData freshness Legacy Databases

Application

Application

ApplicationApplication

Application

Information Integration: Solution

For Scientists

Toxicology Toxicology DataData

Proteomic Proteomic DataData

Compound Compound DataData

Genomic Genomic DataData

Textual Textual DataData

LegacyLegacyDataData

Gene Gene Expression Expression

DataData

Other Other DataData

SourcesSourcesDiscoveryLink

For IT

6DecisionSite and DiscoveryLink

Toxicology Toxicology DataData

Proteomic Proteomic DataData

Compound Compound DataData

Genomic Genomic DataData

Textual Textual DataData

LegacyLegacyDataData

Gene Gene Expression Expression

DataData

Other Other DataData

SourcesSourcesDiscoveryLink

Information Integration: Question

Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.

7DecisionSite and DiscoveryLink

Toxicology Toxicology DataData

Proteomic Proteomic DataData

Compound Compound DataData

Genomic Genomic DataData

Textual Textual DataData

LegacyLegacyDataData

Gene Gene Expression Expression

DataData

Other Other DataData

SourcesSourcesDiscoveryLink

Term Operator Value

Compound SimilarTo Ketanserin

Receptor Homologous Serotonin

IC50 <= 1E-8

Molwt > 375

Molwt < 425

logP > 4

logP < 6

BLAST Wrapper XML Wrapper Oracle Wrapper ODBC Wrapper

BLASTData Source

Oracle Compound DB

in Germany

Assay Results in

MySQL

Discovery Link

Result Set

(Visualization)

ParametersIIM

Query

Solution

XML Document

Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.

Architecture

Information Integration: IT Translation

8DecisionSite and DiscoveryLink

Toxicology Toxicology DataData

Proteomic Proteomic DataData

Compound Compound DataData

Genomic Genomic DataData

Textual Textual DataData

LegacyLegacyDataData

Gene Gene Expression Expression

DataData

Other Other DataData

SourcesSourcesDiscoveryLink

Solution

Architecture

SELECT a.compound_id, b.ic50, b.screen_nameFROM CMPNDDBS a, ACTIVITY_DATA b, BLASTP c WHERE a.compound_id = b.compound_id AND SimilarTo(a.compound_struct,:KETANSERIN_MOL) > 0.88 AND c.input_seq = :SEROTONIN AND c.protein_id = b.screen_name AND b.ic50 <= 0.000000001 AND a.mol_wt BETWEEN 375 AND 425 AND a.logP BETWEEN 4 AND 5

DL Query

BLAST Wrapper XML Wrapper Oracle Wrapper ODBC Wrapper

BLASTData Source

Oracle Coumpound DB in

Germany

Assay Results in

MySQL

Discovery Link

Result Set

(Visulaization)

IIMQuery

XMLDocument

Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.

Information Integration: IT Translation

9DecisionSite and DiscoveryLink

Q: Show me all the compounds similar to ketanserin that have been tested against members of the serotonin family and have characteristics of a good drug.

Results

Information Integration: Final Result

Compnd ID HTR1A US1111

HTR1B US1234

HTR1D US2534

HTR1E US 1111

HTR1F US2534

HTR2A US 1111

HTR2B US4791

HTR2C US 1111

HTR4 US1234

HTR5A US1111

HTR6 US1111

US-345123 0.001 0.5 [email protected] 2.1 3.8 1.1 0.001 53.5@5 0.01 0.02 [email protected]

UK-567345 <0.003 >5.0 [email protected] <6.0 8.8 5.5 5.9 16@5 5.6 4.3 [email protected]

US-234012 0.0025 >5.7 23@10 <6.0 8.9 5.4 7.0 15@5 4.8 19.0 [email protected]

US-321543 0.05 2.0 [email protected] 8.9 0.0 6.7 10.0 48@5 3.33 2.6 [email protected]

10DecisionSite and DiscoveryLink

Wrapper instance definition

RDB, Spreadsheet, Flat Files, Algorithms, etc. In diverse locations

<XML>text

</XML>

RqRq Rq

rsrsrs

Rq rs

Spotfire DecisionSite, Synapsia, Customer Application, SQL command line, etc.

(JDBC/ODBC)

Information IntegratorInformation Integrator

Views

Administration throughInformation Integrator

Control center

CatalogDB2

Dataloader

SwissProtKEGGdbESTLocus LinkUNIGENEand more …

WrappersWrappers

•DB2•Oracle

•Oracle Cartridge•MS SQL Server•Sybase•Informix•Teradata•ODBC (MySQL, Postgres…)•Excel•Flat Files in CSV format•Documentum•Blast•XML•ENTREZ (NCBI portal)•HMMer•Extended Search•BioRS

Wrapper Development Toolkit

Server Definition

User Mapping

Nicknames Definition

Optimizer Log

Wrappers plan

DiscoveryLink – Overall Architecture

11DecisionSite and DiscoveryLink

DiscoveryLink: A Robust Solution

• Access to multiple, heterogeneous sources

• Complex queries across distributed data sources

• Leverage existing IT infrastructure and use specialized functions of existing databases

• Integrating analysis tools and business intelligence

• Can put a SQL front-end and user security on data sources such as BLAST, Pubmed, Genbank, HMMER, XML

• Can use for fast and easy ad-hoc extensions to a data warehouse/mart

Benefits

12DecisionSite and DiscoveryLink

DiscoveryLink Value Proposition

A proven scalable data integration solution that enables efficient and effective queries across disparate data sources, thereby improving R&D efficiencies and productivity.

This translates into greater flexibility and competitive advantage in the marketplace.

13DecisionSite and DiscoveryLink

DecisionSite and DiscoveryLink

14DecisionSite and DiscoveryLink

DecisionSite and DiscoveryLink

DecisionSite, in its own right, can access many data sources to drive its powerful set of visualizations – i.e. any data source that is JDBC compliant can be configured as a data source

But DiscoveryLink as the data access engine can extend the reach of DecisionSite in both the types of data sources and performance/scalability

These two products together provide a robust, flexible “best of breed” approach to analyzing your data: DecisionSite as the user interface/front-end to DL DiscoveryLink as a federated data source access engine for DS

15DecisionSite and DiscoveryLink

DecisionSite Server

Information Interaction Services

Relational DB MiddlewareDB2/

DiscoveryLink

JDBC

WebServices

Txt/csv/etc…

DecisionSite and DiscoveryLink

BLAST/ HMMER XML

Relational(Optimized

Joins)

Pubmed/Genbank …

16DecisionSite and DiscoveryLink

Benefits Overview

Access more data sources – BLAST, HMMER, Genbank, Documentum, BioRS, etc.

Access existing DecisionSite data sources faster/easier – XML, Postgres, MS Access, etc.

Extend/augment an existing visualization with information from any of the above data sources

Optimized queries cross-joined across all data sources – relational and non-relational – all under one JDBC connection

17DecisionSite and DiscoveryLink

Visualize a BLAST result under DS/DL!

18DecisionSite and DiscoveryLink

DecisionSite and DiscoveryLink

Based on the natural fit of DiscoveryLink into DecisionSite’s IIM (Information Interaction Model)

Under the Information Interaction Designer (IID), you reference “Nicknames”, which are “virtual tables”, pointing to other tables/views and non-relational objects pre-configured under DiscoveryLink

These data sources naturally available to Information Builder and Information Library/Links, and beyond

19DecisionSite and DiscoveryLink

DecisionSite and BLAST

See how easy it is to configure a data source like BLAST to be used under DecisionSite

20DecisionSite and DiscoveryLink

Indicate BLAST Server/Algorithm

21DecisionSite and DiscoveryLink

Configure Defline

22DecisionSite and DiscoveryLink

View complete BLAST Nickname

23DecisionSite and DiscoveryLink

Configure under Information Designer

24DecisionSite and DiscoveryLink

Configure under Information Builder

25DecisionSite and DiscoveryLink

This is all done quickly and easily, even though there is no JDBC driver for BLAST readily available!

DecisionSite and BLAST

26DecisionSite and DiscoveryLink

DecisionSite and XML

Similarly, see how easy it is to configure XML to be used under DecisionSite

27DecisionSite and DiscoveryLink

Provide XML Schema/File Location

28DecisionSite and DiscoveryLink

Create all the relational Nicknames

29DecisionSite and DiscoveryLink

Configure Parent/Child Join

30DecisionSite and DiscoveryLink

Again, this can all be done quickly and easily, without having to find and manually configure a JDBC driver via a text editor, and then restarting DecisionSite, etc.

DecisionSite and XML

31DecisionSite and DiscoveryLink

Merged Queries - Example

In IID, set up the BLAST and Entrez Data Models

Also the Nucleotide/BLAST join via Accession #

Configure a BLAST Information Link Configure a Nucleotide Information Link Run BLAST visualization From the visualization, run the Entrez

Information Link to get all the applicable metadata information about each displayed Accession (in Details-on-Demand, Table, etc.)

32DecisionSite and DiscoveryLink

Query Optimization

Under IID, configure key Joins under DB2/ DiscoveryLink

This includes established DecisionSite data like Oracle and MySQL (very quickly configured in DiscoveryLink Control Center)

The Information Link makes only one JDBC connection to access all the data sources!

The underlying SQL query invokes the DB2 Optimizer to improve performance even more

33DecisionSite and DiscoveryLink

Query Optimization

Great for large relational data sets such as joining assay results with a chemical compound database, etc.

34DecisionSite and DiscoveryLink

DecisionSite and DiscoveryLink - Summary

You can extend all the powerful features of DecisionSite, like guided analytics and posters, to visualize information from more data sources, in an easier manner, and do so in an optimized environment

35DecisionSite and DiscoveryLink

DEMO

Visualize a BLAST protein similarity search against SwissProt

Thank you.

Doug Del PreteIBM Life Sciences [email protected]

Questions?