This presentation describes how the in-memory data platform SAP HANA can provide value for different use cases found in the research & development of big pharma companies.
Marc Maurer / September 9th 2013, v4
Why it could be beneficial for pharma R&D to engage in a discussion about SAP HANA
© 2013 SAP AG. All rights reserved. Confidential
Intention of this slide deck
For the past 40 years, SAP has been known as the world's leader in ERP applications.
Over the last few years, SAP has undergone a major transformation, dramatically broadening its portfolio and introducing a breakthrough technology named SAP HANA.
This technology is an in-memory, real-time data and analytics platform that is especially suited to address the data management challenges of big pharma R&D.
The Hasso Plattner Institute (HPI), SAP, and a number of academic and commercial organizations from the global life sciences industry are currently collaborating to plan and implement a number of different HANA use cases.
We believe that it would be beneficial for pharma R&D to start a discussion with SAP/HPI to learn about these use cases and to explore how to address existing problems or future challenges.
This slide deck addresses typical data management challenges found in big pharma R&D, highlights the areas where HANA would be of greatest value, lists several R&D HANA use cases, and proposes a number of ways to start the conversation.
R&D innovations in life sciences
Challenges in pharma R&D and how HANA addresses them
Challenges of data analysis and data management in big pharma / Characteristics of HANA
Challenge: Tight integration of scientific data and analysis algorithms is needed, because relevant scientific data is usually distributed over many locations and stored in many different formats
HANA: Users can implement domain-specific application logic (from high-level SQLScript and full support of all "R" libraries to native function libraries)
HANA: All application logic is executed directly on the data; no data transfer between different systems is needed
Challenge: As the different development activities (e.g. assays, disease models, etc.) need to be transparent, versioning of algorithms and data is important
HANA: Every calculation model (algorithm) in HANA is registered in a repository, which makes it easy to re-create previous analysis steps
HANA: Every data record is associated with a transaction identifier; records can be mapped to revisions of calculation models to allow versioning
Challenge: Support of non-relational data structures and operations
HANA: HANA supports data structures such as graphs, which avoids emulating them on top of relational data (which often results in poor performance)
Challenge: Support of big data initiatives
HANA: HANA is integrated with map reduce implementations such as Hadoop to allow parallel exploitation of big data sources
Challenge: An intuitive interface to design analysis pipelines, and a system that is accessible to a wide range of users with a broad range of skill sets (scientists, analysts, developers)
HANA: Analysis pipelines are defined via a graphical user interface in HANA Studio
HANA: Researchers can compare results generated by different pipelines
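The versioning idea above (every record carries a transaction identifier, and records can be mapped to revisions of calculation models) can be illustrated with a small sketch. This is not HANA code and does not use any HANA API; the class and method names (VersionedStore, register_model, snapshot) are hypothetical and only model the principle with an insert-only list in plain Python.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    txn_id: int   # transaction identifier attached to every record
    key: str
    value: float


@dataclass
class VersionedStore:
    """Toy insert-only store (illustrative names, not a HANA interface)."""
    records: list = field(default_factory=list)
    model_revisions: dict = field(default_factory=dict)  # revision -> txn_id
    next_txn: int = 1

    def insert(self, key, value):
        # Records are only appended, never overwritten.
        txn_id = self.next_txn
        self.next_txn += 1
        self.records.append(Record(txn_id, key, value))
        return txn_id

    def register_model(self, revision):
        # Pin a model revision to the last transaction written so far.
        self.model_revisions[revision] = self.next_txn - 1

    def snapshot(self, revision):
        # Rebuild the data exactly as the given model revision saw it.
        cutoff = self.model_revisions[revision]
        latest = {}
        for rec in self.records:  # appended in transaction order
            if rec.txn_id <= cutoff:
                latest[rec.key] = rec.value
        return latest


store = VersionedStore()
store.insert("assay_42", 0.81)
store.register_model("pipeline-v1")
store.insert("assay_42", 0.77)   # a later correction of the same record
store.register_model("pipeline-v2")
print(store.snapshot("pipeline-v1"))
print(store.snapshot("pipeline-v2"))
```

Because older records are never overwritten, an earlier analysis step can be re-created by asking for the snapshot that belongs to the model revision used at the time.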
Where HANA could be used in pharma R&D
Target discovery
  Target identification: Define disease; Identify targets; Collect & analyze data; Select targets
  Target validation: Design validation experiments; Validate drug targets; Collect & analyze data; Select validated targets
  Assay development: Design/test/adapt assay; Transfer assay; In silico data acquisition; In silico design of experiments
Bioinformatics
  Genomics: Sequencing; Alignment; Variant calling; Annotation & analysis
  Proteomics: Protein sequencing; Analysis (see note 1)
Lead discovery
  HT screening: Primary screening; Secondary screening; Tertiary screening; Collect & analyze data
  Lead development: Filter/cluster compounds; Synthesize compounds; Test compounds
  Optimize leads: Filter/cluster leads; Synthesize lead; Test compounds; Synthesize leads
Preclinical dev.
  Synthesize compounds
  LT toxicity (2 species); In vitro pharmacology
  Tox check/safety: Pharmacodynamics; Pharmacokinetics; Animal testing
Translational medicine
  T1: Preclin. & P1 studies; T2: P2/P3 trials; T3: P4 & Outcomes Res.; T4: Population analysis
Legend: areas with potential use of HANA
Note 1: For more information see www.proteomicsdb.org or https://www.youtube.com/v/ao4oStycKnw
Proven benefits of HANA for genomics
Supported by: Carlos Bustamante lab
408,000x faster than traditional disk-based systems in a technical proof of concept
216x faster DNA analysis result: from 2-3 days to 20 minutes
1,000x faster: tumor data analyzed in seconds instead of hours
2-10 sec for report execution
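As a quick plausibility check of the figures above, the speedup factors can be recomputed from the stated durations. The sketch below only does the arithmetic; the function name speedup is illustrative. It shows that the 216x figure corresponds to the upper end of the "2-3 days" range, while the lower end (2 days) would give 144x, and that a 1,000x speedup turns one hour into a few seconds.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_HOUR = 60 * 60


def speedup(before, after):
    """Ratio of the old duration to the new one (same unit for both)."""
    return before / after


# DNA analysis: from 2-3 days down to 20 minutes
low = speedup(2 * MINUTES_PER_DAY, 20)    # 144.0
high = speedup(3 * MINUTES_PER_DAY, 20)   # 216.0

# Tumor data: what a 1,000x speedup does to a one-hour analysis
one_hour_after_speedup = SECONDS_PER_HOUR / 1000   # 3.6 seconds

print(low, high, one_hour_after_speedup)
```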
Selected use cases for pharma R&D
Use cases for research
Secondary and tertiary analysis of genome data: Reduce the run time of genome processing pipelines to minutes and hours. Automatic search in structured and unstructured data sources, including entity extraction. For proteomics there is also a publicly available proteomics database powered by HANA (see www.proteomicsdb.org).
Speeding up pathway analysis: Executing complex queries like «find a new molecule able to dock to kinase XYZ to inhibit enzymatic activity» much faster.
3D structures: Representing genomic/protein structures in 3D, e.g. to visually explore genetic pathways or to compare gene sections with a genome reference database (to identify variants/mutations).
Virtual patient simulation: Combining molecular patient data with models of tumor cells to simulate the effects of different drugs.
Interorganizational data analysis: Several HANA instances in different research/healthcare organizations allow cross-analysis without moving confidential data between the organizations.
Use cases for development
Clinical trial data cleansing: Automatic reformatting of clinical trial data from one format to another and automatic, systematic quality monitoring, to save outsourcing costs and increase clinical trial throughput.
Clinical trial design: Analysis of patient cohorts in real time, to make ad hoc adaptations to the trial protocol and save time during the trial design phase.
Patient recruiting optimization: Increasing forecast accuracy for recruiting patients into trials and addressing questions like how to select the right investigator, etc.
Clinical trial optimization: Data platform to increase performance for trial simulations and to integrate internal and external data sources.
Fallen angels: Re-analysis of failed clinical trials, where HANA could identify variants that responders and non-responders have in common, in order to propose a companion diagnostic and recover investments in failed trials.
Other use cases: Trial fraud management, risk-based trial monitoring, iRise clinical trial app, patient engagement apps (www.carecircles.com)
How to start the conversation
Web conference with specialists from HPI/SAP to discuss other available use cases, answer questions, and find possibilities for on-site interactions
On-site workshop with one of the following three scenarios:
  Focused approach based on concrete customer ideas and requirements
  Use case approach leveraging experience from other initiatives with other partners
  1-day design thinking workshop to discover new and radically different ways of solving a data-related research problem of the customer
M310 course: 6 students from Stanford University work two days a week for 9 months on a specific customer problem, including documentation and a prototype
Backup
Backup: Analyst opinions
SAP is a leader in big data analytics
Forrester Wave: Big Data Predictive Analytics Solutions, Q1/2013
Gartner Magic Quadrant for Data Warehouse Database Management Systems, Feb 2013
Backup: SAP HANA Features
In-Memory Building Blocks (1/2)
Any attribute as index
Insert only for time travel
Combined column and row store
No aggregate tables
Minimal projections
Partitioning
Analytics on historical data
Single and multi-tenancy
SQL interface on columns & rows
Reduction of layers
Lightweight compression
Multi-core/parallelization
On-the-fly extensibility
Active/passive data store
Text retrieval and extraction
Object to relational mapping
Dynamic multi-threading within nodes
Map reduce
No disk
Group key
Bulk load
Backup: SAP HANA Features
In-Memory Building Blocks (2/2)
Bulk load: Fast insertion of large genomic datasets or other relevant datasets
Text retrieval and extraction: Text analytics engine for both structured and unstructured data, integration with "R"
SQL interface on columns & rows: Easily connect with other tools (e.g. RStudio)
Lightweight compression: Fit big data in main memory while allowing fast retrieval
Multi-core/parallelization: Speedup of relevant queries across many nodes
On-the-fly extensibility: Adapting to new format requirements without going offline (e.g. changing VCF files)
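To make the "lightweight compression" building block more concrete, here is a minimal sketch of dictionary encoding, a scheme commonly used in column stores. The deck does not say which compression scheme is meant, so this choice is an assumption, and the function names are illustrative rather than part of any HANA interface. Each distinct value of a column is stored once in a sorted dictionary, and the column itself becomes a list of small integer codes that can be scanned without decompressing it.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Replace each value by its position in a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    return dictionary, codes


def decode(dictionary, codes):
    """Rebuild the original column from dictionary and codes."""
    return [dictionary[code] for code in codes]


def positions_equal(dictionary, codes, value):
    """Find matching rows by comparing integer codes only."""
    code = bisect_left(dictionary, value)  # binary search in the dictionary
    if code == len(dictionary) or dictionary[code] != value:
        return []  # value does not occur in the column
    return [row for row, c in enumerate(codes) if c == code]


# Hypothetical chromosome column of a variant table
chromosomes = ["chr1", "chr2", "chr1", "chrX", "chr2", "chr1"]
dictionary, codes = dictionary_encode(chromosomes)
print(dictionary)                                  # ['chr1', 'chr2', 'chrX']
print(codes)                                       # [0, 1, 0, 2, 1, 0]
print(positions_equal(dictionary, codes, "chr1"))  # [0, 2, 5]
```

Columns with many repeated values, such as chromosome names or sample identifiers, shrink considerably under this scheme, and lookups still run directly on the encoded data.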
Contact information:
Dr. Marc Maurer
Senior Global Account Executive
Email: [email protected]
Tel. +41 79 9642 42 90