How SAP HANA can provide value for Pharma R&D

Marc Maurer / September 9th 2013, v4

Why it could be beneficial for pharma R&D to engage in a discussion about SAP HANA

© 2013 SAP AG. All rights reserved. Confidential

Intention of this slide deck

For the past 40 years, SAP has been known as the world’s leader in ERP applications.

Over the last few years, SAP has undergone a major transformation, dramatically broadening its portfolio and introducing a breakthrough technology named SAP HANA.

This technology is an in-memory, real-time data and analytics platform that is especially suited to addressing the data management challenges of big pharma R&D.

The Hasso Plattner Institute (HPI), SAP, and a number of academic and commercial organizations from the global life sciences industry are currently collaborating to plan and implement a number of different HANA use cases.

We believe that it would be beneficial for pharma R&D to start a discussion with SAP/HPI to learn about use cases and to explore how to address existing problems or future challenges.

This slide deck addresses typical data management challenges found in big pharma R&D, highlights the areas where HANA would be of greatest value, lists several R&D HANA use cases, and proposes a number of ways to start the conversation.


R&D innovations in life sciences

Challenges in pharma R&D and how HANA addresses them

Challenges of data analysis and data management in big pharma

Characteristics of HANA

Tight integration of scientific data and analysis algorithms, as relevant scientific data is usually distributed over many locations and stored in many different formats

Users can implement domain-specific application logic (from high-level SQLScript and full support of all "R" libraries to native function libraries)

All application logic is executed directly on the data; no data transfer between different systems is needed

As the different activities for development (e.g. assays, disease models, etc.) need to be transparent, versioning of algorithms and data is important

Every calculation model (algorithm) in HANA is registered in a repository; easy to re-create previous analysis steps

Every data record is associated with a transaction identifier; records can be mapped to revisions of calculation models to allow versioning

Support for non-relational data structures and operations

HANA supports data structures such as graphs to avoid emulating them on top of relational data (which often results in poor performance)

Support of big data initiatives

HANA is integrated with map reduce implementations such as Hadoop to allow parallel exploitation of big data sources

Intuitive interface to design analysis pipelines, a system that is accessible to a wide range of users with a broad range of skill sets (scientists, analysts, developers)

Analysis pipelines are defined via a graphical user interface in HANA Studio

Researchers can compare results generated by different pipelines
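The versioning characteristic listed above (every data record carries a transaction identifier, and records can be mapped to revisions of calculation models) can be illustrated with a minimal Python sketch. This is not HANA's repository API: the class names, the `first_txn_id` field, and the sample values are hypothetical and only show the lookup idea.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRevision:
    """A registered revision of a calculation model (hypothetical structure)."""
    name: str
    revision: int
    first_txn_id: int  # first transaction identifier processed under this revision


@dataclass(frozen=True)
class Record:
    """A data record tagged with the transaction that wrote it."""
    txn_id: int
    value: float


def revision_for(record, revisions):
    """Return the model revision that was active when the record was written.

    `revisions` must be sorted by `first_txn_id` in ascending order.
    """
    starts = [r.first_txn_id for r in revisions]
    idx = bisect_right(starts, record.txn_id) - 1
    if idx < 0:
        raise ValueError(f"no model revision covers transaction {record.txn_id}")
    return revisions[idx]


# Illustrative data only (assumed, not taken from the deck).
revisions = [
    ModelRevision("variant_filter", revision=1, first_txn_id=100),
    ModelRevision("variant_filter", revision=2, first_txn_id=250),
]
records = [Record(txn_id=120, value=0.91), Record(txn_id=300, value=0.42)]

for rec in records:
    rev = revision_for(rec, revisions)
    print(f"txn {rec.txn_id}: {rev.name} r{rev.revision}")
```

With such a mapping, a previous analysis step could be re-created by selecting the records and the model revision that belong together.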


Where HANA could be used in pharma R&D

Target discovery
  Target identification: Define disease; Identify targets; Collect & analyze data; Select targets
  Target validation: Design validation experiments; Validate drug targets; Collect & analyze data; Select validated targets
  Assay development: Design/test/adapt assay; Transfer assay; In silico data acquisition; In silico design of experiments

Bioinformatics
  Genomics: Sequencing; Alignment; Variant calling; Annotation & analysis
  Proteomics: Protein sequencing; Analysis (1)

Lead discovery
  HT screening: Primary screening; Secondary screening; Tertiary screening; Collect & analyze data
  Lead development: Filter/cluster compounds; Synthesize compounds; Test compounds
  Optimize leads: Filter/cluster leads; Synthesize lead; Test compounds; Synthesize leads

Preclinical development
  LT toxicity (2 species); In vitro pharmacology; Synthesize compounds
  Tox check/safety: Pharmacodynamics; Pharmacokinetics; Animal testing

Translational medicine
  T1: Preclinical & Phase 1 studies; T2: Phase 2/3 trials; T3: Phase 4 & outcomes research; T4: Population analysis

areas with potential use of HANA

(1) For more information, see www.proteomicsdb.org or https://www.youtube.com/v/ao4oStycKnw



Proven benefits of HANA for genomics

Supported By: Carlos Bustamante lab

408,000x faster than traditional disk-based systems in a technical proof of concept

216x faster DNA analysis results: from 2-3 days to 20 minutes

1,000x faster: tumor data analyzed in seconds instead of hours

2-10 sec for report execution
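As a plausibility check on the 216x figure above, the speedup factor is simply the old run time divided by the new one. The short Python sketch below computes it for both ends of the quoted 2-3 day range; the helper name `speedup` is introduced here for illustration only.

```python
MINUTES_PER_DAY = 24 * 60


def speedup(before_minutes, after_minutes):
    """Speedup factor: how many times shorter the new run time is."""
    return before_minutes / after_minutes


for days in (2, 3):
    factor = speedup(days * MINUTES_PER_DAY, 20)
    print(f"{days} days -> 20 minutes: {factor:.0f}x")
# 2 days -> 20 minutes: 144x
# 3 days -> 20 minutes: 216x
```

The quoted 216x therefore corresponds to the 3-day end of the range; starting from 2 days, the same 20-minute result would be a 144x speedup.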


Selected use cases for pharma R&D

Use cases for research

Secondary and tertiary analysis of genome data: Reduce the time needed to run genome processing pipelines to minutes or hours. Automatic search in structured and unstructured data sources, including entity extraction. For proteomics, there is also a publicly available proteomics database powered by HANA (see www.proteomicsdb.org).

Speeding up pathway analysis: Executing complex queries like «find a new molecule able to dock to kinase XYZ to inhibit enzymatic activity» much faster.

3D structures: Representing genomic/protein structures in 3D, e.g. to visually explore genetic pathways or to compare gene sections with a genome reference database (to identify variants/mutations).

Virtual patient simulation: Combining molecular patient data with models of tumor cells to simulate the effects of different drugs.

Interorganizational data analysis: Several HANA instances in different research/healthcare organizations allow cross-analysis without moving confidential data between the organizations.

Use cases for development

Clinical trial data cleansing: Automatic reformatting of clinical trial data from one format to another, and automatic, systematic quality monitoring to save outsourcing costs and increase clinical trial throughput.

Clinical trial design: Analysis of patient cohorts in real time, enabling ad hoc trial protocol adaptations and saving time during the trial design phase.

Patient recruiting optimization: Increasing forecast accuracy for recruiting patients into trials and addressing questions such as how to select the right investigator.

Clinical trial optimization: Data platform to increase performance for trial simulations and to integrate internal and external data sources.

Fallen angels: Re-analysis of failed clinical trials, where HANA could identify variants that responders and non-responders have in common in order to propose a companion diagnostic and recover investments in failed trials.

Other use cases: Trial fraud management, risk-based trial monitoring, iRise clinical trial app, patient engagement apps (www.carecircles.com)



How to start the conversation

Web conference with specialists from HPI/SAP to discuss other available use cases, answer questions, and identify opportunities for on-site interactions

On-site workshop with one of the following three scenarios: (1) a focused approach based on concrete customer ideas and requirements; (2) a use case approach leveraging the experience of other initiatives with other partners; (3) a 1-day design thinking workshop to discover new and radically different ways of solving a customer's data-related research problem

M310 course: 6 students from Stanford University work two days a week for 9 months on a specific customer problem, including documentation and a prototype

Backup



Backup: Analyst opinions

SAP is a leader in big data analytics

Forrester Wave: Big Data Predictive Analytics Solutions, Q1/2013

Gartner Magic Quadrant for Data Warehouse Database Management Systems, Feb 2013


Backup: SAP HANA Features

In-Memory Building Blocks (1/2)

Any attribute as index
Insert only for time travel
Combined column and row store
No aggregate tables
Minimal projections
Partitioning
Analytics on historical data
Single and multi-tenancy
SQL interface on columns & rows
Reduction of layers
Lightweight compression
Multi-core/parallelization
On-the-fly extensibility
Active/passive data store
Text retrieval and extraction
Object to relational mapping
Dynamic multi-threading within nodes
Map reduce
No disk
Group key
Bulk load


Backup: SAP HANA Features

In-Memory Building Blocks (2/2)

Bulk load: Fast insertion of large genomic datasets or other relevant datasets

Text retrieval and extraction: Text analytics engine for both structured and unstructured data, integration with “R”

SQL interface on columns & rows: Easily connect with other tools (e.g. RStudio)

Lightweight compression: Fit big data in main memory while allowing fast retrieval

Multi-core/parallelization: Speedup of relevant queries across many nodes

On-the-fly extensibility: Adapting to new format requirements without going offline (e.g. changing VCF files)
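To make the "lightweight compression" building block more concrete, here is a minimal Python sketch of dictionary encoding, a scheme commonly used in column stores. The slides do not state which compression scheme is meant, so treating it as dictionary encoding is an assumption; the function names and the genotype sample values are hypothetical.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Encode a column as (sorted dictionary of distinct values, integer codes)."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def rows_matching(dictionary, codes, value):
    """Return row positions equal to `value`, comparing small integer codes only."""
    i = bisect_left(dictionary, value)
    if i == len(dictionary) or dictionary[i] != value:
        return []  # value does not occur in the column
    return [row for row, code in enumerate(codes) if code == i]


# Illustrative column with few distinct values (assumed data).
genotypes = ["A/A", "A/G", "A/A", "G/G", "A/G", "A/A"]
dictionary, codes = dictionary_encode(genotypes)

print(dictionary)                                  # ['A/A', 'A/G', 'G/G']
print(codes)                                       # [0, 1, 0, 2, 1, 0]
print(rows_matching(dictionary, codes, "A/G"))     # [1, 4]
```

Columns with many repeated values shrink to a short dictionary plus small integers, and lookups scan the integers rather than the original strings, which is one way to fit large data in main memory while keeping retrieval fast.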

Contact information:

Dr. Marc Maurer
Senior Global Account Executive
Email: [email protected]
Phone: +41 79 9642 42 90
