The application of cloud computing to Royal Society of Chemistry data platforms. Valery Tkachenko, Ken Karapetyan, Jon Steele, Alexey Pshenichnov, Antony J. Williams. ACS 247th National Meeting, Dallas, TX, March 18th 2014

DESCRIPTION

Cloud computing offers significant advantages for hosting RSC chemistry databases in terms of reliability, performance and access to large-scale computational power. The ChemSpider database contains almost 30 million unique chemical compounds, and access to compute power to regenerate existing properties and add new ones is essential for efficient delivery on a manageable timescale. The use of cloud-based facilities reduces the need for internal infrastructure and generally enhances performance, at the cost of significant recoding of the platforms. This presentation will review the move of our ChemSpider-related projects to the cloud, the associated challenges, and both the obvious and the unforeseen benefits. We will also discuss our use of parallelization technologies for mass calculation using Hadoop.
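To make the map/shuffle/reduce pattern behind Hadoop-style bulk calculation concrete, here is a minimal single-process sketch in plain Python. It is not the RSC pipeline and does not use the Hadoop API; the record layout, the function names (`map_compound`, `shuffle`, `reduce_bucket`, `run_job`) and the choice of property (a molecular weight histogram) are all assumptions made for illustration.

```python
from collections import defaultdict

# Standard atomic weights for a small, illustrative subset of elements.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def map_compound(record):
    """Map step: compute one compound's weight and emit (bucket, 1)."""
    _compound_id, formula = record
    weight = sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())
    bucket = int(weight // 50) * 50  # lower edge of a 50 g/mol wide bucket
    yield bucket, 1

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_bucket(bucket, counts):
    """Reduce step: total the compounds that fell into one bucket."""
    return bucket, sum(counts)

def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory record list."""
    mapped = (pair for record in records for pair in map_compound(record))
    return dict(reduce_bucket(key, values) for key, values in shuffle(mapped).items())

# Hypothetical input records: (identifier, {element: atom count}).
RECORDS = [
    ("water", {"H": 2, "O": 1}),
    ("methane", {"C": 1, "H": 4}),
    ("glucose", {"C": 6, "H": 12, "O": 6}),
]

if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each map call depends only on its own record, the map step can be spread over as many workers as are available, which is what makes this shape of job a natural fit for a large compound collection.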

Page 1: The application of cloud computing to royal society of chemistry data platforms

The application of cloud computing to Royal Society of Chemistry data platforms

Valery Tkachenko, Ken Karapetyan, Jon Steele,

Alexey Pshenichnov, Antony J. Williams

ACS 247th National Meeting

Dallas, TX

March 18th 2014

Page 2: The application of cloud computing to royal society of chemistry data platforms

ChemSpider

RSC Archive

RSC Chemistry Platform

Big Data world and chemistry

Data quality

Cloud Computing considerations

Page 3: The application of cloud computing to royal society of chemistry data platforms

• ~30 million chemicals and growing

• Data sourced from >500 different sources

• Live depositions

• Live crowd curation and annotation

• A structure centric hub for web-searching

Page 4: The application of cloud computing to royal society of chemistry data platforms

ChemSpider – user view

Page 5: The application of cloud computing to royal society of chemistry data platforms

ChemSpider – under the hood

Page 6: The application of cloud computing to royal society of chemistry data platforms

ChemSpider – load over years

2007

• 1 visitor (there is always the first one)

2009

• 3000 – 7000 visits/day

2014

• 50000 visits/day

• 40000 unique visitors/day

• 150000 page views/day

• 100 – 400 real-time visitors

Page 7: The application of cloud computing to royal society of chemistry data platforms

ChemSpider – bottlenecks analysis

• “Live” database
  o Read-only is easier to scale-out

• Application server(s)
  o Standard ways to scale
  o Session persistence

• SQL server(s)
  o Expensive, but not all data are relational - NoSQL
  o Overhead for replication
  o Alternatives do not work well for “live” databases

• Backend (processing) server(s)
  o Use of grid computing

• UI technology
  o ASP.NET Forms
  o MVC/REST

• Software as a Service (SaaS)
  o API
  o Widgets
  o High-scalability
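The point that read-only traffic is easier to scale out than a “live” database can be illustrated with a small routing sketch: reads are spread across replicas, while every write still has to reach the single primary. This is a hypothetical illustration only, not ChemSpider's actual setup; the class name `ReadWriteRouter` and the server names are invented for the example.

```python
import itertools

class ReadWriteRouter:
    """Send reads to replicas in round-robin order and writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, operation):
        """Return the server name that should handle 'read' or 'write'."""
        if operation == "read" and self._replicas is not None:
            return next(self._replicas)
        return self.primary

if __name__ == "__main__":
    router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
    for op in ("read", "read", "write", "read"):
        print(op, "->", router.route(op))
```

Adding replicas raises read capacity, but write capacity stays bounded by the primary and each replica adds replication overhead, which is the trade-off the list above points at.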

Page 8: The application of cloud computing to royal society of chemistry data platforms

ChemSpider – scaling out

Page 9: The application of cloud computing to royal society of chemistry data platforms

ChemSpider – geography

Globalization

Localization

CDN

Page 10: The application of cloud computing to royal society of chemistry data platforms

ChemSpider

RSC Archive

RSC Chemistry Platform

Big Data world and chemistry

Data quality

Cloud Computing considerations

Page 11: The application of cloud computing to royal society of chemistry data platforms

RSC Archive – since 1841

Page 12: The application of cloud computing to royal society of chemistry data platforms

Published article example

Compounds

Reaction

Analytical Data

Text and References

Page 13: The application of cloud computing to royal society of chemistry data platforms

New navigation style

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Page 14: The application of cloud computing to royal society of chemistry data platforms

ChemSpider

RSC Archive

RSC Chemistry Platform

Big Data world and chemistry

Data quality

Cloud Computing considerations

Page 15: The application of cloud computing to royal society of chemistry data platforms

New architecture

Data tier: Compounds, Reactions, Spectra, Crystals, Documents

Data access tier: Compounds API, Reactions API, Spectra API, Crystals API, Documents API

User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Crystals Widgets, Documents Widgets

User interface tier (examples):

• Analytical Laboratory application

• Electronic Laboratory Notebook

• Chemical Inventory application

• Paid 3rd party integrations (various platforms – SharePoint, Google, etc)
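The layering on this slide can be sketched as four small pieces of code, one per tier, each talking only to the tier directly beneath it. This is a minimal illustration under stated assumptions, not the RSC platform's code: the names `Compound`, `CompoundStore`, `CompoundsAPI`, `compound_widget` and `notebook_page`, the record fields and the identifiers are all hypothetical.

```python
from dataclasses import dataclass
from html import escape

@dataclass(frozen=True)
class Compound:
    compound_id: int
    name: str
    formula: str

class CompoundStore:
    """Data tier: holds the records (an in-memory dict stands in for a database)."""

    def __init__(self, compounds):
        self._by_id = {c.compound_id: c for c in compounds}

    def get(self, compound_id):
        return self._by_id.get(compound_id)

class CompoundsAPI:
    """Data access tier: exposes records as plain dictionaries."""

    def __init__(self, store):
        self._store = store

    def get_compound(self, compound_id):
        compound = self._store.get(compound_id)
        if compound is None:
            return None
        return {"id": compound.compound_id, "name": compound.name, "formula": compound.formula}

def compound_widget(api, compound_id):
    """User interface components tier: render one compound as an HTML fragment."""
    data = api.get_compound(compound_id)
    if data is None:
        return '<div class="compound">Not found</div>'
    return f'<div class="compound">{escape(data["name"])} ({escape(data["formula"])})</div>'

def notebook_page(api, compound_ids):
    """User interface tier: an application view composed from widgets."""
    return "\n".join(compound_widget(api, cid) for cid in compound_ids)

if __name__ == "__main__":
    api = CompoundsAPI(CompoundStore([Compound(1, "water", "H2O")]))
    print(notebook_page(api, [1, 2]))
```

Because the widget depends only on the API and not on the store, the same widget could be reused by several applications in the top tier, which appears to be the intent of the components tier.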

Page 16: The application of cloud computing to royal society of chemistry data platforms

ChemSpider

RSC Archive

RSC Chemistry Platform

Big Data world and chemistry

Data quality

Cloud Computing considerations

Page 17: The application of cloud computing to royal society of chemistry data platforms

We are a part of a much larger world

Page 18: The application of cloud computing to royal society of chemistry data platforms

APIs, endpoints and widgets

Page 19: The application of cloud computing to royal society of chemistry data platforms

Challenges of the Big Data: indexing, navigation, visualization

Page 20: The application of cloud computing to royal society of chemistry data platforms

Managing Big Data

Page 21: The application of cloud computing to royal society of chemistry data platforms

Consuming Big Data

Page 22: The application of cloud computing to royal society of chemistry data platforms

ChemSpider

RSC Archive

RSC Chemistry Platform

Big Data world and chemistry

Data quality

Cloud Computing considerations

Page 23: The application of cloud computing to royal society of chemistry data platforms

Chemistry Validation and Standardization Platform

Page 24: The application of cloud computing to royal society of chemistry data platforms

ChemSpider

RSC Archive

RSC Chemistry Platform

Big Data world and chemistry

Data quality

Cloud Computing considerations

Page 25: The application of cloud computing to royal society of chemistry data platforms

Cloud continuum

Page 26: The application of cloud computing to royal society of chemistry data platforms

Cloud services from major players

Page 27: The application of cloud computing to royal society of chemistry data platforms

Big Data in a Cloud whoops…

Page 28: The application of cloud computing to royal society of chemistry data platforms

Summary

Cloud definition is foggy

Demand for computing resources is growing tremendously as we move into a Big Data world

Moving into the Cloud is not an “if” question, it’s a “when” question

It’s also a question of timing, budgets and resources

Page 29: The application of cloud computing to royal society of chemistry data platforms

Thank you

Email: [email protected]

Slides: http://www.slideshare.net/valerytkachenko16