Building Global Chemistry Network at the Royal Society of Chemistry. Valery Tkachenko. ICSTI Workshop: Data and Non-Data Integration – A Journey Across Disciplines. Ottawa, October 16th 2013

DESCRIPTION

The Royal Society of Chemistry is building a Global Chemistry Network that will connect chemical resources and chemists across the globe in a single scientific information network, dynamically updated in real time. We have been working on several of the foundation technologies for a number of years, including a structure database containing almost 30 million chemicals, a micropublishing environment, a platform for the validation and standardization of chemical structure representations, and a text-mining and semantic markup platform for data-enabling our published articles. Our goal is to provide seamless tools for researchers, librarians, publishers, information technology specialists and government agencies, facilitating scientific research through a free flow of information. This talk will review our work to date on a chemistry data platform for the community and will highlight some of the challenges we face as we expand the architecture of our Global Chemistry Network platform.

Citation preview

Page 1: Building global chemistry network at the royal society of chemistry

Building Global Chemistry Network at the Royal Society of Chemistry

Valery Tkachenko

ICSTI Workshop

Data and Non-Data Integration – A Journey Across Disciplines

Ottawa, October 16th 2013

Page 2: Building global chemistry network at the royal society of chemistry

The World we live in

Internet World

• 20+ years into the Internet Revolution

• Web 2.0 -> Web 3.0

Connected World

• Social Networks

• Real-time Communications

Big Data World

• Semantic content

• New Interfaces

Page 3: Building global chemistry network at the royal society of chemistry

Big Data challenge

RSC/ChemSpider platforms

Crowdsourcing and AltMetrics

New interfaces

Building Global Chemistry Network

Page 4: Building global chemistry network at the royal society of chemistry
Page 6: Building global chemistry network at the royal society of chemistry

Chemistry on the Internet

Page 7: Building global chemistry network at the royal society of chemistry

Why disproportion?

• Scientific complexity

• Conservative nature

Page 8: Building global chemistry network at the royal society of chemistry

Big Data challenge

RSC/ChemSpider platforms

Crowdsourcing and AltMetrics

New interfaces

Building Global Chemistry Network

Page 9: Building global chemistry network at the royal society of chemistry

Royal Society of Chemistry (RSC)

• Largest European organisation for advancing the chemical sciences

• Founded 1841

• Not-for-profit

• “To be the leading voice and trusted partner for science and humanity”

• Professional body with a worldwide network of 48,000 members

• International publisher

• ~400 employees

• Education facilitator, Science leader, E-Science leaders

Page 10: Building global chemistry network at the royal society of chemistry

About the RSC

• Headquarters in London

• Offices in Cambridge, Beijing, Shanghai, Philadelphia, Tokyo, Bangalore, São Paulo

Page 11: Building global chemistry network at the royal society of chemistry

STM publisher

Knowledge

Our User Interfaces (Desktop, Web, Mobile, etc.)

Customers

Delivery Magic

3rd party integrations (our web services)

Page 12: Building global chemistry network at the royal society of chemistry

ChemSpider Suite

Data Layer: ChemSpider Assays, ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Materials, ChemSpider Algorithms

Business Objects Layer: CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO

APIs Layer: DS API, Export API, Search API, Processing API, CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API

Components Layer: JS Components, Google Apps Components, Python widgets, SharePoint Components, PHP snippets, ASP.NET Components, Java Beans

UIs: ChemSpider website, ChemSpider Reactions, mobile web app, ChemSpider desktop app, Depositions client
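
The layering on this slide (data, business objects, APIs, components and UIs) can be illustrated with a minimal sketch. Everything named here is hypothetical: `COMPOUND_ROWS`, `Compound`, `load_compound`, `search_by_name` and `render` are stand-ins chosen for illustration and are not ChemSpider's actual classes or API. The point is only that each layer talks to the one directly beneath it.

```python
from dataclasses import dataclass

# Data layer: a hypothetical in-memory stand-in for a compounds store.
COMPOUND_ROWS = {
    1: {"name": "water", "formula": "H2O"},
    2: {"name": "ethanol", "formula": "C2H6O"},
}


# Business objects layer: typed objects built from raw rows.
@dataclass(frozen=True)
class Compound:
    compound_id: int
    name: str
    formula: str


def load_compound(compound_id: int) -> Compound:
    """Turn one raw data-layer row into a business object."""
    row = COMPOUND_ROWS[compound_id]
    return Compound(compound_id, row["name"], row["formula"])


# APIs layer: search returns business objects, never raw rows.
def search_by_name(query: str) -> list:
    q = query.lower()
    return [
        load_compound(cid)
        for cid, row in sorted(COMPOUND_ROWS.items())
        if q in row["name"].lower()
    ]


# Components/UI layer: renders API results without touching the data layer.
def render(compounds) -> str:
    return "\n".join(f"{c.compound_id}: {c.name} ({c.formula})" for c in compounds)


page = render(search_by_name("eth"))
print(page)
```

In a design like this, a new front end (mobile, desktop, third-party widget) only needs the API layer, which is presumably why the slide lists so many component technologies above a single set of APIs.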

Page 13: Building global chemistry network at the royal society of chemistry

• 29 million chemicals and growing

• Data sourced from >500 different sources

• Crowdsourced curation and annotation

• Ongoing deposition of data from our journals and our collaborators

• A structure-centric hub for web searching

Page 14: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 15: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 16: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 17: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 18: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 19: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 20: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 21: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 22: Building global chemistry network at the royal society of chemistry

ChemSpider

Page 23: Building global chemistry network at the royal society of chemistry
Page 24: Building global chemistry network at the royal society of chemistry
Page 25: Building global chemistry network at the royal society of chemistry
Page 26: Building global chemistry network at the royal society of chemistry

ChemSpider Reactions

Page 27: Building global chemistry network at the royal society of chemistry

ChemSpider Reactions

Page 28: Building global chemistry network at the royal society of chemistry

ChemSpider Reactions

Page 29: Building global chemistry network at the royal society of chemistry

ChemSpider Reactions

Page 30: Building global chemistry network at the royal society of chemistry

RSC Archive – since 1841

Page 31: Building global chemistry network at the royal society of chemistry

DERA - Digitally Enabling RSC Archive

Page 32: Building global chemistry network at the royal society of chemistry

Semantic Mark-up of Articles

Page 33: Building global chemistry network at the royal society of chemistry

It is so difficult to navigate…

• What’s the structure?

• Are they in our file?

• What’s similar?

• What’s the target?

• Pharmacology data?

• Known Pathways?

• Working On Now?

• Connections to disease?

• Expressed in right cell type?

• Competitors?

• IP?

Page 34: Building global chemistry network at the royal society of chemistry

DERA Architecture

Text, PDF, XML

Structures

Reactions

Spectra

Materials

Chemistry Validation and Standardization Platform (CVSP)

DERA (Text Mining)

Biological Activities

Page 35: Building global chemistry network at the royal society of chemistry

Data quality issue and CVSP

Robochemistry

Proliferation of errors in public and private databases

Automated quality control system

Page 36: Building global chemistry network at the royal society of chemistry

DrugBank dataset (6516 records)

~60 records that can’t be dearomatized unambiguously

DB04283 DB04462

Page 37: Building global chemistry network at the royal society of chemistry

~30 records with bonds that do not make sense

DB04283

DB04009

Page 38: Building global chemistry network at the royal society of chemistry

DB08128

J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10

DB06287

7 records with 2 stereo bonds at chiral atoms
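
A check of this kind can be sketched in a few lines. This is not CVSP's implementation: it assumes bonds have already been parsed into `(begin_atom, end_atom, stereo_code)` tuples that follow the molfile convention (stereo code 1 for an up/wedge bond, 6 for a down/hash bond, narrow end at the first atom). The function name and the sample record are illustrative only, and the sketch merely flags atoms for review rather than deciding whether the drawing is wrong.

```python
from collections import Counter

# Molfile (V2000) stereo codes for single bonds: 1 = up (wedge), 6 = down (hash).
# The narrow end of a stereo bond sits at the bond's first atom.
STEREO_CODES = {1, 6}


def atoms_with_multiple_stereo_bonds(bonds, threshold=2):
    """Return sorted atom indices that are the narrow end of >= threshold stereo bonds.

    `bonds` is an iterable of (begin_atom, end_atom, stereo_code) tuples;
    this flat representation is an assumption made for the sketch.
    """
    counts = Counter(begin for begin, _end, stereo in bonds if stereo in STEREO_CODES)
    return sorted(atom for atom, n in counts.items() if n >= threshold)


# Hypothetical record: atom 1 carries one wedge and one hash bond,
# atom 5 carries a single wedge bond.
example_bonds = [
    (1, 2, 1),
    (1, 3, 6),
    (1, 4, 0),
    (5, 6, 1),
    (5, 7, 0),
]

flagged = atoms_with_multiple_stereo_bonds(example_bonds)
print(flagged)
```

Running such a rule over every record in a dataset and counting the records with a non-empty result would give a tally comparable to the one on this slide.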

Page 39: Building global chemistry network at the royal society of chemistry

“Direction of bond makes no sense” – 63%

Page 40: Building global chemistry network at the royal society of chemistry

“Stereo types of non-opposite bonds match” – 2%

Page 41: Building global chemistry network at the royal society of chemistry

ChemSpider Suite

Data Layer: ChemSpider Assays, ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Materials, ChemSpider Algorithms

Business Objects Layer: CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO

APIs Layer: DS API, Export API, Search API, Processing API, CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API

Components Layer: JS Components, Google Apps Components, Python widgets, SharePoint Components, PHP snippets, ASP.NET Components, Java Beans

UIs: ChemSpider website, ChemSpider Reactions, mobile web app, ChemSpider desktop app, Depositions client

Page 42: Building global chemistry network at the royal society of chemistry

Big Data challenge

RSC/ChemSpider platforms

Crowdsourcing and AltMetrics

New interfaces

Building Global Chemistry Network

Page 43: Building global chemistry network at the royal society of chemistry

AltMetrics

Page 44: Building global chemistry network at the royal society of chemistry

Plum Analytics

Page 45: Building global chemistry network at the royal society of chemistry

RSC/Rewards and Recognition

Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.

The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
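
The badge rule stated above is simple enough to express directly. The sketch below is only an illustration of that rule; the `Contributor` record, its field names and the `record_publication` function are assumptions, not part of any RSC system.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    """Hypothetical user record; the field names are assumptions for this sketch."""
    name: str
    published_articles: int = 0
    badges: set = field(default_factory=set)


def record_publication(user: Contributor) -> bool:
    """Count one newly published article; return True if it earned the First Step badge."""
    user.published_articles += 1
    if user.published_articles == 1 and "First Step" not in user.badges:
        user.badges.add("First Step")
        return True
    return False


alice = Contributor("alice")
first = record_publication(alice)   # first published article: badge awarded
second = record_publication(alice)  # later articles do not award it again
print(first, second, sorted(alice.badges))
```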

Page 46: Building global chemistry network at the royal society of chemistry

Big Data challenge

RSC/ChemSpider platforms

Crowdsourcing and AltMetrics

New interfaces

Building Global Chemistry Network

Page 47: Building global chemistry network at the royal society of chemistry

Visualization

Page 49: Building global chemistry network at the royal society of chemistry

ChemSpider APIs

Page 50: Building global chemistry network at the royal society of chemistry

Big Data challenge

RSC/ChemSpider platforms

Crowdsourcing and AltMetrics

New interfaces

Building Global Chemistry Network

Page 51: Building global chemistry network at the royal society of chemistry

We are a part of a larger world

Page 52: Building global chemistry network at the royal society of chemistry

National Chemistry Database

Page 53: Building global chemistry network at the royal society of chemistry

National Data Repository

University 1

Data Hub

Workstations

University 2

Data Hub

Workstations

Company 3

Data Hub

Workstations

Data Repository indexed storage

Data Repository provided data storage

Chemically intelligent services

Indexes

Data

External clients: Publishers, Scientists, Funding bodies
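
One way to read this diagram is that each institution keeps its own data hub while the repository holds indexes over them, so external clients can find out who holds data for a given structure. The sketch below illustrates only that indexed-storage reading. The class names `DataHub` and `RepositoryIndex`, the plain-string structure identifiers and the sample items are all assumptions made for illustration.

```python
from collections import defaultdict


class DataHub:
    """A local hub (e.g. at a university) that keeps its own records."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # structure identifier -> list of local data items

    def deposit(self, structure_id, item):
        self.records.setdefault(structure_id, []).append(item)


class RepositoryIndex:
    """Central index: knows which hubs hold data for a structure."""

    def __init__(self):
        self._hubs = {}
        self._index = defaultdict(set)  # structure identifier -> hub names

    def register(self, hub):
        # Indexes what the hub holds at registration time; later deposits
        # would need a re-registration in this simplified sketch.
        self._hubs[hub.name] = hub
        for structure_id in hub.records:
            self._index[structure_id].add(hub.name)

    def lookup(self, structure_id):
        """Return {hub name: items} for every registered hub holding this structure."""
        return {
            name: list(self._hubs[name].records[structure_id])
            for name in sorted(self._index.get(structure_id, ()))
        }


hub1 = DataHub("University 1")
hub1.deposit("STRUCT-A", "NMR spectrum")
hub3 = DataHub("Company 3")
hub3.deposit("STRUCT-A", "assay result")
hub3.deposit("STRUCT-B", "crystal data")

index = RepositoryIndex()
index.register(hub1)
index.register(hub3)
result = index.lookup("STRUCT-A")
print(result)
```

Keeping only the index central leaves the data under each institution's control; the other option named on the slide, repository-provided data storage, would move the records themselves into the repository.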

Page 54: Building global chemistry network at the royal society of chemistry

http://www.openphacts.org

Open PHACTS is an Innovative Medicines Initiative (IMI) project aiming to reduce the barriers to drug discovery in industry, academia and small businesses.

The semantic web is one of its cornerstones

Page 55: Building global chemistry network at the royal society of chemistry
Page 56: Building global chemistry network at the royal society of chemistry

We know about Natural Products

Page 57: Building global chemistry network at the royal society of chemistry

MarinLit

Page 58: Building global chemistry network at the royal society of chemistry

OSDD

Page 59: Building global chemistry network at the royal society of chemistry

Internet Data

The Future

Commercial Software, Pre-competitive Data

Open Science, Open Data, Publishers, Educators

Open Databases, Chemical Vendors

Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals

Page 60: Building global chemistry network at the royal society of chemistry

Thank you

Email: [email protected]

Slides: http://www.slideshare.net/valerytkachenko16