Challenges of curating approved medicines: Will the real drugs please stand up?

Chris Southan, representing the Database Team NC-IUPHAR/BPS/GTPdb Biannual Meeting, Paris, October 2014



What is the total for approved drug structures?

Take your pick ...


Discordance between sources inside PubChem


Explanations

• Discordance: distinctly different drug molecular representations from different sources that we would recognise canonically as the same bioactive substance

• These end up as multiple CIDs per drug (i.e. “multiplexed”) under the PubChem chemistry rules because of:
  – Permutation of R/S stereo centers
  – Salt forms
  – Mixtures
  – Unresolved E/Z bonds
  – Tautomers
  – Isotopic derivatives, including deuteration


Causes of drug structure multiplexing

• Inherent challenges and complexities of chemical representation

• Utility of PubChem depends on advanced rules applied to a submission-based system

• Drug companies never verify their own structures in public databases

• Legacy of structure image primacy in documents

• No clear accountability for correctness of public approved drug structures (companies? FDA? WHO (INN)? AMA (USAN)? Wikipedia? CAS?)

• Structural variants enter databases from general source proliferation, large-scale patent extractions, chemical vendor submissions and repeated exemplifications in journals

• The net effect is an inexorable increase in multiplexing but not necessarily erroneous structures per se


A case of the wrong name > structure


Fixing errors: doing our bit


Taxol: a challenging example


Finding the links: multiplexed to 129 CIDs


Reading the links for alternative taxols: different structures > 20 sets of assay results


Virtual deuteration: compounding drug multiplexing


Scale of the issue for approved drugs in PubChem: multiplexing expansion from 2005 to 2014


So how are we doing in our database?

• Sets were salt-stripped for this comparison

• GTPdb (Oct 2014) has 983 approved drug CIDs concordant with either ChEMBL or DrugBank

• But only 723 are 4-way concordant

• We will inspect the 152, 192 and 180 sectors for consensus expansion
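
The concordance counts above are set overlaps between salt-stripped CID lists. A minimal sketch of that kind of comparison follows, assuming each source has already been reduced to a set of parent CIDs. The CID values are toy placeholders, the function name `concordance` is hypothetical, and `FourthSource` stands in for the fourth set, which the slide does not name.

```python
from itertools import combinations


def concordance(sources):
    """Return (all_way, pairwise) overlaps for a dict of name -> set of CIDs."""
    # CIDs present in every source (the "4-way" sector when there are four sets).
    all_way = set.intersection(*sources.values())
    # CIDs shared by each pair of sources, keyed by the sorted pair of names.
    pairwise = {
        (a, b): sources[a] & sources[b]
        for a, b in combinations(sorted(sources), 2)
    }
    return all_way, pairwise


# Toy placeholder data, NOT real CIDs or real counts.
sources = {
    "GTPdb": {1, 2, 3, 4, 5},
    "ChEMBL": {1, 2, 3, 6},
    "DrugBank": {1, 2, 4, 7},
    "FourthSource": {1, 2, 5, 8},  # hypothetical name for the unnamed fourth set
}

all_way, pairwise = concordance(sources)

# GTPdb CIDs that agree with either ChEMBL or DrugBank.
gtpdb_either = sources["GTPdb"] & (sources["ChEMBL"] | sources["DrugBank"])

print(len(all_way), len(gtpdb_either))
```

The sectors that fall outside the all-way intersection are the ones worth inspecting by hand, since a CID missing from one source may be a multiplexed variant and not a genuine disagreement.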


Consequences and possible solutions to the drug multiplexing issue

• Our drugs annotation Committee cannot magic these issues away but their support is crucial

• Our consensus approach is useful and statistically defensible

• In the GTPdb we add curator comments and cross-pointers for key multiplexed examples

• Sources that make the effort to collate drug structure sets should cross-corroborate more

• A canonical approach to merging drug structure-to-bioactivity mappings could be considered

• The inner connectivity layer of the InChIKey goes some way towards this
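
To illustrate the last point: a standard InChIKey has three hyphen-separated blocks, and the first 14-character block is a hash of the connectivity (skeleton) layer. Grouping records on that block collapses stereo and isotopic variants of the same skeleton, but not salts or mixtures, whose connectivity differs. The sketch below assumes a list of (CID, InChIKey) pairs is already in hand; the keys and CIDs are made-up placeholders and the function names are hypothetical.

```python
from collections import defaultdict


def connectivity_block(inchikey):
    """Return the first 14-character block of an InChIKey."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return parts[0]


def group_by_connectivity(records):
    """Group (cid, inchikey) pairs by the connectivity block of the key."""
    groups = defaultdict(list)
    for cid, key in records:
        groups[connectivity_block(key)].append(cid)
    return dict(groups)


# Placeholder records, NOT real compounds.
records = [
    (101, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),  # skeleton A, no stereo defined
    (102, "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # skeleton A, a stereo or isotopic variant
    (103, "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),  # different connectivity, e.g. a salt form
]

groups = group_by_connectivity(records)
for block, cids in groups.items():
    print(block, cids)
```

Here CIDs 101 and 102 fall into one group while 103 stays separate, which is why the connectivity layer only goes some way: salt forms and mixtures still need salt-stripping or curation before they can be merged.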