Southan real drugs_paris_oct_11_2014

1

Challenges of curating approved medicines: Will the real drugs please stand up?

Chris Southan, representing the Database Team, NC-IUPHAR/BPS/GTPdb
Biannual Meeting, Paris, October 2014

2

What is the total number of approved drug structures?

Take your pick…

3

Discordance between sources inside PubChem

4

Explanations

• Discordance: distinctly different molecular representations of a drug, from different sources, that we would canonically recognise as the same bioactive substance

• PubChem's chemistry rules merge these representations into multiple CIDs per drug (i.e. “multiplexed”) because of:
  – Permutation of R/S stereo centers
  – Salt forms
  – Mixtures
  – Unresolved E/Z bonds
  – Tautomers
  – Isotopic derivatives, including deuteration
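Several of these variant classes can be told apart from the layered structure of the standard InChIKey. It has a 14-character first block hashing the connectivity, then a 10-character second block whose 8-character hash covers further layers such as stereochemistry and isotopes (followed by the standard flag and version character), and finally a single protonation character. The minimal sketch below reports which block separates two keys. It is illustrative only and is not how PubChem assigns CIDs. The keys are hypothetical placeholders, not real compounds.

```python
# A minimal sketch: compare two standard InChIKeys block by block to see
# which kind of variation separates them. All keys used here are
# hypothetical placeholders, not real compounds.


def split_key(key: str) -> tuple[str, str, str]:
    """Split an InChIKey into (connectivity block, second block, protonation flag)."""
    parts = key.split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return parts[0], parts[1], parts[2]


def classify_difference(key_a: str, key_b: str) -> str:
    """Report the first InChIKey block in which two keys differ."""
    conn_a, rest_a, _ = split_key(key_a)
    conn_b, rest_b, _ = split_key(key_b)
    if key_a == key_b:
        return "identical"
    if conn_a != conn_b:
        # Skeleton differs: a different molecule, or e.g. a salt or mixture
        # whose extra component changes the first block.
        return "different connectivity"
    if rest_a != rest_b:
        # Same skeleton; the second block hashes further layers
        # such as stereochemistry and isotopes.
        return "same connectivity; differs in stereo/isotope layers"
    return "same connectivity; differs in protonation"


parent = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"      # hypothetical parent structure
enantiomer = "AAAAAAAAAAAAAA-CCCCCCCCSA-N"  # hypothetical stereo variant
print(classify_difference(parent, enantiomer))
# prints: same connectivity; differs in stereo/isotope layers
```

Salt forms and mixtures normally change the first block, because the counter-ion or second component becomes part of the InChI. They therefore need salt-stripping before this kind of comparison.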

5

Causes of drug structure multiplexing

• Inherent challenges and complexities of chemical representation

• Utility of PubChem depends on advanced rules applied to a submission-based system

• Drug companies never verify their own structures in public databases

• Legacy of structure image primacy in documents

• No clear accountability for correctness of public approved drug structures (companies? FDA? WHO (INN)? AMA (USAN)? Wikipedia? CAS?)

• Structural variants enter databases from general source proliferation, large-scale patent extractions, chemical vendor submissions and repeated exemplifications in journals

• The net effect is an inexorable increase in multiplexing, though not necessarily in erroneous structures per se

6

A case of the wrong name > structure

7

Fixing errors: doing our bit

8

Taxol: a challenging example

9

Finding the links: multiplexed to 129 CIDs

10

Reading the links for alternative taxols: different structures > 20 sets of assay results
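"Reading the links" can be pictured as pooling bioactivity records across all the CIDs a drug is multiplexed to, so that results attached to a variant structure are not missed. The minimal sketch below assumes you already hold the drug's list of CIDs and a CID-to-assay-ID mapping. Both are hypothetical placeholders here, and no PubChem service is called.

```python
from collections import defaultdict

# Hypothetical placeholder data: one drug split across three CIDs, and the
# assay IDs (AIDs) recorded against each CID. Not real PubChem records.
cid_to_aids: dict[int, set[int]] = {
    101: {1, 2},
    102: {2, 3},
    103: {4},
}


def pool_assays(cids: list[int], cid_to_aids: dict[int, set[int]]) -> dict[int, list[int]]:
    """Map each assay ID to the multiplexed CIDs it was recorded against."""
    pooled: dict[int, list[int]] = defaultdict(list)
    for cid in cids:
        for aid in sorted(cid_to_aids.get(cid, set())):
            pooled[aid].append(cid)
    return dict(pooled)


multiplexed_cids = [101, 102, 103]
pooled = pool_assays(multiplexed_cids, cid_to_aids)
# Assays that a search starting from CID 101 alone would miss.
missed = sorted(set(pooled) - cid_to_aids[101])
print(pooled)  # {1: [101], 2: [101, 102], 3: [102], 4: [103]}
print(missed)  # [3, 4]
```

Keeping the CID list per assay, rather than a bare union, shows which structural variant each result actually came from.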

11

Virtual deuteration: compounding drug multiplexing

12

Scale of the issue for approved drugs in PubChem: multiplexing expansion from 2005 to 2014

13

So how are we doing in our database?

• Sets were salt-stripped for this comparison

• GTPdb (Oct 2014) has 983 approved drug CIDs concordant with either ChEMBL or DrugBank

• But only 723 are 4-way concordant

• We will inspect the 152, 192 and 180 sectors for consensus expansion
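These counts come from set operations over per-source CID lists. In this minimal sketch on toy data, "concordant with either ChEMBL or DrugBank" is GTPdb ∩ (ChEMBL ∪ DrugBank), and "4-way concordant" is the intersection of all four sets. Each sector is the group of CIDs that share the same pattern of source membership. The fourth source is not named in this text, so `source4` is a hypothetical placeholder. The toy numbers bear no relation to the real 983/723 figures.

```python
# Toy, hypothetical sets of salt-stripped CIDs per source; not real data.
# "source4" stands in for the fourth source of the 4-way comparison,
# which is not named in this text.
sources: dict[str, set[int]] = {
    "GTPdb": {1, 2, 3, 4, 5, 6},
    "ChEMBL": {1, 2, 3, 4, 7},
    "DrugBank": {1, 2, 3, 5, 8},
    "source4": {1, 2, 9},
}


def venn_sectors(sources: dict[str, set[int]]) -> dict[frozenset[str], set[int]]:
    """Place every CID in exactly one sector: the set of sources that contain it."""
    sectors: dict[frozenset[str], set[int]] = {}
    for cid in set().union(*sources.values()):
        members = frozenset(name for name, cids in sources.items() if cid in cids)
        sectors.setdefault(members, set()).add(cid)
    return sectors


gtpdb = sources["GTPdb"]
# GTPdb CIDs matched in ChEMBL or DrugBank (the "either" count).
either = gtpdb & (sources["ChEMBL"] | sources["DrugBank"])
# CIDs present in all four sources (the "4-way concordant" count).
four_way = set.intersection(*sources.values())
print(len(either), len(four_way))  # 5 2

sectors = venn_sectors(sources)
# CIDs found only in GTPdb: candidates to inspect for consensus expansion.
print(sorted(sectors.get(frozenset({"GTPdb"}), set())))  # [6]
```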

14

Consequences and possible solutions to the drug multiplexing issue

• Our drugs annotation Committee cannot magic these issues away, but its support is crucial

• Our consensus approach is useful and statistically defensible

• In the GTPdb we add curator comments and cross-pointers for key multiplexed examples

• Sources that make the effort to collate drug structure sets should cross-corroborate more

• A canonical approach to merging drug structure-to-bioactivity mappings could be considered

• The inner connectivity layer of the InChIKey (its first 14-character block) goes some way towards this
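The merging idea in the last two bullets can be sketched in two steps. First, group standard InChIKeys by their first 14-character block, which hashes the connectivity and so collapses stereo and isotopic variants. Second, count how many independent sources support each full key within a group. This is an illustrative sketch, not GTPdb's procedure. The two-source threshold is an arbitrary assumption, the keys are hypothetical placeholders, and salt forms must be stripped beforehand because counter-ions change the first block.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical (source, standard InChIKey) records for salt-stripped
# structures; the keys are placeholders, not real compounds.
records = [
    ("GTPdb", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("ChEMBL", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("DrugBank", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("vendor", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),  # stereo variant, same skeleton
    ("vendor", "EEEEEEEEEEEEEE-FFFFFFFFSA-N"),  # unrelated structure
]


def merge_by_connectivity(records: list[tuple[str, str]]) -> dict[str, Counter]:
    """Group full keys under their first (connectivity) block.

    For each block-1 group, count how many distinct sources report each full key.
    """
    groups: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for source, key in records:
        groups[key[:14]][key].add(source)
    return {
        block1: Counter({key: len(srcs) for key, srcs in keys.items()})
        for block1, keys in groups.items()
    }


def consensus_key(support: Counter, min_sources: int = 2) -> Optional[str]:
    """Return the best-supported full key if enough sources agree, else None."""
    key, n_sources = support.most_common(1)[0]
    return key if n_sources >= min_sources else None


merged = merge_by_connectivity(records)
for block1, support in merged.items():
    print(block1, dict(support), consensus_key(support))
```

Collapsing on the first block is deliberately coarse, because it would also merge genuinely distinct stereoisomers. It therefore suits cross-pointing bioactivity across multiplexed records rather than deciding which structure is correct.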
