The SureChEMBL KNIME Nodes

George Papadatos, Senior Technical Officer, ChEMBL group, [email protected]


EMBL-EBI

KNIME UGM 2014, 17/02/2014

Genomes & variation
• Ensembl
• Ensembl Genomes
• Genome-phenome archive
• Metagenomics

Nucleotide sequences
• European Nucleotide Archive (ENA)

Expression
• ArrayExpress
• Expression Atlas
• PRIDE
• R-Workbench

Proteins
• The Universal Protein Resource (UniProt)
• InterPro

Chemical biology
• ChEMBL
• ChEBI

Literature & ontologies
• Europe PubMed Central
• Gene Ontology

Biomolecule structures
• Protein Data Bank in Europe
• PDBsum
• ProFunc

Pathways
• IntAct
• Reactome
• MetaboLights

Systems
• BioModels
• Enzyme Portal
• BioSamples

KNIME at the EBI
• Provide KNIME training to scientists and researchers
• CDK community nodes development
• Access the ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes

Overview of EMBL-EBI chemistry resources
• UniChem – InChI-based resolver (full + relaxed ‘lenses’)
• 3rd party data: ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, mcule, FDA SRS, PharmGKB, Selleck, …
• ChEMBL: bioactivity data from literature and depositions
• ChEBI: nomenclature of primary and secondary metabolites; chemical & functional ontology
• Atlas: ligand induced transcript response
• PDBe: ligand structures from structurally defined protein complexes
• SureChEMBL: molecule structures from patent literature
• Interfaces: RDF and REST API interfaces; REST API interface
• Counts shown in the diagram: 15K, 750, 15M, 1.5M, 24K, 65M

Novelty checking with UniChem

https://www.ebi.ac.uk/unichem/
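
The idea of an InChI-based novelty check can be sketched locally, without calling UniChem itself. The snippet below is only an illustration under stated assumptions: it treats the "full" lens as an exact InChIKey match and the "relaxed" lens as a match on the first 14-character block of the InChIKey (the connectivity hash). Whether UniChem's relaxed lens works exactly this way is an assumption, and the function names and the small set of known keys are hypothetical.

```python
# Illustrative sketch only: a local stand-in for an InChIKey-based novelty check.
# Function names and the "known" collection are hypothetical, not part of UniChem.

def connectivity_block(inchikey: str) -> str:
    """Return the first block of a standard InChIKey (14 characters),
    which hashes the connectivity (skeleton) part of the InChI."""
    return inchikey.split("-")[0]


def novelty_status(query_key: str, known_keys) -> str:
    """Classify a query InChIKey against a collection of known InChIKeys.

    "known"          : exact InChIKey match (assumed 'full' lens)
    "known skeleton" : same connectivity block only (assumed 'relaxed' lens)
    "novel"          : no match at either level
    """
    known = set(known_keys)
    if query_key in known:
        return "known"
    skeletons = {connectivity_block(key) for key in known}
    if connectivity_block(query_key) in skeletons:
        return "known skeleton"
    return "novel"


if __name__ == "__main__":
    # Example keys used purely for illustration.
    known = ["HEFNNWSXXWATRW-UHFFFAOYSA-N"]
    for key in (
        "HEFNNWSXXWATRW-UHFFFAOYSA-N",  # identical key
        "HEFNNWSXXWATRW-JTQLQIEISA-N",  # same first block, different second block
        "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",  # different first block
    ):
        print(key, "->", novelty_status(key, known))
```

In a KNIME workflow the same comparison would typically be applied per row, with the known keys coming from a lookup table or a UniChem query rather than a hard-coded list.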


SureChEMBL and KNIME

The story
• EMBL-EBI have acquired SureChem – a leading ‘chemistry patent mining’ product from Digital Science, Macmillan Group
• SureChem not aligned with core future academic business
• User base
  • Free (SureChemOpen)
  • Paying (SureChemPro)
• EMBL-EBI will support existing licensees
• Plans to provide an ongoing, free, open resource to entire community
• Rebrand to SureChEMBL

Chemistry patents?
• patere (Latin) = to lay open
• Legal and technical documents
• Disclosure of invention in exchange for exclusive rights
  • Usually 20 years
• Driver for innovation
• Most of the knowledge in (chemical) patents will never appear anywhere else

SureChEMBL System Overview

Diagram labels:
• Patent offices: WO; EP applications & granted; US applications & granted; JP abstracts
• Processed patents (service)
• Patent PDFs (service)
• Entity recognition
• Name to structure (five methods)
• Image to structure (one method)
• Database
• Chemistry database
• Application server
• API
• Users
• The Cloud - Amazon Web Services
• SureChem IP
• SureChem System

Example chemical name shown: 1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3-propyl-1H-pyrazolo[4,3-d]pyrimidin-5-yl)phenylsulfonyl]-4-methylpiperazine

http://www.surechembl.org/
• Keyword search
• Patent number search
• Structure sketch
• Paste SMILES, MOL, name
• Types of chemistry search
• Filter by authority
• Filter by document section
• Filter by date
• Help

Similarity searching

Reviewing the hits

From hits to patent documents

Full text patent document access

SureChEMBL KNIME Nodes
• Developed by Max Recall Information Systems GmbH
• Main functionality
  • Keyword search
    • Lucene syntax and Boolean operators
    • pa:(Bayer OR Genentech OR Merck) AND desc:(chemotherap* AND (Phosphoinositide kinase OR Pi3K))
  • Structure search
    • Additional phys/chem filters
  • Retrieve patent biblio and full text
  • Extract chemistry from patent
    • Additional filters
    • Document section counts
    • Chemical corpus counts
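
The keyword query on this slide can be assembled programmatically before it is handed to a search node. The sketch below is a minimal illustration, not the nodes' own API: the helper names are hypothetical, and only the field prefixes `pa` and `desc` come from the slide. One deliberate difference is that multi-word terms are wrapped in double quotes, because in standard Lucene query syntax an unquoted phrase is parsed as separate terms; whether the SureChEMBL index needs this is an assumption.

```python
# Minimal sketch: composing a fielded Lucene-style query string.
# Helper names are hypothetical; only the field names pa and desc come from the slide.

def lucene_group(terms, operator="OR"):
    """Join terms with a Boolean operator inside parentheses,
    quoting any term that contains a space so it is treated as a phrase."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + f" {operator} ".join(quoted) + ")"


def fielded(field, expression):
    """Prefix an expression with a field name, e.g. pa:(...)."""
    return f"{field}:{expression}"


applicants = lucene_group(["Bayer", "Genentech", "Merck"])
targets = lucene_group(["Phosphoinositide kinase", "Pi3K"])
description = f"(chemotherap* AND {targets})"

query = f"{fielded('pa', applicants)} AND {fielded('desc', description)}"
print(query)
# pa:(Bayer OR Genentech OR Merck) AND desc:(chemotherap* AND ("Phosphoinositide kinase" OR Pi3K))
```

Building the string from lists makes it easy to swap in a different set of applicants or target synonyms, for example from a KNIME table column, without editing the query by hand.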


API authentication key

Node description

Live Demo!
• Exploring the anti-malarial landscape in US patents

More use cases within KNIME
• Chemoinformatics
  • Chemistry landscape for a particular biological target/disease
  • R-group analysis for a particular patent family claimed chemistry
  • Novelty checking
• Competitive intelligence
  • Reporting
  • Patent alerts
  • Per target/disease
• Prior art checking
• Further text-mining and annotation
• Network analysis of citations

Timeframes and plans
• About 2-3 months for full transfer of operations
• Refactor authentication system
  • Consider fair use
• Future ideas for development – dependent on funding!
  • Add sequence searching
  • Add disease terms and target indexing
  • Add chemical structure tagging & search to full text content of Europe PMC

Acknowledgements
• ChEMBL group
  • John Overington
  • Mark Davies
• ChEBI group
  • Stephan Beisken
• SureChem
• MaxRecall
  • Michael Dittenbach
• KNIME
• KNIME community

Any questions?