Mind the Gap: The novel benefits of human-curated substance locations for chemical patent analysis

Aalt van de Kuilen, Patent Information Services BV, NL
Paul Peters, CAS/ACS International, DE

ICIC 2016, October 18, 2016, Heidelberg, Germany

CAS is a division of the American Chemical Society. Copyright 2016 American Chemical Society. All rights reserved.




Finding the relevant section(s) within the full text of chemical patents is often a time-consuming challenge

• They are not always as easy to track down as we might expect
• They can be long and “artfully” written
• The chemistry is often obscured within complex names, tables, text, graphics, etc.

Sometimes it seems like the search may be complete, but the hunt is just beginning!

Even with a precise chemical patent search, reviewing the results can quickly become overwhelming

=> FILE CAPLUS

=> S L3

L4 1014 L3

=> S L4 AND (BET OR BROMODOMAIN) AND P/DT

L5 35 L4 AND (BET OR BROMODOMAIN) AND P/DT

A query combining structure and text terms yields 35 patent publications.

That shouldn’t be too bad, right?

Only 5,498 pages to review.
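The query above ANDs a structure-based answer set (L4) with text terms and a document-type limit. As an illustration of the Boolean logic only, not of how STN actually evaluates searches, the sketch below models the answer sets as Python sets; the `Record` class, the accession numbers, and the texts are all hypothetical, and term matching is simplified to exact words with no truncation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified bibliographic record (not the real CAplus format)."""
    an: str         # accession number
    doc_type: str   # "P" = patent, "J" = journal article
    text: str       # title, abstract, and index terms


def combine(records, structure_hits, terms, doc_type="P"):
    """Mimic 'L4 AND (BET OR BROMODOMAIN) AND P/DT' on toy data.

    A record is kept only if it is in the structure answer set, contains at
    least one of the text terms (exact word match), and has the requested
    document type.
    """
    wanted = {t.lower() for t in terms}
    kept = set()
    for rec in records:
        words = set(rec.text.lower().split())
        if rec.an in structure_hits and words & wanted and rec.doc_type == doc_type:
            kept.add(rec.an)
    return kept


records = [
    Record("A1", "P", "BET bromodomain inhibitors for cancer"),
    Record("A2", "J", "Bromodomain binding study"),
    Record("A3", "P", "Kinase inhibitors"),
    Record("A4", "P", "Novel bromodomain ligands"),
]
structure_hits = {"A1", "A2", "A3"}  # pretend these matched the structure query
print(sorted(combine(records, structure_hits, ["BET", "BROMODOMAIN"])))  # ['A1']
```

Each condition removes a different kind of false hit: A2 is not a patent, A3 lacks the text terms, and A4 never matched the structure.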

Examples of publication lengths in the set: 479, 428, 321, 277, 263, 261, 240, and 229 pages
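The arithmetic behind “only 5,498 pages” is worth making explicit. Using only the numbers on the slide, the eight publications listed account for almost half of the total, and the remaining 27 still average over a hundred pages each:

```python
# Page counts of the eight publications listed on the slide
listed = [479, 428, 321, 277, 263, 261, 240, 229]
total_pages = 5498  # all 35 publications in answer set L5
n_docs = 35

listed_pages = sum(listed)                  # pages in the eight listed documents
other_pages = total_pages - listed_pages    # pages in the other 27 documents
share = listed_pages / total_pages
avg_other = other_pages / (n_docs - len(listed))

print(f"{listed_pages} pages in {len(listed)} documents ({share:.1%} of the total)")
print(f"{other_pages} pages in the other {n_docs - len(listed)} documents, "
      f"about {avg_other:.0f} pages each")
```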


Technology can help, but algorithmic extraction of chemistry in patents has significant limitations


Conclusion: based on a limited sample, algorithmic extraction found only 50-60% of the chemical structures in patents, and the ones it found were often the least interesting.
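The 50-60% figure is a recall-style measure: the share of substances in a reference set that the algorithm also finds. A minimal sketch, assuming the human-curated index serves as the reference set; the identifiers are placeholders, not real CAS Registry Numbers. Note that recall alone says nothing about *which* substances are missed, which is the second half of the conclusion above.

```python
def extraction_recall(curated: set[str], extracted: set[str]) -> float:
    """Share of the curated (reference) substances that extraction also found."""
    if not curated:
        raise ValueError("the curated reference set is empty")
    return len(curated & extracted) / len(curated)


# Hypothetical example: 10 curated substances; the algorithm finds 6 of them
# plus one spurious hit ("X1"), which does not affect recall.
curated = {f"S{i}" for i in range(1, 11)}
extracted = {"S1", "S2", "S3", "S4", "S5", "S6", "X1"}
missed = sorted(curated - extracted, key=lambda s: int(s[1:]))

print(f"recall = {extraction_recall(curated, extracted):.0%}")  # recall = 60%
print("missed:", missed)
```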

Algorithms miss key substances for a myriad of reasons
• Ambiguous naming
• Markush representations
• No name: substances described by explanatory text or images rather than by chemical names or structures
• Stereochemistry issues
• Multi-component substances


Normally, PSS stands for poly(styrenesulfonic acid), but here it represents the aqueous dispersion, which CAS had previously identified as poly(1-vinyl-2-pyrrolidone).


PatentPak™ addresses this gap by combining human curation with new technology to expedite chemical patent analysis

• Rapidly track down the specific location of hard-to-find chemical information in patents with interactive links to key substances
– Benefit from the indexing efforts of hundreds of CAS scientists
• Instantly and securely access patent PDFs from major patent offices
– No more wasting time navigating multiple websites
• Locate patents in languages you know with CAplus℠ global patent family coverage
– Save time and translation costs
• Conveniently share these benefits with other IP stakeholders
– Even if they do not use STN® or SciFinder®


PatentPak is built on the indexing effort of the scientific analysts who create CAS REGISTRY℠

• Scientists review each patent and identify new substances for inclusion in CAS REGISTRY
• They mark the specific location of substances in the text during analysis
• Algorithmic processing with human intervention allows previously registered substances to be located and annotated in backfile documents
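Conceptually, each marking ties a registered substance to a specific place in the document, and one substance can be marked in several places. The sketch below is one hypothetical way to model and group such annotations; the class and field names (`cas_rn`, `page`, `section`, `source`) are assumptions for illustration, not the PatentPak data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstanceLocation:
    """Hypothetical annotation linking one substance to one place in a patent."""
    cas_rn: str    # CAS Registry Number (placeholders below)
    page: int      # page in the patent PDF
    section: str   # e.g. "claims", "example", "table"
    source: str    # e.g. "analyst" (new indexing) or "algorithm+review" (backfile)


def locations_by_substance(marks):
    """Group location markings by substance, ordered by page within each group."""
    grouped = defaultdict(list)
    for mark in marks:
        grouped[mark.cas_rn].append(mark)
    return {rn: sorted(items, key=lambda m: m.page) for rn, items in grouped.items()}


marks = [
    SubstanceLocation("RN-1", 12, "example", "analyst"),
    SubstanceLocation("RN-1", 3, "claims", "analyst"),
    SubstanceLocation("RN-2", 27, "table", "algorithm+review"),
]
for rn, items in locations_by_substance(marks).items():
    print(rn, [(m.page, m.section) for m in items])
```

Keeping the `source` field separate makes it possible to distinguish locations marked during routine indexing from those added in backfile processing.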


“I analyzed the chemistry in this entire patent to save you time.”

Keiko Sugimoto, Sr. Scientific Information Analyst, CAS


PatentPak supplements CAplus records with direct pointers to the chemistry of interest

• Bibliographic information (partially shown)
• Hit substance indexing, including roles
• Hit structure display from CAS REGISTRY
• PatentPak links for each hit compound
• PatentPak links for the document


It is possible to access the original PDF or the annotated PDF (PDF+), or to review the patent using the interactive viewer.

PatentPak links are available in transcripts, tables, and reports, and they can be opened without an STN login ID, which makes them easy to share across the workflow


No STN login ID required

PatentPak is also available in SciFinder


New CAplus records from 31 countries are annotated as part of the normal workflow, and the backfile is growing rapidly


The current backfile project will extend historical PatentPak coverage of key offices by more than a decade by year end


PatentPak example US5739376: Backfile operation for one of the first patents on fullerene derivatives (Hoechst AG)

• Originally a German basic patent from 1994, but substance locations have been added to the US equivalent from 1998
• Fullerene structures were symbolized by simple rings

PatentPak example WO2016087417: Substances identified in a Markush table (Bayer CropScience AG)

• Only a few selected substances in this patent are fully identified by name or structure
• The vast majority of substances are indexed by assembling Markush tables

PatentPak example WO9851681: Substance identified only as “oily product” (Sanofi)

• This particular substance is identified only as “oily product”
• The CAS analyst indexed it from the chemistry described

PatentPak example WO2016120821: Find substances that cannot be identified by algorithm or structure extraction (Novartis AG)

• Substances in formula VII are claimed by Markush: LG = “leaving group”
• The analyst marked four specific compounds that are defined later in the claims; only a human can process claims like this!

PatentPak example DE2013016487: Multiple location markings (University of Heidelberg)

• The analyst has marked multiple locations: the claims and a synthetic example


PatentPak example WO2016001362: Find substances inferred from their starting materials after enzymatic conversion (BASF)

• Starting materials (substrates) identified by structure on page 51
• Products not listed, but inferred in a table on page 27

PatentPak example WO2015018558: Inorganic chemistry can be equally challenging (PI Ceramic GmbH)

PatentPak example WO2014184355: Find assembled Markush tables (Dr. August Wolff GmbH & Co Arzneimittel)

• 9.5 pages of “table Markush” structures: a core structure shown at the top, with the variable fragments tabulated below
• The complete structures are assembled in a table at the back of the PDF+ document, including page numbers, CAS RNs, chemical names, and structures
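A “table Markush” gives one core structure plus a table of fragment sets, one row per compound; assembling it means substituting each row into the core. The sketch below shows only that idea, on a toy phenol core written as a SMILES template. The core, R-groups, example numbers, and page numbers are invented, and real assembly (as done by CAS analysts) must also handle attachment points, stereochemistry, and structure validity far more carefully.

```python
# Toy core: a phenol with two variable positions, written as a SMILES template.
# "[H]" stands for hydrogen so that every substitution stays valid SMILES.
CORE = "Oc1ccc({R1})cc1{R2}"

# Hypothetical "table Markush" rows: (example number, page, R1, R2)
ROWS = [
    (1, 14, "C", "[H]"),
    (2, 14, "C", "Cl"),
    (3, 15, "OC", "F"),
]


def assemble(core, rows):
    """Substitute each row's fragments into the core to get complete structures."""
    return [
        {"example": ex, "page": page, "smiles": core.format(R1=r1, R2=r2)}
        for ex, page, r1, r2 in rows
    ]


for entry in assemble(CORE, ROWS):
    print(entry["example"], entry["page"], entry["smiles"])
```

Keeping the page number with each assembled row mirrors the compound table described above, where every complete structure points back to its location in the document.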

Case study on new Vitamin D metabolites


How many patent families have been filed since 2013 on new Vitamin D metabolites?

Find the answer with a stepwise approach


1. Structure search in Registry
2. Remove old compounds
3. Keep compounds with a low reference count in CAplus
4. Transfer to Chemical Abstracts
5. Limit to new compounds published in patents
6. Display records that have a PatentPak record

PatentPak PDF | PatentPak PDF+ | PatentPak Interactive

Structure query: broad definition of the Vitamin D skeleton

[Figure: query structure with Q, CH2, CH3, and Ak groups]

All rings are isolated, and the double bonds are mandatory.

CAS REGISTRY search


FILE 'REGISTRY' ENTERED ON 22 SEP 2016

STRUCTURE UPLOADED

=> L3 has 6806 unique substances in Registry

Refine to compounds registered since 2001 (ED>2000)

=> L4 has 2394 unique substances

Refine to substances with fewer than 5 references (REF.CAPLUS<5)

=> L5 has 2159 unique substances
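Both Registry refinements are simple filters on substance-level attributes: entry date (ED>2000) and the number of CAplus references (REF.CAPLUS<5). The toy sketch below mirrors that logic; the `Substance` records are hypothetical, and in STN the refinements are of course run as commands against the file itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """Hypothetical, simplified Registry entry (not the real record format)."""
    rn: str           # placeholder Registry Number
    entry_year: int   # year the substance entered CAS REGISTRY
    caplus_refs: int  # number of CAplus references for the substance


def refine(substances, after_year=2000, max_refs=5):
    """Keep substances entered after `after_year` with fewer than `max_refs` references."""
    recent = [s for s in substances if s.entry_year > after_year]  # like ED>2000
    rare = [s for s in recent if s.caplus_refs < max_refs]         # like REF.CAPLUS<5
    return recent, rare


hits = [
    Substance("RN-A", 1985, 120),  # long-known, heavily cited analogue
    Substance("RN-B", 2008, 12),   # newer, but already well studied
    Substance("RN-C", 2014, 1),    # new, single disclosure
    Substance("RN-D", 2015, 3),
]
recent, rare = refine(hits)
print(len(hits), len(recent), len(rare))  # 4 3 2
```

The reference-count filter is what targets *new* metabolites: a substance with only a handful of references is likely to have been disclosed recently, typically in the patents being sought.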

CAplus search strategy


Cross over the 2159 unique substances to CAplus

=> L5 has 503 references from all years

Restrict the answer to patent records only (P/DT)

=> L6 has 234 patent references from all years

Restrict to patents with a stronger chemistry focus using C07C as the IPC or CPC code

=> L7 has 136 patent references from all years

Restrict to patents with a priority year after 2012

=> L8 has 18 patent references
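After the crossover, each restriction is again a filter, now on document-level fields. The sketch below applies the three restrictions (document type P, a C07C code among the IPC/CPC codes, priority year after 2012) to hypothetical `Reference` records, then reports the retention at each step of the real search using the counts above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical, simplified CAplus reference (not the real record format)."""
    an: str                  # placeholder accession number
    doc_type: str            # "P" = patent, "J" = journal article
    priority_year: int       # earliest priority year (0 if not a patent)
    class_codes: tuple = ()  # IPC and CPC codes, combined


def restrict(refs):
    """Apply the three restrictions in the order used on the slide."""
    patents = [r for r in refs if r.doc_type == "P"]                     # P/DT
    chem = [r for r in patents
            if any(code.startswith("C07C") for code in r.class_codes)]   # C07C in IPC/CPC
    recent = [r for r in chem if r.priority_year > 2012]                 # priority year after 2012
    return patents, chem, recent


refs = [
    Reference("R1", "J", 0),
    Reference("R2", "P", 2010, ("C07C 401/00",)),
    Reference("R3", "P", 2014, ("A61K 31/59",)),
    Reference("R4", "P", 2014, ("C07C 401/00", "A61P 3/02")),
]
patents, chem, recent = restrict(refs)
print(len(refs), len(patents), len(chem), len(recent))  # 4 3 2 1

# Retention at each step of the real search, from the counts on the slide
funnel = [("crossover", 503), ("P/DT", 234), ("C07C", 136), ("priority > 2012", 18)]
for (_, before), (step, after) in zip(funnel, funnel[1:]):
    print(f"{step}: kept {after} of {before} ({after / before:.0%})")
```

The C07C step trades recall for focus: a pharmaceutical-only patent such as R3 is dropped even though it may concern vitamin D compounds.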

Findings of the 18 patent family records retrieved


Answer  Country  Language  Pub. year  Pages  All subst.  Vit. D subst.  PatentPak
1       CN       Chinese   2016       27     37          4              Yes
2       CN       Chinese   2016       25     47          13             Yes
3       WO       English   2016       106    202         55             PDF+
4       CN       Chinese   2016       13     4           1              Yes
5       CN       Chinese   2016       5      3           1              Yes
6       CN       Chinese   2015       21     9           2              Yes
7       CN       Chinese   2015       14     9           4              Yes
8       CN       Chinese   2015       9      4           1              Yes
9       WO       German    2015       45     14          3              Yes
10      DE       German    2015       22     14          3              Yes
11      CN       Chinese   2014       14     16          1              Yes
12      CN       Chinese   2015       12     7           1              Yes
13      US       English   2015       18     5           3              Yes
14      WO       Spanish   2015       75     141         29             Yes
15      US       English   2015       21     18          3              Yes
16      WO       English   2015       61     18          2              Yes
17      ES       Spanish   2013       55     141         29             Yes
18      WO       English   2013       50     30          3              Yes

All subst. = all substances in the record; Vit. D subst. = Vitamin D substances; PatentPak = PatentPak availability

The result set includes three “double basic” pairs: 9+10, 14+17, and 15+16
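To answer the original question (how many patent families), the 18 records must be de-duplicated. Assuming each “double basic” pair describes one invention (the paired rows have matching or near-matching substance counts), merging the pairs gives the number of distinct families. The union-find merge below is a generic illustration, not how CAS links family members.

```python
def count_families(n_records, same_family_pairs):
    """Count groups after merging records known to belong to the same family (union-find)."""
    parent = list(range(n_records + 1))  # answer numbers are 1-based; index 0 unused

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in same_family_pairs:
        parent[find(a)] = find(b)

    return len({find(i) for i in range(1, n_records + 1)})


double_basic = [(9, 10), (14, 17), (15, 16)]
print(count_families(18, double_basic))  # 15
```

Union-find is more than three disjoint pairs strictly need, but it still gives the right count if a family turns out to span more than two records.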


L16 ANSWER 7 OF 18 CAPLUS COPYRIGHT 2016 ACS on STN

PatentPak PDF | PatentPak PDF+ | PatentPak Interactive

AN 2015:979679 CAPLUS Full-text

DN 163:118806

TI 24,28-Olefine-1-hydroxy-vitamin D derivatives and preparation

method

IN Fang, Zhijie; Guo, Wei; Liu, Yanan; Li, Hongliang

PA Nanjing University of Science and Technology, Peop. Rep. China

SO Faming Zhuanli Shenqing, 14pp.

CODEN: CNXXEV

DT Patent

LA Chinese

FAN.CNT 1

PPPI

PATENT NO. KIND DATE LANGUAGE PatentPak

--------------- ---- -------- ---------- ------------------------

CN 104693087 A 20150610 Chinese PDF | PDF+ | Interactive

PI

PATENT NO. KIND DATE APPLICATION NO. DATE

--------------- ---- -------- --------------------- --------

CN 104693087 A 20150610 CN 2013-10664076 20131210 <--

PRAI CN 2013-10664076 20131210 <--

Display: original full-text PDF


Display: original full-text PDF + compound table (PDF+)


Interactive Viewer for substance locations


Interactive link to location of compound


Answer 3 has more than 600 substance locations, which can be seen only in the PDF+; this is still very useful.


Case study conclusions


1. Fast identification of relevant patents containing new compounds
2. Easy access to the patent document
3. Time savings when finding the compounds in a specific patent (PatentPak PDF+ compound table)
4. Quick and easy location of a specific compound in a patent via links in the PatentPak Interactive Viewer

Overall conclusions


• Semantic technology has made great advances in classifying, mining, and extracting chemical content from text; however, it has significant limitations
• Human analysis is still necessary to find many of the key compound locations
• PatentPak in STN provides convenient links that help patent attorneys and outside counsel with their analysis work
• PatentPak in SciFinder is designed to provide a direct, interactive session in which scientists can find relevant compounds and search them in SciFinder
• In the case study, PatentPak provided significant time savings when analyzing novel vitamin D metabolites disclosed in patents