Adding structures to Web pages and data to structures. Alex Allardyce, ChemAxon. Presented at ACS Spring Meeting, Anaheim, 2011

chemicalize.org: Adding structures to Web pages and data and Web links to structures (ACS Anaheim 2011)


DESCRIPTION

chemicalize.org is a new free online service developed by ChemAxon that adds chemistry to Web pages, and data and Web links to structures. Its primary use is to parse chemical names from Web page text and serve an annotated version of the page in which structure images are hyperlinked from the source chemical names. Because structures and Web page URLs are stored, the database can be searched to find the Web pages containing any given structure query. For each structure, users can also generate structure-based predictions (logP, pKa, logD, etc.) within a user-customizable report. Current development centers on user profiles, “tracking” structures in newly chemicalized pages, and presenting chemicalize.org user activity to give a snapshot of the Web pages and structures currently of interest to chemists online. This presentation outlines the aims of the development, describes the service and current developments, and gives an overview of usage and user feedback.
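The name-to-link step described above can be sketched in Python. This is an illustration under stated assumptions, not the service's implementation: chemicalize.org used ChemAxon's Name to Structure and Document to Structure parsers, whereas this sketch uses a tiny hypothetical lookup table (`NAME_TO_SMILES`) and a hypothetical link target (`/structure?smiles=`). It treats the input as plain text, escapes it for HTML, and wraps each recognized name in a link.

```python
import html
import re
from urllib.parse import quote

# Hypothetical lookup table with lower-case keys. The real service resolved
# names with ChemAxon's parsers rather than a fixed list like this one.
NAME_TO_SMILES = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "paracetamol": "CC(=O)NC1=CC=C(O)C=C1",
}


def annotate(text: str, names: dict[str, str]) -> tuple[str, list[str]]:
    """Return `text` as HTML with each known chemical name hyperlinked,
    plus the list of names found, in page order."""
    # Try longer names first so they win over shorter names they contain.
    alternatives = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
    )
    parts: list[str] = []
    found: list[str] = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))  # text before the name
        name = match.group(0)
        found.append(name)
        # Hypothetical link target: the structure is passed as URL-encoded SMILES.
        target = "/structure?smiles=" + quote(names[name.lower()], safe="")
        parts.append(f'<a href="{target}">{html.escape(name)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))  # text after the last name
    return "".join(parts), found


page_html, hits = annotate("Aspirin and paracetamol are analgesics.", NAME_TO_SMILES)
print(hits)  # ['Aspirin', 'paracetamol']
```

The slides list dictionaries alongside lexing of IUPAC names; a fixed table like this covers only the dictionary half of that.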


Page 1

Adding structures to Web pages and data to structures

Alex Allardyce

ChemAxon

Presented at ACS Spring Meeting, Anaheim, 2011

Page 2

Demo – index page

• Layout and input box
• Recently chemicalized pages, recent queries…
• Drag and drop structure images
• Help, about

Example: http://www.chemicalize.org/


Page 3

Demo – chemicalizing a Web page
• URL paste
• Structure images
• TOC and links
• Properties link on mouse-over of a structure image
• Download
• Original page links still work

Example: http://www.chemicalize.org/?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPenicillin
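The example link passes the page address as a percent-encoded `url` query parameter. A minimal Python sketch that builds such a link with the standard library, assuming the 2011 URL format shown above (the live service may no longer accept it); `chemicalize_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def chemicalize_url(page_url: str) -> str:
    """Build a chemicalize.org link for `page_url` in the format of the 2011 demo."""
    # urlencode percent-encodes ':' and '/' inside the value (it uses quote_plus).
    return "http://www.chemicalize.org/?" + urlencode({"url": page_url})


print(chemicalize_url("http://en.wikipedia.org/wiki/Penicillin"))
# http://www.chemicalize.org/?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPenicillin
```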


Page 4

Demo – Structure-based predictions
• Properties
• Manage views, move boxes
• Open MarvinView by double-clicking any structure image
• Calculate on demand
• Download results

Example: http://www.chemicalize.org/structure/#!mol=Penicillin&source=parser


Page 5

Demo – Structure search

• Chem search pages
• Search from Calculate properties
• Open Marvin, power query features
• Similarity is the default search; see other types
• Choose a structure
• List of URLs, chemicalized links
• Show structures
• Combine chem search with URL
• Download results
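Similarity search, the default search type above, ranks stored structures by how closely they resemble the query. A common way to score this is the Tanimoto coefficient on binary fingerprints: bits set in both divided by bits set in either. The Python sketch below assumes fingerprints are already available as sets of "on" bit positions; the toy bit sets, the names and the 0.5 cut-off are hypothetical, and ChemAxon's own fingerprints and default threshold are not described on the slide.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| of two fingerprints given as bit sets."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / union


def similarity_search(query: set[int], library: dict[str, set[int]],
                      threshold: float) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above `threshold`, best match first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (hypothetical bit positions, not real chemical fingerprints).
library = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {7, 8}}
print(similarity_search({1, 2, 3, 4}, library, threshold=0.5))
# [('A', 1.0), ('B', 0.6)]
```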

Examples:
1. http://www.chemicalize.org/search/#m=Penicillin/t=t/h=0
2. http://www.chemicalize.org/search/#m=Penicillin/t=t/h=0/c=46260/p=0
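These search links keep their parameters in the URL fragment as slash-separated `key=value` pairs. The slides do not say what `t`, `h`, `c` and `p` mean (`m` appears to carry the query and `p` a page index, but that is a guess), so the Python sketch below only splits and decodes them. It assumes values are percent-encoded, so a literal "/" inside a value would arrive as `%2F`; `fragment_params` is a hypothetical helper. The Web search example uses the same form, while the structure page link (`#!mol=…&source=parser`) uses a different one that this sketch does not handle.

```python
from urllib.parse import unquote_plus, urlsplit


def fragment_params(url: str) -> dict[str, str]:
    """Split a '#k1=v1/k2=v2' fragment into a dict, decoding each value."""
    params: dict[str, str] = {}
    for part in urlsplit(url).fragment.split("/"):
        if not part:
            continue  # tolerate empty segments such as a trailing '/'
        key, _, value = part.partition("=")
        # unquote_plus turns '+' into a space and '%3A' into ':'.
        params[key] = unquote_plus(value)
    return params


print(fragment_params("http://www.chemicalize.org/search/#m=Penicillin/t=t/h=0/c=46260/p=0"))
# {'m': 'Penicillin', 't': 't', 'h': '0', 'c': '46260', 'p': '0'}
```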


Page 6

Demo – Web search

• Define a chemical and non-chemical text query
• Structure synonyms in the query
• Structures in the results panel: “like Web text search + structures in the results”

Example: http://www.chemicalize.org/websearch/#m=Serotonin+sexual+preference+site%3Anature.com/p=0
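The Web search combines a chemical part and a plain-text part of the query, with structure synonyms folded in. One way to picture that, sketched in Python under loud assumptions: a hypothetical synonym table (`SYNONYMS`) decides which terms are chemical, each chemical term is expanded into an OR-group of its synonyms, and everything else passes through unchanged. The real service resolved names to structures rather than reading a fixed table, and the output syntax here is a generic search-engine style, not chemicalize.org's.

```python
# Hypothetical synonym table keyed by lower-case name.
SYNONYMS: dict[str, list[str]] = {
    "serotonin": ["serotonin", "5-hydroxytryptamine", "5-HT"],
}


def split_query(query: str, synonyms: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Separate whitespace-delimited terms into (chemical, other) lists."""
    chemical: list[str] = []
    other: list[str] = []
    for term in query.split():
        (chemical if term.lower() in synonyms else other).append(term)
    return chemical, other


def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace each chemical term with a quoted OR-group of its synonyms."""
    out = []
    for term in query.split():
        alternatives = synonyms.get(term.lower())
        if alternatives:
            out.append("(" + " OR ".join(f'"{alt}"' for alt in alternatives) + ")")
        else:
            out.append(term)
    return " ".join(out)


query = "Serotonin sexual preference site:nature.com"
print(split_query(query, SYNONYMS))
# (['Serotonin'], ['sexual', 'preference', 'site:nature.com'])
print(expand_query(query, SYNONYMS))
# ("serotonin" OR "5-hydroxytryptamine" OR "5-HT") sexual preference site:nature.com
```

Splitting on whitespace means multi-word names such as “acetylsalicylic acid” would need a phrase-aware tokenizer.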


Page 7

Who we are

• 70+ people making cheminformatics toolkits and GUIs in Budapest, Hungary
• 4 areas of technology:
  • Cheminformatics platform toolkits
  • Discovery toolkits
  • Desktop applications
  • Markush and IP
• Lots of Web-ready chemistry functionality to play with
• Emerging as an industry leader in platform cheminformatics


Page 8

Why did we do this?

History:
• Free academic package and FreeWeb licensing since 2005
• Marvin free for all desktops (since the beginning)
• Open support forum developed to support free users (no login needed to see all threads)


Page 9

So why did we do this?

• There is a lot of content on the Web
• Useful, and increases the visibility and utility of chemical structures
• Creates user interest in this type of functionality, and so demand for chemistry and content for publishers
• Lets us develop directly with end users:
  • Functionality/feature development
  • GUI usability
  • Crowdsourced bug fixing: “Report Error” for naming
• Pushing the state of the art:
  • Browser tech (SVG, chunking, reducing calls)
  • ChemAxon tech (on the Web it must be superfast; finalise features)
• We love cheminformatics: “cheminfomaniacs”

Page 10

chemicalize.org under the hood

Web application (15 kloc):
• MySQL: DB engine, structure/text storage
• ChemAxon bits: see below
• Apache Tomcat: servlet container with code logic
• jQuery + plugins: UI interactions with code logic
• A fair bit of home-grown code here (46% of the code)
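The slides name MySQL for structure and text storage, and the description says that storing structures and page URLs makes it possible to find every page containing a given structure. A minimal sketch of such a page-to-structure link table, assuming Python's built-in `sqlite3` in place of MySQL so it runs on its own. The table and column names are hypothetical, and the `canonical` column stands in for whatever canonical structure key the real system used (the slides credit JChem with canonicalization and duplicate checking). It supports exact lookup only, not substructure or similarity search.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so the sketch runs on its own.
SCHEMA = """
CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE structure (id INTEGER PRIMARY KEY, canonical TEXT UNIQUE NOT NULL);
CREATE TABLE page_structure (
    page_id INTEGER NOT NULL REFERENCES page(id),
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    name TEXT NOT NULL,  -- the name as written on the page
    PRIMARY KEY (page_id, structure_id, name)
);
"""


def record(db: sqlite3.Connection, url: str, canonical: str, name: str) -> None:
    """Store that `name` on page `url` was resolved to structure `canonical`."""
    db.execute("INSERT OR IGNORE INTO page(url) VALUES (?)", (url,))
    db.execute("INSERT OR IGNORE INTO structure(canonical) VALUES (?)", (canonical,))
    db.execute(
        "INSERT OR IGNORE INTO page_structure "
        "SELECT p.id, s.id, ? FROM page p, structure s "
        "WHERE p.url = ? AND s.canonical = ?",
        (name, url, canonical),
    )


def pages_containing(db: sqlite3.Connection, canonical: str) -> list[str]:
    """Exact-structure lookup: every stored page URL that mentions `canonical`."""
    rows = db.execute(
        "SELECT DISTINCT p.url FROM page p "
        "JOIN page_structure ps ON ps.page_id = p.id "
        "JOIN structure s ON s.id = ps.structure_id "
        "WHERE s.canonical = ? ORDER BY p.url",
        (canonical,),
    )
    return [url for (url,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"  # stand-in for a canonical structure key
record(db, "http://en.wikipedia.org/wiki/Aspirin", aspirin, "aspirin")
record(db, "http://en.wikipedia.org/wiki/Aspirin", aspirin, "acetylsalicylic acid")
record(db, "http://en.wikipedia.org/wiki/Paracetamol", "CC(=O)NC1=CC=C(O)C=C1", "paracetamol")
print(pages_containing(db, aspirin))  # ['http://en.wikipedia.org/wiki/Aspirin']
```

Keying the link table on the structure rather than the name is what lets two different names on one page (here “aspirin” and “acetylsalicylic acid”) resolve to the same search hit.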


Page 11

ChemAxon bits

• Marvin: structure editor, viewer, image generation
• Name <> Structure, Document to Structure: parsing, dictionaries and lexing IUPAC names
• JChem Base, JChem Web Services, Standardizer, MCES: structure database, duplicate checking, structure search, Web services layer, canonicalization, hit highlighting
• Calculator Plugins: structure-based predictions like pKa, logP, logD, charge, HBDA, tautomers, stereoisomers, etc. Combined predictions yield composite results, such as “Lipinski-likeness”
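As an illustration of a combined prediction, the sketch below derives a Lipinski rule-of-five verdict from four properties a calculator could supply: molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, or more than 10 acceptors each count as one violation. How chemicalize.org scored “Lipinski-likeness” is not stated on the slide; the sketch assumes the common reading that at most one violation passes and exposes that as a parameter. The `Properties` class and function names are hypothetical, and the example inputs are illustrative values, not calculator output.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Predicted properties, e.g. as returned by structure-based calculators."""
    mass: float        # molecular weight, Da
    logp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(p: Properties) -> int:
    """Count how many of Lipinski's four rule-of-five limits are exceeded."""
    return sum([
        p.mass > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def lipinski_like(p: Properties, max_violations: int = 1) -> bool:
    """True when the number of violations does not exceed `max_violations`."""
    return lipinski_violations(p) <= max_violations


small = Properties(mass=180.2, logp=1.2, h_donors=1, h_acceptors=4)  # roughly aspirin
large = Properties(mass=650.0, logp=6.2, h_donors=2, h_acceptors=9)  # made-up values
print(lipinski_violations(small), lipinski_like(small))  # 0 True
print(lipinski_violations(large), lipinski_like(large))  # 2 False
```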


Page 12

Use cases:

• Wanted to know the logP of…
• What are the structures of known drugs? (http://en.wikipedia.org/wiki/List_of_drugs)
• Seeing structures in relation to the name
• All Wikipedia pages with a “chembox” have been indexed by chemicalize.org, so they can be searched by structure search (substructure, similar, exact)
• See all similar structures (and names) for any structure: sildenafil = Viagra, lodenafil, aildenafil, udenafil …
• Draw a structure and see its name
• Automatically chemicalize my blog (WordPress plugin)

Page 13

Stats: raw numbers (Apr 1, 2010 – Mar 25, 2011)

• URLs visited: 232,648
• Total number of names: 3,383,947 (14.58 names/page)
• Unique names extracted: 220,117
• Structures extracted: 175,598
• Total number of unique visitors: 44,535
• Average number of visitors/day (March 2011): 212
• Average/longest time on site: 4:03 / 28:41 (min:sec)


Page 14

What are they doing on the site?


Page 15

How busy are they?


Page 16

Top domains

Total domains: 13,390; average 17.29 URLs per domain


Page 17

Top pages

1. en.wikipedia.org/wiki/List_of_anaesthetic_drugs
2. www.reactivereports.com/chemistry-blog/arty-with-a-capital-f-and-the-myth-of-absinthe.html/comment-page-1
3. en.wikipedia.org/wiki/Penicillin
4. en.wikipedia.org/wiki/Aspirin
5. en.wikipedia.org/wiki/Paracetamol
6. www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=aspirin
7. en.wikipedia.org/wiki/List_of_organic_compounds
8. www.biomedcentral.com/info/ifora/figuretypes/
9. www.freepatentsonline.com/y2005/0037033.html
10. www.vivo.colostate.edu/hbooks/pathphys/endocrine/pancreas/insulin_phys.html

Data only available for the last 2 weeks.

Page 18

Usage statistics – predictions


Page 19

Future plans?

• Remaining free
• Crowdsourcing: new structures/names, bug reporting
• Working on sorting and ordering results (a biggie)
• Personalization (login): personal search history, profiles (notifications), dictionaries, calculation/search parameter settings
• Index page as a window into chemistry use on the internet
• Browser plugins: chemicalize better, particularly login/HTTPS pages (plugin technology is converging anyway)
• How about building up the chemistry side, such as pharmacophore search, other screening, etc.? There is a lot of ChemAxon tech here to play with
• Work on the quality of name parsing, blacklists, etc.
• What else, guys? This is a provisional list

Page 20

Thanks to

• Andras Stracz: site implementation
• Daniel Bonniot: Document & Name to Structure
• Alex Allardyce, Ferenc Csizmadia: features, project management, idiot and advanced testing
• Zsolt Kocsmarszky: design
• Roland Molnar: JChem Web Services