42
Discover New IUPAC Organic Nomenclature with ACD/Name Andrey Yerin and all ACD/Labs team developing and supporting nomenclature tools San Francisco, USA, August 11, 2014 ACS 248th National Meeting ACD/Labs Workshop

Discover New IUPAC Organic Nomenclature with … New IUPAC Organic Nomenclature with ACD/Name Andrey Yerin and all ACD/Labs team developing and supporting nomenclature tools San Francisco,

  • Upload
    lynga

  • View
    234

  • Download
    3

Embed Size (px)

Citation preview

Discover New IUPAC Organic Nomenclaturewith ACD/Name

Andrey Yerin

and all ACD/Labs team developing and supporting nomenclature tools

San Francisco, USA, August 11, 2014

ACS 248th National Meeting

ACD/Labs Workshop

The Scope of this Workshop

Proliferation of chemical names

New IUPAC recommendations

The concept of preferred IUPAC name (PIN)

What is changed? Most principle cases.

How do we implement new rules?

The degree of changes

Are we ready to release a new software version?

The quality of chemical names

Generate and check!

Two ACD/Name related presentations already delivered:

CINF. New IUPAC Organic Nomenclature: from bottle label todatabase updateCHED. Teaching and learning of multilingual chemical nomenclature with ACD/Name

This presentation incorporates some parts of these presentations and includes specific examples of changes in organic nomenclature.

Proliferation of Names

OO

Several systematic names are possible for this structure

-ButyrolactoneDihydro-2(3H)-furanoneDihydrofuran-2(3H)-one4-Hydroxybutanoic acid lactone1-Oxacyclopentan-2-oneOxolan-2-one2-OxotetrahydrofuranTetrahydro-2-furanon

A chemical name is created according to nomenclature rules;most commonly according to IUPAC rules.But there are too many rules!

But the multiplicity of systematic names is inconvenient and diminishes the role of nomenclature.

All of these names correctly describe the structure.

The development of chemical nomenclature is governed by International Union of Pure and Applied Chemistry (IUPAC).

IUPAC Nomenclature

All specific IUPAC recommendations are published in Journal of Pure and Applied Chemistry and freely available.

A collection of main recommendations can be found at www.chem.qmul.ac.uk/iupac hosted by School of Biological and Chemical Sciences, Queen Mary University of London.

In additional to specific recommendations IUPAC publishes books for all areas of chemical nomenclature. The books can be purchased but after some period the content can be available free of charge.

www.iupac.orgDivision of Chemical Nomenclature and Structure

Representation (Division VIII)

Is there a “most systematic” name?

… the rules of systematic nomenclature need not necessarilylead to a unique name for each compound…

To force the naming of all compounds into the Procrustean bed of one set of rules would not serve the needs of general communication.

A Guide to IUPAC Nomenclature of Organic Compounds, 1993

…we have developed rules for assigning ‘preferred IUPAC names’, while continuing to allow alternative names in order to preserve the diversity and adaptability of the nomenclature…

Nomenclature of Organic Chemistry. IUPAC Recommendationsand Preferred Names 2013

The nice thing about standards is, there are so many to choose from

C. Northcote Parkinson

To enforce the role of systematic nomenclature the need of having one best name has been recognized by IUPAC:

New IUPAC Organic Nomenclature

In December 2013, the book “Nomenclature of Organic Chemistry. IUPAC Recommendations and Preferred Names” was published.

It is truly a long awaited publication.The work started in 1992. The IUPAC project was initiated in 2001.

This major organic nomenclature publication is an answer to the rapid development of chemistry and appearance of new classes of chemicals we have seen over the past 20 years.

The book deals with naming principles that were unchanged for about 35 years since the IUPAC Blue book was published in 1979.

New IUPAC Organic Nomenclature

IUPAC was not silent all these years. While “A Guide to IUPAC nomenclature of organic chemistry” 1993 did not include the criteria, it explained the principles and included some first hints of preferred names.

Between 1979 and 2013 several important recommendations related to specific classes have appeared:

The book incorporates all nomenclature development for the last 35 years and introduces many new procedures.

Von Baeyer polycyclic systems (1999)Extension and revision of nomenclature for spiro compounds (1999)Used and bridged-fused (1998)Phane nomenclature (1998, 2002)Natural products and related compounds (1999)Recommendations on fullerene nomenclature (2002, 2005)Other publications

New IUPAC Organic Nomenclature

Practically all of the work was done by two authors – Henri Favre and Warren Powell, but many people participated in all stages.

It is important to note that the book was reviewed not only by classical nomenclature experts but experts in computerized nomenclature as well.The experts from ACD/Labs, CambridgeSoft and NextMoveSoftware were involved.

The most principle innovation is the concept of preferred IUPAC name (PIN)selected by hierarchical sets of criteria.

There are many other changes compared to the Blue Book 1979. It takes ten pages just to list them. Lets see what ACD/Name provides for naming.

ACD/Name is Simple to Use but Advanced

Just draw a structure and push the button to get the chemical name.

Names can be generated for most classes of organic compounds in full accordance with IUPAC recommendations.

ACD/Name Tools

IUPAC numbering can be shown.

The display of numbering scheme allows you to understand the locants for substituents and functional groups.

ACD/Name Tools

Name fragments are associated with the structure fragments.

Pointing at any part of a name or structure allows you to understand the structure to name relationship.

ACD/Name Tools

Name segments are explained with the corresponding IUPAC rules.

Name segments are classified by their role in chemical names.

ACD/Name Tools

Full text of IUPAC recommendations is provided.

The links allow you to access any specific rule defining the naming principles for specific segments.

ACD/Name Tools

Stereodescriptors are generated and explained.

An assignment of each stereodescriptor is explained with a hierarchical graph to conclude the seniority of ligands.

ACD/Name Tools

Supports not only most classes of organic compounds but biochemicalsand some inorganic compounds also.

Thus, being powerful for scientific and industrial use, the program retains high value for all levels of chemical education.

New IUPAC Organic Nomenclature

The new Blue Book introduced many changes; the most significant of which include:

Of high importance is the fact that the number of criteria to select the best name has increased significantly. The nomenclature became more systematic but at the same time more difficult for humans.

Less trivial names

Length is senior to unsaturation

More locants

More enclosing marks

Multiplication over substitution

Al, Ga, In and Tl are now ‘organic elements’

many other specific changes

New IUPAC Organic Nomenclature

The systematic nomenclature assumes general naming principles and minimal number of trivial names.

Are there anybody who would argue with this change?

Such nomenclature is easier to learn and implement in software.

The downside is that some very common names are changed.

O

Was tetrahydrofuranPIN oxolane

Tetrahydrofuran is still acceptable!Some people would like to make it the preferred name, but we must agree to disagree.

Was tetrahydro-2H-pyranPIN oxane

O

New IUPAC Organic Nomenclature

You can see these changes for the following example.

New name is very simple as compared to the old name.

Why Wait to Program Our Software?

The first public draft of the new Blue Book was made available in 2004.

Some producers of naming software decided to implement new principles and claimed compatibility to ‘2004 rules’.

It means that changes concern not only advanced cases but some very simple cases as well. Note that new name corresponds to CAS naming principles.

C H 2

C H 3CH 3

Was 2-ethylbut-1-enePIN 3-methylidenepentane

Being involved in the development of IUPAC recommendations, ACD/Labs decided to wait until later versions, keeping in mind that several principle procedures were still under consideration well before the actual publication. We did implement some earlier well-agreed upon changes, including the change of seniority of chain length to degree of unsaturation.

New IUPAC Organic Nomenclature

Chain length is senior to unsaturation.

Old name was 2-heptylbuta-1,3-diene.

New IUPAC Organic Nomenclature

Cycles are now senior to chains for the same atom content.

Old rules had no specific criterion but “larger parent” was favored1-cyclopropyldodecane.

New IUPAC Organic Nomenclature

Multiplication is senior to substitution.

Sometimes old name was easier to construct.

New IUPAC Organic Nomenclature

Al, Ga, In and Tl substances are named now as organic compounds.

Previously were named as organometallic compounds.

New IUPAC Organic Nomenclature

Many previously omitted locants are now cited to avoid ambiguity.

Ambiguity is not obvious so better to cite more locants.

New IUPAC Organic Nomenclature

More enclosing marks are used to avoid ambiguity.

In most cases fragments with internal locants are enclosed.

New IUPAC Organic Nomenclature

Enclosing and locants for ring assemblies.

Assemblies with suffixes are now enclosed.

New IUPAC Organic Nomenclature

New numbering for assemblies with three and more members.

Assemblies with three to six members are numbered as phanes.

Time to Start Programming

ACD/Name is the most mature naming program on the market. The development was started in 1994 shortly after ACD/Labs was founded.

We support not only organic nomenclature but some inorganic and biochemical nomenclature as well.

It is very important that the PIN concept does not make all other names unacceptable. Chemists still have a choice on how to name the compound providing that the chemical name correctly and unambiguously defines the chemical structure. Such names are considered belonging to general nomenclature instead of preferred IUPAC nomenclature.

Thus, while programming new procedures, we retained most old naming conventions and trivial or some even outdated names.

It makes implementation of new procedures even more difficult but allows us to increase the educational value of ACD/Name.

Preferred Name Leaves No Choice

While IUPAC rules allow the user to choose naming concepts and other several options, the generation of PIN assumes only one possible way to name a structure.

The PIN generation option disables most other options, leaving the user with no choice.

How We Check New Procedures

The correspondence of names is compared by computer excluding even minor deviations from expected names.We have a database of structures mentioned in the book and all names are copy-pasted. In this example no changes were found.

ACD 1401 – name corresponding to old rules, NAME_PNAME_PIN – name from book, ACD 1500a – our latest new name. COMP scripted fields compare text strings for equivalence – YES or NO.

How We Check New Procedures

We can easily check what old names do not correspond to new principles – ‘NO’ value in COMP_1401.

‘YES’ in COMP_15a indicates that our new name corresponds to PIN.

This example corresponds to ‘a’-terms for cationic heteroatoms discontinued in new rules.

How We Check New Procedures

The difference of in a new name with PIN most often indicates that certain procedures were not implemented or there were mistakes in the naming algorithms.

We check all such cases to minimize the differences.

This case indicates absent support of CIP rule 5. Sometimes ‘NO’ indicates that error is not with ACD/Name but a mistake in the name provided in the book.

We Also Check the Book

Implementation of new procedures allows us to check the names provided in the book itself.

The new Blue Book is a very substantial publication. It has 1600 pages with more than 10,000 chemical names for 6000 structures.

All names have been created by the ultimate experts in nomenclature but it is too difficult for human beings to avoid mistakes and misprints. It is practically impossible to keep in mind all procedures and criteria and manually create fully correct chemical names.

Practically after each implemented procedure we find names in the book that include misprints or are simply inconsistent with the principles specified in the book.

Mistakes are hardly avoidable and it is especially so for the book with 10,000 chemical names.

Many mistakes are already found and ACD/Labs is probably a champion in spotting of errors. We do not search for mistakes, we just try to implement new rules.

Errata for New Blue Book

I arrived at the ACS Fall 2014 meeting just after the meeting of the IUPAC division of chemical nomenclature and structure representation (Division VIII) that took place in Bangor UK, August 1-4.

One of the most important discussions was devoted to publication of corrections for the new Blue Book.

Currently the list includes more than 500 corrections. While many of them are just textual corrections, a significant number deal with the correction of chemical names. In some cases these corrections refer to the principles of naming.

It means that even after the publication it is sometimes hard to decide what procedure to implement.

The errata can be expected about the end of this year. But IUPAC nomenclature continues its development along with chemistry itself and new recommendations should be expected.

Nomenclature in Other Languages

All IUPAC recommendations are developed and published in English

The adaptation of English nomenclature is a responsibility of the corresponding national nomenclature committees, if existing, or groups of enthusiastic chemists with a good knowledge of nomenclature

For some languages practically all IUPAC recommendations are translated and published. For example, German and Portuguese. For other languages such translations are scarce and often outdated. For example, Spanish and Italian.

Surprisingly, you can find the translation of some major recommendations in Macedonian but not in Spanish. While more than 400 M speak Spanish and just 2 M Macedonian.

That is why for some languages there is no agreement on how to name specific classes of chemicals. Even for basic educational resources.

Nomenclature in Other Languages

It takes significant time to translate and publish. Chemists need to wait years to follow actual nomenclature.

English 1993

It means that nomenclature conventions in some languages can fall behind in decades!

Portuguese 2002Czech 2000

Algorithmic name generation can help here as well if naming in other languages is supported.

ACD/Name and Languages

The current version of ACD/Name supports naming in 10 languages.

Although there are some small language specific issues, the program correctly names most chemical substances, especially organic.

ACD/Name exceeds the nomenclature knowledge of most professional chemists and educators.

ACD/Labs multilingual naming was inspired by a contract with the European Commission Taxation and Customs Union aimed at developing procedures for the translation of chemical names from English to other European languages.

ACD/Name – Additional Languages

We are going to extend naming to more languages

ACD/Name supports 16 languages in our internal version, compared to 10 available in the official version and we continue to add even more.

The development is not so quick due to difficulties to find reliable sources of nomenclature rules for specific languages and a lot of basic translations to collect.We gladly accept any help in finding the sources of names in specific language.

This work is under way and new languages will appear in ACD/Name in future versions.

ACD/Labs Naming Tools

Program Features

ChemSketch Name Free Free, 50 atoms only, English only*

ACD/Name Chemist Version Names only, all supported languages

ACD/Name Full featured

ACD/Name Batch Treatment of large structure sets

ACD/I-Lab Web-based, English only *, Free also

ACD/Labs produces several name generation tools.

We expect to release the new version ACD/Name with support of new IUPAC rules by the end of 2014.Additional languages will appear soon.

* We are considering the addition of multilingual naming

Demand and Share for PINs

There are no doubts that preferred IUPAC name will come in widespread use. First of all for official documents and regulations.

We got the first request from our Japanese customers a month before the actual publication of new Blue book.

Obviously not every name is changed by the new rules but according to our estimations it affects about 25% of names; and chemists will want to know is a particular name is preferred or not.

Everyone can agree that chemical nomenclature is complex. It is now even more complex, including more procedures and criteria.

New principles of organic nomenclature should be learned and taught at least at a basic level to distinguish the correct name from incorrect versions.

New nomenclature rules significantly concern the automatic treatment of chemical information including algorithmic name generation. The chemistry and nomenclature has already became too complex to rely fully on human interpretation.

Lets control the quality of names

Humans cannot be trusted!Manual naming and typing tends to produce mistakes and misprints.

Currently the most important task is algorithmic control of chemical name quality. People should not be trusted to generate and type names. It should be done by software tools or controlled by such tools.

2-(4-(tert-Butyl)phenyl)-1,1,1-trifluoro-2-((trimethylsilyl)methyl)butan-2-ol

F3C

CH3

OH

t-Bu

SiMe3

CF3

OH

t-Bu

SiMe3

2-(4-tert-butylphenyl)-1,1,1-trifluoro-3-(trimethylsilyl)propan-2-ol

dx.doi.org/10.1021/jo402577nJ. Org. Chem. 2014, 79, 1145−1155

The name below was found in recent publication. It looks quite systematic but once you try to derive the structure you understand that it is wrong.

while intended

Thank you for interest in ACD/Labs nomenclature tools

and care about the quality of chemical information.