An Expert System for Chemical Structure Elucidation Sean Walker COMP 4200 November 13, 2007


An Expert System for Chemical Structure Elucidation

Sean Walker, COMP 4200

November 13, 2007

Introduction

• I will be discussing an expert system developed to determine the chemical structure of an unknown compound (structure elucidation)

• The expert system is implemented on a blackboard

Introduction: Motivation

• Structure elucidation is a fundamental component of organic chemistry

• Requires a wide range of expertise
  – Each elucidation technique has its own unique vocabulary that needs to be mastered

• An expert system can be used to simplify this process

Introduction: Outline

• Outline of presentation:
  1) Fundamentals of blackboard systems
  2) The expertise being modeled
     • General spectroscopic techniques
  3) Description of the expert system

Blackboard Systems

“Metaphorically, we can think of a set of workers, all looking at the same blackboard: each is able to read everything that is on it and to judge when he has something worthwhile to add to it.” – Newell, 1969

Blackboard Systems

• A set of experts independently modify solution elements on a central database to produce a complete solution

• The experts communicate solely through their contributions to the central database

• Three major components:
  – 1) a globally accessible database (the blackboard)
  – 2) a set of knowledge sources (the experts)
  – 3) a control mechanism (the scheduler)

Blackboard Systems: The Blackboard

• Blackboard is structured as an abstraction hierarchy

• Problems can be solved from different points by different knowledge sources

• Items on the blackboard are called entries

• Entries on the same level or on different levels of the hierarchy are linked

• Linked entries constitute a potential solution
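The entry and link ideas above can be sketched in a few lines of Python. This is only an illustration: the class names `Entry` and `Blackboard`, the integer levels, and the helper `linked_group` are assumptions made for the sketch and are not taken from the system being presented.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One item on the blackboard (field names are hypothetical)."""
    level: int     # position in the abstraction hierarchy
    content: str   # e.g. a fragment or a partial structure
    links: list["Entry"] = field(default_factory=list, repr=False, compare=False)

    def link(self, other: "Entry") -> None:
        """Link two entries in both directions (same or different levels)."""
        self.links.append(other)
        other.links.append(self)


class Blackboard:
    """Globally accessible store of entries, grouped by hierarchy level."""

    def __init__(self) -> None:
        self.levels: dict[int, list[Entry]] = {}

    def add(self, level: int, content: str) -> Entry:
        entry = Entry(level, content)
        self.levels.setdefault(level, []).append(entry)
        return entry


def linked_group(start: Entry) -> list[str]:
    """Collect the contents of every entry reachable through links,
    i.e. one potential solution in the sense used above."""
    seen, stack, found = set(), [start], []
    while stack:
        entry = stack.pop()
        if id(entry) in seen:
            continue
        seen.add(id(entry))
        found.append(entry.content)
        stack.extend(entry.links)
    return sorted(found)


bb = Blackboard()
methyl = bb.add(0, "CH3")
ethyl = bb.add(1, "CH3-CH2")
hydroxyl = bb.add(0, "OH")
methyl.link(ethyl)  # a low-level entry linked to a higher-level one
```

Here `linked_group(methyl)` gathers the two linked entries, while the unlinked `hydroxyl` entry forms a group of its own.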

Blackboard Systems: The Knowledge Sources

• Knowledge sources are structured as condition-action pairs
  – The condition component monitors the blackboard for any changes
  – The action component makes changes to the blackboard when the condition part is satisfied

• When the condition is satisfied, the knowledge source is “triggered” and the scheduler decides whether the knowledge source will execute its action
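A condition-action pair might look roughly like the following sketch. The blackboard is modelled as a plain `dict`, and the keys `ir_peaks` and `ir_fragments`, the 1700-1800 cm^-1 window, and all function names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

Board = dict  # assumption: the blackboard is modelled as a plain dict


@dataclass
class KnowledgeSource:
    name: str
    condition: Callable[[Board], bool]  # only inspects the blackboard
    action: Callable[[Board], None]     # modifies the blackboard when executed

    def is_triggered(self, board: Board) -> bool:
        return self.condition(board)


def ir_condition(board: Board) -> bool:
    # Satisfied once IR peaks are posted and no IR fragments exist yet
    return "ir_peaks" in board and "ir_fragments" not in board


def ir_action(board: Board) -> None:
    # Assumed window of 1700-1800 cm^-1 for a carbonyl peak (illustrative only)
    has_carbonyl = any(1700 <= peak <= 1800 for peak in board["ir_peaks"])
    board["ir_fragments"] = ["C=O"] if has_carbonyl else []


ir_expert = KnowledgeSource("infrared", ir_condition, ir_action)

board = {"ir_peaks": [1750]}
was_triggered = ir_expert.is_triggered(board)
if was_triggered:  # a real system would leave this decision to the scheduler
    ir_expert.action(board)
```

Once the action has run, the condition is no longer satisfied, so the same knowledge source is not triggered again by the same data.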

Blackboard Systems: The Scheduler

• One or more problem solving strategies are implemented

• The scheduler examines the current state of the blackboard and decides which triggered knowledge source to execute based on the problem solving strategy in place

• The scheduler can abandon a strategy and adopt a new one or ignore a strategy altogether in order to pursue the most promising solution
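One possible shape for such a control loop is sketched below. It assumes a dict-based blackboard, a `strategy` function that rates each triggered knowledge source, and three made-up knowledge sources; none of these names come from the system itself.

```python
from typing import Callable, NamedTuple

Board = dict  # assumption: the blackboard is modelled as a plain dict


class KS(NamedTuple):
    name: str
    condition: Callable[[Board], bool]
    action: Callable[[Board], None]


def run_scheduler(board: Board, sources: list[KS],
                  strategy: Callable[[KS, Board], float],
                  max_cycles: int = 20) -> list[str]:
    """Each cycle, execute the triggered knowledge source the strategy rates highest."""
    executed = []
    for _ in range(max_cycles):
        triggered = [ks for ks in sources if ks.condition(board)]
        if not triggered:
            break  # no knowledge source has anything left to contribute
        best = max(triggered, key=lambda ks: strategy(ks, board))
        best.action(board)
        executed.append(best.name)
    return executed


def post(key: str, value) -> Callable[[Board], None]:
    """Build an action that writes one value to the blackboard."""
    def action(board: Board) -> None:
        board[key] = value
    return action


# Hypothetical knowledge sources and priorities
sources = [
    KS("generate", lambda b: "formula" in b and "components" not in b,
       post("components", ["CH3", "CO", "OH"])),
    KS("infrared", lambda b: "ir_peaks" in b and "fragments" not in b,
       post("fragments", ["C=O"])),
    KS("assemble", lambda b: "components" in b and "fragments" in b and "structure" not in b,
       post("structure", "candidate")),
]
priority = {"generate": 1.0, "infrared": 2.0, "assemble": 3.0}

board = {"formula": "C7H12O4", "ir_peaks": [1750]}
order = run_scheduler(board, sources, lambda ks, b: priority[ks.name])
```

Swapping in a different `strategy` function changes which triggered knowledge source runs first without touching the knowledge sources themselves, which is the separation the scheduler is meant to provide.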

Structure Elucidation

• Modern structure elucidation is done using spectroscopy

• In absorption spectroscopy, a sample of the unknown is irradiated with light of a given frequency and the amount absorbed by the compound is measured

• The resulting data is analyzed by an expert and information about the structure of the unknown can be obtained

• The information collected from each spectrum is integrated to determine the complete structure

Spectroscopy: The Electromagnetic Spectrum

Infrared Spectroscopy

• Involves the absorption of light in the infrared region of the electromagnetic spectrum

• Used primarily to determine what functional groups are present in a molecule

[Figure: example chemical structures showing functional groups]

Infrared Spectroscopy

• The broad peak at around 3000 cm⁻¹ indicates the presence of a hydroxyl group (OH)

• The strong, sharp peak at around 1750 cm⁻¹ indicates the presence of a carbonyl group
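The two readings above amount to a lookup from peak position and shape to a functional group. The sketch below encodes just those two rules; the wavenumber windows are assumptions chosen only to bracket the peaks mentioned and should not be read as reference values.

```python
# Assumed wavenumber windows (cm^-1); illustrative, not reference data
IR_RULES = [
    ((2500, 3600), "broad", "hydroxyl (OH)"),
    ((1650, 1800), "sharp", "carbonyl (C=O)"),
]


def functional_groups(peaks):
    """peaks: list of (wavenumber, shape) tuples; returns the matched groups."""
    found = []
    for wavenumber, shape in peaks:
        for (low, high), rule_shape, group in IR_RULES:
            in_window = low <= wavenumber <= high
            if in_window and shape == rule_shape and group not in found:
                found.append(group)
    return found


groups = functional_groups([(3000, "broad"), (1750, "sharp")])
```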

UV Spectroscopy

• Involves the absorption of light in the ultraviolet region of the electromagnetic spectrum

• Used to determine the level of conjugation in the unknown
  – Conjugation refers to alternating single and double bonds

• UV spectroscopy is not very useful in structure elucidation


Proton NMR

• Contains information about the hydrogens in the molecule

• Three key aspects:
  1) chemical shift – the “type” of hydrogen
  2) integration – ratio of different types of hydrogens
  3) splitting – nearest neighbour relationship

• Can be used to identify the presence of certain functional groups

• Used primarily to determine how the different functional groups present fit together (the connectivity)

Proton NMR

• The peak at around 10 ppm indicates the presence of an aldehyde

• The peak at 2.6 ppm is split into 4 peaks (a quartet), indicating that these hydrogens are adjacent to a carbon with 3 hydrogens
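The quartet reading follows the first-order n+1 rule: a signal with n equivalent neighbouring hydrogens is split into n+1 peaks. A minimal sketch of that arithmetic, with function names chosen here for illustration:

```python
MULTIPLET_NAMES = {1: "singlet", 2: "doublet", 3: "triplet", 4: "quartet"}


def neighbouring_hydrogens(peak_count: int) -> int:
    """Invert the n+1 rule: a signal split into n+1 peaks has n neighbours."""
    if peak_count < 1:
        raise ValueError("a signal has at least one peak")
    return peak_count - 1


def multiplicity(neighbours: int) -> str:
    """Apply the n+1 rule and name the resulting pattern."""
    peaks = neighbours + 1
    return MULTIPLET_NAMES.get(peaks, f"{peaks}-line multiplet")


quartet_neighbours = neighbouring_hydrogens(4)
```

The rule is a first-order approximation, so a real expert would treat the result as a hint rather than a certainty.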

[Figure: structure of the example compound (labels O, H, CH3)]

Carbon-13 NMR

• Contains information about the carbons in the molecule

• Three key aspects:
  1) chemical shift – the “type” of carbon
  2) splitting – the number of hydrogens bonded to each carbon
  3) number of unique carbons present

• Used to determine connectivity

Carbon-13 NMR

• Peak at 190 ppm indicates the presence of a carbonyl (C=O)

• There are 7 total peaks, indicating that there are only 7 unique carbons in the molecule

Mass Spectroscopy

• Mass spectroscopy is used to determine the molecular formula of the unknown compound

• Structural information derived from mass spectroscopy data tends to be unreliable, so it is used only to verify a possible structure or when the other spectral techniques are unsuccessful

Structure Elucidation: Applicability of a Blackboard Architecture

• Each type of spectroscopy is unique

• A human expert will often analyze a set of spectra as a whole, selectively determining which spectral information to utilize at a given time

• The blackboard architecture is ideal for this approach

• The blackboard architecture also allows for new experts to be added (new spectroscopic techniques)

The Expert System: The Blackboard

• An expert system implemented on a distributed blackboard has been developed to determine the structure of a chemical compound

• A sequential implementation of a blackboard would allow only one expert to access the blackboard at a time

• In a distributed system experts can access different sections of the blackboard at the same time

The Expert System: The Blackboard

• The hierarchy of the blackboard is based on the complexity of the structures being produced
  – Low-level, basic structures occupy a certain level of the blackboard while more complicated structures occupy a different level

The Expert System: The Experts

• There are two main types of experts:
  1) Structure generation routines
  2) Spectroscopy experts

Structure Generation Routines: Storing Structures

• Ideally every possible chemical structure could be stored, but this is not feasible
  – Even a simple formula such as C23H48 has 5,731,580 structural isomers

• Instead, a set of substructures (components) is stored such that any possible structure can be formed from a combination of these components

• There are 630 total components

• Components are classified as primary, secondary or tertiary components

Structure Generation Routines: Types of Components

• 1) Primary Components:
  – Primary components are the most basic components for constructing organic molecules (CH3, CH2, CH, C, CO, OH, O, NH2, NH, N, SH, S, F, Cl, Br, I)

• 2) Secondary Components:
  – Secondary components are combinations of primary components
  – There are 86 secondary components

• 3) Tertiary Components:
  – Tertiary components are secondary components with a restriction on what the component can bond to

Structure Generation Routines

• The structure generation routines produce sets of primary, secondary or tertiary components based on input data

• The sets can be further pruned using spectral information
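As a rough illustration of what producing sets of primary components from input data could involve, the sketch below enumerates every multiset of a few primary components whose atoms add up to a given C/H/O formula. It is an assumption-laden simplification: only seven of the primary components are included, and bonding (valence) constraints are ignored, so it generates more sets than a real structure generation routine would.

```python
# Atom counts (C, H, O) for a subset of the primary components
PRIMARY = {
    "CH3": (1, 3, 0), "CH2": (1, 2, 0), "CH": (1, 1, 0), "C": (1, 0, 0),
    "CO": (1, 0, 1), "OH": (0, 1, 1), "O": (0, 0, 1),
}


def component_sets(carbons: int, hydrogens: int, oxygens: int) -> list[dict]:
    """Return every multiset of primary components matching the formula."""
    names = list(PRIMARY)
    results = []

    def extend(index, remaining, chosen):
        if index == len(names):
            if remaining == (0, 0, 0):
                results.append(dict(chosen))
            return
        name = names[index]
        atoms = PRIMARY[name]
        count = 0
        while True:
            left = tuple(r - count * a for r, a in zip(remaining, atoms))
            if min(left) < 0:
                break  # using this many copies would exceed the formula
            if count:
                chosen[name] = count
            extend(index + 1, left, chosen)
            chosen.pop(name, None)
            count += 1

    extend(0, (carbons, hydrogens, oxygens), {})
    return results


candidate_sets = component_sets(7, 12, 4)  # formula C7H12O4
```

Pruning with spectral information would then correspond to filtering `candidate_sets`, for example keeping only the sets that contain a component an expert has reported.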

Spectroscopy Experts

• There is an expert for each type of spectroscopy:
  1) Infrared Expert
  2) Ultraviolet Expert
  3) Proton NMR Expert
  4) Carbon-13 NMR Expert
  5) Mass Spectroscopy Expert


Spectroscopy Experts

• The data contained in a spectrum may be unreliable or ambiguous
  – e.g. in a proton NMR spectrum, if the chemical shift difference between two hydrogens is < 1 ppm, the splitting observed may be inaccurate

• Heuristic rules are used to handle this ambiguity

• Uncertainty factors are attached to each conclusion drawn from the spectra

Spectroscopy Experts

• Each spectral expert translates the data contained in the spectra into molecular fragments

• These fragments are placed in an “active list” which is used to direct and restrict the structure generation routines

• If fragments from different experts conflict, the fragment with the highest certainty factor is used

• The rejected fragment is placed in an “inactive list”, which is consulted if a correct structure is not found using the active list
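The active/inactive bookkeeping can be sketched as follows. How the real system detects that two fragments conflict is not described here, so the sketch assumes a hypothetical `site` label: fragments sharing a site are treated as conflicting. The fragment names and certainty values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    expert: str
    certainty: float
    site: str  # hypothetical: fragments with the same site conflict


def resolve(fragments):
    """Split fragments into an active list and an inactive list."""
    active, inactive = {}, []
    for frag in fragments:
        current = active.get(frag.site)
        if current is None:
            active[frag.site] = frag
        elif frag.certainty > current.certainty:
            inactive.append(current)   # displaced by a more certain fragment
            active[frag.site] = frag
        else:
            inactive.append(frag)      # kept as a fallback
    return list(active.values()), inactive


fragments = [
    Fragment("CHO", "proton_nmr", 0.9, "carbonyl"),
    Fragment("COOH", "infrared", 0.6, "carbonyl"),
    Fragment("CH3", "carbon13_nmr", 0.8, "methyl"),
]
active_list, inactive_list = resolve(fragments)
```

If structure generation fails with `active_list`, a fallback pass could swap in fragments from `inactive_list` and try again.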

Spectroscopy Experts

• The spectroscopy experts are also used to test generated structures for consistency with the spectral information

• The system is able to identify when there is not enough information to verify a possible structure

An Example…

• Formula of unknown: C7H12O4

• 93 possible sets of primary components are produced

• Using these primary sets, 497 sets of secondary components are possible
  – The number of sets of secondary components can be decreased if the primary component sets are pruned using spectral data


An Example…

• After pruning the sets of primary components, only one possible set remains:
  – The set contains 2 CH3, 2 C=O, 2 OH, 1 C and 2 CH2
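As a quick consistency check, the atoms in the remaining component set can be totalled and compared with the formula of the unknown. The atom table and function name below are written for this sketch only.

```python
from collections import Counter

# Atoms contributed by each component in the remaining set
ATOMS = {
    "CH3": {"C": 1, "H": 3},
    "CH2": {"C": 1, "H": 2},
    "C": {"C": 1},
    "C=O": {"C": 1, "O": 1},
    "OH": {"O": 1, "H": 1},
}


def total_atoms(component_set: dict) -> dict:
    """Sum the atoms over all components, weighted by how often each occurs."""
    total = Counter()
    for name, count in component_set.items():
        for element, n in ATOMS[name].items():
            total[element] += n * count
    return dict(total)


remaining_set = {"CH3": 2, "C=O": 2, "OH": 2, "C": 1, "CH2": 2}
totals = total_atoms(remaining_set)
```

The totals come to 7 carbons (2 + 2 + 1 + 2), 12 hydrogens (6 + 2 + 4) and 4 oxygens (2 + 2), matching C7H12O4.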

[Figure: candidate structures built from the remaining component set]


Conclusion

• Determining the chemical structure of an unknown is an important part of organic chemistry

• Expert system technology can be applied to this domain

• A blackboard architecture is especially well suited to this task

References

1) Craig, I. D., Blackboard Systems, Artificial Intelligence Review, 1988, 2, 103-118.

2) Funatsu, K., Susuta, Y., Sasaki, S., Introduction of Two-Dimensional NMR Spectral Information to an Automated Structure Elucidation System, CHEMICS. Utilization of 2D-Inadequate Information, J. Chem. Inf. Comput. Sci., 1989, 29, 6-11.

3) Sobczak, Ronald S., Matthews, Manton M., An Expert System for Chemical Structure Elucidation Implemented on a Blackboard, Proceedings of the 3rd International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 1990, 91-98.

4) Sobczak, Ronald S., Matthews, Manton M., A Massively Parallel Expert System Architecture for Chemical Structure Analysis, Distributed Memory Computing Conference, 1990, 11-17.

5) Sasaki, S., Kudo, Y., Structure Elucidation System Using Structural Information from Multisources: CHEMICS, J. Chem. Inf. Comput. Sci., 1985, 25, 252-257.