A knowledge-based system for real-time quality control and fault diagnosis of multitest analyzers

Computer Methods and Programs in Biomedicine, 34 (1991) 175-190 © 1991 Elsevier Science Publishers B.V. 0169-2607/91/$03.50

COMMET 01141


Torgny Groth and Henrik Modén

Unit for Biomedical Systems Analysis, Uppsala University, S-750 02 Uppsala, Sweden

A PC-based real-time quality control (QC) system for multitest analyzers has been developed as a prototype. The system is built on a relational database management system (DBMS), which stores and administers the control values from the various analytical channels. The control values collected during various stages of control are filtered through statistical control procedures, and the control status of the instrument is continuously presented in color-coded fields indicating the possible presence of critically sized systematic or random analytical errors. The knowledge about rational trouble-shooting of a specific instrument is represented in a network structure and stored in relational tables of the DBMS. An inference engine performs alternating backward and forward reasoning in the network and guides the operator in trouble-shooting.

Knowledge-based system; Relational model; Real-time quality control; Trouble-shooting; Fault diagnosis; Multitest analyzer

1. Introduction

Quality control (QC) and quality assurance in the clinical chemistry laboratory are suitable applications of computer technology [1,2]. This is especially true for multichannel and multitest instruments considering the large volume of control data and the tedious manual work of plotting conventional control charts. Quality control simulators could be used to design and optimize the performance of analytical procedures with regard to productivity ('test yield') and quality as expressed in terms of 'allowable analytical errors' and 'allowable defect rates' specified from a medical point of view [3-5].

Correspondence: Torgny Groth, Unit for Biomedical Systems Analysis, Uppsala University, Box 2103, S-750 02 Uppsala, Sweden.

The multi-rule Shewhart procedure, published and established as a 'selected method' in clinical chemistry [6], was designed to be applicable also in small laboratories without computer support. More complex trend analysis techniques could also be applied for continuous monitoring of critical changes in accuracy and precision. Manual implementation of such optimized control systems, with capabilities to utilize multi-rule and multi-stage procedures, to perform statistical analysis and graphical presentation, and to document control activities, would be difficult. Microcomputers may be very useful, providing color-coded displays to facilitate the interpretation and communication of actual control status, and database management software for record keeping. The potential role of knowledge-based techniques is also great in these applications.

The goal should be to provide real-time quality control at the bench level and self-diagnosing analytical instruments giving advice on which measures to take in case of malfunctioning. From a cost-benefit point of view this is probably one of the best applications of knowledge-based system (KBS) methodology in the clinical laboratory.

In the present KUSIN-MED project we have built a prototypical 'real-time QC system with an expert trouble-shooter', with the aim to develop and apply appropriate KBS methodology for this type of problem. Our 'connectionist model' of knowledge representation, and its implementation in a relational database management system to provide a means for combined numeric and symbolic reasoning, is an approach of general applicability.

Fig. 1. Outline of knowledge-based system for real-time quality control and trouble-shooting of multitest analyzer. (Blocks shown: Instrument; Monte Carlo Simulator; User and Instrument Interface Management System; IQA Design Program; Instrument Description Editor and Knowledge-base Editor; Relational Database Management System; Statistical Control Procedure; Rule Interpreter & Inference Engine; IQC-Database; Knowledge-base.)

2. System description

The system as outlined in Fig. 1 is built with use of a relational database management system (MIMER from Mimer Software AB, Uppsala, Sweden [7]). The DBMS is used to manage both a QC database and a knowledge-base on rational trouble-shooting of a specific analytical instrument.

The input to the system is partly generated by the instrument and partly entered by the operator (Fig. 2). Data from the instrument consist of control values from the various analytical channels and hardware alarms. The operator initially specifies the technical set-up of analytical channels and the statistical QC procedures to be applied. During routine operation the operator is expected to answer questions from the trouble-shooter, and to log restart conditions and maintenance.

The output from the system (Fig. 2) consists of information on the current control status of the various analytical channels of the instrument, and alarms related to 'hardware errors' and critically sized changes in analytical imprecision and inaccuracy ('channel errors'). Furthermore, the system asks questions concerning possible causes for malfunctioning, and provides information on the proper restart of analytical channels after correction of errors.

Fig. 2. Input and output diagrams for the system.

Configuration data
- Technical setup of analytical channels
- Structure of QC procedure: QC phases, QC rules, number of controls

Input data
- Control data: channel ID, control value
- Hardware alarms, e.g. CP computer transmission error, real-time clock not running, main memory error
- Operator response: answers to questions from trouble-shooter, logging of restart conditions, logging of maintenance

Output information
- Instrument status descriptors: control status (channel, control phase, control No. within the run), status code (accepted, warning, rejected), alarm (hardware error, channel error)
- Questions and related information
- Restart information

The system was developed with use of a Monte Carlo simulator of the analytical instrument. This feature is also very useful for training of operators.

Various editors are required (Instrument Description Editor, Statistical Control Editor, and Knowledge-base Editor), of which the last is of special importance as it provides a means to document trouble-shooting procedures under development in the laboratory. A QC design program is another module of the system.

The various parts of the system are managed by the User and Instrument Interface Management System written within the Microsoft Windows system.

2.1. Quality control database

The database is time-related and divided into three main tables (the asterisks denote key fields):

Table: Individual control values
* Test code defining analytical channel (system and component)
* Date of analysis
* Control material
* Current number of analytical run
* Control sample number within the run
  Control phase
  Analytical method
  Control value
  Control status code

Table: Summary of control values from individual runs
* Test code
* Date of analysis
* Control material
* Current number of run
  Analytical method
  Control status of the run
  Number of controls
  Mean value of control samples
  Range of control values
  Within-run standard deviation
  Coefficient of variation

Table: Long-term statistical summary of control values from several runs
* Test code
* Control material
* Date when summary was terminated
* Date when summary was started
  Analytical method
  Number of accepted runs in summary
  Total number of controls
  Average number of controls per run
  Weighted mean value of all accepted control values
  Within-run imprecision
  Between-run imprecision
  Total imprecision
  Skewness
  Kurtosis
  Frequency of rejected runs
  Frequency of rejected runs with recorded faults

Fig. 3. Nested stages of analytical quality control (from [8], with permission). (Stages shown along the sequence from start-up of the analytical procedure through routine analysis to reporting of test results: initial check; monitoring during a run; assessment of single run; long-term assessment of several runs.)

The control material is described in:

Table: Control material
* Test code
* Control material
  Conventional true value
  Reference method
  Mean value (stable analytical performance)
  Within-run imprecision (stable performance)
  Between-run imprecision (stable performance)
  Total imprecision (stable performance)

Most of the basic data in the QC database are automatically provided by the analytical instrument during routine operation; some data are then derived from these by statistical calculations. The current control status of the analytical channels is recorded by error codes that distinguish between systematic and random errors. There is also a table for logging of actions taken by the operator or generated by processes in the system during routine operation, and for documentation of method changes and maintenance. Logging texts are time-stamped with date and time of insertion.
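As an illustration of how the derived fields of the run summary table could be computed from the individual control values, the following minimal Python sketch may be helpful. The function name `summarize_run`, the dictionary keys and the example values are assumptions made for illustration only; the original system performed these calculations within the MIMER-based application.

```python
from statistics import mean, stdev


def summarize_run(control_values):
    """Derive per-run summary fields from the control values of one run."""
    n = len(control_values)
    run_mean = mean(control_values)
    value_range = max(control_values) - min(control_values)
    # Sample standard deviation (n - 1 denominator); needs at least two controls.
    within_run_sd = stdev(control_values) if n > 1 else 0.0
    # Coefficient of variation in percent of the run mean.
    cv_percent = 100.0 * within_run_sd / run_mean if run_mean else float("nan")
    return {
        "number_of_controls": n,
        "mean": run_mean,
        "range": value_range,
        "within_run_sd": within_run_sd,
        "cv_percent": cv_percent,
    }


# Hypothetical control values for one channel and one control material.
summary = summarize_run([4.9, 5.1, 5.0, 5.2, 4.8])
```

The long-term summary table would aggregate such per-run records over the accepted runs.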

2.2. Statistical control procedures

A strategy of 'nested stages of control' was employed (Fig. 3 and [8]) such that control information is obtained simultaneously from several procedures operating over different time periods, e.g. (1) an initial check with several control specimens analyzed during a short time to establish that the measurement procedure is operating properly prior to analysis of patient specimens; (2) monitoring during an analytical run with control specimens analyzed periodically to provide alerts on up-coming analytical disturbances of critical size; (3) assessment of a single analytical run in order to assure that the desired level of quality is achieved before reporting of the results; (4) assessment of two or more runs for monitoring of subcritical analytical disturbances; and (5) assessment of many runs providing for estimates of analytical quality that are representative of long-term performance. Ideally the control information from the different stages should be available simultaneously to aid the interpretation and decision making.

Stages 1-3 will be primarily used by the bench level analysts, and reject signals should require investigation of analytical problems and rejection of patient data. Stages 4-5 will be used by supervisors and directors, and control signals will trigger preventative maintenance procedures.

In the 'Statistical Control Editor' statistical rules may be selected from a menu of generic rules for detecting systematic errors, e.g.:
- mean rules, significance level = X
- Shewhart N(X*S) rules
- mean rules N(mean)
- decision limit cusum rules
- Trigg's tracking signal rule
and another menu with rules for detecting random errors, e.g.:
- Shewhart 1(X*S) rules
- range rules, sign. level = X
- chi-square rules, sign. level = X

The number of controls, N, in the different nested stages is also specified in this editor. For further details on statistical matters see [8,9]. A special QC design program is available to assist in the selection of QC procedures which could detect critically sized analytical disturbances [5]. See Section 2.3.

Our preference in this application has been to use a combination of mean and range rules in stages 1-3 (for initial check, monitoring during the run, and for assessment of the run before reporting).
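To make the idea of generic rules concrete, the sketch below shows one possible form of a mean rule (sensitive to systematic error) and of a single-value Shewhart-type limit (sensitive to random error). It is only a sketch under stated assumptions: the two-sided normal test, the default significance level, the limit factor `k` and the function names are illustrative choices, and the precise rule definitions used in the system are those given in [8,9].

```python
from math import sqrt
from statistics import NormalDist, mean


def mean_rule(values, target, sd, alpha=0.01):
    """Signal a systematic shift if the mean of N controls deviates
    significantly from the target (two-sided test, assumed normality)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z = (mean(values) - target) / (sd / sqrt(len(values)))
    return abs(z) > z_crit


def shewhart_single_rule(values, target, sd, k=3.0):
    """Signal if any single control lies more than k*sd from the target."""
    return any(abs(v - target) > k * sd for v in values)


# Hypothetical channel with stable mean 5.0 and stable SD 0.1.
shifted = [5.15, 5.20, 5.10, 5.18]
stable = [5.02, 4.97, 5.05, 4.99]
shift_detected = mean_rule(shifted, target=5.0, sd=0.1)
outlier_detected = shewhart_single_rule(shifted, target=5.0, sd=0.1)
```

In this example the mean rule reacts to the small but consistent shift, whereas no single value exceeds the 3 SD limit, which illustrates why rules for systematic and random error are selected separately.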

A useful modification could be to include more than one control material (control level), e.g. on a high, medium and low concentration level in cases when it could be expected that analytical disturbances may appear in different concentration ranges. The evaluation of control values could then be performed within and across materials. In the latter case a normalization of control values to the mean and standard deviation of each control level is required:

$$U_{i,j} = \frac{C_{i,j} - \bar{C}_i}{SD_i}$$

where $i$ = control level, e.g. high, medium, low; $j$ = number of control sample on level $i$; $C_{i,j}$ = measured concentration value; $\bar{C}_i$ = assigned concentration value of control material; and $SD_i$ = standard deviation of control level $i$. This transformation allows for evaluation of pooled U-values. For further details of this technique see [6].

Fig. 4. Structure of knowledge-base for parts related to 'channel error'. Example elements shown in the network:
189  Channel error
 11  The error occurs in one single channel
311  Subroutine call 'Flame photometer channel'?
306  Increased imprecision in a channel with a flame photometer
283  Subroutine call to find out if 'increased imprecision'?
522  The sample volume coil is not filled or only partly filled
473  Low sampling pressure to the multiple plate
111  Unclean sample pick-up head

For long-term assessment of two or several runs a mean rule is applied to the mean values for individual runs. It should be executed in situations when analytical disturbances may be suspected, e.g. when warning signals appear or have appeared during monitoring, or when a reject signal in assessment of a single run is difficult to explain. To facilitate comparison of various channels a standard unit transformation of the mean values could be used also here [9].
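The normalization of control values to the assigned value and standard deviation of each control level, which allows values from several levels to be pooled, can be sketched in a few lines of Python. The level names, assigned values and standard deviations below are hypothetical and serve only to show the arithmetic.

```python
def normalize(value, assigned_value, sd):
    """Express a control value in standard-deviation units of its own level."""
    return (value - assigned_value) / sd


# Hypothetical control levels: (assigned value, stable standard deviation).
levels = {"low": (2.0, 0.1), "medium": (5.0, 0.2), "high": (10.0, 0.4)}

# Control values measured on different levels within one run.
controls = [("low", 2.1), ("medium", 4.8), ("high", 10.4)]

# After normalization the values share one scale and may be evaluated together.
pooled = [normalize(value, *levels[level]) for level, value in controls]
```

Each pooled value is here close to +1 or -1 standard deviation, so a single set of control limits can be applied across the three materials.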

2.3. QC Design program

An optimal design of a statistical control procedure (see [8]) has to consider the characteristic parameters of the measurement procedure during stable analytical conditions, as well as some conception of the type, frequency and size of analytical disturbances which are expected to appear. Limits for the 'allowable analytical error' then determine critically sized systematic shifts from baseline performance and increases of the inherent random error, which have to be detected in order to assure the specified analytical quality. The consideration of all these factors and aspects in the design is greatly facilitated with use of the QC Design program. This program is described in detail elsewhere [3-5].

2.4. Knowledge-based trouble-shooter and advisor

This module was specifically built to provide guidance in trouble-shooting when malfunctioning of the instrument has been indicated by alarm signals. However, it was designed as a general software shell for combined numeric and symbolic reasoning within a relational database management system.


2.4.1. Knowledge-base

Knowledge is represented in a network structure and is implemented in relational tables of the DBMS.

The network is made up of 'elements' which could be single facts, subroutine calls, or instructions and questions to the operator.

The subroutine calls may result in simple table look-ups or complex searches of the database, or execution of other 'knowledge sources' performing more or less complex numeric and/or symbolic reasoning processes.

A 'node' is defined as a relation between elements and other nodes (cf. Fig. 4).

The elements are described in natural language stored in:

Table ELEMENT
1  TEXT   *C 64             Short text describing individual element
2  ELNO    I 4   *Indexed*  Element number

The relations describing the nodes and their 'transfer functions' are stored in:

Table RELATION
1  NDNO   *I 4    Node number
2  SEQ    *I 2    Sequence number
3  ARR     C 64   Element array

where the 'element array' defines the elements and the logical connections/relations between these elements.

These connections or special functions of the nodes are given in:

Table SPFUTX
1  SFNUM   *I 4             Special function number
2  SFTEXT   C 8  *Indexed*  Special function text, e.g. NOT, AND, OR, XOR, NAND, NOR, XNOR

Elements made up of instructions, queries and related information to the operator, are stored in a separate table:

Table OPERATTX
1  ELNO   *I 4    Element number
2  SEQ    *I 1    Sequence number
3  TEXT    C 80   Text to operator


There are also a number of help tables, e.g.:

Table HISTORY with date and historical information of all elements
1  ELNO    *I 4    Element number
2  DATE     C 6    Date of insertion or change
3  SOURCE   C 80   Historical information

and a table with initialization values for all elements in a node which are set before the processing of that node:

Table INITNODE
1  NDNO    *I 4   Node number
2  ELNO    *I 4   Element number
3  VALUE    I 2   Initialization value, e.g. TRUE, FALSE, UNKNOWN
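The storage of the network in relational tables can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for the MIMER DBMS of the original system; the SQL column types, the choice of keys, the use of SEQ to order continuation rows of a long element array, and the helper functions `node_expression` and `operator_text` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element  (text TEXT PRIMARY KEY, elno INTEGER);
CREATE INDEX element_elno ON element (elno);
CREATE TABLE relation (ndno INTEGER, seq INTEGER, arr TEXT,
                       PRIMARY KEY (ndno, seq));
CREATE TABLE operattx (elno INTEGER, seq INTEGER, text TEXT,
                       PRIMARY KEY (elno, seq));
""")

# Node 12 and its element array, as quoted in the application example.
conn.execute("INSERT INTO element VALUES (?, ?)",
             ("errors_occur_in_channels_supplied_by_one_multiple_plate", 12))
conn.execute("INSERT INTO relation VALUES (?, ?, ?)",
             (12, 1, "#187 AND (#151 OR #152 OR #1)"))
# Operator texts for element 190, one row per line of text.
conn.executemany("INSERT INTO operattx VALUES (?, ?, ?)",
                 [(190, 1, "Do the stirrers fail to work properly?"),
                  (190, 2, "NOTE: 5mm amplitude")])


def node_expression(ndno):
    """Return the element array of a node, joined in sequence order."""
    rows = conn.execute(
        "SELECT arr FROM relation WHERE ndno = ? ORDER BY seq", (ndno,))
    return " ".join(row[0] for row in rows)


def operator_text(elno):
    """Return the lines of text shown to the operator for an element."""
    rows = conn.execute(
        "SELECT text FROM operattx WHERE elno = ? ORDER BY seq", (elno,))
    return [row[0] for row in rows]
```

Keeping the node expressions as rows in a table means that the knowledge-base can be queried and edited with the same tools as the QC data.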

2.4.2. Inference engine

Inferencing takes place by alternating backward and forward chaining depending on the actual logical structure of the knowledge base, and the mode of initialization of elements and nodes.

The inference engine always starts in the backward mode working top-down from indications of possible top-level conclusions, spreading nodal activations down to deep-level elements that are required to confirm a line of reasoning. In this first pass through the network the strategy is to exclude various possible alternatives, and continue the search via routes that cannot be excluded for the time being. Once the criteria are fulfilled to make a conclusion, the downward (backward) reasoning is stopped at that level in the network and upward (forward) confirmation is fired of higher level nodes along the successful route of reasoning. Inferencing is continued in this alternating backward-forward chaining mode until a major conclusion has been reached. For evaluation of a node (on whatever level of the network) the inference process proceeds step-wise in order to make sure that higher level reasoning takes place prior to the evaluation of the deeper levels:

Step 1: Check whether the value of the actual node is known ('true' or 'false') or 'unknown'.

Step 2: If 'unknown' calculate the value of the node from the corresponding expression, but without making subroutine calls.

Step 3: If step 2 still gives the value 'unknown', repeat it and make subroutine calls if any.

Step 4: Search and evaluate the whole network related to the node and make the necessary calls to subroutines.

Step 5: Repeat step 4 and put complementary questions to the operator, provided that step 4 evaluation has been completed for the node where inferencing was started off in the first place.

This strategy was chosen to avoid irrelevant questions to the operator, and to make the search as efficient as possible. The ordering of elements within a node should also be made in decreasing order of probability of occurrence, to speed up reasoning.
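The evaluation of node expressions with the three values 'true', 'false' and 'unknown' can be sketched as follows. This is a simplified illustration, not the engine itself: it assumes Kleene-style three-valued logic (which the text does not state explicitly), represents expressions as nested tuples instead of stored element arrays, and only distinguishes a pass without subroutine calls from a pass with them. The node definitions are those of the 'channel error' example in the application section; all function and variable names are hypothetical.

```python
UNKNOWN = None  # third truth value, alongside True and False


def k_not(value):
    return UNKNOWN if value is UNKNOWN else (not value)


def k_and(values):
    if any(v is False for v in values):
        return False
    return UNKNOWN if any(v is UNKNOWN for v in values) else True


def k_or(values):
    if any(v is True for v in values):
        return True
    return UNKNOWN if any(v is UNKNOWN for v in values) else False


def evaluate(expr, facts, nodes, subroutines=None, call_subroutines=False):
    """Evaluate an element number or a nested (operator, operand, ...) tuple."""
    if isinstance(expr, int):
        if expr in facts:                      # value already known
            return facts[expr]
        if call_subroutines and subroutines and expr in subroutines:
            facts[expr] = subroutines[expr]()  # remember the answer
            return facts[expr]
        if expr in nodes:                      # descend into the sub-network
            return evaluate(nodes[expr], facts, nodes,
                            subroutines, call_subroutines)
        return UNKNOWN
    operator, *operands = expr
    values = [evaluate(o, facts, nodes, subroutines, call_subroutines)
              for o in operands]
    if operator == "NOT":
        return k_not(values[0])
    if operator == "AND":
        return k_and(values)
    if operator == "OR":
        return k_or(values)
    raise ValueError(f"unsupported operator: {operator}")


nodes = {
    189: ("OR", 14, 13, 12, 11, 1),
    14: ("AND", 185, ("OR", 54, 55, 1)),
    13: ("AND", 186, ("OR", 47, 1)),
    12: ("AND", 187, ("OR", 151, 152, 1)),
}
# Pattern checks on reject signals; here all three are assumed to answer 'false'.
subroutines = {185: lambda: False, 186: lambda: False, 187: lambda: False}

facts = {}
without_calls = evaluate(189, facts, nodes, subroutines)
with_calls = evaluate(189, facts, nodes, subroutines, call_subroutines=True)
excluded = [n for n in (14, 13, 12) if evaluate(n, facts, nodes) is False]
```

In this sketch node 189 remains 'unknown' after both passes, but nodes 14, 13 and 12 are excluded once the subroutine calls have answered 'false', leaving node 11 as the route to pursue.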

The inference engine can be set in either a 'multiple-fault' or 'single-fault' mode of operation. In the latter case the search is stopped when the first significant fault has been detected.

2.4.3. Application

A prototype system was built for the Clinicon PRISMA multichannel analysis system (New Clinicon AB, Bromma, Sweden). It is an instrument which can perform selective analysis of up to 300 samples per hour. It is configured with eight analytical 'channels' grouped into one 'processing track', and with two tracks making up one 'processing unit'. Each track is served by a 'multiple plate'. There may be a maximum of four processing units providing a total of 64 channels.

The PRISMA Error Checklist, assembled and compiled by the Prisma Users Group in Europe, was represented in a network partly illustrated in Fig. 4 for the case of a so-called 'channel error'.

The number of elements in the whole network is 524, composed of 123 nodes, 17 subroutine calls and 388 questions to the operator. The number of 'hardware alarms' is 93. Sample listings are given in Figs. 5-7.

Node 189, 'channel error', is defined as a relation between nodes Nos. 11, 12, 13 and 14. Listed in natural language it reads:

Node 189: Channel_error
    errors_occur_on_all_channels_on_all_units
    OR malfunction_of_all_channels_on_one_unit
    OR errors_occur_in_channels_supplied_by_one_multiple_plate
    OR the_error_occurs_in_one_single_channel
    OR unknown_occurrence

and in short form it is represented as:

    #14 OR #13 OR #12 OR #11 OR #1

146. flooding_of_processing_house
     #97 AND ( #98 OR #1 )
147. the_reaction_temperature in the processing_track is not_correct
     #102 AND ( #103 OR #1 )
149. not_enough_sample_left in the sample_cups
     #113 AND ( #114 OR #115 OR #1 )
150. there is always_air to the_non-return valve(s)
     #137 AND ( #138 OR #139 OR #1 )
151. increased_imprecision in more_channels on same_multiple_plate
     #283 AND ( #92 OR #93 OR #94 OR #95 OR #96 OR #146 OR #99 OR #27 OR #205 OR #1 )
152. increased_inaccuracy_in more_channels on same_multiple_plate
     #285 AND ( #101 OR #147 OR #99 OR #144 OR #205 OR #1 )
153. the_sample_volume_coils are not filled or only_partly_filled
     #473 OR #149 OR #116 OR #117 OR #118 OR #119
154. sample and reagent_volume_coils do_not_empty_during_one_cycle
     #120 OR #121
155. the_sample_cup_is_not_elevated
     #122 OR #123 OR #124 OR #125 OR #126

Fig. 5. Sample listing of nodes and their connections with other nodes and elements.

The inferencing proceeds as follows:

Since all nodes and corresponding elements are initialized as 'unknown' the evaluation of node 189 will give the value 'unknown' in step 1 and step 2, and also in step 3 as there are no subroutine calls on this level. In step 4 the related nodes #14, #13, #12 and #11 are evaluated in more detail:

Node 14: Errors_occur_on_all_channels_on_all_units
    #185 AND (#54 OR #55 OR #1)
Node 13: Malfunction_of_all_channels_on_one_unit
    #186 AND (#47 OR #1)
Node 12: Errors_occur_in_channels_supplied_by_one_multiple_plate
    #187 AND (#151 OR #152 OR #1)

Elements #185, #186 and #187 are subroutine calls checking the pattern of reject signals on channels, tracks and units. These elements will all have the value 'false' in this case and node 11 is left as the only possible explanation of 'channel error' on this level of the network.

The reasoning therefore proceeds with:

Node 11: The_error_occurs_in_one_single_channel
    ( flame_photometer_channel
      AND ( increased_imprecision_in_one_channel_with_a_flame_photometer
            OR increased_inaccuracy_in_one_channel_with_flame_photometer
            OR unknown_occurrence ) )
    OR ( NOT flame_photometer_channel
      AND ( increased_imprecision_in_one_channel_without_a_flame_photometer
            OR increased_inaccuracy_in_one_channel_without_a_flame_photometer
            OR unknown_occurrence ) )

    ( #311 AND ( #306 OR #307 OR #1 ) ) OR ( NOT #311 AND ( #23 OR #24 OR #1 ) )

This node will get the value 'unknown' in steps 1 and 2; in step 3 there will be a subroutine call from element #311 'Flame photometer channel?' giving e.g. the value 'true'. This will then exclude the second half of rule 11, and step 3 evaluation continues with:

Node 306: Increased_imprecision_in_one_single_channel_with_a_flame_photometer
    #283 AND (#190 OR #522 OR #524 OR #526 OR #191 OR #192 OR #193 OR #337 OR #220 OR #204 OR #27 OR #205 OR #1)

Steps 1 and 2 will give the value 'unknown' for node 306; in step 3 there will be a subroutine call from #283 to find out if there is 'increased imprecision?'

If this is 'true' the search will go on with #190, #522, etc., down to the deepest level possible with step 3 and step 4 searches. Step 5 search of the network (including questions to the operator) is not possible until step 4 searches have been completed for node 189. In the step 5 evaluation of node 306 reasoning will possibly come down to the deeper level where questions to the operator may confirm e.g. that 'unclean sample pick-up head' is a reason for the indication of 'channel error' in the first place. Element #111 will then be given the value 'true' followed by upward (forward) setting of nodes #156, #473, #28, #522, #306, #11 and #189 as 'true'.

In the present application the search is stopped when a fault has been identified and confirmed by the operator. The assumption was made that multiple faults do not occur very frequently, and therefore the operator should be spared the efforts of further trouble-shooting. Possible additional faults will show up when the channel is started again.

If, on the other hand, #283 is calculated as 'false' in the step 3 evaluation of #306, the inferencing is reversed to 'upward mode' and #306 gets the value 'false'. Node #307 is then evaluated in steps 1, 2, 3, 4 and so on, until a faulty component of the instrument is detected.

When the 'trouble-shooter' is started by the operator, clicking the 'advice' field on the screen, it will work with the facts available at that time, but it will also take into account new facts (rejects on other channels) coming up during the trouble-shooting. The search then goes back to the central node 189 (channel error), but in the following reasoning the previous values are kept for nodes with questions already answered by the operator.

ELNO  SEQ  TEXT

190    1   Do the stirrers fail to work properly?
190    2   NOTE: 5mm amplitude
191    1   Is the airflow from the selective valve (96 00 3671) too high?
191    2   NOTE: See Ill. Parts Lists, p. 2.34-2.35.
192    1   Is the sample and/or the reagent block unclean?
193    1   Is it an air-leakage in the sample or reagent block?
193    2   NOTE: The volume of water delivered by the suspected
193    3   block can be weighed for checking
194    1   Does the filling tip need to be cleaned?
198    1   Is the photometer cuvette defect or contaminated?
199    1   Is the photometer tip not properly fitted in the photometer cuvette?
200    1   Is the photometer tip too short?
200    2   NOTE: The tip should be 49 mm (support bush (96 00 1672)-tip)
201    1   Does the photometer pump fail to work properly?
201    2   NOTE:
201    3   A properly working photometer pump should draw about
201    4   400 ul of reagent in one stroke.
202    1   Is the photometer pump waste-removal-system clogged?
204    1   Is it a microbial growth in reagent tubes and/or ACH pumps?
205    1   Do we have a defect 2XY-board?
205    2   NOTE:
205    3   Any defect 2XY-board may interfere with other channels
205    4   belonging to the same multiple plate.
205    5   Symptoms for a defect 2XY-board:
205    6   1. An even or uneven final digit on all results. What
205    7   digit you get differ between calibrations.
205    8   2. A change of level on all results within or between series.
205    9   ---
205   10   If you have alarm from the cuvette pumps although the
205   11   pumps are working, then you can pull out all 2XY-boards
205   12   on the module and mount one at a time to see when the
205   13   fault occurs again. Run a test rack between each mounting.

Fig. 6. Sample listing of instructions and questions to the operator.


The operator may also choose to 'ignore' some questions and just browse through the knowledge-base.

When a fault has been found the trouble-shooter will list the analytical channels which are affected and could be restarted after repair. Any remaining faulty channels will then be considered.

Restart of an analytical channel may be done in various ways, e.g. by running an 'initial check' before starting analysis of patient samples, or by entering directly into monitoring of the new run.

A suitable restart point may be defined after inspection of the status codes of individual controls in the rejected run.

2.4.4. The Knowledge Base Editor

This editor is of special importance since it should provide a means to document procedures for trouble-shooting as they are evolving in the laboratory.

The menu lists the following options: (1) insert new node; (2) update old node; (3) delete a node; (4) display node(s); (5) print node(s).


COD   TXT

1607  Sample distribution motor
1617  Sample distribution cups horizontal position
1627  Sample distribution cups vertical position
1647  Cup rinser valve
1657  Cup rinsing suction valve
1667  Cup rinser lift valve
1677  Rinsing fluid valve
1707  Rinsing fluid suction valve
1717  Cup lift control valve
1727  Cup lift valve rinsing fluid position
1XX4  CP computer transmission error
3334  Illegal CP request (noise)
4MS4  AC Power out
5554  Maintenance control
6004  Boot error
6664  Boot of CP
7614  Prisma clock not running
7624  Real-time clock not running
7634  Both clocks not running
7654  Prisma clock no interrupt
7664  Real-time clock no interrupt
7674  Both clocks no interrupt
7774  Power Up
7XY4  Main memory error
X002  Processing Track Horizontal movement
X012  Processing Track Vertical movement
X042  Cup Lift valve
X052  Tube Rinser valve
X062  Dispensing air valve
X072  Stirrer
X102  Photometer Light Source
X402  Processing Track Horizontal movement
X403  Sample Diluter Control valve
X412  Processing Track Vertical movement
X413  Sample Diluter Reset valve
X442  Cup Lift valve
X443  Waste Water Separator
X452  Tube Rinser valve
X453  Conc Waste Flask Full
X462  Dispensing air valve
X472  Stirrer
X502  Photometer Light Source
X503  A/D-reference Voltage
X513  Power Supplies
XXX0  Valve error
XXX1  Photometer error

Fig. 7. Sample listing of hardware errors.

At present this editor is rather rudimentary regarding the node-specification language, and requires further development to be usable by non-programmers.

2.5. User interface

The user interface is based on the Microsoft Windows system. The main screen displays the control status of individual analytical channels as grouped with regard to units and tracks (Fig. 8a). Two columns indicate the presence/absence of critically sized changes with regard to systematic error (SE) and random error (RE), respectively.

Separate columns denote the current control phase of the analytical channels (initial check, monitoring, assessment) and the number of control samples analyzed so far. The indicator fields for systematic and random error are color-coded with 'green' for accept, 'yellow' for warning and 'red' for reject. Alarms regarding 'hardware error' or 'channel error' are presented in a separate window, and guided trouble-shooting is communicated via another separate window (Fig. 8b).

Fig. 8. Screen lay-outs of the user interface. (a) Main screen with pull-down menus, overview of control status, and alarm windows. (b) Window for dialog between the system and the operator during trouble-shooting. (c) Display of individual control values and control status within an analytical run.

Inspection of single runs may be performed by pointing and clicking with the mouse at the channel of interest. The control values of the current run will then be displayed in a row with indicators of control status with regard to critically sized systematic and random analytical disturbances (Fig. 8c). The various phases of control are indicated by I for initial check, M for monitoring and A for assessment. This display of control status indicators helps the operator to determine a suitable restart point after rejection of a run. As indicated in Fig. 8c, yellow warning signals of systematic error started to appear for control No. 7, ending with a red reject signal for control No. 10.

Inspection of previous runs may sometimes be of interest for assessing the possible appearance of subcritical systematic errors. The display then shows the mean values of controls from the requested previous runs. A separate row shows the indicator for subcritical (but significant) errors. This indicator is set by a mean rule applied to the mean values of the last 20 runs.
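A mean rule of this kind can be sketched as follows. This is only an illustration under stated assumptions: the target value, the standard deviation of the run means, and the control limit of 2.58 are hypothetical parameters, and the exact rule used in the prototype is not specified here.

```python
import math
from statistics import fmean

def mean_rule(run_means, target, sd_run_mean, n_runs=20, limit=2.58):
    """Return True if the mean of the last n_runs run means deviates
    significantly from the target (illustrative mean rule).

    target, sd_run_mean and limit are assumed inputs, not values
    taken from the prototype.
    """
    window = run_means[-n_runs:]
    if len(window) < n_runs:
        return False  # not enough history to apply the rule
    # Standard error of the mean of n_runs independent run means
    standard_error = sd_run_mean / math.sqrt(len(window))
    z = (fmean(window) - target) / standard_error
    return abs(z) > limit
```

With a target of 100 and a run-mean standard deviation of 1, twenty consecutive run means of 101 would set the indicator, although each single run mean lies well within its own control limits.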

Pull-down menus are provided for:

Routine operation
- Start-up
- Normal operation

Systems management
- Description of analytical channels
- Selection of QC procedures
- Editing of knowledge-base
- Database queries and statistics

Demo / training
- Specification of simulated analytical disturbances
- Simulate hardware alarms
- Simulate analytical procedure

2.6. Instrument interface system

In the development of this prototype system we have used a Monte Carlo simulator of the instrument, where the various analytical channels are described in terms of their respective within- and between-run components of analytical variation, control levels, number of controls, and assumed analytical disturbances (baseline shifts, and increased imprecision). Synthetic control data are generated from normally distributed random numbers.
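The generation of synthetic control data for one run of a single channel might look like the sketch below. It assumes a simple additive model (one between-run offset shared by all controls of the run, plus independent within-run noise); the function name, the parameter names, and the representation of increased imprecision as a multiplicative factor are illustrative assumptions, not the simulator's actual interface.

```python
import random

def simulate_run(target, sd_between, sd_within, n_controls,
                 shift=0.0, imprecision_factor=1.0, rng=None):
    """Generate synthetic control values for one analytical run.

    shift models a baseline shift (systematic error);
    imprecision_factor > 1 models increased imprecision (random error).
    All names are hypothetical.
    """
    if rng is None:
        rng = random.Random()
    # Between-run component: drawn once, shared by every control in the run
    run_offset = rng.gauss(0.0, sd_between)
    # Within-run component: drawn independently for each control
    return [
        target + shift + run_offset
        + rng.gauss(0.0, sd_within * imprecision_factor)
        for _ in range(n_controls)
    ]
```

Passing a seeded `random.Random` instance makes a simulated disturbance reproducible, which is convenient for demonstration and training runs.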

In the PRISMA system all measurement data are acquired directly by an on-board microcomputer, which stores the control and patient results on disc where they are readily accessible for quality control, reporting and data transmission. The integration of the present QC system with the PRISMA instrument was not completed in the present prototype phase.

The description of the analytical characteristics of the channels is entered and stored in tables in the database. The technical configuration of channels in 'tracks' and 'units' is specified in:

Table CHANNELS
1 UNI *I 1 Processing unit
2 TRA *I 1 Processing track
3 CHA *I 1 Analytical channel
4 STA C 1 Channel status

2.7. Event manager

The Microsoft Windows Event Manager handles all the events generated as a result of the user operating the mouse and the keyboard, or by the instrument (simulator) module. The events are stored as messages in a queue watched by the Event Manager, which triggers the appropriate processing, e.g. statistical control; updating or query of QC database; updating of display; dialogue with trouble-shooter. The processing is performed in a semi-multitasking mode.
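The queue-and-dispatch pattern described above can be illustrated with a small sketch. It does not use the Microsoft Windows API; the class, the event names and the handlers are hypothetical and only show how queued messages trigger processing steps that may themselves post further messages.

```python
from collections import deque

class EventManager:
    """Minimal message queue with handler dispatch (illustrative only)."""

    def __init__(self):
        self.queue = deque()   # pending (event_type, payload) messages
        self.handlers = {}     # event_type -> list of callables

    def register(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type, payload=None):
        self.queue.append((event_type, payload))

    def run_pending(self):
        """Process messages in arrival order; handlers may post new ones."""
        processed = 0
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.handlers.get(event_type, []):
                handler(payload)
            processed += 1
        return processed


# Hypothetical wiring: a new control value triggers statistical control,
# which in turn posts a request to update the display.
log = []
manager = EventManager()

def on_control_value(value):
    log.append(("statistical control", value))
    manager.post("status_changed", "SE warning")

def on_status_changed(status):
    log.append(("update display", status))

manager.register("control_value", on_control_value)
manager.register("status_changed", on_status_changed)
manager.post("control_value", 4.2)
processed = manager.run_pending()
```

Because each handler runs to completion before the next message is taken from the queue, the processing is cooperative rather than truly concurrent, which is one way to read the semi-multitasking mode mentioned above.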

3. Hardware and software requirements

The system was developed on an IBM PS/2 model 70 under MS-DOS 4.01. The MIMER database management system [7], implemented in C, was used in this project, but the data and knowledge structures are portable to other relational database management systems. The Inference Engine is written in Fortran, and the User and Instrument Interface Management System is written in C under the Microsoft 386-Windows system, version 2.0.

4. Discussion

Knowledge-based technology can provide appropriate and powerful tools to facilitate fault diagnosis and trouble-shooting of technical systems of various kinds, e.g. computers and instrumentation (cf. [10]). This is especially the case in real-time, sensor-based fault detection in large and complex systems [11].

In the clinical chemistry laboratory there are many potential applications of knowledge-based systems (cf. [12]). One of the best suited areas is probably the process and quality control of the analytical instrumentation, which has been automated by the construction of more and more sophisticated systems, integrating all the instrumental and methodological components. During recent years instrument manufacturers have started to provide 'selected methods' [6] in their instrument software for real-time quality control, and there is also a growing interest in providing support for fault diagnosis.

In such an application it is crucial to have a true conception of the structure and function of the analytical procedure, and to have effective methods for handling time-related data (cf. [11,13]).

In the present project we have developed and applied a relational model for both data and knowledge processing. The analysis and display of control values is performed in real-time; the QC database has a highly time-related structure, and the knowledge-base reflects the structure and function of the analytical procedures with use of a connectionist model of rational trouble-shooting.

The relational representation is very flexible and may be used to describe both hierarchical trees and networks. Another advantage is the integration of database and knowledge-base in the same RDBMS, in a way that allows for both numeric and symbolic processing.

The inference engine was built to work in an alternating backward and forward mode, performing the top level reasoning before entering into deep-level questioning of the operator. There is no true parallel processing in the network; the nodes representing technical faults on various levels are evaluated in sequential sorted order with a speed that makes the distinction between parallel and sequential processing irrelevant.
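Sequential evaluation of fault nodes held in a relational table can be sketched as below. SQLite stands in for the RDBMS, and the table layout (`fault_nodes` with `level`, `sort_order` and `trigger_code` columns), the level assignments and the activation test are all hypothetical; only the error codes and their descriptions are borrowed from Fig. 7.

```python
import sqlite3

def evaluate_nodes(conn, observed_codes):
    """Visit fault nodes in sorted order (top level first) and return
    those whose trigger code is among the observed error codes."""
    rows = conn.execute(
        "SELECT node_id, description, trigger_code "
        "FROM fault_nodes ORDER BY level, sort_order"
    )
    return [
        (node_id, description)
        for node_id, description, code in rows
        if code in observed_codes
    ]

# Hypothetical knowledge table; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fault_nodes ("
    "node_id INTEGER PRIMARY KEY, level INTEGER, sort_order INTEGER, "
    "description TEXT, trigger_code TEXT)"
)
conn.executemany(
    "INSERT INTO fault_nodes VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 1, "Stirrer", "X072"),
        (2, 1, 1, "Photometer error", "XXX1"),
        (3, 2, 2, "Photometer Light Source", "X502"),
        (4, 1, 2, "Valve error", "XXX0"),
    ],
)
active = evaluate_nodes(conn, {"X502", "XXX1", "X072"})
```

The `ORDER BY` clause fixes the evaluation order, so top-level nodes are always considered before deeper ones without any parallel machinery.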

Presently there are no 'learning laws' implemented in the 'processing nodes'. A future possibility could be to use current statistics on the frequency of various faults as a basis for automatic restructuring of nodes. A similar adaptation could probably be performed automatically with regard to the optimal selection of statistical control procedures and number of control samples for individual channels. Specified limits for 'allowable analytical error' and current estimates of frequency of errors are the required entry parameters to 'selection grids' [14], which can be derived by Monte Carlo simulation studies and stored in the database.
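One simple form of such frequency-based restructuring would be to reorder sibling nodes so that the most frequently confirmed fault is examined first. The sketch below is an assumption about how this could be done, not a feature of the prototype; the node names and the fault history are hypothetical.

```python
from collections import Counter

def reorder_by_frequency(sibling_nodes, fault_history):
    """Return sibling nodes sorted by how often each fault was confirmed.

    sorted() is stable, so nodes with equal counts keep their current
    relative order.
    """
    counts = Counter(fault_history)
    return sorted(sibling_nodes, key=lambda node: -counts[node])

# Hypothetical example: the lamp was confirmed as the fault twice.
siblings = ["valve", "stirrer", "lamp"]
history = ["lamp", "stirrer", "lamp"]
new_order = reorder_by_frequency(siblings, history)
```

Keeping the existing order for ties means that an expert's original ordering is preserved until the accumulated statistics actually contradict it.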

As pointed out above, the present system is a prototype, which was built to develop and apply suitable knowledge-based system methodology for quality control and fault diagnosis of analytical instruments in the clinical chemistry laboratory. The applicability and suitability of this methodology have to be further evaluated by field testing and application to other analytical systems.

Acknowledgements

This work was performed within the Nordic R&D Project on Knowledge-Based Systems in Medicine (KUSIN-MED). The support of the Nordic Fund for Technological and Industrial Development, and participating companies (New Clinicon AB, Hewlett-Packard Sweden AB and MIMER Software AB) is gratefully acknowledged.

Special thanks are due to the PRISMA Users Group and its Chairman, Associate Professor Anders Kallner, for providing a structured Error Checklist for the PRISMA instrument.

References

[1] J.O. Westgard, Better quality control through microcomputers, Diagn. Med. 5 (1982) 60.


[2] J.O. Westgard and T. Groth, Computer systems for implementation of internal quality control procedures, Scand. J. Clin. Lab. Invest. 44 (Suppl. 172) (1984) 203.

[3] T. Groth, H. Falk and J.O. Westgard, A quality control simulator for design and evaluation of internal quality control procedures, Scand. J. Clin. Lab. Invest. 44 (Suppl. 172) (1984) 195.

[4] J.O. Westgard and T. Groth, Design and evaluation of statistical control procedures: applications of a computer 'quality control simulator' program, Clin. Chem. 27 (1981) 1536.

[5] T. Groth and J.O. Westgard, A second generation simulation program for optimizing quality control design, Clin. Chem. 36 (1990) 1003.

[6] J.O. Westgard, P.L. Barry, M.R. Hunt and T. Groth, A multi-rule Shewhart chart for quality control in clinical chemistry, Clin. Chem. 27 (1981) 493.

[7] MIMER/DB: Database Manager and Program Interface, ISBN 91-7878-012-8 (MIMER Software AB, Uppsala, 1987).

[8] J.O. Westgard, T. Groth and C.H. de Verdier, Principles for developing improved quality control procedures, Scand. J. Clin. Lab. Invest. 44 (Suppl. 172) (1984) 19.

[9] T. Aronsson and T. Groth, Nested control procedures for internal analytical quality control. Theoretical design and practical evaluation, Scand. J. Clin. Lab. Invest. 44 (Suppl. 172) (1984) 51.

[10] K.R. Carr, Expert Systems Application for Diagnosis and Trouble-shooting of Computer Systems and Instrumentation, 13 pp. (Oak Ridge Nat. Laboratory, DE 87000219, 1986).

[11] D.B. Malkoff, Real-time Fault Detection and Diagnosis: The Use of Learning Expert Systems to Handle the Timing of Events, 42 pp. (Navy Personnel Research and Development Center, ADA174655/1, Nov. 1986).

[12] T. Groth, Data base and knowledge-based systems in clinical laboratory medicine, in: Laboratory Data and Patient Care, eds. Kerkhof and van Dieijen-Visser, p. 101 (Plenum Press, New York, 1988).

[13] P.M. Sanderson and J.M. Murtagh, Predicting fault diagnosis performance: why are some bugs hard to find? IEEE Trans. Syst. Man Cybern. 20 (1990) 274.

[14] J.O. Westgard, E.F. Quam and P.L. Barry, Selection grids for planning QC procedures, J. Clin. Lab. Sci. 3 (1990) 273.
