
Pergamon Int. J. Nurs. Stud., Vol. 32, No. 2, pp. 115-125, 1995

Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved

0020-7489/95 $9.50+0.00

0020-7489(94)00042-5

Quality assessment instruments in nursing: towards validation

SALLY J. REDFERN, B.Sc., Ph.D., R.N. and IAN J. NORMAN, B.A., M.Sc., Ph.D., R.N., Dip.App.Soc.Stud., C.Q.S.W.

Nursing Research Unit, King’s College, University of London, Cornwall House Annexe, Waterloo Road, London SE1 8TX, U.K.

Abstract-The aim of this study was to explore the validity of the nursing quality assessment instruments, Monitor, Senior Monitor and Qualpacs. This follows recommendations in the literature for the need for more comprehensive validation of instruments than has been the case hitherto.

A multiple triangulation research design was used which included observation of and interviews with nurses and patients, administration of the instruments with the same patients, and a questionnaire on ward organisation and approach to nursing care completed by the nurses in charge.

Results reported here focus on our experiences of using the instruments, their inter-rater reliability and comparisons of instrument scores within medical, surgical and elderly care wards.

Difficulties were encountered in using the instruments but most of these can be overcome given sufficient time for preliminary discussions. Inter-rater reliability of all three instruments taken as a whole reached acceptable levels, although some of the section score correlation coefficients were low, especially for Qualpacs.

Convergent validity was achieved for the Senior Monitor-Qualpacs comparisons in four elderly care wards. The results were less clear for the Monitor-Qualpacs comparisons in seven medical and surgical wards. Explanations for the equivocal results are suggested and subsequent hypotheses were tested which supported these explanations.

Introduction

The aim of this study was to explore the validity of some of the quality assessment instruments in common use in nursing in this country. Three instruments are included: Monitor, Senior Monitor and Qualpacs. All are derived from externally generated criteria and claim to provide a valid index of the quality of nursing care delivered in a ward. Positive and negative features of these instruments are presented and an argument for comprehensive validation is made. The findings presented here focus on inter-rater reliability and within-ward comparisons of the instruments, which were administered with the same patients at the same time.

Generic quality assessment instruments

In the U.S. composite measures were developed by combining criteria of nursing care quality derived by pooling professional knowledge into checklists. These checklists were used to obtain a measure of the quality of nursing care as a whole and are known as pre-formulated generic quality assessment instruments. The instruments focus upon the performance of the nurse (e.g. the Slater Nursing Competencies Rating Scale-Wandelt and Stewart, 1975) or upon the care received by patients (e.g. the Quality of Patient Care Scale-Wandelt and Ager, 1974; the Phaneuf Audit-Phaneuf, 1976; and the Rush-Medicus Nursing Process Methodology-Jelinek et al., 1974).

In the U.K., there has been a marked increase in the use of pre-formulated generic quality assessment instruments over recent years; they are used by some of the districts in all regional health authorities (Kitson et al., 1988). Of these, Monitor, with its versions for different client groups, is the most popular, and Qualpacs is used in acute general and elderly care units.

Monitor

Monitor was adapted for use in the U.K. by Goldstone and his colleagues (Goldstone et al., 1983; Goldstone, 1987a, b) from the Rush-Medicus Nursing Process Methodology developed in Chicago (Jelinek et al., 1974). It was part of their Criteria for Care system designed to establish nurse staffing levels and skill-mix from analysis of nursing activities (Ball et al., 1984). The instrument consists of a Ward Monitor containing 43 items that describe and assess the procedures and management of the ward (including structural factors such as staffing levels, grade-mix, workload, support services and environmental safety), and a Patient Monitor.

Four schedules are contained within Patient Monitor: the schedule for Dependency Group (DG)1 patients (least dependent) contains 81 items; the DG2 schedule, 107 items; DG3, 148 items; and DG4, 118 items. Each schedule contains sections broadly organised around the stages of the nursing process. The first section, planning nursing care, contains parts relating to the patient’s admission, assessment and coordination of nursing with medical care; meeting the patient’s physical needs refers to patient safety, provision of physical comfort and rest, and hygiene, nutrition, fluid and elimination needs; non-physical needs of the patient are met contains parts relating to staff courtesy, patient privacy, rights, well-being, health promotion and prevention of ill health, and involvement of the patient’s family; and evaluation of nursing care objectives refers to documentation of observation, treatment and care and the patient’s response to treatment.
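The schedule structure just described can be summarised as data; the following is a minimal sketch, ours alone (the instrument itself is a paper schedule, and the section letters A and C are inferred from the letters B and D used later in this paper):

```python
# Patient Monitor structure as described in the text (our illustration).
ITEMS_PER_SCHEDULE = {   # dependency group -> number of items
    "DG1": 81,           # least dependent patients
    "DG2": 107,
    "DG3": 148,
    "DG4": 118,
}

SECTIONS = [
    "A: planning nursing care",                      # letter assumed
    "B: meeting the patient's physical needs",
    "C: non-physical needs of the patient are met",  # letter assumed
    "D: evaluation of nursing care objectives",
]
```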

Information is collected from nursing records, discussion with nurses and patients, and by observation. The items are rated yes (score = 1), yes-sometimes (score = 0.5) or no (score = 0). Total scores are based on the sum of “yes” answers; the closer to 100%, the better the care.
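As a minimal sketch of this scoring rule (ours, not the published algorithm; the exclusion of non-applicable items before scoring is an assumption):

```python
SCORE = {"yes": 1.0, "yes-sometimes": 0.5, "no": 0.0}

def monitor_percent_score(responses):
    """Monitor-style percentage score from a list of item responses.

    `responses` holds "yes", "yes-sometimes" or "no"; items judged not
    applicable are assumed to have been excluded beforehand.
    """
    if not responses:
        raise ValueError("no scorable items")
    return 100.0 * sum(SCORE[r] for r in responses) / len(responses)

# Example: monitor_percent_score(["yes", "yes-sometimes", "no", "yes"]) -> 62.5
```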

Senior Monitor

Senior Monitor (Goldstone and Maselino-Okai, 1986) is an augmented modification of Monitor for elderly care wards. It uses the same scoring system but all 232 items within Senior Monitor are covered for each patient irrespective of dependency level. Senior Monitor contains three additional sections, making seven in all; these are the patient’s need for rehabilitation; care of the severely ill, terminally ill or the dying patient; and the last offices (care of the deceased). The number of parts in the first two sections is greater than in Monitor.

Qualpacs

Qualpacs was designed by Wandelt and Ager in the U.S. (Wandelt and Ager, 1974). Nurse assessors observe nurse-patient interactions and assess the quality of care delivered. The scale’s 68 items are divided into six sections: psychosocial-individual, psychosocial-group, physical, general, communication and professional implications. A set of cues guides the observer with illustrative examples for each item. Items are scored (from 1 = poorest to 5 = best care) with reference to the standard expected of a first level staff nurse, irrespective of the grade of nurse being observed. Section and total scale scores are derived for a ward.

The observation period is 2 h and follows an initial period of chart review and verbal report from the nurse allocated to the patient so that the observer can compile an outline care plan for the patient. After the observation the observer looks for evidence of indirect care by consulting the patient’s records; this information is scored in the same way on the Qualpacs schedule. All 68 items are scored or are endorsed as being not applicable or not observed.
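A sketch of how section and total scores might then be derived (our reading only; treating “not applicable”/“not observed” items as excluded from the means is an assumption):

```python
def qualpacs_scores(ratings):
    """Qualpacs-style section means and total scale mean for one ward.

    `ratings` maps section name -> list of item ratings (1 = poorest care,
    5 = best care), with None for items marked not applicable/not observed.
    """
    section_means = {}
    for section, items in ratings.items():
        observed = [r for r in items if r is not None]
        section_means[section] = sum(observed) / len(observed) if observed else None
    rated = [r for items in ratings.values() for r in items if r is not None]
    total = sum(rated) / len(rated) if rated else None
    return section_means, total

# Example (hypothetical ratings):
# qualpacs_scores({"physical": [4, 5, None], "general": [3, 3]})
# -> ({"physical": 4.5, "general": 3.0}, 3.75)
```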

Validity

Generic instruments differ in focus and in the components of nursing addressed, but they share common characteristics. They give a broad evaluation of care as a whole rather than of specific aspects of nursing; they are based upon externally generated components of nursing that are related to quality; they purport to provide an “objective” assessment of the quality of nursing by rating items and allotting scores; and they are usually administered by external assessors. Although generic instruments tend to be associated with top-down approaches to quality assessment (Harvey, 1990), this need not necessarily be the case. Advocating a bottom-up approach does not exclude the possibility of outside experts being invited by clinical nurses to assess the quality of their care using generic instruments. The bottom-up approach does, however, require clinical nurses to issue any invitation to external assessors and to own the results.

Validating generic instruments is a particular problem because most require some kind of judgement of quality. Inclusion of judgement, even that of professional experts, means that statistical estimates of the instruments’ validity are likely to be low. This conclusion is supported by studies that have compared scores from different generic instruments. The correlation coefficient between two “process” instruments, Qualpacs and the Phaneuf Audit, was 0.01 (Ventura, 1980). Similarly, correlations between Qualpacs and the Rush-Medicus audit were low, being close to zero between the physical and psychosocial subscales of both instruments (Ventura et al., 1982).

Other research on instrument comparisons has shown little correlation (Giovannetti et al., 1986), suggesting that quality is multi-dimensional and a combination of measures is necessary for comprehensive assessment. Generic instruments aim to address all aspects of nursing care that are relevant to quality, which poses a difficulty because their pre-formulated criteria become outdated as the nature of nursing changes. For example, health promotion and prevention of illness are not adequately addressed by most instruments because of their emphasis on disease (Giovannetti et al., 1986; Phaneuf, 1976; Van Maanen, 1981).

The evidence for discriminant validity is scant, although Monitor was found to discriminate between wards, with scores ranging from around 50% to 80% or more in several studies (Goldstone, 1987b). For Qualpacs, average scores compared against independent judgements of the quality of care in the same wards revealed a correlation coefficient of 0.52, leading to the conclusion that Qualpacs discriminates successfully between wards (Wandelt and Ager, 1974).

Inter-rater reliability has been achieved reasonably successfully with quality assessment instruments. For example, during development of Monitor, inter-rater agreement for both Patient Monitor and Ward Monitor was 80% or more (Goldstone, 1987b). For Qualpacs, inter-rater reliability correlation coefficients ranged from 0.64 to 0.91 (Wandelt and Ager, 1974) but later research did not achieve the researcher-specified criterion of 0.75 for any of the subscales (Ventura et al., 1982).

Towards validation

In this study, we attempted to validate Monitor, Senior Monitor and Qualpacs using a multiple triangulation research design. This included simultaneous administration of the instruments to the same patients in medical, surgical and elderly care wards in order to compare scores between instruments. Other methods included:

• observation of self-care activities of patients and their interactions with nurses using a time-sampling technique similar to that developed by Kitson (1991);

• interviews with the same patients and their nurses to elicit their perceptions of high and low quality nursing care using a modification of Flanagan’s (1954) critical incident technique;

• a questionnaire based on Kitson’s (1991) Therapeutic Nursing Function Indicator completed by the ward sister and deputy to establish their approach to nursing care and ward organisation.

Further information on the research design and methods is available elsewhere (Norman et al., 1992a, b).

Administration of Monitor and Senior Monitor

Several practical difficulties were encountered in the process of administering Monitor and Senior Monitor; these have been described in more detail in Tomalin et al. (1992) and are summarised here.


Structure of the instruments. The schedules are disjointed and time consuming to complete because items are not grouped by subject matter nor by information source (i.e. consult patient’s records, ask patient, ask nurse, observe). It is most irritating for patients, nurses and assessors when assessors have to leaf through the whole manual to cover those items whose answers are to be found from the same source; also, items can be missed. A solution was to give assessors a checklist of all items listed under each source.

Accessing information. Patients’ records were not always available when needed because they were in use elsewhere. Also, talking to or observing the patient was sometimes impossible or unacceptably intrusive.

Interpreting items and their responses. Many of the items required lengthy discussion before agreement by the assessors was reached. Another problem occurred when items and their cues conveyed different information. Moreover, confirmation of an item did not necessarily convey good care. For example, a “yes” score was awarded if there was “. . . a written statement of care given to pressure areas on the skin”, even when that statement conveyed inadequate care. A high score, therefore, did not always represent high quality care.

Disrupting ward staff. It was not always easy to find a convenient time to talk with nurses and patients.

Administration of Qualpacs

Qualpacs also presented some difficulties (see Redfern et al., 1993) which are summarised below.

Unwieldiness. The 68-item schedule is manageable but it is cumbersome to carry around the 20 pages of cues to the items. The cues were abandoned after training to avoid the temptation to rate the cue rather than the item.

Obscurity. The passive voice rather than the more familiar active voice is used; for example, “patient receives explanation and reassurance when needed” could be simplified to “nurse explains and reassures”. Simplification promotes speed and certainty and, therefore, accuracy in scoring when rating many observations that occur in rapid succession.

Unequivocal identification. Comprehensiveness of the items and mutual exclusivity were further difficulties. For example, some items encompassed broad areas of nursing care (e.g. “patient receives nurse’s full attention”) whilst others were more specific but closely related to the same area (e.g. “patient is given an opportunity to explain his feelings”). Many items overlapped and some were multiple, making a single score impossible.

Delimiting an interaction. As with any observation study, deciding when an interaction began and ended required frequent discussion until agreement was reached.

Sections. It was not always easy to identify the appropriate section for the interaction observed. For example, the section specified as communication refers to communications made by the care-giver on behalf of the patient rather than communication between care-giver and patient. Also, the general section could, we feel, be deleted and its items allocated to other sections or omitted altogether when alternatives exist elsewhere.

Rating an interaction. It took considerable discussion for the raters to reach consensus on the standard expected of a first level staff nurse and it was never easy to articulate the details of this consensus. We also had to get used to rating unqualified nurses and other carers against the same standard.

Omitted care. The manual specifies that omitted care should be rated the same as “poorest care”, but it is important to distinguish between the two.

Directly and indirectly rated items. It is inappropriate to accord equal weight to items rated by direct observation, which can be verified, and to items rated indirectly (e.g. through record review), which are less verifiable.

A notable strength of Qualpacs is its unequivocal reliance on expert judgement, which is essential when rating personal interactions. Even though there were substantial difficulties in administering Qualpacs and both the Monitors, most of these can be overcome during the planning stage. The importance of allowing enough time for discussion and training of assessors cannot be overemphasised.

Inter-rater reliability of Monitor and Senior Monitor

Procedure and analysis. Inter-rater (inter-observer) reliability was estimated from the scores of two observers who watched the same event simultaneously and independently rated the relevant variables according to the instrument’s coding schedule. These two sets of scores were used to estimate inter-rater reliability using percent agreement and the intraclass correlation coefficient (ICC). The limitations of percent agreement have been well documented (Bartko and Carpenter, 1976) but it was decided to retain this measure for comparison with Goldstone’s (1987b) tests of Monitor.

Percent agreement was calculated by expressing the number of agreements between the assessors for each item as a percentage of the total number of agreements and disagreements. The ICC was the preferred measure of inter-rater reliability and was calculated using the method described by Bartko and Carpenter (1976). The test of significance for the ICC is simply the ratio of the between-subject variance (numerator) to the within-subject variance (denominator), which is referred to an F distribution with N-1 and M-N degrees of freedom, where M is the total number of ratings and N is the number of subjects.
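Both measures are simple to reproduce; the following is a sketch (a one-way random-effects ICC, following our reading of Bartko and Carpenter, 1976; the function and variable names are ours):

```python
import numpy as np
from scipy import stats

def percent_agreement(rater_a, rater_b):
    """Agreements as a percentage of all agreements plus disagreements."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return 100.0 * np.mean(a == b)

def icc_oneway(ratings):
    """One-way random-effects ICC with its F test.

    `ratings` is an (N subjects x k raters) array. The F ratio of between-
    to within-subject mean squares is referred to an F distribution with
    N-1 and M-N degrees of freedom, where M = N * k is the total number
    of ratings.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    bms = k * np.sum((subject_means - ratings.mean()) ** 2) / (n - 1)     # between subjects
    wms = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1)) # within subjects
    icc = (bms - wms) / (bms + (k - 1) * wms)
    f_ratio = bms / wms
    p_value = stats.f.sf(f_ratio, n - 1, n * k - n)
    return icc, f_ratio, p_value
```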

Tests were made in an elderly care ward (E1) and two surgical wards (S2 and S9). In E1, two assessors simultaneously administered Senior Monitor to 9 patients, and in S2 two assessors simultaneously administered Monitor to 10 patients. After eight months for Monitor and nine months for Senior Monitor, inter-rater reliability was tested with both instruments in S9. Three patients aged over 70 years were assessed with Senior Monitor and six patients with Monitor. This time interval was selected simply because resources (raters) were available then.

A limitation of the inter-rater reliability test was that it focused on total and section scores of the instruments rather than on item scores. This was the approach taken by Ventura’s group (Ventura, 1980; Ventura et al., 1980, 1982) and by Giovannetti’s (Giovannetti et al., 1986), both of whom confined their testing to total scale and section scores. Item testing would have been ideal but would have generated, for Qualpacs, up to 75 tests per ward and many more for each of the Monitors. In this study, not all patients received a score on every item, particularly those that occurred infrequently. Therefore, some of the items could not have been tested, whereas section scores were, on the whole, complete for all patients and raters.

It was not intended to provide a definitive analysis of inter-rater reliability. The measures of reliability were used solely to assess for consistency amongst raters employed for this study and not as a way of checking the reliability of the instrument itself. We required a simple expedient method that would give a reasonable indication of any major differences between the raters before proceeding to validity testing. This seemed a sensible course of action given that these instruments are in widespread use, that validity had not been explored sufficiently in the past and that our resources were limited. Hence the number of individuals sampled was small but sufficient for the purpose described. Our approach would have been different had we been constructing a new instrument; in that case we would have needed a larger sample of patients. Nunnally (1978) suggests samples of 300 or more are required for studies of measurement errors.

Results. For Monitor, mean percent agreement across patients in the two surgical wards was 79% (range 58-89%, n = 10) and 89% (range 86-93%, n = 6). For Senior Monitor, the mean was 89% (range 84-96%, n = 9) in E1 and 80% for the three patients over 70 years in S9 (range 73-84%). Mean section score agreement was never below 65% and was usually above 75%. These figures are reasonably close to Goldstone’s (1987b) criterion of 80% for high agreement. We have been unable to find the rationale for Goldstone’s criterion of 80% other than his view that this is more than adequate.

For Senior Monitor, the total score ICC for E1 was 0.81 (p<0.001), thus exceeding Ventura’s (1980) criterion of 0.75. An ICC could not be calculated for Senior Monitor in S9 because of the small sample. For Monitor, the total score ICC was 0.89 for S2 and 0.88 for S9 (both p<0.001). ICCs for section scores ranged from 0.43 to 0.98 for Monitor and from 0.26 to 0.94 for Senior Monitor. It was the section on evaluating nursing care objectives (D for Monitor and G for Senior Monitor) that had the lowest ICCs.

The rationale for Ventura’s criterion of 0.75 for the ICC, also specified by other investigators of nursing quality assessment instruments (Giovannetti et al., 1986), is not clear. Whereas the criterion for developing an instrument should be stringent (perhaps over 0.90), particularly when important decisions are made as a result of specific test scores (Nunnally, 1978), the criterion for testing agreement between raters need not be so high. Since our purpose was to test consistency between raters rather than item-based reliability, we thought it reasonable to follow the lead of Ventura and Giovannetti and accept 0.75.

Inter-rater reliability of Qualpacs

Procedure and analysis. As with the Monitors, inter-rater reliability of Qualpacs was tested by administering Qualpacs simultaneously with Senior Monitor in wards E1 and S9 and with Monitor in wards S2 and S9. In each ward the two instruments were administered to eight patients. The inter-rater reliability test used was the ICC only.

Results. The ICC for the Qualpacs total scale score was 0.98 (p<0.001) for E1 and 0.82 (p<0.001) for S2, but it dropped to 0.60 (p<0.05) for S9. The section score ICCs ranged from a low and non-significant 0.07 to a high of 0.98 (p<0.001). Tests could not be made for Section 2 (psychosocial-group) because there were not enough observations for analysis. In E1, all the section score ICCs were 0.85 or higher. None of the section ICCs in S2 and S9 reached Ventura’s 0.75 criterion except for Section 6 (professional implications) in S2.

Table 1. Monitor compared to Qualpacs (medical wards)

Ward      Monitor score  s.d.   Qualpacs score  s.d.   n    r       P
M6        54.72          8.50   52.40           5.81   13   0.36    ns
M7        52.46          4.61   44.77           6.67   12   -0.20   ns
M8        53.89          9.00   50.33           4.06   12   -0.24   ns
Medical   53.72          7.50   49.25           6.37   37   0.09    ns

Instrument comparisons

Comparisons were made between Qualpacs and either Monitor or Senior Monitor in each ward using Pearson’s correlation coefficient. This told us the extent of association between the instruments when administered to the same patients at the same time.
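In programmatic terms, each within-ward comparison amounts to something like the following sketch (the score arrays here are hypothetical, not study data):

```python
from scipy import stats

# Hypothetical paired per-patient scores from one ward
monitor_scores = [54.0, 61.5, 48.0, 57.5, 50.0]
qualpacs_scores = [52.5, 58.0, 47.0, 55.0, 51.5]

# Pearson correlation between the two instruments for the same patients
r, p = stats.pearsonr(monitor_scores, qualpacs_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```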

The Monitor-Qualpacs mean scores and correlation coefficients for the three medical wards are shown in Table 1.

The highest correlation between instrument scores was for M6, with a coefficient (r) of 0.36, which was not high enough to be significant at the 5% level. The coefficients for M7 and M8 were negative and also not significant.

For the four surgical wards (Table 2), none of the coefficients were high or significant and the predominantly negative correlations indicate an inverse association, most notably for S2 and S9. The results for the medical and surgical wards combined indicate very little association between Monitor and Qualpacs.

So, there was no significant correlation between the instruments for the seven medical and surgical wards. Wards M6, S2 and S9 revealed the highest coefficients but the association was positive only for M6. The combined correlations for ward type should be treated with caution because some wards revealed positive coefficients and others negative. The results indicate low convergent validity; that is to say, the instruments measure different components of quality.

Table 2. Monitor compared to Qualpacs (surgical wards)

Ward                Monitor score  s.d.   Qualpacs score  s.d.   n    r       P
S2                  55.13          8.99   51.02           5.43   8    -0.35   ns
S9                  46.69          9.74   50.58           6.33   11   -0.40   ns
S10                 56.93          6.37   49.34           7.16   12   0.15    ns
S11                 60.67          6.21   53.37           3.63   12   -0.12   ns
Surgical            55.02          9.22   51.09           5.81   43   -0.15   ns
Medical + surgical  54.42          8.44   50.24           6.11   80   -0.13   ns


Table 3. Senior Monitor compared to Qualpacs (elderly care wards)

Ward     Senior Monitor score  s.d.   Qualpacs score  s.d.    n    r      P
E1       42.12                 5.38   45.65           13.81   7    0.70   ns
E3       47.62                 4.53   48.21           6.60    12   0.46   ns
E4       44.84                 5.81   46.92           7.14    12   0.82   <0.01
E5       52.65                 7.61   49.60           10.33   12   0.70   <0.05
Elderly  47.35                 6.92   47.82           9.05    43   0.58   <0.01

The picture was much clearer for the four elderly care wards, in which Senior Monitor was compared to Qualpacs (Table 3).

The correlation coefficients were all positive and high although they were significant only for E4 and E5 and when all wards were combined. This provides evidence of convergent validity for Senior Monitor and Qualpacs.

We also compared Monitor to Senior Monitor scores to see whether these very similar instruments correlated with each other. The result might throw light on the low correlation between Monitor and Qualpacs. That is to say, if the correlation between Monitor and Senior Monitor is low and the correlation between Senior Monitor and Qualpacs is high, then the Monitor-Qualpacs correlation would be low. Monitor and Senior Monitor were administered at the same time to a small, additional sample of patients aged over 70 years in the same wards. As Table 4 shows, the correlation coefficients were positive and high for all the wards; not unexpected given the similarity between these two instruments.

Thus, even though Senior Monitor contains three additional sections that do not feature in Monitor (i.e. rehabilitation, care of severely/terminally ill, care of the deceased patient), correlation between the instruments was high. The low correlation between Monitor and Qualpacs has not, therefore, been explained.


Table 4. Monitor compared to Senior Monitor

Ward   Monitor score  s.d.    Senior Monitor score  s.d.    n    r      P
E3     51.78          8.35    45.59                 5.48    4    0.90   ns
E4     46.77          4.46    43.07                 4.06    4    0.96   <0.05
E5     45.06          12.13   45.53                 7.67    4    1.00   <0.01
M6     47.46          8.81    40.87                 5.85    5    0.95   <0.05
M7     49.23          7.40    44.60                 5.99    5    0.84   <0.05
M8     47.61          12.58   39.33                 10.16   6    0.94   <0.01
S9     51.12          13.39   48.85                 10.23   6    0.91   <0.05
S10    51.21          9.27    43.60                 8.06    6    0.74   ns
S11    56.16          6.85    46.71                 6.14    6    0.81   ns
Total  49.88          9.49    44.24                 7.5     46   0.84   <0.01

124 S. J. REDFERN and I. J. NORMAN

Another possibility is that one or more of the dependency group schedules within Monitor might have been responsible for the lack of association between Monitor and Qualpacs. The Monitor dependency group schedules do differ slightly, whereas Senior Monitor and Qualpacs each consist of a single multi-dimensional scale. As Table 5 shows, there was some support for this hypothesis.

Table 5. Monitor-Qualpacs comparisons within dependency groups

DG   Wards       n    r       P
1    Medical     10   0.38    ns
     Surgical    11   -0.47   ns
     Med + surg  21   -0.08   ns
2    Medical     13   -0.27   ns
     Surgical    17   -0.26   ns
     Med + surg  30   -0.14   ns
3    Medical     8    0.69    ns
     Surgical    12   0.18    ns
     Med + surg  20   0.45    <0.05
4    Medical     6    -0.57   ns
     Surgical    3    0.75    ns
     Med + surg  9    -0.02   ns

For the DG1 schedule, two coefficients were of moderate magnitude (although not significant) but the correlation was positive for the medical wards and negative for the surgical wards. For the DG2 schedule, none of the coefficients were high and all were negative, suggesting lack of association between the instruments.

Only for the DG3 schedule were the scores all positively related, reaching 0.69 for the medical wards and 0.45 for the medical and surgical wards combined; the latter reached significance at the 5% level. A difference between the DG3 schedule and the others is that it contains many more items in Section B (meeting the patient’s physical needs): 11 more than the DG4 schedule, 37 more than DG2 and 57 more than DG1. These additional items extend the subsections within Section B that cover protection from accident and injury, meeting needs for physical comfort and rest, meeting activity needs, and hygiene needs. The high (although non-significant) coefficient for the DG4 schedule for the surgical wards is promising but, with such a small sample, this would need confirmation.

Our conclusions are that the DG1, DG2 and DG4 schedules revealed results that were sufficiently discrepant to raise doubts about the convergent validity of Monitor and Qualpacs and suggest that these instruments measure different constructs. Only for the DG3 schedule was the correlation reasonably high and the direction positive. These results should, however, be accepted cautiously; only one coefficient was statistically significant at the criterion set (5%) and so confirmation is needed with a larger sample.

The implications of these findings are that users should not assume that ward scores emerging from the Monitor DGl, DG2 and DG4 schedules will correlate with Qualpacs. However, it is reasonable to make this assumption for the Monitor DG3 schedule and for Senior Monitor compared to Qualpacs.

Since we found a close association between Senior Monitor and Qualpacs, choice of either of these instruments might be preferable to Monitor for use with elderly patients in medical and surgical wards as well as in elderly care wards. Our Monitor-Senior Monitor comparisons included only elderly patients and so conclusions cannot be drawn for patients under 70 years of age. Senior Monitor would also be preferred over Monitor if the user wishes to include the three additional sections and if classification into dependency groups is unnecessary. For younger patients in medical and surgical wards, users could select either Monitor or Qualpacs for those patients classified in dependency group 3. We have less confidence in recommending the DG1, DG2 and DG4 schedules of Monitor because of their lack of convergence with Qualpacs.

Acknowledgements-The study was funded by the Department of Health. We are grateful for the contribution to the fieldwork and analysis by other members of the research team: Deborah Tomalin, Sarah Oliver and Trevor Murrells. We are indebted to the hospital management for access and, above all, to the nurses and patients who participated in the study.

References

Ball, J. A., Goldstone, L. A. and Collier, M. M. (1984). Criteria for Care: the manual of the North West nurse staffing levels project. Newcastle upon Tyne Polytechnic Products Ltd, Newcastle upon Tyne.

Bartko, J. J. and Carpenter, W. T. (1976). On the methods and theory of reliability. J. Nervous Mental Dis. 163(5), 307-317.

Flanagan, J. (1954). The critical incident technique. Psychol. Bull. 51, 327-358.

Giovannetti, P. B., Kerr, J. C., Bay, K. and Buchan, J. (1986). Measuring Quality of Nursing Care: analysis of reliability and validity of selected instruments. Unpublished report, Faculty of Nursing, University of Alberta.

Goldstone, L. A. (1987a). Quality Counts: the Monitor experience. Newcastle upon Tyne Polytechnic Products, Newcastle upon Tyne.

Goldstone, L. A. (1987b). Monitor. In Nursing Quality Measurement (Pearson, A., Ed.). Wiley, Chichester.

Goldstone, L. A. and Maselino-Okai, C. V. (1986). Senior Monitor: an index of the quality of nursing care for senior citizens on hospital wards. Newcastle upon Tyne Polytechnic Products, Newcastle upon Tyne.

Goldstone, L. A., Ball, J. A. and Collier, M. (1983). Monitor: an index of the quality of nursing care for acute medical and surgical wards. Newcastle upon Tyne Polytechnic Products, Newcastle upon Tyne.

Harvey, G. (1990). Which Way to Quality? A study of the implementation of four quality assurance tools. Standards of Care Project Report, Royal College of Nursing, London.

Jelinek, R., Haussman, R. K. D., Hegyvary, S. T. and Newman, J. E. (1974). A Methodology for Monitoring Quality of Care. U.S. Department of Health, Education and Welfare, Bethesda, Maryland.

Kitson, A. L. (1991). Therapeutic Nursing and the Hospitalised Elderly. Scutari Press, London.

Kitson, A. L., Harvey, G. and Guzinska, M. (1988). Nursing Quality Assurance Directory, 2nd edition. RCN Standards of Care Project, Royal College of Nursing and King’s Fund, London.

Norman, I. J., Redfern, S. J., Tomalin, D. A. and Oliver, S. (1992a). Applying triangulation to the assessment of quality of nursing. Nursing Times Occasional Paper 88(S), 43-46.

Norman, I. J., Redfern, S. J., Tomalin, D. A. and Oliver, S. (1992b). Developing Flanagan’s critical incident technique to elicit indicators of high and low quality nursing care from patients and their nurses. J. Adv. Nurs. 17, 590-600.

Nunnally, J. C. (1978). Psychometric Theory. McGraw-Hill, New York.

Phaneuf, M. (1976). The Nursing Audit. Appleton-Century-Crofts, New York.

Redfern, S. J., Norman, I. J., Tomalin, D. A. and Oliver, S. (1993). Assessing quality of nursing care. Qual. Hlth Care 2, 124-128.

Tomalin, D. A., Redfern, S. J., Norman, I. J. and Oliver, S. (1992). Monitor and Senior Monitor: problems of administration and some proposed solutions. J. Adv. Nurs. 17(1), 72-82.

Van Maanen, H. M. (1981). Improvement of quality of nursing care: a goal to challenge in the eighties. J. Adv. Nurs. 6, 3-9.

Ventura, M. R. (1980). Correlation between the Quality of Patient Care Scale and the Phaneuf audit. Int. J. Nurs. Stud. 17, 155-162.

Ventura, M. R., Hageman, P. T., Slakter, M. J. and Fox, R. N. (1980). Inter-rater reliabilities for two measures of nursing care quality. Res. Nurs. Hlth 3, 25-32.

Ventura, M. R., Hageman, P. T., Slakter, M. J. and Fox, R. N. (1982). Correlations of two quality of nursing care measures. Res. Nurs. Hlth 5, 37-43.

Wandelt, M. A. and Stewart, D. S. (1975). The Slater Nursing Competencies Rating Scale. Appleton-Century-Crofts, New York.

Wandelt, M. and Ager, J. (1974). Quality Patient Care Scale. Appleton-Century-Crofts, New York.

(Received 25 January 1993; accepted for publication 14 October 1994)