
Teaching scientific measurement at university: understanding students’ ideas and laboratory curriculum reform

Abstract

This paper chronicles a major research and development project aimed at providing a theoretical basis for the construction and implementation of an introductory physics laboratory course. The character of the laboratory curriculum makes the findings applicable to laboratory courses at first year level in other science disciplines, such as biology and chemistry. The present paper details the development, validation and use of research instruments used to identify students’ decisions while making scientific measurements. It highlights the process used to infer the thinking behind these decisions. The resulting model of the point and set paradigms for students’ understandings of measurement is defined and tested. It describes how students’ views of measurement and uncertainty and the probabilistic approach to measurement have been embodied into a new laboratory programme. One set of studies reported here consists of surveys of students’ perceptions of measurement and uncertainty in scientific contexts. A second set of studies evaluates two different physics laboratory programmes, one using a traditional frequentist approach, the other a probabilistic approach. The studies made use of sets of diagnostic probes centred around scenarios of common measurement decisions. These, together with their coding schemes, are appended for use as diagnostic tools of students’ understanding of measurement and uncertainty.


Table of contents

Abstract
Introduction
Theoretical background
- Premise 1: the scientific approach to enquiry is a domain of knowledge
- Premise 2: undergraduate laboratory courses should improve students’ scientific approach to enquiry
- Premise 3: knowledge is constructed as a social activity
Literature review
Contextual background for the study programme
Methodology used for the studies
- Design of the probes
- The set of probes
- Administration of the probes
- Coding of probe responses
- Analysis of probe data
- Validation of probes and coding schemes
Study 1: Surveying novice university physics students’ understanding of measurement (Stage 1)
- Research questions
- Results:
  (i) Actions and reasoning for data collection
  (ii) Actions and reasoning for data collection and data set comparison
- Discussion
The point and set paradigms of measurement
Study 2: Surveying novice university physics students’ understanding of measurement (Stage 2)
- Research questions
- Results:
  (i) The use of a paradigm for data collection
  (ii) The use of a paradigm for data processing
  (iii) The use of a paradigm for data set comparison
- Discussion
Study 3: Evaluation of the original laboratory course
- Research questions
- Description of the GEPS physics laboratory course
- Evaluation method
- Results:
  (i) The use of a paradigm for data collection
  (ii) The use of a paradigm for data processing
  (iii) The use of a paradigm for data set comparison
- Discussion
Framework for establishing a new physics laboratory curriculum
The probabilistic framework for measurement and uncertainty
Description of the new (probabilistic) laboratory course
Study 4: Evaluation of the probabilistic laboratory course
- Research questions
- Evaluation method
- Results:
  (i) The use of a paradigm for data collection (a single reading)
  (ii) The use of a paradigm for data collection (ensemble of data)
  (iii) The use of a paradigm for data set comparison
  (iv) Understanding uncertainty
  (v) Comparison between the traditional and new laboratory courses
- Discussion
Conclusion
- Students’ perceptions of measurement and uncertainty
- Evaluation of introductory laboratory courses
Acknowledgements
References
Appendix 1: Probes about ideas about measurement and uncertainty
Appendix 2: Coding schemes for probes
Appendix 3: Summary of main ideas regarding probability density functions
Appendix 4: Summary of outputs from this research project (1998-2004)

Introduction

Since 1995, members of the science education research groups in the Department of Physics at the University of Cape Town (UCT), South Africa, and in the Department of Educational Studies at the University of York (UOY), United Kingdom, have been engaged in a research programme that has sought to explore and interpret undergraduate students’ understandings of measurement. The aim of the research has been to provide a theoretical basis for the construction and implementation of an introductory physics laboratory curriculum. This sets out to develop both students’ abilities to undertake the procedures of scientific measurement and data analysis, and their understanding of the nature of measurement and uncertainty. The nature of introductory laboratory curricula in university science courses makes the findings of the present research applicable to the development of curricula for laboratory courses in other science disciplines, such as biology and chemistry.

This publication tracks the development of our research programme, describes the research approaches adopted, the outcomes identified and the actions taken. One of our aims in writing this account is to draw attention to the findings of the research and to indicate how the physics laboratory curriculum at UCT has evolved to meet the needs of its students, while also incorporating recent changes in the international conventions for the reporting of scientific measurement.

The present paper details:

(i) the development, validation and use of research instruments used to identify students’ decisions while making scientific measurements;

(ii) the process used to infer the thinking behind these decisions;

(iii) the evolution of our theories about students’ understandings of measurement; and

(iv) how we have embodied the outcomes of our research into a new laboratory programme.

One set of studies reported here consists of surveys of students’ perceptions of measurement and uncertainty in scientific contexts. A second set of studies evaluates two different physics laboratory programmes. Furthermore, we have included our research instruments in full, not only to increase the transparency of our work, but also to enable others to develop their own research projects in their own contexts. If readers decide to use or modify our instruments, then we would very much like to hear about their research findings.

Theoretical background

In this section three premises underpinning our research are set out. The subsequent literature survey includes a review of students’ problems in dealing with multiple and single measurements, models for levels of understanding of measurement and the influence of the epistemology of the nature of science on views of scientific measurement.


Premise 1: The scientific approach to enquiry is a domain of knowledge

Our first premise is that practical laboratory work in science, and thus in physics, may contribute to three aspects of knowing science, i.e. various forms of practical work can help learning science, doing science and learning about science (Hodson, 1998). In this context learning science refers to the improved understanding of science concepts, phenomena and laws, often called declarative knowledge (e.g. Black, 1993). Doing science refers to developing procedures used in science laboratory practical activities. Learning about science includes helping to increase understanding of what is meant by a scientific approach to inquiry. Distinct from declarative knowledge, some literature defines this as ‘procedural knowledge’ (Millar et al., 1994). Our view is that procedural knowledge is a distinct domain of knowledge to be learned, rather than a collection of skills to be practised. It should be noted that this definition of ‘procedural knowledge’ in the context of experimental work in science is distinct from its use to describe students’ abilities to apply algorithmic procedures when solving written problems (see for instance, Larkin and Reif (1989) and Chi et al. (1981)).

From their study of British secondary school students, Millar et al. (1994) recognise three areas of procedural understanding. Firstly, they identify a hierarchy of ‘frames’ for doing experimental work, i.e. students’ perceptions of the purpose of practical experimentation. This refines the dichotomy of the engineering versus the science model of experimentation (Schauble et al., 1991). Secondly, Millar et al. suggest that the skill of students in manipulating experimental apparatus determines the nature of the action taken. Finally, they argue that these actions are critically influenced by students’ understanding of ‘concepts of evidence’ (Gott and Duggan, 1996). This understanding allows one to judge the quality of experimental results and, ultimately, informs claims as to whether or not the results constitute believable new knowledge. Gott and Duggan (1996) suggest that students’ perceptions of the validity and reliability of an experimental procedure influence the different stages of a practical investigation (e.g. inclusion of a control, the choice of a sample size), the ways the data are collected (e.g. varying one variable at a time, taking repeat measurements), reported (e.g. in graphs or tables) and interpreted (e.g. notions about spread of results). In terms of undergraduates’ learning about science, we note that for a science knowledge claim generated from experimental measurements to pass from the personal domain to the realm of shared scientific knowledge, the quality of the claim, i.e. the reliability and validity of the consolidated result, has to be considered and communicated (McGinn and Roth, 1999). The unambiguous communication of measurement results and the comparison of personal findings with other measurements, or with theory, are important elements of science laboratory work, both of which need to be explicitly developed in the teaching laboratory.


Premise 2: Undergraduate laboratory courses should improve students’ scientific approach to enquiry

Our second premise is that undergraduate laboratory courses should aim at developing students’ knowledge of learning about science, i.e. at helping students understand what is meant by a scientific approach to enquiry. Most undergraduate physics laboratory courses lead students through a series of highly structured experiments intended to increase their understanding of concepts, laws and models (declarative knowledge) introduced in lectures (Meester and Maskill, 1995; Laws, 1996; Tiberghien et al., 2001). However, serious doubt has been expressed about the effectiveness of such hands-on experimental work for illustrating theory or phenomena (Roth et al., 1997; Kirchner and Huisman, 1998). Montes and Rockley (2002) report that teachers’ resistance to replacing traditional verification experiments by inquiry-type experiments is mostly based on expediency rather than pedagogy. Teachers see themselves as the main beneficiaries of selecting verification experiments since they are easy to prepare and assess, are adjustable for large groups, and have predictable and non-controversial outcomes.

Recently, Etkina et al. (2002) have suggested a ‘process approach’ to structuring laboratory courses around three types of experiments, each with their own specific purposes. Observational experiments illustrate new phenomena for which students then devise possible explanations, testing experiments verify a prediction based on a previously developed tentative explanation of the same phenomenon, and application experiments use an explanation for one phenomenon to predict another. However, even with this ‘process approach’ the emphasis remains on concept and model development through the laboratory experiences. In contrast, we agree with Osborne (1996) that the purposes of hands-on experimental work should be more strongly focused on developing a scientific approach to scientific enquiry. In terms of the ‘map’ of practical tasks constructed by Millar et al. (1999), learning outcomes in this area may concentrate on students’ ability to: (a) set up a standard piece of apparatus and carry out a standard procedure; (b) plan an investigation to address a given question; (c) collect, process and compare data; (d) use data to support a conclusion; and (e) communicate the results of experimental work. In particular, different understandings of the concept of validity underlie many decisions made during designing and planning experiments (learning outcome (b) above). On the other hand, different understandings of the concept of reliability inform decisions made during data manipulation (learning outcome (c) above). Within the latter, our studies have specifically explored students’ understanding of measurement and uncertainty, and thus deal with the investigative stages of data collection, data presentation and data comparison.


Premise 3: Knowledge is constructed as a social activity

Our third premise underpinning the present research programme is that knowledge only has meaning within a socially defined context (Vygotsky, 1978). This view of knowledge has influenced our development and use of experimental scenario-based tasks set in specific (everyday, technological or scientific) contexts (Roth and Roychoudhury, 1993), which have formed the basis of laboratory learning activities and research instruments. Each experimental task has a specified audience to whom the outcomes of the task are reported, thus emphasising the need to provide persuasive arguments based on the experimental data (Bartholomew et al., 2003). This approach is also supportive of students’ acculturation into the form of scientific discourse associated with experimentation (Kuhn, 1970). This can be particularly important in contexts within which the students have had little or no meaningful experience with practical work and can thus be considered virtual outsiders to the scientific discourse (Lemke, 1997).

Vygotsky’s view of the social nature of knowledge has profound consequences for knowledge acquisition through experimental work. If learning is regarded to result from a simultaneous process of group (inter-mental) and individual (intra-mental) sense making, then the tasks set for students to complete in the laboratory need to acknowledge this. For example, in our studies we take students’ existing knowledge of measurement and uncertainty as the point of departure for carefully structured interventions. Although over the last three decades a large body of literature (summarised by Pfundt and Duit, 1994) has emerged which describes the declarative knowledge held by science students (for undergraduate level see, for example, McDermott and Schaffer, 1992; Tornkvist et al., 1993; Halloun and Hestenes, 1985), significantly less has been reported on students’ procedural knowledge (Roth and Roychoudhury, 1993; Germann and Aram, 1996).

More recently an alternative explanation of students’ existing knowledge has been proposed within the social constructivist view of learning, which emerged from a phenomenological perspective. An explanation for existing, and often persistent, student conceptions is thought to have a basis in more fundamental primitives, i.e. ideas independent of specific science concepts. DiSessa (1993) identified an initial 29 phenomenological primitives (or p-prims) that he describes as follows:

They are phenomenological in the sense that they often originate in nearly superficial interpretations of experienced reality. [ ] They are ready schemata in terms of which one sees and explains the world. There are also two senses of primitiveness involved. P-prims are often self-explanatory and are used as if they needed no justification. But also, primitive is meant to imply that these objects are primitive elements of cognitive mechanism - nearly minimal memory elements, evoked as a whole, and they are perhaps as atomic and isolated a mental structure as one can find. (diSessa, 1993, p. 112)


These p-prims have the characteristic that they may be valid as explanatory tools in some, but not in other, situations. Students’ alternative ideas in science are then seen as a result of an inappropriate application of a general, essentially correct, primitive, rather than as a result of a substitution of the scientifically accepted concept by an incorrect alternative conception. The use of p-prims (as opposed to alternative conceptions) for explaining students’ intuitive understandings of science concepts also has implications for teaching (diSessa and Sherin, 1998). Rather than carefully structuring conceptual conflict situations showing the limited power of a particular misconception in explaining a selected experience, a phenomenological teaching intervention identifies the p-prim, emphasises its usefulness in several situations but illuminates the reasons for its inapplicability for the concept under discussion. Several studies have used p-prims for surveying students’ intuitive understanding in areas of declarative knowledge (for instance, see Wittmann (2002) and Wittmann et al. (2003) for physics concepts, and Southerland et al. (2001) for concepts in biology). Although Lippmann (2003) reports on an evaluation of a course in measurement and uncertainty based on the phenomenological approach, we have yet to locate reports of studies using the p-prim approach to investigate students’ intuitive understanding of the scientific approach to inquiry, including their understanding of measurement and uncertainty.

Literature review

There has been a surprisingly small number of studies on the understanding of measurement and uncertainty of undergraduate science students. Séré et al. (1993) reported that French physics students after a practical course were, in general, proficient in applying certain algorithms, such as calculating means, standard deviations and confidence intervals, but showed little underlying understanding. Students repeated measurements mainly when they believed that they had reasons to distrust the first measurement. Even after repeating measurements, students preferred the first or a recurring measurement to represent their final result. Séré and her colleagues observed that students were very loose in their use of terms such as “precision”, “accuracy” and “systematic and random errors”. Garrett et al. (2000) report the same for British first year chemistry students.
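For readers who want the algorithms spelled out, the standard frequentist formulas referred to above are restated here for convenience (our summary; the notation is not Séré et al.’s). For N repeated readings x_1, ..., x_N:

\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i , \qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2} , \qquad
u(\bar{x}) = \frac{s}{\sqrt{N}} ,
\]

where \(\bar{x}\) is the mean, \(s\) the sample standard deviation and \(u(\bar{x})\) the standard uncertainty of the mean, so that an approximate 68% confidence interval for the result is \(\bar{x} \pm u(\bar{x})\).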

To remedy confusions about terminology (and underlying concepts), Tomlinson et al. (2001) suggest that students should be required to make explicit use of a well-defined set of key words in their practical reports. As Thomson (1997) highlights, however, such terminology is not used consistently even in physics publications. In this context it is interesting to note that the International Organisation for Standardisation has expressed concern about the term ‘precision’ for descriptions of scientific measuring instruments because of its many confusing everyday connotations (Giordano, 1997). Séré et al. (1993) concluded that even the correct use of statistical procedures by students seldom indicates an appreciation of the purposes behind such procedures, or an understanding of how to assess the reliability of data.


Masnick and Morris (2002) surveyed the way in which the comparison of two data sets is influenced by the characteristics of the sets. In interviews with individual students, they presented tables of data related to the achievement of two athletes. They varied the data sets systematically in size (from one to six data points), the frequency of overlapping data points (from zero to two) and the variability, or range, relative to the mean. American undergraduates were asked what conclusions they could draw from the information, the reasons for these conclusions and how certain they were. They were also asked to predict the next data point for each athlete, and how certain they were about the difference between the two predicted values. The results indicate that judgements were highly sensitive to sample size (for a larger sample size students were significantly more certain of their conclusions and predictions), and to the number of overlapping data points (fewer overlapping data points resulted in a significantly higher certainty of difference between the athletes’ performances). Apart from sample size and overlapping data points, conclusions were based on criteria related to comparison between data points (as frequency, or proportion) and the means of the sets of data points. Only a small minority of students suggested being influenced by variability or outliers within the data sets, or by characteristics of the experimenter, or the apparatus.
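As an illustration of the set characteristics Masnick and Morris varied, the Python sketch below (ours, with invented data; it is not their instrument) summarises two data sets by the criteria their interviewees reportedly attended to: sample size, means, ranges and the number of overlapping data points.

    def describe_sets(a, b):
        """Summarise two data sets by size, mean, range and overlap."""
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        lo_b, hi_b = min(b), max(b)
        # Count points of set a that fall inside the range of set b.
        overlapping = sum(lo_b <= x <= hi_b for x in a)
        return {
            "sizes": (len(a), len(b)),
            "means": (round(mean_a, 1), round(mean_b, 1)),
            "ranges": ((min(a), max(a)), (lo_b, hi_b)),
            "overlapping points": overlapping,
        }

    # Two hypothetical athletes' jump distances in centimetres.
    print(describe_sets([512, 530, 498, 520], [540, 555, 528, 547]))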

Vellom and Anderson (1999) studied the strategies used by American sixth grade students to persuade their peers to accept their experimental findings. They reported that the students used a wide variety of approaches including appeals to social or academic status and the restatement of their views with increased vehemence. The students’ attempts to reach consensus led to a focus on the nature of the experimentation. Discussions of experimental techniques and the need to be able to replicate data were seen to be important. It was concluded that even young children acting as ‘a community of validators’ (Cobb and Bauersfeld, 1995) use aspects of a scientific approach to enquiry to persuade their peers rather than invoke the authority of the teacher.

In a French study of more advanced pupils (14-17 years), Coelho and Séré (1998) describe students’ search for the ‘true value’ of a quantity, and their dissatisfaction with the inconsistency of their measurements. The authors see such a view of measurement as reflecting a ‘spontaneous deep realism’ and indicate that this notion can be either an advantage or an obstacle, depending on the nature of the teaching and learning activities being presented to the students. Thus, a student could develop a view in which uncertainty is recognised as inherent to all measurement, or consolidate a view that uncertainties can be eliminated entirely by good experimental techniques and sound equipment. Unfortunately, the nature of instruction in traditional laboratory courses can, in fact, lead to the latter outcome. For instance, Fairbrother and Hackling (1997) claim that the closed nature of many laboratory tasks stems from the epistemological view of science as a body of facts to be catalogued. Such closed tasks reinforce students’ expectation of the existence of a ‘right answer’ to any experimental problem. If students obtain inconsistent measurements or a different answer to the one they expect, then they think that they have made an error. If the idea of uncertainty is not appreciated, then “errors” in their measurement are seen as being able to be eliminated completely.


Lubben and Millar (1996) suggested a model (see Table 1) for the progression of types of student ideas about measurement. They focussed mainly on perceptions about repeated readings.

Table 1: Model of progression of ideas concerning experimental data. (Adapted from Lubben and Millar (1996)).

Level Students’ view of the process of measuring

A Measure once and this is the right value.

B Unless you get a value different from what you expect, a measurement is correct.

C Make a few trial measurements for practice, then take the measurement you want.

D Repeat measurements till you get a recurring value. This is the correct measurement.

E You need to take a mean of different measurements. Slightly vary the conditions to avoid getting the same results.

F Take a mean of several measurements to take care of variation due to inaccurate measuring. Quality of the result can be judged only by authority source.

G Take a mean of several measurements. The spread of all the measurements indicates the quality of the result.

H The consistency of the set of measurements can be judged and anomalous measurements need to be rejected before taking a mean.

This model was based on a series of pencil and paper exercises undertaken by English secondary school and pre-university students. The study collected data on measurement actions students would take in a large number of experimental scenarios. Responses from the entire set of scenarios were used to generate the eight progressive levels shown in Table 1. The authors emphasise that progression through the levels results from the logic of the procedural concept (measurement) and does not reflect students’ progressive learning paths. However, the model provides a tool for classifying measurement actions in terms of the underlying measurement ideas.

Evangelinos et al. (1998) studied undergraduate physics students’ handling of experimental measurement, especially their perceptions of single readings. They report that their students use repeated measurements to validate a first measurement, and that repeats are regarded as unnecessary when the students use what they consider to be a high precision laboratory instrument. Their understanding of precision was linked to a view that the readings on instruments are exact facts and that precision is associated with either the existence, or the lack, of many digits on the display. Evangelinos et al. found that the students they studied had deeply rooted views about exactness and precision that acted as barriers to their acceptance of uncertainty as an intrinsic property of scientific measurements. Even after instruction, many students would retain the view that a single measurement taken with a laboratory instrument could give the true value of a measurand.

More recently Evangelinos et al. (2002) have reported on an intervention study using the probabilistic approach to measurement with first year university students in Greece. They categorised their students as being “exact”, “approximate” or “interval” reasoners with regard to their views on the relationship between theory (the variable to be measured, i.e. the measurand) and the datum (the reading). The authors found that a majority of students adhered to the notion of a ‘good’ single measurement representing an exact value. In addition, a large proportion of these students considered that since an ideal measurement is unobtainable, a single measurement needs to be reported as an approximate value. Only if the measurement is considered really ‘bad’ will it be reported as an interval. Results from their study suggest that the intervention helped students understand the fundamental difference between an exact and an uncertain quantity, and apply concepts of uncertainty and probability to single measurements.
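To make the contrast between an exact value and an interval concrete, consider a single reading treated probabilistically (a worked example of ours following the standard ISO GUM Type B treatment, not Evangelinos et al.’s notation). A metre stick read to the nearest 1 mm gives d = 436 mm; the reading is modelled as a rectangular probability distribution of half-width a = 0.5 mm, so that

\[
u(d) = \frac{a}{\sqrt{3}} = \frac{0.5\ \mathrm{mm}}{\sqrt{3}} \approx 0.3\ \mathrm{mm},
\]

and the single measurement is reported as the interval d = (436.0 ± 0.3) mm rather than as an exact number.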

Lippmann (2003) reports an evaluation of an intervention, the Scientific Community Laboratory (SCL), for teaching measurement and uncertainty to physics undergraduates in the USA. The SCL approach uses the notion of p-prims for dealing with data as a resource that students bring to the laboratory. The intervention draws on students’ everyday skills of argument and decision making for data collection and interpretation. Through the design of the laboratory tasks the SCL explicitly creates measurement ‘frames’ (mind-sets) illustrating the usefulness of these everyday skills when dealing with measurements in the laboratory. The results indicated that after the intervention a large proportion of SCL students understand the use of intervals for comparing data sets.

Hammer (1994) has investigated the epistemological beliefs of a small group of undergraduate physics students by categorising their understanding of the nature of knowledge and learning under three headings. He identified beliefs about the structure of physics (isolated information or a coherent framework); beliefs about the content of physics (facts and formulae or concepts); and beliefs about learning physics (receiving information or reconstructing understandings), concluding that their beliefs affected their success in learning physics. Elby (2000) recognises that epistemological beliefs are important and affect a student’s mindsets, metacognitive practices and study habits. Knowing that traditional physics courses tend not to significantly change students’ epistemological beliefs, he describes an epistemologically focused physics education course designed to help students to develop a more sophisticated belief system. His aim was to move students from a position where they considered that common sense cannot be trusted in science classes to one in which they saw scientific thinking as a refinement of everyday thinking. Developing this line of research, Hammer and Elby (2003) review literature that evidences the importance of this epistemological component to students’ success in learning in introductory physics courses. They further illustrate how high school students have formed robust but counter-productive epistemological beliefs about science. Of significance to our studies is the finding that many students see science as a set of facts and thus, in making laboratory measurements, a failure to verify a particular fact implies experimenter error. This view is supported by the reports of Ryder and Leach (2000) and Leach et al. (2000) from a study of data interpretation by nearly 800 students in upper secondary school and universities in five European countries. Their results suggest that students tend to ignore the central role of theoretical models in their interpretation of data and use multiple forms of epistemological reasoning, which need to be considered when designing curricula.

Séré et al. (2001) have investigated the nature and status of understanding of measurement held by upper secondary school and first year university students. They report on a diagnostic questionnaire study of about 400 students in France and Spain that explored students’ reasoning about handling sets of experimental measurements. They were interested in eliciting the knowledge that students use to inform their actions in laboratory work, and thus to what extent students’ claims about data were warranted by their epistemological positions. They concluded that students’ decisions could not be attributed to a consistent epistemological position. This would seem to reinforce the finding of Leach et al. (2000) that students use more than one form of epistemological reasoning. They concluded that in laboratory work, an understanding of what entails a reliable measurement, decisions on how to measure, how to process measurements and how to interpret the processed measurements to reach conclusions draw on different epistemologies in different contexts.

Contextual background for the present studies

The work reported on here has focused on students entering first year science programmes at the University of Cape Town (UCT), and studying physics. Our central concern was initially with students registered for the General Entry to Programmes in Science (GEPS), formerly known as the Science Foundation Programme (SFP). GEPS is a structured 4-year BSc programme primarily targeted at educationally disadvantaged black students who do not secure sufficiently high scores on school examinations to gain direct entry to the BSc programme. Selection to GEPS is based on school performance, equity targets and a range of indicators used to judge potential to succeed. Nearly all GEPS students come from schools previously managed by the (now disbanded) Department of Education and Training (DET) in South Africa. GEPS students typically do not speak English as a first language and tend to come from socio-economically disadvantaged backgrounds. Although our research started with GEPS students it has expanded to include mainstream or “direct entry” physics students (DES) following a 3-year BSc degree programme. Such students could be expected to have enjoyed better schooling than the GEPS students, to have had experience of practical work (including measurement) in school science, and to have performed well in school certificate examinations.


Fundamental to the physics element of the GEPS and DES curricula is laboratory practical work. Learning activities and resources presented in the early stages of the GEPS curriculum (see Allie and Buffler, 1998) were designed specifically to equip students with the practical tools, skills and procedures considered to be required to maximize the chance of success in their course. One afternoon per week is devoted to practical experimental work in the laboratory, during which time students work in groups of two or three to carry out measurements of various types with an assortment of equipment. It was our experience that, although the laboratory course was structured and the central ideas of measurement and data analysis were explicitly introduced, many students found considerable difficulty on encountering these concepts. Although these difficulties appeared to diminish as students became more adept at applying the formal rules they were given, we could not assume that they had gained any depth of understanding of the fundamentals of measurement. It was from this background and concern that our research has arisen.

The education system in South Africa is still evolving. We must recognise, however, that many students who study science in higher education are likely to encounter practical work for the first time on entering the first year laboratory. The TIMSS research project (see, for example, Howie and Hughes, 1998) has reported the declarative understanding of science of Grade 12 South African students, and these findings may serve as a point of departure for the design of more advanced programmes. However, these studies have not been concerned with the practical abilities, or procedural understanding, of students. Although our studies have focused on students at UCT, we see these as representative of university entrants more generally, of whose knowledge of scientific measurement we know very little.

Methodology used for the studies

Data for these studies were obtained from two sources: written questionnaires (or “probes”) and student interviews.

Design of the probes

Our probes focus on the decisions made by students during different phases of experimental laboratory work. A starting point for the design of our probes was the set of instruments developed for the PACKS project (Lubben and Millar, 1996), which were designed for UK school pupils aged 11-15 and hence were seen as unsuitable to be used directly with university students in South Africa. Therefore, although the PACKS instruments provided many useful ideas, most of our probes were developed specifically for the present studies. We recognised that decisions about measurement are potentially difficult to explore through written probes since respondents often have difficulty in visualizing ‘thought experiments’. Furthermore, it was considered that a range of contexts, such as used in the PACKS probes, might have led to confusion in our situation. In order to minimise these potential problems, an easily recognisable context associated with the physics laboratory was chosen to place the students in a scientific frame of mind. All the probes we developed were related to the same experimental context: a ball rolling down a slope fixed at the edge of a bench. This was chosen for its simplicity of description and because it was considered unlikely that students would have encountered this experiment before. This context is shown in Figure 1.

An experiment is being performed by students in the Physics Laboratory. A wooden slope is clamped near the edge of a table. A ball is released from a height h above the table as shown in the diagram. The ball leaves the slope horizontally and lands on the floor a distance d from the edge of the table. Special paper is placed on the floor on which the ball makes a small mark when it lands. The students have been asked to investigate how the distance d on the floor changes when the height h is varied. A metre stick is used to measure d and h.

Figure 1: The experimental context used for the probes.

All the probes were of the same style and related to the experimental context shown in Figure 1. In all our studies probes were consciously sequenced for completion in the order of decisions to be made in experimental work. Figure 2 shows an example of a probe. Each probe presents a brief text stem which introduces a practical laboratory situation requiring a decision to be made. The cartoon characters depicted in the probe offer a number of alternative actions. The student is asked to select an action, and the reason for choosing that action is requested in open-form written prose. It may be argued that the choices provided in the probes are unnecessarily pre-determined and that recognition, rather than understanding, is thus being tested. However, the options were presented in such a way that any one of the suggested actions could have been chosen for a variety of reasons. Furthermore, most probes also provided for the student to suggest his or her own alternative to the variety of actions presented.


The students work in groups on the experiment. Their first task is to determine d when h = 400 mm. One group releases the ball down the slope at a height h = 400 mm and, using a metre stick, they measure d to be 436 mm.

The following discussion then takes place between the students.

A: “I think we should roll the ball a few more times from the same height and measure d each time.”

B: “Why? We’ve got the result already. We do not need to do any more rolling.”

C: “I think we should roll the ball down the slope just one more time from the same height.”

With whom do you most closely agree? (Circle ONE): A B C

Explain your choice.

Figure 2: The RD (“Repeating Distance”) probe.


Since many of the students in the present studies did not speak English as a first language, particular care was taken in the choice of the language structure and vocabulary used in the probes. Consequently, a terse writing style was adopted. In addition, cartoon characters were used in order to mask ethnicity and eliminate cultural and gender differentiation. We also used the letters A, B, C, etc. instead of real names to identify the cartoon characters. It was felt that, in a country characterised by a history of racial segregation, cultural and gender bias, life-like pictures (‘talking heads’) with real names could influence responses to the probes. We based our decision to use nameless cartoon characters on students’ views sought both in written feedback and through interviews. Students were presented with ‘talking heads’ with African names and with cartoons identified with letters as ways of identifying alternative responses in probes. An overwhelming majority of interviewees had a clear preference for cartoons with letters and regarded the cartoons as gender neutral and race free. These interviews also confirmed that the text used was accessible linguistically to the vast majority of target students.

The set of probes developed

We have developed and validated a range of probes for use in our investigations. Each has been targeted at a particular aspect of measurement and seeks to determine students’ decisions and illuminate their reasoning. Table 2 lists and describes each probe. Four probes (SDR, RD, RDA and RT) are concerned with collecting data, three probes (UR, AN and SLG) are concerned with processing data, four probes (SMDS, DMSS, DMOS and DMSU) are concerned with comparing measurements, and two probes (NU1 and NU2) focus on views of uncertainty. The full probes are presented in Appendix 1.

Table 2: The probes used in studies.

Probe code   Name of probe                         Aspect of measurement
SDR          “Single Distance Reading”             Data collection
RD           “Repeating Distance”                  Data collection
RDA          “Repeating Distance Again”            Data collection
RT           “Repeating Time”                      Data collection
UR           “Using Repeats”                       Data processing
AN           “Anomaly”                             Data processing
SLG          “Straight Line Graph”                 Data processing
SMDS         “Same Mean Different Spread”          Comparison of results
DMSS         “Different Mean Similar Spread”       Comparison of results
DMOS         “Different Mean Overlapping Spread”   Comparison of results
DMSU         “Different Mean Same Uncertainty”     Comparison of results
NU1          “No Uncertainty 1”                    Views about uncertainty
NU2          “No Uncertainty 2”                    Views about uncertainty


Administration of probes

Our protocol for administering the probes requires each probe to be answered individually, in strict sequence and under examination conditions. For all of our studies an envelope containing the particular set of probes to be answered is handed out to each student. Each probe is printed on a separate sheet of paper and each page, together with the envelope, is stamped with a unique number to identify the set. The relevant instructions and the experimental context, together with a diagram of the apparatus (see Appendix 1), are printed on the front of the envelope. Before starting, it is emphasized to students that they are not writing a test, that there are no right or wrong answers and that their explanations for decisions are of paramount importance. The students are instructed to answer the questions strictly in the sequence given and not to look ahead to future probes or turn back to previous pages. The students are also told to place each completed probe inside their envelope and not to take it out again, even if they want to change an answer. They are told that the last sheet in the pack of probes will give them an opportunity to note any changes they wish to make to a previous answer.

A large-scale version of the apparatus (a wooden ramp and a tennis ball) is used to demonstrate the ‘experiment’ before the probes are answered. The text as given in Figure 1 above is read out twice while the ball is released from two different positions on the slope. Care is taken in this presentation not to provide any clues for answering the probes. Although students were not compelled to take part in our various studies, very few chose not to participate and most regarded the experience as a worthwhile learning activity. We found that our students on average take about 45 minutes to answer 10 probes and to provide requested biographical data. Generally fewer than 10% of the students make use of the opportunity to indicate changes to answers.

Coding of probe responses

Coding of students’ responses was based on the choice of action (A, B, C, etc.) together with the explanation for their actions. Categories of responses to individual probes were developed following the systematic consideration of individual responses using the Grounded Theory method (Strauss and Corbin, 1990). Research team members looked at a number of scripts independently to identify different categories of reasoning. Descriptors for each category were clarified, refined and amended. Where necessary, categories were subdivided and delineated to make them mutually exclusive. This resulted in a draft coding scheme with high validity that was then used to independently code sets of students’ responses. These were then compared and the coding scheme further refined as necessary into a valid scheme. This alphanumeric coding scheme makes use of codes having a letter (indicating the choice of action) and two digits. The first digit is associated with a major category of reasoning, while the second digit allows a sub-category. Responses that were impossible to interpret were recorded as ‘not codeable’. The reliability of coding was verified by having at least two researchers working independently to code responses and resolving any differences through discussion informed by inspection of the responses from an individual student across clusters of related probes. The coding schemes developed for each of the probes used in our studies are provided in Appendix 2.
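The structure of these codes is simple enough to state programmatically. The Python sketch below (ours; the actual schemes are given in Appendix 2) checks that a code has the documented letter-plus-two-digits form and splits it into its components.

    import re

    # One action letter, then a major reasoning category and a sub-category digit.
    CODE_PATTERN = re.compile(r"^([A-Z])(\d)(\d)$")

    def parse_code(code):
        """Split a response code such as 'A30'; return None for 'not codeable'."""
        match = CODE_PATTERN.match(code)
        if match is None:
            return None
        action, major, sub = match.groups()
        return {"action": action, "major": int(major), "sub": int(sub)}

    print(parse_code("A30"))  # {'action': 'A', 'major': 3, 'sub': 0}
    print(parse_code("?"))    # None, i.e. 'not codeable'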

Analysis of probe data

The frequencies of responses for each probe were tallied and clusters of responses showing similar types of reasoning were identified, e.g. the codes A30, B30 and C30 would indicate a different action, but all resulting from the same idea about measurement. This enabled the underlying reasoning to be identified for each student across different measurement-related situations, such as data collection, data processing and data comparison. Relationships between the types of reasoning used in each of these broad areas, and others, were investigated in order to identify the main criteria used by students when making decisions at various stages of measurement and data analysis.
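The clustering step described above can likewise be sketched in a few lines (ours, with invented response codes): stripping the action letter leaves the two reasoning digits, so codes such as A30, B30 and C30 tally together.

    from collections import Counter

    # Invented response codes for illustration only.
    responses = ["A30", "B30", "C30", "A51", "B50", "C50", "B50"]

    # Group by the digit part: different actions, same underlying reasoning.
    by_reasoning = Counter(code[1:] for code in responses)
    print(by_reasoning)  # Counter({'30': 3, '50': 3, '51': 1})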

Validation of probes and coding scheme by interviews

After completion of the probes, samples of up to 30 volunteer students in each study were interviewed by one of the researchers for about 30 minutes each. These interviews allowed us to check students’ understanding of the questions and the interviewer’s interpretation of their responses. We were also able to confirm that the probes presented sufficient alternatives covering a wide enough range of possibilities in each case. In addition, the interviews allowed us to explore the reaction of the students to being presented with the probes. The overall impression was that the students felt that they had expressed themselves adequately when answering the probes. Only a small minority articulated a sense of frustration by claiming to have an explanation for a choice but not the words to express it. There was no indication that our interpretation of the students’ written responses was inconsistent with their ideas.

Studies completed

The results from four studies will be reported here. The first two studies were concerned with surveying novice university physics students’ understanding of measurement. The remaining two studies were concerned with the evaluation of two different physics laboratory programmes aimed at developing such understanding.


Study 1: Surveying novice university physics students’ understanding of measurement (Stage 1)

In this first study, partly reported in Allie et al. (1998), we posed the following three research questions:

• What do novice university physics students understand about measurement in a scientific context?

• How does this understanding differ in the three measurement-related situations of data collection, data processing and data comparison?

• How appropriate is it to apply the Lubben-Millar model of progression of students’ ideas about measurement (see Introduction) to novice university physics students in South Africa?

A set of six probes was administered to a sample of 121 GEPS physics students during the first year of study at university. Three probes were concerned with repeating measurements of time and distance. Since we anticipated that the order of the probes could affect the responses to the RT and RD probes, these probes were included in reversed order in half of the sets. The remaining three probes were concerned with how to deal with ensembles of data. The latter covered the issues of how to handle an anomalous reading, how to compare two sets of data having the same mean but different spreads, and how to compare two sets of data having a similar spread but different means.

Results

The results from the probes are presented in two sections below. The first section reports the responses to three probes (RT, RD and RDA) exploring students’ ideas about data collection. The second section reports the responses to three probes focusing on students’ ideas about data processing (AN) and data comparison (SMDS, DMSS). The quotes presented in support of the analysis are drawn from different students, with no student being quoted more than once.

(i) Actions and reasoning for data collection

The first of the three probes on repeating measurements dealt with the issue of repeating time measurements (RT):

The students are given a stopwatch and are asked to measure the time that the ball takes from the edge of the table to hitting the ground after being released at h = 400 mm. They discuss what to do.

A: “We can roll the ball once from h = 400 mm and measure the time. Once is enough.”


B: “Let’s roll the ball twice from height h = 400 mm, and measure the time for each case.”

C: “I think we should release the ball more than twice from h = 400 mm and measure the time in each case.”

The “Repeating Distance” (RD) probe used the same format and had the following text:

After measuring the time, the students now have to determine d when h = 400 mm. One group releases the ball down the slope at a height h = 400 mm and, using a metre stick, they measure d to be 436 mm.

The following discussion then takes place between the students.

A: “I think we should roll the ball a few more times from the same height and measure d each time.”

B: “Why? We’ve got the result already. We do not need to do any more rolling.”

C: “We must roll the ball down the slope just one more time from the same height.”

This was followed by the “Repeating Distance Again” (RDA) probe, in which two subsequent distance measurements provide different readings:

The group of students decide to release the ball again from h = 400 mm. This time they measure d = 426 mm.

First release: h = 400 mm, d = 436 mm
Second release: h = 400 mm, d = 426 mm

The following discussion then takes place between the students.

A: “We know enough. We don’t need to repeat the measurement again.”

B: “We need to release the ball just one more time.”

C: “Three releases are not enough. We must release the ball several more times.”

Independently of the option chosen for each probe, six main ideas about the purpose of repeating measurements arose in the responses to these three probes. Table 3 below shows the frequencies of these main ideas. These have been listed in order of least to most sophisticated.


Table 3: Summary of responses to the three probes on repeating measurements for time (RT), distance (RD) and distance again (RDA). Entries show numbers of students, with percentages in parentheses. (n = 121)

Category  Description                                               RT       RD       RDA
R1        No repeats are needed                                     0 (0)    9 (7)    2 (2)
R2        Repeats provide practice to improve the process
          of taking measurements                                   15 (12)  12 (10)   9 (7)
R3        Repeats are needed to find the recurring measurement      5 (4)   12 (10)   4 (3)
R4        Repeats are needed to improve the accuracy                8 (7)   10 (8)   28 (23)
R5        Repeats are needed for establishing a mean               77 (64)  60 (50)  61 (50)
R6        Repeats are needed for establishing a spread             14 (11)  11 (9)   11 (9)
R0        Not codeable                                              2 (2)    7 (6)    6 (5)

Students placed in category R1 did not see any purpose in repeating measurements. Typically, they argued that:

They don’t need to do any more rolling, because there is paper on the floor. The ball will make marks while hitting the floor. This is the distance they want. (RD response)

or that

If the same wooden slope is used the distance should be the same. Without friction the ball will land on top of the previous mark. (RDA response)

Responses in category R2 indicated that repeating was seen to be required in order to gain practice and thus perfect the individual measurement. Typically, these students claimed that:

By releasing the ball more than twice from h = 400 we can be more certain of our answer. If we release our ball maybe five times we can limit the chances of doing mistakes when using the stopwatch. (RT response)

About a third within this cluster saw perfecting the measuring technique as an introduction to calculating a mean. For example, one student suggested that:

They have to release the ball more than twice to ensure that the times that they are getting are consistent and accurate. Once they are sure of the time, they can take the mean of the values. (RT response)

Such an understanding is more advanced and links to category R5.


The responses in category R3 indicated that repeating measurements is needed in order to find a recurring value, which is then perceived as the correct reading. A typical response was:

If the measurements are taken several times, it will be evident if the measurements correspond. It will be of great advantage finally to get the same measurement for several attempts. (RDA response)

Category R4 consists of responses that made a very general reference to repeating in order to increase “accuracy”. One student in this group wrote that:

The larger the number of readings, the greater the accuracy of the times achieved for the experiment. (RT response)

and another

The more measurements you take the more you know how accurate you are. One or two measurements doesn’t tell you enough about the real time taken. (RDA response)

Almost all responses within this cluster referred to aiming for a single “real” or “true” value, indicating a lack of appreciation of the inherent variation in repeated experimental observations.

Included in category R5 are the responses that focused on repeating measurements in order to calculate a mean value. A large number of the students in this group indicated that taking the mean compensates for random errors in individual measurements. One explained that:

It is tricky to measure time accurately with a stopwatch, so I reckon that you should take more than 2 readings. More readings would eliminate human error in stopping and starting the stopwatch when the average is taken. (RT response)

and another that

The ball has to be rolled a few more times because there is always error in any experiment. The most accurate way of determining the precise measurement is to take the average of the values that came out of the experiment. (RD response)

About a third of this cluster explicitly stated that the mean value will be close to the true value. In contrast, the more sophisticated thinkers within this category appreciated that an increase in the number of measurements will increase the reliability of the mean. For example, one student wrote that:

It is better to obtain more results in order to have a more accurate and meaningful mean. (RT response)


Category R6 comprises those responses that suggested repeating is needed to reduce the uncertainty in the measurement. A characteristic response was:

In order to be more precise, that is reduce the uncertainty, we have to take as many readings as possible. (RDA response)

More than half of the responses in this cluster also mentioned calculating a mean. For example, one response was:

For any measurement in physics there will be systematic errors. Hence the value of time in each case will differ. So they will need to find the average time. Then there will be the uncertainty associated with that average of time. (RT response)
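A short calculation makes the reasoning behind the R6 responses concrete: for a set of repeated readings, the standard uncertainty of the mean can be estimated as s/√N, where s is the sample standard deviation and N the number of readings, so the uncertainty shrinks as more readings are taken. The sketch below illustrates this with hypothetical stopwatch readings (both the data and the NumPy calculation are our illustration, not part of the study):

```python
import numpy as np

# Hypothetical stopwatch readings (in seconds) for the ball leaving the table.
times = np.array([0.28, 0.31, 0.27, 0.30, 0.29, 0.32, 0.28, 0.30])

for n in (2, 4, 8):
    sample = times[:n]
    s = sample.std(ddof=1)       # sample standard deviation (spread of readings)
    u_mean = s / np.sqrt(n)      # standard uncertainty of the mean
    print(f"N = {n}: mean = {sample.mean():.3f} s, "
          f"s = {s:.3f} s, u(mean) = {u_mean:.3f} s")
```

Note that the spread s itself does not systematically decrease with N; it is the uncertainty of the mean that improves, which is the distinction the R6 responses gesture towards.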

Although the analysis and classification of the responses for each probe provide an overview of the ideas being used by the total sample of students, it is useful to look at the sets of responses of individual students. This establishes the consistency of the use of these different types of reasoning about repeating measurements. Four types of reasoning were identified. A small cluster of respondents (7%), the ‘non-repeaters’, did not see a purpose in repeating distance measurements at all due to the static nature of the measuring points. On the other hand, all of these ‘non-repeaters’ reasoned that several time measurements needed to be taken. They also stated that the mean time had to be calculated with the specific purpose of compensating for reading errors in order to approach the ‘true’ value for the time. A second small cluster (8%), here called ‘perfecters’, reckoned that repeats of time and distance measurements are needed to practice and perfect the experimental procedure (R2). Confronted with different repeated measurements in the RDA probe, half of this cluster suggested continuing to repeat and to calculate the mean. A third small cluster of students (10%) wanted to repeat distance measurements in order to find a recurring value. Half of these ‘confirmers’ persisted in this view when presented with two different distance readings. The fourth and largest cluster (58%) can be considered to be consistent ‘mean reasoners’, who gave R5 responses (repeating in order to establish a mean) to either two or all three probes. Within this cluster, 7% of the sample mentioned the calculation of a spread, or uncertainty, as a reason for repeating. This sub-set may be termed ‘spontaneous spread reasoners’. The analysis of the responses to the three probes below provides further insight into understanding the large cluster of ‘mean reasoners’.

(ii) Actions and reasoning for data processing and data set comparison

The first probe in this section dealt with how to handle an anomaly (AN):

A group of students have to calculate the average of their (distance) measurements after taking six readings. Their results are as follows (mm): 443, 422, 436, 588, 437, 429.


The students discuss what to write down for the average of the readings.

A: “All we need to do is to add all our measurements and then divide by 6.”

B: “No. We should ignore 588 mm, then add the rest and divide by 5.”

Table 4: Summary of responses to the AN probe. (n = 121)

Category  Description                                                    No. of students (%)
AN1       The anomaly must be included when taking an average
          since all readings must be used.                               37 (30)
AN2       The anomaly is noted, but it has to be included since
          it is part of the spread of results.                           14 (12)
AN3       The anomaly must be excluded as it is most likely a mistake.   30 (25)
AN4       The anomaly must be excluded as it is outside the
          acceptable range.                                              38 (31)
AN0       Not codeable                                                   2 (2)

It can be seen from Table 4 that 42% of the students chose to include the anomaly while 56% felt that the anomaly should be excluded from the data. The former group may be divided into two sub-groups categorised as AN1 and AN2, respectively, with about three times as many students falling into the former grouping. In the AN1 category the procedure for taking the average is the dominant consideration and this allows no freedom for judging the data. Typically, these students argued that:

This is a correct method of finding the average. (AN response)

or that

One cannot choose to ignore certain results: all results must be used. (AN response)

The smaller sub-group (AN2) acknowledged that the reading of 588 mm was well outside the range defined by the other readings. However, this reading did not pose a problem to these students as it formed part of the spread. A characteristic argument for inclusion was that:

The value 588 mm shows how big the spread of the values are and should be used because that is what the group has measured and should form part of their results. (AN response)


With regard to the students who excluded the anomalous measurement, just under half (AN3) suggested that the anomaly should be ignored. One representative response noted:

They may have made a mistake while they were measuring it. (AN response)

A few of these students suggested that this measurement should be repeated. The remaining students (AN4) excluded the anomaly on the grounds that the point was outside an acceptable range or was not consistent with the other values. They claimed, for example, that:

All the measurements except 588 are in the range of 2 mm: 588 is out of this range by more than 140 mm. (AN response)
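The AN4 responses hint at a defensible statistical practice. One common rule of thumb (our illustration, not a procedure prescribed in this study) is to flag a reading that lies more than a few sample standard deviations away from the mean of the remaining readings:

```python
import numpy as np

readings = np.array([443, 422, 436, 588, 437, 429])   # the AN probe data (mm)

def flag_outliers(data, k=3.0):
    """Flag values more than k sample standard deviations away from
    the mean of the other values (a leave-one-out rule of thumb)."""
    data = np.asarray(data, dtype=float)
    flagged = []
    for i, x in enumerate(data):
        rest = np.delete(data, i)
        if abs(x - rest.mean()) > k * rest.std(ddof=1):
            flagged.append(x)
    return flagged

print(flag_outliers(readings))               # [588.0]
print(readings.mean())                       # ~459.2 mm with the anomaly included
print(readings[readings != 588].mean())      # 433.4 mm with it excluded
```

Whether a flagged reading is then discarded or re-measured remains a judgment call; the point is that the ‘acceptable range’ invoked by the AN4 students can be given a quantitative basis.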

The “Same Mean Different Spread” (SMDS) probe provided two sets of data that had the same mean but different dispersions. The intention of this probe was to establish how the quality of a data ensemble is characterised:

Two groups of students compare their results for a distance measurement.

Group A: 444 432 424 440 435   Average = 435 mm
Group B: 441 460 410 424 440   Average = 435 mm

A: “Our results are better. They are all between 424 mm and 444 mm. Yours are spread between 410 mm and 460 mm.”

B: “Our results are just as good as yours. Our average is the same as yours. We both got 435 mm for the distance.”

Table 5: Summary of responses to the SMDS probe. (n = 121)

Category  Description                                                    No. of students (%)
SMDS1     The results are equally good since the averages
          are identical.                                                 58 (48)
SMDS2     The results of group A are better since the data of
          group A are closer together than those of group B.             53 (44)
SMDS0     Not codeable                                                   10 (8)

With reference to Table 5, it can be seen that the students were divided approximately equally on whether the two sets of results were equally good (SMDS1) or whether group A had the better results (SMDS2). Students in the former group used the average as the only criterion to compare the two sets of data. Two types of response typify this category: firstly, those who simply mentioned the average without referring to the spread and stated


Because group B has the same average as group A. (SMDS response)

and those (about 60% of the SMDS1 group) who stated very clearly that

The spread of measurements has nothing to do with the average value. (SMDS response)

therefore implying that the spread was not a criterion to be used in making the necessary comparison. The students in the SMDS2 group concluded that the results of group A were better and appear to have used some notion of the spread in the data in reaching their conclusion. However, the large majority of the responses were not very clearly stated, with terms such as ‘uncertainty’ and ‘spread’ used loosely in the explanations, as illustrated by such statements as

The values [for] calculating final d must not be spread out too much. (SMDS response)

and

The uncertainty between readings obtained by group A is about 20 mm while that obtained by group B is about 50 mm. (SMDS response)

The overall pattern of the responses suggests that the students were not able to differentiate clearly between the overall spread of the data ensemble and the differences between the individual data points within the ensemble. Hardly any students invoked the former concept in an unambiguous way.

The final probe in this section was the “Different Mean Similar Spread” (DMSS) probe, which provided two sets of data that had a different mean but similar and overlapping dispersions:

Two groups of students compare their results for five releases of the ball at h = 400 mm.

Group A: 440 438 433 422 432   Average = 433 mm
Group B: 432 444 426 433 440   Average = 435 mm

A: “Our result agrees with yours.”

B: “No, your result does not agree with ours.”


Table 6: Summary of responses to the DMSS probe. (n = 121)

Category  Description                                                    No. of students (%)
DMSS1     It depends on how close the averages are.                      62 (52)
DMSS2     It depends solely on the relative spreads of the data.         4 (3)
DMSS3     It depends on the degree of correspondence between
          individual measurements in the two sets.                       10 (8)
DMSS4     It depends on both the averages and the uncertainties.         34 (28)
DMSS0     Not codeable                                                   11 (9)

By far the most prevalent idea (see Table 6) was to compare averages and then make a decision about whether the averages were ‘close’, ‘far’ or ‘consistent’ (DMSS1). About two thirds of the students in the DMSS1 grouping concluded that the two averages were consistent. A typical suggestion was that:

The averages might not be the same but they are only different by 2 mm which is a very small distance. (DMSS response)

The remaining third expressed the contrary view. One characteristic response was:

433 and 435 are totally different numbers. (DMSS response)

Another stated that:

The answers aren’t exactly the same are they! How can they agree with each other? (DMSS response)

The DMSS2 grouping comprised only four students, who used the relative spreads of the data as the basis for the comparison. For example, one stated that:

The results don’t agree since the uncertainty in group A will be greater than group B. (DMSS response)

A group comprising 8% of the students tried to come to a conclusion by comparing individual measurements between the two sets of data (DMSS3), typically reasoning that:

The values for the two groups match almost exactly. (DMSS response)

The most sophisticated reasoning was evidenced by about a third of the students (DMSS4) who used the notion of uncertainty or spread in conjunction with the average to come to a conclusion. This group had some difficulty in expressing their ideas. One wrote:

If we find the uncertainties in A and B the average of A will most likely fall in the range of B(av)±B and the same will apply to the average of B to A(av)±A. (DMSS response)

and another

With every average there should be a standard deviation and chances are both will be in the same range. (DMSS response)

Discussion

The probes used in the study proved to be useful tools to provide insights into novice university physics students’ understanding of measurement. Students reported that they were straightforward to complete, and coding of students’ responses, while time consuming, revealed significant differences in students’ decisions and reasoning.

When presented with the option of repeating a measurement, very few students opted not to do so. Significant minorities repeated in order to perfect a measurement or to confirm a measurement. However, although the reasons given for repeating in categories R2, R3 and R4 of the RT, RD and RDA probes (see Table 3) appear to be different, they all have one common feature: the data processing involves either comparisons between the individual readings and/or judgments about one of a set of readings. There is, therefore, no recognition that a data set should be viewed collectively as an ensemble and modelled by some characterising parameters. These students aimed to obtain the true value of the measurand and evidenced that they understood this to be a single data point. By contrast, at least half of the students opted to take further readings in order to calculate a mean. These students appear to have the understanding that the best value of the measurand in such cases is represented by the average of the data. However, more students chose to repeat in order to establish a mean for time (64%) than distance (50%) measurements, suggesting that these students are basing their decision to repeat on some other criterion, such as the perceived nature of time as a dynamic quantity or their understanding of the instrument being used. Only a small group of students made explicit reference to the dispersion in the observed data, and their reason for repeating was to establish the spread in the data. From this it appears that students’ decisions to repeat are based on an underlying concept of the measurand as either being able to be represented by a single ‘true value’ or by a ‘spread’ of values.

Further evidence that some students reason from the concept of a measurand being a spread comes from the responses to the DMSS probe, which focuses on using the spread around a mean to compare whether the measurements are consistent with each other. Here 30% of the total sample may be regarded as using ‘spread reasoning’. Comparing the responses for probes SMDS and DMSS provides indicators of the consistency (or fragility) of ‘spread reasoning’. These probes confront students with ‘spread thinking’, i.e. that the individual data points form an ensemble which can be represented by two theoretical constructs, namely, a mean and a ‘width’. (The AN probe can be interpreted as requiring a judgment about which data to use to form the ensemble.) Although it would appear from category SMDS2 of the SMDS probe that a good proportion of students might have understood the notion of spread, i.e. that the closer the data are to each other the ‘better’ the result for the measurand, it is clear from the responses to the DMSS probe that even when provided with a spread, only the average was recognised as representing the data, while the scatter was ignored and purely subjective notions of ‘closeness’ were employed. While about half of the respondents related the widths of the spreads to the accuracy of the data ensembles (SMDS), only about a third of this group applied the criterion of overlapping spreads to decide whether the data ensembles in the DMSS probe were consistent with each other. In addition, there was a group of students who used ‘spread reasoning’ when deciding to take repeat measurements to obtain a mean (the RT probe) but not for the DMSS probe. One may conclude that these students recognised that there are variations between the data points in the DMSS probe but did not synthesise this to formulate a notion of spread that they could use together with the mean to characterise the data ensemble. In summary, only some 15% of the total sample of these novice university physics students may therefore be regarded as using ‘spread reasoning’ in a consistent way. However, these ‘spread reasoners’ exhibited greater sophistication than allowed for in the Lubben-Millar model of progression of understanding about measurement (Table 1). This suggests that their model could be extended to include an additional higher-level category. We suggest that this category (I) caters for students who show understanding that consistency of data ensembles can be judged by comparing the relative positions of their means in conjunction with their spreads. Apart from a few instances where A, C and D level reasoning can be identified, although not consistently, most of the novice students may be classified as falling into levels F, G, H or I.

Although the students in the sample may be classified as advanced reasoners, their language usage was typically haphazard. Terms reflecting collected and computed data such as ‘measurement’, ‘calculation’, ‘result’ and ‘value’ were used interchangeably. There was also considerable confusion about terminology such as ‘spread’, ‘error’, ‘range’, ‘uncertainty’, ‘precision’ and ‘accuracy’. Although this may be attributed to linguistic difficulties, it appears more likely that this is related to the lack of understanding of the nature of measurement uncertainty in the minds of the students. For example, the vast majority of students argued that repeating is needed to limit the ‘random error’. At the same time, however, 51% of the responses (all of R2 and R3, and part of R4 and R5) indicated that repetition is required in order to get closer to the ‘real’ or ‘correct’ value for the time or distance measurement.


The point and set paradigms of measurement

While the extended Lubben-Millar model appears to be useful to describe the different levels of sophistication reached by students, it is a descriptive schema and does not look to explain student responses. Having established that students’ responses to our probes are related to their understanding of a measurement as either providing a single ‘true value’ or a ‘spread’ of values, we consider that students can be characterised as ‘point reasoners’ or ‘set reasoners’, and that each conforms to the strictures of one of the two paradigms described below.

The point paradigm (see Table 7) is characterized by the notion that each measurement yields either the correct (true) value or an incorrect value of the quantity being measured (the measurand). As a consequence each measurement is regarded as independent of the others, except to confirm or reject a specific value, and individual readings are not combined in any way. This way of thinking also manifests itself in the belief that only a single (very careful) measurement is required to establish the true value. If an ensemble of readings with dispersion does emerge, representations of the measurand are based on the individual data points only, such as, for example, the selection of a recurring value in the data set or a one-to-one comparison of data values between different data sets.

On the other hand, the set paradigm (Table 7) is characterised by the notion that each reading is an approximation of the measurand and that knowledge about the measurand can never be perfect in principle. The most information regarding the measurand is obtained by using all available data to construct distributions from which the best approximation of the measurand and an interval of uncertainty are derived. In nearly all practical situations in the introductory laboratory, the best approximation of the measurand will either be the reading itself (in the case of a single reading) or the calculated average value of a set of repeated readings.

Table 7: The point and set paradigms.

Point paradigm:
1. The measurement process allows you to determine the true value of the measurand.
2. “Errors” associated with the measurement process may be reduced to zero.
3. A single reading has the potential of being the true value.

Set paradigm:
1. The measurement process provides incomplete information about the measurand.
2. All measurements are subject to uncertainties that cannot be reduced to zero.
3. All available data are used to construct distributions from which the best approximation of the measurand and an interval of uncertainty are derived.


In summary, the key difference between the two paradigms is that students using the point paradigm draw conclusions about the measurand directly from individual data points, while those using the set paradigm draw conclusions about the measurand from the properties of the distribution constructed from the whole ensemble of available data. The point paradigm can be regarded as a local realistic way of viewing data, while the set paradigm uses theory to mediate between the data and the measurand.

Establishing a baseline of use of point and set paradigms by novice science students provides a potentially useful system to explain their measurement actions and reasoning. This was the focus of our second study, which is described below.

Study 2: Surveying novice university physics students’ understanding of measurement (Stage 2)

Stage 2 of the survey, partly reported in Lubben et al. (2001), intended to further explore students’ understanding of measurement by extending the study to students from a larger diversity of backgrounds. This survey also addressed the first two research questions as described above for Stage 1, while the third research question was replaced by the following:

■ To what extent is our model of point and set paradigms useful for interpreting students’ ideas about measurement and uncertainty?

Therefore, for the Stage 2 study described below, student responses were coded according to the definitions of the point and set paradigms.

The data for Study 2 were collected at the beginning of the academic year, before any instruction had taken place. A cohort of 257 first-time entering physics students completed seven probes according to the protocol described earlier. They were also asked to provide demographic data, including gender, home language and previous laboratory experience. The sample included a larger variety of achievement, language background and pre-university laboratory experience, as 83 students were registered for the mainstream or “direct entry” programme, and the remaining 174 students for GEPS. The probes covered three broad areas of measurement-related activities. Three probes investigated the understanding of measurement for the area of data collection (the RT, RD and RDA probes), two for data processing (the SLG and UR probes) and two for data comparison (the SMDS and DMSS probes). The full probes are given in Appendix 1. The probes were analysed in the same way as described above, and students’ use of the two paradigms in the different measurement-related situations was investigated.


Results

(i) Use of a paradigm for data collection

Three situations were presented to survey students’ understanding of measurement when collecting data: the “Repeating Time” (RT) probe, the “Repeating Distance” (RD) probe, and the “Repeating Distance Again” (RDA) probe. Examples of typical responses associated with the use of either the point paradigm or set paradigm are provided below.

The point paradigm was used by those students who indicated that repeating, for time and distance measurements, would simply yield identical values, and therefore asserted that repeating measurements was purposeless, as exemplified by the following quote:

They do not need to repeat, because the force of friction is the same every time, h is constant, acceleration due to gravity is constant, so distance travelled is constant. Once is enough. (RD response)

Students who indicated practice as the reason for repeating were also classified as using the point paradigm, as illustrated by the following responses:

You need to roll the ball a few more times because the first one or two measurements are usually rough estimates. You need to take more time measurements and then only can you take an accurate measurement. (RT response)

We need to do it once more, because the results may only vary since the height [of the releases] is not accurately measured. So for the last time one should measure the height of exactly 400 mm, or also the edge of the table could be the problem, so [the slope] should be placed exactly on the edge of the table. (RDA response)

The largest proportion of responses that displayed point reasoning came from those students who argued for repeating measurements in order to select the recurring value. This type of point reasoning is illustrated by, for example:

You need to repeat several more times. If you rolled the ball just once from 400 mm, you might have misjudged the height and then have a wrong conclusion. If you roll the ball twice and yet get different answers there will be uncertainty. When rolling the ball a number of times, and keeping measuring the same time, you can be sure it is correct. (RT response)


Since the ball was released again and two different results were obtained it is important to release it several more times until the equal or the same results can be obtained. This will be the exact distance required in mm. (RDA response)

In contrast, responses were classified as indicative of the use of a set paradigm where the repeating of measurements was justified by making reference to calculating an average of a number of readings. Some responses within this sizeable group specifically justified averaging as a way of dealing with the variation in experimental readings, as illustrated by the second quotation below:

If you roll the ball one more time and then take the average of all three answers, the average of the three will be more precise then the average of the first 2 releases only. (RDA response)

More than 3 repeats will give a more accurate result as the average is of a bigger number and abnormal measurements will not affect the average greatly. (RDA response)

Set reasoning was also reflected in the group of responses which explained the need for the repeating of measurements specifically as a means of obtaining an indication of the spread of the measurements, as illustrated by:

They need to repeat several more times to get an idea of the range of the results. (RT response)

On the other hand, some responses indicated that averaging was needed to approach the ‘true’ value. These were also classified as set reasoning due to the acknowledgement of the imperfections in experimental measurement. The intention of using a number of readings is illustrated, for example, by:

If they roll the ball from the same height a set number of times, they could obtain an average d which will be closer to the right answer. (RD response)

Several responses indicated that the spread associated with data collection was not appreciated and that calculating the mean was largely a rote response. This apparent set reasoning is illustrated by:

We always take an overall average of all the observations to get an accurate answer, so the experiment must be repeated. (RD response)


The combination of the responses to the RT, RD and RDA probes was used to identify individual students as users of the point or set paradigms. If all three responses to the data collection probes showed either point or set reasoning, the student was considered to be using consistently the point paradigm (P) or the set paradigm (S), respectively. If the responses showed a mixture of point and set reasoning, the student was classified as using predominantly the point (Q) or the set paradigm (R). Table 8 provides a summary of the frequencies of students for the various classifications.

Table 8: Students’ use of paradigms when collecting data (RT, RD and RDA probes). (n = 257)

Use of paradigm               Code   Number of students (%)
Consistent point paradigm     P      77 (30)
Predominant point paradigm    Q      52 (20)
Predominant set paradigm      R      46 (18)
Consistent set paradigm       S      75 (29)
Not classified                U      7 (3)
Total                                257 (100)

The results in Table 8 show that just under 60% of respondents were divided equally between consistent point reasoning (P) and consistent set reasoning (S). The remaining 40% of respondents used both forms of reasoning, again being equally split between predominantly point (Q) and predominantly set (R) reasoning. It is noteworthy that about two thirds of the inconsistency (69 of the 98 cases from within Q and R) can be explained by the use of different types of reasoning when dealing with repeating time as opposed to distance measurements. Students in this sub-group used the set paradigm for collecting time measurements (they repeated in order to establish an average), but the point paradigm for collecting distance measurements (they repeated in search of a recurring value).

(ii) Use of a paradigm for data processing

Two probes (see Appendix 1) were concerned with data processing: one probe explored students’ ideas about ways of representing and Using Repeated measurements (UR probe), and another probe requested the respondent to fit a Straight Line Graph to a series of plotted points (SLG probe).

When dealing with the UR probe, several students selected one reading on the basis of its position in the ordered or non-ordered set of data, thus using the point paradigm. On occasion this procedure was called ‘averaging’, as illustrated by the last two of the following quotes:


436. The first reading is 436 mm. I reckon that this first value contains much chances of being very near the correct result, since this first result is the initial release so all the factors are quite stable. What I mean is, at the fifth release the wooden slope could be a bit tilted due to repeated use so the result is not very reliable. (UR response)

Looking at the results, all the measurements seem to be running between 426 - 438. The best shot is 438 as the maximum shot. So 438 should be the final record. (UR response)

438 mm, because it is the average measurement. (UR response)

434. This reading is more average than the others - it is not extreme high or low. (UR response)

Many students showed point reasoning in processing the numerical data by representing the series of measurements by the recurring value:

d = 426 mm. After the five releases two equal results were obtained. The ball fell on the same place and therefore it is the right and accurate measure comparing with the other ones that were different. (UR response)

426. It has twice been their final result. If you do something twice and still get the same results then you must accept it as your final result. (UR response)

Once again, several respondents called this procedure ‘averaging’:

426, because they got 426 twice. So the average value of d is 426 mm. (UR response)

In contrast, students were classified as using the set paradigm when they calculated the arithmetic mean (432 mm) of the five readings, although occasionally the mean was calculated to be 433.5 mm, which was arrived at by disregarding one of the 426 mm readings. Several students within this cluster provided an even more complete generalization by referring to an average together with a range. Some examples are provided below:

432 mm is the average of the five numbers. Taking the average, I think, is the most accurate. (UR response)

My final result: for 5 experiments, a range between 426 mm and 438 mm giving an average of 432 mm. Obviously the answer is not going to converge to a specific number. Therefore the results of the experiment should show this fact. The range provides more information than the average alone. The individual numbers don’t seem important because there is no trend in the numbers. The average gives us the closest estimation of the distance. (UR response)
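The set-paradigm treatment of these five readings can be made fully explicit. The sketch below is our illustration (the probe itself asked only for a representative value); it computes the mean, the range and the standard uncertainty of the mean:

```python
import numpy as np

d = np.array([436, 426, 438, 426, 434])   # the five UR probe readings (mm)

mean = d.mean()                    # 432.0 mm: best approximation of the measurand
rng = (d.min(), d.max())           # (426, 438): the range quoted by some students
s = d.std(ddof=1)                  # ~5.7 mm: sample standard deviation (spread)
u_mean = s / np.sqrt(len(d))       # ~2.5 mm: standard uncertainty of the mean

print(f"d = {mean:.0f} ± {u_mean:.1f} mm (range {rng[0]}–{rng[1]} mm)")
```

On this treatment no single reading represents the set: the ensemble is characterised by the theoretical constructs of mean and uncertainty.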


Sometimes an average was taken only after a judgement was made about particular readings in relation to the others in the set. These responses were also interpreted as indicative of set reasoning, for example:

430 mm. The result of 426 mm occurred twice, and 436 mm and 434 mm I took as 435 mm which therefore occurred twice. So then I took the average between 426 and 435 which is 430. The 438 mm result is obviously a freak. (UR response)

(436 + 426 + 434) ÷ 3. By avoiding the highest and the lowest results and calculating the average of the other three, you should arrive at the most accurate answer. (UR response)

A sizeable cluster of students displayed aspects of set reasoning by calculating the mean of the five readings, but then consolidated their position as point reasoners by selecting one of the readings to represent the set, as shown by:

434 mm is the value closest to the average of these 5 results. They should not take the average because this does not refer to a result obtained from one of the experiments. (UR response)

434. The average of the 5 releases is 432. Therefore I feel that 434 best represents this investigation. (UR response)

The SLG probe also explored understanding of measurement when processing data. In this probe the respondent was provided with a series of plotted points and asked to draw in a line that best fitted the data set (see Appendix 1).

Two broad categories of graphs were identified in the responses. Students who connected a number of selected points, or who used lines or undulating curves, describing what they did as “joining all the plotted points”, were classified as using the point paradigm. Similarly, students presenting graphs showing a line connecting only the first point (sometimes the origin) and the last point were also classified as using a point paradigm. The procedure would be described by statements such as:

The graph has to be a straight line. The line is drawn from the first to the last height and it is in the middle of the plotted points. (SLG response)

More frequently, users of the point paradigm tried to draw a line through as many points as possible, often disregarding the other points as “abnormal readings”, illustrated by:

I started drawing a line from the origin. I chose a group of dots that, when a line is drawn through them, goes through the origin. (SLG response)


On the other hand, those that appeared to fit a line (not necessarily a straight line) by taking into account all the plotted points were classified as users of the set paradigm. Several students produced lines taking account of all points and provided quite sophisticated explanations for such lines of best fit, for example:

I have tried to have an equal number of heights on both sides of the line. This line can then act as an average between all distances. I mean, one can then determine the average gradient of all the heights. (SLG response)

I have tried to make the line lie so that all points lie at a minimum distance from the line. (SLG response)
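The second quotation comes close to describing an ordinary least-squares fit, which minimises the summed squared vertical distances of all the points from the line. A minimal sketch is given below; the (h, d) values are invented for illustration, since the actual SLG data appear only as a plot in Appendix 1:

```python
import numpy as np

# Hypothetical release heights h (mm) and landing distances d (mm),
# standing in for the plotted points of the SLG probe.
h = np.array([100, 200, 300, 400, 500])
d = np.array([218, 305, 372, 436, 488])

# Least-squares straight line d = m*h + c, fitted to all the data at once.
m, c = np.polyfit(h, d, 1)
print(f"d ≈ {m:.3f} h + {c:.1f}")
```

No individual point is joined or privileged by such a fit; the line models the trend of the whole ensemble, which is the set-paradigm action summarised in Table 11 below.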

The responses for the data collection probes (RT, RD and RDA) were combined with those from the data processing probes (UR and SLG) in order to determine whether students were consistent in their reasoning in both of these phases of laboratory work. Table 9 shows the numbers of point and set paradigm users in data processing, versus users of consistent and predominant point (P+Q) and set (R+S) paradigms in data collection (see Table 8).

Table 9: Relationship between the use of paradigms in data collection and data processing. (n = 257)

Paradigms used for data        Paradigms used for data processing (from UR and SLG probes)
collection (from RT, RD        Consistent       Inconsistent         Consistent     Not          Total
and RDA probes)                point paradigm   point-set paradigm   set paradigm   classified

Point paradigm                 70 (27%)         45 (18%)             14 (5%)        0 (0%)       129 (50%)
Set paradigm                   12 (5%)          47 (18%)             61 (24%)       1 (0%)       121 (47%)
Not classified                 6 (2%)           0 (0%)               1 (0%)         0 (0%)       7 (3%)
Total                          88 (34%)         92 (36%)             76 (30%)       1 (0%)       257 (100%)

The data in Table 9 show that about one third of the sample used the point paradigm consistently for data processing, one third consistently used the set paradigm, and the remaining third used both paradigms. The latter cluster mainly used set reasoning for data processing when representing a set of data (UR probe) but resorted to point reasoning when fitting a line to a series of plotted points (SLG probe). The inconsistent use of paradigms for data processing occurs particularly amongst those using the set paradigm for data collection. Of special note is the sub-cluster (n = 35) of point reasoners in data collection who also use point reasoning to represent the plotted points (SLG) but (more advanced) set reasoning for processing the set of repeated measurements (UR). Closer inspection of their UR responses shows that many within this cluster resorted to taking an arithmetic mean as a second option, as illustrated below:

(436 + 426 + 438 + 426 + 434) mm ÷ 5. The experiment has been performed too few times to find a credible mode, thus the average distance for the five experiments would be as accurate as possible an answer with the given information. (UR response)

This sub-cluster of users of both paradigms in data processing may thus be seen as disguised point reasoners.

Most importantly, the results in Table 9 suggest that a strong link exists between the use of the point paradigm for data collection and for data processing on the one hand, and the use of the set paradigm for data collection and data processing on the other (χ² = 70.5).
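For reference, the strength of this association can be checked with a standard chi-squared test of independence on the contingency table. The sketch below is our reconstruction, dropping the small ‘not classified’ row and column of Table 9; it yields a value close to the reported one (we make no claim about exactly which cells the original calculation used):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from Table 9, "not classified" row and column dropped.
observed = np.array([[70, 45, 14],    # point paradigm in data collection
                     [12, 47, 61]])   # set paradigm in data collection

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.1f}, dof = {dof}, p = {p:.2g}")   # ~70.3, dof = 2
```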

(iii) Use of a paradigm for data set comparison

Two probes were used to investigate the reasoning used to compare different sets of data: the “Same Mean Different Spreads” (SMDS) probe, and the “Different Mean Same Spread” (DMSS) probe. The full probes are provided in Appendix 1.

Both of these depicted scenarios suggested aspects of set reasoning, as they not only provided the series of readings but also modelled these readings by a single value, i.e. the mean. As the mean was used in the probes, student responses focussing only on the mean were seen as indicative of imposed set reasoning. However, the responses allowed for the identification of a more established use of set reasoning that also considered the spread of the readings, i.e. internalised set reasoning. The SMDS probe presented the issue of spread explicitly, and thus required recognition of spread as a descriptor of the quality of a series of measurements. In the DMSS probe, the notion of the spread as an indicator of the uncertainty (standard deviation of the mean) of a set of measurements needed to be conceptualised and applied when deciding whether or not the intervals defined by the two series of data overlapped. For each probe, responses were classified as using only the mean (imposed set paradigm) or using the mean and the spread (internalised set paradigm) for the decision-making process.

The responses of many students were classified as indicating imposed set reasoning when they compared only the means and ignored the spread in the data series when reaching the conclusion that they were equally good. In the SMDS probe this was characterized by responses such as:


They both got the same average and that’s all that matters. It’s not relevant whether the results are spread far or not. (SMDS response)

Similar reasoning, using only the mean, was applied in the DMSS probe when intuitive notions of ‘closeness’ were used to compare the averages in order to come to a decision about the agreement between the two series of measurements, such as:

The two averages are so close that it is possible to say that they agree with each other. (DMSS response)

Other students used such imposed set reasoning to draw the opposite conclusion, as illustrated by:

A’s average is smaller than that of B’s, so the results do not agree. (DMSS response)

In contrast, some internalised set reasoning was detected in the SMDS responses, for example:

Group A’s readings are better. It may well have been luck that A’s answers were closely related, but it seems more likely that A’s answers were more consistent as a result of good control of their experiment, and it was chance that the averages were the same. (SMDS response)

Although the two groups have the same average, group A has much less variance. If more experiments were to be carried out, their averages would most likely differ. (SMDS response)

In the DMSS probe, fewer than 10% of the respondents used internalised set reasoning by considering the relative spreads of the data as a criterion, as illustrated below:

The difference between the two results (= average) is very small and in any scientific experiment results will vary. Therefore, the chance that one group will obtain exactly the same result as another group is highly unlikely. However the results will lie within a certain range and any two results within that range are considered to agree, as in the above case. (DMSS response)
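This internalised reasoning can be made quantitative. The sketch below is our addition, using the DMSS probe data: it computes each group’s mean and standard uncertainty of the mean, and then checks whether the two intervals overlap:

```python
import numpy as np

def interval(data):
    """Return (mean, standard uncertainty of the mean) for a set of readings."""
    data = np.asarray(data, dtype=float)
    return data.mean(), data.std(ddof=1) / np.sqrt(len(data))

a_mean, a_u = interval([440, 438, 433, 422, 432])   # Group A of the DMSS probe
b_mean, b_u = interval([432, 444, 426, 433, 440])   # Group B of the DMSS probe

print(f"A: {a_mean:.0f} ± {a_u:.1f} mm   B: {b_mean:.0f} ± {b_u:.1f} mm")
# The two results agree if their uncertainty intervals overlap.
print("Agree:", abs(a_mean - b_mean) <= a_u + b_u)
```

On this criterion the two data sets are consistent (433 ± 3.1 mm against 435 ± 3.2 mm), which is the conclusion reached by the DMSS4-type students, albeit in looser language.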

Combining the responses by the same student to both probes, students were labelled as consistently using an imposed set paradigm when all decisions were made based solely on the mean. Amongst the respondents who indicated the use of an internalised set paradigm, a number were categorised as using an ‘inconsistent set paradigm’ when they showed recognition of the spread of the data in the SMDS probe, but not for the DMSS probe. Lastly, students were categorised as using a ‘consistent internalised set paradigm’ if they took account of spread in the data in both the SMDS and DMSS probes. The relationship between point or set reasoning for data collection, and imposed and internalised set reasoning for data comparison, is summarized in Table 10.

Table 10: Relationship between the use of paradigms in data collection and data comparison. (n = 257)

Paradigms used for data        Paradigm used for data comparison (from SMDS and DMSS probes)
collection (from RT, RD        Consistent     Inconsistent   Consistent         Not          Total
and RDA probes)                imposed set    set paradigm   internalised set   classified
                               paradigm                      paradigm

Point paradigm                 86 (33%)       40 (16%)       2 (1%)             1 (0%)       129 (50%)
Set paradigm                   69 (27%)       48 (19%)       3 (1%)             1 (0%)       121 (47%)
Not classified                 5 (2%)         1 (0%)         0 (0%)             1 (0%)       7 (3%)
Total                          160 (62%)      89 (35%)       5 (2%)             3 (1%)       257 (100%)

Table 10 shows that two thirds of the students used imposed set reasoning by comparing the data only on the basis of the calculated mean, and that only a small group (5 students) consistently used internalised set reasoning by considering both the mean and the spread for their comparisons. One third of the students used inconsistent set reasoning by considering the spread as a crucial descriptor of an ensemble of readings in only one of the probes, invariably in the SMDS probe, where the spread was being tested at the level of recognition. The data show no significant relationship between the use of an internalised set paradigm for data comparison and the use of a set paradigm for data collection, due to the low frequencies in the earlier categories (χ² = 2.96).

Discussion

The Stage 2 survey shows that about two thirds of the students consistently use either the point or a set paradigm for decisions on measurement for data collection. It also shows that there is a good correlation between the type of reasoning students use across data collection and data processing if the classification into point and set reasoning is applied. If the students who calculated the mean as a rote-learned routine are also treated as point reasoners, then this pattern is even stronger. It may, therefore, be concluded that the constructs of the point and set paradigms form a useful basis for the interpretation of students’ decision-making processes during investigative activities in the laboratory, such as data collection and data processing. The reasons for repeating measurements, the ways of dealing with a series of repeated measurements, and the fitting of a straight line to a number of plotted points all appear to be rooted in this same common construct.

Thus, students who use point reasoning typically do not repeat measurements, or else repeat in order to seek identical readings or to perfect their measurement skill. They also opt to represent a series of data by a single measurement, often a recurrent value. Furthermore, when asked to plot a straight line graph, they either draw a line through as many points as possible or else connect all the points by a series of straight line segments. On the other hand, students who use set reasoning repeat measurements in order to calculate an average or to determine the spread of the readings. They typically represent a series of measurements by a calculated mean and may include an estimation of the uncertainty. When fitting a straight line or smooth curve, they take all the points into account, usually by having the same number of points above and below the fitted line or curve. In short, set reasoners are prepared to model the available data using some form of theoretical constructs, while point reasoners hold fast to the individual readings. In addition, the use of the point and set paradigms for the analysis of the probes indicates that only a very small proportion of student responses appear to be uncodeable, which further strengthens the validity of the analysis method.

The fact that the reasoning used by students in the laboratory can be classified into two broad categories is a considerable improvement on the model of progression of students’ understanding of evidence as suggested by Lubben and Millar (1996). Their model (Table 1) provides a series of measurement actions as indicators of progressive understanding. In contrast, the point and set paradigms provide a direct classification of students’ understanding of measurement. It is striking that the same dichotomy of the two paradigms has been established for students’ perceptions of measurement when surveyed in a chemistry context (Rollnick et al., 2001), with roughly the same proportions of novice students subscribing to these as emerge from our study. This finding confirms our assumption that understanding of measurement and uncertainty amounts to a specific domain of knowledge.

The alternate usage of point and set reasoning by the same students for different probes within (but hardly across) the stages of data collection and data processing occurs frequently, but not randomly. Students relate their choice to the procedural context. For instance, a number of consistent point reasoners exceptionally adopted set reasoning when stating that they took repeat time measurements in order to calculate an average. For these students, the difficult nature of stopwatch operation may intuitively have emphasized the variability of time measurements, whereas for distance measurements the same students sought a recurring value. The reverse, however, did not occur, i.e. overall set reasoners using point reasoning for collecting time measurements. For data processing, a few consistent point reasoners exceptionally used set reasoning by representing a series of repeated measurements by a calculated mean. The sets of data presented together with their averages are likely to have triggered some form of set reasoning. This finding of the dependency of the use of point or set reasoning on the measurement context contradicts one of the basic ideas of the extensive study of the Assessment Performance Unit (APU, 1988), i.e. the transferability of procedural abilities. Our findings also extend beyond earlier findings by Song and Black (1992), who reported that practical performance depended on the conceptual demand of the science context and the laboratory-versus-everyday context of the investigative task. We conclude that measurement decisions (also) depend on the measurement context of the task.

Even where set reasoning is displayed consistently across data collection and data processing (about a quarter of the students), a fully internalised understanding is still absent for the university entrant students observed in this study. This is evidenced by the responses to the SMDS and DMSS probes, which required students to display internalised set thinking. In these instances, the students had to be aware that not only is the mean required to represent a series of individual data points, but that the degree of dispersion is also essential to characterise the data. The findings show that a good proportion of the students recognised the inherent spread in the data. However, when asked to apply this notion of spread, these students used only the mean to represent the data. The breakdown of set reasoning for data comparison is consistent with the findings of Gott and Duggan (1995), who reported that secondary school children were rarely successful in meaningful data interpretation, of which data comparison is part. They concluded that data interpretation has a higher procedural demand than experimental design, data collection and data presentation. Our results show that at this high level of measurement demand, set reasoning is only maintained at a low cognitive level, i.e. recognition.

It is interesting to note examples of students who appear to use both point and set reasoning in a fragmented way. Often, their action is not coherent with their stated reasoning about measurement, as also noted by Germann et al. (1996). For instance, some students state that they repeat measurements in order to take an average but then select the recurring value to represent the series of measurements when this option is provided. Other students use the term “averaging” and also describe the procedure to do so correctly, but then choose a reading closest to the calculated mean value to represent the measurement. This contradiction between reasoning and action is also apparent in the responses to the straight line graph probe, where some students describe an appropriate procedure to fit a line to the data, but then draw a line segment through a number of specific data points. It is therefore helpful to differentiate between reasoning about measurement and measurement action for each of the two paradigms, as illustrated in Table 11 below.


Table 11: Actions and reasoning associated with the point and set paradigms.

Point paradigm:

Data collection
  Action:    No repeating of measurements is necessary, or repeat to find a recurring value, or repeat for practice.
  Reasoning: A measurement leads to a single, “point-like” value rather than contributing to an interval. Only one good measurement is required.

Data processing (Calculation)
  Action:    A single (best) measurement, e.g. the recurring value, is selected to represent the true value.
  Reasoning: Each single measurement is independent of all others and can in principle be the true value.

Data processing (Straight line graph)
  Action:    All points joined by multiple line segments, or a single line drawn through selected data points.
  Reasoning: The trend of the data is best represented by selecting particular data points which describe the desired trend.

Data set comparison
  Action:    A value-by-value comparison of the two sets, or comparison based on the “closeness” of the means (if given).
  Reasoning: There is no basis for the need to repeat measurements; comparisons are therefore made on the basis of the closeness of individual points.

Set paradigm:

Data collection
  Action:    Repeating of measurements of the same quantity is necessary as a consequence of the inherent spread in data.
  Reasoning: Each measurement is only an approximation to the true value, and the deviation from the true value is random. A large number of measurements are required to form a distribution that will cluster around some particular value.

Data processing (Calculation)
  Action:    A set of measurements is represented by theoretical constructs, e.g. the mean and standard deviation.
  Reasoning: The best information regarding the true value is obtained by combining the measurements using theoretical constructs in order to characterise the set as a whole.

Data processing (Straight line graph)
  Action:    All the measurements are taken into account by a least squares straight line fit to all the data.
  Reasoning: The best graphical representation of a series of measurements is obtained by modelling the trend of the data.

Data set quality
  Action:    For the same number of measurements, the better measurement is chosen to be the one associated with the smallest standard deviation.
  Reasoning: The standard deviation is related to the precision of the measurement.

Data set comparison
  Action:    The agreement of two measurements is related to the degree of the overlap of their intervals.
  Reasoning: The mean and standard deviation define a confidence interval which is related to both the best estimate and the reliability of the measurement.


The results from Study 2 show that it is possible for the actions and reasoning used by students in the laboratory to be drawn on an ad hoc basis from either the point or set paradigms, depending on the demands of the particular laboratory context. This is illustrated in Figure 3 where the four regions represent the four broad categories into which students may be classified based on both their actions and reasoning. We have no evidence to suggest that this pattern of reasoning and behaviour is any different for female and male students.

Students whose reasoning and actions are both firmly rooted within the point paradigm may be located in the bottom left-hand region, while students who both act and reason according to the set paradigm may be located in the upper right-hand region. These are the ‘pure’ cases of the point and set paradigms as described in Table 11. Two other possibilities exist. Some students may be able to use the tools of statistical data analysis, i.e. are able to complete data analysis procedures associated with the set paradigm, but are theoretically rooted within the point paradigm. Such students therefore use the tools and actions of the set paradigm by rote (see Table 11). Many traditional laboratory courses emphasise the development of set paradigm action, at the expense of promoting set paradigm reasoning. The fourth possibility in Figure 3 is characterised by those students who have a coherent “set” theoretical view of measurement but who have generally not mastered the operational tools and procedures of data analysis. These students, therefore, use actions associated with the point paradigm.




Figure 3: The goal of instruction in relation to the point and set paradigms.

[Diagram: one axis runs from Point Actions to Set Actions, the other from Point Reasoning to Set Reasoning, defining four regions: the point paradigm (point actions with point reasoning), “rote and ad hoc set actions” (set actions with point reasoning), “rote and ad hoc set reasoning” (set reasoning with point actions) and the set paradigm (set actions with set reasoning). An arrow labelled “goal of instruction” runs from the point paradigm region to the set paradigm region.]

The broad purpose of laboratory instruction should therefore be directed at facilitating a shift in the paradigm used by students for understanding scientific measurement. The next section of this paper presents the evaluations of two kinds of physics laboratory courses in terms of their effectiveness in supporting such a shift.

Study 3: Evaluation of our original physics laboratory course

Studies 1 and 2 above described the development of our paradigmatic model of students’ understanding of measurement. Study 3, partly reported in Buffler et al. (2001), focussed on evaluating our physics laboratory course in terms of its effectiveness in moving students toward using the set paradigm of measurement in a scientific laboratory context.

The research questions for the evaluation were:

■ In what ways has students’ use of the point and set paradigms changed at the end of the laboratory course?

■ In what ways do these shifts differ with respect to decisions made during data collection, data processing and data comparison?




The GEPS physics laboratory course

The GEPS physics course (see Allie and Buffler, 1998) comprises a theory component, a laboratory-based experimental component and a communication skills component. Since almost all the GEPS students have had little or no first-hand practical experience, the prime aim of the laboratory course is to allow students to engage with a variety of experimental situations using various pieces of apparatus and to introduce the basics of scientific measurement. The aim of the laboratory course was not the illustration of physics concepts or phenomena, but rather to develop students’ understanding of the measurement process and their skills in using a variety of measurement instruments and data analysis tools. Aspects of data collection, data processing and data comparison were addressed by a number of formal skills development exercises. These covered drawing up tables, plotting and interpreting graphs, making measurements and dealing with uncertainties.

Since the majority of students on the course have had little or no exposure to “hands-on” practical work, they are mostly unfamiliar with planning and executing a structured experimental task. The well-known “recipe-style” laboratory practicals, often an exercise in reproducing well-known results, also do not lend themselves to foregrounding the measurement issues introduced in the course. Furthermore, since most students do not speak English as a first language, the terse and technical way in which these tasks are formulated plays an obscuring role which diminishes the value of the learning experience. Often the authoritative tone of a list of detailed instructions can intimidate students, leading to a paralysed response even for simple tasks. Another skill which was felt to be desirable, particularly in view of the language factor, was scientific report writing, which serves the role of exposition as well as providing a tool for students to reflect on the proceedings of a laboratory experience as a whole. For this purpose, too, the “recipe-style” laboratory was found to be unsuited, since it tends to generate a piece of writing which closely resembles the original set of instructions.

For these reasons, laboratory tasks in the course were framed as authentic problems that require an experimental investigation for their resolution (see Allie et al., 1997). As an example, Figure 4 shows the well-known pendulum practical recast in this form. This leads to the production of a writing-intensive report as a natural consequence of the investigation. Students, working in groups of three with assistance from “roving” laboratory demonstrators, generate the procedures that are required as the experiment progresses. At the same time, reporting on a completed experiment forms a central part of the experience, and producing writing-intensive reports provides additional learning and assessment opportunities (Allie et al., 1997). The report also serves as a focus for synthesising the various experiences in the laboratory. The way in which the task is presented, together with the positing of an audience which is not present during the investigation, obviates the problem of the report comprising a series of instructions rather than an account of the experiment. Assessing these writing-intensive reports posed a difficulty, since most of the markers


are post-graduate students who were not comfortable in dealing with the communication aspects of the report. An assessment instrument (Allie et al., 1997), based around the concept of the coherence of the report, was developed which provides both a framework for the markers to use and clear feedback to the students. Using the assessment instrument with its explicit criteria has removed much of the subjectivity from the assessment and the awarding of an “impression” grade.

Figure 4: An example of an “authentic” problem-based practical exercise.

Pendulum problem swings you into action

Imagine that you now work for Scibucks Enterprises, a scientific company that consults for industry. Your boss calls you into her office and explains that she wants you to undertake an investigation for a client who is a clock maker. The clock maker says that he needs to know what the relationship is between the length of a pendulum and its period and must have evidence that this relationship works in practice. You remember from your undergraduate physics days that the period, T, of a pendulum is related to the length, l, of the string by T = 2π√(l/g), where g is the acceleration due to gravity.

You therefore devise two experiments to test the theory:
Experiment A. Measure T for different lengths l and then plot a suitable graph to show that the above equation is valid.
Experiment B. Choose one length l and measure T many times, and then calculate g (using the equation above). If your measured value g ± u(g) agrees with the theoretical value for Cape Town (9.79 m s⁻²) then this would suggest that the equation for T is correct.

Your boss tells you that she must have your report completed before 10:00 on this Friday, which should include a full description of your method, all the measurements you make, the calculations and graph, an uncertainty budget, and a suitable discussion and set of recommendations to the clock maker.
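As an aside, the calculation that Experiment B demands can be sketched in a few lines. The following Python fragment is a minimal illustration only: the length and period values and their uncertainties are invented, and the propagation formula is the standard one for products and quotients, anticipating the treatment of uncertainty propagation later in this paper.

    import math

    # Hypothetical data for Experiment B (values invented for illustration).
    l, u_l = 0.500, 0.002      # pendulum length and its uncertainty (m)
    T, u_T = 1.42, 0.01        # mean period and its uncertainty (s)

    # g follows from T = 2*pi*sqrt(l/g)  =>  g = 4*pi**2 * l / T**2
    g = 4 * math.pi ** 2 * l / T ** 2

    # Standard propagation for a quotient with a squared factor:
    # (u(g)/g)^2 = (u(l)/l)^2 + (2*u(T)/T)^2
    u_g = g * math.sqrt((u_l / l) ** 2 + (2 * u_T / T) ** 2)

    print(f"g = {g:.2f} ± {u_g:.2f} m s^-2")   # compare with 9.79 m s^-2 for Cape Town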

The laboratory course consisted of one 3-hour session per week for 12 weeks. About half of these sessions were spent on the laboratory investigations, with the other half devoted to the skills development exercises (Allie and Buffler, 1998).

Evaluation Method

A cohort of GEPS students was asked to respond to a set of probes both before and after completion of the course. The pre-course probes were written by 147 students and the post-course probes by 125 students.




The sample for this evaluation study comprises the 70 students who wrote both sets of probes. A total of eight probes were written before the course, and seven after the course. Students’ pre-course understanding of measurement for data collection was identified through three probes, i.e. the “Repeating Distance” (RD) probe, the “Repeating Distance Again” (RDA) probe and the “Repeating Time” (RT) probe. These were identical to probes used in the survey of students’ understanding described earlier. Since we expected little differentiation in understanding of measurement for data collection at the end of the course, students answered only the “Repeating Distance” (RD) probe after the course. Students’ understanding of measurement for data processing was identified through the same two probes before and after the course, i.e. the “Using Repeats” (UR) probe and the “Straight Line Graph” (SLG) probe. These too were identical to those used for the survey reported earlier. Students’ understanding of measurement for data comparison was investigated using three probes before the course, i.e. the “Same Mean Different Spread” (SMDS) probe, the “Different Mean Same Spread” (DMSS) probe and the “Different Mean Overlapping Spread” (DMOS) probe. In addition, the post-course instrument included a fourth item probing students’ understanding of measurement for data comparison, the “Different Mean Same Uncertainty” (DMSU) probe. The full probes are presented in Appendix 1.

The administration of the probes strictly followed the protocol described earlier. The analysis of the responses focussed on students’ use of the point and set paradigms for each probe. Typical responses for point and set reasoning for data collection, data processing and data comparison were similar to those presented as illustrations in the survey above (Study 2). Frequencies of the use of point and set reasoning for pre-post course comparisons were compiled for each area of measurement for the whole sample. Changes in the use of paradigms for individual students were also identified and clustered.

While we have focused our research on GEPS students, we have also studied mainstream (“direct entry”) physics students. In terms of our research on their ideas about measurement it is important to note that they too followed a laboratory course with aims similar to those of the GEPS students. However, the approach and pace of their course recognised the higher entry attainment and laboratory experience of these mainstream students compared to the GEPS students.

Results

(i) Use of a paradigm for data collection

Students’ pre-course understanding of measurement for data collection was identified through three probes, i.e. the “Repeating Distance” (RD) probe, the “Repeating Distance Again” (RDA) probe and the “Repeating Time” (RT) probe. The range of responses indicating the use of the point and set paradigm for data collection was similar to that reported in the surveys described above.


Table 12 presents the frequencies of the student responses for these probes in terms of the use of a point or set paradigm before instruction, against responses by the same students after their laboratory course.

Table 12: Students’ use of paradigms when collecting data (RD, RDA and RT probes). (n = 70)

                                         Paradigm after instruction (from RD probe)
Paradigm before instruction         Point paradigm   Set paradigm   Not codeable      Total
(from RD, RDA and RT probes)
Consistent point paradigm               6 (9%)         28 (40%)        4 (6%)       38 (54%)
Mixed paradigms                         5 (7%)         15 (21%)        1 (1%)       21 (30%)
Consistent set paradigm                 2 (3%)          3 (4%)         0 (0%)        5 (7%)
Not codeable                            2 (3%)          4 (6%)         0 (0%)        6 (9%)
Total                                  15 (21%)        50 (71%)        5 (7%)      70 (100%)

The data in Table 12 show a distinct shift after instruction towards the use of a set paradigm, i.e. the notion of repeating measurements in order to find a mean. Whereas before teaching, more than half of the students consistently used the point paradigm, only one in five of the students did so after instruction. On the other hand, one in twelve students consistently used the set paradigm before instruction, whereas over two thirds did so after instruction. Deeper analysis of the data shows that the largest shift from the use of a point to set paradigm occurred in the group of students who initially repeated to find the recurring value, but later chose to calculate a mean. The vast majority of students persistently using the point paradigm claimed that they wanted to repeat in order to perfect their measuring skill.

The question arises as to whether the students who indicated the intention of repeating in order to calculate a mean have indeed embraced the set paradigm. As shown previously, many students simply saw calculating a mean as being a requirement of an experiment. Since calculating a mean requires generating many different readings, they repeated their measurements. Thus, the remaining probes explored whether the apparent shift away from the point paradigm was superficial (rote or formulaic), or whether the changes indicated a genuine paradigmatic shift.



(ii) Use of a paradigm for data processing

Two probes were used to solicit students’ ideas about data processing: one probe asked students to represent a set of data by a single quantity (the “Using Repeats” (UR) probe), and the other required a series of plotted points to be modelled graphically (the “Straight Line Graph” (SLG) probe). Responses to the UR probe were generally similar to those provided for Studies 1 and 2 described above. However, the SLG probe provided an interesting range of responses.

When students select specific points through which to draw a straight line, this indicates the use of the point paradigm. The origin often plays a prominent role, as illustrated by the last quotation below:

I have plotted line through these two points [the first and last point] because the slope can be calculated very well when using this line. (SLG response)

I have drawn a line through as many points as possible since then the line touches the most number of points. (SLG response)

I started drawing a line from the origin. I chose a group of dots that, when a line is drawn through them, goes through the origin. (SLG response)

Students who drew undulating curves through all the points were also classified according to the point paradigm. This interpretation was supported by explanations such as:

I drew a curve through all the points since all the points must be touched. (SLG response)

These students were clearly unable to see the data as displaying a trend that needed to be modelled. Only those students who fitted a line by taking into account all the plotted points were classified according to the set paradigm. Since least squares fitting is not introduced at school, responses associated with the set paradigm before instruction were exemplified by commentary such as:

The line should have more or less equal numbers of dots (representing measurements) above and below it, because in that case we get best average value. (SLG response)

I have tried to make the line lie so that all points lie at a minimum distance from the line. (SLG response)

After instruction, students could explain their set action according to the procedure of least squares fitting, for example:

I just took the path that is in between all points plotted. Not to pass through them all, but to decrease the space between all points and the line. (SLG response)


I have drawn a line which approximates a least squares fit to the data. The distance between the line and each point has been minimised (approximately). (SLG response)
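The set-paradigm action that these post-instruction responses describe is the least squares fit itself. As a minimal sketch (the data points below are invented for illustration, and NumPy’s polyfit is used simply as one convenient least squares routine, not as the course’s prescribed tool):

    import numpy as np

    # Hypothetical plotted points (invented for illustration).
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    # Least squares fit of a straight line y = m*x + c: every point
    # contributes to the fitted trend and none is singled out, in
    # contrast with point-paradigm actions such as joining the dots.
    m, c = np.polyfit(x, y, 1)
    print(f"slope = {m:.2f}, intercept = {c:.2f}")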

Table 13 summarises the changes in students’ use of the point and set paradigms when dealing with the processing of data. From Table 13 it can be seen that 43% of the students used the set paradigm consistently after instruction, while a similar percentage of students were inconsistent in their use of the two paradigms. The proportion of students consistently using a point paradigm for data processing decreased after instruction from 77% to 13%, while the proportion of those consistently using a set paradigm increased from 7% to 43%. However, caution needs to be exercised since it is possible that the formalistic procedures of finding the mean and fitting a straight line may have been used by rote. It is necessary to include the analysis of the probes dealing with the comparison of data sets in order to assess the internalised understanding of the set paradigm.

Table 13: Students’ use of paradigms when processing data sets (UR and SLG probes). (n = 70)

                                              Paradigm after instruction
Paradigm before instruction    Consistent point   Inconsistent   Consistent set      Total
                                   paradigm        paradigms       paradigm
Consistent point paradigm          9 (13%)         26 (37%)       19 (27%)        54 (77%)
Inconsistent paradigms             0 (0%)           5 (7%)         6 (9%)         11 (16%)
Consistent set paradigm            0 (0%)           0 (0%)         5 (7%)          5 (7%)
Total                              9 (13%)         31 (44%)       30 (43%)       70 (100%)

(iii) Use of a paradigm for data set comparison

Three probes each provided two sets of data together with their calculated means and asked the student to comment either on the compatibility or relative quality of the two data sets. They were the “Same Mean Different Spread” (SMDS) probe, the “Different Mean Same Spread” (DMSS) probe and the “Different Mean Overlapping Spread” (DMOS) probe (see Appendix 1). The range of responses indicating the use of the point and set paradigm for these probes extends beyond that reported in Studies 1 and 2, reported above.

The post-course responses provided several examples of the use of a fully internalised set paradigm. In order for a student’s response to be associated with the set paradigm, there needed to be some evidence that the student took all the data into account when deciding whether or not the two sets of data agreed with each other, for example:





The results agree because the measurements overlap with each other. The two means are the same by chance. (SMDS response)

The two Groups’ results agree with one another. We are getting an average and thus will also have an uncertainty ranging between the numbers they got. Therefore, both fall within an uncertainty range of each other’s. (DMSS response)

The most sophisticated responses indicating fully internalised set reasoning were of the type:

The results of both groups agree, because surely when they calculate their final results with their uncertainty value, their results for d will probably lie in one-another’s confidence interval. (DMSS response)

You will need to work out the standard deviations of the mean for each data set. If you do so, then I think that it will be smaller for Group A’s data. (SMDS response)

The “Different Mean Overlapping Spread” (DMOS) probe was used for the first time in this study:

Two groups of students compare results for five releases of the ball at h = 400 mm.

Group A: 444  435  424  440  432   Average = 435 mm
Group B: 458  438  462  449  443   Average = 450 mm

A: “Our result agrees with yours.”
B: “No, your result does not agree with ours.”

The dispersion in the data sets in the SMDS probe differed from one another, while in the DMSS and DMOS probes, the degree of spread of each data set was similar and overlapped in each case. The main focus in the analysis was on the ways in which the students used the notion of spread in their reasoning and how this reasoning refined the classification of these students according to the point or set paradigm on the basis of their responses to earlier probes. The SMDS probe presented the issue of spread explicitly, and thus required recognition of dispersion as a descriptor of the quality of a set of readings. In the two subsequent probes (DMSS and DMOS), the notion of the spread as an indicator of the uncertainty (standard deviation of the mean) of a set of readings needed to be conceptualised and applied when deciding whether or not the intervals defined by the two sets of readings indeed overlapped.



Responses to the DMOS probe (as with the other probes) were associated with the imposed set paradigm when students used their intuitive notions of ‘closeness’ of the averages to come to a decision about agreement of the two sets of readings, as characterised by:

The two results have a difference of 1.5 cm which is not much so I would say that the two results do agree as they are in the same ballpark. (DMOS response)

Other students used the same type of reasoning to draw the opposite conclusion, as illustrated by:

The results of both groups do not agree, because the difference in the average is more than 5, therefore they do not agree with each other. (DMOS response)

It should be noted that, although the text in these three probes provided a mean as a summary of the measurements, many students still resorted to examining the individual readings in the data sets in order to make a decision. This extreme form of point reasoning is illustrated by the following response:

The results of both groups do not agree, because none of their measurements are the same. One or two may be close but not that close. (DMOS response)

In order for a student’s response to be associated with the internalised set paradigm, there needed to be some evidence that the student took all the measurements into account when deciding whether or not the two sets of data agreed with each other (intervals overlapped), for example:

Checking the spreading of Group A (424 to 444) and of Group B (438 to 462) you can tell that these two agree within the experimental error. (DMOS response)

The most sophisticated responses were of the type:

The results of both groups agree, because surely when they calculate their final results with their uncertainty value, their results for d will probably lie in one-another’s confidence interval. (DMOS response)
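For reference, the calculation hinted at in responses like the last one can be sketched directly from the DMOS data above. This sketch is ours, not part of the probe; it uses the standard deviation of the mean as the uncertainty, in line with the treatment discussed later in this paper, and simply reports the two intervals without imposing a verdict:

    import math

    def mean_and_sdom(readings):
        """Return the mean and the standard deviation of the mean."""
        n = len(readings)
        m = sum(readings) / n
        s = math.sqrt(sum((x - m) ** 2 for x in readings) / (n - 1))
        return m, s / math.sqrt(n)

    group_a = [444, 435, 424, 440, 432]   # DMOS data for Group A (mm)
    group_b = [458, 438, 462, 449, 443]   # DMOS data for Group B (mm)

    for name, data in (("A", group_a), ("B", group_b)):
        m, u = mean_and_sdom(data)
        print(f"Group {name}: d = {m:.0f} ± {u:.0f} mm")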

After analysis of the responses according to the use of the point or set paradigm, Table 14 shows student responses to the SMDS, DMSS and DMOS probes, grouped according to whether or not the responses across all three probes were consistent with an internalised set paradigm.


Table 14: Students’ use of paradigms when comparing data sets (SMDS, DMSS and DMOS probes). (n = 70)

                                                   Paradigm after instruction
Paradigm before instruction             Inconsistent   Consistent internalised   Not codeable      Total
                                         paradigms         set paradigm
Inconsistent paradigms                   48 (68%)           16 (23%)                5 (7%)       69 (98%)
Consistent internalised set paradigm      0 (0%)             0 (0%)                 0 (0%)        0 (0%)
Not codeable                              1 (2%)             0 (0%)                 0 (0%)        1 (2%)
Total                                    49 (70%)           16 (23%)                5 (7%)      70 (100%)

As expected from the previous experiences of the students, prior to instruction it was not possible to classify any of them as being consistent users of a fully internalised set paradigm. In addition, 98% of the students were unable to reason consistently through the various situations posed by the probes. Most of these students used only the means to compare the two data sets while only a few gave any recognition to the spread in the data. This was to be expected since the school science curriculum in South Africa does not require emphasis to be placed on ways of dealing with the spread in repeated experimental measurements. The ‘average’ is introduced as the only construct for dealing with repeated observations. After their laboratory course, more than two thirds of the students (70%) still did not reason consistently across the three probes according to an internalised set paradigm, thereby suggesting a fragmented understanding. The consistency of the use of the point or set paradigms in making decisions in all three areas of measurement after the course is explored in Table 15, which shows the post-instruction results for the use of the two paradigms for data collection and data processing (as in Tables 12 and 13) cross-tabulated against the use of these paradigms for data comparison (as in Table 14).




Table 15: Students’ use of paradigms for data collection/processing and data set comparison after instruction. (n = 70)

                                              Paradigms used in data collection / processing
Paradigms used in                  Consistent point   Inconsistent point-set   Consistent set   Not classified      Total
data set comparison                    paradigm              paradigm             paradigm
Inconsistent paradigms                 9 (13%)               25 (37%)             13 (18%)          2 (2%)        49 (70%)
Consistent internalised                0 (0%)                 3 (4%)              15 (21%)          0 (0%)        18 (26%)
set paradigm
Not codeable                           0 (0%)                 1 (1%)               2 (3%)           0 (0%)         3 (4%)
Total                                  9 (13%)               29 (42%)             30 (43%)          2 (3%)       70 (100%)

The data in Table 15 suggest that even after the intensive laboratory course, only about one fifth (21%) of the students consistently based their responses on an internalised set paradigm for all three areas of data collection, data processing and data-set comparison. On the other hand, three quarters of the students (70% + 42% − 37% = 75%) were still inconsistent in their actions and/or reasoning across the set of probes after instruction.

The final probe used was the “Different Mean Same Uncertainty” (DMSU) probe which was only included in the post-instruction set. The DMSU probe asked students to compare two measurements which are presented in terms of a mean and a standard deviation of the mean.

The “Different Mean Same Uncertainty” (DMSU) probe:

Two other groups of students compare their results for d obtained by releasing the ball at h = 400 mm. Their means and standard deviations of the means for their releases were:

Group A: d = 436 ± 5 mm
Group B: d = 442 ± 5 mm

A: “Our result agrees with yours.”
B: “No, your result does not agree with ours.”
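The interval-overlap decision rule that this probe targets can be stated in a few lines of code. The sketch below is our illustration, applied to the probe’s own values:

    # DMSU probe results: best estimate and standard uncertainty (mm).
    a_best, a_u = 436, 5
    b_best, b_u = 442, 5

    # Intervals defined by one standard uncertainty on either side.
    a_lo, a_hi = a_best - a_u, a_best + a_u   # [431, 441]
    b_lo, b_hi = b_best - b_u, b_best + b_u   # [437, 447]

    # The set-paradigm decision: the results agree if the intervals overlap.
    overlap = a_lo <= b_hi and b_lo <= a_hi
    print("results agree" if overlap else "results do not agree")   # -> agree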




Responses to the DMSU probe were associated with the imposed set paradigm if the comparison between the two results was made either using only the mean values, or only the numerical value of the standard deviations, as illustrated by:

The means are different so the results are different. (DMSU response)

The uncertainties are the same so the results are the same. (DMSU response)

Sometimes a comparison was made using both the mean and the numerical value of the standard deviation:

Although the uncertainty is the same the results are different because the means are different. (DMSU response)

On the other hand, a student’s response was identified with an internalised set paradigm if the student attempted to reach a decision according to the degree to which the intervals of the two measurements overlap, for example:

The two measurements agree with each other since the intervals defined by their standard deviations overlap with each other. (DMSU response)

The responses to the DMSU probe could therefore involve an action which is in keeping with the set paradigm, but the reasons given may or may not have followed from that paradigm (see Figure 3). Table 16 shows the classification of student responses on the basis of all the probes discussed thus far, together with the results of the DMSU probe.

Table 16: Students’ use of paradigms for data collection, data processing and data-set comparison after instruction. (n = 70)

                                            Paradigm used for DMSU probe
Classification based on          Imposed set   Internalised set   Not codeable      Total
all previous probes               paradigm        paradigm
Consistent point paradigm          3 (4%)          6 (9%)            0 (0%)        9 (13%)
Inconsistent paradigm             11 (17%)        17 (24%)           1 (1%)       29 (42%)
Consistent set paradigm           14 (20%)        16 (23%)           0 (0%)       30 (43%)
Not codeable                       1 (1%)          1 (1%)            0 (0%)        2 (3%)
Total                             29 (42%)        40 (57%)           1 (1%)      70 (100%)




Although more than half the students (57%) carried out a set paradigm action, fewer than half of this group (23% of the sample) appear to be located firmly within the set paradigm. Thus for the DMSU probe, three-quarters of the students either used an imposed paradigm action (42%) or appear to have used the correct set paradigm action by rote or in an ad hoc way (33% = 57% − 23% − 1%). There appears to be no relationship between the ability of the students to apply the formalistic rules of overlapping intervals and their underlying understanding of the statistical nature of measurement. In other words, the present results suggest that the ability of students to reason appropriately when the results are provided in a formal way (i.e. as a mean and a standard deviation) does not imply that the same students have developed a commensurate conceptual understanding of the underlying principles of their reasoning.

Discussion

The results of Study 3, described above, support the view that students’ understanding of measurement during the phases of data collection, data processing (by calculation or graph) and data set comparison (for quality or compatibility) can be characterised in terms of point and set paradigms. These paradigms form a useful basis for the interpretation of students’ decision-making processes and actions during investigative activities in the laboratory before and after instruction. Students’ use of point and set paradigms can therefore inform laboratory curriculum development.

Nearly all of the students in Study 3 could be classified as subscribing to the point paradigm prior to instruction. This proportion is much higher than in Study 2, mainly because of the set paradigm use amongst the DES students in the earlier study. Even when students carried out an action associated with the set paradigm, for example finding a mean, their responses to the probes which dealt with the dispersion in sets of data (the SMDS and DMSS probes) confirmed that their set reasoning was either undeveloped or nonexistent.

After their laboratory course, the vast majority of the students were able to represent an ensemble of readings of a quantity by a mean. However, the fact that the mean of a set of measurements has little significance without some indication of an interval of uncertainty seems not to have been widely internalised. This was evidenced by the response patterns in the probes (SMDS, DMSS and DMOS) that required students to use the set paradigm at a deeper level. Although most students recognised the presence of the scatter in the data (SMDS probe), when asked to use the spread to make a decision, many students resorted to purely subjective notions of acceptable ‘closeness’ of the means. On the other hand, many of these students correctly opted to reason on the basis of overlapping intervals (Table 16) when confronted with data already represented by a mean and an uncertainty. This suggests that the laboratory course, although successful in its aims of teaching students the formal procedures of data analysis, was not able to provide the necessary links between the nature of measurement and the techniques for processing data.


A study of the effectiveness of a similar laboratory course for equivalent chemistry students (Rollnick et al., 2002) illustrates the same limited improvements. More strikingly, an evaluation of a similar laboratory course for mainstream physics students (Volkwyn et al., 2004) concludes that, although the level of understanding of measurement at entry into the undergraduate programme seemed considerably greater for this higher achieving and well-prepared sample of students, after the course their understanding was also found to be based entirely on the use of data analysis algorithms. This finding is consistent with the study of French physics students (Séré et al., 1993) which showed that the virtuosity displayed in applying the formal calculational aspects of data analysis was not commensurate with their understanding of the underlying principles. These results are not surprising, as most laboratory curricula, including the course evaluated here, assume that understanding emerges from the application of formalistic rules.

Although the students in the sample may be classified as advanced reasoners, the quotations from their explanations of actions show that their language usage was haphazard. There was considerable confusion about terminology such as ‘spread’, ‘range’, ‘uncertainty’, ‘precision’ and ‘accuracy’. Post-instruction responses often struggled with explanations in terms of features of ‘systematic’ and ‘random’ errors, although the vast majority of students argued that repeating is needed to limit the random error, and therefore to improve precision. This confirms findings by Séré et al. (1993) and Tomlinson et al. (2001). The frequent reference to variability in readings by value-laden terms such as error, mistake or inaccuracy hints at a fundamental misunderstanding of the nature of scientific measurement. At the same time, over half of the responses in the total sample indicated that repetition is required in order to get closer to the real or correct value for a time/distance measure. This implies that such students have the notion that in the ideal case variability could and should be eliminated.

There can be no doubt that measurement context influences the choice of paradigm to be used. For example, two out of three students used the set paradigm in the ‘dynamic’ case of measuring time, whereas only one in two students used it for the more ‘static’ distance readings. Later studies (Lubben et al., 2004) have also shown that context is important. Asked to consider scale readings in a kitchen and a laboratory environment, students applied set thinking to the home situation but point thinking to the laboratory scenario. Similarly, Leach et al. (2000) found that measurements were related to theory differently by the same students depending on the measurement context. Their survey suggests that students use different epistemologies of the nature of science for interpreting measurements in different contexts.

It also appears that an understanding of the nature of measurement is closely linked to the development of the appropriate use of the tools for data reduction and analysis.


If a student is a point reasoner, then the use of constructs such as the mean and standard deviation will likely be learned by rote. On the other hand, even if a student is a consistent set reasoner, this does not imply understanding of the operational tools of data analysis. Therefore, activities are required which allow students to develop the use of the full range of measurement-related tools while engaging in tasks that challenge their views about the fundamental nature of measurement and modify their behaviour in the laboratory.

It is also our strong impression from teaching these students that procedural ‘rules of thumb’ acquired at school could be seriously impeding the development of procedural understanding at university. For example, students who join the data points on a graph when asked to ‘fit’ a straight line seem to be more easily introduced to the notion of a ‘best fit’ straight line than those who have come from school with an algorithm such as drawing a single line through as many points as possible. Furthermore, the notion of the mean as a panacea for all the problems of experimental ‘error’ seems to impede the development of the ideas of inherent ‘uncertainty’ in measured quantities. It might be harder to shift students from the ‘rote set actions’ region of Figure 3 to a coherent use of the set paradigm than it is to shift students who both act and reason according to the point paradigm.

The key outcomes of our work at this point can be summarised as follows:

■ a methodology to investigate students’ ideas about measurement;
■ a set of diagnostic probes and coding schemes to illuminate students’ ideas about measurement;
■ the development of a paradigmatic theory to describe students’ reasoning about measurement (point and set paradigms);
■ evidence that the established laboratory course at UCT was not particularly successful in moving students from using a point to using a set paradigm when dealing with measurement in a scientific context;
■ evidence that the algorithms for handling data that were learned at school impeded the acquisition of understanding of measurement.

In addition, at this stage we had become increasingly concerned about the mismatch between the approach suggested for use when reporting on experimental measurements in physics research, and what was being taught in the undergraduate laboratory. Evangelinos et al. (1998) came to similar conclusions and recommended that ‘...probabilistic reasoning in the context of labwork should be presented not only as a technique for data treatment but as an inherent feature of scientific enquiry’. We therefore set out to rethink our framework for teaching measurement and to redesign, implement and evaluate a new laboratory programme.


Framework for establishing a physics laboratory curriculum

In putting together the new curriculum three areas were identified that would impact on this enterprise. Firstly, students’ existing views of measurement would need to be taken into account in structuring the learning programme. The results from our studies reported earlier indicate clearly that whereas the set paradigm is the appropriate way to approach scientific measurement, students in general arrive at university with a point paradigm perspective which they would apply to situations in the physics laboratory. Even in the cases where students were exposed to practical work at school and could perform certain set actions, their reasoning was often rooted in the point paradigm. This is probably not surprising considering that measurements for pragmatic purposes and for reaching specific targets and goals are ubiquitous throughout our lives. Thus, we all have endless encounters judging whether we have measured out a sufficient amount of a quantity (closeness to the target) or whether the size of an object is acceptable (goodness of fit). All these activities involve comparisons and judgements against a pre-set criterion, where recourse to the point paradigm is highly appropriate, in particular because all such comparisons are performed in the “space” of the real data at hand. It is therefore interesting to reiterate the finding that the notion of “approximate” is associated with how far away from the perceived ideal the datum actually is, and that “intervals” are often associated with this “distance”.

The second area that impacted upon the design of the curriculum was the primary purpose of the course. At first year level many laboratory courses have the primary intention of demonstrating particular phenomena and principles of physics. Experimentation per se is usually seen as a vehicle for attaining these goals. Issues with regard to experimental technique and data analysis are usually taught as and when the need arises. However, as we now know from our studies reported earlier, this approach has not been satisfactory for promoting the understanding of physics as an experimentally based science. In setting up the new curriculum we decided to make the teaching of experimentation and the understanding of measurement and measurement uncertainty the prime focus. This does not, however, replace the need for opportunities for students to have hands-on experience with the phenomena that are being discussed in a course. Both types of activities are essential for understanding physics but each requires its own focus.

The final area of consideration is the content that has to be covered. Throughout the work that we have presented thus far, we have made the tacit assumption that there is an agreed body of knowledge that students have to internalise to enable them to participate meaningfully in the process of scientific experimentation. However, the situation here is problematic. The reason for this is that the methods of measurement presently being taught at the undergraduate level are different to those being used at the research level in a fundamental way. Much of what is currently being taught in data analysis courses at the first year level in physics goes against the internationally recommended practice for professional scientists.


Part of the problem has been that these recommendations, which will be discussed in detail later, have been slow in permeating the general scientific community, possibly because the new approach differs markedly in both philosophy and practice from established procedures. For example, the terms “random errors” or “systematic errors” and the various ad hoc prescriptions associated with these terms are hardly questioned. However, one of the difficulties encountered during teaching is the problem of reconciling the formalism, which is dependent on large data sets, with situations in the first year laboratory where more often than not only a few readings are available. The case is often exacerbated by the use of digital instruments, requiring a decision on how to deal with a measurement which can only be performed once or which yields the identical value each time. Another feature is the issue of significant figures, which often dominates the discussion of data analysis at this level and in many cases is used to introduce the concept of uncertainty. Here much of the discussion tends to involve rules of thumb and notions that are never brought within the formalism. These issues have made both the teaching and learning of experimentation difficult.

The issues mentioned above are not solvable within the framework of data analysis that has been most common up to now. One of the key features of this framework is that it relies on analysing data in terms of frequencies, so that at least 20 to 30 data points are required to perform a statistical analysis. Thus, there is no solution to the difficulties with handling small data sets. However, there has always been a competing theory that has been felt to be more appropriate for dealing with such limited data. There are fundamental philosophical differences between the two approaches. One of the key features of the competing theory is that the formalism leads directly to drawing conclusions about the measurand from the data in a logically consistent way. The approach views measurement as a problem of inference and uses probability theory to construct claims about a measurand based on the data (or datum) at hand. It is interesting to note that this approach was developed by the physicist Laplace. The traditional approach which relies on frequency analysis will be referred to as the frequentist approach, while the inference-based approach will be referred to as the probabilistic approach.

The difficulties inherent in the frequentist approach, together with the fragmented and inconsistent way of applying the formalism and terminology across different science disciplines, led the Bureau International des Poids et Mesures (BIPM) to review the situation with regard to calculating and reporting measurements and uncertainties. These efforts, which started in the late 1970s, culminated in a set of recommendations and guidelines issued in the 1990s. These recommendations have been adopted by all international standards organisations including IUPAP (International Union of Pure and Applied Physics) and IUPAC (International Union of Pure and Applied Chemistry). The two documents that are regarded as authoritative are the International Vocabulary of Basic and General Terms in Metrology (ISO, 1993) and the Guide to the Expression of Uncertainty in Measurement (ISO, 1995), often referred to as GUM.


A shorter version of the latter is publicly available as NIST Technical Note 1297 (Taylor and Kuyatt, 1994). The key point about the recommendations is that they are based on a probabilistic framework in keeping with the Laplace-Bayesian approach to analysing and interpreting data.

Although many of the formulae from mathematical statistics appear within the probabilistic framework, in particular where the data sets are sufficiently large, the interpretation of these formulae is not the same as within the frequentist framework, i.e. the new approach has far-reaching consequences for the teaching of measurement uncertainty if understanding is regarded as a desirable outcome. Soon after the publication of the ISO standards in 1995, Evangelinos et al. (1998) concluded that a probabilistic approach was not only required for teaching the techniques of data treatment but that this approach also serves as a guide for scientific enquiry in experimental work. Our own work has led us to a similar conclusion and we have adopted the probabilistic framework as the core of the new laboratory course in measurement. Because of the central importance of the new framework for teaching data analysis we present a brief review of the key concepts and features below. Various additional technical issues are discussed in the paper by Allie et al. (2003).

The probabilistic framework for measurement and uncertainty

In the frequentist approach it is usually assumed that there is a true value for a measurand and that each measurement that is performed in order to ascertain this value has some random scatter around this value. The distribution of the scatter is usually assumed to be Gaussian, and as more and more data are gathered the mean value of the Gaussian will tend to the true value. Thus, the true value of the measurand is considered to have no uncertainty associated with it. It is the data that are regarded as being “uncertain”. In contrast, in the probabilistic approach the data are regarded as the real manifestation of the phenomenon. Thus, the data are not regarded as having some random error; rather, the data (readings) are constants, and the inference drawn about the measurand has some degree of uncertainty associated with it. In other words, what we conclude must necessarily be incomplete since our knowledge about the measurand is based on the finite data we have at hand.

Since we have no direct way of interrogating the measurand, the inference (possibly together with some other prior knowledge we might have about the measurand) contains everything we know about the value of the measurand. Should we gather more data, this could modify the value of the measurand. Thus, the value of the measurand is a parameter that depends on the measurement that was performed. Even if we imagine there to be a “true” value we have no access to it, and hence the term can at most be regarded as a non-technical descriptor. We can only ever have access to estimates of the measurand, and since we can never have complete information about the measurand we also need to accompany the best inferred approximation with an estimate of how incomplete our knowledge is about the measurand.


The measure of the degree of this incompleteness is the uncertainty. One can think of the data as manifestations of the measurand or, more directly, that the measurand “causes” the data. More technically, in the definition of the ISO, “Uncertainty is a parameter associated with a measurement result, that characterizes the dispersion of the values that could reasonably be attributed to the measurand” (ISO 1993, 1995).

The formal route that enables the data at hand to be transformed into inferences about the measurand is probability theory. We start by taking the data we have at hand as well as any prior information and model these with appropriate probability density functions (pdfs). Appendix 3 contains brief details about pdfs. The most common pdfs used in metrology are the Gaussian, the rectangular (uniform) and the triangular ones. The pdfs are combined to form the final probability density function which encapsulates all our knowledge of the measurand. We point out for completeness that central to the notion of proceeding from the data to inferences about the measurand is Bayes’ Theorem, which formally allows us to proceed from being able to make statements about the data to making statements about the measurand itself.
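In symbols (a standard statement of the theorem, added here for reference rather than taken from the original text): writing \mu for the value of the measurand and D for the data at hand,

    p(\mu \mid D) \;\propto\; p(D \mid \mu)\, p(\mu),

where p(D \mid \mu) models the data, p(\mu) carries any prior information, and the left-hand side is the final pdf that encapsulates our knowledge of the measurand.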

For most practical purposes in metrology, the final pdf is summarised by two quantities which, formally, are the first and second moments of the distribution. (We can think of these from the point of view of mass distributions in mechanics as being like the centre of mass and the moment of inertia, respectively.) In the case of the Gaussian, triangular and uniform pdfs the first moment coincides with the centre of the distribution, which for the Gaussian and triangular cases is also associated with the region of maximum probability. In the context of measurement the first moment is termed the “best estimate”. The second moment (taken about the best estimate) is related to the width of the pdf and in the measurement context is termed the variance. The square root of the variance is termed the standard uncertainty u. If xb is the best estimate then the interval described by xb ± u is a measure of how incomplete our knowledge is. The purpose of measurement in general is to make this interval as narrow as possible. The area under the pdf spanned by u on either side of the best estimate is known as the level of confidence or the coverage probability. This tells us the probability that the measurand lies between xb − u and xb + u. In the case of the Gaussian this probability is 0.68 or 68%, while in the case of the triangular pdf the probability is 0.65 (65%) and in the case of the uniform distribution it is 0.58 (58%), as described in Appendix 3.
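As a worked check of the uniform-distribution figure quoted above (a standard derivation, sketched here for the reader rather than taken from the original text): for a uniform pdf of half-width a centred on the best estimate x_b,

    p(x) = \frac{1}{2a}, \qquad x_b - a \le x \le x_b + a,

    u^2 = \int_{x_b - a}^{x_b + a} (x - x_b)^2\, p(x)\, dx = \frac{a^2}{3}
        \quad\Rightarrow\quad u = \frac{a}{\sqrt{3}},

so the coverage probability is

    \int_{x_b - u}^{x_b + u} p(x)\, dx = \frac{2u}{2a} = \frac{1}{\sqrt{3}} \approx 0.58,

i.e. the 58% quoted above.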

A typical measurement result would thus be presented as “the best estimate of the value of the measurand is xb with a standard uncertainty u, and the probability that the measurand lies on the interval xb ± u is z%”. Figure 5 summarises the process that leads up to this statement.


Figure 5: A model for determining the result of a measurement.

[Diagram: prior information and new data are modelled by probability density functions, which are combined into a final pdf; from the final pdf follow inferences about the quantity being measured, summarised as a best estimate and an uncertainty.]

As noted earlier, one of the problems with the frequentist framework is that it is not possible to use the formalism to deal with a single measurement. However, the probabilistic formalism deals as easily and consistently with a single measurement as with many. In the latter case the evaluation of the uncertainty associated with the dispersion is undertaken using statistical methods and is called a Type A evaluation. Usually this involves using a Gaussian pdf and statistical formulae that have the same form as those in the frequentist approach. Although we do not go into details here, we point out that the interpretation is not the same. On the other hand, a Type B evaluation of uncertainty is estimated using all available non-statistical information such as instrument specifications, previous measurements, the observer’s personal judgement, etc. In practice this mostly means the use of either a triangular or a uniform pdf. In both cases the standard uncertainties can be calculated and treated identically, i.e. uncertainties from Type A evaluations can be combined with those from Type B evaluations in quadrature (as the square root of the sum of their squares). It is important to note that these classifications are not the same as “random” and “systematic errors”. The GUM suggests that although there may be some use for this distinction as a broad description, there is usually no intrinsic value in making it. Examples showing how one would deal with a single digital reading and a single analogue reading using only Type B evaluations are detailed in Appendix 3.
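As a concrete illustration of the two kinds of evaluation and their combination (a minimal sketch; the readings and the instrument resolution below are invented for illustration):

    import math

    # Hypothetical repeated readings (mm), invented for illustration.
    readings = [435.2, 434.8, 435.9, 434.5, 435.6]
    n = len(readings)

    # Type A evaluation: statistical analysis of the dispersion.
    mean = sum(readings) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))
    u_A = s / math.sqrt(n)          # standard uncertainty of the mean

    # Type B evaluation: a digital instrument with 0.1 mm resolution,
    # modelled by a uniform pdf of half-width a = resolution / 2,
    # for which the standard uncertainty is a / sqrt(3).
    a = 0.1 / 2
    u_B = a / math.sqrt(3)

    # Combined standard uncertainty: Type A and Type B contributions
    # are combined in quadrature.
    u_c = math.sqrt(u_A ** 2 + u_B ** 2)

    print(f"best estimate = {mean:.2f} mm, u_c = {u_c:.2f} mm")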

One of the ISO recommendations that lends itself naturally to introducing uncertainty to students is the “uncertainty budget”. The uncertainty budget is a list of all sources of uncertainty associated with making the measurement, together with an evaluation of each individual contribution based on a suitable pdf.




Most of the sources of uncertainty (e.g. zero offsets, reaction times, instrument ratings, etc.) are likely to be Type B evaluations. The overall or combined uncertainty uc is then calculated using the usual uncertainty propagation formulae.
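A small uncertainty budget can be tabulated and combined in a few lines of code. The sketch below is illustrative only: the sources, values and pdf choices are invented, and the contributions are combined in quadrature as described above.

    import math

    # A hypothetical uncertainty budget for a pendulum-period measurement.
    # Each entry: (half-width of the interval in seconds, divisor for the
    # chosen pdf). Uniform pdf: divide by sqrt(3); triangular: sqrt(6).
    budget = {
        "stopwatch resolution (uniform pdf)": (0.005, math.sqrt(3)),
        "reaction time (triangular pdf)":     (0.05,  math.sqrt(6)),
    }

    # Type B standard uncertainty for each source.
    u = {source: a / k for source, (a, k) in budget.items()}

    # A Type A contribution from repeated readings may be added alongside.
    u["scatter of repeats (Type A)"] = 0.02   # assumed value (s)

    # Combined standard uncertainty: quadrature sum of all contributions.
    u_c = math.sqrt(sum(ui ** 2 for ui in u.values()))
    for source, ui in u.items():
        print(f"{source}: u = {ui:.4f} s")
    print(f"combined standard uncertainty u_c = {u_c:.4f} s")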

Description of the new (probabilistic) laboratory course

The materials for the new laboratory course that we have designed attempt to weave together our understanding of students’ prior knowledge about measurement, our desired learning outcomes for the laboratory course and the nature of experimentation as specified by the ISO recommendations (ISO 1993, 1995). An interactive student workbook has been written¹ which aims to introduce the main ideas of measurement and uncertainty (see Table 17). Note that the units listed in Table 17 do not necessarily constitute individual lessons, but rather the broad content areas dealt with in the workbook. Students work through the activities in the workbook in small groups in a tutorial-type environment and are assisted when necessary by one of a number of roving tutors. Every other week the students are involved in laboratory activities supporting the new ideas about measurement and providing “hands-on” laboratory experiences. The style of these laboratory tasks is of the same form as described earlier and illustrated in Figure 4. The new course consists of a 3-hour session per week for 16 weeks. It was piloted with a class of 160 students in the Physics Department at the University of Cape Town in 2002, and repeated in 2003.

Table 17: Outline of the content of the interactive student workbook.

Unit 1, Introduction to measurement: The relationship between science and experiment. The nature and purpose of measurement.
Unit 2, Basic concepts of measurement: Probability and inference. Reading digital and analogue scales. The nature of uncertainty. A probabilistic model of measurement.
Unit 3, The single reading: Probability density functions. Representing knowledge graphically using a pdf. Evaluating standard uncertainties for a single reading. The result of a measurement.
Unit 4, Repeated readings that are dispersed: Dispersion in data sets. Evaluating standard uncertainties for multiple readings. Type A and Type B evaluation of uncertainties.
Unit 5, Working with uncertainties: Propagation of uncertainties. Combined standard uncertainty. The uncertainty budget. Comparing different results. Repeatability and reproducibility.
Unit 6, Modelling trends in data: Principle of least squares. Least squares fitting of straight lines.

1 The student workbook can be downloaded from http://www.phy.uct.ac.za/people/buffler/labmanual.html


The activities in the workbook have been designed to challenge students’ point paradigm about scientific measurement and provide them with opportunities to adopt a set paradigm. Chapter 1 of the workbook introduces the concept of the measurand and the idea that a measurement always involves a comparison with a reference standard that provides the units. Since the reference standard can never be infinitely small in size, this places a limit on the knowledge that we can obtain about a measurand through measurement. The next exercise explores the different purposes of measurement in both everyday and scientific contexts. The final part of the introduction deals explicitly with the difference between a reading from an apparatus and what one may conclude about the value of the measurand from the information at hand. This is a crucial aspect of the course. The data that are obtained from experiments are exact (numbers) while the information one has about a measurand is always imperfect. Students are asked to reflect back on a previous experiment and write down all the possible factors that could have influenced their results, and whether they thought these influences were “large” or “small” relative to each other. This leads naturally to the idea that a universally agreed framework is needed for the analysis and communication of scientific measurements in ways that are meaningful to all scientists, providing a clear motivation for the present course.

The next section in the book is concerned with the information about a measurand that may be inferred from a single reading of a digital or analogue scale on an instrument. Students are asked to consider a digital scale and to predict what digit will show if the sensitivity of the instrument is increased by a factor of ten. It is easy for most students to realise that there is an equal probability of the next (unknown) digit being a 1 or 2 or 3, etc. The particular instrument can, in principle, be made “infinitely sensitive”, i.e. providing a reading with an infinite number of digits. Even in the absence of all other sources of uncertainty, the knowledge about the measurand will always be limited to an interval, the width of which can never be reduced to zero. In this way a student’s belief in the possibility of knowing the “true value” is challenged. The same is shown to be the case with respect to reading an analogue scale, which always requires some form of judgement on the part of the observer. Even with a finer and finer graduation scale, judgement is always required when reading the scale. Students are also provided with simple apparatus and asked to make measurements and to consider both the uncertainty associated with reading the scale of the instrument and all other possible sources of uncertainty in each case.

At this stage the more formal tools for dealing with uncertainty are introduced, including probability density functions (pdfs). The idea that the pdf is a tool which summarises all the available knowledge about a particular measurand is illustrated in the contexts of reading digital and analogue scales, where rectangular and triangular pdfs are used. The best approximation (the most likely value, usually at the centre of the pdf) and the standard uncertainty (associated with the average width of the pdf) are shown to be the two parameters which may be used to summarise all the information contained in a particular pdf. Students are asked to read a variety of scales and determine the standard uncertainty associated with reading the scale in each case. The final stage in the sequence has to do with reporting the result of a measurement as a probabilistic statement, as discussed in the previous section. The various exercises are designed to reinforce the fundamental tenet of the set paradigm that the act of making a measurement involves modelling all the available data together with other available knowledge, thereby providing the result of a measurement.
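For reference, the standard uncertainties implied by these two pdfs are the standard GUM results (the worked number below is our own illustration): for a pdf of half-width a,

    u_{\mathrm{rectangular}} = \frac{a}{\sqrt{3}}, \qquad u_{\mathrm{triangular}} = \frac{a}{\sqrt{6}},

so that, for example, an analogue scale read to within a half-width of a = 0.5 mm and modelled with a triangular pdf gives u = 0.5/\sqrt{6} ≈ 0.2 mm.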

Only at this stage is the issue introduced concerning how to handle repeated observations of the same measurand which exhibit dispersion. This is deliberately delayed until after dealing with a single measurement since, as discussed earlier, the idea of using the average value to “handle all experimental errors” is so strongly entrenched from school in many students. By first dealing with the fundamentals of measurement uncertainty in the case of a single reading, dispersion in data may then be introduced as one of many sources of uncertainty, and not necessarily the dominant one. We deal with dispersion by using the same experimental context as used for the probes (see Appendix 1). One roll of the ball from a particular height produces a single spot on the floor. A second roll produces a second spot in a different place, and so on for a third and fourth roll. Two questions then arise: what is the best value to use for the measurand, and how is the observed scatter characterised? A plausibility argument is used to introduce the Gaussian pdf as the most appropriate pdf to use to model the available information from the data. The best approximation of the measurand is located at the centre of the Gaussian pdf and the standard uncertainty is associated with the “average scatter” of the data, which is related to the standard deviation of the mean. In this way the statistical formulae for the mean and standard deviation of the mean are introduced as a way to undertake a Type A evaluation of uncertainty. The quality of sets of data displaying dispersion is presented as being related to the degree of scatter observed in the data.
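A minimal sketch of such a Type A evaluation (ours, not the workbook’s; the four distances are invented) is:

    import math

    # Type A evaluation for dispersed repeated readings: the best
    # approximation is the mean, and the standard uncertainty is the
    # experimental standard deviation of the mean.
    d = [436.5, 426.3, 438.3, 442.7]  # landing distances in mm (illustrative)

    n = len(d)
    mean = sum(d) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))  # std. deviation
    u_A = s / math.sqrt(n)  # standard deviation of the mean

    print(f"best approximation: {mean:.1f} mm, standard uncertainty: {u_A:.1f} mm")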

Once students have been exposed to a range of sources of uncertainty and can undertake both Type A and Type B evaluations of uncertainty, the notion of drawing up an uncertainty budget as a convenient summary of the uncertainties in a measurement, and the procedures for combining sources of uncertainty, are introduced. The interactive examples in the workbook guide students through a range of measurement contexts, calculating standard uncertainties for all reasonable sources of uncertainty and “summing” these to provide a combined standard uncertainty for the measurement. In this way the theme of always thinking about all sources of uncertainty, introduced in the first chapter, culminates in the students being able to draw up an “uncertainty budget” and to determine a reasonable total uncertainty for a measurement.

The workbook also contains a number of practical examples for the students to undertake, most notably a speed of sound experiment which provides opportunities to analyse both repeated (dispersed) time readings and a single distance reading. The technique of undertaking a least squares analysis of linear data is also handled through a combination of theoretical and practical exercises. Appendices in the workbook provide exercises covering the areas of converting units, using tables and graphs, writing a laboratory report and using the statistics functions on a scientific calculator. There are also more advanced supplementary notes on probability density functions and expanded uncertainties.
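As an illustration of the least squares principle in the spirit of the speed of sound experiment (a sketch of our own; the data are invented), a straight line t = m x + c is fitted by minimising the sum of squared residuals, using the closed-form solution for the slope and intercept:

    # Least squares fit of a straight line t = m*x + c to (distance, time)
    # data; for sound travelling a distance x in time t, the speed is 1/m.
    x = [0.2, 0.4, 0.6, 0.8]                   # distances in m (invented)
    t = [0.61e-3, 1.15e-3, 1.78e-3, 2.33e-3]   # travel times in s (invented)

    n = len(x)
    xbar = sum(x) / n
    tbar = sum(t) / n
    m = (sum((xi - xbar) * (ti - tbar) for xi, ti in zip(x, t))
         / sum((xi - xbar) ** 2 for xi in x))
    c = tbar - m * xbar
    print(f"slope = {m:.3e} s/m, so speed of sound = {1/m:.0f} m/s")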

Study 4: Evaluation of the probabilistic physics laboratory course

Study 4 is concerned with the evaluation of the new course and materials based on the probabilistic framework of measurement. The course evaluation, partly reported in Buffler et al. (2003), focuses on students’ understanding of measurement, in particular their use of the point and set paradigms before and after instruction. The evaluative research questions were:

■ In what ways has students’ use of the point and set paradigms changed at the end of the new laboratory course?

■ In what ways do these changes differ from the results of the evaluation of the original course?

Evaluation Method

A new cohort of GEPS students was asked to respond to a set of probes before the start and after completion of the new course. The pre-course probes were written by 126 students and the post-course probes by 139 students. The sample for this evaluation study comprises the 106 students who wrote both sets of probes.

A total of eight probes were written before the course, and seven after the course. Students’ responses to four common probes are presented here. Students’ understanding of measurement for data collection was investigated using two probes. Firstly, their understanding of single readings was identified through the “Single Distance Reading” (SDR) probe, both before and after the course. Secondly, students’ understanding of data collection through an ensemble of readings was identified using the “Repeating Distance” (RD) probe. Students’ understanding of measurement in the area of data processing is not presented here, while their understanding of measurement in the area of data comparison was identified through the “Different Mean Same Spread” (DMSS) probe. Lastly, students’ understanding of the limits to uncertainty was investigated using the “No Uncertainty” (NU) probe. The full probes are presented in Appendix 1.

The administration of the pre-course and post-course probes strictly followed the strategies described earlier. The analysis of the responses identified students’ use of the point and set paradigms for each of the probes. Typical responses for point and set reasoning for data collection and data comparison were similar to those presented as illustrations in the survey above. Illustrations of responses indicative of point and set reasoning for single readings (the SDR probe) and for the perceived limits to uncertainty (the NU probe) are presented below. Frequencies of the use of point and set reasoning for pre-post course comparisons were compiled for each area of measurement for the whole sample. Changes in the use of paradigms for individual students were also identified and clustered.

Figure 6: The SDR (“Single Distance Reading”) probe.

The students work in groups on the experiment. Their first task is to determine d when h = 90 mm. One group lets the ball roll down the slope from a height h = 90 mm and uses a metre rule to measure the distance d. What they see is shown below.

[Diagram: the spot made by the ball on the paper, alongside a metre rule reading near 57.68 cm.]

A: “I think that the distance the ball travelled is exactly 57.68 cm.”
B: “I think that the distance the ball travelled is approximately 57.68 cm.”
C: “No, the distance the ball travelled is between 57.60 and 57.70 cm.”
D: “The distance the ball travelled is between 57.63 and 57.73 cm.”
E: “I don’t agree with any of you.”

With whom do you most closely agree? (Circle ONE): A B C D E

Explain your choice.


Results

(i) Use of a paradigm for data collection (a single reading)

Figure 6 shows the Single Distance Reading (SDR) probe in full, which explored students’ understanding relating to reading the scale on a ruler.

This question probed students’ ideas about how best to describe a single reading: as exact, as approximate, or as an interval. The view that the reading represents the measurand exactly would be classified as an indication of the use of a point paradigm. Students in this category gave varied reasons for doing so. Contrast the two student quotes below, for example:

The spot where the ball landed shows that it is exactly on 57.68 cm according to my observation. (SDR response)

In physics accuracy is very important. To really investigate the relationship a stable decision should be made. (SDR response)

On the other hand, an interval descriptor of the reading would be ascribed to the use of a set paradigm. Nobody chose option D, possibly due to the fact that the interval specified crossed a marking on the metre stick while the spot was clearly on one side of the marking. As will be seen in the section that follows, the markings on the metre stick play an important part in the way a large number of students perceive the situation. The difficulty in ascribing set paradigm reasoning to students whose discussion was in terms of an interval can be seen from two of the responses below:

The ball landed between 57.60 and 57.70. The true distance is unknown but it must be between 57.60 and 57.70. The distances 57.60 and 57.70 are known. (SDR response)

The distance the ball travelled is between 57.60 and 57.70 cm. The ball has not yet placed a specific mark on the metre rule. It is also between 57.60 and 57.70. (SDR response)

Both responses contain hints that the desired “exactness” has not been fulfilled, i.e. that the spot did not coincide with a metre rule marking. Thus, it is possible that some of the students who chose to discuss the situation in terms of the interval may be doing so with the view that the situation is “bad enough” to warrant the interval description. However, all interval-type responses were assumed for the present purpose to be associated with the set paradigm. An example of the small number of students who chose approximately (option B) but who reasoned in terms of an interval is:

It is approximately 57.68 cm, because the spot is between 57.6 and 57.7 and is close to 57.7. (SDR response)


However, the vast majority of the students who chose to use approximately to describe the situation seem to base this on their use of a point paradigm. The word approximate was used in a number of different ways, with the largest group using it to indicate “nearly equal to” or “close to” some exact reference, namely a metre stick marking. Typical ways of expressing these ideas were:

The metre rule cannot measure the exact point to two decimal places since its smallest unit is a millimetre, so you have to approximate the measurement that is most likely to be accurate. (SDR response)

The ball did not reach a distance of 57.7 cm so the approximate amount would be 57.68 cm. (SDR response)

A smaller percentage reasoned in terms of the finite size of the spot or the ball, such as:

The spot made by the ball is round so I can’t be sure where to take the measurements from… therefore it must not be exactly 57.68 cm but approximately 57.68 cm. (SDR response)

A few students emphasised the human judgement that was required, such as:

The ball did not land exactly on 57.70 cm. To the naked [eye] it looks like 57.68 but it could also not be, so it is approximately 57.58 cm. (SDR response)

One response which possibly explains why students may find the nature of intervals difficult is:

We are not sure about the distance the ball moved but in Physics we cannot put something in between, as it will make us suffer in our calculations. (SDR response)

A somewhat larger than expected number of responses for the pre-probe were not codeable. More than half of these were uncodeable because the question was interpreted as a “trick” question, in which the respondents felt that the length of the ramp and the curved path of the ball had to be added to the reading on the metre stick.

Table 18 presents the frequencies of the student responses for this probe in terms of the use of a point or set paradigm before instruction, against responses by the same students after the new laboratory course.


Table 18: Students’ use of paradigms for data collection: a single reading (SDR probe). (n = 106)

                              Paradigm after instruction
  Paradigm before        Point         Set           Not           Total
  instruction            paradigm      paradigm      codeable
  Point paradigm         20 (19%)      51 (48%)       2 (2%)        73 (69%)
  Set paradigm            7 (7%)        9 (8%)        0 (0%)        16 (15%)
  Not codeable            6 (6%)       11 (10%)       0 (0%)        17 (16%)
  Total                  33 (31%)      71 (67%)       2 (2%)       106 (100%)

The data in Table 18 indicate that whereas 69% of the students were point paradigm thinkers before instruction, this proportion dropped to 31% after instruction. However, it is not clear to what extent the use of interval language is an expression of internalisation of the concepts or whether it is a rote-learnt response, in particular where students started out within the point paradigm. One of the interesting aspects is that particularly those students who chose “approximate” in the pre-probes changed over from the point paradigm to the set paradigm.

(ii) Use of a paradigm for data collection (ensemble of data)

The “Repeating Distance” (RD) probe was designed to investigate students’ reasons for repeating measurements of the same quantity (see Appendix 1). Table 19 presents the frequencies of the student responses for this probe in terms of the use of a point or set paradigm before instruction, against responses by the same students after the new laboratory course.


Table 19: Students’ use of paradigms for data collection: an ensemble of readings (RD probe). (n = 106)

                              Paradigm after instruction
  Paradigm before        Point         Set           Not           Total
  instruction            paradigm      paradigm      codeable
  Point paradigm          3 (3%)       70 (66%)       3 (3%)        76 (72%)
  Set paradigm            0 (0%)       26 (24%)       0 (0%)        26 (24%)
  Not codeable            0 (0%)        4 (4%)        0 (0%)         4 (4%)
  Total                   3 (3%)      100 (94%)       3 (3%)       106 (100%)

The data in Table 19 show a distinct shift after instruction towards the use of a set paradigm, i.e. the notion of repeating measurements in order to find a mean. Whereas before teaching almost three quarters of the students used the point paradigm, hardly any of the students did so after instruction. Deeper analysis of the data shows that the largest shift from the use of the point to the set paradigm occurred in the group of students who initially repeated to find the recurring value, but later chose to calculate a mean.

Although the data in Table 19 suggest a strong shift towards the set paradigm after instruction, caution needs to be applied when assigning all responses that mention taking an average as being indicative of the set paradigm. As found in Studies 2 and 3, it is likely that many of these students are making the decision to calculate the mean by rote learning that this is the right procedure to follow. These students seem to be motivated to repeat their readings so that a variety may be obtained in order for an average to be calculated, sometimes as representing the “true value”, as indicated by the following example:

If they repeat it a few more times they can then use the average. That would give them what the true value is. (RD response)

Many of these students think that by taking an average, all uncertainty may be reduced:

By following the same procedure a few more times, an average can be worked out. This would allow all other factors influencing this result to be ruled out. (RD response)


(iii) Use of a paradigm for data set comparison

The “Different Mean Same Spread” (DMSS) probe provided two ensembles of measurements together with their calculated means and asked the student to comment on the compatibility of the two data sets (see Appendix 1).

The main focus in the analysis was on the ways in which the students used the notion of spread and overlapping uncertainties in their reasoning. The idea of the spread as an indicator of the uncertainty of a set of measurements needed to be conceptualised and applied when deciding whether or not the two results agree. After analysis of the responses according to the use of the imposed or internalised set paradigm, Table 20 summarises the changes in students’ perceptions of measurement.
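One simple version of such an overlap criterion (our own sketch; the probes themselves do not prescribe a formula) is that two results x1 ± u1 and x2 ± u2 agree if their difference is consistent with zero within a small multiple of the combined standard uncertainty:

    import math

    def results_agree(x1, u1, x2, u2, k=2):
        # Two results agree if |x1 - x2| lies within k combined
        # standard uncertainties (k = 2 is a common, conventional choice).
        return abs(x1 - x2) <= k * math.sqrt(u1 ** 2 + u2 ** 2)

    print(results_agree(436.0, 3.5, 442.0, 4.0))  # True: the intervals overlap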

Table 20: Students’ use of paradigms for data comparison (DMSS probe). (n = 106)

                              Paradigm after instruction
  Paradigm before          Imposed set   Internalised   Not          Total
  instruction              paradigm      set paradigm   codeable
  Imposed set paradigm     10 (9%)       93 (88%)       3 (3%)       106 (100%)
  Internalised set          0 (0%)        0 (0%)        0 (0%)         0 (0%)
  paradigm
  Not codeable              0 (0%)        0 (0%)        0 (0%)         0 (0%)
  Total                    10 (9%)       93 (88%)       3 (3%)       106 (100%)

From the data in Table 20 it can be seen that all students were classified according to the imposed set paradigm before instruction, while after the new laboratory course 88% of the students responded in a way commensurate with the internalised set paradigm.

(iv) Understanding uncertainty

The “No Uncertainty” (NU) probe investigated the students’ ideas about the existence of a true value and whether or not perfect knowledge about a measurand may be known from measurement. A slightly different version of this probe was used for the pre- and post-test. The version used before instruction (NU1) was as follows:

When they are finished, the two groups discuss how they can improve their rolling ball experiment next time.

A: “If we practice enough we will be able to perfect our technique so that only one measurement will give us the true value.”


B: “No, that is not possible.”

Most students who were categorised according to the point paradigm claimed that through practice you become an expert:

Because practice makes perfect. (NU response)

Other point paradigm users claimed that practice will provide a better reading, sometimes stating that experimental error will be reduced:

The more practice you get the more accurate your results become. (NU response)

Student responses which were classified according to the set paradigm included those which indicated that there will always be a variation, even with practice:

It is impossible to even think that the ball will land in the same spot. (NU response)

No matter how much they practice it will never give them the same measurement. There are too many variables. (NU response)

More than 20% of the students stated that there will always be some influence from outside conditions:

No matter how much you practice there will always be factors which we have no control over, influencing our results. Therefore the results will always be different. (NU response)

It is not humanly possible to be 100% perfect when executing experiments. (NUresponse)

About 10% of the students wanted to take several readings and calculate a mean. Although these students were classified according to the set paradigm, it was not clear what motivated this response or whether or not they believed that the mean provided the true value:

No matter how many times they do it, they won’t be perfect. They will always have to find the average because of human error. (NU response)

To be accurate you need to take a number of readings and work out an average. (NU response)

The post-test version (NU2) of this probe took the form:

Two groups continue to discuss doing experiments in physics.

A: “It is possible for scientists to design a physics experiment that will provide a result with no uncertainty.”

B: “No, it is impossible to have such an experiment.”


In the post-test, all the codeable B-responses were classified according to the set paradigm. About a quarter of the students stated that there will always be environmental or experimental conditions that contribute towards the uncertainty in a result:

There will always be a measure of uncertainty during experiments as no result can be perfect. This is because various factors (weather, apparatus, etc.) all contribute to the result. (NU response)

It is impossible to have such an experiment because when doing an experiment external factors always play a part. (NU response)

Some students stated that “errors” or “mistakes” contribute towards the uncertainty:

No matter if you are a scientist you can still make mistakes. (NU response)

We are all humans and will always make a mistake. Therefore it will always be necessary to have the standard uncertainty. (NU response)

About another quarter of the students considered the apparatus itself to be the reason why there will always be some uncertainty:

It is impossible to get an exact measurement for anything. There is no apparatus to give a measurement with an infinite number of digits. (NU response)

No matter how precise the equipment we use, we can never determine a result without any uncertainty. We will always wonder if we could know another digit in the reading. (NU response)

Another quarter of the students simply stated that the true value of a measurand may never be known:

Because the true value of a measurand can never be known, that is why you have to have a result with its standard uncertainty. (NU response)

There are no exact measurements. (NU response)

Many of these responses were probably rote responses, as were those from the group that stated that you always need to state an uncertainty as part of the measurement result:

For every result you get there must be the uncertainty of your measurement. (NU response)

It is impossible to ever find a true reading due to the standard uncertainty. (NU response)


Table 21: Students’ use of paradigms for notions of uncertainty (NU probe). (n = 106)

                              Paradigm after instruction
  Paradigm before        Point         Set           Not           Total
  instruction            paradigm      paradigm      codeable
  Point paradigm          0 (0%)       49 (46%)       6 (6%)        55 (52%)
  Set paradigm            1 (1%)       46 (44%)       4 (3%)        51 (48%)
  Not codeable            0 (0%)        0 (0%)        0 (0%)         0 (0%)
  Total                   1 (1%)       95 (90%)      10 (9%)       106 (100%)

Table 21 presents the data for the NU probe. Once again there are strong indications that many students are using set paradigm reasoning after the course to deal with the question.

Comparison between the traditional and new laboratory courses

The responses from all four probes (SDR, RD, DMSS and NU) were combined to form a single classification for each student (pre- and post-testing). If three of the four responses to the probes were classified as either “point” or “set”, then the student was classified as “consistent point paradigm” or “consistent set paradigm” respectively for the set of four probes. The “inconsistent paradigms” classification was used for any other combination of point, set and “not codeable” responses. The results are shown in Table 22.
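Expressed as code, the combination rule is simply (a sketch of our own; the function name and codes are illustrative):

    def overall_classification(responses):
        # responses: four probe codes, each "point", "set" or "not codeable".
        # Three or more of a kind gives a "consistent" classification.
        if responses.count("point") >= 3:
            return "consistent point paradigm"
        if responses.count("set") >= 3:
            return "consistent set paradigm"
        return "inconsistent paradigms"

    # e.g. a student coded "point" on three probes and "set" on one:
    print(overall_classification(["point", "point", "set", "point"]))
    # -> consistent point paradigm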

It can be seen from Table 22 that before the course 73% of the sample were classified as using a “consistent point paradigm” and 26% as using “inconsistent paradigms”. After the course the vast majority of the sample (89%) were classified as “consistent set paradigm”, with 9% using “inconsistent paradigms”. These outcomes may be compared with the results from a similar set of probes administered to a cohort of 70 students attending the original version of our laboratory course (see Study 3 above). For this cohort very similar pre-course approaches (67% using “consistent point paradigm” and 25% “inconsistent paradigms”) were found. However, a much smaller percentage of the students in the traditional course were classified as “consistent set paradigm” users after their course (16% compared with 89% after the new course).


Table 22: Students’ overall use of paradigms before and after instruction in the new course (based on all probes). (n = 106)

                              Paradigms used after instruction
  Paradigms used          Consistent    Inconsistent   Consistent    Total
  before instruction      point         paradigms      set
                          paradigm                     paradigm
  Consistent point         0 (0%)        7 (7%)        70 (66%)      77 (73%)
  paradigm
  Inconsistent             1 (1%)        3 (3%)        24 (22%)      28 (26%)
  paradigms
  Consistent set           0 (0%)        0 (0%)         1 (1%)        1 (1%)
  paradigm
  Total                    1 (1%)       10 (9%)        95 (89%)     106 (100%)

Discussion

The materials in our new course take into account students’ prior knowledge about the nature of measurement and we have adopted what we believe to be a logically consistent framework, i.e. the probabilistic approach to measurement and uncertainty as advocated by the ISO. This approach leads naturally from the exact nature of a reading to the uncertain nature of the inferences about the measurand, in both the case of single and of repeated readings. Our evaluation of the new course has indicated that it is much more successful than the traditional course in moving students’ understanding towards the set paradigm of measurement. Since other research findings (for instance Campbell et al., 2000) indicate that responses to hypothetical scenarios of experimental work do not always correspond with decisions that students make in practice in the laboratory, further observational studies are needed to confirm these apparently positive findings.

When the data from the four probes presented here are considered together, there is further compelling evidence that most students come to university with a view that the objective of scientific measurement is “exactness”, one of the main indicators of the point paradigm. This view is related to two underlying ideas of the nature of experimentation in physics. Firstly, high quality measurement is equated with ‘accuracy’, and in many cases students use the terms ‘good results’ and ‘accurate results’ interchangeably. Secondly, it seems that many students perceive the purpose of collecting experimental data to be the identification of numerical values for substitution into formulae. Since mathematics deals with either specific symbols or numbers, it is clear that the notion of an interval as a result from a physics experiment falls outside of this framework. Although many students use the word “approximate” when considering a single reading in a scientific context, the data indicate that this is not meant to indicate a “fuzzy” (extended) point (Evangelinos et al., 2002), but rather a point “close to” a particular reference, which is either the perceived true value or the nearest mark on the instrument.

One of the interesting aspects of our findings is that half of the students who chose “approximate” in the pre-probes changed over from the point to the set paradigm. In future studies this will be investigated more carefully, as it is clear that students’ notion of “approximate” as a precursor to developing ideas that incorporate the scientific idea of “best approximation”, coupled with an interval descriptor, is a key issue for pedagogy.

A number of further issues do arise from our data. In particular, many students appear to be using intervals without an apparent understanding of the nature of a measurement result. Although all responses to probes in which an interval was used were coded in the present study as being indicative of the set paradigm, it was not clear from the four probes alone whether or not these students had indeed internalised the set paradigm or were responding by rote. Results from other probes, not presented here, do indicate that many students had not fully understood the core idea that the pdf models all available knowledge about the measurand and that the best estimate and standard uncertainty (together defining an interval) is just a convenient way of representing the measurement result. This view is further supported by data obtained in the students’ mid-term examination, which included a written data analysis component. Although most students were able to apply interval reasoning appropriately to situations where, for example, they were asked to compare two measurement results for agreement, from written responses to other questions it became clear that many students had an inappropriate understanding of an interval, especially with regard to it representing a pdf.

Conclusion

Students’ perceptions of measurement and uncertainty

Our surveys of students’ understanding of measurement and uncertainty indicate that the point and set paradigms are useful theoretical constructs underlying a range of measurement actions at different stages of experimentation, i.e. data collection, data processing and data comparison. These paradigms usefully allow the classification of actions of undergraduate students of a variety of abilities and practical experiences, both in physics and chemistry. Although the reported studies involved first year undergraduates, other reports suggest that the same paradigms may be used for explaining the reasoning underpinning measurement actions of more experienced students (Lubben et al., 2000; Davidowitz et al., 2001). As suggested by Redish (2003), the set of probes with accompanying coding schemes provides a validated diagnostic tool for students’ ideas of measurement.


The findings also suggest that students use these paradigms consistently across the different stages of experimentation. However, some rote usage of the set paradigm terminology and procedures during data collection and data processing can be identified, which can be unmasked as point paradigm reasoning by considering the actions during data comparison. But what determines the choice of paradigms? More specifically, why do the majority of novice undergraduates use the point paradigm for determining measurement actions? In order to explore this issue, it may be helpful to look outside the physics laboratory. The consistency of paradigm usage across different experimental stages in the physics laboratory contrasts with inconsistency in their use for measurement actions for one of these stages, i.e. data processing, across different contexts. Lubben et al. (2004) find that students use different paradigms for measurement in everyday and laboratory situations. Their analysis shows, however, that for the majority of students the choice of paradigm is consistently determined by the perceived purpose of the measurement being taken, i.e. by an epistemology of the nature of measurement or investigative frames (Millar et al., 1994; Lippmann, 2003). Séré et al. (2001) presented students with measurements in different science disciplines (biology and physics) and in everyday situations. They found that students used different epistemologies and ontologies of the nature of science for processing the data in the various contexts. The notion of an epistemology of the nature of measurement, as distinct from an epistemology of the nature of science, needs further research. Resulting research findings will then allow targeting the teaching programme more effectively.

The findings do not provide evidence that our premise of social constructivism as a perspective for researching knowledge and knowledge acquisition has been misplaced. Scenario-based probes asking students for measurement actions with justification resulted in illuminative and differentiated responses. These responses allowed for meaningful inferences of underlying ‘paradigms’, first constructed from response sets and later tested for coverage on a more diverse sample of students. The probes have proven useful in evaluating laboratory courses for major learning outcomes. However, the jury is still out on our interpretation of the point paradigm as a misconception of measurement in the laboratory. The fact that the point and set paradigms are used by the same student for different contexts (sometimes appropriately, sometimes not) may suggest that these paradigms could be considered to be p-prims (diSessa, 1993) rather than misconceptions. This debate will need to be settled after the evaluation of the effectiveness of teaching programmes based on either cognitive conflict strategies or explicit primitive application strategies (for instance Lippmann, 2003).

Evaluation of introductory laboratory courses

Following our studies of experimentation in physics, several others (Davidowitz et al., 2001; Rollnick et al., 2001, 2002) confirm that point and set paradigms are equally recognizable in measurement decisions in chemistry. This reinforces the premise that measurement relies on a specific domain of knowledge, rather than a set of subject-related skills. The commonality also implies that understanding of measurement is generic and, theoretically, could be transferred between science disciplines. Teaching the principles of measurement in physics would then support the application of this knowledge in chemistry (or biology). Although confirmation of the transferability of understanding of measurement would make teaching considerably more efficient, no report of a test of such transferability has been found.

We have come to the conclusion that one of the main purposes of introductory laboratory courses is the shift of students’ use of a point paradigm to that of a set paradigm when taking measurement decisions. Our research findings indicate that a traditional (frequentist) laboratory course only provides limited improvement in students’ understanding of measurement and uncertainty, i.e. the use of the set paradigm. It is noticeable that students entering university with stronger entry qualifications, and usually with a wider experience of practical work, show a better understanding of measurement at entry, but hit the same ceiling in their understanding at the end of a traditional laboratory course as do less well qualified and less experienced students.

The results show that students’ understanding of measurement improves considerably more if taught through a probabilistic approach. Part of such a positive effect may be due to the fact that this approach unifies the treatment of single and repeated readings, and that it focuses on the uncertainty of the knowledge about the measurand rather than the reliability of the data collected. This approach avoids the confusions between systematic and random error and between accuracy and precision of the data (Séré et al., 1993; Tomlinson et al., 2001), and replaces the negative connotations of the term error by the more neutral term uncertainty (Fairbrother and Hackling, 1997). Apart from this pedagogic success, a major advantage of the probabilistic approach to measurement is the fact that regulating professional bodies, such as the ISO, recommend this approach for treating and reporting scientific research data.

Acknowledgements

The research and development project reported here was sponsored by a Higher Education Link of the Department for International Development (DFID) administered by the British Council, the National Research Foundation (South Africa) and by research funds from the University of Cape Town (Academic Development Programme) and the University of York. We thank the staff of the UCT Physics Department workshop and the tutors supporting students in the different cohorts. The cartoon figures used in the probes are from “King Tut” by Geoff Watson. We also thank the publishers John Wiley & Sons and Taylor & Francis for their kind permission to draw on our papers previously published. Above all we wish to thank the students who have participated in our studies.


References

Allie, S. and Buffler, A. (1998) A course in tools and procedures for Physics 1. American Journal of Physics, 66 (7), 613-624.
Allie, S., Buffler, A., Campbell, B., Lubben, F., Evangelinos, D., Psillos, D. and Valassiades, O. (2003) Teaching measurement in the introductory physics laboratory. The Physics Teacher, 41 (7), 394-401.
Allie, S., Buffler, A., Kaunda, L., Campbell, B. and Lubben, F. (1998) First year physics students’ perceptions of the quality of experimental measurements. International Journal of Science Education, 20 (4), 447-459.
Allie, S., Buffler, A., Kaunda, L. and Ingles, M. (1997) Writing-intensive physics laboratory reports: tasks and assessment. The Physics Teacher, 35, 399-405.
APU (1988) Science at age 13: review report. London: Department of Education and Science.
Bartholomew, H., Osborne, J. and Ratcliffe, M. (2003) Teaching students ‘ideas-about-science’: five dimensions of effective practice. Science Education, 87, 1-28.
Black, P. (1993) The purposes of science education. In R. Hull (ed): ASE Secondary Science Teachers’ Handbook. London: Simon & Schuster.
Brown, S., Collins, A. and Duguid, P. (1989) Situated cognition and the culture of learning. Educational Researcher, 18, 32-41.
Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2001) The development of first year physics students’ ideas about measurement in terms of point and set paradigms. International Journal of Science Education, 23 (11), 1137-1156.
Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2003) Evaluation of a research-based curriculum for teaching measurement in the first year physics laboratory. Paper presented at the bi-annual Conference of the European Science Education Research Association, Noordwijkerhout, The Netherlands, August 2003.
Campbell, B., Kaunda, L., Allie, S., Buffler, A. and Lubben, F. (2000) The communication of laboratory investigations by university entrants. Journal of Research in Science Teaching, 37 (8), 839-853.
Chi, M., Feltovitch, P. and Glaser, R. (1981) Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152.
Cobb, P. and Bauersfeld, H. (1995) The emergence of mathematical meaning: interaction in classroom cultures. Hillsdale, NJ: Erlbaum.
Coelho, S. and Séré, M-G. (1998) Pupils’ reasoning and practice during hands-on activities in the measurement phase. Research in Science and Technological Education, 16 (1), 79-96.
d’Agostini, G. (1999) Bayesian Reasoning in High Energy Physics: Principles and Applications. (CERN Yellow Report 99-3). Geneva: CERN.
Davidowitz, B., Lubben, F. and Rollnick, M. (2001) Undergraduate science and engineering students’ understanding of the reliability of chemical data. Journal of Chemical Education, 78 (2), 247-252.
diSessa, A. (1993) Towards an epistemology of physics. Cognition and Instruction, 10, 105-225.
diSessa, A. and Sherin, B. (1998) What changes in conceptual change? International Journal of Science Education, 20, 1155-1191.
Elby, A. (2001) Helping physics students learn how to learn. American Journal of Physics, 69 (7), S54-S64.
Etkina, E., Van Heuvelen, A., Brookes, D. and Mills, D. (2002) Role of experiments in physics instruction: a process approach. The Physics Teacher, 40, 351-355.
Evangelinos, D., Psillos, D. and Valassiades, O. (1998) Students’ introduction to measurement concepts: a metrological approach. In European Commission Report on Project PL 95-2005 Labwork in Science Education. pp. 561-587.
Evangelinos, D., Psillos, D. and Valassiades, O. (2002) An investigation of teaching and learning about measurement data and their treatment in the introductory physics laboratory. In D. Psillos and H. Niedderer (eds): Teaching and learning in the science laboratory. Dordrecht: Kluwer Academic Publishers. pp. 179-190.
Fairbrother, R. and Hackling, M. (1997) Is this the right answer? International Journal of Science Education, 19 (8), 887-894.
Garratt, J., Horn, A. and Tomlinson, J. (2000) Misconceptions about error. University Chemistry Education, 4 (2), 54-57.
Germann, P. and Aram, R. (1996) Student performances on the science processes of recording data, analysing data, drawing conclusions and providing evidence. Journal of Research in Science Teaching, 33 (7), 773-798.
Germann, P., Aram, R. and Burke, G. (1996) Identifying patterns and relationships among the responses of seventh-grade students to the science process skill of designing experiments. Journal of Research in Science Teaching, 33 (1), 79-99.
Giordano, J. (1997) On the sensitivity, precision and resolution in DC Wheatstone bridges. European Journal of Physics, 18 (1), 22-27.
Gott, R. and Duggan, S. (1995) Investigative work in the science curriculum. Buckingham: Open University Press.
Gott, R. and Duggan, S. (1996) Practical work: its role in the understanding of evidence in science. International Journal of Science Education, 18 (7), 791-805.
Halloun, I. and Hestenes, D. (1985) The initial knowledge state of college physics students. American Journal of Physics, 53, 1056-1065.
Hammer, D. (1994) Epistemological beliefs in introductory physics. Cognition and Instruction, 12, 151-183.
Hammer, D. and Elby, A. (2003) Tapping epistemological resources for learning science. Journal of the Learning Sciences, 12 (1), 53-90.
Hodson, D. (1996) Laboratory work as a scientific method: three decades of confusion and distortion. Journal of Curriculum Studies, 28 (2), 115-135.
Hodson, D. (1998) Taking practical work beyond the laboratory. International Journal of Science Education, 20 (6), 629-632.
Howie, S. and Hughes, C. (1998) Mathematics and science literacy of final-year school students in South Africa: a report of the performance of South African students in TIMSS. Pretoria: HSRC.
International Organization for Standardization (1993) International Vocabulary of Basic and General Terms in Metrology (VIM). Geneva: ISO.
International Organization for Standardization (1995) Guide to the Expression of Uncertainty in Measurement (GUM). Geneva: ISO.
Kirschner, P. and Huisman, W. (1998) Dry laboratories in science education: computer-based practical work. International Journal of Science Education, 20 (6), 665-682.
Kuhn, T. (1970) The structure of scientific revolutions. Chicago: University of Chicago Press.
Larkin, J. and Reif, F. (1979) Understanding and teaching problem solving in physics. European Journal of Science Education, 1, 191-203.
Lave, J. and Wenger, E. (1991) Situated learning: legitimate peripheral participation. Cambridge: Cambridge University Press.
Laws, P. (1996) Millikan Lecture 1996: promoting active learning based on physics education research in introductory physics courses. American Journal of Physics, 65 (1), 14-21.
Leach, J., Millar, R., Ryder, J. and Séré, M-G. (2000) Epistemological understanding in science learning: the consistency of representations across contexts. Learning and Instruction, 10, 497-527.
Lemke, J. (1997) Cognition, context and learning: a social semiotic perspective. In D. Kirschner and J. Whitson (eds): Situated cognition: social, semiotic and psychological perspectives. Mahwah, NJ: Lawrence Erlbaum. pp. 37-56.
Lippmann, R. (2003) Students’ understanding of measurement and uncertainty in the physics laboratory: social construction, underlying concepts, and quantitative analysis. Unpublished PhD thesis: University of Maryland.
Lubben, F., Davidowitz, B. and Rollnick, M. (2000) Undergraduate engineering and science students’ understanding of the reliability of chemical data. In S. Mahlomaholo (ed): Proceedings of the 8th Annual Conference of the Southern African Association for Research in Mathematics and Science Education, Port Elizabeth. pp. 267-273.
Lubben, F., Buffler, A., Allie, S. and Campbell, B. (2001) Point and set reasoning in practical science measurement by entrant university freshmen. Science Education, 85, 311-327.
Lubben, F., Campbell, B., Buffler, A. and Allie, S. (2004) The influence of context on judgements of the quality of experimental measurements. In A. Buffler and R. Laugksch (eds): Proceedings of the 12th Annual Conference of the Southern African Association for Research in Mathematics, Science and Technology Education, Cape Town. pp. 569-577.
Lubben, F. and Millar, R. (1996) Children’s ideas about the reliability of experimental data. International Journal of Science Education, 18, 955-968.
Masnick, A. and Morris, B. (2002) Reasoning from data: the effect of sample size and variability on children’s and adults’ conclusions. In W. Gray and C. Schunn (eds): Proceedings of the 24th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum. pp. 643-648.
McDermott, L. and Shaffer, P. (1992) Research as a guide for curriculum development: an example from introductory electricity, part 1. American Journal of Physics, 60, 994-1013.
McGinn, M. and Roth, W-M. (1999) Preparing students for competent scientific practice: implications of recent research in science and technology studies. Educational Researcher, 28 (3), 14-24.
Meester, M. and Maskill, R. (1995) First year chemistry practicals at universities in England and Wales: aims and the scientific level of the experiments. International Journal of Science Education, 17 (5), 575-588.
Millar, R., Le Marechal, J-F. and Tiberghien, A. (1999) ‘Mapping’ the domain: varieties of practical work. In D. Psillos and H. Niedderer (eds): Teaching and learning in the science laboratory. Dordrecht: Kluwer Academic Publishers. pp. 33-59.
Millar, R., Lubben, F., Gott, R. and Duggan, S. (1994) Investigating in the school science laboratory: conceptual and procedural knowledge and their influence on performance. Research Papers in Education, 9, 207-248.
Montes, L. and Rockley, M. (2002) Teacher perceptions in the selection of experiments. Journal of Chemical Education, 79 (2), 244-247.
Pfundt, H. and Duit, R. (1994) Bibliography: Students’ Alternative Frameworks and Science Education. Kiel: IPN.
Redish, E. (2003) Teaching physics with the physics suite. Hoboken, NJ: Wiley.
Rollnick, M., Dlamini, B., Lotz, S. and Lubben, F. (2001) Views of South African chemistry students in university bridging programmes on the reliability of experimental data. Research in Science Education, 31 (4), 553-573.
Rollnick, M., Lubben, F., Lotz, S. and Dlamini, B. (2002) What do underprepared students learn about measurement from introductory laboratory work? Research in Science Education, 32 (1), 1-18.
Roth, W-M., McRobbie, C., Lucas, K. and Boutonne, S. (1997) Local production of order in traditional science labs. Learning and Instruction, 8 (2), 107-136.
Roth, W-M. and Roychoudhury, A. (1993) The development of science process skills in authentic contexts. Journal of Research in Science Teaching, 30, 127-152.
Ryder, J. and Leach, J. (2000) Interpreting experimental data: the views of upper secondary school and university science students. International Journal of Science Education, 22 (10), 1069-1084.
Schauble, L., Klopfer, L. and Raghavan, J. (1991) Students’ transition from an engineering model to a science model of experimentation. Journal of Research in Science Teaching, 28, 859-882.
Séré, M-G., Fernandez-Gonzalez, F., Gallegos, J., Gonzalez-Garcia, F., De Manuel, E., Perales, J. and Leach, J. (2001) Images of science linked to labwork: a survey of secondary school and university students. Research in Science Education, 31, 499-523.
Séré, M-G., Journeaux, R. and Larcher, C. (1993) Learning the statistical analysis of measurement error. International Journal of Science Education, 15 (4), 427-438.
Shepardson, D. and Moye, E. (1999) The role of anomalous data in restructuring fourth graders’ frameworks for understanding electric circuits. International Journal of Science Education, 21 (1), 77-94.
Song, J. and Black, P. (1992) The effect of concept requirements and task contexts on pupils’ performance in control variables. International Journal of Science Education, 14 (1), 83-93.
Southerland, S., Abrams, E., Cummins, C. and Anzelmo, J. (2001) Understanding students’ explanations of biological phenomena: conceptual frameworks or p-prims? Science Education, 85, 328-348.
Strauss, A. and Corbin, J. (1990) Basics of qualitative research: grounded theory procedures and techniques. Newbury Park: Sage.
Taylor, B. and Kuyatt, C. (1994) Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results. (NIST Technical Note 1297). Also available at http://physics.nist.gov/Pubs/guidelines/contents.html.
Thomson, V. (1997) Precision and the terminology of measurement. The Physics Teacher, 35 (1), 15-17.
Tiberghien, A., Veillard, L., Le Marechal, J-F., Buty, C. and Millar, R. (2001) An analysis of labwork tasks used in science teaching at upper secondary school and university levels in several European countries. Science Education, 85 (5), 483-508.
Tomlinson, J., Dyson, P. and Garratt, J. (2001) Student misconceptions of the language of error. University Chemistry Education, 5 (1), 1-8.
Tornkvist, S., Petterson, K. and Transtromer, G. (1993) Confusion by representation: on students’ comprehension of the electric field concept. American Journal of Physics, 61, 335-338.
Vellom, R. and Anderson, C. (1999) Reasoning about data in middle school science. Journal of Research in Science Teaching, 36 (2), 179-199.
Volkwyn, T., Allie, S., Buffler, A., Lubben, F. and Campbell, B. (2004) First year physics students’ understanding of measurement in the context of laboratory practicals. In A. Buffler and R. Laugksch (eds): Proceedings of the 12th Annual Conference of the Southern African Association for Research in Mathematics, Science and Technology Education, Cape Town. pp. 1011-1017.
Vygotsky, L. (1978) Mind in society. Oxford: Blackwell.
Wittmann, M. (2002) The object coordination class applied to wave pulses: analysing student reasoning in wave physics. International Journal of Science Education, 24 (1), 97-118.
Wittmann, M., Steinberg, N. and Redish, E. (2003) Understanding and affecting student reasoning about sound waves. International Journal of Science Education, 25 (8), 991-1013.


Appendix 1: Probes used in the studies

This appendix provides the rolling ball experimental context, 13 probes for identifying students’ ideas about measurement and uncertainty, and a final reflection sheet. All the probes have been discussed in the text. They have been validated in order to assess students’ use of a point or set paradigm in dealing with measurement. Reasoning and action about measurement are solicited in three stages of experimental work, i.e. data collection, data processing and comparison of results. In addition, more fundamental ideas about the nature of uncertainty are probed directly. The probes are grouped as indicated in Table 2, which has been reproduced below.

Table 2: The probes used in studies.

Probe code   Name of probe                         Aspect of measurement

SDR          “Single Distance Reading”             Data collection
RD           “Repeating Distance”                  Data collection
RDA          “Repeating Distance Again”            Data collection
RT           “Repeating Time”                      Data collection

UR           “Using Repeats”                       Data processing
AN           “Anomaly”                             Data processing
SLG          “Straight Line Graph”                 Data processing

SMDS         “Same Mean Different Spread”          Comparison of results
DMSS         “Different Mean Similar Spread”       Comparison of results
DMOS         “Different Mean Overlapping Spread”   Comparison of results
DMSU         “Different Mean Same Uncertainty”     Comparison of results

NU1          “No Uncertainty 1”                    Views about uncertainty
NU2          “No Uncertainty 2”                    Views about uncertainty

Several more probes have been developed and tested but not discussed in this report. If interested, please contact the authors for more detail.


University of Cape Town
Department of Physics

Laboratory Procedures Questionnaire

Instructions:

Write your name in the box above.
Inside this envelope there are pages numbered up to page 15.
Read the text below and answer the questions on each sheet.
If you need more space for your answers, then use the backs of the sheets.
It should take you about 5 minutes to answer each question.

Answer the questions in order and do not skip any sheet.
When you have completed a question, put the sheet inside this envelope and do not take it out again, even if you want to change your answer.

Note: It is possible that some answers may be similar or exactly the same as others. Please write all answers out in full, even if you feel that you are repeating yourself.

Context:

An experiment is being performed by students in the Physics Laboratory. A wooden slope is clamped near the edge of a table. A ball is released from a height h above the table as shown in the diagram. The ball leaves the slope horizontally and lands on the floor a distance d from the edge of the table. Special paper is placed on the floor on which the ball makes a small mark when it lands.

The students have been asked to investigate how the distance d on the floor changes when the height h is varied. A metre stick is used to measure d and h.


RT

The students work in groups on the experiment. They are given a stopwatch and are asked to measure the time that the ball takes from the edge of the table to hitting the ground after being released at h = 400 mm. They discuss what to do.

A: “We can roll the ball once from h = 400 mm and measure the time. Once is enough.”

B: “Let’s roll the ball twice from the height h = 400 mm, and measure the time for each case.”

C: “I think we should release the ball more than twice from h = 400 mm and measure the time in each case.”

With whom do you most closely agree? (Circle ONE):   A   B   C

Explain your choice.


SDR

After measuring the time, the students now have to determine d when h = 400 mm. The students use a metre rule to measure the distance d for one roll of the ball. What they see is shown below.

[Diagram: the spot made by the ball on the paper, read against a metre rule]

A: “I think that the true distance the ball travelled is exactly 57.68 cm.”

B: “I think that the true distance the ball travelled is approximately 57.68 cm.”

C: “No, the true distance the ball travelled is between 57.60 cm and 57.70 cm.”

D: “The true distance the ball travelled is between 57.63 cm and 57.73 cm.”

E: “I don’t agree with any of you.”

With whom do you most closely agree? (Circle ONE):   A   B   C   D   E

Explain your choice.


RD

Another group releases the ball down the slope at a height h = 400 mm and, using a metre stick, they measure d to be 436 mm.

The following discussion then takes place between the students.

A: “I think we should roll the ball a few more times from the same height and measure d each time.”

B: “I think we should roll the ball down the slope just one more time from the same height.”

C: “Why? We’ve got the result already. We do not need to do any more rolling.”

With whom do you most closely agree? (Circle ONE):   A   B   C

Explain your choice.


RDA

The group of students decide to release the ball again from h = 400 mm. This time they measure d = 426 mm.

First release: h = 400 mm, d = 436 mm
Second release: h = 400 mm, d = 426 mm

The following discussion then takes place between the students.

A: “We know enough. We don’t need to repeat the measurement again.”

B: “We need to release the ball just one more time.”

C: “Three releases will not be enough. We should release the ball several more times.”

With whom do you most closely agree? (Circle ONE):   A   B   C

Explain your choice.


UR

The students continue to release the ball down the slope at a height h = 400 mm. Their results after five releases are:

Release   d (mm)
1         436
2         426
3         438
4         426
5         434

The students then discuss what to write down for d as their final result.

Student: “I wonder what we should write down as our final result for d.”

Write down what you think the students should record as their final result for d.

Explain your choice.


AN

Another group of students have decided to calculate the average of all their measurements of d for h = 400 mm. Their results after six releases are:

Release   d (mm)
1         443
2         422
3         436
4         588
5         437
6         429

The students then discuss what to write down for the average of d.

A: “All we need to do is to add all our measurements and then divide by 6.”

B: “No. We should ignore d = 588 mm and then add the rest and divide by 5.”

With whom do you most closely agree? (Circle ONE):   A   B

Explain your choice.


SMDS

Two groups of students compare their results for d obtained by releasing the ball at h = 400 mm. Their results for five releases are shown below.

Release    Group A d (mm)    Group B d (mm)
1          444               441
2          432               460
3          424               410
4          440               424
5          435               440
Average:   435               435

A: “Our results are better. They are all between 424 mm and 444 mm. Yours are spread between 410 mm and 460 mm.”

B: “Our results are just as good as yours. Our average is the same as yours. We both got 435 mm for d.”

C: “I think the results of group B are better than the results of group A.”

With which group do you most closely agree? (Circle ONE):   A   B   C

Explain your choice. Do not use the word “results” in your explanation.


DMSS

Two other groups of students compare their results for d obtained by releasing the ball at h = 400 mm. Their results for five releases are shown below.

Release    Group A d (mm)    Group B d (mm)
1          440               432
2          438               444
3          433               426
4          422               433
5          432               440
Average:   433               435

A: “Our results agree with yours.”

B: “No, your results do not agree with ours.”

With which group do you most closely agree? (Circle ONE):   A   B

Explain your choice. Do not use the word “results” in your explanation.


DMOS

Two groups of students compare their results for d obtained by releasing the ball at h = 400 mm. Their results for five releases are shown below.

Release    Group A d (mm)    Group B d (mm)
1          444               458
2          435               438
3          424               462
4          440               449
5          432               443
Average:   435               450

A: “Our results agree with yours.”

B: “No, your results do not agree with ours.”

With which group do you most closely agree? (Circle ONE):   A   B

Explain your choice. Do not use the word “results” in your explanation.


DMSU

Two other groups of students compare their results for d obtained by releasing the ball at h = 400 mm. Their means and standard deviations of the means for their releases are shown below.

Group A: d = 436 ± 5 mm

Group B: d = 442 ± 5 mm

A: “Our result agrees with yours.”

B: “No, your result does not agree with ours.”

With which group do you most closely agree? (Circle ONE):   A   B

Explain your choice. Do not use the word “result” in your explanation.


SLG

A group of students collect data at different heights and use them to plot a straight line graph. The data are plotted below. On this graph, draw the line that you think best fits this data.

[Graph: the students’ plotted data points]

Explain carefully what you have done and why.


NU1

When they are finished, the two groups discuss how they can improve their rolling ball experiment next time.

A: “If we practice enough we will be able to perfect our technique so that only one measurement will give us the true value.”

B: “No, that is not possible.”

With which group do you most closely agree? (Circle ONE):   A   B

Explain your choice.


NU2

The two groups continue to discuss doing experiments in physics ...

A: “It is possible for scientists to design a physics experiment that will provide a result with no uncertainty.”

B: “No, it is impossible to have such an experiment.”

With which group do you most closely agree? (Circle ONE):   A   B

Explain your choice.


Comments

Are there any answers to the previous question sheets that you want to change? Please do not remove any sheets from the envelope. What was the question about and how do you want to change your answer?

Any other comments?

In this laboratory questionnaire, I thought that the cartoon figures were (tick one):   male   female   mixed gender


Appendix 2: Coding schemes for probes

This appendix provides coding schemes for all 13 probes reported in Appendix 1. Some of these schemes are, on purpose, comparable (for instance, the coding schemes for the RT and RD probes); some are identical (for instance, those for the DMSS and DMOS probes).

All codes emerged from student responses. This alphanumeric coding scheme makes use of codes having a letter (indicating the choice of action) and two digits. The first digit is associated with a major category of reasoning, while the second digit allows a sub-category. The same major category of reasoning about measurement was often used for justifying a variety of actions. In addition, the coding scheme relates each code to either the point (P) or the set (S) paradigm.
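For researchers who want to tabulate coded responses electronically, the structure of the codes is easy to represent in software. The sketch below is our own illustration, not part of the published scheme; the function name and the small excerpt of RT-probe paradigm tags are ours.

```python
# Illustrative parser for the alphanumeric probe codes described above.
# A code such as "C22" has a letter (the action chosen in the probe) and
# two digits (major reasoning category, then sub-category). The P/S tag
# must be looked up per probe; a few RT-probe codes are shown here.

RT_PARADIGM = {"A30": "P", "B21": "S", "C22": "S", "C64": "P"}

def parse_code(code: str) -> dict:
    """Split a probe code into its action letter and reasoning digits."""
    return {
        "action": code[0],           # e.g. "C": release more than twice
        "major_category": code[1],   # first digit: major reasoning category
        "sub_category": code[2],     # second digit: sub-category
        "paradigm": RT_PARADIGM.get(code),  # "P", "S" or None if untagged
    }

print(parse_code("C22"))
# {'action': 'C', 'major_category': '2', 'sub_category': '2', 'paradigm': 'S'}
```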

Users of these coding schemes should expect to make modifications to these schemes, depending on the respondents’ level of sophistication in measurement and uncertainty.


RT: Repeating Time

N00 - No response
U00 - Not able to code response

A  Once is enough, because ...
A00  P  (no reason given)
A01  P  (not able to code reason given)
A30  P  repeating will give the same result
A40  P  repeating will give different results which is confusing
A50  P  repeating is a waste of time or resources

B  Let’s roll the ball twice from height h = 400 mm, and measure the time for each case, because ...
B00  -  (no reason given)
B01  -  (not able to code reason given)
B10  P  practice will make the second measurement more accurate
B11  P  practice will reduce the systematic error in the measurement
B20  S  you can calculate the average from two measurements
B21  S  you can get a more accurate/reliable average
B30  P  to see if you get the same (i.e. correct) result
B40  P  you need to get a variety of results
B50  P  many repeats are a waste of time or resources
B51  P  many repeats are desirable, but time consuming

C  I think we should release the ball more than twice from h = 400 mm and measure the time in each case, because ...
C00  -  (no reason given)
C01  -  (not able to code reason given)
C10  P  practice will produce a more accurate or better measurement
C11  P  practice will reduce the systematic error in the measurement
C12  P  you have to repeat until the readings are close together
C20  S  you need more readings to get an average/mean
C21  S  to get a more accurate/reliable average/mean
C22  S  to get an average and a spread/uncertainty
C23  S  to get an average and a better/narrower spread/uncertainty
C24  S  to get an average in order to get closer to the true value
C30  P  a few more times may get you the same (i.e. correct) answer
C40  P  you need to get a variety of results
C60  P  you have to do it several times (no reason provided)
C62  P  you must always take three measurements
C64  P  the answer gets more accurate; closer to the true value
C72  S  to determine the spread/uncertainty
C73  S  to determine a better/narrower spread/uncertainty
C74  P  to determine the uncertainty to get closer to the true value
C80  P  especially with time measurements repeats are needed


SDR: Single Distance Reading

N00 - No response
U00 - Not able to code response

A  I think that the true distance the ball travelled is exactly 57.68 cm, because ...
A00  P  (no reason given)
A01  P  (not able to code reason given)
A10  P  measurement is an exact number, 57.68 cm in this case
A11  P  the ball has travelled exactly 57.68 cm
A20  P  exact conditions for repeating the experiment will result in exactly the same distance
A22  P  physics requires exact numbers for investigating relationships and doing calculations

B  I think that the true distance the ball travelled is approximately 57.68 cm, because ...
B00  -  (no reason given)
B01  -  (not able to code reason given)
B10  P  the ball has a finite size
B11  P  the spot has a finite size
B12  P  the spot is not exactly on a mark on the ruler
B13  P  human judgement is necessary to estimate the reading
B20  S  measurement is always uncertain
B21  S  the measurement is not exact
B22  S  physics requires exact numbers but the measurement is not exact
B23  S  of external factors, experimental error or the measuring process
B24  S  the dot is in the vicinity of 57.68 cm
B25  S  measurements are not perfect
B30  S  repeating will give rise to scatter which needs to be taken into account
B31  S  measurement needs an uncertainty, so repeat and determine the mean and uncertainty
B32  S  external factors when repeating will cause slight variations

C  No, the true distance the ball travelled is between 57.60 and 57.70 cm, because ...
C00  -  (no reason given)
C01  -  (not able to code reason given)
C22  S  physics requires exact numbers for investigating relationships and doing calculations
C30  S  repeated measurements will fall within this interval, 57.60 - 57.70 cm
C32  S  external factors when repeating will cause slight variations
C33  S  the interval represents experimental uncertainty

D  The true distance the ball travelled is between 57.63 and 57.73 cm, because ...
D00  -  (no reason given)
D01  -  (not able to code reason given)
D10  P  the measurement can theoretically be exact with a better ruler
D20  S  measurement is always uncertain
D30  S  repeated measurements will fall within this interval, 57.63 - 57.73 cm

E  I don’t agree with any of you, because ...
E00  -  (no reason given)
E01  -  (not able to code reason given)
E20  S  measurement is always uncertain


RD: Repeating Distance

N00 - No response
U00 - Not able to code response

A  I think we should roll the ball a few more times from the same height and measure d each time, because ...
A00  -  (no reason given)
A01  -  (not able to code reason given)
A10  P  practice will produce a more accurate or better measurement
A11  P  practice will reduce the systematic error in the measurement
A12  P  you have to repeat until the readings are close together
A20  S  you need more readings to get an average/mean
A21  S  you need to get a more accurate/reliable average/mean
A22  S  you need to get an average and a spread/uncertainty
A23  S  you need to get an average and a better/narrower spread/uncertainty
A24  S  you need to get an average in order to get closer to the true value
A30  P  a few more rolls may get you the same (i.e. correct) answer
A40  P  you need to get a variety of results
A60  P  you have to do it several times (no reason provided)
A62  P  you must always take three measurements
A64  P  the answer gets more accurate; closer to the true value
A72  S  you need to determine the spread/uncertainty
A73  S  you need to determine a better/narrower spread/uncertainty
A74  P  you need to determine the uncertainty to get closer to the true value

B  Why? We’ve got the result already. We do not need to do any more rolling, because ...
B00  P  (no reason given)
B01  P  (not able to code reason given)
B30  P  repeating will give the same result
B40  P  repeating will give different results which is confusing
B50  P  repeating is a waste of time or resources

C  I think we should roll the ball down the slope just one more time from the same height, because ...
C00  -  (no reason given)
C01  -  (not able to code reason given)
C10  P  practice will make the second measurement more accurate
C11  P  practice will reduce the systematic error in the measurement
C20  S  you can calculate the average from two measurements
C21  S  you can get a more accurate/reliable average
C30  P  you need to see if you get the same (i.e. correct) result
C40  P  you need to get a variety of results
C50  P  many repeats are a waste of time or resources
C51  P  many repeats are desirable, but time consuming


RDA: Repeating Distance Again

N00 - No response
U00 - Not able to code response

A  We know enough. We don’t need to repeat the measurement again, because ...
A00  P  (no reason given)
A01  P  (not able to code reason given)
A40  P  repeating will give a different result again, which is confusing
A50  P  it saves doing it again; repeats are a waste of time / resources

B  We need to release the ball just one more time, because ...
B00  P  (no reason given)
B01  P  (not able to code reason given)
B10  P  practice will make the third measurement even more accurate
B11  P  practice will reduce the systematic error in the measurement
B20  P  you need more measurements to get an average / mean
B21  S  you need to get a more accurate average / mean
B22  S  you need to get an average and a spread / uncertainty
B23  S  you need to get an average and a more accurate / narrower uncertainty
B24  S  you need to get the average in order to get closer to the true value
B30  P  the 3rd measurement may give the same (i.e. correct) answer
B40  P  3 measurements are enough; too many different answers are confusing
B50  P  many repeats are a waste of time or resources
B51  P  many repeats are desirable, but time consuming
B60  P  you have to do it three times (no reason provided)
B64  P  the answer gets more accurate; closer to the true value
B72  S  you need to determine the spread / uncertainty
B73  S  you need to determine a more accurate / narrower spread / uncertainty
B74  P  you need to determine the uncertainty to get closer to the true value

C  Three releases will not be enough. We should release the ball several more times, because ...
C00  -  (no reason given)
C01  -  (not able to code reason given)
C10  P  the more practice, the more accurate your measurement gets
C11  P  practice will reduce the systematic error in the measurement
C12  P  you have to repeat until the measurements are close together
C20  S  you need more measurements to get an average / mean
C21  S  you need to get a more accurate average / mean
C22  S  you need to get the average/mean and the spread/uncertainty
C23  S  you need to get the average and a more accurate spread/uncertainty
C24  S  you need to get an average in order to get closer to the true value
C30  P  a few more times may get you the same (i.e. correct) answer
C40  P  you need to get a large variety of results
C60  P  you have to do it more than three times (no reason provided)
C64  P  the answer gets more accurate; closer to the true value
C72  S  you need to determine the spread / uncertainty
C73  S  you need to determine a more accurate / narrower spread / uncertainty
C74  P  you need to determine the uncertainty to get closer to the true value
C80  -  you need many repeated measurements for plotting a graph


UR: Using Repeats

N00 - No response
U00 - Not able to code response

10  P  436 mm is the first measurement
11  S  436 mm is the average
12  P  436 mm is closest to the average
13  S  436 mm is the median

20  P  440 mm is the middle measurement
21  S  440 mm is the average
22  P  440 mm is closest to the average
23  S  440 mm is the median

30  P  434 mm is the last measurement
31  S  434 mm is the average
32  P  434 mm is closest to the average
33  S  434 mm is the median

40  P  425 mm occurs twice
41  S  425 mm is the average
42  P  425 mm is closest to the average
43  S  425 mm is the median

51  S  432 mm is the average
52  P  432 mm is closest to the average
53  S  432 mm is the median


AN: Anomaly

N00 - No response
U00 - Not able to code response

A  All we need to do is to add all our measurements and then divide by 6, because ...
A00  -  (no reason given)
A01  -  (not able to code reason given)
A10  -  we just have to take the average that way
A20  -  all the measurements have to be taken into account
A30  -  588 mm is a bit far from the other values, but there always is a spread in the measurements
A40  -  588 mm is a bit far from the other values, but still it needs to be included in the average

B  No. We should ignore d = 588 mm and then add the rest and divide by 5, because ...
B00  -  (no reason given)
B01  -  (not able to code reason given)
B30  -  one measurement is out of the range / too far away
B40  -  588 mm may be a mistake, so we need to exclude it from the average
B41  -  588 mm is a mistake, so we need to take another measurement to take the average out of 6


SMDS: Same Mean Different Spread

N00 - No response
U00 - Not able to code response

A  A’s results are better, because ...
A00  -  (no reason given)
A01  -  (not able to code reason given)
A11  S  they have a smaller range/spread because of outside factors
A12  S  they have a smaller range/spread because fewer mistakes were made
A13  S  they have a smaller range/spread, therefore a more accurate/reliable average
A14  S  they have a smaller range/spread, therefore are closer to the true value
A15  S  they have a smaller range/spread because group A was more skilful
A20  S  there is less deviance from the average
A21  S  there is less deviance from the average because of outside factors
A22  S  less deviance from the average because of fewer mistakes made
A25  S  less deviance from the average because group A was more skilful
A40  -  you usually get the results so close together
A50  P  their average (435 mm) is also one of the measurements
A63  S  A’s results are more accurate/consistent

B  B’s results are just as good as A’s, because ...
B00  -  (no reason given)
B01  -  (not able to code reason given)
B10  S  they got more or less the same measurements
B20  P  they have the same average
B21  P  they have the same average although different outside factors caused deviation
B22  P  they have the same average although mistakes caused deviation
B23  P  they have the same average, and the spread is not important
B26  P  they have the same average, deviation not important as expected
B29  P  they have the same average and same number of readings
B30  P  they have the same average, although A got 435 mm on their last measurement
B60  -  there is no exact answer to an experiment like this
B65  P  the accuracy of individual readings is not under consideration, the average is important
B70  P  it is a natural outcome of the same experiment, the spread is not important

C  I think that the results of group B are better than the results of group A, because ...
C00  -  (no reason given)
C01  -  (not able to code reason given)
C10  S  B’s results are closer together; they don’t vary as much
C11  S  B’s average is more accurate/reliable
C12  S  B’s spread is smaller, so the average is more accurate
C40  -  you usually get the results so close together
C50  P  A’s average (435 mm) is also one of the measurements


DMSS: Different Mean Similar Spread

DMOS: Different Mean Overlapping Spread

N00 - No response
U00 - Not able to code response

A  Our results agree with yours, because ...
A00  -  (no reason given)
A01  -  (not able to code reason)
A10  P  the readings/measurements for both sets are more or less the same
A12  P  the readings/measurements for both sets have the same spread
A13  S  the readings/measurements have an overlapping spread
A20  P  the averages are more or less the same
A21  P  the averages are more or less the same, difference due to external factors
A22  P  the averages are more or less the same, difference due to experimental errors
A24  P  the averages are more or less the same, both close to the true value
A26  P  the averages are more or less the same as there will always be deviation
A30  S  the uncertainties of the averages may overlap
A31  S  the averages are more or less the same with similar ranges/spreads
A40  P  three out of five (the majority) of readings are the same
A50  P  if you round off the averages, then they are identical

B  No, your results do not agree with ours, because ...
B00  -  (no reason given)
B01  -  (not able to code response)
B12  S  the spreads of both sets are different
B20  P  the averages are different
B21  P  the averages are different due to different conditions/external factors
B22  P  the averages are different due to experimental errors
B24  P  the averages are different – uncertain about where the true value lies
B25  P  the averages are different, absolute accuracy/identical results required to agree
B26  P  the averages are too different even though deviation is taken into consideration
B30  S  the averages are too far apart for the uncertainties to overlap
B31  P  the average is different and all individual readings are not the same
B32  S  the spread differs between the two
B40  P  both groups got some different measurements
B50  P  if you round off the averages, then they are very different
B60  P  an average is only true if the average value also appears as one of the measurements


DMSU: Different Mean Same Uncertainty

N00 - No response
U00 - Not able to code response

A  Our result agrees with yours, because ...
A00  -  (no reason given)
A01  -  (not able to code reason)
A20  P  the averages are more or less the same
A21  P  the averages are more or less the same, difference due to external factors
A22  P  the averages are more or less the same, difference due to experimental errors
A24  P  the averages are more or less the same, both close to the true value
A30  S  the uncertainties of the averages overlap (overlapping intervals)
A31  P  the uncertainties are the same

B  No, your result does not agree with ours, because ...
B00  -  (no reason given)
B01  -  (not able to code response)
B20  P  the averages are different
B21  P  the averages are different due to different conditions/external factors
B22  P  the averages are different due to experimental errors
B24  P  the averages are different – uncertain about where the true value lies
B30  S  the uncertainties do not overlap (incorrect conclusion, but set reasoning)
B31  P  the uncertainties are the same


SLG: Straight Line Graph

N00 - No response
U00 - Not able to code response

10  P  I have joined all the points
11  S  I have included all the points
20  P  I have joined the lowest and highest points
21  P  The line goes through the middle point
22  P  The line goes through the most number of points
30  S  Some points lie above the line, some below the line
31  S  The same number of points lie above and below the line
40  S  All the points are quite close to the line
41  S  This gives the smallest uncertainty / smallest sum of squares
42  S  The line is a least squares fit to the data
50  -  This is how we always draw straight lines
60  -  The line goes through the origin
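Codes 41 and 42 refer to a least squares fit, the procedure that set-paradigm reasoning ultimately points towards. As a minimal numerical sketch (our own illustration; the data values are hypothetical and do not come from the probe's actual graph):

```python
import numpy as np

# Hypothetical (h, d) data standing in for the points on the SLG graph
h = np.array([100.0, 200.0, 300.0, 400.0, 500.0])   # mm
d = np.array([218.0, 310.0, 374.0, 432.0, 484.0])   # mm

# Least squares straight line through the data (cf. codes 41 and 42)
slope, intercept = np.polyfit(h, d, 1)
print(f"d = {slope:.3f} h + {intercept:.1f} mm")
```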


NU1: No Uncertainty 1

N00 - No response
U00 - Not able to code response

A  If we practice enough we will be able to perfect our technique so that only one measurement will give us the true value, because ...
A00  -  (no reason given)
A01  -  (not able to code reason)
A10  P  “practice makes perfect”
A11  P  then they get a more accurate result / the true value
A12  P  if no external forces are involved, then it is only about experimental errors

B  No, that is not possible, because ...
B00  -  (no reason given)
B01  -  (not able to code response)
B10  P  one reading can’t give you the true value: you need to confirm it with additional readings
B11  P  you don’t know whether you obtained the true value as you cannot control outside factors
B12  P  the same conditions should give the same readings (true value): in reality conditions differ
B13  -  results will always deviate (no reason given)
B14  -  you always have to measure more than once
B15  S  you cannot avoid human error or mistakes
B16  S  your uncertainty can never be zero
B17  -  in physics you never will know the true result/value
B18  S  you need to repeat in order to calculate the mean and uncertainty


NU2: No Uncertainty 2

N00 - No response
U00 - Not able to code response

A  It is possible for scientists to design a physics experiment that will provide a result with no uncertainty, because ...
A00  -  (no reason given)
A01  -  (not able to code reason)
A10  P  “practice makes perfect”
A11  P  then they get a more accurate result / the true value
A12  P  if no external forces are involved, then it is only about experimental errors
A13  P  technology is improving all the time

B  No, it is impossible to have such an experiment, because ...
B00  -  (no reason given)
B01  -  (not able to code response)
B10  P  one reading can’t give you the true value: you need to confirm it with additional readings
B11  P  you don’t know whether you obtained the true value as you cannot control outside factors
B13  -  results will always deviate (no reason given)
B15  S  you cannot avoid human error or mistakes
B16  S  your uncertainty can never be zero
B17  -  in physics you never will know the true result/value


Appendix 3: A short summary of main ideas regarding probability density functions and their use in measurement

A probability density function is a function (a distribution) that describes the probability that the measurand lies between two values. Since we are dealing with a density function, the probability is given by the area under the curve, the total area of which must be unity, or 100%. By way of example we present a Gaussian shaped pdf that represents the final result of a measurement in which we are trying to find the mass M of some object. Thus, on the x-axis we have the value of the measurand (M) in kilograms, while on the y-axis we have the probability per unit mass p(M). The probability that the value of the mass M lies between 50.4 kg and 50.5 kg is represented by the area under the curve between 50.4 kg and 50.5 kg. Thus, if the shaded area is calculated to be 0.2, then we can say that the probability that the mass M lies between 50.4 kg and 50.5 kg is 0.2 or 20%. We note here that the x-axis represents what in frequentist terms would be called the “true” value and that in these terms the (true) value is a variable.
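This area interpretation is easy to verify numerically. The sketch below is our own illustration: the best estimate of 50.3 kg is taken from the example, but the standard deviation is an assumed value, so the computed probability differs from the 0.2 of the figure.

```python
# Probability as the area under a Gaussian pdf between two values.
from scipy.stats import norm

best_estimate = 50.3   # kg, centre of the pdf (from the example above)
sigma = 0.10           # kg, assumed width parameter for illustration

# P(50.4 kg < M < 50.5 kg) = area under the pdf between the two values
p = norm.cdf(50.5, loc=best_estimate, scale=sigma) \
    - norm.cdf(50.4, loc=best_estimate, scale=sigma)
print(f"P(50.4 < M < 50.5) = {p:.3f}")   # ~0.136 for these assumed values
```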

Figure 7: A Gaussian pdf of p(M) against M, with the shaded area between 50.4 kg and 50.5 kg equal to 0.2.

Intuitively, from the shape of this pdf we can see that the best estimate of the mass will be in the region around the peak of the pdf. Another way to look at this is that, for a given probability, the value of the measurand will be confined to a tighter and tighter range as we move from the tail region towards the centre of the pdf. This summarises the essence of the purpose of measurement which, for a given probability, is to make the uncertainty interval as narrow as possible.



There are three types of pdf that cover most situations in metrology: a uniform or rectangular pdf, a triangular pdf and a Gaussian pdf. Two other pdfs are also used but won’t be discussed further, namely the Student-t distribution and the trapezoidal pdf. The pdfs may be characterised by two of their moments, namely the first moment and the second moment. The first moment (akin to the centre of gravity) gives us the best estimate of the measurand. Since all the pdfs mentioned are symmetrical, the best estimate is simply located at the centre of the distribution. Thus, in the example above the best estimate of the mass will be 50.3 kg. The second moment of the distribution provides a consistent way of indicating the “average” width of the distribution. The second moment is the variance of the distribution, and its square root gives us the standard uncertainty u. If we represent the best estimate by x, then the area under the curve between x - u and x + u gives us the coverage probability, or level of confidence. Readers familiar with mathematical statistics should note that there is a semantic difference between the term level of confidence, which implies that both Type A and Type B evaluations have been used, and confidence level, which implies only a Type A evaluation. In the table below we summarise the standard uncertainty and the associated level of confidence for each of the three pdfs.

Table 23: Some common pdfs used in metrology.

Type of pdf                                   Standard uncertainty u                     Level of confidence

“Flat” or uniform pdf of interval width a     u = a / (2√3)                              0.58
Triangular pdf of interval width a            u = a / (2√6)                              0.65
“Bell-shaped” or Gaussian pdf of width 2σ     u = σ, the standard deviation of the mean  0.68

With regard to the Gaussian form, in most practical situations we will not know σ, but we can estimate it from the data by calculating the experimental standard deviation of the mean from the usual statistical formula. The standard uncertainty is thus given by the experimental standard deviation of the mean, while the best estimate is given by the arithmetic average of the data.
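The relations in Table 23 are simple enough to capture in a few helper functions. The sketch below is our own illustration (the function names are ours); it reproduces the uniform and triangular formulas and the Gaussian estimate from repeated data.

```python
import math

def u_uniform(a: float) -> float:
    """Standard uncertainty of a uniform pdf of full interval width a."""
    return a / (2 * math.sqrt(3))

def u_triangular(a: float) -> float:
    """Standard uncertainty of a triangular pdf of full interval width a."""
    return a / (2 * math.sqrt(6))

def u_gaussian(readings: list) -> float:
    """Experimental standard deviation of the mean of repeated readings."""
    n = len(readings)
    mean = sum(readings) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))
    return s / math.sqrt(n)

print(u_uniform(0.01))     # ~0.0029, cf. Example 1 below
print(u_triangular(0.04))  # ~0.0082, cf. Example 2 below
```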



Examples of implementing the ISO approach

Example 1. A digital voltmeter reading

We place a voltmeter across a battery and obtain the reading as shown in Figure 8. What is the voltage of the battery that we can infer from this reading? Clearly the best estimate is 2.37 V, but what is the uncertainty? Firstly we identify the sources of uncertainty and then quantify them, i.e. we put together an uncertainty budget. In this case there would be a number of such sources, for example, the resolution of the scale of the instrument, the rated accuracy, environmental factors such as the temperature, etc. For the purposes of illustration we will only evaluate two sources of uncertainty, namely, that due to the resolution of the scale, u_s, and that associated with the rating, u_r, of the voltmeter, which we should find marked on the instrument, such as ±1%. We can then calculate the combined uncertainty u_c associated with the measurement.

Figure 8: (a) A single digital reading. (b) The uniform pdf used to model the uncertainty due to reading the scale of the instrument, expressing that the measurand could lie with equal probability at any position within the interval. The value of 0.0029 V is the standard uncertainty u_s associated with the scale reading only. A similar uniform pdf models the uncertainty due to the rated accuracy, u_r = 0.0136 V, as well as the combined uncertainty u_c = 0.0139 V. The measurement result is expressed as V_result = 2.370 ± 0.014 V.


We start by evaluating the scale uncertainty. If we look at the reading we can assume that the value of the voltage is between 2.365 V and 2.375 V. If the voltage were larger than 2.375 V the voltmeter would have displayed 2.38, while if the voltage were smaller than 2.365 V we would have viewed 2.36. Since there is no information to favour any particular value within the interval, we assign a uniform pdf centred on 2.370 V with an interval of 0.01 V and edges at 2.365 V and 2.375 V, as shown in the diagram. Using the results from Table 23 we can calculate the standard uncertainty u_s as follows:

$$u_s = \frac{\text{half-width of the interval}}{\sqrt{3}} = \frac{0.005}{\sqrt{3}} = 0.0029\ \text{V}.$$

We now have to convert the rated accuracy of the voltmeter, given as ±1% (of whatever the reading is), to a standard uncertainty u_r. In such cases GUM suggests using a uniform pdf and assuming that the percentage refers to half of the width of the interval. Thus half the width of the distribution will be (0.01)(2.37) = 0.0237 V. The standard uncertainty u_r is then given by

$$u_r = \frac{0.0237}{\sqrt{3}} \approx 0.0136\ \text{V}.$$

The combined uncertainty u_c is therefore

$$u_c = \sqrt{u_s^2 + u_r^2} = \sqrt{(0.0029)^2 + (0.0136)^2} = 0.014\ \text{V}.$$

In practice this uncertainty estimate might be larger if some of the other sources of uncertainty, neglected here, are included in the uncertainty budget.

We would like to point out that in each case above we have quoted the final uncertainties to two figures, as advocated by GUM, and not rounded them off to one. The reason for this is that although the first digit might suffice to indicate which digit in the best estimate we are uncertain of, we often have to use the results in other calculations where the variances are usually used. Since we have to square the standard uncertainties to do so, using rounded-off values can change later uncertainty estimates by unacceptably large amounts, while using two digits limits such differences to a few percent.
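The whole uncertainty budget of Example 1 fits in a few lines of code. A minimal sketch, following the formulas above (the variable names are ours):

```python
import math

reading = 2.37          # V, digital voltmeter display
resolution = 0.01       # V, scale resolution (one step of the last digit)
rated_fraction = 0.01   # rated accuracy of ±1%, as a fraction of the reading

# Uniform pdf over the resolution interval: u = half-width / sqrt(3)
u_s = (resolution / 2) / math.sqrt(3)             # ~0.0029 V

# Rated accuracy treated as the half-width of a uniform pdf (GUM suggestion)
u_r = (rated_fraction * reading) / math.sqrt(3)   # ~0.0137 V

# Independent components combine in quadrature
u_c = math.sqrt(u_s ** 2 + u_r ** 2)
print(f"V = {reading:.3f} ± {u_c:.3f} V")         # 2.370 ± 0.014 V
```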

Example 2. An analogue voltmeter reading

The reading on an analogue voltmeter is shown in Figure 9. What is the voltage that we can infer from this reading? Here, of course, there is some degree of subjectivity, and the final result will also reflect this. For example, if the pointer were exactly on a mark we would be more confident of the reading than in the case of the one shown.



Thus, in the former case we would expect a somewhat smaller scale uncertainty than in the case shown.

We might decide that the best estimate of the voltage is 2.27 V, certainly no higher than 2.29 V and not lower than 2.25 V, with decreasing probability as we go from our best estimate towards these values. We may consider it appropriate to choose a triangular pdf in this case, for which the interval goes from 2.25 V to 2.29 V, i.e. is 0.04 V wide. The standard uncertainty associated with the reading of the scale, using the information in Table 23, is

$$u_s = \frac{\text{half-width of the interval}}{\sqrt{6}} = \frac{0.02}{\sqrt{6}} = 0.0082\ \text{V}.$$

Once again we need to consider the rating of the instrument, which will be the dominant component in the uncertainty budget. Another source of uncertainty to consider is the zero reading. Evaluating these sources of uncertainty and combining them all in quadrature leads to the combined uncertainty.

Figure 9: (a) A single analogue reading. (b) The triangular pdf used to model the uncertainty due to reading the scale of the instrument. The value of 0.0082 V is the standard uncertainty u_s associated with the scale reading only.


Example 3. An ensemble of dispersed repeated readings

Consider an experiment where we make 20 repeated observations of a time t under the same conditions, for example in measuring the period of a pendulum, with a stopwatch having a resolution of 1 ms and rated accuracy of 0.1 ms. The 20 readings are summarised and represented as a histogram of relative frequencies (Figure 10). According to the traditional approach, the measured values t_i are modelled as values of a random variable t_measured. The 20 values are considered to be sampled from an idealised Gaussian distribution, which would occur if the data were infinite and the histogram bins were reduced to zero width. From our sample we can estimate the parameters of this idealised Gaussian through the familiar quantities of the arithmetic mean $\bar{t}$ of the N = 20 observations,

$$\bar{t} = \frac{1}{N}\sum_{i=1}^{N} t_i \, ,$$

and the experimental standard deviation s(t) of the observations,

$$s(t) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(t_i - \bar{t}\right)^2} \, .$$

The calculations for the data in question yield $\bar{t} = 1.015$ s and s(t) = 0.146 s.

Based on the result from the Central Limit Theorem that the sample means are distributed normally, the experimental standard deviation of the mean $s(\bar{t})$ is given by

$$s(\bar{t}) = \frac{s(t)}{\sqrt{N}} \, ,$$

which yields $s(\bar{t}) = 0.033$ s in the present example. In the traditional approach $s(\bar{t})$ is often termed the “standard error of the mean”, and is denoted by σ_m. The interpretation of this result according to mathematical statistics is that “we are 68% confident that the mean (of any future sample taken) will lie within ±0.033 s of the measured mean of 1.015 s” (Conclusion I).

Physicists tend to interpret Conclusion I in accordance with their needs for making an inference about the true value as follows: “we are 68% confident that the ‘true value’ (of the measurand) lies in the interval 1.015 ± 0.033 s” (Conclusion II).

However, Conclusion II cannot easily be justified in the traditional approach, since $\bar{t}$ and $s(\bar{t})$ are calculated from observed values and can only summarise what we know about the data: there is no formal link between knowledge of the measurand (Conclusion II) and knowledge of the data (Conclusion I).
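As an aside, these sample statistics take only a few lines to compute. A short sketch (our own illustration; the readings array is a hypothetical stand-in for the 20 observed times, which are not listed in the text):

```python
import numpy as np

# Hypothetical stand-in for the 20 repeated time readings (seconds);
# the actual data of the example yield 1.015 s, 0.146 s and 0.033 s.
times = np.array([1.02, 0.87, 1.15, 0.98, 1.21, 0.93, 1.08, 0.85,
                  1.12, 1.05, 0.95, 1.18, 0.89, 1.03, 1.10, 0.92,
                  1.17, 0.99, 1.06, 0.96])

t_mean = times.mean()                # arithmetic mean
s_t = times.std(ddof=1)              # experimental standard deviation
s_mean = s_t / np.sqrt(len(times))   # standard deviation of the mean

print(f"mean = {t_mean:.3f} s, s(t) = {s_t:.3f} s, s(mean) = {s_mean:.3f} s")
```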


Thus the measurement result cannot be represented directly on Figure 10(a), because the relative frequency histogram and the predicted Gaussian of infinite measurements (Figure 10(a)) are plotted against t_measured. In the probabilistic approach, however, all inferences about the measurand are expressed via the pdf of Figure 10(b), which is plotted against t_true. Using the concepts of prior information and the data at hand, we are able to conclude in a straightforward and logically consistent way the final result as follows: “the best estimate of the value of the time is 1.015 s with a standard uncertainty of 0.033 s, and we are 68% confident that the best estimate of the time lies within the interval 1.015 ± 0.033 s, assuming that the distribution of measured times is Gaussian”. In practice, of course, the uncertainty budget for this measurement of t would include a number of additional sources of uncertainty, each of which would be estimated using a Type B evaluation, so that the combined uncertainty would be larger than 0.033 s.

Figure 10: (a) Distribution of relative frequencies for the time readings t_measured. The dotted line represents the predicted Gaussian distribution of the population from which the 20 readings were sampled. (b) A Gaussian pdf used to model the measurement result. The final result t_result indicated assumes that all other sources of uncertainty are negligible.


Appendix 4: Summary of outputs resulting from this research project (1998-2004)

Journal papers:

Allie, S., Buffler, A., Campbell, B., Lubben, F., Evangelinos, D., Psillos, D. and Valassiades, O. (2003) Teaching measurement in the introductory physics laboratory. The Physics Teacher, 41 (7), 394-401.

Allie, S., Buffler, A., Kaunda, L., Campbell, B. and Lubben, F. (1998) First year physics students’ perceptions of the quality of experimental measurements. International Journal of Science Education, 20 (4), 447-459.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2001) The development of first year physics students’ ideas about measurement in terms of point and set paradigms. International Journal of Science Education, 23 (11), 1137-1156.

Campbell, B., Kaunda, L., Allie, S., Buffler, A. and Lubben, F. (2000) The communication of laboratory investigations by university entrants. Journal of Research in Science Teaching, 37 (8), 839-853.

Kaunda, L., Allie, S., Buffler, A., Campbell, B. and Lubben, F. (1998) An investigation of students' ability to communicate science investigations. South African Journal for Higher Education, 12 (1), 122-129.

Lubben, F., Buffler, A., Allie, S. and Campbell, B. (2001) Point and set reasoning in practical science measurement by entrant university freshmen. Science Education, 85, 311-327.

Rollnick, M., Allie, S., Buffler, A., Campbell, B. and Lubben, F. (in press) Development and application of a model for students’ decision making in laboratory work. African Journal of Research in Mathematics, Science and Technology Education.

Chapter in book:

Allie, S., Buffler, A., Lubben, F. and Campbell, B. (2001) Point and set paradigms in students’ handling of experimental measurements. In H. Behrendt, H. Dahncke, R. Duit, W. Graber, M. Komorek, A. Kross and P. Reiska (eds): Research in Science Education - Past, Present and Future. Dordrecht: Kluwer Academic. pp. 331-336.

Published conference proceedings:

Allie, S., Buffler, A., Campbell, B. and Lubben, F. (1999) Procedural understanding of pre-first year students at the University of Cape Town, South Africa. In C. Rust (ed): Proceedings of the 6th International Symposium ‘Improving Student Learning Outcomes’, Brighton, United Kingdom, July 1998, pp. 146-156.

Allie, S., Buffler, A., Kaunda, L., Campbell, B. and Lubben, F. (1997) Procedural understanding of first year science students. In M. Sanders (ed): Proceedings of the Fifth Annual Meeting of the Southern African Association for Research in Mathematics and Science Education. pp. 464-470.

Allie, S., Buffler, A., Kaunda, L., Campbell, B. and Lubben, F. (1997) Analysing how students communicate science investigations. In M. Sanders (ed): Proceedings of the Fifth Annual Meeting of the Southern African Association for Research in Mathematics and Science Education. pp. 471-476.

Allie, S., Buffler, A., Lubben, F. and Campbell, B. (2002) Constructing a research-based framework for a first year physics laboratory curriculum. In C. Malcolm and C. Lubisi (eds): Proceedings of the 10th Annual Conference of the Southern African Association for Research in Mathematics, Science and Technology Education, Durban, pp. III-4-5.


Buffler, A., Allie, S., Campbell, B. and Lubben, F. (1998) The role of laboratory experience on the procedural understanding of pre-first year science students at UCT. In N. A. Ogude and C. Bohlmann (eds): Proceedings of the Sixth Annual Meeting of the Southern African Association for Research in Mathematics and Science Education. pp. 496-502.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (1999) Procedural understanding: point and set reasoning within a cohort of first year science students. In J. Kuiper (ed): Proceedings of the Seventh Annual Meeting of the Southern African Association for Research in Mathematics and Science Education. pp. 76-84.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2003) Evaluating a research-based curriculum for teaching measurement in the first year physics laboratory. In B. Putsoa, M. Dlamini, B. Dlamini and V. Kelly (eds): Proceedings of the 11th Annual Conference of the Southern African Association for Research in Mathematics, Science and Technology Education, Swaziland, pp. 689-695.

Kaunda, L., Allie, S., Buffler, A., Campbell, B. and Lubben, F. (1998) Pre-first year science students' ability to report their understanding of laboratory procedures: Language case studies. In N. A. Ogude and C. Bohlmann (eds): Proceedings of the Sixth Annual Meeting of the Southern African Association for Research in Mathematics and Science Education. pp. 216-224.

Kaunda, L., Allie, S., Buffler, A., Campbell, B. and Lubben, F. (1999) The communication of laboratory investigations by entering university students. In M.A. Clements and L.Y. Pak (eds): Conference Proceedings ‘Cultural and language aspects of science, mathematics and technical education’, June 1999, Brunei Darussalam. pp. 176-185.

Lubben, F., Campbell, B., Buffler, A. and Allie, S. (2004) The influence of context on judgements of the quality of experimental measurements. In A. Buffler and R. Laugksch (eds): Proceedings of the 12th Annual Conference of the Southern African Association for Research in Mathematics, Science and Technology Education. Cape Town, pp. 569-577.

Rollnick, M., Allie, S., Buffler, A., Campbell, B., Kaunda, L. and Lubben, F. (1999) Towards the development of a model for describing students' thought processes in a laboratory situation. In J. Kuiper (ed): Proceedings of the Seventh Annual Meeting of the Southern African Association for Research in Mathematics and Science Education. pp. 366-371.

Volkwyn, T., Allie, S., Buffler, A., Lubben, F. and Campbell, B. (2004) First year physics students’ understanding of measurement in the context of laboratory practicals. In A. Buffler and R. Laugksch (eds): Proceedings of the 12th Annual Conference of the Southern African Association for Research in Mathematics, Science and Technology Education. Cape Town, pp. 1011-1017.

Unpublished theses for higher degrees:

Govender, I. (2000) Students’ perception of the need to repeat measurements in different contexts. Unpublished Honours project. Physics Department, University of Cape Town.

Heinicke, S. (2002) First year students’ ideas about dispersion in measurement data prior to laboratory instruction. Unpublished Honours project. Physics Department, University of Cape Town.

Ibrahim, B. The relationship between views of the nature of science and views of the nature of scientific measurement. MSc thesis (in progress). Physics Department, University of Cape Town.

Pillay, S. The evaluation of a research-based curriculum for teaching measurement in the first year physics laboratory. MSc thesis (in progress). Physics Department, University of Cape Town.


Volkwyn, T. First year students' understanding of measurement in the context of physics laboratory practicals. MSc thesis (in progress). Physics Department, University of Cape Town.

Other conference presentations:

Allie, S., Buffler, A., Lubben, F. and Campbell, B. (1999) Procedural understanding of first year university science students. Paper presented at the Annual Conference of the South African Institute of Physics, Cape Town, South Africa, July 1998.

Allie, S., Buffler, A., Lubben, F. and Campbell, B. (1999) Point and set reasoning in the context of experimental measurements in a first year university physics course. Paper presented at the 2nd International Conference of the European Science Education Research Association, Kiel, Germany, September 1999.

Allie, S., Buffler, A., Lubben, F. and Campbell, B. (2002) Constructing a research-based framework for a first year physics laboratory curriculum. Paper presented at the 10th Annual Meeting of the Southern African Association for Research in Mathematics and Science Education, Durban, South Africa, January 2002.

Allie, S., Buffler, A., Lubben, F. and Campbell, B. (2003) A probabilistic approach to teaching measurement and uncertainty. Paper presented at the Annual Conference of the South African Institute of Physics, Stellenbosch, South Africa, July 2003.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2001) The point and set paradigms: towards the effective teaching of measurement in the first year physics laboratory. Paper presented at the 3rd International Conference of the European Science Education Research Association, Thessaloniki, Greece, August 2001.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2001) Toward effective teaching of measurement in the freshman physics laboratory. Paper presented at the 123rd (Summer) Meeting of the American Association of Physics Teachers, Rochester, New York, July 2001.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2003) Students’ understanding of measurement and uncertainty. Paper presented at the Annual Conference of the South African Institute of Physics, Stellenbosch, South Africa, July 2003.

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2003) Evaluation of a research-based curriculum for teaching measurement in the first year physics laboratory. Paper presented at the 4th International Conference of the European Science Education Research Association, Noordwijkerhout, The Netherlands, August 2003.

Campbell, B., Lubben, F., Allie, S., Buffler, A. and Kaunda, L. (1997) Procedural understanding of first year university students. Paper presented at the Annual Conference of the British Educational Research Association, York, United Kingdom, September 1997.

Heinicke, S., Allie, S., Buffler, A., Volkwyn, T., Lubben, F. and Campbell, B. (2002) First year students’ views on spread in data sets prior to laboratory instruction. Paper presented at the Annual Conference of the South African Institute of Physics, Potchefstroom, South Africa, September 2002.

Course manual:

Buffler, A., Allie, S., Lubben, F. and Campbell, B. (2002) Introduction to measurement in the physics laboratory. A probabilistic approach. Laboratory Manual, Department of Physics, University of Cape Town. (Can be downloaded from http://www.phy.uct.ac.za/people/buffler/labmanual.html )
