Research Methods and Statistics Notes 1


  • RESEARCH METHODS AND STATISTICS

    DEPARTMENT OF CONSTRUCTION ECONOMICS AND MANAGEMENT

    CEDAT

    SEMESTER II, 2014

    BSC(CM) II, BS (QS) III, BS (LE) III 3/5/2014 1

  • TUTORS: Godfrey Mwesige

    BSC (Civil Eng.), Mak; MSC (Civil Eng.), Illinois; Dip. Road Traffic Safety, Lund

    Mr. Julius Ssemanda

    BS(QS); Mak

  • COURSE OUTLINE

    PART I: RESEARCH METHODS

    Categories of research

    Surveys and Experiments

    Data Collection Methods

    Sampling Errors and Mitigation Measures

    Design of Survey Data Collection Instrument- Questionnaire

    Planning and Designing a Research Study

    Ethical Considerations in Research

    Disseminating Research Results

  • COURSE OUTLINE

    PART II: STATISTICS

    Nature of Data

    Data Types

    Descriptive Statistics; Collect, Present and Characterize Data; Measures of Location and Dispersion/Variability.

    Statistical Inference; Estimation and Hypothesis Testing.

    Linear Regression and Correlations.

  • REFERENCES

    Marczyk, G., Dematteo, D. & Festinger, D. (2005). Essentials of Research Design and Methodology. John Wiley & Sons Inc.

    Johnson, R., Freund, J. & Miller, I. (2011). Miller & Freund's Probability and Statistics for Engineers. 8th Ed., Prentice Hall.

    Blaxter, L., Hughes, C. & Tight, M. (2006). How to Research. 3rd Ed., Open University Press.

    Any other reference texts on research methods and statistics.

  • Course Management

    Mode of Delivery: Lectures, Labs and Class Exercises. All compulsory.

    Continuous Assessment (40%) :

    Coursework (20%) : 30th April, 2014.

    Two Tests (20%): Dates to be communicated

    Final Exam (60%): Covering entire course content

  • Introduction

    Research methods are a scientific approach to solving problems, real or abstract, through a sequence of steps: problem definition, research questions, hypotheses and objectives, data collection, analysis and conclusion.

    Research mainly follows an empirical approach, which relies on direct observation and experimentation in the acquisition of new knowledge (Kazdin, 2003).

    Direct observation of a population (surveys) and manipulation of variables to influence an outcome (experiments) are the major sources of data.

    Both surveys and experiments involve some degree of measurement to obtain data. These measurements must be carried out carefully to obtain accurate data.

    In both parts of this course, we shall concentrate on the methods and measurement techniques of data collection, analysis and reporting.

  • Categories of Research

    There are two broad categories of research that a researcher should be familiar with: qualitative and quantitative.

    Quantitative research involves studies that make use of statistical analyses to obtain their findings. Key features include formal and systematic measurement and the use of statistics.

    Qualitative research involves studies that do not attempt to quantify their results through statistical summary or analysis. Qualitative studies typically involve interviews and observations without formal measurement. A case study, which is an in-depth examination of one person, is a form of qualitative research. Qualitative research is often used as a source of hypotheses for later testing in quantitative research.

    Either category of research requires primary or secondary data.

    Primary data are obtained by the researcher through surveys and experimentation.

    Secondary data are data that were collected for some purpose other than your study. Examples: government records, internal documents, previous surveys.

  • Sources of Data by Categories

    Qualitative:

    Focus group

    In-depth interview

    Case study

    Participant observation

    Secondary data analysis

    Quantitative:

    Questionnaires

    Experiments

    Structured observation

    Secondary data analysis

  • Surveys and Experiments

    Survey: collects data for the analysis of some aspect of a group or area.

    Measures things as they are; that is, a snapshot of the population at one point in time.

    You cannot change the variables being measured, only record them as they are.

    "Survey" is often used as a synonym for questionnaire. Methods of data collection include telephone, self-administered questionnaires, personal interviews, etc.

    Experiment: manipulates at least one variable (the treatment) to evaluate the response, in order to study cause-effect relationships.

    Experiments can be conducted either in the field or in the laboratory. Example: measuring the effect of cement content on the crushing strength (response) of blocks is an experiment. Cement content is the variable being manipulated.

  • Data Collection Methods: Surveys

    Most projects in construction management, valuation and quantity surveying will mainly involve surveys and not many experiments.

    How can we collect data in surveys?

    Face to face (talk)

    Telephone: call the respondent and ask a series of questions

    Computer/Internet: design and send the survey instrument online, e.g. using SurveyMonkey

    Mail (now outdated, as postal mail does not work efficiently)

    Observe (on-site)

    Gather secondary sources

  • General Guidelines for Suitable Method of Data Collection

    1. Describing a population - surveys

    2. Describing users/visitors - on-site survey

    3. Describing non-users, potential users or general population - household survey

    4. Measuring impacts, cause-effect relationships - experiments

    5. Short, simple household studies - phone

    6. Captive audience or very interested population - survey (self-administered questionnaire)

    7. Testing new ideas - experiments or focus groups

    8. In-depth study - surveys (in-depth personal interviews, focus groups, case studies)

    9. Anytime suitable secondary data exists - secondary data

  • Survey Implementation Process

    Survey planning: define the problem and hypotheses;

    Survey design: background information, design procedures, organization, sampling, drafting questions, constructing the survey;

    Field implementation: pretesting, training, briefing, interviewing and data collection;

    Data preparation: coding and data entry, cleaning, programming, compiling; and

    Data analysis: analysis, testing, reporting, using.

  • Survey Data Collection Methods: In-depth Interviews

    One person answers detailed questions and can answer many open-ended questions.

    Can be recorded (audio or video).

    Usefulness:

    Exploratory research,

    Results cannot be generalized, and

    Mainly useful for qualitative research.

    Meeting with people face-to-face can be the most effective method of asking questions.

    It is also perhaps the most expensive and time-consuming.

  • Survey Data Collection Methods: Focus Groups

    Qualitative method similar to in-depth interview.

    A group of selected persons answers detailed questions in a group discussion led by a moderator.

    Person selection.

    Homogeneity (similar attributes) vs. heterogeneity (different attributes)

    Ideally 5-7 persons.

    Many open questions.

    Documented with video taping.

    Generalization of results.

  • Merits and Demerits of Data Collection by Focus Groups

    Merits

    Relaxed atmosphere.

    Less pressure on individuals.

    Productive and creative character of discussion.

    Insights into attitudes and ideas of people.

    Demerits

    Hard to recruit people

    Importance of moderator

    Expensive or costly

    Time consuming data analysis

    Generalization of results

  • Survey Data Collection Methods: Telephone Interviews

    Calling subjects on the telephone can be a very effective way of interviewing:

    It is generally faster and less expensive.

    But people are more likely to refuse to participate.

  • Survey Data Collection Methods: Mail Surveys

    Survey instruments (questionnaires) can be mailed to subjects for them to fill out on their own and mail back. This can be done in large numbers.

    Often results in a high rate of errors and non-response.

  • Merits and Demerits of Mail Surveys

  • Survey Data Collection Methods: Internet/E-mails

    Merits

    Speed.

    Practically no cost once the set-up has been completed.

    Can include pictures and sound.

    Some webpage surveys can use complex skip logic, randomization, etc., not generally possible with paper or e-mail surveys.

    People may respond more honestly to sensitive questions when giving their answers to a computer.

    Web page surveys eliminate the need for data entry.

    Demerits

    Only reaches people with e-mail/internet access, so it does not reflect the population as a whole.

    People dislike unsolicited e-mail.

    E-mail addresses needed.

    People can easily quit in the middle of the survey.

    No control over who replies to (pop-up) webpage surveys.

    Multiple responses/ignored instructions (advanced systems can check such things).

    Most e-mail surveys cannot use skip logic, randomization, etc.

  • Enhancing Quality of Survey Data

  • Categories of Survey Errors

    Sampling errors

    Sampling errors are the random errors that are introduced into the survey simply because not every member of the survey population is included in the drawn sample.

    Sampling errors reflect the potential variability between the estimate of a parameter in the sample and its true value in the population.

    These errors affect the precision of the survey results.

    Non-sampling errors and biases

    Caused by problems that can occur during the survey design and data collection stages, which may cause survey measures and parameter estimates to be systematically incorrect.

    Non-sampling errors reflect how well the information is collected, and include non-response biases, often caused by refusals, and response biases, which reflect a systematic distortion of survey responses.

    These errors affect the accuracy of the survey results.

  • Precision vs Accuracy (Reliability vs Validity: bull's-eye example)

  • Sampling Errors

  • Non-Sampling Errors

    Non-sampling errors occur due to mistakes made along the process of data acquisition, such as a faulty questionnaire or measuring instrument.

    Increasing the sample size will not reduce this type of error.

    There are three types of non-sampling errors:

    Errors in data acquisition,

    Non-response errors,

    Selection bias.
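
    The difference between the two error types can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 10,000 GPA-like values, the bias of 0.2 and the function name survey_errors are all hypothetical and are not course data. It suggests that the random sampling error tends to shrink as the sample grows, while a systematic measurement bias stays at the same size.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 GPA-like values (assumed, not course data).
population = [random.gauss(4.0, 0.5) for _ in range(10_000)]
true_mean = statistics.mean(population)

BIAS = 0.2  # assumed systematic over-reporting caused by a faulty instrument


def survey_errors(n):
    """Return (sampling-only error, error including measurement bias) for sample size n."""
    sample = random.sample(population, n)
    clean_mean = statistics.mean(sample)
    biased_mean = statistics.mean(x + BIAS for x in sample)
    return clean_mean - true_mean, biased_mean - true_mean


for n in (20, 200, 2000):
    clean_err, biased_err = survey_errors(n)
    print(f"n={n:5d}  sampling error={clean_err:+.3f}  with bias={biased_err:+.3f}")
```

    In each row the second error exceeds the first by the assumed bias, whatever the sample size, which is why a larger sample cannot repair a faulty instrument.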

  • Data Acquisition Error

  • Non-response Error

  • Selection Bias

  • Strategies to Reduce Errors

    Sampling Design.

    Design of Survey Instrument (Questionnaire).

  • Sampling Design: Key Questions

    What are the primary objectives and constraints of the survey?

    What are the variables of greatest interest and the desired level of precision?

    How should the study population be defined?

    What information is readily available on the variables of interest?

    Which sampling method should be used to meet the precision requirements?

    What sample size is required to satisfy these precision requirements?

    Are there enough resources to collect such a sample?

    What is the precision and confidence level corresponding to a smaller sample size?
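
    The sample-size question above is often approached, for a simple random sample and a mean, with the textbook formula n = (z * sigma / E)^2, where sigma is a prior estimate of the standard deviation and E is the acceptable margin of error. The sketch below is illustrative only: the function name required_sample_size and the values sigma = 0.5, E = 0.1 and a population of 50 are assumptions, not figures from the course. The optional finite-population correction is also a standard adjustment rather than something stated in these notes.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95, population_size=None):
    """Sample size needed to estimate a mean to within +/- margin.

    sigma is an assumed (prior) standard deviation; population_size, if given,
    applies the usual finite-population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = (z * sigma / margin) ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)


# Hypothetical inputs: standard deviation 0.5, margin of error 0.1.
print(required_sample_size(sigma=0.5, margin=0.1))
print(required_sample_size(sigma=0.5, margin=0.1, population_size=50))
```

    Running the function with a smaller sample size in mind works in reverse: loosening the margin or the confidence level lowers the required n, which is the trade-off raised by the last question in the list.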

  • Basic Sampling Concepts

    Target Population: specifies all elements of interest to the researcher within geographical, business or professional boundaries. It could be construction firms at a macro level or internal elements within individual construction firms at a micro level. This is defined by the purpose of the research.

    Sampling Unit: in the example above, the sampling unit could be individual construction firms, or specific elements within individual firms that will be compared across a defined spectrum of construction firms.

    Sampling Frame:

    A list of all sampling units.

    Depends on such factors as the target population, the mode of data collection, and the sampling unit.

    Sampling Strategy:

    Probability sample

    Non-probability sample

  • Class Exercise: Performance Evaluation

    Problem Background:

    It is University policy that individual academic departments invite professors from other universities as External Examiners every academic year to evaluate the quality of course assessments in a particular program. In Academic Year 2010/11, the External Examiner evaluated the quality of examination questions, marking guides, and the performance of five (05) randomly selected students from the BSC (CM) program for both Semesters I and II. In his assessment, he observed that, in general, Semester II performance was better than Semester I, and that female students performed even better. However, it is not clear to the Quality Control Directorate whether to uphold the observation of the External Examiner or not.

    Questions that have to be answered to solve the problem:

    1) Is the professor's assessment valid? (This is the problem.)

    Not sure; we need data.

    2) How can we prove the validity of the professor's observation?

    We need to collect data on the performance of students in the two semesters in question.

  • Class Exercise: Performance Evaluation Problem

    3) What type of data is required to prove the professor's observation?

    Quantitative: GPA of CM II students, academic year 2010/11, in Semesters I and II, male and female.

    4) How can we collect this data?

    Seek data from the CEDAT registrar (secondary data source) or collect it from the students themselves (primary data source).

    5) What is the target population?

    CM II students.

    6) What is the sampling frame?

    The problem has two strata, Semester (I and II) and Gender (male and female), so random sampling by strata is advised.

    7) What method of data collection should we use?

    The data required is resident in the target population; that is, it is already there. So a survey using a questionnaire is appropriate. Alternatively, obtain secondary data from the CEDAT registrar.

  • Class Exercise: Performance Assessment Problem

    9) How should the sample be drawn from the population?

    Random sampling of the population in the respective strata (male and female).

    10) How can we control sampling errors?

    Random sampling guards against selection bias, for example from sampling only smart students who sit together, which would influence the outcome. The remaining sampling error is reduced by increasing the sample size.

    11) How should the data collection instrument be designed? What data needs to be collected in the instrument?

    The data collection instrument should be a questionnaire. It should contain only the data required to solve the problem, and none that can be used to positively identify the respondent. This gives the respondent the confidence to provide accurate data. Remember this in all your future research endeavors!

  • Questionnaire for the Class Exercise: Performance Evaluation Problem

    Makerere University

    College of Engineering, Design, Art and Technology

    School of Built Environment

    Department of Construction Economics and Management

    RESEARCH QUESTIONNAIRE:

    Evaluation of Performance of BSC(CM) Program for Academic Year 2010/11

    Background:

    Dear Respondent, the Department of Construction Economics and Management is carrying out research as per the above title. You have been identified as one of the key respondents in the study population. Kindly fill in this questionnaire and return it to the researcher as soon as you can. All your responses will be treated with utmost confidentiality, and we also promise to communicate to you the results of the research as soon as it is concluded.

    Thanks

    Researcher.

  • Questionnaire for the Class Exercise: Performance Evaluation Problem

    Academic Year: 2010/11

    Sex of the Respondent (tick appropriately)

    Male Female

    Grade Point Average (GPA) for Semester I (Write to 2 decimal places)

    : ----------------------

    Grade Point Average (GPA) for Semester II (Write to 2 decimal places)

    : ----------------------

    End of Questionnaire

    Thank You

  • Questionnaire Data Collection Results

    Grade Point Averages (GPA) for Semester I & II, Academic Year 2010/11 for BSC (CM), Year II

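    SEM I (M+F) | SEM II (M+F) | SEM I (M) | SEM II (M) | SEM I (F) | SEM II (F) | SAMPLE A (SEM I) | SAMPLE B (SEM I) | SAMPLE C (SEM I)
    4.93 | 4.75 | 4.93 | 4.75 | 3.53 | 3.5 | 3.7 | 4.02 | 3.96
    4.55 | 4.29 | 4.55 | 4.29 | 2.2 | 3 | 3.95 | 3.91 | 4.56
    4.52 | 4.46 | 4.52 | 4.46 | 4.52 | 4.3 | 3.93 | 4.23 | 3.56
    4.5 | 4.28 | 4.5 | 4.28 | 3.7 | 3.53 | 4.28 | 4.27 | 3.3
    3.95 | 3.97 | 3.95 | 3.97 | 4.59 | 4.49 | 3.87 | 3 | 3.98
    3.41 | 3.12 | 3.41 | 3.12 | 4.55 | 4.35 | | 4.27 | 4.27
    3.96 | 3.8 | 3.96 | 3.8 | 3.82 | 3.81 | | 4.36 | 4.23
    3.93 | 3.93 | 3.93 | 3.93 | | | | 4.3 | 3.73
    4.66 | 4.52 | 4.66 | 4.52 | | | | 4.36 | 4.1
    4.2 | 4.32 | 4.2 | 4.32 | | | | 3.91 |
    3.75 | 3.7 | 3.75 | 3.7 | | | | 2.93 |
    3.67 | 3.05 | 3.67 | 3.05 | | | | 3.09 |
    4.56 | 4.3 | 4.56 | 4.3 | | | | 3.88 |
    3.56 | 3.7 | 3.56 | 3.7 | | | | |
    3.5 | 3.01 | 3.5 | 3.01 | | | | |
    3.8 | 3.93 | 3.8 | 3.93 | | | | |
    4.02 | 3.88 | 4.02 | 3.88 | | | | |
    4.28 | 4.19 | 4.28 | 4.19 | | | | |
    3.92 | 3.74 | 3.92 | 3.74 | | | | |
    4.02 | 3.71 | 4.02 | 3.71 | | | | |
    3.87 | 3.66 | 3.87 | 3.66 | | | | |
    3.91 | 3.98 | 3.91 | 3.98 | | | | |
    4.23 | 4.2 | 4.23 | 4.2 | | | | |
    4.27 | 4.14 | 4.27 | 4.14 | | | | |
    3 | 2.97 | 3 | 2.97 | | | | |
    4.27 | 4.09 | 4.27 | 4.09 | | | | |
    4.36 | 3.35 | 4.36 | 3.35 | | | | |
    2.48 | 3.12 | 2.48 | 3.12 | | | | |
    3.3 | 3.2 | 3.3 | 3.2 | | | | |
    3.98 | 4.55 | 3.98 | 4.55 | | | | |
    4.27 | 3.54 | 4.27 | 3.54 | | | | |
    4.3 | 4.32 | 4.3 | 4.32 | | | | |
    4.36 | 4.07 | 4.36 | 4.07 | | | | |
    3.91 | 4.34 | 3.91 | 4.34 | | | | |
    2.93 | 3.14 | 2.93 | 3.14 | | | | |
    3.09 | 3.89 | 3.09 | 3.89 | | | | |
    3.88 | 3.56 | 3.88 | 3.56 | | | | |
    4.23 | 4.48 | 4.23 | 4.48 | | | | |
    4.48 | 4.48 | 4.48 | 4.48 | | | | |
    3.73 | 3.02 | 3.73 | 3.02 | | | | |
    4.44 | 2.96 | 4.44 | 2.96 | | | | |
    4.48 | 4.56 | 4.48 | 4.56 | | | | |
    4.1 | 3.12 | 4.1 | 3.12 | | | | |
    3.53 | 3.5 | | | | | | |
    2.2 | 3 | | | | | | |
    4.52 | 4.3 | | | | | | |
    3.7 | 3.53 | | | | | | |
    4.59 | 4.49 | | | | | | |
    4.55 | 4.35 | | | | | | |
    3.82 | 3.81 | | | | | | |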

  • Summary Statistics: Measures of Location and Variance

    Statistical Descriptors | SEM I (M+F) | SEM II (M+F) | SEM I (M) | SEM II (M) | SEM I (F) | SEM II (F) | SAMPLE A (SEM I) | SAMPLE B (SEM I) | SAMPLE C (SEM I)
    Mean | 3.97 | 3.85 | 3.99 | 3.85 | 3.84 | 3.85 | 3.95 | 3.89 | 3.97
    Standard Error | 0.08 | 0.07 | 0.08 | 0.08 | 0.32 | 0.21 | 0.09 | 0.15 | 0.13
    Median | 4.00 | 3.91 | 4.02 | 3.93 | 3.82 | 3.81 | 3.93 | 4.02 | 3.98
    Mode | 4.27 | 3.12 | 4.27 | 3.12 | #N/A | #N/A | #N/A | 3.91 | #N/A
    Standard Deviation | 0.56 | 0.53 | 0.51 | 0.53 | 0.85 | 0.55 | 0.21 | 0.53 | 0.39
    Sample Variance | 0.31 | 0.28 | 0.26 | 0.28 | 0.72 | 0.30 | 0.04 | 0.28 | 0.15
    Kurtosis | 1.55 | -1.14 | 0.96 | -1.13 | 1.83 | -1.19 | 2.17 | -0.38 | -0.21
    Skewness | -1.12 | -0.29 | -0.91 | -0.29 | -1.29 | -0.33 | 0.97 | -1.08 | -0.32
    Range | 2.73 | 1.79 | 2.45 | 1.79 | 2.39 | 1.49 | 0.58 | 1.43 | 1.26
    Minimum | 2.20 | 2.96 | 2.48 | 2.96 | 2.20 | 3.00 | 3.70 | 2.93 | 3.30
    Maximum | 4.93 | 4.75 | 4.93 | 4.75 | 4.59 | 4.49 | 4.28 | 4.36 | 4.56
    Sum | 198.47 | 192.37 | 171.56 | 165.39 | 26.91 | 26.98 | 19.73 | 50.53 | 35.69
    Count | 50.00 | 50.00 | 43.00 | 43.00 | 7.00 | 7.00 | 5.00 | 13.00 | 9.00
    Largest(1) | 4.93 | 4.75 | 4.93 | 4.75 | 4.59 | 4.49 | 4.28 | 4.36 | 4.56
    Smallest(1) | 2.20 | 2.96 | 2.48 | 2.96 | 2.20 | 3.00 | 3.70 | 2.93 | 3.30
    Confidence Level(95.0%) | 0.16 | 0.15 | 0.16 | 0.16 | 0.79 | 0.51 | 0.26 | 0.32 | 0.30
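
    The "Standard Error" and "Confidence Level(95.0%)" rows follow from the standard deviation and the count. The sketch below checks this for Sample A (SEM I), using the five GPAs in that column of the data table. It assumes that the confidence row is the half-width of a t-based interval, which the tabulated figures appear to be consistent with; the value 2.776 is the usual two-sided 95% t value for 4 degrees of freedom taken from a t-table, and the variable names are illustrative.

```python
import math
import statistics

# Sample A (SEM I) GPAs, copied from the data collection results.
sample_a = [3.70, 3.95, 3.93, 4.28, 3.87]

n = len(sample_a)
mean = statistics.mean(sample_a)
sd = statistics.stdev(sample_a)   # sample standard deviation (n - 1 divisor)
std_error = sd / math.sqrt(n)     # standard error of the mean

# Two-sided 95% t value for n - 1 = 4 degrees of freedom (from a t-table).
T_CRIT_DF4 = 2.776
half_width = T_CRIT_DF4 * std_error  # assumed meaning of "Confidence Level(95.0%)"

print(f"mean={mean:.2f}  sd={sd:.2f}  std error={std_error:.2f}  95% half-width={half_width:.2f}")
```

    Under that assumption, the mean of Sample A would be reported as roughly the sample mean plus or minus the half-width, which shows how wide the uncertainty is for a sample of only five students.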

  • Data Inspection: Overview of Descriptive Statistics

    The mean GPA for Semester I (3.97) is higher than that for Semester II (3.85), in which case you would conclude that Semester II performance was in fact lower than Semester I, and that the professor's observation was therefore not correct.

    Not necessarily true!

    This conclusion would be based only on a measure of location (the mean), neglecting the measures of dispersion (such as the standard deviation) that would tell us whether the difference is significant. We will deal with this later.

    A similar observation holds true for female student performance. Notice how the sample mean varies with sample size, and recall how the samples were selected. Both the sampling method and the sample size influence sampling error.

    You ought to appreciate how the data was collected, the reason for collecting it, and therefore the kind of analysis we will do from now on in the course.

    In this lab, you are required to reproduce these results yourself. Download the Excel file from your Group Email.
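
    The lab itself uses Excel, but the same figures can be cross-checked in a few lines of Python. The sketch below is an illustration rather than the course's method: it recomputes some of the descriptors for the two female columns, with the GPAs copied from the data table, and the helper name describe is hypothetical.

```python
import statistics

# Female GPAs copied from the SEM I (F) and SEM II (F) columns of the data table.
sem1_female = [3.53, 2.20, 4.52, 3.70, 4.59, 4.55, 3.82]
sem2_female = [3.50, 3.00, 4.30, 3.53, 4.49, 4.35, 3.81]


def describe(values):
    """Return a few measures of location and dispersion for a list of values."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "range": max(values) - min(values),
    }


for label, values in (("SEM I (F)", sem1_female), ("SEM II (F)", sem2_female)):
    summary = describe(values)
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

    The two means are almost identical while the standard deviations differ noticeably, which is the reason the notes warn against judging the semesters on the mean alone.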

  • Steps in Sampling

    Define study population.

    Specify sampling frame and unit.

    Specify sampling method.

    Determine sample size.

    Choose sample.

  • Sampling Methods

    Simple random sample.

    Systematic sample.

    Stratified sample.

    Cluster sample.

    Choice-based sample.

    Combinations

  • Simple Random Sampling

    All members of a given population have the same chance of being selected.

    Selection of each member must be independent of the selection of any other member of the sample.

    Put every member of the population in a pool, then randomly select from that pool, e.g. a lottery.

    Alternatively, assign a single number to each element in the list, then randomly draw the sample by choosing numbers.
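
    The numbered-list approach can be sketched with Python's standard random module. The frame of 50 student numbers, the sample size of 5 and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical sampling frame: students numbered 1 to 50 (assumed for illustration).
sampling_frame = list(range(1, 51))

rng = random.Random(2014)  # seeded only so that the draw can be repeated
sample = rng.sample(sampling_frame, k=5)  # draw 5 numbers without replacement

print(sorted(sample))
```

    Because random.Random.sample draws without replacement, no student number can appear twice, and every number has the same chance of being chosen.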

  • Systematic Sampling

    A starting point in a list is randomly determined, and every kth element of the sampling frame is selected thereafter.

    From a sampling frame of 1,000, a sample of size 100 would be drawn by taking every 10th element (sampling interval = 1000/100 = 10).
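
    The example of 1,000 units and a sample of 100 can be sketched as follows. The function name systematic_sample and the numbered frame are hypothetical, and the random start is drawn within the first interval, which is one common convention.

```python
import random


def systematic_sample(frame, sample_size, rng=random):
    """Select every k-th unit after a random start, where k = len(frame) // sample_size."""
    k = len(frame) // sample_size  # sampling interval
    start = rng.randrange(k)       # random starting position within the first interval
    return frame[start::k][:sample_size]


# Hypothetical sampling frame of 1,000 numbered units.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, random.Random(7))

print(len(sample), sample[:5])
```

    With 1,000 units and a sample of 100 the interval is 10, so consecutive selected units are always 10 positions apart.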

  • Systematic Sampling

    With this approach, rather than randomly selecting from the sampling frame, the researcher selects sampling units in sequence, separated by a preset interval.

    Provided that the order of the sampling frame is relatively unbiased, this approach is essentially equivalent to the simple random sample.

    Often, in personal surveys, fieldworkers are instructed to approach every nth person passing a certain point.

  • Stratified Sampling

    This sampling procedure separates the population into mutually exclusive sets (strata), and then draws simple random samples from each stratum.

  • Stratified Sampling

    Stratified sampling is based on the fact that a homogeneous population produces samples with smaller sampling errors.

    A population is divided into subgroups (strata) according to certain stratification variables.

    Appropriate numbers of elements are then drawn from each stratum, in proportion to its size.
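
    Proportionate stratified sampling can be sketched as below. The stratum sizes (43 male, 7 female) match the counts in the class exercise, but the member labels, the total sample size of 10 and the function name stratified_sample are assumptions made for illustration.

```python
import random


def stratified_sample(strata, total_size, rng):
    """Draw a simple random sample from each stratum, proportional to stratum size.

    Rounding means the allocations may not always add up exactly to total_size.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(total_size * len(members) / population_size)
        share = max(1, min(share, len(members)))  # at least one unit per stratum
        sample[name] = rng.sample(members, share)
    return sample


# Hypothetical labels for the 43 male and 7 female students in the class exercise.
strata = {
    "male": [f"M{i}" for i in range(1, 44)],
    "female": [f"F{i}" for i in range(1, 8)],
}
sample = stratified_sample(strata, total_size=10, rng=random.Random(1))

print({name: len(units) for name, units in sample.items()})
```

    With these sizes, a sample of 10 is allocated as 9 male and 1 female, mirroring the 43:7 split in the population.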

  • Cluster Sampling

    Cluster sampling is a simple random sample of groups or clusters of sampling units. All of the units within a selected cluster may be included in the sample.

    Workplace/establishment surveys are examples of cluster surveys. A small number of establishments is first selected from the population of all establishments within a study area. Employees and visitors are then sampled within the selected establishments.

    Households and construction firms by classification are examples of clusters.

    This procedure is useful when:

    it is difficult and costly to develop a complete list of the population members;

    the population members are widely dispersed geographically.

    Cluster sampling may increase sampling error, because of probable similarities among cluster members.
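
    A one-stage cluster sample, in which whole clusters are drawn at random and every unit in a chosen cluster is kept, can be sketched as follows. The firm names, employee labels and the function name cluster_sample are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose n_clusters clusters and keep every unit inside each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical construction firms (clusters) and their employees (sampling units).
clusters = {
    "Firm A": ["A1", "A2", "A3"],
    "Firm B": ["B1", "B2"],
    "Firm C": ["C1", "C2", "C3", "C4"],
    "Firm D": ["D1", "D2", "D3"],
}
sample = cluster_sample(clusters, n_clusters=2, rng=random.Random(3))

for firm, employees in sample.items():
    print(firm, employees)
```

    Only the list of firms is needed as a sampling frame here, which is the practical advantage when a complete list of individual population members is hard to obtain.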

  • Questionnaire Construction

    Good questionnaires are more than sets of well written questions.

    How items are put together into a complete survey instrument is also very important.

    The questionnaire should be organized in a concise manner that keeps the interest of the respondents, encouraging them to complete the entire questionnaire.

    The format is as important as the words:

    Spread the questions out; use white space.

    A good layout should minimize errors by making directions, questions and response items clear and easy to follow.

  • Questionnaire Construction Guidelines

    A good questionnaire must be well designed:

    Keep the questionnaire as short as possible.

    Ask short, simple, and clearly worded questions.

    Start with demographic questions to help respondents get started comfortably.

    Use dichotomous and multiple choice questions.

    Use open-ended questions cautiously.

    Avoid using leading questions.

    Pretest the questionnaire on a small number of people.

    Think about the way you intend to use the collected data when preparing the questionnaire.

  • Questionnaire Construction: What to Avoid

    Confusing questions:

    Outside respondents' experience

    Technical terms

    Assuming knowledge

    Overfamiliarity

    Uncommon idioms/slang

    Ambiguous questions:

    Confusing two-part questions

    Indefinite comparisons

    Indefinite persons/places, e.g. neighborhoods

    Incomplete questions

    Imprecise

    Indefinite in time

    Loaded questions:

    Provide unfair alternatives

    Link personalities to questions

    Link institutions to questions

  • Planning and Designing a Research Study

  • Steps in Planning and Designing a Research Study

    Choose a research topic

    Carry out literature review

    Formulate a research problem

    Define objectives

    Articulate research hypothesis

    Choose variables to study

    Define the methods: Tools and sampling plan

    Collect the necessary data

    Analyze the data: descriptive and inferential statistics

    Report findings based on the hypothesis and collected data, precisely and clearly

    State any major constraints and/or challenges to the study.

  • Choosing a Research Topic

    Out of interest: something at heart!

    Problem solving: real or imaginary, e.g. improving productivity, efficiency, scheduling or financial control of large construction projects.

    Previous research: continuation of unfinished research.

    Testing a theory about something.

    Advice from a research supervisor or funder.

    Example: Measuring productivity of masons on building projects below and above window levels.

  • Examples of Past Student Research Titles

    Name | Year | Program | Research Title
    John Musiime | 2012 | CM | Development of Visual Basic Application in Excel for Management of Road Construction Projects in Uganda
    Angella Uwase | 2011 | QS | Assessment of the Effect of Equipment Availability and Costs on Road Maintenance Projects
    Edgar Mukata | 2013 | CM | Material Management on Large Building Construction Sites: A Case Study of Kampala City
    Timothy Ndifuna | 2013 | CM | Assessing the Quality of Burnt Clay and Earth Bricks used in Building Projects
    Angella Asiimwe | 2011 | QS | Empirical Research on Unit Rate of Paving a Kilometer of Road in Uganda: A Case of Kubiri-Northern By-Pass Link
    Jonathan Gombya | 2013 | CM | The Extent of Delays and Cost Overruns on Private Building Projects in Uganda
    Bongole Lutaaya | 2012 | CM | Evaluation of the Extent of Use of Partition Walling in Office Buildings in Kampala
    Darlius Tumwiine | 2011 | CM | Assessing the Quality of Precast Concrete Blocks on Market in Kampala

  • Literature Review

    Literature review simply means getting familiar with published work about the topic. Example: Productivity of Masons.

    The objective is to get an understanding of how the topic has been addressed by other researchers: definitions, measurement criteria, variables, methods and tools of data collection, data analysis and reporting.

    The information is summarized in a good format that guides the reader into a deeper understanding of the research topic.

    Literature review follows approved referencing and citation formats, for example APA (American Psychological Association), which Makerere University uses.

  • Literature Review: Citation and Referencing in APA

    The objective of citation and referencing is to avoid plagiarism, which is using someone's work without attribution. It is a major academic offense.

    Citation formats in the APA system are given as a handout as part of the Class Notes. Demonstrate in Class an Example in Mendeley.

    Remember to discuss the review coherently and clearly so as not to confuse the reader.

    If you find at the end of the review that you are unable to proceed with any of the problem definition, methods, variables, objectives, etc., it simply means you have not exhausted the literature on the subject matter.

  • Formulating a Research Problem

    The research problem should clearly and precisely define the problem of the study topic.

    Good research problems must meet three criteria (see Kerlinger, 1973). First, the research problem should describe the relationship between two or more variables. Second, the research problem should take the form of a question. Third, the research problem must be capable of being tested empirically (i.e., with data derived from direct observation and experimentation) (Marczyk, Dematteo & Festinger, 2005).

    The research question must indicate clearly and precisely what is being studied. For the example topic, the research questions, as part of the problem definition, would be: "Is the productivity of masons on building projects different when working below and above window levels? If so, what factors greatly influence this productivity?"

  • Define the Objectives

    The objective is the main intent of the study, similar to the study title rephrased.

    If the title is "Measuring productivity of masons on building projects below and above window levels", then the objective would be:

    To measure productivity of masons on building projects below and above window levels.

    There are also specific objectives, which are sub-tasks that have to be fulfilled to realize the general objective.

  • Define the Objectives: Specific Objectives

    The specific objectives represent sub-tasks that have to be performed in the study to realize the main objective and/or solve the problem.

    Simply put, they are a breakdown of the activities that have to be performed in the study to realize the main objective.

    There may be several specific objectives, depending on the extent of the study.

  • Research Hypothesis

    A hypothesis is an educated and testable guess to an answer to the research question (Marczyk, Dematteo & Festinger, 2005).

    It is an attempt to explain, predict, and explore the phenomenon of interest.

    There are two types: the null and the alternative hypothesis.

    The null hypothesis is the statement that is tested, typically one of no difference or no effect, while the alternative hypothesis represents the contrary.

    A hypothesis can be directional or non-directional.

  • Research Hypothesis

    The null hypothesis predicts no difference between variables measured across time or treatment levels.

    For example, in the earlier class exercise, the null hypothesis would be:

    Ho: There is no difference in performance between Sem I and Sem II of CM II students in 2010/11.

    The alternative (Ha) predicts that there will be a difference between performance in the two semesters.

    However, the professor has already given the null hypothesis:

    Ho: Performance of CM II students in 2010/11 was greater for Sem II than Sem I.

    Ha: Performance of CM II students in 2010/11 was not greater (less or equal) for Sem II than Sem I.

  • Choosing Variables to Study

    A variable is anything that can take on different values. For example, height, weight, age, race, attitude, and IQ are variables because there are different

    heights, weights, ages, races, attitudes, and IQs (Marczyk, Dematteo &

    Festinger, 2005).

    In construction management, valuation and quantity surveying, variables may include; productivity, compressive strength of concrete elements, cost per unit,

    consumption per unit time, etc.

    In the class exercise, the main variable was GPA because it varies amongst students, and is a measure of academic performance.

    In planning your study, identify the variables that will be measured. It is for these that you design a data collection method.

    Many categories of variables exist for a research study.

    3/5/2014 62

  • Choosing Variables to Study: Types

    Two main types of variables exist:

    Dependent: a measure of the effect (if any) of the independent variable, e.g. GPA as the measure of performance in the class exercise.

    Independent: the factor that is manipulated or controlled by the researcher and is independent of the outcome being measured. These were not considered in the exercise but could include sex, aggregate weighted score at Advanced level, attendance per semester per student, etc.

    With the independent variables, the researcher seeks to establish whether the dependent variable is predictable; they are therefore useful in regression and correlation analysis.

    3/5/2014 63

  • Categorical v Continuous Variables

    Categorical variables are variables that can take on specific values only within a defined range of values.

    Examples; gender, marital status, class of construction firms, class of academic degrees, etc.

    Continuous variables are variables that can theoretically take on any value along a continuum.

    Examples; height, weight, income, volume, time, area, etc.

    3/5/2014 64

  • Research Methodology : Methods

    There is a distinction between methods and methodology.

    According to Blaxter, Hughes & Tight (2006), the distinction is as follows;

    Method means the tools of data collection or analysis: techniques such as questionnaires and interviews.

    Methodology means the approach or paradigm that underpins the research. That is, whether it is a quantitative or

    qualitative research; experiment or survey. A clear

    statement must be defined in the methodology citing

    reasons based on literature review and/or experience.

    3/5/2014 65

  • Choosing a Research Method: Factors

    (Blaxter, Hughes and Tight, 2006)

    Research Questions: What exactly are you trying to find out? This can lead you either into the quantitative or qualitative direction.

    Are you interested in making standardized and systematic comparisons or do you really want to study this phenomenon or situation in detail? Experiment or survey.

    The Literature: How have other researchers dealt with this topic? To what extent do you wish to align your own research with standard approaches to the topic?

    Practical Considerations: Issues of time, money, availability of samples and data, familiarity with the subject under study, access to situations, gaining co-operation.

    Knowledge payoff: Will you learn more about this topic using quantitative or qualitative forms of research? Which approach will produce more useful knowledge? Which

    will do more good?

    Style: Some people prefer one approach to another.

    3/5/2014 66

  • Choosing a Research Method: Details Required

    The following need to be stated with respect to the research question, hypothesis or specific objectives;

    Nature of data collection (survey or experiment),

    Data collection instrument: questionnaire for survey and experiment design for experiments stating the

    equipment and set-up,

    Target population,

    Sampling frame,

    Sampling method, and sample size,

    Data collection, preparation and analysis; statistical or other tests to be used to analyze the data, and

    How the expected outcome will be reported.

    3/5/2014 67

  • Data Collection: Measurements

    The main concern in data collection is method of measurement.

    There are two reasons why measurement technique is important in data collection;

    Allows the researcher to quantify the variables

    The level of statistical sophistication required to analyze the data depends on the scale of measurement used to quantify the variables.

    There are four main measurement scales; nominal, ordinal, interval and ratio.

    The measurement scale and equipment affect the reliability and validity of measurements.

    3/5/2014 68

  • Data Collection: Nominal Scale

    Characteristics

    Used only to qualitatively classify or categorize not to quantify.

    No absolute zero point.

    Cannot be ordered in a quantitative sequence.

    Impossible to use to conduct standard mathematical operations.

    Examples include; gender, religious and political affiliation, and marital status.

    Purely descriptive and cannot be manipulated mathematically.

    3/5/2014 69

  • Data Collection: Ordinal Scale

    Characteristics

    Build on nominal measurement.

    Categorize a variable and its relative magnitude in relation to other variables.

    Represent an ordering of variables with some number representing more than another.

    Information about relative position but not the interval between the ranks or categories.

    Qualitative in nature.

    Examples would be the finishing position of runners in a race, position in class, etc.

    Lack the mathematical properties necessary for sophisticated statistical analyses.

    3/5/2014 70

  • Data Collection: Interval Scale

    Characteristics

    Quantitative in nature.

    Build on ordinal measurement.

    Provide information about both order and distance between values of variables.

    Numbers scaled at equal distances.

    No absolute zero point; zero point is arbitrary.

    Addition and subtraction are possible.

    Examples include temperature measured in Fahrenheit and Celsius.

    Lack of an absolute zero point makes multiplication and division (ratios) meaningless.

    3/5/2014 71

  • Data Collection: Ratio Scale

    Characteristics

    Identical to the interval scale, except that they have an absolute zero point.

    Unlike with interval scale data, all mathematical operations are possible.

    Examples include; height, weight, and time.

    Highest level of measurement.

    Allow for the use of sophisticated statistical techniques.

    3/5/2014 72

  • Data Collection: Main Approach

    Methods

    Formal testing; using scientific equipment/tools and procedures.

    Interviewing; in a questionnaire by answering a series of guided questions.

    Global ratings (also known as a Likert scale); such as personal attitude about a statement: strongly disagree, disagree, agree or strongly agree. E.g. on a scale of 1 to 5, indicate the extent to which you enjoy the course so far.

    1 (Hate it)   2   3 (Neutral)   4   5 (Love it)

    Observation; has to do with time and a defined group.

    3/5/2014 73

  • Data Preparation, Analyses and

    Interpretation: Data Preparation

    The process of data analysis involves three main steps;

    (1) preparing the data for analysis,

    (2) analyzing the data, and

    (3) interpreting the data (i.e., testing the research hypotheses and

    drawing valid inferences).

    Data preparation includes;

    1. Logging and tracking data

    2. Data screening

    3. Constructing a database

    4. Data entry and coding missing variables

    5. Data transformation if necessary: square root, log or inverse

    transformation of variables.

    3/5/2014 74

  • Data Preparation, Analyses and

    Interpretation: Data Analysis

    1. Descriptive Statistics:

    General description to summarize data: frequency tables, histograms, bar charts, box-plots, stem-and-leaf plots.

    Measures of central tendency or location: mean, median, mode.

    Measures of dispersion or variation: range, variance, standard deviation, interquartile range.

    Measures of association: correlation coefficient, coefficient of determination, and Pearson coefficient for ratio or interval scale data,

    Spearman Rank order for ordinal data.

    3/5/2014 75

  • Data Preparation, Analyses and

    Interpretation: Data Analysis

    2. Inferential Statistics:

    T-Test: to test mean difference between two groups on interval or ratio scales.

    Analysis of Variance (ANOVA): comparing means across more than two groups, an extension of the t-test.

    Chi-Square (χ²): Used to test hypotheses when the data is nominal or ordinal. Summarizes the discrepancies between observed and expected values.

    Regression: Simple linear, multiple and logistic regression.

    3/5/2014 76

  • Interpreting Data and Drawing Inferences

    Data collection and statistical testing is about solving a problem, by answering a question through hypothesis testing

    using collected data so as to arrive at a conclusion.

    Therefore, based on data collected, you will test the hypothesis and make a statement about the findings

    statistically and the meaning of the outcome in real sense.

    Ordinary readers understand the real-world meaning, not your statistics, but statistics is a tool to arrive at a good conclusion.

    In Part II of this course, we will explore statistical approaches to descriptive and inferential statistics necessary for data analysis.

    3/5/2014 77

  • Ethical Considerations in Research

    Fundamental Ethical principles;

    Respect for persons, especially those over whom you have authority.

    Justice in the way you choose research participants, so as to avoid bias.

    Confidentiality; to treat results of your respondents with utmost confidentiality, not to be used in a form that can harm, injure your

    respondents. E.g. studying about robbers at the university and later

    disclosing to police who they are is unethical.

    Not to use minors and impaired persons in research because they do not make rational decisions.

    Protection of research assistants from danger of the research; appropriate protection, and safety measures.

    Avoid plagiarism; which is falsifying work, or copying someone else's work without due attribution. It is an offense. There are principles on how to avoid this. See the handout on referencing and citation in APA.

    END OF PART I

    3/5/2014 78

  • Course Work: Write a Research Proposal (20%)

    (Date of Issue:) Instructions:

    1) Following the concepts learnt so far in class, you are required to write a

    research proposal, bound and submitted in a hard copy to the Tutor

    not later than .

    2) You are expected to follow and apply most of the principles learnt in

    this course.

    3) Take this exercise seriously. Devote time and effort to it.

    4) All work or ideas produced MUST be original and reflect your own

    effort. Reproducing past work will be detected and will score no mark.

    5) The length of proposal should NOT BE MORE THAN 10 PAGES.

    3/5/2014 79

  • Format of the Research Proposal

    1) Title Page: This should include the University Name, College, School and Department, Title of Research Topic, Name of Researcher and Registration Number (You), Name of Supervisor (Tutor), this text: "A Research proposal submitted to the Department of Construction Economics and Management for the Award of a Degree of Bachelor of Science in Construction Management of Makerere University", and finally, Month and Year. This should fit on one full page.

    2) Table of contents, List of figures /Tables, and List of acronyms.

    3) Chapter 1: Introduction including;

    a. a background to your research topic, identifying the need, social, economic, professional

    relevancy and urgency.

    b. Problem Statement

    c. Main Objective and Specific objectives,

    d. Hypothesis,

    e. Justification, and

    f. scope of the study.

    3/5/2014 80

  • Format of the Research Proposal

    3) Chapter 2: Literature Review using the APA (American Psychological Association) citation and referencing format on past work about the research topic;

    a. Key definitions of terms in the research topic,

    b. Type of research and applicable procedures in the literature,

    c. Methods of data collection and measurements in general, and

    d. Data analysis approaches.

    4) Chapter 3: Methodology;

    a. Define the type of research,

    b. Method of data collection; Experimentation or survey.

    c. Type of Data to be collected including major variables (dependent and independent if

    applicable),

    d. Target population/experimental set-up, sampling frame, sampling methods, sample size,

    e. Data collection instrument (s) includes type of data to be collected, and

    f. Statement on how data will be analyzed (descriptive or inferential statistics in detail)

    3/5/2014 81

  • Format of the Research Proposal

    5) References ( Follow APA format).

    6) Time Schedule for the study; break down study per activity and how long it will take you to accomplish

    each activity.

    7) Estimated Budget for the study; stationery, transport, material purchases, remuneration of research

    assistants, hire of equipment if applicable, material testing if applicable.

    8) Appendices; copies of standards to use, questionnaire, or photos of equipment.

    NB:

    You are collecting data from different sources, kindly write it logically and coherently so as to interest the reader. Do not look at this as a mere class exercise. Just assume the TUTOR is a businessman with

    capacity to finance your research and you want him to get interested and fund it. In this tone, you will

    present the best of everything stated above, coherently and clearly so as to attract the necessary interest!

    You may have to consult other members of staff and practicing construction managers, among other people, to identify a suitable topic and/or problem. You are also free to consult me during working time.

    GOOD LUCK

    3/5/2014 82

  • PART II: STATISTICS

    Descriptive Statistics

    Inferential Statistics

    3/5/2014 83

  • Terminology

    A population is a collection of all units of interest.

    A parameter is a numerical characteristic of a population (e.g. mean, standard deviation, variance).

    A sample is a subset of a population that is actually observed.

    A random sample gives an equal pre-assigned chance to every unit of the population to enter the sample.

    The population and its parameters are typically unknown and the sample is used to infer the values

    of the parameters.

    A statistic is a numerical characteristic of a sample, used to infer the values of parameters.

    3/5/2014 84

  • Statistics

    Science of collecting and analyzing data for drawing conclusions and making decisions.

    Descriptive Statistics and exploratory data analysis: summary and description of collected data.

    Inferential Statistics: generalizing from a sample to a population.

    3/5/2014 85

  • Descriptive Statistics

    Collect data e.g. Survey

    Present data e.g. Tables and graphs

    Characterize data e.g. Sample mean

    3/5/2014 86

  • Graphical Methods to Summarize Data

    Graphical methods to summarize data include;

    Pie charts, and

    Bar graphs.

    In each case, data is organized so that each observation falls into only one category of the variable.

    Pie Charts - Basically are used for discrete variables with few outcomes. Choose a small number of categories for the variable because too many categories make the pie chart difficult to interpret.

    3/5/2014 87

  • Graphical Methods - Bar Charts

    Used to display frequency data from qualitative variables.

    Label frequencies on one axis and categories on the other axis. This can be done well and quickly in MS Excel.

    For histograms, data has to be summarized in a frequency table from which frequency and relative frequency diagrams are drawn. However, histograms are only applicable to grouped data.

    The decision on how many classes to use depends on the range, your personal input, and number of class intervals that represent your data the

    best.

    3/5/2014 88

  • Histograms

    3/5/2014 89

  • Histogram Characteristics

    The location of the modal interval or class keeps shifting with the number of classes. The histogram is therefore very sensitive to the number of classes.

    If we are to compare two different samples or populations, it is important that the decision is based on a relative frequency

    histogram rather than a frequency histogram as it eliminates

    issues of variability in sample sizes.

    3/5/2014 90

  • Histogram Jargon

    A histogram with one major peak is called uni-modal, that with two major peaks is bi-modal and uniform if every interval has

    essentially the same number of observations.

    A histogram can also be described based on symmetry of the distribution created as;

    Symmetrical if the right and left sides have essentially the same shapes, and usually the mean, mode and median coincide.

    Skewed to the right if much of the data by frequency is to the left and very few to the right, and the median and mode are to the

    left of the mean. Long tail to the right.

    Skewed to the left if much of the data by frequencies is to the right and few to the left. Long tail to the left. The median and

    mode are to the right of the mean.

    See Class work Example 1 for the distinction.

    3/5/2014 91

  • Histograms for GPA in SEM I & II Exercise

    Recall the class exercise carried out in the first part of the course. Take a close look at the summary of descriptive statistics

    (Slide 16).

    For SEM I (M+F), note that the median and the mode are to the right of the mean, suggesting performance is skewed to the left. That is, more students score above 3.97 (the mean) than below it. Does the histogram show the same?

    On the other hand, for Sem II performance, the mode is to the left of the mean and the median to the right. Performance is skewed neither to the left nor to the right; that is, it is nearly symmetrical. Does it seem so?

    3/5/2014 92

  • Histograms for GPA in SEM I & II

    Exercise - Contd

    3/5/2014 93

  • Measures in Descriptive Statistics

    Graphical methods are mainly used to summarize the data and to give a visual trend about the data.

    The Histogram for instance will show where the data

    is located and variability.

    However, magnitude of measures of location and variability as seen from the histogram has to be

    determined. These are referred to as measures of

    location and variability or dispersion, summarized in

    the next slide.

    3/5/2014 94

  • Summary of Measures

    3/5/2014 95

  • Measures of Location

    3/5/2014 96

  • Mean

    3/5/2014 97

  • Characteristics of the Mean

    3/5/2014 98

    There is only one mean for a dataset.

    It is influenced by extreme measurements.

    Means of subsets can be combined to determine the mean of the complete data set.

    Can be used only with quantitative data.

    For grouped data the mean is rather stable even when data is organized into different classes.

  • Median

    3/5/2014 99

    The sample median, is the middle value in a set of data that is arranged in ascending order.

    For an even number of data points the median is the average of the middle two.

  • Characteristics of the Median

    3/5/2014 100

    There is only one median for a dataset.

    Not influenced by extreme measurements.

    Medians of subsets cannot be combined to determine the median of the complete data set.

    Can be used only with quantitative data.

  • Example

    3/5/2014 101

  • Median for Grouped Data

    3/5/2014 102

    The median for grouped data is given by;

    $\text{Median} = L + \frac{w}{f}\,(0.5n - cf_b)$

    Where;

    L is the lower class boundary of the median class,

    w is the class width,

    f is the frequency of the median class,

    n is the total number of observations, and

    cfb is the cumulative frequency of the class before the median class.
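
    The grouped-median formula above can be sketched in a few lines of Python. The frequency table below is hypothetical (it is not from the class exercise) and is only meant to show how L, w, f, n and cfb are picked out of a table.

```python
def grouped_median(lower_boundary, width, freq, n, cum_freq_before):
    """Median = L + (w / f) * (0.5 * n - cfb) for grouped data."""
    return lower_boundary + (width / freq) * (0.5 * n - cum_freq_before)

# Hypothetical frequency table: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 4), (10, 20, 6), (20, 30, 10), (30, 40, 5)]
n = sum(freq for _, _, freq in classes)

cum_freq = 0  # cumulative frequency of the classes before the current one
for lower, upper, freq in classes:
    if cum_freq + freq >= 0.5 * n:  # first class that reaches n/2 is the median class
        median = grouped_median(lower, upper - lower, freq, n, cum_freq)
        break
    cum_freq += freq

print(median)
```

    Here the median class is 20 to 30, so L = 20, w = 10, f = 10, n = 25 and cfb = 10.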

  • Mode

    3/5/2014 103

    The measurement that occurs most often (with the highest frequency).

    The mode exhibits the following characteristics;

    There can be more than one mode for a dataset.

    Not influenced by extreme measurements.

    Modes of subsets cannot be combined to determine the mode of the complete data set.

    Can be used for qualitative and quantitative data.

    In grouped data the mode can change depending on the categories (classes) used.

  • Mode for Grouped Data

    3/5/2014 104

    We can define the MODAL INTERVAL to be the class interval with the highest frequency.

    Since we would not know the actual measurements but only how many measurements fall into each

    interval, the mode is taken as the midpoint of the

    modal interval, it is an approximation of the mode

    of the actual sample measurement.

  • Percentiles

    3/5/2014 105

    Median divides data set into two parts of equal size.

    Quartiles divide the data set into 4 equal parts. The 25th and 75th percentiles are also referred to as the lower and upper quartiles respectively.

    Percentiles divide the data set into even finer parts, e.g. the 99th percentile.

  • Calculating Percentiles

    3/5/2014 106

    Order the n observations from smallest to largest.

    Determine the product np where, p is the proportion required. For example, p=0.25 for 25th percentile, p=0.75 for 75th percentile.

    If np is not an integer, round it up to the next integer and find the corresponding ordered value.

    If np is an integer, say k, calculate the mean of kth and (k+1)th ordered observations.

    Consider the example in the next slide.

  • Worked Example on Percentiles and

    Quartiles

    3/5/2014 107

    Given the data below, obtain the quartiles; 25th, 50th and 75th and the 93rd percentile.

    Note the data is already sorted.

    221 234 245 253 265 266 271 272 274 276

    276 276 278 284 289 290 290 292 292 296

    297 298 300 303 304 305 305 308 308 309

    310 311 312 314 315 315 323 330 333 336

    337 338 343 346 355 364 366 373 390 391

  • Worked Example on Percentiles and

    Quartiles

    3/5/2014 108

    n=50, for 25th, 50th , 75th and 93rd percentiles, p=0.25, 0.50, 0.75, and 0.93 respectively.

    25th percentile (Lower quartile, Q1); np=50*0.25 = 12.5. Rounded up to 13. That is the 13th observation = 278

    The second quartile or median; np=50*0.50 = 25, an integer, so we obtain the mean of the 25th and 26th observations as follows (304+305)/2 = 304.5

  • Worked Example on Percentiles and

    Quartiles

    3/5/2014 109

    75th percentile (upper quartile, Q3), np = 50*0.75 = 37.5, rounded up to 38th observation which is 330.

    93rd percentile, np=50*0.93=46.5, rounded up to 47th observation which is 366.

    You can compute the interquartile range = Q3-Q1 = 330-278 = 52. It indicates that the middle 50% of the observations are within 52 units of each other, or simply bounded by 278 and 330.
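
    As a check on the worked example, the percentile rule (round np up when it is not an integer, average the kth and (k+1)th ordered values when it is an integer k) can be sketched in Python. The function name `percentile` and the use of whole-number percentages are choices made for this illustration only.

```python
# The 50 sorted observations from the worked example.
data = [
    221, 234, 245, 253, 265, 266, 271, 272, 274, 276,
    276, 276, 278, 284, 289, 290, 290, 292, 292, 296,
    297, 298, 300, 303, 304, 305, 305, 308, 308, 309,
    310, 311, 312, 314, 315, 315, 323, 330, 333, 336,
    337, 338, 343, 346, 355, 364, 366, 373, 390, 391,
]

def percentile(sorted_values, percent):
    """Percentile by the np rule; percent is a whole number between 1 and 99."""
    n = len(sorted_values)
    # n * percent / 100 is np; integer arithmetic avoids rounding surprises.
    k, remainder = divmod(n * percent, 100)
    if remainder == 0:
        # np is an integer k: mean of the kth and (k+1)th ordered observations.
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # np is not an integer: round up to position k + 1 (list index k).
    return sorted_values[k]

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
p93 = percentile(data, 93)
iqr = q3 - q1
print(q1, q2, q3, p93, iqr)
```

    Other software may use interpolation rules that give slightly different quartiles for the same data.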

  • Boxplots

    3/5/2014 110

    The information contained in the quartiles can be summarized in a diagram called a boxplot.

    The diagram gives a visual representation of how the data is distributed from the smallest to the largest.

    The boxplot can be used to locate extreme observations in the data that are classified as outliers.

    Could be used to detect errors in data collected but not always.

  • Constructing a Boxplot

    3/5/2014 111

    The centre half of the data extending from the lower to upper quartile is represented by a rectangle.

    The median or second quartile is represented by a line through the rectangle.

    A line is drawn extending from the upper quartile to the largest value, and another from the lower quartile to the

    smallest value. These lines are called whiskers.

    If the data set is symmetrical, the box plot will also indicate so.

  • Constructing a Boxplot

    3/5/2014 112

    To identify outliers in the data set, the whiskers should extend at most 1.5*IQR below Q1 and above Q3.

    Observations that lie beyond these whiskers are referred to as outliers.

    Therefore the boxplot can be used to show the outliers in the data. These could be real errors in data collection, or simply a

    non-homogeneous population as assumed in data collection.
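
    A minimal sketch of the 1.5*IQR rule, using the quartiles and extreme values from the worked example on percentiles. The names `lower_fence` and `upper_fence` are introduced here only for illustration.

```python
# Quartiles and extreme values from the worked example.
q1, q3 = 278, 330
smallest, largest = 221, 391

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # whisker limit below Q1
upper_fence = q3 + 1.5 * iqr    # whisker limit above Q3

# Any observation beyond the fences would be flagged as an outlier.
has_outliers = smallest < lower_fence or largest > upper_fence
print(lower_fence, upper_fence, has_outliers)
```

    Since the smallest and largest observations both lie inside the fences, this data set has no outliers under the rule.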

  • Boxplot for the Example

    3/5/2014 113

    [Boxplot: smallest value = 221, Q1 = 278, Median = 304.5, Q3 = 330, largest value = 391]

    Labwork: Construct a box plot for SEM I GPA (M+F) on Slide 41

  • Measures of Variability

    3/5/2014 114

    There are four major measures of variability;

    Range,

    Variance,

    Standard deviation, and

    Coefficient of variation.

  • Measures of Variability - Range

    3/5/2014 115

    The range is the difference between the largest and the smallest value of a data set.

    The range is influenced by the extreme values, and indicates how dispersed the data set is.

    The range divided by 4 is an approximation of the standard deviation. That is;

    Standard deviation, σ ≈ (range/4)

  • Measures of Variability - Variance

    3/5/2014 116

    The variance is the mean of the squared deviations.

    The deviation is the difference between an individual observation and the mean. The sum of the deviations is always zero.

    Variance for a sample is denoted as (s²) and for a population as (σ²). It has the squared units of the data.

    Formula for ungrouped data (sample); $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

  • Measures of Variability - Standard Deviation

    3/5/2014 117

    The standard deviation is the square root of variance.

    It has the same units as the data.

    Standard deviation for a sample is denoted as (s) and for a population as (σ). Formula for ungrouped data (sample); $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

  • Measures of Variability - Coefficient of Variation

    3/5/2014 118

    This is a measure of the extent of variability by expressing the ratio of the standard

    deviation to the mean multiplied by 100%.

    It is denoted as COV. If COV=0, it is a uniform distribution, COV=1 is a Poisson

    distribution; 0

  • Measures of Variability - Coefficient of Variation

    3/5/2014 119

  • Example Question on Measures of

    Variability

    3/5/2014 120

    The following are 14 measurements on the strength of paper to be used in cardboard;

    121, 128, 129, 132, 135, 133, 127, 115, 131,

    125, 118, 114, 120, 116. Compute;

    a) The range and an approximation of standard

    deviation.

    b) The mean

    c) The Variance and standard deviation

    d) The coefficient of variation.
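
    One way to check answers to this question is with Python's standard `statistics` module. This is a sketch, assuming the sample (n - 1) forms of the variance and standard deviation are intended.

```python
import statistics

# The 14 paper-strength measurements from the question.
strength = [121, 128, 129, 132, 135, 133, 127, 115, 131,
            125, 118, 114, 120, 116]

# a) Range and the range/4 approximation of the standard deviation.
data_range = max(strength) - min(strength)
approx_sd = data_range / 4

# b) Mean.
mean = statistics.mean(strength)

# c) Sample variance and standard deviation (divisor n - 1).
variance = statistics.variance(strength)
sd = statistics.stdev(strength)

# d) Coefficient of variation, as a percentage of the mean.
cov_percent = sd / mean * 100

print(data_range, approx_sd, round(mean, 2), round(variance, 2),
      round(sd, 2), round(cov_percent, 2))
```

    Note that range/4 is only a rough guide and need not agree closely with the computed standard deviation.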

  • Probability Distributions

    3/5/2014 121

    In the first part of the course, we saw that variables of interest are the ones for which

    data is collected.

    Such variables are also referred to as random variables, because they can take on any value.

    When these values are plotted on a histogram with relative frequency on the vertical axis, they form a probability distribution.

  • Probability Distributions

    3/5/2014 122

    Several distributions result depending on the nature of data; discrete or continuous.

    Examples of probability distributions that may result from the data are;

    a) Binomial (discrete)

    b) Poisson (discrete), and

    c) Normal (continuous).

  • Binomial Distribution

    3/5/2014 123

    Binomial distributions result from Bernoulli trials in which there are only two outcomes; success or

    failure.

    The experimenter fixes the number of trials (n) and counts the number of successes (s) in n trials.

    The probability of success is the same for each trial.

    The outcomes (successes) from different trials are independent.

  • Binomial Distribution

    3/5/2014 124

    Question: Can the following be treated as a Bernoulli trial? Drivers stopped at a roadblock to be checked for possession of a driver's license.

    Solution

    There are only two outcomes; you either have a driver's license or not.

    If we treat all drivers the same, they will have equal probability of not having a driver's license.

    One driver's possession of a license is independent of that of the next driver in successive arrival.

    It is therefore a Bernoulli Trial, and the distribution will be Binomial.

  • Binomial Distribution- Practical Problem

    3/5/2014 125

    As an example, suppose you wish to determine whether paving blocks manufactured in Kampala meet specifications, and you decide to sample at least 10 blocks from each manufacturer. If success is defined as a block meeting specifications, then the number of successes in n trials from m manufacturers follows a binomial distribution.

  • Binomial Distribution

    3/5/2014 126

    $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, where p is the probability of success in an experiment, n is the number of trials, and x is the value for which probability is being computed.

    The mean of a binomial distribution = np and the variance = np(1-p).

    If n is sufficiently large, the binomial distribution approximates to normal distribution.

  • Binomial Distribution Question

    3/5/2014 127

    It has been claimed that in 60% of all solar-heat installations the utility bill is reduced by at least one-

    third. Accordingly, what are the probabilities that the

    utility bill will be reduced by at least one-third in;

    a) Four of five installations,

    b) At least four of five installations?

    c) Compute the mean and variance of the

    distribution.
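
    A short Python sketch for checking this question, assuming each installation is an independent Bernoulli trial with p = 0.60 and n = 5. The helper name `binomial_pmf` is introduced here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.60

# a) Exactly four of five installations.
p_exactly_four = binomial_pmf(4, n, p)

# b) At least four of five installations: P(X = 4) + P(X = 5).
p_at_least_four = sum(binomial_pmf(x, n, p) for x in (4, 5))

# c) Mean and variance of the distribution.
mean = n * p
variance = n * p * (1 - p)

print(round(p_exactly_four, 4), round(p_at_least_four, 4), mean, round(variance, 2))
```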

  • Poisson Distribution

    3/5/2014 128

    The Poisson distribution arises from a Poisson process, in which events occur randomly over time.

    It describes the number of events occurring in a fixed time interval. Examples: the number of calls received every minute at a customer service centre, or the number of students arriving at CEDAT every five minutes between 7 and 9 AM.

    A Poisson process is always random, with the mean equal to the variance.

  • Poisson Distribution Model

    3/5/2014 129

    $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, where lambda (λ) is the mean number of arrivals in the time interval under consideration.
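
    A minimal sketch of the standard Poisson probability, P(X = x) = λ^x e^(-λ) / x!. The value λ = 3 arrivals per five-minute interval is a made-up figure for illustration, not data from the course.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when events arrive at a mean rate of lam per interval."""
    return lam**x * exp(-lam) / factorial(x)

lam = 3  # hypothetical mean number of arrivals per five-minute interval

# Probability that exactly two students arrive in a given interval.
p_two = poisson_pmf(2, lam)

# Probability of at most two arrivals in the interval.
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))

print(round(p_two, 4), round(p_at_most_two, 4))
```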

  • Normal Distribution

    3/5/2014 130

    The normal distribution is the most important distribution describing quantitative continuous data.

    The normal distribution describes most natural phenomena even for discrete variables of large samples.

    The distribution is symmetrical about the mean, and as discussed earlier, the mode, median and mean are located

    together.

    The symmetry allows the distribution to be standardized.

  • Normal Distribution Curve

    3/5/2014 131

  • Standardized Normal Curve

    $z = \frac{x - \mu}{\sigma}$, where z is the score on a standardized normal curve, and x is the random measurement with mean μ (mu) and standard deviation σ (sigma).

    3/5/2014 132

  • 3/5/2014 133

  • 3/5/2014 134

  • Properties of a normal

    distribution

    The standardized normal distribution has zero mean and standard deviation one.

    About 68% of the data lies within one standard deviation of the mean.

    95% of the data lies within approximately two standard deviations of the mean.

    The 95% is often taken in statistics as a confidence level to carry out further statistical tests.

    3/5/2014 135

  • Example Question

    The time for a super glue to set can be treated as a random variable having a

    normal distribution with mean 30 seconds.

    Find its standard deviation if the probability

    is 0.20 that it will take on a value greater

    than 39.20 seconds.

    3/5/2014 136

  • Solution: Presentation of Data

    3/5/2014 137

    [Normal curve: σ = ?, μ = 30, x = 39.20, shaded upper tail P(x > 39.20) = 0.20]

  • Solution: Computation

    Using the z-score equation; z1 = (x - μ)/σ, x = 39.20, μ = 30, σ (sigma) is unknown.

    However, the probability P(z > z1) = 0.20. We can read the value of z1 for this probability from the z-table: z1 = 0.84.

    Sigma = (39.20 - 30)/0.84 = 9.20/0.84 = 10.95 seconds

    3/5/2014 138
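    The same computation can be sketched in Python, assuming the standard-library `NormalDist.inv_cdf` is used in place of the z-table (variable names are illustrative).

```python
from statistics import NormalDist

mean_time = 30.0    # seconds
x = 39.20           # seconds
upper_tail = 0.20   # P(X > x)

# z1 leaves 0.20 in the upper tail, i.e. 0.80 below it.
z1 = NormalDist().inv_cdf(1.0 - upper_tail)
sigma = (x - mean_time) / z1

print(round(z1, 4), round(sigma, 2))
```

    The unrounded z-value is slightly larger than the table value of 0.84, so the computed standard deviation comes out marginally below 10.95 seconds.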

  • Exercise Question: Normal distribution

    The initial setting time of a new type of cement is a random variable having the normal distribution with mean = 4.76 seconds and standard deviation 0.04 seconds. What is the probability that this kind of cement will set in:

    a) Less than 4.66 seconds

    b) More than 4.80 seconds

    c) Anywhere from 4.70 to 4.82 seconds.

  • Exercise Question: Solution
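
    A minimal sketch of how the three probabilities can be checked numerically, assuming Python's standard-library `NormalDist` rather than the z-table (variable names are illustrative):

```python
from statistics import NormalDist

setting_time = NormalDist(mu=4.76, sigma=0.04)

# a) P(X < 4.66), corresponding to z = -2.5
p_less = setting_time.cdf(4.66)

# b) P(X > 4.80), corresponding to z = 1.0
p_more = 1.0 - setting_time.cdf(4.80)

# c) P(4.70 < X < 4.82), corresponding to z between -1.5 and 1.5
p_between = setting_time.cdf(4.82) - setting_time.cdf(4.70)

print(round(p_less, 4), round(p_more, 4), round(p_between, 4))
```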

  • INFERENTIAL STATISTICS

  • Inferences Concerning the Mean

    Inferential statistics helps the researcher to generalize from information contained in the sample about the general population.

    The main approaches are hypothesis testing and confidence intervals, most often applied to the mean.

    We use the properties of the sampling distribution of the mean.

    If several samples of size n are drawn from a normal population, the distribution of the sample means is normal, with mean $\mu$ and standard deviation equal to the standard error, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

    The difference between a sample mean ($\bar{x}$) and the true mean ($\mu$) is referred to as the error or tolerance (E).
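    The standard error property can be illustrated by simulation. This is a sketch only, with a hypothetical population (mean 50, standard deviation 10) and sample size 25 chosen for illustration; none of these values come from the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

mu, sigma, n = 50.0, 10.0, 25  # hypothetical population parameters and sample size

# Draw 5000 samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(5000)
]

standard_error = sigma / n ** 0.5

print(round(statistics.fmean(sample_means), 2))                  # close to mu
print(round(statistics.stdev(sample_means), 2), standard_error)  # close to each other
```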

  • Sampling Distribution of the Means

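    [Figure: sampling distribution of the means, with standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$; the error (E) is the distance between $\bar{x}$ and the centre of the distribution.]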

  • Estimation of the Error

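    Recall that the z-score for a random variable x with mean $\mu$ and standard deviation $\sigma$ is

    $$z = \frac{x - \mu}{\sigma}$$

    For the distribution of the means, the z-score is computed with the sample mean $\bar{x}$ and the standard error:

    $$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$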

  • Estimation of the Error (E)

    Re-arranging the formula for the distribution of means:

    $$E = \bar{x} - \mu = z \frac{\sigma}{\sqrt{n}}$$

    The difference between the sample mean and the true mean is the error (E). However, it is never known exactly, since the true mean is unknown.

    We use the symmetry of the normal distribution, together with probability, to state the accuracy of the mean estimate.

    The 95% confidence level is the one most often used. The 99% confidence level is also common where the accuracy of the estimate is paramount.

  • Confidence Level Demonstration

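    [Figure: sampling distribution of the means with standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The central area between the lower and upper estimates is the probability associated with estimation of the mean, $(1 - \alpha)$; each tail has area $\alpha/2$.]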

  • 95% Confidence Level

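    [Figure: sampling distribution of the means with standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Lower estimate at $-z_{0.025} = -1.96$ and upper estimate at $z_{0.025} = 1.96$; central area $(1 - 0.05) = 0.95$, with 0.025 in each tail.]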

  • 99% Confidence Level

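    [Figure: sampling distribution of the means with standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Lower estimate at $-z_{0.005} = -2.575$ and upper estimate at $z_{0.005} = 2.575$; central area $(1 - 0.01) = 0.99$, with 0.005 in each tail.]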
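
    The critical z-values for the two confidence levels can be recovered without a table. The sketch below assumes Python's standard-library `NormalDist.inv_cdf`; the function name `critical_z` is illustrative.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical value z_(alpha/2) for the given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.95, 0.99):
    print(level, round(critical_z(level), 3))
```

    For 99% the computed value is marginally above the 2.575 read from the table; the difference is due to table interpolation and is immaterial in practice.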

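  • Large Sample Confidence Interval for $\mu$ with known $\sigma$

    Recall that:

    $$E = \bar{x} - \mu = z \frac{\sigma}{\sqrt{n}}$$

    We can re-arrange the above formula to create the confidence bounds about the true mean as follows:

    $$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

    $$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$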

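  • Small Sample Confidence Interval for $\mu$ with unknown $\sigma$

    For small samples with unknown population standard deviation, the z-score is replaced with a t-value from a t-distribution (with n - 1 degrees of freedom), and $\sigma$ with the standard deviation of the sample, s:

    $$E = \bar{x} - \mu = t_{\alpha/2} \frac{s}{\sqrt{n}}$$

    We can re-arrange the above formula to create the confidence bounds about the true mean as follows:

    $$\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$

    $$\bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}}$$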
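
    A minimal sketch of the small-sample interval in Python. The data are hypothetical (ten made-up measurements, not from the slides), and the critical value $t_{0.025} = 2.262$ for 9 degrees of freedom is read from a t-table and passed in by hand, since the Python standard library has no t-distribution.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """Confidence interval for the mean using a t critical value read from a t-table."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of n = 10 measurements, for illustration only.
sample = [24.1, 25.3, 23.8, 26.0, 24.7, 25.1, 23.5, 24.9, 25.6, 24.4]

lower, upper = t_interval(sample, t_crit=2.262)  # t_(0.025) with 9 degrees of freedom
print(round(lower, 2), round(upper, 2))
```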

  • Example on Confidence Interval

    A random sample of size n = 100 is taken from a population with $\sigma = 5.1$. Given that the sample mean $\bar{x} = 21.6$, construct a 95% confidence interval for the population mean.

    Solution

    This is a large sample with known population standard deviation, so substitute the values in the confidence interval expression with z = 1.96.

    $$21.6 - 1.96 \times \frac{5.1}{\sqrt{100}} < \mu < 21.6 + 1.96 \times \frac{5.1}{\sqrt{100}}$$

    $$20.6 < \mu < 22.6$$

    The interpretation is that the interval 20.6 to 22.6 either contains the population mean or it does not, but we are 95% confident that it does.
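
    The example can be reproduced with a short sketch, assuming Python's standard-library `NormalDist` supplies the z-value; the function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for the mean with known sigma."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(x_bar=21.6, sigma=5.1, n=100)
print(round(lower, 1), round(upper, 1))
```

    Rounded to one decimal place, the bounds agree with the 20.6 and 22.6 obtained by hand.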

  • Determination of Sample Size

    Recall that the error (E) is given by:

    $$E = \bar{x} - \mu = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

    If we make n the subject in the above equation, we obtain:

    $$n = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^{2}$$

  • Example on Sample Size

    A research worker wants to determine the average time it takes a mechanic to rotate the tires of a car, and she wants to be able to assert with 95% confidence that the mean of her sample is off by at most 0.50 minutes. If she can presume from past experience that sigma = 1.6 minutes, how large a sample will she have to take?

    Solution

    Substitute the values E = 0.50, $z_{0.025} = 1.96$ and sigma = 1.60:

    $$n = \left( \frac{1.96 \times 1.60}{0.50} \right)^{2} = 39.3$$

    Rounding up, a sample of 40 will be required.
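
    A minimal sketch of the sample size calculation in Python (the function name `sample_size_for_mean` is illustrative). The result is rounded up because a fractional observation cannot be collected and rounding down would exceed the allowed error.

```python
import math

def sample_size_for_mean(sigma, error, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the allowed error."""
    return math.ceil((z * sigma / error) ** 2)

n = sample_size_for_mean(sigma=1.6, error=0.50)
print(n)
```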

  • Trial Question on Sample Size

    If we want to determine the average mechanical aptitude of a large group of workers, how large a random sample will we need to be able to assert with probability 0.95 that the sample mean will not differ from the true mean by more than 3.0 points? Assume that it is known from past experience that sigma = 20.0.

  • Hypothesis Testing

    There are many problems in which, rather than estimate the value of a parameter, we must decide whether a statement concerning a parameter is true or false.

    That is, we must test a hypothesis or an assertion about a parameter.

    Recall the class exercise about performance: we could test whether the GPA in SEM II was higher than the GPA in SEM I. To validate this observation, we use hypothesis testing.

    Many similar observations, beliefs and assertions are validated scientifically in this way.

  • Hypothesis Testing: Steps

    Formulate the null and alternative hypotheses.

    Specify the probability of a Type I error (rejecting a true null hypothesis), also referred to as the level of significance (alpha = 0.05 or 0.01).

    Construct a criterion for rejecting the null hypothesis in favour of the alternative (critical z- or t-score).

    Calculate from the data the value of the statistic on which the decision is based (z-score or t-score).

    Decide whether to reject the null hypothesis or fail to reject it.

  • Hypotheses Concerning One Mean: Example

    A trucking firm is suspicious of the claim that the average lifetime of certain tires is at least 28,000 km. To check the claim, the firm puts 40 of these tires on its trucks and gets a mean lifetime of 27,463 km with a standard deviation of 1,348 km. What can it conclude if the probability of a Type I error is alpha = 0.01?

    Solution

    State the hypotheses, Ho and Ha:

    Ho: Average lifetime of a certain tire is 28,000 km or greater ($\mu \geq 28{,}000$).

    Ha: Average lifetime of a certain tire is less than 28,000 km ($\mu < 28{,}000$).

    Level of significance: $\alpha = 0.01$

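  • Hypotheses Concerning One Mean: Example

    Criterion: The critical z-score corresponding to an alpha value of 0.01 for a one-tailed test is z = 2.33. Because Ha is one-sided to the left, reject Ho if the calculated z is less than -2.33, that is, if it is negative and its modulus is greater than z critical (2.33).

    Calculations (the sample standard deviation stands in for $\sigma$, since n = 40 is large):

    $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

    $$z = \frac{27{,}463 - 28{,}000}{1{,}348/\sqrt{40}} = -2.52$$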

  • Hypotheses Concerning One Mean: Example

    Decision: Since the calculated z (-2.52) is less than -2.33, i.e. its modulus (2.52) is greater than z critical (2.33), we reject the null hypothesis at alpha = 0.01.

    Conclusion: The trucking firm's suspicion that the average lifetime of a certain tire is less than 28,000 km is confirmed.

    Note that for small sample sizes and unknown sigma, you will have to use a t-test instead of a z-test.
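
    The steps of the tire example can be sketched in Python, assuming the standard-library `NormalDist` supplies the critical value and the p-value (the p-value is an addition for illustration; the slides use only the critical-value criterion). The function name `one_sample_z` is illustrative.

```python
import math
from statistics import NormalDist

def one_sample_z(x_bar, mu0, s, n):
    """z statistic for a large-sample test of a single mean."""
    return (x_bar - mu0) / (s / math.sqrt(n))

alpha = 0.01
z = one_sample_z(x_bar=27463, mu0=28000, s=1348, n=40)

# Left-tailed test: Ha is mu < 28,000, so the rejection region is z < z_critical.
z_critical = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z)
reject_h0 = z < z_critical

print(round(z, 2), round(z_critical, 2), reject_h0)
```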

  • Inferences Concerning Proportions

    Many engineering and construction management problems deal with proportions, percentages or probabilities.

    The information usually available for estimating a proportion is the number of times, x, that an appropriate event occurs in n trials, occasions or observations.

    The sample proportion is x/n; the population proportion is denoted as p.

    We shall not go through the derivation of the formula. Those interested can read Chapter 10 of Johnson, Freund & Miller (2011).

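  • Large Sample Confidence Interval for p

    $$\frac{x}{n} - z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} < p < \frac{x}{n} + z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}$$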

  • Example on Confidence Interval of Proportions

    If x = 36 of n = 100 persons interviewed are familiar with the tax incentive for installing energy saving bulbs, construct a 95% confidence interval for the true population proportion.

    Solution

    $x/n = 36/100 = 0.36$; $z_{\alpha/2} = 1.96$.

    $$0.36 - 1.96 \sqrt{\frac{0.36 \times 0.64}{100}} < p < 0.36 + 1.96 \sqrt{\frac{0.36 \times 0.64}{100}}$$

    $$0.266 < p < 0.454$$

    We are 95% confident that the population proportion of persons familiar with tax incentives is between 0.266 and 0.454.
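
    A minimal Python sketch of the same interval; the function name `proportion_interval` is illustrative, and z = 1.96 is passed as the table value used in the example.

```python
import math

def proportion_interval(x, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_interval(x=36, n=100)
print(round(lower, 3), round(upper, 3))
```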

  • Sample Size for Proportions

    If your research involves interviewing people or firms about a certain issue with two possible outcomes, one of which is counted as a success, then the sample size should be computed as follows if p is known, either from past studies or from a pilot study:

    $$n = p(1 - p) \left( \frac{z_{\alpha/2}}{E} \right)^{2}$$

    If p is unknown, then the sample size should be computed as follows, using the fact that p(1 - p) is at most 1/4 (reached at p = 0.5):

    $$n = \frac{1}{4} \left( \frac{z_{\alpha/2}}{E} \right)^{2}$$

  • Example on Sample Size for Proportions

    Suppose we want to estimate the true proportion of contractors using prefabricated formwork to cast slabs, and that we want to be at least 95% confident that the error is at most 0.2. How large a sample will we need if:

    a) We have no idea what the true proportion might be;

    b) We know the true proportion does not exceed 0.12?

    Solution

    a) $$n = \frac{1}{4} \left( \frac{1.96}{0.2} \right)^{2} = 24.01, \text{ rounded up to } 25$$

    b) $$n = 0.12 \times 0.88 \left( \frac{1.96}{0.2} \right)^{2} = 10.14, \text{ rounded up to } 11$$
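
    Both cases can be sketched with one Python function; the name `sample_size_for_proportion` and the convention of passing `p=None` when the proportion is unknown are illustrative choices, not part of the slides.

```python
import math

def sample_size_for_proportion(error, z=1.96, p=None):
    """Sample size for estimating a proportion; uses the worst case p(1 - p) = 1/4 when p is unknown."""
    pq = 0.25 if p is None else p * (1.0 - p)
    return math.ceil(pq * (z / error) ** 2)

n_unknown = sample_size_for_proportion(error=0.2)          # case a)
n_known = sample_size_for_proportion(error=0.2, p=0.12)    # case b)
print(n_unknown, n_known)
```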

  • REGRESSION ANALYSIS

    The main objective of many statistical investigations is to be able to predict outcomes on the basis of equations.

    Prediction is made based on dependent (response) and independent (explanatory) variables.

    For instance, the crushing strength of concrete depends on the water-cement ratio, cement content, proportion of aggregates and age of the concrete after casting.

    The independent variables are used to predict the dependent variable.

  • REGRESSION ANALYSIS: Model form

    The model form for the simple regression model is as follows:

    $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

    where $y_i$ is observation i of the dependent variable, $\beta_0$ is the population parameter for the intercept, $\beta_1$ is the population parameter for the slope, $x_i$ is the value of the independent variable corresponding to observation i of the dependent variable, and $\varepsilon_i$ is the error term, which is independent and normally distributed with mean zero and variance $\sigma^2$.

  • REGRESSION ANALYSIS: Sample Statistics for the Model

    The model form for a sample is as follows:

    $$\hat{y}_i = b_0 + b_1 x_i$$

    Estimation of the statistics of the model is based on the method of least squares. The derivation of the formulas is not covered in this course.

  • REGRESSION ANALYSIS: Method of Least Squares

    The method of least squares commences with computation of the sums of squares:

    $$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

    $$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

    $$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

  • REGRESSION ANALYSIS: Method of Least Squares

    The estimated statistics:

    $$b_1 = \frac{S_{xy}}{S_{xx}}$$

    $$b_0 = \bar{y} - b_1 \bar{x}$$

  • REGRESSION ANALYSIS: Coefficient of Determination

    The coefficient of determination ($R^2$) expresses how well the model predicts the data. If $R^2$ equals 1, the model explains 100% of the variation in the data. This is an ideal that is not often obtained from experimental data.

    $$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

    $\hat{y}_i$ is the predicted y-value using the least squares estimates.

    The square root of the coefficient of determination yields the magnitude of the correlation coefficient (r), which takes the sign of the slope $b_1$.

  • REGRESSION ANALYSIS: Worked Example

    Given the following data, fit a simple linear regression model by the method of least squares.

    y       x
    0.18    20
    0.37    60
    0.35    100
    0.78    140
    0.56    180
    0.75    220
    1.18    260
    1.36    300
    1.17    340
    1.65    380

  • REGRESSION ANALYSIS: Worked Example-Sum of Squares

    Sum of Squares

    $$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = 132{,}000$$

    $$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 2.13745$$

    $$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 505.40$$

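  • REGRESSION ANALYSIS: Worked Example-Statistics

    Statistics

    $$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{505.40}{132{,}000} = 0.00383$$

    $$\bar{x} = 200; \quad \bar{y} = 0.835$$

    $$b_0 = \bar{y} - b_1 \bar{x} = 0.835 - 0.00383 \times 200 = 0.069$$

    The full model:

    $$\hat{y}_i = 0.069 + 0.00383\, x_i$$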

  • REGRESSION ANALYSIS: Worked Example-R-square

    Coefficient of Determination:

    $$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{0.202}{2.13745} = 0.905$$

    Correlation coefficient (r):

    $$r = \sqrt{R^2} = \sqrt{0.905} = 0.95$$
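
    The whole worked example can be reproduced with a short Python sketch using only the formulas above and the ten data pairs from the example. Variable names are illustrative, and `math.copysign` is used so that r carries the sign of the slope.

```python
import math

x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope and intercept
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Coefficient of determination and correlation coefficient
y_hat = [b0 + b1 * xi for xi in x]
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
r_squared = 1.0 - ss_residual / s_yy
r = math.copysign(math.sqrt(r_squared), b1)

print(round(s_xx), round(s_yy, 5), round(s_xy, 2))
print(round(b0, 3), round(b1, 5), round(r_squared, 3), round(r, 2))
```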

  • REGRESSION ANALYSIS: Worked Example-MS Excel

    It is possible to carry out the above analysis using MS Excel 2007. Go to Data, Data Analysis, choose Regression and follow the instructions. The output is as follows:

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.951
    R Square             0.905
    Adjusted R Square    0.893
    Standard Error       0.159
    Observations         10

    ANOVA
                  df    SS         MS        F        Significance F
    Regression    1     1.93507    1.9351    76.49    0.00002
    Residual      8     0.20238    0.0253
    Total         9     2.13745

                      Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
    Intercept (b0)    0.0692          0.1010            0.6857    0.5123     -0.1636      0.3021
    Slope (b1)        0.0038          0.0004            8.7460    0.0000     0.0028       0.0048

  • END OF COURSE:

    WISH YOU GOOD LUCK PROFESSIONALLY!!
