Homework 1

BNFO615: Bioinformatics Centric Data Structure

Prof. Zhi Wei

Homework Assignment I

Due: Wed. Oct. 6

Notes:

1) Hand in a paper copy at the beginning of the class period on the due date.

2) The paper copy (properly stapled together) must include the following items, where applicable:

Detailed calculation for probability problems;

Source code of your program;

Output produced by the program;

A brief description if needed.

3) Late assignments will be penalized at a rate of 25 points per 24-hour period.

Problem 1 (20 pt):

A rare disease affects 1 out of 100,000 people. A test shows positive

   99% of the time when applied to an ill patient, and

   5% of the time when applied to a healthy patient.

a) What is the probability that a patient has the disease, given that his test result is positive?

b) If the patient takes another test and the result is positive again, what is the probability that the patient is ill? Assume that the two tests are independent.

c) Assume that the patient keeps taking tests. What is the minimum number of tests the patient must take to be 99% sure that he is actually ill? Assume that all tests are independent.
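
One way to organize the calculation in Problem 1 is a repeated application of Bayes' rule. The sketch below is only an illustration, not a required method: the function names are hypothetical, and it assumes that every test taken comes back positive, which is one reading of part c).

```python
def posterior_after_positives(prior, sensitivity, false_positive_rate, n):
    """Posterior probability of illness after n independent positive tests."""
    # Likelihood of n positive results under each hypothesis (independence assumed).
    like_ill = sensitivity ** n
    like_healthy = false_positive_rate ** n
    numerator = like_ill * prior
    return numerator / (numerator + like_healthy * (1.0 - prior))


def min_positives_for(target, prior, sensitivity, false_positive_rate, max_tests=1000):
    """Smallest n whose posterior reaches `target`, or None if never reached."""
    for n in range(1, max_tests + 1):
        if posterior_after_positives(prior, sensitivity, false_positive_rate, n) >= target:
            return n
    return None


prior = 1 / 100_000  # 1 out of 100,000 people is ill
for n in (1, 2):
    print(n, posterior_after_positives(prior, 0.99, 0.05, n))
print(min_positives_for(0.99, prior, 0.99, 0.05))
```

Because the prior is so small, a single positive result still leaves the posterior far below 1%, which is why several consecutive positives are needed.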

Problem 2 (20 pt):

Given three events: 1) R: It rains, 2) W: Glass is wet, and 3) U: People bring umbrellas. We assume that events U and W are conditionally independent given event R. More specifically, we have Pr(U, W|R) = Pr(U|R)Pr(W|R) and Pr(U, W|¬R) = Pr(U|¬R)Pr(W|¬R). The graphical representation of the relationship among the three events is illustrated in Figure 1. The prior probability for event R is Pr(R) = 0.8. The conditional probabilities Pr(U|R) and Pr(W|R) are given in the two tables below. Based on this information, compute the conditional probability Pr(W|U).

Pr(W|R)     R      ¬R

W           0.7    0.4

¬W          0.3    0.6
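
The computation asked for in Problem 2 can be sketched as a marginalization over R followed by a division. This is a hedged illustration: the function name is hypothetical, and the numbers are read from the two conditional probability tables of Problem 2 (Pr(W|R) and Pr(U|R)) under the assumption that the first column of each table is R and the second is ¬R.

```python
def prob_w_given_u(p_r, p_u_given_r, p_u_given_not_r, p_w_given_r, p_w_given_not_r):
    """Pr(W|U) when U and W are conditionally independent given R and given not-R."""
    p_not_r = 1.0 - p_r
    # Joint Pr(U, W): sum over both states of R of Pr(state) * Pr(U|state) * Pr(W|state).
    p_uw = (p_r * p_u_given_r * p_w_given_r
            + p_not_r * p_u_given_not_r * p_w_given_not_r)
    # Marginal Pr(U) by the law of total probability.
    p_u = p_r * p_u_given_r + p_not_r * p_u_given_not_r
    return p_uw / p_u


# Assumed table reading: Pr(U|R)=0.9, Pr(U|not R)=0.2, Pr(W|R)=0.7, Pr(W|not R)=0.4.
result = prob_w_given_u(0.8, 0.9, 0.2, 0.7, 0.4)
print(round(result, 4))
```

Note that U and W are independent only conditionally on R, so Pr(W|U) generally differs from Pr(W).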

Problem 3 (30 pt):

Install the package UsingR and then consider the following problems:

1) The data set pi2000 (UsingR) contains the first 2,000 digits of π. What is the percentage of digits that are 3 or less? What percentage of the digits are 5 or more?

2) The time variable in the nym.2002 (UsingR) data set contains the time to finish the 2002 New York City marathon for a random sample of the finishers.

   1. What percent ran the race in under 3 hours?

   2. What is the time cutoff for the top 10%? The top 25%?

   3. What time cuts off the bottom 10%?

Do you expect this data set to be symmetrically distributed?
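
The assignment expects these questions to be answered in R with the UsingR data sets. Purely to illustrate the logic, here is a minimal Python sketch of the two operations involved, a percentage of values meeting a condition and a quantile cutoff. The helper names are hypothetical, and the sample is only the first 20 digits of π, not the pi2000 data set, so its percentages are not the answers to part 1).

```python
from statistics import quantiles


def percent_where(values, predicate):
    """Percentage of values for which predicate(value) is true."""
    values = list(values)
    return 100.0 * sum(1 for v in values if predicate(v)) / len(values)


def quantile_cutoff(values, percent):
    """Value below which roughly `percent` percent of the data fall (percent in 1..99)."""
    # n=100 yields the 99 percentile cut points; index percent-1 selects one of them.
    return quantiles(values, n=100, method="inclusive")[percent - 1]


# First 20 digits of pi (3.1415926535897932384), used only as a small sample.
first_digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
print(percent_where(first_digits, lambda d: d <= 3))  # 40.0
print(percent_where(first_digits, lambda d: d >= 5))  # 50.0
```

For the marathon questions, the same helpers would apply to the finishing times, with the 3-hour threshold expressed in the unit of the time variable. If "top 10%" is read as the fastest runners, the cutoff would be the 10th percentile of the times.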

Problem 4 (30 pt):

1) An elevator can safely hold 3,500 pounds. A sign in the elevator limits the passenger count to 15. If the adult population has a mean weight of 180 pounds with a 25-pound standard deviation, how unusual would it be, if the central limit theorem applied, that an elevator holding 15 people would be carrying more than 3,500 pounds?

2) A traffic officer writes an average of four tickets per day, with a variance of one ticket. Assume the central limit theorem applies. What is the probability that she will write fewer than 75 tickets in a 21-day cycle?
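
Both parts of Problem 4 reduce to a normal approximation for a sum of n independent quantities, with mean n·μ and standard deviation σ·√n. The following is a minimal sketch under that assumption; the function names are hypothetical, and no continuity correction is applied to the ticket count.

```python
from math import erfc, sqrt


def z_for_sum(threshold, mean, sd, n):
    """Standardized distance of `threshold` from the mean of a sum of n draws."""
    return (threshold - n * mean) / (sd * sqrt(n))


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z; erfc keeps precision in the far tail."""
    return 0.5 * erfc(z / sqrt(2))


# Elevator: total weight of 15 adults, each with mean 180 lb and sd 25 lb.
z_load = z_for_sum(3500, 180, 25, 15)
p_overload = normal_upper_tail(z_load)

# Tickets: total over 21 days, daily mean 4 and variance 1 (so sd = 1).
z_tickets = z_for_sum(75, 4, 1, 21)
p_fewer_than_75 = normal_upper_tail(-z_tickets)  # P(Z < z) = P(Z > -z) by symmetry

print(z_load, p_overload)
print(z_tickets, p_fewer_than_75)
```

The elevator threshold lies more than eight standard deviations above the expected load, so the tail probability is effectively zero, while the ticket threshold lies about two standard deviations below the expected total.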

Pr(U|R)     R      ¬R

U           0.9    0.2

¬U          0.1    0.8

      R
     / \
    W   U

Figure 1: Graphical representation of events R, W, and U