
University of Hawaii Central Tende… · Web viewNow find out what is the number that is repeated 5 times. For this use the function match. Go to cell F7 and type “=MATCH(F8,F10:F193,0)”



Tutorial 2: Visual basic functions

In the last tutorial you generated a list of countries with their GDP. In this tutorial, you will work with that data to generate a figure (learning some general guidelines for figure preparation along the way) and to calculate a few metrics of central tendency (mean, median, and mode).

1. From your last tutorial you ended up with a list of countries that looks like Fig. 1.

Figure 1.

2. Let’s start by cleaning up the data. You will notice that several countries did not have a GDP value and show “#N/A” in column D. To remove those rows, select columns D to A (Fig. 2), that is, start the selection in column D and drag to column A, and then click Sort (red box in Fig. 2). Because the columns were selected from D to A, Excel sorts all rows in columns A to D by the values in column D. If, on the contrary, you had selected the columns from A to D, Excel would sort by column A.

Figure 2.

3. Select only the rows that contain “#N/A” (together with the country names next to them) and delete them (Fig. 3).
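If you want to check the logic outside Excel, the net effect of steps 2 and 3 is to keep only the rows whose GDP cell is not “#N/A”. The Python sketch below assumes each row is a (country, GDP text) pair; the country names and values are hypothetical placeholders, not data from your list.

```python
# Hypothetical rows standing in for columns A to D: (country, GDP cell text).
rows = [
    ("Country A", "$400 (2017 est.)"),
    ("Country B", "#N/A"),
    ("Country C", "$1,800 (2016 est.)"),
    ("Country D", "#N/A"),
]

# Keep only rows with an actual GDP entry; this is what deleting the
# sorted block of "#N/A" rows achieves in Excel.
clean_rows = [row for row in rows if row[1] != "#N/A"]
print(clean_rows)
```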


Figure 3.

4. The next thing to note is that the GDP numbers you want have some text attached to them (e.g., the year in which the data were available). Because we want the GDP as numbers in a single column, use the Text to Columns function: select column D, then click Text to Columns (highlighted in Fig. 4). You used this function before, so try to figure out the remaining settings on your own.
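Text to Columns splits each cell at a delimiter and puts the pieces in separate columns. The sketch below imitates that in Python under an assumed cell format such as “$1,800 (2016 est.)”; your cells may look different, so treat the format and the helper name `gdp_from_text` as hypothetical.

```python
def gdp_from_text(cell: str) -> float:
    """Hypothetical helper: split the cell at spaces (like Text to Columns
    with a space delimiter), keep the first piece and convert it to a number."""
    first_piece = cell.split(" ")[0]                      # e.g. "$1,800"
    digits = first_piece.replace("$", "").replace(",", "")
    return float(digits)

cells = ["$400 (2017 est.)", "$1,800 (2016 est.)", "$132,100 (2017 est.)"]
print([gdp_from_text(c) for c in cells])  # [400.0, 1800.0, 132100.0]
```

Excel usually recognises text such as “$1,800” as a number by itself; in Python the currency symbol and the thousands separator have to be removed explicitly.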

Figure 4.

5. You should end up with GDP data like that in Fig. 5: each country has its per capita GDP in column D.

Figure 5.

6. Now your data is clean and ready for analysis. The first thing to do is a frequency distribution, which shows how many countries fall within given ranges of GDP. Let’s make the figure first; you will then better understand the purpose of frequency distributions. First, find the smallest of all GDPs: go to cell F1, type “=MIN(D:D)”, and press Enter. The smallest GDP is $400 (Fig. 6).

Figure 6.

7. Next, find the largest of all GDPs: go to cell F2, type “=MAX(D:D)”, and press Enter. The largest GDP is $132,100 (Fig. 7). Now imagine living in a country where the average person makes $132,100 a year… keep thinking about it… what would you do with that kind of money… keep thinking… now, stop dreaming and let’s continue with our tutorial.
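For comparison, here is the same MIN/MAX step as a Python sketch. The list `gdp` is a short hypothetical stand-in for column D, chosen so that its extremes match the $400 and $132,100 found above.

```python
# Hypothetical stand-in for column D (only the extremes match the tutorial).
gdp = [400, 1800, 13700, 1800, 132100, 21000]

smallest = min(gdp)   # like =MIN(D:D) in cell F1
largest = max(gdp)    # like =MAX(D:D) in cell F2
print(smallest, largest)  # 400 132100
```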

Figure 7.

8. Frequency distributions divide the data into categories and then count how many data entries fall in each category. These figures require a bit of trial and error to see which version looks best, and one important attribute is how many categories they use. To make that number easy to change, let’s store it in a cell and start with 10 categories: go to cell F3 and type 10.

Figure 8.

9. Now let’s find the size of each interval. For this, subtract the smallest GDP (value in cell F1) from the largest GDP (value in cell F2) and divide the result by the number of intervals (value in cell F3). Go to cell F4, type “=(F2-F1)/F3”, and press Enter (Fig. 9). With 10 categories, the size of each interval between the smallest and the largest GDP is 13,170.
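The interval formula is plain arithmetic. A worked Python version, using the smallest and largest GDPs reported above and 10 categories, reproduces the 13,170 in cell F4:

```python
smallest = 400        # value in F1
largest = 132_100     # value in F2
n_categories = 10     # value in F3

# =(F2-F1)/F3 in cell F4
interval = (largest - smallest) / n_categories
print(interval)  # 13170.0
```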

Figure 9.

10. Now let’s generate a column with the values of those categories. First, create some titles for these columns, just to keep things in order. Go to cell H2 and type “Category #”. Go to cell H3 and type “1”, and in cell H4 type “2”. Then select cells H3 to H4 (Fig. 10), left-click on the small black square in the corner of the selection (red box in Fig. 10), and, while holding the mouse button, drag down to about 40 numbers.

Figure 10.

11. Go to cell I2 and type “Value” (Fig. 11). Now let’s find the values for the different categories. The first category starts with the smallest GDP (value in cell F1), so go to cell I3, type “=F1”, and press Enter (Fig. 11).

Figure 11


12. The value for the second category is the value of the first category (cell I3) plus the size of the interval (cell F4). Go to cell I4 and type “=I3+F$4” (Fig. 12). Note the “$” sign between the F and the 4 of F4. That sign is called an anchor; you will see what it does in a moment. Now fill in the values for the remaining categories: select cell I4, left-click on the black square at its bottom-right corner (red box in Fig. 12), and drag the selection down until you reach the 40th row you entered earlier. Alternatively, double-click on that black square. You can now see the values for all categories. Select any of the intermediate cells in column I (example in Fig. 13). You will notice that, although the first reference (I3) increased by one row in each cell as you dragged the formula down, the reference F$4 stayed the same (red box in Fig. 13). That is the function of the “$” sign: it anchors the row of a cell reference in a formula, so that the reference does not change when the formula is copied down.
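To see why the “$” matters, it helps to model what Excel does when you drag a formula down: every row number without a “$” in front of it is shifted, and anchored row numbers are left alone. The Python helper `fill_down` below is a hypothetical, simplified model of that rule (it handles only downward copying of plain references such as I3 or F$4), not an Excel feature.

```python
import re

# A cell reference: optional $, column letters, optional $, row number.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def fill_down(formula: str, rows: int) -> str:
    """Return `formula` as it would read `rows` rows further down."""
    def shift(match: re.Match) -> str:
        col_anchor, col, row_anchor, row = match.groups()
        # Anchored rows ("$4") stay put; relative rows move with the copy.
        new_row = int(row) if row_anchor else int(row) + rows
        return f"{col_anchor}{col}{row_anchor}{new_row}"
    return REF.sub(shift, formula)

print(fill_down("=I3+F$4", 1))  # =I4+F$4  (the formula in cell I5)
print(fill_down("=I3+F$4", 5))  # =I8+F$4  (the formula in cell I9)
```

Without the anchor, `fill_down("=I3+F4", 1)` gives “=I4+F5”: the copy in I5 would point at F5 instead of the interval size in F4.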

Figure 12.

13. To verify that we are doing the right thing, note that the breaking point of the 11th category is the same as the maximum GDP (value in cell F2). In other words, we have created the breaking points for 10 categories of equal width. If you now change the number of intervals in cell F3 to, say, 15, Excel automatically adjusts the breaking points, and the upper breaking point of the 15th category, which equals the maximum GDP, now sits in the 16th position. Change the 15 back to 10.
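The whole column of breaking points can be sketched in a few lines of Python. The helper `breaking_points` is hypothetical; it repeats the I3 = F1 and I4 = I3 + F$4 pattern for 40 rows and lets you check both claims above: with 10 categories the 11th point equals the maximum GDP, and with 15 categories it is the 16th point.

```python
def breaking_points(smallest, largest, n_categories, n_rows=40):
    """Hypothetical helper mirroring column I of the worksheet."""
    interval = (largest - smallest) / n_categories   # cell F4
    points = [smallest]                               # cell I3: =F1
    for _ in range(n_rows - 1):
        points.append(points[-1] + interval)          # I4: =I3+F$4, filled down
    return points

ten = breaking_points(400, 132_100, 10)
fifteen = breaking_points(400, 132_100, 15)
print(ten[10], fifteen[15])  # 132100.0 132100.0  (11th and 16th positions)
```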

Figure 13.

14. Now let’s count how many countries fall in each GDP category. For this, we will use the function FREQUENCY. FREQUENCY requires the data you want to count and the breaking points of the categories (bins) into which the data will be counted. Select all the cells from J3 to J42, type “=FREQUENCY(D:D, I3:I42)” (Fig. 14), and press Ctrl + Shift + Enter. You will now see how many countries fall in each interval. In cell J2, type “Number of countries”. If you change the number of categories in cell F3, Excel automatically recalculates the number of countries in each GDP category.
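FREQUENCY’s counting rule is easy to misread, so here is a Python sketch of it: the first count covers values less than or equal to the first breaking point, each later count covers values above the previous point and up to the current one, and one extra count collects anything above the last point. The data and breaking points are hypothetical, and the helper `frequency` is ours, not an Excel API.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Hypothetical sketch of FREQUENCY; `bins` must be sorted ascending."""
    counts = [0] * (len(bins) + 1)      # one extra slot for values > last bin
    for x in data:
        # bisect_left finds the first bin that is >= x, i.e. x <= bins[i]
        counts[bisect_left(bins, x)] += 1
    return counts

data = [400, 900, 1800, 1800, 13700, 50000]
bins = [400, 13570, 26740]
print(frequency(data, bins))  # [1, 3, 1, 1]
```

The extra slot is why FREQUENCY returns one more value than there are breaking points.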


Figure 14

15. Now let’s make the frequency plot. Frequency distributions are normally displayed as bar charts, so select the number of countries in each category (cells J3 to J13, Fig. 15). Then click Insert > Column (highlighted in Fig. 15). That will generate the chart in Fig. 16.

Figure 15.

16. Now let’s do some cosmetic work on the figure. Many journals are very picky about how their figures look. Here are some general guidelines. Delete the horizontal gridlines and the legend: simply select them and press Delete.

Figure 16

17. Your figure should look like Fig. 17.


Figure 17.

18. Note that the x-axis labels in the chart are not related to the actual values of GDP. So right-click on the chart and select “Select Data” (see Fig. 18).

Figure 18.

19. Then select Series1, click “Edit” (red square in Fig. 19), and select the cells with the values of the GDP categories (cells I3 to I13).


Figure 19.

20. You will now have a chart like Fig. 20, where the x-axis displays the GDP values.

Figure 20

21. Here are two important recommendations about figures for papers. First, Excel draws the axes in grey by default. You may not notice this on screen, but it becomes evident when the image is photocopied, which is why most journals ask you to check for it in advance. So double-click the y-axis, click “Line Color”, select “Solid line”, and choose black (Fig. 21). Second, Excel uses 0.75 pt as the default thickness of the axis lines, and they need to be 1 pt or thicker. In the same dialog (as in Fig. 21), select “Line Style” and increase the “Width” to 1 pt. If you are making posters, you will need to use thicker lines. Repeat the same corrections for the x-axis.


Figure 21

22. You can also increase or reduce the size of the letters in the figure by selecting the axis, going to “Home”, and selecting the font and font size you want (Fig. 22).

Figure 22.

23. Another attribute of figures for reports or scientific journals is that you should not use color. Keep them black and white as much as possible. Some journals may allow color, but you will have to pay for it. Also, color figures come out in black and white when photocopied, which defeats the purpose of using color. To change the colors in our figure, double-click one of the bars, select “Fill”, then “Solid fill”, and then select black (Fig. 23).


Figure 23

24. Now let’s add axis titles. Click on the chart, then on “Layout”, then on “Axis Titles”, then on “Primary Horizontal Axis Title”, and on “Title Below Axis”. This creates a text box below the x-axis; type “GDP” in it. Do the same for the y-axis and name it “Number of countries”.

Figure 24

25. You should now have a chart like fig. 25, which is very close to the quality required for most journals.

Figure 25.

26. If you look at Fig. 25, you will notice that the distribution is skewed: most of the bars are piled up on the left, with a long tail to the right (statisticians call this a right-skewed, or positively skewed, distribution). That is, most countries have a small per capita GDP (i.e., most countries have a per capita GDP of $13,570 or less), while only a few countries have very small GDPs (e.g., only one country has a GDP of $400 or less) and even fewer have very large GDPs (i.e., only one country has a GDP larger than $123,100). The pressing question is how to describe that distribution with numbers, using just text. One way is to use metrics of central tendency: the mean, the median, and the mode.

27. The mean. The mean is the sum of the values divided by the number of values: add up all the numbers, then divide by how many numbers there are. First, let’s label the results. Go to cell L3 and type “Sum of values”; in L4 type “Number of values”; and in L5 type “Mean (also average)”. Now let’s calculate those values. To sum the values, go to cell M3 and type “=SUM(D:D)” (recall that all the GDP values are in column D). To count the GDP values, go to cell M4 and type “=COUNTA(D:D)”. The function COUNTA is a variant of COUNT: COUNT counts only the cells that contain numbers, whereas COUNTA counts all non-empty cells, including cells with text. As long as column D holds only GDP values, both give the same result. To calculate the mean (average), go to cell M5 and type “=M3/M4”; M3 holds the sum of the GDPs and M4 holds the number of values. Based on the result in cell M5, the average is $21,470 (Fig. 26).
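The mean calculation, and the reason COUNT and COUNTA can disagree, can be sketched in Python. The list `column_d` is hypothetical and deliberately includes a text cell (a header) to show what happens when a column is not purely numeric.

```python
# Hypothetical column D with a text header above four numbers.
column_d = ["GDP", 400, 1800, 13700, 21000]

numbers = [v for v in column_d if isinstance(v, (int, float))]
count = len(numbers)                                        # like COUNT: numbers only
counta = len([v for v in column_d if v not in (None, "")])  # like COUNTA: any non-empty cell

total = sum(numbers)          # like =SUM(D:D)
mean = total / count          # like =M3/M4 with COUNT, or =AVERAGE(D:D)
print(count, counta, mean)    # 4 5 9225.0
```

Dividing by `counta` instead would give 7380.0, a mean that is too low because the header was counted as a value.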

Figure 26.

28. Of course, Excel has its own function to calculate the mean. Go to cell N5 and type “=AVERAGE(D:D)” (Fig. 27). Also, go to cell M2 and type “Your calculation”, and in N2 type “Excel function”, so we can tell the two types of calculation apart.

Figure 27.

29. The median. The median is the number in the “middle” of a sorted list of numbers. To make this calculation, select columns D to A and then click Sort (see red box in Fig. 28). That sorts the GDPs from smallest to largest. Go to cell L6 and type “Median”.

Fig. 28.

30. There are several ways to find the number in the middle of the sorted GDPs; here we will use the function INDEX (Fig. 29). Go to cell M6 and type “=INDEX(D:D,(M4/2))”. Here “D:D” is the column with the sorted GDPs, “M4” is the number of GDPs, and dividing it by 2 gives half the number of GDPs. INDEX simply returns the value in column D at that row (here, row 229/2). The median is $13,700. Of course, Excel has its own function, called MEDIAN. Go to cell N6, type “=MEDIAN(D:D)”, and press Enter. Excel also gives a median of $13,700, the same as our calculation in M6. The two approaches do not always agree, though. What do you think could make them differ?
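A small Python sketch shows one way the two medians can differ. It assumes that INDEX drops the fractional part of a non-integer row number (Excel truncates it), and it uses a hypothetical sorted list with an odd number of values; with 229 GDPs, M4/2 is 114.5, so the same truncation applies.

```python
import math
import statistics

# Hypothetical sorted GDPs; five values, so the true middle is the 3rd.
sorted_gdp = [400, 900, 1800, 1800, 5000]
n = len(sorted_gdp)                            # like M4

row = math.trunc(n / 2)                        # INDEX(D:D, 2.5) uses row 2
index_median = sorted_gdp[row - 1]             # Excel rows start at 1, Python at 0
true_median = statistics.median(sorted_gdp)    # like =MEDIAN(D:D)
print(index_median, true_median)               # 900 1800
```

For an odd count the middle value sits at row (n + 1)/2, and for an even count MEDIAN averages the two middle values, so n/2 alone can land one row too early.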

Fig. 29.

31. The mode. The mode is the value that is repeated the most. To calculate the mode, you first need a list of the unique GDPs. There are many ways to do this, so let’s ask our friend Google for some help. Go to Google and type, for instance, “list of unique numbers in excel”. The first link I found is http://www.excelhowto.com/5-ways-to-get-unique-values-in-excel/ , and there I found this formula: “=IFERROR(INDEX($A$2:$A$16, MATCH(0, COUNTIF($B$1:B1, $A$2:$A$16), 0)),"")”. Looking over this formula, you will notice that the data to search is in column A, and the unique values are returned in column B. In our case, the data we want to search is in column D, and we will write the unique values in column E, starting in row 10. So go to cell E10 and enter the formula above (Fig. 30). Next, replace “$A$2:$A$16” with D:D, where our data is, and “$B$1:B1” with “$E$9:E9”, where our unique values will be displayed, so the formula reads “=IFERROR(INDEX(D:D, MATCH(0, COUNTIF($E$9:E9, D:D), 0)),"")”. Then press Ctrl + Shift + Enter. As you may remember from the earlier tutorial, Ctrl + Shift + Enter is used when a formula must work over an array of values rather than a single value. In this case, we want to look over all GDPs and get only the values that are unique. You will notice that the first unique value is 400.
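The unique-values formula is dense, so here is a hedged Python reading of it. Each cell filled down column E takes the first value in the data whose COUNTIF over the cells above it is 0 (that is what MATCH(0, COUNTIF(...), 0) finds), and IFERROR turns the “nothing left” error into a blank. The helper `unique_in_order` is hypothetical and the data are made up.

```python
def unique_in_order(values):
    """Hypothetical model of the IFERROR/INDEX/MATCH/COUNTIF formula filled down column E."""
    found = []                       # cells E10, E11, ... filled so far
    while True:
        # COUNTIF($E$9:E9, D:D) = 0 marks values not listed yet;
        # MATCH(0, ..., 0) then picks the first such value.
        not_yet_listed = [v for v in values if found.count(v) == 0]
        if not not_yet_listed:
            break                    # IFERROR(..., "") leaves the cell blank
        found.append(not_yet_listed[0])
    return found

gdp = [400, 1800, 1800, 900, 400, 13700]
print(unique_in_order(gdp))  # [400, 1800, 900, 13700]
```

Each new cell rescans the whole column, which is part of why the worksheet version can be slow on a full column such as D:D. In everyday Python, `list(dict.fromkeys(gdp))` gives the same ordered list.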

Figure 30.

32. Now extend the formula to find the other unique values. Select the black square at the bottom-right corner of cell E10 (Fig. 31), click, and drag down a few cells. Keep dragging the formula down until you get no more numbers. If all the values are the same, you will need to recalculate the whole sheet; to do so, press F9. Also, be patient: the calculations may take a while.


Figure 31

33. In my case, the last unique GDP is in cell E193. Now we need to count how many times each GDP occurs in our database; the GDP that is repeated the most is the mode. So go to cell F10, type “=COUNTIF(D:D, E10)”, and press Enter (Fig. 32). This function counts the number of times the value in E10 appears in column D (where our GDPs are). In this case, that number is 1: there is only one GDP of $400 in our database.

Figure 32.

34. Now let’s count the occurrences of the other GDPs. For this, double-click on the black square in the corner of cell F10 (see red box in Fig. 33).
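Filling “=COUNTIF(D:D, E10)” down column F amounts to counting each unique value in the data. A Python sketch with hypothetical values:

```python
gdp = [400, 1800, 900, 1800, 1800, 13700]   # hypothetical column D
unique = [400, 1800, 900, 13700]            # column E from the previous step

# =COUNTIF(D:D, E10), filled down next to each unique value
counts = [gdp.count(value) for value in unique]
print(counts)  # [1, 3, 1, 1]
```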

Figure 33.

35. Now find how many times the most repeated GDP occurs. For this, use the function MAX, which returns the largest number in an array. Go to cell F8 and type “=MAX(F10:F193)”. In our case, the largest count is 5.


Figure 34.

36. Now find out where in the list the GDP that is repeated 5 times is. For this, use the function MATCH. Go to cell F7 and type “=MATCH(F8,F10:F193,0)” (Fig. 35). MATCH looks for the value in cell F8 within the range F10:F193, using an exact match (the 0), and returns the position in the range at which that value is first found. In our case, the value is in position 12 of the range.

Figure 35

37. Now find the GDP that is repeated 5 times. Go to cell L7 and type “Mode”, and in cell M7 type “=INDEX(E10:E193,F7)”, then press Enter. INDEX returns the value in the range E10:E193 at the position given in cell F7. In our case, the GDP that is repeated the most is $1,800.
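Steps 35 to 37 chain three lookups: MAX finds the largest count, MATCH finds its 1-based position, and INDEX reads the GDP at that position. A Python sketch with hypothetical columns E and F:

```python
unique = [400, 900, 1800, 13700]   # hypothetical column E (from E10 down)
counts = [1, 1, 3, 1]              # hypothetical column F (from F10 down)

largest_count = max(counts)                  # =MAX(F10:F193) in F8
position = counts.index(largest_count) + 1   # =MATCH(F8,F10:F193,0) in F7, 1-based
mode = unique[position - 1]                  # =INDEX(E10:E193,F7) in M7
print(largest_count, position, mode)         # 3 3 1800
```

Like `list.index`, MATCH with 0 returns the first match, so if two GDPs tied for the largest count, this approach would report the one listed first.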

Figure 36.

38. Of course, Excel also provides a function that calculates the mode. Go to cell N7 and type “=MODE(D:D)”. That returns $1,800, the GDP that is repeated the most. Now compare the results you obtained for the mean, the median, and the mode, and let’s discuss in class which one you would choose to describe the distribution.
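To prepare for the class discussion, you can reproduce the three Excel functions with Python’s standard `statistics` module. The values are hypothetical, chosen to be skewed like the GDP data (many small values and one very large one).

```python
import statistics

# Hypothetical, right-skewed GDP values (not the tutorial's data).
gdp = [400, 900, 1800, 1800, 2500, 5000, 13700, 132100]

mean = statistics.mean(gdp)      # like =AVERAGE(D:D)
median = statistics.median(gdp)  # like =MEDIAN(D:D); averages the two middle values
mode = statistics.mode(gdp)      # like =MODE(D:D)
print(mean, median, mode)
```

With these made-up values the mean is 19,775, the median 2,150 and the mode 1,800: the single very large value pulls the mean far above the other two.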