10/28/2015

OPTIMIZATION CSC342–FALL 2015–Prof. IZIDOR GERTNER

VITO KLAUDIO

Table of contents

1. Objective
2. Overview
3. Compiler Generated Index Performance
4. Compiler Generated Pointer Performance
5. Optimized Index Performance
6. Optimized Pointer Performance
7. Analysis
8. Conclusion
9. Appendix

1. Objective

The objective of this project is to show that the assembly code the compiler generates for a function (here, in an unoptimized Debug build) is far from optimal in terms of running time. I will use a simple function that clears an array, i.e. sets all of its elements to zero, to measure the running time. I will use two versions of the same function, one that clears the array using indices and one that clears it using pointers, on arrays whose sizes range from 10 to 1,000,000 elements. The compiler-generated assembly code for these functions will then be optimized manually. I will measure the running time of the optimized versions and plot the measurements in a graph to show that the optimized code for clearing an array using pointers is the fastest.

Keep in mind that I will be using a computer with a multi-core processor, so it is hard to get accurate time measurements. I will therefore run each test five times and take the average to approximate the true running time of the algorithm.

2. Overview

In order to optimize the machine instructions for my functions, I will generate the assembly code and make some modifications to it. All my work will be done in Microsoft Visual Studio 2012. There is one problem in this case: I need to assemble assembly language code and then link it with a C++ main file. Visual Studio does not know how to deal with assembly language files by default, so we will need to set up a “Custom Build” step to achieve our goal. Once the compilation and linking, called “Build Solution” in Visual Studio, are successful, the path is clear to start timing my functions.

To approximate the running time of my function I will use the “QueryPerformanceCounter” function, which is described on the official Microsoft web page at the following link: https://support.microsoft.com/en-us/kb/815668. We use this function because it gives access to a high-resolution timer. It gives a good approximation of the real running time of the function; an exact measurement is not possible on a multi-core processor, where other running processes affect every run.

3. Compiler Generated Index Performance

This part of the project deals with the performance of the compiler-generated assembly code of a function that clears an array using indices. I start a new project in Visual Studio as a “Console Application” and make it an “Empty Project”. There are two files that I am going to use. The first is the “main” file, the main program that calls the array-clearing function. In a separate file, I code the “ClearArrayUsingIndex” function. I start as simply as possible, so we initially consider an array of size 10. The following pictures show how these functions are set up in my Visual Studio project. The code for all the functions used in this project is available in the appendix.

Figure 1 – Main Function Compilation

Figure 2 – ClearArrayUsingIndex Compilation

As we can see from the above screenshots, the two source files compile successfully on their own. We can now build the solution, which links the two object files produced by the compilation. After the linking is completed we can generate the assembly code for the ClearArrayUsingIndex function, but we need to tell the compiler that we want it. I do this by right-clicking the “ClearArrayUsingIndex” file and selecting “Properties”; in the window that appears, I expand the “C/C++” menu and go to the “Output Files” section. The second entry from the top, “Assembler Output”, is what we are looking for. I change it to “Assembly-Only Listing”. The following screenshot depicts what the window should look like before clicking “Apply” and then “OK”:

Figure 3 – Generating Assembly Code from C/C++ Files

After I complete this task, I have to recompile the function. Once the compilation succeeds, the compiler generates the “.asm” file, which can be found in the “debug” folder of my working directory. I can add this file to my project by right-clicking the “Source Files” folder, choosing “Add -> Add Existing File” and selecting the file from that directory. At this point we no longer need the C/C++ file, so we right-click it and remove it by clicking “Exclude From Project”. We open the “ClearArrayUsingIndex.asm” file to check its contents, and it is what we expected: assembly code that clears an array using indices. Now I need to assemble this file in order to create the object file that will be linked with the main file. Notice that when we right-click the .asm file, the compile option is disabled. This happens because Visual Studio does not know by default how to build assembly code, hence we need to “Custom Build” our function. To do this we right-click the “ClearArrayUsingIndex.asm” file and select “Properties”. The following window will appear:

Figure 4 – Custom Building Assembly Files (Part 1)

We can see that there are no custom build options here yet. To tell Visual Studio that we want to custom build our file, we open the “Item Type” list, select “Custom Build Tool” and click “Apply”. The following screenshot shows the effect of this:

Figure 5 - Custom Building Assembly Files (Part 2)

As we can see, “Custom Build Tool” now appears in the left-hand panel, marked with a red circle in the screenshot. We select it and change the “Command Line” and “Outputs” fields. We do not need to know in detail how the command works; copying and pasting it is enough for the moment. In short, ml is the Microsoft Macro Assembler, -c assembles without linking, -Fl names the listing file and -Fo names the object file.

Command Line:

ml -c "-Fl$(IntDir)%(FileName).lst" "-Fo$(IntDir)%(FileName).obj" "%(FullPath)"

Outputs:

$(IntDir)%(FileName).obj;

After these settings are entered, we apply the modification and then compile the assembly file. I notice that the assembler reports two errors. The errors point to the lines marked in red in the following screenshot:

Figure 6 - Custom Building Assembly Files (Part 3)

We can easily solve this problem by deleting these lines, or just commenting them out, and compiling again. This time the compilation is successful and we can proceed with linking the two files by clicking “Build -> Rebuild Solution”. After the linking is successful we need to time our program. We do this with the “QueryPerformanceCounter” function, which requires some changes to the main function. We include two header files, “tchar.h” and “windows.h”. Furthermore, I use the “System” namespace, which provides the Console class used to print the counter values. Another modification lets the main program work with an array of arbitrary size: we move the “size” variable out of main, declare it as “const”, use a “for” loop to fill the array, and then read the performance counter before and after the call to time the function. The following screenshot shows the new main function:

Figure 7 – Query Performance Counter Function

We can now run this program to check its timing by clicking on “Debug” and then “Start

Without Debugging”. The following screenshot shows a sample output of this program:

Figure 8 – Results from QueryPerformanceCounter

The value shown is a sample running time for clearing an array of arbitrary size.

I chose to repeat my experiment five times for each size from 10 to 1,000,000 and take the average of the runs in order to approximate the real running time as closely as possible. The following tables show the results:

Table 1 - ClearArrayUsingIndex( Size = 10)

Runs Running Time (seconds) Average (seconds)

1 0.000225839938509944 0.0002707940626337985

2 0.000248081750635923

3 0.000241238116135622

4 0.000334910363358497

5 0.000272889925699516

Table 2 - ClearArrayUsingIndex( Size = 100)

Runs Running Time (seconds) Average (seconds)

1 0.000310102188294904 0.0002873043308657753

2 0.000261341292480257

3 0.000281872195981161

4 0.000337048999139841

5 0.00023824402604174

Table 3 - ClearArrayUsingIndex( Size = 1,000)

Runs Running Time (seconds) Average (seconds)

1 0.000281872195981161 0.0002945756925223456

2 0.000291282193419076

3 0.00030069219085699

4 0.000245943114854579

5 0.00024551538769831

Table 4 - ClearArrayUsingIndex( Size = 10,000)

Runs Running Time (seconds) Average (seconds)

1 0.000468788963270641 0.0004392330167724649

2 0.000416178523049575

3 0.00121688375958483

4 0.000305824916732216

5 0.000366134445766121

Table 5 - ClearArrayUsingIndex( Size = 100,000)

Runs Running Time (seconds) Average (seconds)

1 0.0014354523364382 0.0010076396347381172

2 0.00117710513405183

3 0.000880262487601259

4 0.000878551578976184

5 0.00114716423311301

Table 6 - ClearArrayUsingIndex( Size = 1,000,000)

Runs Running Time (seconds) Average (seconds)

1 0.00593257565744872 0.007948368199512477

2 0.00882230032520096

3 0.00683593541148849

4 0.00689538948620986

5 0.00813152096782679

We can visualize these results better by plotting them into a graph.

Graph 1 – Compiler Generated Time of ClearArrayUsingIndex

[Chart: “Compiler Generated Running Time of ClearArrayUsingIndex”; x-axis: Size (10 to 1,000,000); y-axis: Time (seconds); series: Index]

4. Compiler Generated Pointer Performance

I continue my study of compiler-generated assembly code with the same function, this time written with a different approach: pointers. I create a new project in Visual Studio and add two separate files, the same as when using indices to clear the array. One file holds the main function, which calls the function defined in the second file, “ClearUsingPointers.cpp”. The following screenshots show the modified functions:

Figure 9 – ClearUsingPointers Function

Figure 10 – Main Function for ClearUsingPointers

We can see that the method of clearing the array has changed: we are now using pointers. We expect the pointer approach to be more efficient than the index approach. I expect this because the index version has to compute each element’s address from the base address of the array and the index on every iteration, whereas the pointer version already holds the element’s address and only has to advance it. Less work per iteration means fewer trips to memory, and we already know that register operations are much faster than memory operations.
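
To make the difference concrete, the two approaches can be written side by side. This is a minimal sketch under assumptions: the names clearByIndex and clearByPointer are hypothetical, and the pointer loop is one common way to write such a function, not necessarily the exact code shown in the screenshots.

```cpp
// Index version: every store addresses arr[i], i.e. the element at
// base address + i * sizeof(int).
// NOTE: hypothetical name, used for illustration only.
void clearByIndex(int arr[], int size) {
    for (int i = 0; i < size; i++) {
        arr[i] = 0;
    }
}

// Pointer version: p already holds the address of the current element
// and is advanced by one element per iteration until it reaches the
// address one past the last element.
// NOTE: hypothetical name, used for illustration only.
void clearByPointer(int* arr, int size) {
    for (int* p = arr; p < arr + size; p++) {
        *p = 0;
    }
}
```

Both functions leave the array in the same state; only the way each element is addressed differs.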

We continue our study by generating the assembly code for ClearUsingPointers.cpp and compiling it with the same steps that we used for the index version. Again, there will be some errors during compilation, which we solve by simply commenting out the lines that cause them. Once we have the compiled file we link it with the main function by using the “Build Solution” option of Visual Studio. It is time to test our function by running several tests on different array sizes, which again vary over the same range as before, 10 to 1,000,000 elements. The following tables show the results:

Table 7 - ClearUsingPointers (Size = 10)

Runs Running Time (seconds) Average (seconds)

1 0.000204025853540234 0.0002282779833006765

2 0.000254925385136225

3 0.000226267665666213

4 0.000225839938509944

5 0.000227550847135019

Table 8 - ClearUsingPointers (Size = 100)

Runs Running Time (seconds) Average (seconds)

1 0.000289143557637731 0.0002439755699357425

2 0.000227978574291288

3 0.000260913565323988

4 0.000228406301447557

5 0.000223273575572331

Table 9 - ClearUsingPointers (Size = 1,000)

Runs Running Time (seconds) Average (seconds)

1 0.000322934002982969 0.0002602719745895851

2 0.00024936493210473

3 0.000236105390260396

4 0.000233539027322783

5 0.000268612654136827

Table 10 - ClearUsingPointers (Size = 10,000)

Runs Running Time (seconds) Average (seconds)

1 0.000304969462419678 0.0003327717275771524

2 0.000308391279669829

3 0.000305397189575947

4 0.000304541735263409

5 0.000316945822795206

Table 11 - ClearUsingPointers (Size = 100,000)

Runs Running Time (seconds) Average (seconds)

1 0.000998742909887726 0.0009799656877275237

2 0.000775041607159126

3 0.00101071927026325

4 0.00104750380570237

5 0.000888817030726636

Table 12 - ClearUsingPointers (Size = 1,000,000)

Runs Running Time (seconds) Average (seconds)

1 0.00762680292342957 0.007945716291143611

2 0.00766786473043138

3 0.00767513609208795

4 0.00847626905577947

5 0.00890314075773577

The results from these tables show that the pointer-based version is slightly faster than the index-based version. But this is just the compiler-generated code. Let’s take a look at the graph we get by plotting these values:

Graph 2 – Compiler Generated Time of ClearUsingPointers

If we compare this graph to Graph 1, we can see that the curve for the pointer method grows slightly more slowly, meaning that its running time is slightly better than that of the index method. We are not satisfied with these results. We need to optimize the assembly code manually.

[Chart: “Compiler Generated Running Time of ClearUsingPointers”; x-axis: Size (10 to 1,000,000); y-axis: Time (seconds); series: Pointer]

5. Optimized Index Performance

In order to optimize our methods of clearing an array, we take the assembly code and see what we can remove without breaking the algorithm or, more importantly, which instructions we can substitute. This means that we take some instructions, two or three at a time, and replace them with just one instruction, which reduces the running time considerably. I focus on the instructions that deal with memory: I try to reduce as much as possible the number of instructions that make the processor go back and forth to memory, keeping the loop counter, the array size and the array address in registers instead. The following screenshot shows the result of the optimization:

Figure 11 – Optimized ClearArrayUsingIndex
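
The idea behind this optimization can also be imitated in C++. The sketch below is only an analogy under stated assumptions, not code from the project: the function names are hypothetical, and volatile is used merely to force the loop counter to be read from and written to memory on every use, roughly as the unoptimized listing does with the counter on the stack.

```cpp
// Imitates the unoptimized listing: 'volatile' forces the loop counter
// to be re-read from memory and written back to memory on every use.
// NOTE: hypothetical name; an analogy, not the project's code.
void clearCounterInMemory(int arr[], int size) {
    volatile int i = 0;
    for (i = 0; i < size; i = i + 1) {
        arr[i] = 0;
    }
}

// Imitates the hand-optimized listing: the counter, the size and the
// base address are plain locals, so the compiler is free to keep all
// three in registers for the whole loop.
// NOTE: hypothetical name; an analogy, not the project's code.
void clearCounterInRegister(int arr[], int size) {
    int* const base = arr;
    const int n = size;
    for (int i = 0; i < n; ++i) {
        base[i] = 0;
    }
}
```

Both functions produce the same result; the second simply gives the compiler the freedom that the manual rewrite of the assembly code takes by hand.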

In Figure 11, the red squares mark the instructions that were replaced, while the green squares contain the new instructions, which we expect to run faster than before. We optimized the function by replacing the instructions that work with memory. We run the program and test it for different sizes of the array. The tables below show the results:

Table 13 - ClearUsingIndexOptimized( Size = 10)

Runs Running Time (seconds) Average (seconds)

1 0.000236533117416665 0.000188028857895779

2 0.000112919969254972

3 0.000112064514942434

4 0.000242949024760697

5 0.000235677663104127

Table 14 - ClearUsingIndexOptimized( Size = 100)

Runs Running Time (seconds) Average (seconds)

1 0.000214291305290686 0.0002800757419248322

2 0.000237816298885471

3 0.000239954934666816

4 0.00034474808795268

5 0.000363568082828508

Table 15 - ClearUsingIndexOptimized( Size = 1,000)

Runs Running Time (seconds) Average (seconds)

1 0.000254925385136225 0.0002647631097304078

2 0.000291709920575344

3 0.0002566362937613

4 0.000266901745511752

5 0.000253642203667418

Table 16 - ClearUsingIndexOptimized( Size = 10,000)

Runs Running Time (seconds) Average (seconds)

1 0.000215146759603223 0.000322335184964193

2 0.000459806692988996

3 0.000447830332613469

4 0.000236960844572934

5 0.000251931295042343

Table 17 - ClearUsingIndexOptimized( Size = 100,000)

Runs Running Time (seconds) Average (seconds)

1 0.000767770245502556 0.0005071988619035828

2 0.000427299429112565

3 0.000490175321084083

4 0.000426016247643758

5 0.000424733066174952

Table 18 - ClearUsingIndexOptimized( Size = 1,000,000)

Runs Running Time (seconds) Average (seconds)

1 0.00338631589618035 0.002520339495598474

2 0.00216387168356403

3 0.00249279386673476

4 0.00234651117929082

5 0.00221220485222241

We can already see the difference. For the largest arrays (100,000 and 1,000,000 elements), the optimized index method is clearly faster than even the unoptimized pointer method. Let’s make it clear by plotting it on a graph:

Graph 3 – Optimized Running Time of ClearArrayUsingIndex

Let’s continue and see how optimization affects pointer-based array clearing.

[Chart: “Optimized Running Time of ClearArrayUsingIndex”; x-axis: Size (10 to 1,000,000); y-axis: Time (seconds); series: Optimized Index]

6. Optimized Pointer Performance

We follow the same procedure as before to optimize the pointer method. The following screenshot shows the optimized assembly code for the ClearUsingPointers method:

We removed the redundant instructions and replaced them with new instructions that take up less space. Now it is time to test our optimization. The following tables show the results from testing the optimized pointer method for different sizes of the array:

Table 19 - ClearUsingPointerOptimized( Size =10)

Runs Running Time (seconds) Average (seconds)

1 0.000260913565323988 0.000254839839704971

2 0.000233966754479052

3 0.000229689482916364

4 0.00023482220879159

5 0.000314807187013861

Table 20 - ClearUsingPointerOptimized( Size =100)

Runs Running Time (seconds) Average (seconds)

1 0.000224129029884869 0.0002680993815493048

2 0.000352874903921788

3 0.000277167197262204

4 0.000227550847135019

5 0.000258774929542644

Table 21 - ClearUsingPointerOptimized( Size =1,000)

Runs Running Time (seconds) Average (seconds)

1 0.000227550847135019 0.0002276363925662732

2 0.000226695392822482

3 0.000226695392822482

4 0.000228406301447557

5 0.000228834028603826

Table 22 - ClearUsingPointerOptimized( Size =10,000)

Runs Running Time (seconds) Average (seconds)

1 0.000242949024760697 0.0002857217403875806

2 0.000246370842010848

3 0.00024551538769831

4 0.000266901745511752

5 0.000426871701956296

Table 23 - ClearUsingPointerOptimized( Size =100,000)

Runs Running Time (seconds) Average (seconds)

1 0.000417461704518381 0.0004474026054571994

2 0.00041788943167465

3 0.000427727156268833

4 0.000361429447047164

5 0.000612505287776969

Table 24 - ClearUsingPointerOptimized( Size =1,000,000)

Runs Running Time (seconds) Average (seconds)

1 0.00196583401021156 0.00240801834436228

2 0.00247140750892132

3 0.00245258751404549

4 0.00217199849953314

5 0.00297826418909989

We already have the sense that the optimized pointer method will be the fastest of them all. Let’s see the graph that we obtain from these averages:

Graph 4 – Optimized Running Time of ClearUsingPointers

We can see the difference now. For arrays of 1,000 elements and more, the optimized version of the pointer method runs faster than any of the previous methods.

[Chart: “Optimized Running Time of ClearUsingPointers”; x-axis: Size (10 to 1,000,000); y-axis: Time (seconds); series: Optimized Pointer]

7. Analysis

In this section we analyze the running times of all the examples presented in this laboratory. We combine the average running times of the four functions into a new table:

Table 25 – Optimization Summary (average running time in seconds)

Size        Index      Pointers   Index Optimized   Pointer Optimized
10          0.000271   0.000228   0.000188          0.000255
100         0.000287   0.000244   0.000280          0.000268
1,000       0.000295   0.000260   0.000265          0.000228
10,000      0.000439   0.000333   0.000322          0.000286
100,000     0.001008   0.000980   0.000507          0.000447
1,000,000   0.007948   0.00765    0.002520          0.0024080
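
One way to read the summary table is as a speedup, i.e. the compiler-generated time divided by the optimized time. The sketch below computes it for the two largest sizes, using the averages from the summary table. The struct and function names (SummaryRow, kRows, speedup) are hypothetical and are not part of the project code.

```cpp
// One row of the summary table; all times are averages in seconds.
// NOTE: SummaryRow, kRows and speedup are hypothetical names.
struct SummaryRow {
    long size;
    double index;
    double pointer;
    double indexOptimized;
    double pointerOptimized;
};

// Averages copied from the summary table (two largest sizes only).
const SummaryRow kRows[] = {
    {100000L,  0.001008, 0.000980, 0.000507, 0.000447},
    {1000000L, 0.007948, 0.00765,  0.002520, 0.0024080},
};

// Speedup of the optimized routine over the baseline.
// A value of 2.0 means the optimized version took half the time.
double speedup(double baselineSeconds, double optimizedSeconds) {
    return baselineSeconds / optimizedSeconds;
}
```

For example, speedup(kRows[1].index, kRows[1].indexOptimized) divides 0.007948 by 0.002520, which is a speedup of roughly 3.15 for the index method at 1,000,000 elements.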

The results of this lab are summarized by the following graph:

Graph 5 – Optimization Results

[Chart: “OPTIMIZATION RESULTS”; x-axis: Size (10 to 1,000,000); y-axis: Time (seconds); series: Optimized Pointer, Optimized Index, Pointer, Index]

8. Conclusion

In this laboratory we considered two methods for clearing an array. The first method uses indices to clear the array, while the second method uses pointers to do the same thing. We measured the running time of these methods. Afterwards we reduced the running time by substituting instructions in the assembly code generated by the compiler. We then measured the running time again.

The conclusion of this laboratory is that the optimized version of clearing an array by pointers is the fastest way to complete the task.

9. Appendix

CLEAR USING INDICES

Main.cpp

#include <tchar.h>
#include <windows.h>

// Defined in ClearArrayUsingIndex.cpp (later replaced by the .asm file).
void ClearArrayUsingIndex(int [], int);

using namespace System;

// Array size; the report varies this value between runs.
const int n = 100000000;
static int arr[n];

int main() {
    // Fill the array with non-zero values so that clearing does real work.
    for (int i = 0; i < n; i++)
        arr[i] = i + 1;

    __int64 ctr1 = 0, ctr2 = 0, freq = 0;

    // Start timing the code.
    if (QueryPerformanceCounter((LARGE_INTEGER *)&ctr1) != 0)
    {
        // Code segment being timed.
        ClearArrayUsingIndex(arr, n);

        // Finish timing the code.
        QueryPerformanceCounter((LARGE_INTEGER *)&ctr2);

        Console::WriteLine("Start Value: {0}", ctr1.ToString());
        Console::WriteLine("End Value: {0}", ctr2.ToString());

        QueryPerformanceFrequency((LARGE_INTEGER *)&freq);
        Console::WriteLine("QueryPerformanceCounter minimum resolution: 1/{0} Seconds.", freq.ToString());

        // Elapsed seconds = counter difference / counter frequency.
        Console::WriteLine("100 Increment time: {0} seconds.", ((ctr2 - ctr1) * 1.0 / freq).ToString());
    }
    else
    {
        // The counter could not be read; report the Win32 error code.
        DWORD dwError = GetLastError();
        Console::WriteLine("Error value = {0}", dwError.ToString());
    }

    // Make the console window wait.
    Console::WriteLine();
    Console::Write("Press ENTER to finish.");
    Console::Read();

    return 0;
}

ClearArrayUsingIndex.cpp

// Sets every element of arr[0..size-1] to zero using an index.
void ClearArrayUsingIndex(int arr[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        arr[i] = 0;
}

ClearArrayUsingIndex.asm

; Listing generated by Microsoft (R) Optimizing Compiler Version 17.00.50727.1

TITLE C:\Users\Klaudio\Desktop\CSC342-343 CLASS\csc342Project - optimization\CSC342_Project\CSC342_Project\ClearArrayUsingIndex.cpp

.686P

.XMM

include listing.inc

.model flat

INCLUDELIB MSVCRTD

INCLUDELIB OLDNAMES

PUBLIC ?ClearArrayUsingIndex@@YAXQAHH@Z ; ClearArrayUsingIndex

EXTRN __RTC_InitBase:PROC

EXTRN __RTC_Shutdown:PROC

; COMDAT rtc$TMZ

rtc$TMZ SEGMENT

__RTC_Shutdown.rtc$TMZ DD FLAT:__RTC_Shutdown

rtc$TMZ ENDS

; COMDAT rtc$IMZ

rtc$IMZ SEGMENT

__RTC_InitBase.rtc$IMZ DD FLAT:__RTC_InitBase

rtc$IMZ ENDS

; Function compile flags: /Odtp /RTCsu /ZI

; COMDAT ?ClearArrayUsingIndex@@YAXQAHH@Z

_TEXT SEGMENT

_i$ = -8 ; size = 4

_arr$ = 8 ; size = 4

_size$ = 12 ; size = 4

?ClearArrayUsingIndex@@YAXQAHH@Z PROC ; ClearArrayUsingIndex, COMDAT

; File c:\users\klaudio\desktop\csc342-343 class\csc342project - optimization\csc342_project\csc342_project\cleararrayusingindex.cpp

; Line 2

push ebp

mov ebp, esp

sub esp, 204 ; 000000ccH

push ebx

push esi

push edi

lea edi, DWORD PTR [ebp-204]

mov ecx, 51 ; 00000033H

mov eax, -858993460 ; ccccccccH

rep stosd

; Line 3

mov DWORD PTR _i$[ebp], 0

; Line 4

mov DWORD PTR _i$[ebp], 0

jmp SHORT $LN3@ClearArray

$LN2@ClearArray:

mov eax, DWORD PTR _i$[ebp]

add eax, 1

mov DWORD PTR _i$[ebp], eax

$LN3@ClearArray:

mov eax, DWORD PTR _i$[ebp]

cmp eax, DWORD PTR _size$[ebp]

jge SHORT $LN4@ClearArray

; Line 5

mov eax, DWORD PTR _i$[ebp]

mov ecx, DWORD PTR _arr$[ebp]

mov DWORD PTR [ecx+eax*4], 0

jmp SHORT $LN2@ClearArray

$LN4@ClearArray:

; Line 6

pop edi

pop esi

pop ebx

mov esp, ebp

pop ebp

ret 0

?ClearArrayUsingIndex@@YAXQAHH@Z ENDP ; ClearArrayUsingIndex

_TEXT ENDS

END

ClearArrayUsingIndexOptimized.asm

; Listing generated by Microsoft (R) Optimizing Compiler Version 15.00.21022.08

TITLE c:\Users\.....\ClearArrayIndexOptimized.cpp

.686P

.XMM

include listing.inc

.model flat

;

; OPTIMIZED!!!!

; Custom Build Step, including a listing file placed in intermediate directory

; but without Source Browser information

; debug:

; ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj"

"$(InputPath)"

; release:

; ml -c "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj"

"$(InputPath)"

; outputs:

; $(IntDir)\$(InputName).obj

PUBLIC ?ClearUsingIndexOptimized@@YAXQAHH@Z ;

ClearUsingIndexOptimized

_TEXT SEGMENT

_i$ = -8 ; size = 4

_Array$ = 8 ; size = 4

_size$ = 12 ; size = 4

;

?ClearUsingIndexOptimized@@YAXQAHH@Z PROC ; ClearUsingIndexOptimized, COMDAT

; Line 3

push ebp

mov ebp, esp

sub esp, 204 ; 000000ccH

push ebx

push esi

push edi

lea edi, DWORD PTR [ebp-204]

mov ecx, 51 ; 00000033H

mov eax, -858993460 ; ccccccccH

rep stosd

; Line 5

; mov DWORD PTR _i$[ebp], 0 ; i =0 on stack

mov eax, 0 ; initialize i in EAX to 0

mov edx, DWORD PTR _size$[ebp] ; store ARRAY size in EDX

mov ecx, DWORD PTR _Array$[ebp] ; move address of the ARRAY from stack to ecx

jmp SHORT $LN3@ClearUsing

$LN2@ClearUsing:

; mov eax, DWORD PTR _i$[ebp] ; move again i from stack to eax

add eax, 1 ; increment i in EAX

; mov DWORD PTR _i$[ebp], eax ; move eax onto stack

$LN3@ClearUsing:

; mov eax, DWORD PTR _i$[ebp] ; move i from stack to eax

; cmp eax, DWORD PTR _size$[ebp] ; compare i in eax with ARRAY size on stack

cmp eax, edx ; compare i in eax with ARRAY size in EDX

jge SHORT $LN4@ClearUsing ; if done exit

; Line 6

; mov eax, DWORD PTR _i$[ebp] ; move again i into eax

; mov ecx, DWORD PTR _Array$[ebp] ; move address of the ARRAY from stack to ecx

mov DWORD PTR [ecx+eax*4], 0 ; compute the effective address and move zero to the address

jmp SHORT $LN2@ClearUsing ; jump to the beginning of the LOOP

$LN4@ClearUsing:

; Line 7

pop edi

pop esi

pop ebx

mov esp, ebp

pop ebp

ret 0

?ClearUsingIndexOptimized@@YAXQAHH@Z ENDP ; ClearUsingIndexOptimized

_TEXT ENDS

END
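Because the optimized listing above was edited by hand, it is worth checking that the routine still zeroes exactly `size` elements and nothing outside that range. The following is a minimal sketch of such a check in C++. The names `ClearsExactly` and `ClearUsingIndexSketch` are hypothetical and do not appear in the project; the C++ loop only stands in for the assembly routine so that the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical test helper (not part of the original project).
// Returns true if `clear` zeroes exactly `size` ints and leaves one guard
// element on each side of the cleared range untouched.
bool ClearsExactly(void (*clear)(int*, int), int size) {
    assert(clear != nullptr && size >= 0);
    const int kGuard = 0x5A5A5A5A;  // arbitrary non-zero marker value
    std::vector<int> buf(static_cast<std::size_t>(size) + 2, kGuard);

    clear(buf.data() + 1, size);    // clear only the middle `size` elements

    if (buf.front() != kGuard || buf.back() != kGuard)
        return false;               // the routine wrote outside its range
    for (std::size_t i = 1; i <= static_cast<std::size_t>(size); ++i)
        if (buf[i] != 0)
            return false;           // an element was left uncleared
    return true;
}

// Stand-in for the index-based routine, with the same signature as the
// functions in this appendix.
void ClearUsingIndexSketch(int* arr, int size) {
    for (int i = 0; i < size; i++)
        arr[i] = 0;
}
```

Calling `ClearsExactly(ClearUsingIndexSketch, 16)` should return true. In the project itself, the pointer to the assembled routine would be passed in place of the stand-in, assuming it is linked with the same `void (int*, int)` signature.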

CLEAR USING POINTERS

Main.cpp

#include <tchar.h>

#include <windows.h>

void ClearUsingPointers(int*, int);

using namespace System;

const int n = 100000000;

static int arr[n];

int main() {

for (int i = 0; i < n; i++)

arr[i] = i+1;

__int64 ctr1 = 0, ctr2 = 0, freq = 0;

int acc = 0, i = 0;

// Start timing the code.

if (QueryPerformanceCounter((LARGE_INTEGER *)&ctr1)!= 0)

{

// Code segment is being timed.

ClearUsingPointers(arr, n);

// Finish timing the code.

QueryPerformanceCounter((LARGE_INTEGER *)&ctr2);

Console::WriteLine("Start Value: {0}",ctr1.ToString());

Console::WriteLine("End Value: {0}",ctr2.ToString());

QueryPerformanceFrequency((LARGE_INTEGER *)&freq);

//Console::WriteLine(S"QueryPerformanceCounter minimum resolution: 1/{0} Seconds.",freq.ToString());

// In Visual Studio 2005, this line should be changed to:

Console::WriteLine("QueryPerformanceCounter minimum resolution: 1/{0} Seconds.",freq.ToString());

Console::WriteLine("100 Increment time: {0} seconds.",((ctr2 - ctr1) * 1.0 / freq).ToString());

}

else

{

DWORD dwError = GetLastError();

//Console::WriteLine(S"Error value = {0}",dwError.ToString());

// In Visual Studio 2005, this line should be changed to:

Console::WriteLine("Error value = {0}",dwError.ToString());

}

// Make the console window wait.

Console::WriteLine();

Console::Write("Press ENTER to finish.");

Console::Read();

return 0;

}
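The timing in Main.cpp depends on the Windows-only `QueryPerformanceCounter` API and on C++/CLI for console output. As a hedged alternative, the same measurement can be sketched in standard C++ with `std::chrono::steady_clock`. The helper `TimeClearSeconds` and the stand-in `ClearUsingPointersSketch` are hypothetical names introduced only for this illustration; they are not part of the original project, and results obtained this way are not guaranteed to match the figures reported earlier.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not in the original project): times a single call to
// a clearing routine and returns the elapsed time in seconds.
double TimeClearSeconds(void (*clear)(int*, int), int* arr, int size) {
    assert(clear != nullptr && size >= 0);
    const auto start = std::chrono::steady_clock::now();
    clear(arr, size);                       // code segment being timed
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in for ClearUsingPointers so that the sketch compiles on its own.
void ClearUsingPointersSketch(int* arr, int size) {
    for (int* p = arr; p < arr + size; ++p)
        *p = 0;
}
```

A call such as `TimeClearSeconds(ClearUsingPointersSketch, arr, n)` plays the role of the two `QueryPerformanceCounter` readings divided by the frequency. `steady_clock` is chosen here because it is monotonic, so the difference between the two readings cannot be negative.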

ClearUsingPointers.cpp

void ClearUsingPointers(int* arr, int size){

int *p;

for( p = &arr[0]; p < &arr[size]; p = p+1)

*p = 0;

};

ClearUsingPointers.asm

; Listing generated by Microsoft (R) Optimizing Compiler Version 17.00.50727.1

TITLE C:\Users\Klaudio\Desktop\CSC342-343 CLASS\10-28-2015\10-28-2015\ClearUsingPointers.cpp

.686P

.XMM

include listing.inc

.model flat

INCLUDELIB MSVCRTD

INCLUDELIB OLDNAMES

PUBLIC ?ClearUsingPointers@@YAXPAHH@Z ; ClearUsingPointers

EXTRN __RTC_InitBase:PROC

EXTRN __RTC_Shutdown:PROC

; COMDAT rtc$TMZ

rtc$TMZ SEGMENT

;__RTC_Shutdown.rtc$TMZ DD FLAT:__RTC_Shutdown

rtc$TMZ ENDS

; COMDAT rtc$IMZ

rtc$IMZ SEGMENT

;__RTC_InitBase.rtc$IMZ DD FLAT:__RTC_InitBase

rtc$IMZ ENDS

; Function compile flags: /Odtp /RTCsu /ZI

; COMDAT ?ClearUsingPointers@@YAXPAHH@Z

_TEXT SEGMENT

_p$ = -8 ; size = 4

_arr$ = 8 ; size = 4

_size$ = 12 ; size = 4

?ClearUsingPointers@@YAXPAHH@Z PROC ; ClearUsingPointers, COMDAT

; File c:\users\klaudio\desktop\csc342-343 class\10-28-2015\10-28-2015\clearusingpointers.cpp

; Line 1

push ebp

mov ebp, esp

sub esp, 204 ; 000000ccH

push ebx

push esi

push edi

lea edi, DWORD PTR [ebp-204]

mov ecx, 51 ; 00000033H

mov eax, -858993460 ; ccccccccH

rep stosd

; Line 3

mov eax, 4

imul eax, 0

add eax, DWORD PTR _arr$[ebp]

mov DWORD PTR _p$[ebp], eax

jmp SHORT $LN3@ClearUsing

$LN2@ClearUsing:

mov eax, DWORD PTR _p$[ebp]

add eax, 4

mov DWORD PTR _p$[ebp], eax

$LN3@ClearUsing:

mov eax, DWORD PTR _size$[ebp]

mov ecx, DWORD PTR _arr$[ebp]

lea edx, DWORD PTR [ecx+eax*4]

cmp DWORD PTR _p$[ebp], edx

jae SHORT $LN4@ClearUsing

; Line 4

mov eax, DWORD PTR _p$[ebp]

mov DWORD PTR [eax], 0

jmp SHORT $LN2@ClearUsing

$LN4@ClearUsing:

; Line 5

pop edi

pop esi

pop ebx

mov esp, ebp

pop ebp

ret 0

?ClearUsingPointers@@YAXPAHH@Z ENDP ; ClearUsingPointers

_TEXT ENDS

END

ClearUsingPointersOptimized.asm

; Listing generated by Microsoft (R) Optimizing Compiler Version 17.00.60610.1

.686P

.XMM

include listing.inc

.model flat

INCLUDELIB MSVCRTD

INCLUDELIB OLDNAMES

PUBLIC ?ClearArrayPointerOptimized@@YAXPAHH@Z ; clear_array_pointer

EXTRN __RTC_InitBase:PROC

EXTRN __RTC_Shutdown:PROC

; COMDAT rtc$TMZ

rtc$TMZ SEGMENT

; __RTC_Shutdown.rtc$TMZ DD FLAT:__RTC_Shutdown

rtc$TMZ ENDS

; COMDAT rtc$IMZ

rtc$IMZ SEGMENT

; __RTC_InitBase.rtc$IMZ DD FLAT:__RTC_InitBase

rtc$IMZ ENDS

; Function compile flags: /Odtp /RTCsu /ZI

; COMDAT ?ClearArrayPointerOptimized@@YAXPAHH@Z

_TEXT SEGMENT

_p$ = -8 ; size = 4

_ary$ = 8 ; size = 4

_size$ = 12 ; size = 4

?ClearArrayPointerOptimized@@YAXPAHH@Z PROC ; clear_array_pointer, COMDAT

; 2 : {

push ebp

mov ebp, esp

sub esp, 204 ; 000000ccH

push ebx

push esi

push edi

lea edi, DWORD PTR [ebp-204]

mov ecx, 51 ; 00000033H

mov eax, -858993460 ; ccccccccH

rep stosd

; 3 : int *p;

; 4 : for(p = &ary[0]; p<&ary[size]; p= p+1)

mov eax, DWORD PTR _ary$[ebp]

mov DWORD PTR _p$[ebp], eax

mov ebx, DWORD PTR _size$[ebp]

lea edx, DWORD PTR [eax+ebx*4]

jmp SHORT $LN3@clear_arra

$LN2@ClearArrayPointerOptimized:

add eax, 4

$LN3@clear_arra:

cmp eax, edx

jae SHORT $LN4@ClearArrayPointerOptimized

; 5 : *p = 0;

mov DWORD PTR [eax], 0

jmp SHORT $LN2@ClearArrayPointerOptimized

$LN4@ClearArrayPointerOptimized:

; 6 : }

pop edi

pop esi

pop ebx

mov esp, ebp

pop ebp

ret 0

?ClearArrayPointerOptimized@@YAXPAHH@Z ENDP ; clear_array_pointer

_TEXT ENDS

END
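The main change in the hand-optimized pointer listing is that the end address `&ary[size]` is computed once before the loop (the single `lea edx, DWORD PTR [eax+ebx*4]`) instead of on every iteration, and that `p` stays in a register. The same idea can be sketched at the source level in C++. The name `ClearUsingPointersHoisted` is hypothetical and not part of the project. A compiler with optimization enabled may well perform this hoisting on its own; the listings in this appendix were built with `/Od`, which disables it.

```cpp
#include <cassert>

// Hypothetical source-level counterpart of the hand-optimized listing:
// the loop bound is computed once and then only compared against.
void ClearUsingPointersHoisted(int* arr, int size) {
    assert(size >= 0);
    int* const end = arr + size;      // computed once, like the single LEA
    for (int* p = arr; p < end; ++p)  // one add and one compare per element
        *p = 0;
}
```

For example, with `int a[5] = {1, 2, 3, 4, 5};` the call `ClearUsingPointersHoisted(a + 1, 3)` zeroes the three middle elements and leaves `a[0]` and `a[4]` unchanged.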