Theory of Algorithms

Contents

1 Data and Computing
  1.1 Computation Model
  1.2 Integers
  1.3 Memory Content Notation
  1.4 Floating-Point
  1.5 Bitwise Operations
  1.6 Programming Language
  1.7 Variables, Structs, Arrays
  1.8 Pointers

2 Algorithm Analysis
  2.1 Measuring Time
  2.2 Running Time
  2.3 Average and Expected
  2.4 Asymptotic Notation
  2.5 Asymptotic Properties
  2.6 Asymptotic Analysis
  2.7 Amortised Analysis

3 Data Structures
  3.1 Dynamic Arrays



Chapter 1

Data and Computing

An algorithm is an explicit, precise, unambiguous, mechanically-executable sequence of elementary instructions intended to accomplish a specific purpose. In the context of computing, the purpose is usually to process data.

Any study of algorithms is built around some computation model that defines exactly what the data is and how it can be manipulated. The models can have different levels of abstraction: from purely mathematical (like the Turing machine or λ-calculus) to overly practical.

This course covers algorithms to be implemented as computer programs. For this purpose we describe a model that is closely related to modern computers. It does have many naive assumptions that do not hold in the real world. However, it will be sufficient for basic algorithm design and analysis and will not distract us with abundant specifics of particular hardware.


1.1 Computation Model

For an object with two possible states, the bit is a unit of information that tells us which state the object is currently in. A binary digit, which is 0 or 1, is a perfect example of such an object. Another example is an electric signal with the two states defined by its voltage being close to 0 or to the supply voltage. Unsurprisingly, both things are called bits in the context of computing.

Computer memory is basically an ordered group of semiconductor cells, each holding a binary electric signal. In more abstract terms, the memory is a finite-length sequence of bits, or bit string. Any piece of data we want to store in the memory has to be represented as a finite bit string. The memory is divided into contiguous 8-bit chunks called bytes. Each byte is assigned a sequential number called the address¹.

There is also a small set (up to several hundreds) of fast memory locations called registers. Each register holds a fixed-length bit string, typically 32 or 64 bits, which is also referred to as a word and denoted by w.

The CPU (central processing unit) provides a set of instructions to manipulate the memory bytes and perform computations on registers. A computer program is composed of these instructions² executed sequentially, one by one.

There are instructions that work with the memory in two particular ways: (1) read some bytes from a memory address into a register (a.k.a. load) or (2) write some bytes from a register into a memory address (a.k.a. store). In both cases the memory address is taken from another register. We assume that the memory is random-access (hence RAM), meaning that loads and stores take a constant amount of time regardless of the memory address.

The rest of the instructions are register-only; they can:

• perform an arithmetic or logical operation on one or two registers, possibly putting the result into other register(s);
• compare the contents of two registers;
• jump to another instruction in the program, either unconditionally or depending on the result of a comparison.

We assume that each of these instructions also takes some constant amount of time. The amount of data processed by each instruction is fixed and limited by w, typically being 8, 16, 32, or 64 bits. This is a substantial limitation: our pieces of data must either fit into the word or be composed of word-sized chunks.

A w-length bit string has 2^w possible states that may correspond to various sets of objects depending on the interpretation. You may be tempted to treat it as a positive integer in the base-2 numeral system, but we are not limited to that.

¹ A byte is also defined as the minimal addressable unit of memory. There used to be computers with bytes not equal to 8 bits (don't tell the French!).

² The textual representation of the CPU instructions is also referred to as an assembly language.


We assume the CPU has built-in instructions that treat register contents as integers and floating-point numbers, and also bitwise instructions operating on individual bits.

1.2 Integers

A non-negative integer can be directly expressed using the base-2 (binary) number system. The binary number a_{n−1} ... a_0 (a_i ∈ {0, 1}) represents the value of

    a_0 + a_1·2 + a_2·4 + ··· + a_{n−1}·2^{n−1} ≡ Σ_{i=0}^{n−1} a_i·2^i

This allows a bit string of length n to hold an integer from 0 to 2^n − 1; such a representation is called unsigned. The digits a_0 and a_{n−1} are called the least significant bit and the most significant bit correspondingly. To distinguish them from decimals, binary numbers are denoted with a subscript, e.g. 11011₂.

The CPU generally can only do arithmetic operations with same-size operands. For binary numbers this is not really restrictive, since we can always add leading zeros to the shorter operand without changing its value: 11011₂ = 00011011₂.

We also assume the results to have the same length as the operands. This sometimes leads to mathematically wrong results due to an integer overflow. For unsigned integers it happens if the intended result is negative or does not fit into the given bit-length. In both cases the result is taken modulo 2^n. In the latter case this means that only the lower bits of the actual result are used:

      11001001                    11001001
    − 11111111                  + 01011011
    = 11001010                  = 00100100

    (got 202 ≡ −54 mod 256)     (expected 100100100₂ ≡ 292)

Depending on the CPU, such overflows can be detected in case a programmer wants to handle them differently.

Negative values can be represented with signed integers, where the range of possible values is divided in half. Out of the n bits, the lower n − 1 are used for non-negative values in the range [0, 2^{n−1} − 1]. Their negation is defined by taking the two's complement:

    −x ≡ 2^n − x ≡ x̄ + 1

where x̄ is the binary negation of x, with each bit changed to the opposite value. The result in the case of n = 4 is shown in Table 1.1. You can see that the negative integers take exactly the other half of the available states, ranging from −1 to −2^{n−1} inclusively and perfectly wrapping around 0 when counting backwards.


    0  0000
    1  0001     −1  1111
    2  0010     −2  1110
    3  0011     −3  1101
    4  0100     −4  1100
    5  0101     −5  1011
    6  0110     −6  1010
    7  0111     −7  1001
                −8  1000

Table 1.1: 4-bit signed integers

The two's complement representation allows using the same algorithm for adding and subtracting both signed and unsigned integers. The only difference is in overflow detection, since wrapping around 0 does not indicate overflow for signed integers.

Overall, given 64-bit registers, CPU instructions can do integer arithmetic in the range [0, 2^64 − 1] for unsigned or [−2^63, 2^63 − 1] for signed values. Anything beyond that requires additional programming effort.
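These wrap-arounds are directly observable in Go; a small sketch in the style of the later examples (the // -> comments show the resulting values):

var a uint8 = 201
var b uint8 = 255
a - b  // -> 202, since 201 - 255 = -54 ≡ 202 mod 256

var x int8 = 5
^x + 1 // -> -5, two's complement negation (^ is bitwise NOT in Go)

var m int8 = 127
m + 1  // -> -128, signed wrap-around past the maximum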

1.3 Memory Content Notation

In the following chapters the content of memory or registers will be indicated with a monospace font and an encoding prefix. The prefix 0b will stand for binary content, with the leftmost figure as the most significant and the rightmost as the least significant bit. For example, 0b11001010 can represent the 8-bit unsigned number 202 or the 8-bit signed number −54.

However, for larger lengths the 'raw' binary encoding becomes cumbersome. To show binary content in a human-readable way, octal and hexadecimal representations are commonly used. Their advantage is that their digits structurally correspond to a fixed number of bits: three for octal (7₈ = 111₂) and four for hexadecimal (f₁₆ = 1111₂).

This makes for a straightforward conversion from binary to oct/hex and back, by grouping the bits:

    110 111 101 001₂ = 6751₈        1101 1110 1001₂ = de9₁₆

The decimal value of the same bits would be 3561. Converting it back to binary requires a series of divisions by 2 and remembering the remainders, which is a much harder mental calculation.

Hexadecimals are prefixed with 0x (0xde9 or 0xDE9) and octals with 0o (0o6751); this is also the standard notation in most programming languages.
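As a quick check, Go (since version 1.13) accepts all three prefixes as integer literals, and the standard strconv package converts between textual bases; a short sketch:

0b110111101001 == 0o6751 // -> true
0o6751 == 0xde9          // -> true

n, _ := strconv.ParseInt("110111101001", 2, 64) // n = 3561
strconv.FormatInt(n, 16)                        // -> "de9"
strconv.FormatInt(n, 8)                         // -> "6751"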


Example. Octal notation is heavily used for Unix file permissions. With regard to file access, Unix-like systems define three scopes of users:

• user: the user that owns the file or directory
• group: members of the group assigned to the file or directory
• others: everybody else

Files and directories have three permissions that apply to each scope: read (r), write (w), and execute (x). There is a symbolic notation for them:

    rwx      r--      r--
    user     group    others

However, since each permission is binary (you either can or cannot read the file), a bit string can represent the same permissions: 0b111100100. And these 9 bits perfectly match the octal notation, with each octal digit indicating the permissions of the corresponding scope: 0o744.

The notation is widely supported in software:

$ touch foo
$ chmod 600 foo
$ ls -l foo
-rw------- 1 <user> <group> 0 Dec 21 23:25 foo

1.4 Floating-Point

While a bit string of length n can only represent a finite subset of integral numbers, at least they are from a consecutive range. The latter is impossible with real numbers, where any non-empty range contains an uncountable number of elements. This means that any computer representation of real numbers is bound to be approximate, with an error that may or may not be suitable for particular applications.

Just like base-10, base-2 notation can be extended to represent fractional values. It is done by separating the fractional part with a binary point. The digits to the right of the point are taken with reciprocal power-of-2 multipliers:

    a_{n−1} ... a_0 . a_{−1} ... a_{−m} ≡ Σ_{i=0}^{n−1} a_i·2^i + Σ_{i=1}^{m} a_{−i} / 2^i,    a_i ∈ {0, 1}

For example:

    1011.101001₂ = 11 + 1/2 + 1/8 + 1/64 = 11.640625

Unfortunately, the binary point cannot be stored in computer memory. Its presence can be assumed at a predefined position; such a representation of numbers is called fixed-point. The downside is that it restricts both the range and the precision of the numbers. For example, a 32-bit register with a fixed 16-bit fractional part can only give a decimal precision of ≈ 0.000015 and an unsigned magnitude of up to 65536.
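To make the trade-off concrete, here is a minimal Q16.16 fixed-point sketch in Go; the type and helper names are hypothetical, purely for illustration:

// Q16 is a hypothetical fixed-point type: 16 integer bits, 16 fractional bits.
type Q16 int32

func FromFloat(f float64) Q16 { return Q16(f * 65536) }
func (q Q16) Float() float64  { return float64(q) / 65536 }

// Mul needs a wider intermediate result, shifted back by 16 bits.
func Mul(a, b Q16) Q16 { return Q16((int64(a) * int64(b)) >> 16) }

Mul(FromFloat(1.5), FromFloat(2.0)).Float() // -> 3

Anything smaller than 2^−16 is rounded away, and anything at 65536 or above does not fit: exactly the precision and range limits mentioned above.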

A more practical way to encode fractional values is to use a standard-form notation:

    S · F × 2^E,    S ∈ {−1, 1},  F ∈ [1, 2),  E ∈ ℤ

where S is the sign, F is the fraction (also called mantissa or significand), and E is the exponent. Both F and E are binary numbers with a fixed number of bits. The notation represents a floating-point number; the term refers to the fact that the effective size of the fractional part varies depending on the exponent.

Example. A 16-bit fraction and an 8-bit exponent can give us an approximate value of π up to the 4th decimal place:

    1.100100100001111₂ × 2¹ = (1 + 1/2 + 1/16 + 1/128 + ···) × 2 = 3.14154052734375

At the same time it can represent numbers as large as the Avogadro constant (defined as exactly 6.02214076 × 10^23):

    1.111111100001100₂ × 2^78 = 6.022124... × 10^23

The precision is still limited: only the first 5 significant digits are correct.

The most common binary encoding for floating-point numbers follows the standard called IEEE 754 and consists of the following bits:

      s   |   e_0 e_1 ··· e_{n−1}    |   f_0 f_1 ··· f_{m−1}
    sign  | unsigned biased exponent |        fraction

which stands for the number

    (−1)^s · [1.f_0 f_1 ··· f_{m−1}]₂ × 2^{e−B}

where B is a constant called the bias. It allows storing the exponent as an unsigned integer, while the exponent itself can be negative (in the range [−B, 2^n − 1 − B]). Also note that since the normalised fraction always has a leading 1, only its fractional part is encoded.

IEEE 754 numbers typically have a size of 32 bits (single precision) or 64 bits (double precision); Table 1.2 summarises the parameters of both encodings.


Total   Sign   Exponent   Fraction   Bias   Magnitudes, ≈
 32      1        8          23       127   ±1.18×10^−38, ±3.4×10^38
 64      1       11          52      1023   ±2.2×10^−308, ±1.8×10^308

Table 1.2: Typical sizes of IEEE 754 floating-point numbers

Example. Let us find the 32-bit IEEE 754 representation of 0.4.

Find a normalising exponent, i.e. one that makes the fraction lie in [1, 2):

    E = ⌊log₂ 0.4⌋ = −2

This makes the fraction F = 0.4 / 2^−2 = 1.6. Since the leading 1 is omitted in the encoding, look for a binary fraction of 0.6 (the first few steps are shown in Table 1.3):

    0.6 = 0.10011001100110011001101₂

Remainder   Coefficient   Bit
 0.6         0.5           1
 0.1         0.25          0
 0.1         0.125         0
 0.1         0.0625        1
 0.0375      0.03125       1
 0.00625     0.015625      0
 ...         ...          ...

Table 1.3: Steps of finding a binary fraction of 0.6

Encode the exponent by adding the bias (127 for single-precision numbers):

    −2 + 127 = 125 = 01111101₂

Combine the sign bit (0), the exponent, and the fraction to get the full representation (vertical bars are for readability):

    0.4 (IEEE 754, 32-bit) = 0b0|01111101|10011001100110011001101
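The result can be checked against the actual hardware encoding; a short Go sketch using the standard math package, which exposes the stored bit pattern of a float32 as a uint32:

fmt.Printf("%032b\n", math.Float32bits(0.4))
// -> 00111110110011001100110011001101
//    i.e. s = 0, e = 01111101, f = 10011001100110011001101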

We assume that the CPU supports arithmetic instructions for both 32- and 64-bit IEEE 754 numbers (as is the case for modern general-purpose processors).

1.5 Bitwise Operations

Most CPUs also support bitwise operations, i.e. operations that work on register content at the level of individual bits. They allow programmers to design custom binary representations of data apart from the built-in types. Sometimes these operations are referred to as logical, since they treat bits as values in Boolean algebra, with 0 being false and 1 being true.

Shift operations move the bits of a register to the left or right by a given number of places. A logical shift always fills the vacant places with zeros:

    0b01001111 →(left, 1)→ 0b10011110
    0b01001111 →(right, 2)→ 0b00010011

Given an unsigned integer, shifting it n bits to the left multiplies it by 2^n, while shifting it n bits to the right divides it by 2^n, rounding the result down:

    5 ≡ 101₂ →(left, 2)→ 10100₂ ≡ 20
    5 ≡ 101₂ →(right, 1)→ 10₂ ≡ 2

This property is often applied in programming, since such shifts are faster than actual integer multiplication and division. In fact, it is so common that most CPUs also implement an arithmetic shift that preserves the sign bit of signed integers by copying it into the leftmost vacant places:

    −10 ≡ 0b11110110 (8-bit signed) →(arithmetic right, 1)→ 0b11111011 ≡ −5

Bitwise NOT performs logical negation of each bit, flipping its value:

    NOT 0b11001001 = 0b00110110

Bitwise AND, OR, and XOR operations take two operands and perform logical AND, OR, and exclusive OR (see the truth table in Table 1.4) on each pair of corresponding bits:

        11001001         11001001         11001001
    AND 10100111      OR 10100111     XOR 11111111
      = 10000001       = 11101111       = 00110110

    a  b  |  a AND b  |  a OR b  |  a XOR b
    0  0  |     0     |     0    |     0
    0  1  |     0     |     1    |     1
    1  0  |     0     |     1    |     1
    1  1  |     1     |     1    |     0

Table 1.4: Truth table for bitwise logical operations


AND can zero out particular bits while preserving the others. Another application is testing whether a particular bit is 1:

        11001001                        11001001
    AND 00001111                    AND 00001000
      = 00001001                      = 00001000

    (zero out the higher half)      (non-zero if the bit is set; zero otherwise)

OR can set particular bits to 1 while preserving the others:

        11001001
     OR 00001111
      = 11001111

    (set the lower half to all 1s)

XOR can flip particular bits while preserving the others:

        11001001
    XOR 00001111
      = 11000110

    (flip the lower half)
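These masks map one-to-one onto Go operators; a small sketch (Go additionally has &^, AND NOT, which clears bits in one step):

var v uint8 = 0b11001001
v & 0b00001111    // -> 0b00001001, zero out the higher half
v&(1<<3) != 0     // -> true, bit 3 is set
v | 0b00001111    // -> 0b11001111, set the lower half to all 1s
v ^ 0b00001111    // -> 0b11000110, flip the lower half
v &^ 0b00001111   // -> 0b11000000, clear the lower half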

1.6 Programming Language

We do not program with CPU instructions directly (machine code) anymore, not even with their textual representation (the assembly language). Instead, higher-level programming languages are used, which can be divided into two groups depending on how they 'become' machine code:

• compiled languages require tools called compilers to translate the source code into a file containing the machine code (an executable), which can then be executed to run the program;

• interpreted languages depend on tools called interpreters that, statement by statement, translate the source code into machine code and execute it, without the intermediate compilation step.

While the distinction between the two is not always clear, for now it is sufficient to assume that some languages are primarily compiled (C, C++, Java, Go, Rust) and some are primarily interpreted (JavaScript, Python, Ruby).

In this course we will primarily use the Go programming language (also Golang), which is a compiled language syntactically influenced by C. The choice is mainly because of its simple data model with fixed-size types, and a syntax that makes a clear connection between the source and machine code. For example, arithmetic operations in Go always take a constant amount of time (since they are implemented with corresponding CPU instructions), and non-constant-time statements can be clearly identified in the syntax³.

Sometimes we will also use a pseudocode language in cases where Go is too specific. We assume that each pseudocode statement is compiled into some constant number of CPU instructions, unless specified explicitly. This means that each statement can only process a constant amount of data and take a constant time.

In the remaining sections we will cover some fundamental data primitives in programming languages and how they are used in Go.

1.7 Variables, Structs, Arrays

A variable is a symbolic name that represents a region of memory holding a value. Its size and the interpretation of the value depend on the variable type. In Go each variable has either a static (known during compilation) or dynamic (determined at run-time) type.

Table 1.5 shows some of the basic data types of Go and their sizes. int<...> types stand for signed and uint<...> for unsigned integers. float<...> types correspond to IEEE 754 floating-point numbers. Note that the type sizes do not exceed 64 bits, so that they fit into a register and arithmetic operations on them directly match CPU instructions. There are also int and uint types whose size is either 32 or 64 bits depending on the implementation.

8-bit    bool, byte, uint8, int8
16-bit   uint16, int16
32-bit   uint32, int32, float32, rune
64-bit   uint64, int64, float64

Table 1.5: Primitive types in Go

Go provides standard notation for arithmetic and bitwise operations. Unlike many languages, Go does not allow operations on operands of different types and requires explicit type conversion:

var (
    a uint8   = 15
    b uint8   = 25
    c float64 = 4.2
)

a * b          // -> 119, silent overflow
a - b          // -> 246, silent overflow
(a << 2) - b   // -> 35, left shift by 2 bits
a * c          // -> not allowed
float64(b) * c // -> 105.0

³ Unlike Python, where integers have arbitrary precision, so that arithmetic with them may take a non-constant time depending on the values. There can also be a lot of confusion even with basic operators, such as Python's in for substring inclusion, which is implemented in a far from trivial way.


A struct can be used to create a new type that glues related named values (fields) into a single collection. The values can have different types. Struct fields are placed contiguously in memory in the order of their declaration:

type RGB struct {
    R, G, B uint8
}

colour := RGB{0xFF, 0x41, 0x7B}
colour.G // -> 0x41

An array holds a constant number of same-type values contiguously in memory; the values can be accessed by 0-based indexes in constant time. This is possible because the values have the same size: if an array starts at address A and holds values of type T, the value at index i is located at

    A + size(T) · i

Compilers do the math every time you use the square-bracket operator:

P := [5]int16{2, 3, 5, 7, 11}
P[3] // -> 7
P[4] = 13

The string type in Go is equivalent to an array of bytes, except that strings are immutable. Index access retrieves individual bytes as uint8 values, which in the general case are not at all equivalent to characters:

a := "zя"
a[0] // -> 0x7a, the ASCII code of 'z'
a[1] // -> 0xd1, the first byte of the UTF-8 encoding of 'я'
a[2] // -> 0x8f, the second byte of the UTF-8 encoding of 'я'
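To obtain actual characters, a for ... range loop over a string decodes UTF-8 into runes (Unicode code points), while index access stays at the byte level; a sketch (assuming the fmt package is imported):

for i, r := range "zя" {
    // i is the byte offset, r is the decoded rune
    fmt.Println(i, r) // -> 0 122, then 1 1103
}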

1.8 Pointers

When you assign a variable to another variable or pass it into a function, its value is copied as-is into another location in memory. This includes structs and arrays, which, depending on their size, may significantly affect both running time and memory space.

func main() {
    a := [1024]int64{ /* ... */ }
    b := a
    foo(b)
}

func foo(a [1024]int64) {
    // at this point the array is copied 3 times in memory:
    // 3 * 8192 B = 24 kB is used
}

A pointer is a value that holds the memory address of another value of a particular type. With a pointer you can read or modify the value at the memory location it references. A pointer to a value of type T has type *T. Internally, pointers are unsigned integers of the same size as uintptr (32 or 64 bits, depending on your system), except that you cannot easily treat them as integers⁴.

Pointers are especially useful for large data structures: you can create a single copy of your data and use it everywhere else via pointers, thus avoiding expensive copying of the data itself:

func main() {
    a := [1024]int64{ /* ... */ }
    b := &a // pointer to the array, not the array itself
    foo(b)
}

func foo(a *[1024]int64) {
    a[43] = a[41] + a[42]
    // 8192 bytes for the array + 16 bytes for the pointers
}

⁴ Treating pointers as integers is called pointer arithmetic and can be found in C or C++.


Chapter 2

Algorithm Analysis

The goal of analysing an algorithm is predicting the resource consumption of its implementation without actually implementing and running it. Another goal is to compare the results of such analysis for multiple algorithms solving the same problem and identify the most efficient one.

In this course we will focus on such resources as running time and memory.


2.1 Measuring Time

We cannot easily predict the exact algorithm running time in seconds, as it depends on too many factors: hardware, operating system, programming language, compiler, and the quality of the algorithm implementation itself.

Instead we use abstract units of time. By saying that an algorithm has running time N, we mean that its execution takes N time units.

We assume that each basic statement in our programming language (or pseudocode) takes a single unit of time. By 'basic' we mean statements that can be executed with a constant number of CPU instructions and operate on a constant amount of data.

These assumptions may seem naïve. Clearly, floating-point division, due to its complexity, takes longer than an unsigned addition. And yet, both operations are considered to take the same unit of time. Instruction counts can also vary: some statements take a single instruction, some take hundreds. Later we will see that these details can be ignored as long as the real running time of these statements can be bounded by a constant.

In Go, due to its fixed-size types, most arithmetic and assignment statements are basic. For instance, we can assume that line 4 in the following code takes a single unit of time:

1 var A []int = []int{<...>}
2 var product int = 1
3 for _, x := range A {
4     product *= x
5 }

It does not hold for similar Python code: due to its arbitrary-precision integers, the multiplication running time depends on the actual array values and increases as product gets more digits:

1 A = [<...>]
2 product = 1
3 for x in A:
4     product *= x

While the Go example computes a mathematically incorrect product due to overflows, its running time can be analysed without imposing additional restrictions on the values.

2.2 Running Time

Given an algorithm implementation, we can immediately compute its running time on a particular input X by counting the units of time it takes. Assuming the algorithm is deterministic (i.e., always executing the same sequence of steps given a particular input), its running time T(X) can be defined as a function of the input. However, functions with such an abstract domain are hard to analyse: think of defining T(X) for a sorting algorithm on all possible arrays of integers.

Instead, the running time is studied as a function of the input size, denoted by n ∈ ℕ. Generally, the larger the input, the longer it takes the algorithm to run. Usually the input is measured as a number of items, e.g. array elements, or bits in a number. Other measures are also possible depending on the particular type of data. In any case, for an input X we denote its size by |X|.

T∗(n) denotes the running time on inputs of size n. However, it is not a proper mathematical function (hence the asterisk): different inputs of the same size can result in different running times.

Consider the algorithm for finding the minimum in an array of unsigned integers in Listing 2.1. The comments give the number of times each statement is executed, where n is the array length. Note that for the loop body these numbers vary depending on the position of the minimum, and also on whether the array contains 0.

1  func min(A []uint) uint { // assumes a non-empty array
2      n := len(A)              // (1)
3      x := ^uint(0)            // (1)
4      for i := 0; i < n; i++ { // (1); (n+1); (n)
5          if A[i] == 0 {           // [1, n]
6              return 0             // [0, 1]
7          } else if A[i] < x {     // [0, n]
8              x = A[i]             // [0, n]
9          }
10     }
11     return x                     // [0, 1]
12 }

Listing 2.1: Finding the minimum in an array of unsigned integers

We will still use T∗(n) when referring to the running time in general. However, in algorithm analysis we will usually focus on finding the worst-case running time, i.e. the longest running time for any input of size n:

    T_w(n) = max_{|X|=n} T(X)

The worst-case analysis gives an upper bound on the running time: it is guaranteed that the algorithm will never take any longer, regardless of the input. It is essentially a level playing field for algorithm efficiency: we assume that the data is equally 'bad' and focus on how algorithms handle that.

For the sake of completeness we can also define a lower bound, or the best-case running time:

    T_b(n) = min_{|X|=n} T(X)

However, the best-case analysis bears little practical significance: such cases are mostly trivial and do not represent any real usage of the algorithms.


For Listing 2.1 we can see that T_w(n) = 5n + 4 (if the array contains no zeros and is sorted in reverse order) and T_b(n) = 6 (if A[0] == 0).

Figure 2.1: Running time as a function of input size

2.3 Average and Expected

Although less common in practice, the average-case running time is defined as the expected value of the running time over all inputs of size n, as if they occur due to some probability distribution. The uniform input distribution is often assumed, in which all inputs of size n are equally likely:

    T_avg(n) = Σ_{|X|=n} T(X) · Pr[X]

There is also a class of randomised algorithms that can make random choices during their execution. The choice is normally made based on a random number generator with known probability properties. Therefore, while a randomised algorithm can have a variable running time on the same input, we can reliably compute its expected value.

We use the term expected running time to refer to the running time of randomised algorithms, taking the expectation over the internal random choices (rather than treating the input as a random variable). We are particularly interested in the worst-case expected running time:

    T_exp(n) = max_{|X|=n} E_r[T(X, r)]

where r denotes the internal random choices made by the algorithm.

2.4 Asymptotic Notation

In Section 2.2 we got a linear formula for the running time of finding the minimum of unsigned integers:

    T_1(n) = 5n + 4                                        (2.4.1)


Such simplicity does not always happen if you compute the running time precisely. For example, in [1] Knuth gives the following average-case running time for the insertion sort algorithm:

    T_2(n) = 2.25n² + 7.75n − 3H_n − 6,  where  H_n = Σ_{k=1}^{n} 1/k        (2.4.2)

Luckily, for our purposes such precision is not required:

• If n is large enough, the contributions of the lower-order terms are so insignificant that we can ignore them altogether. This leaves us with running times of 5n and 2.25n² respectively.

• We are interested in the order of growth, i.e. how the running time is affected by changing the input size n. This allows us to drop the constant factors too and just say that T_1(n) grows linearly, and T_2(n) grows quadratically.

We formalise these observations using asymptotic notation, which describes the behaviour of functions as their arguments tend to infinity.

Definition (big-O). We write f(n) = O(g(n)) (and say 'f is big-oh of g') if there exist positive integers N and K such that

    |f(n)| ≤ K·|g(n)|   ∀n ≥ N                             (2.4.3)

In other words, f(n) = O(g(n)) means that for sufficiently large n the values of f(n) lie below some constant multiple of g(n). It is said that g(n) is an asymptotic upper bound of f(n).

Proposition. Any polynomial of degree m is O(n^m):

    P_m(n) = a_0 + a_1·n + a_2·n² + ··· + a_m·n^m = O(n^m)

Proof. Let n ≥ 1. Then

    |P_m(n)| ≤ |a_0| + |a_1|·n + ··· + |a_m|·n^m
             = (|a_0|/n^m + |a_1|/n^{m−1} + ··· + |a_m|) · n^m
             ≤ (|a_0| + |a_1| + ··· + |a_m|) · n^m

So in (2.4.3) we can take K = |a_0| + |a_1| + ··· + |a_m| and N = 1.

We can now see that T_1(n) = O(n) and T_2(n) = O(n²). On the other hand, the definition does not stop us from saying that both T_1 and T_2 are also O(n³). To make a stronger statement about the order of growth we need a similar notation for the asymptotic lower bound:

Definition (big-omega, Ω). We write f(n) = Ω(g(n)) if there exist positive integers N and K such that

    |f(n)| ≥ K·|g(n)|   ∀n ≥ N                             (2.4.4)


Corollary. f(n) = Ω(g(n)) if and only if g(n) = O(f(n)).

Proposition. Any polynomial Σ_{i=0}^{m} a_i·n^i with a_m > 0 is Ω(n^m).

Proof. Let n ≥ 1. Then

    |P_m(n)| ≥ Σ_{i=0}^{m} a_i·n^i ≥ a_m·n^m − Σ_{i=0}^{m−1} |a_i|·n^i
             = (a_m − Σ_{i=0}^{m−1} |a_i|/n^{m−i}) · n^m              (2.4.5)

Now, because the |a_i| are finite, there exists N such that

    |a_i|/n^{m−i} ≤ a_m/2m,   ∀i ∈ [0, m−1], ∀n ≥ N

This puts an upper limit on the inner sum:

    Σ_{i=0}^{m−1} |a_i|/n^{m−i} ≤ Σ_{i=0}^{m−1} a_m/2m = a_m/2,   ∀n ≥ N

Putting the sum into (2.4.5), we get

    |P_m(n)| ≥ (a_m − a_m/2)·n^m = (a_m/2)·n^m,   ∀n ≥ N

This satisfies (2.4.4) if we take K = a_m/2.

Since the asymptotic lower and upper bounds of the polynomial P_m(n) are the same, we can combine them in a single notation:

Definition (big-theta, Θ). We write f(n) = Θ(g(n)) if

    f(n) = O(g(n))  ∧  f(n) = Ω(g(n))

or, alternatively, if there exist positive integers K_1, K_2, and N such that

    K_1·|g(n)| ≤ |f(n)| ≤ K_2·|g(n)|   ∀n ≥ N

In other words, f(n) = Θ(g(n)) says that the behaviour of f(n) can be captured with g(n) such that for all sufficiently large n the functions are equal up to a constant factor. It is said that g(n) is an asymptotically tight bound of f(n).

Corollary. A polynomial Σ_{i=0}^{m} a_i·n^i with a_m > 0 is Θ(n^m).

2.5 Asymptotic Properties

Usually it is easier to prove asymptotic relations using limits rather than the definitions from the previous section:


Proposition.

    lim_{n→∞} f(n)/g(n) ≠ ∞        ⇒  f(n) = O(g(n))
    lim_{n→∞} f(n)/g(n) ≠ 0        ⇒  f(n) = Ω(g(n))
    lim_{n→∞} f(n)/g(n) ∉ {0, ∞}   ⇒  f(n) = Θ(g(n))

Note, however, that these properties are one-sided. For example, n²·cos n = O(n²) even though the limit of their quotient does not exist (see Figure 2.2).

Figure 2.2: big-O holds even if the quotient limit does not exist

Big-O notation can be equivalently defined as a set of functions of n:

    O(g(n)) = { f(n) | ∃K, N : |f(n)| ≤ K·|g(n)| ∀n ≥ N }

In this case f(n) = O(g(n)) is just another form of f(n) ∈ O(g(n)). The same goes for Ω and Θ. We can define formulae with these symbols as sets as well:

    c · O(g(n)) = { c · f(n) | f(n) ∈ O(g(n)) },  c = const
    p(n) + O(g(n)) = { p(n) + f(n) | f(n) ∈ O(g(n)) }

Now, if α(n) and β(n) are formulae that include asymptotic notation, we can define their equality as set inclusion:

    α(n) = β(n)  ⟺  α(n) ⊆ β(n)

Consequently, we can use asymptotic notation in equations, replacing functions we do not want to specify explicitly. For example, one can see that

    2n² + 3n + 1 = 2n² + Θ(n) = Θ(n²)

One can prove that the following properties apply to asymptotic notation:

    f(n) = O(f(n))    c·O(f(n)) = O(f(n))    O(f(n))·O(g(n)) = O(f(n)·g(n))
    f(n) = Ω(f(n))    c·Ω(f(n)) = Ω(f(n))    Ω(f(n))·Ω(g(n)) = Ω(f(n)·g(n))
    f(n) = Θ(f(n))    c·Θ(f(n)) = Θ(f(n))    Θ(f(n))·Θ(g(n)) = Θ(f(n)·g(n))


Note that multiplication by a constant is ignored by the asymptotic notation. This justifies the assumption from Section 2.1, where we made all 'basic' statements take a single unit of time. Even if we accounted for different kinds of statements separately, it would boil down to an additional constant factor that would not change the asymptotic approximation.

Another consequence is that we can ignore the bases of logarithms:

    O(log_a n) = O(log_b n / log_b a) = O(const · log_b n) = O(log_b n)

Because of this, we will simply use log n in the asymptotic context to denote a logarithmic function of any base.

2.6 Asymptotic Analysis

Asymptotic notation provides such a good approximation that we rarely consider precise running time functions at all. Instead we directly find their asymptotic bounds.

We are mostly interested in tight bounds (in terms of Θ) on the running time functions (such as T_w(n) and T_avg(n)). Non-tight bounds (O and Ω) will also be used to describe the running time in general (T∗(n)), which is not a proper function and may not have a tight bound (see Figure 2.3).

N.B.: People often abuse big-O notation by assuming that it specifies a tight bound. In these notes we use big-O only to denote the upper bound.

Figure 2.3: Illustration of big-O and Ω for an algorithm’s running time

The bounds have to be simple enough so that we can compare them between different algorithms. The good news is that there are only so many functions that occur in basic algorithm analysis. Here they are, listed in increasing order of growth:

• Constant: Θ(1), e.g. the cost of operations independent of the input size.
• Logarithmic: Θ(log n), e.g. the number of splits when you take a half of the input items, then a half of that half, and so on.
• Linear: Θ(n), e.g. the cost of accessing each input item a constant number of times.
• Superlinear: Θ(n log n), arises in comparison sorts.
• Quadratic: Θ(n²), and, more generally, polynomial: Θ(n^a) with constant a > 2.
• Exponential: Θ(c^n) for a constant c > 1, e.g. the cost of enumerating all subsets of n items.
• Factorial: Θ(n!), e.g. the cost of generating all permutations of n items.

Table 2.1 converts these functions into approximate running times for various input sizes, under the assumption that a unit of time equals 1 ns. It shows that even coarse asymptotic analysis gives a good idea of an algorithm's practicality for a given input size.

  n     Θ(log n)   Θ(n)     Θ(n log n)   Θ(n²)     Θ(2^n)        Θ(n!)
  10    3 ns       10 ns    33 ns        100 ns    1 µs          3.6 ms
  20    4 ns       20 ns    86 ns        400 ns    1 ms          77 y
  30    5 ns       30 ns    147 ns       900 ns    1 s           8.4×10^15 y
  40    5 ns       40 ns    213 ns       1.6 µs    18.3 min
  50    6 ns       50 ns    282 ns       2.5 µs    13 d
  100   7 ns       100 ns   664 ns       10 µs     4×10^13 y
  10³   10 ns      1 µs     10 µs        1 ms
  10⁴   13 ns      10 µs    133 µs       100 ms
  10⁵   17 ns      100 µs   1.67 ms      10 s
  10⁶   20 ns      1 ms     20 ms        16.7 min
  10⁷   23 ns      10 ms    233 ms       1.16 d
  10⁸   27 ns      100 ms   2.66 s       116 d
  10⁹   30 ns      1 s      29.9 s       31.7 y

Table 2.1: Running time in seconds for common asymptotic bounds

2.7 Amortised Analysis

Consider a k-bit binary value; how many bits are flipped every time we increment it? Depending on the initial value, the answer may vary from 1 (0010₂ + 1 = 0011₂) to k (0111₂ + 1 = 1000₂).

Under worst-case analysis we would assume that each increment flips k bits. But this (even intuitively) is too pessimistic. For example, starting from 0111₂, the number of flipped bits on each increment would be 4, 1, 2, 1, 3, 1, ...

Consider instead the effective number of flipped bits per increment: count the total number of flipped bits after n increments and divide it by n.

Starting from k zeros, doing n = 2^k increments will take the bits through all possible states. The most significant bit will be flipped twice: from 0 to 1 and back to 0; the second most significant bit will have twice as many flips; and so on up to the least significant bit, which flips on each increment, i.e. 2^k times. The total number of flipped bits boils down to Σ_{i=1}^{k} 2^i = 2^{k+1} − 2, with (2^{k+1} − 2)/2^k ≈ 2 bits effectively flipped per increment, which is way better than the original worst-case estimate.

This technique is called amortised analysis and is used when sequentially repeating the same operation results in significantly varying per-operation performance, while the overall performance of the sequence is better than the worst-case estimate suggests.

The amortised running time of an algorithm is an average over a series of n executions of the algorithm for some large n. It should be distinguished from a guaranteed running time, which holds for any single execution.

The accounting method is another way to do amortised analysis. It assumes that each operation has some virtual amortised cost. Part of the cost goes to the actual work, while the rest is deposited within the data to be used later.

Assume that flipping a bit costs $1, and the increment costs $2. Out of the $2, one dollar is spent to flip a bit from 0 to 1 (on each increment there is always a single bit that becomes 1), and another dollar is deposited on this bit.

This makes a sustainable economy: consider incrementing 0111₂, which requires 4 bit flips. With $2 we can only flip the MSB and deposit a dollar onto it. The rest of the bits are flipped using the money deposited on them earlier.

Therefore, the effective cost of each increment equals 2 bit flips, which is the same result we got earlier. The key ingredient of the accounting method is to identify which parts of the data are responsible for the fluctuation of the operation cost. By putting 'savings' onto these parts, we even out the fluctuation so that it does not affect the amortised cost.
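The ≈ 2 flips per increment are easy to confirm empirically; a small Go sketch (the function name is illustrative) that simulates the counter and counts every flip:

// totalFlips counts bit flips over 2^k increments of a k-bit counter.
func totalFlips(k uint) int {
    bits := make([]int, k) // bits[0] is the least significant bit
    flips := 0
    for n := 0; n < 1<<k; n++ {
        // an increment turns the trailing 1s into 0s and the next 0 into a 1
        for i := range bits {
            flips++
            if bits[i] == 0 {
                bits[i] = 1
                break
            }
            bits[i] = 0
        }
    }
    return flips // e.g. totalFlips(10) -> 2046 flips for 1024 increments
}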


Chapter 3

Data Structures


3.1 Dynamic Arrays

In some programming languages (e.g. C and C++) it is possible to directly allocate a contiguous chunk of memory for any data type. We assume a chunk of size n can be allocated in constant Θ(1) time if we take the memory as-is (e.g. via malloc), or Θ(n) if we fill the memory with zeroes first (e.g. via calloc).

If we use this memory for an array, its size cannot be easily changed. Memory managers can only allocate a new chunk (which will not necessarily be adjacent to the old one) or deallocate the existing chunk as a whole.

A dynamic array¹ is a resizable array-based data structure that supports inserting and deleting elements. It consists of an underlying array A of length c (called the capacity), of which only the n ≤ c prefix elements can be accessed, see Figure 3.1. The n defines the size of the dynamic array.

¹ Also called a vector or an array list.

Figure 3.1: Underlying memory allocation of a dynamic array

Inserting at the end takes constant time if n < c: assign the element to A[n] and increment n. However, if n = c, then (1) a larger underlying array has to be allocated, (2) the existing elements have to be copied from the old array to the new one, and (3) the new element has to be inserted. In this case inserting an element takes Θ(n) time.

How much memory should the new array take? If we allocate c + 1 elements, another insertion will cause the costly expansion again. On the other hand, we do not want to allocate too much essentially unused memory in advance.

Proposition. In a dynamic array, if the underlying array is expanded by a constant factor α > 1, appending an element at the end takes Θ(1) amortised running time.

Proof. For α = 2 we could use the accounting method introduced in Section 2.7. Let us prove a more general case instead.

The amortised running time is an average over n sequential operations. We can start with an empty dynamic array and make n insertions at its end.

Assume that the empty dynamic array is initialised with a constant-size underlying array of length b. It is then expanded to bα, bα², and so on until it reaches bα^k = n. With bα^{i−1} elements being copied on the i-th expansion, the copies make the following contribution to the overall running time:

    T(n) = n + Σ_{i=1}^{k} b·α^{i−1}

Get rid of the summation using the formula for a geometric series, substituting α^k = n/b:

    T(n) = n + b·(α^k − 1)/(α − 1) = n + (n − b)/(α − 1) = n·(α − b/n)/(α − 1) ≤ n·α/(α − 1)

Consequently, inserting an element at the end of a dynamic array takes Θ(1) amortised running time, with the constant proportional to α/(α − 1).

Deleting from the end could take constant time: all we need to do is decrement n. However, in order to avoid wasting memory, the underlying array has to be contracted if n reaches a certain threshold. Taking the contraction into account, deletions from the end also take Θ(1) amortised running time.

It is important to have the threshold strictly less than c/α, see Figure 3.2. Otherwise a sequence of alternating insertions and deletions will cause repeated expansions and contractions, degrading the performance.

Figure 3.2: Different thresholds for expansion and contraction of dynamic array
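A minimal sketch of such a dynamic array in Go, with α = 2 and a contraction threshold of c/4 (the names are illustrative; real implementations differ in details):

// DynArray is a dynamic array of ints; the backing slice is used as raw storage.
type DynArray struct {
    a []int // underlying array; len(d.a) is the capacity c
    n int   // number of accessible prefix elements
}

func (d *DynArray) resize(c int) {
    b := make([]int, c)
    copy(b, d.a[:d.n])
    d.a = b
}

// Append inserts x at the end, expanding by α = 2 when the array is full.
func (d *DynArray) Append(x int) {
    if d.n == len(d.a) {
        d.resize(2*len(d.a) + 1) // +1 lets the empty array grow too
    }
    d.a[d.n] = x
    d.n++
}

// Pop deletes the last element (assumes d.n > 0), contracting when the size
// drops to a quarter of the capacity, i.e. strictly below c/α = c/2.
func (d *DynArray) Pop() int {
    d.n--
    x := d.a[d.n]
    if d.n <= len(d.a)/4 {
        d.resize(len(d.a) / 2)
    }
    return x
}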

Insertion and deletion in the middle can also be implemented, but they are not as efficient (see Figure 3.3). Insertion requires moving the 'tail' elements one place to the right to make space for the new element. Similarly, deletion makes a hole that has to be filled by moving the tail one place to the left. Consequently, the worst-case running time of these operations is Θ(n).

Figure 3.3: Insertion and deletion in the middle of dynamic array


The dynamic array is a standard data structure in most programming languages: it is used for std::vector in C++, list in Python, and slices in Go². Some implementations allow controlling the capacity to minimise expansions if the future array size is known (at least approximately) in advance.

The growth factor α is an implementation-specific parameter. While many textbooks discuss doubling the capacity on each expansion, most implementations use α < 2, e.g. 1.25 in Go's slices or 1.125 in Python's lists.

² Well, not exactly: e.g. Go does not contract the underlying array. Also, slices may not necessarily point to a prefix of the underlying array.


Bibliography

[1] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1998.
