CS Fundamentals: Scalability and Memory

SCALABILITY AND MEMORY: CS FUNDAMENTALS SERIES

http://bit.ly/1TPJCe6

HOW DO YOU MEASURE AN ALGORITHM?

???

CLOCK TIME?

DEPENDS ON WHO’S COUNTING. ALSO, TOO FLAKY EVEN ON THE SAME MACHINE.

THE NUMBER OF LINES?

THIS IS TWO LINES, BUT A WHOLE LOT OF STUPID.

THE NUMBER OF CPU CYCLES?

DEPENDS ON THE RUNTIME.

ALL THESE METHODS SUCK.

NONE OF THEM CAPTURE WHAT WE ACTUALLY CARE ABOUT.

ENTER BIG O!

ASYMPTOTIC ANALYSIS

▸ Big O is about asymptotic analysis

▸ In other words, it’s about how an algorithm scales when the numbers get huge

▸ You can also describe this as “the rate of growth”

▸ How fast do the numbers become unmanageable?

ASYMPTOTIC ANALYSIS

▸ Another way to think about this is:

▸ What happens when your input size is 10,000,000? Will your program be able to finish?

▸ It’s about scalability, not necessarily speed

PRINCIPLES OF BIG O

▸ Big O is a kind of mathematical notation

▸ In computer science, it essentially means “the asymptotic rate of growth”

▸ In other words, how does the running time of this function scale with the input size when the numbers get big?

▸ Big O notation looks like this:

O(n)   O(n log n)   O(n²)

PRINCIPLES OF BIG O

▸ n here refers to the input size

▸ Can be the size of an array, the length of a string, the number of bits in a number, etc.

▸ O(n) means the algorithm scales linearly with the input

▸ Think like a line (y = x)

PRINCIPLES OF BIG O

▸ “Scaling linearly” can mean 1:1 (one iteration per extra input), but it doesn’t necessarily

▸ It can simply mean k:1 where k is a constant, like 3:1 or 5:1 (i.e., a constant amount of time per extra input)
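A quick sketch of this (the function and variable names here are my own, not from the deck): this loop does roughly three operations per element, so the total work is about 3n, which is still O(n).

```javascript
// ~3 operations per element => ~3n steps total => still O(n).
function stats(arr) {
  let sum = 0;
  let count = 0;
  let max = -Infinity;
  for (const x of arr) {
    sum += x;             // op 1
    count += 1;           // op 2
    if (x > max) max = x; // op 3
  }
  return { sum, count, max };
}
```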

PRINCIPLES OF BIG O

▸ In Big O, we strip out any coefficients or smaller factors.

▸ The fastest-growing factor wins. This is also known as the dominant factor.

▸ Just think: when the numbers get huge, what dwarfs everything else?

▸ O(5n) => O(n)

▸ O(½n - 10) also => O(n)

PRINCIPLES OF BIG O

▸ O(k) where k is any constant reduces to O(1).

▸ O(200) = O(1)

▸ Where there are multiple factors of growth, the most dominant one wins.

▸ O(n⁴ + n² + 40n) = O(n⁴)
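To see the dominant factor in code (a hypothetical example, names mine): the nested loop below does n² work and the single loop does n work, so the total n² + n reduces to O(n²).

```javascript
// O(n^2) nested loop + O(n) loop => n^2 + n total => O(n^2).
function pairCountPlusSum(arr) {
  let pairs = 0;
  for (let i = 0; i < arr.length; i++) {     // O(n^2): visits every pair (i, j)
    for (let j = i + 1; j < arr.length; j++) {
      pairs += 1;
    }
  }
  let sum = 0;
  for (const x of arr) sum += x;             // O(n): dwarfed by the loop above
  return { pairs, sum };
}
```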

PRINCIPLES OF BIG O

▸ If there are two inputs (say you’re trying to find all the common substrings of two strings), then you use two variables in your Big O notation => O(n * m)

▸ Doesn’t matter if one variable probably dwarfs the other. You always include both.

▸ O(n + m) => this is considered linear

▸ O(2ⁿ + log(m)) => this is considered exponential
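A minimal two-input sketch (a simpler stand-in for the common-substrings example, with names of my own): with n = a.length and m = b.length, the nested loop runs n * m times, so it’s O(n * m).

```javascript
// O(n * m): compare every character of `a` against every character of `b`.
function countMatchingPairs(a, b) {
  let matches = 0;
  for (const c1 of a) {     // n iterations
    for (const c2 of b) {   // m iterations for each
      if (c1 === c2) matches += 1;
    }
  }
  return matches;
}
```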

COMPREHENSION TEST

Convert each of these to their appropriate Big O form!

▸ O(3n + 5)

▸ O(n + ⅕n²)

▸ O(log(n) + 5000)

▸ O(2m³ + 50 + ½n)

▸ O(n log(m) + 2m² + nm)

▸ What should n be for this function?

Let’s break it down.

Make an empty array. For each character in the string, unshift it into the array, and then join the array together.

▸ Initialize an empty array => O(1)

▸ Then, split the string into an array of characters => O(n)

▸ Then for each character => O(n)

▸ Unshift into an array => O(n) (we’ll see later why this is)

▸ Then join the characters into a string => O(n)

The loop and the unshift multiply. => O(n²)

▸ O(n² + 2n) = O(n²)

▸ This algorithm is quadratic.

▸ Let’s see how badly it sucks.
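The reversal described above might look like this in JavaScript (a sketch; the deck’s actual showSlowReverse.js isn’t shown here):

```javascript
// Quadratic string reversal: unshift is O(n) because every existing
// element must slide over, and we unshift once per character.
function slowReverse(str) {
  const result = [];                 // O(1)
  for (const ch of str.split('')) { // O(n) iterations
    result.unshift(ch);             // O(n) each time => O(n^2) overall
  }
  return result.join('');           // O(n)
}
```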

Benchmark away!

(showSlowReverse.js)

TIME COMPLEXITIES WAY TOO FAST

▸ Constant O(1): math, pop, push, arr[i], property access, conditionals, initializing a variable

▸ Logarithmic O(log n): binary search

▸ Linear O(n): linear search, iteration

▸ Linearithmic O(n log n): sorting (merge sort, quick sort)

▸ Quadratic O(n²): nested looping, bubble sort

▸ Cubic O(n³): triply nested looping, matrix multiplication

▸ Polynomial O(nᵏ): all “efficient” algorithms

▸ Exponential O(2ⁿ): subsets, solving chess

▸ Factorial O(n!): permutations
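As one example from the table: binary search is O(log n) because each comparison throws away half of the remaining search space (a sketch; assumes a sorted array):

```javascript
// O(log n): halve the search range [lo, hi] until the target is found.
function binarySearch(arr, target) {
  let lo = 0;
  let hi = arr.length - 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (arr[mid] === target) return mid;
    if (arr[mid] < target) lo = mid + 1; // discard the left half
    else hi = mid - 1;                   // discard the right half
  }
  return -1; // not found
}
```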

TIME TO IDENTIFY TIME COMPLEXITIES

OPTIMIZATIONS DON’T ALWAYS MATTER

BOTTLENECKS

▸ A bottleneck is the part of your code where your algorithm spends most of its time.

▸ Asymptotically, it’s wherever the dominant factor is.

▸ If your algorithm has an O(n) part and an O(50) part, the bottleneck is the O(n) part.

▸ As n => ∞, your algorithm will eventually spend 99%+ of its time in the bottleneck.

BOTTLENECKS

▸ When trying to optimize or speed up an algorithm, focus on the bottleneck.

▸ Optimizing code outside the bottleneck will have a minuscule effect.

▸ Bottleneck optimizations, on the other hand, can easily be huge!

BOTTLENECKS

▸ If you cut down non-bottleneck code, you might be able to save .01% of your runtime.

▸ If you cut down on bottleneck code, you might be able to save 30% of your runtime.

▸ Better yet, try to lower the time complexity altogether if you can!
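A hedged illustration (function names are mine): in the first version the nested loop is the bottleneck, so rewriting that one part with a Set lowers the whole algorithm from O(n²) to O(n). Micro-optimizing anything else would barely register.

```javascript
// Bottleneck version: the nested loop is O(n^2); everything else is O(1).
function hasDuplicateSlow(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) return true;
    }
  }
  return false;
}

// Lowering the bottleneck's complexity: one O(n) pass using a Set.
function hasDuplicateFast(arr) {
  const seen = new Set();
  for (const x of arr) {
    if (seen.has(x)) return true;
    seen.add(x);
  }
  return false;
}
```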

BOTTLENECK EXERCISE

SPACE COMPLEXITY

▸ Same thing, except now with memory instead of time.

▸ Do you take linear extra space relative to the input?

▸ Do you allocate new arrays? Do you have to make a copy of the original input? Are you creating nested data structures?
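Two small sketches (hypothetical functions, not from the deck) contrasting space usage:

```javascript
// O(1) extra space: a single accumulator, no matter how big arr is.
function maxOf(arr) {
  let best = -Infinity;
  for (const x of arr) {
    if (x > best) best = x;
  }
  return best;
}

// O(n) extra space: allocates a brand-new array as large as the input.
function doubled(arr) {
  return arr.map(x => x * 2);
}
```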

COMPREHENSION CHECK

▸ What is the space complexity of:

▸ max(arr)

▸ firstFive(arr)

▸ substrings(str)

▸ hasVowel(str)

SO WHAT THE HELL IS MEMORY ANYWAY

TO UNDERSTAND MEMORY, WE NEED TO UNDERSTAND HOW A COMPUTER IS STRUCTURED.

Data Layers

▸ Registers: immediate workspace. A CPU usually has 16 of these. 1 cycle.

▸ L1 cache: a nearby reservoir of useful data we’ve recently read. Close by. ~4 cycles.

▸ L2 cache: more nearby data, but a little farther away. ~10 cycles.

▸ RAM: ~800 cycles. Getting pretty far now. It’s completely random-access, but takes a while.

▸ Disk: on an SSD, you’re looking at ~5,000 cycles. This is pretty much another country. And on a spindle drive, it’s more like 50,000.

SO ALL DATA TAKES A JOURNEY UP FROM THE HARD DISK TO EVENTUALLY LIVE IN A REGISTER.

WHAT DOES MEMORY ACTUALLY LOOK LIKE?

IT’S JUST A BUNCH OF CELLS WITH SHIT IN ‘EM.

IT’S ALL BINARY DATA.

STRINGS, FLOATS, OBJECTS, THEY’RE ALL STORED AS BINARY.

AND IT’S ALL STORED CONTIGUOUSLY.

THIS IS VERY IMPORTANT WHEN IT COMES TO ARRAYS.

ARRAYS ARE JUST CONTIGUOUS BLOCKS OF MEMORY.

THAT’S WHY THEY’RE SO FAST.

Assume each of these cells is 8 bytes (64 bits).

Let’s imagine they’re addressed like so:

832968 833032 833096 833160 833224 833288 833352 833416 833480 833544

Each cell is offset by exactly 64 in the address space, meaning you can easily derive the address of any index.

this.startAddr = 833096;

function get(i) {
  return this.startAddr + i * 64;
}

get(3) = 833096 + 3 * 64 = 833096 + 192 = 833288

THIS IS POINTER ARITHMETIC.

THIS IS WHAT MAKES ARRAY LOOKUPS O(1)

AND IT’S WHY ARRAYS ARE BY FAR THE FASTEST DATA STRUCTURE.

LET’S WRAP UP BY TALKING ABOUT CACHE EFFICIENCY.

CACHES ARE DUMB.

When the CPU needs data, it first looks in the cache.

Say it’s not in the cache. This is called a cache miss.

The cache then loads the data the CPU requested from RAM…

But the cache guesses that if the CPU wanted this data, it will probably also want other nearby data eventually. It would be stupid to have to make multiple round trips.

In other words, the cache assumes that related data will be stored around the same physical area. The cache assumes locality of data.

So the cache just loads a huge contiguous chunk of data around the address the CPU asked for.

OK. SO?

Remember this? Loading from memory is slow as shit.

We really want to minimize cache misses.

SO KEEP YOUR DATA LOCAL AND YOUR DATA STRUCTURES CONTIGUOUS.

ARRAYS ARE KING, BECAUSE ALL OF THE DATA IS LITERALLY RIGHT NEXT TO EACH OTHER IN MEMORY!

An algorithm that jumps around in memory or follows a bunch of pointers to other objects will trigger lots of cache misses!

Think linked lists, trees, even hash maps.

IDEALLY, YOU WANT TO WORK LOCALLY WITHIN ARRAYS OF CONTIGUOUS DATA.
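A classic way to see this (a sketch, with a flat array standing in for a 2D grid; names are mine): both functions below do identical arithmetic and return the same sum, but the row-order walk touches consecutive addresses while the column-order walk strides through memory by `size` each step, triggering far more cache misses on large grids.

```javascript
// Row-order: indices increase by 1 => contiguous, cache-friendly.
function sumRowOrder(grid, size) {
  let sum = 0;
  for (let row = 0; row < size; row++) {
    for (let col = 0; col < size; col++) {
      sum += grid[row * size + col]; // consecutive addresses
    }
  }
  return sum;
}

// Column-order: indices jump by `size` each step => strided, cache-hostile.
function sumColOrder(grid, size) {
  let sum = 0;
  for (let col = 0; col < size; col++) {
    for (let row = 0; row < size; row++) {
      sum += grid[row * size + col]; // strided addresses
    }
  }
  return sum;
}
```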

LET’S DO A QUICK EXERCISE.

QUESTIONS?

I AM HASEEB QURESHI

You can find me on Twitter: @hosseeb

You can read my blog at: haseebq.com

PLEASE DONATE IF YOU GOT SOMETHING OUT OF THIS

<3

Ranked by GiveWell as the most efficient charity in the world!