Sublinear time algorithms Ronitt Rubinfeld Blavatnik School of Computer Science Tel Aviv University

Sublinear time algorithms

Ronitt Rubinfeld

Blavatnik School of Computer Science

Tel Aviv University

How can we understand?

Vast data

• Impossible to access all of it

• Potentially accessible data is too enormous to be viewed by a single individual

• Once accessed, data can change

Small world phenomenon

• each “node” is a person

• an “edge” connects two people who know each other

Small world property

• “connected” if every pair can reach each other

• “distance” between two people is the minimum number of edges to reach one from another

• “diameter” is the maximum distance between any pair
  – “6 degrees of separation” is equivalent to “diameter of the world population is 6”
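
The three definitions above can be made concrete with a small sketch. It assumes the graph is given as a dictionary mapping each person to the set of people they know; the function names `distances_from` and `diameter` are illustrative, not from the slides. Note that this computes the exact diameter by reading the whole graph, which is exactly what becomes infeasible at world scale.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: minimum number of edges from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(graph):
    """Maximum distance over all pairs, or None if the graph is not connected."""
    best = 0
    for node in graph:
        dist = distances_from(graph, node)
        if len(dist) < len(graph):
            return None  # some node is unreachable, so the graph is not connected
        best = max(best, max(dist.values()))
    return best

# A chain of four people: a knows b, b knows c, c knows d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(diameter(chain))
```

In this sketch, “6 degrees of separation” would correspond to `diameter(graph)` returning a value of at most 6.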

Does earth have the small world property?

• How can we know?
  – data collection problem is immense
  – unknown groups of people found on earth
  – births/deaths/new introductions

Data sets can be massive

• examples:
  – sales logs
  – scientific measurements
  – genome project
  – world-wide web
  – network traffic, clickstream patterns

• in many cases, hardly fit in storage

The Gold Standard

• linear time algorithms:
  – for inputs encoded by n bits/words, allow cn time steps (constant c)

• Inadequate?

What can we hope to do without viewing most of the data?

• Can’t answer “for all” or “exactly” type statements:
  – are all individuals connected by at most 6 degrees of separation?
  – exactly how many individuals on earth are left-handed?

• Change our goals: ask about “approximate” statements
  – is there a large group of individuals connected by at most 6 degrees of separation?
  – approximately how many individuals on earth are left-handed?

Statistics!

• Sampling to approximate average, median values
  – Polling to predict outcome of elections
  – Approximate fraction of left-handed people, male-female ratios

• Can we use these ideas in algorithmic settings?
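
As a rough illustration of the sampling idea, the sketch below estimates the fraction of a population satisfying some property from a random sample. The sample size comes from the standard Hoeffding bound, which guarantees error at most `epsilon` with probability at least `1 - delta`; the function names and parameters are assumptions made for this example. The point to notice is that the sample size does not depend on the population size.

```python
import math
import random

def sample_size(epsilon, delta):
    """Samples needed so the estimate is within epsilon with probability >= 1 - delta.

    From Hoeffding's inequality: k >= ln(2 / delta) / (2 * epsilon^2).
    """
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def estimate_fraction(population, predicate, epsilon, delta, rng):
    """Estimate the fraction of the population satisfying predicate by uniform sampling."""
    k = sample_size(epsilon, delta)
    hits = sum(1 for _ in range(k) if predicate(rng.choice(population)))
    return hits / k

# Hypothetical population: True marks a left-handed person.
rng = random.Random(0)
people = [rng.random() < 0.1 for _ in range(100_000)]
print(sample_size(0.05, 0.05))
print(estimate_fraction(people, lambda p: p, 0.05, 0.05, rng))
```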

An example

Monotonicity of a sequence

• Given: list y_1 y_2 ... y_n

• Question: is the list sorted?

• Clearly requires n steps – must look at each y_i

Monotonicity of a sequence

• Given: list y_1 y_2 ... y_n

• Question: can we quickly test if the list is close to sorted?

What do we mean by “quick”?

• query complexity measured in terms of list size n

• Our goal (if possible):
  – Very small compared to n, will go for c log n

What do we mean by “close”?

Definition: a list of size n is .99-close to sorted if deleting at most .01n values makes it sorted. Otherwise, it is .99-far.

Sorted: 1 2 4 5 7 11 14 19 20 21 23 38 39 45

Close: 1 4 2 5 7 11 14 19 20 39 23 21 38 45
  – sorted after deletions: 1 4 5 7 11 14 19 20 23 38 45

Far: 45 39 23 1 38 4 5 21 20 19 2 7 11 14
  – sorted after deletions: 1 4 5 7 11 14

Requirements for algorithm:

• pass sorted lists

• if list passes test, can change at most .01 fraction of list to make it sorted
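
To make the definition concrete: the minimum number of deletions needed to sort a list equals its length minus the length of its longest non-decreasing subsequence. The sketch below computes that quantity exactly for the three example lists. It is not a sublinear algorithm (it reads every element); it only illustrates what “close” and “far” measure. The function name `deletions_to_sort` is made up for this example.

```python
from bisect import bisect_right

def deletions_to_sort(values):
    """Minimum number of deletions that leaves a non-decreasing list.

    tails[k] holds the smallest possible last value of a non-decreasing
    subsequence of length k + 1 seen so far.
    """
    tails = []
    for v in values:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(values) - len(tails)

sorted_list = [1, 2, 4, 5, 7, 11, 14, 19, 20, 21, 23, 38, 39, 45]
close_list = [1, 4, 2, 5, 7, 11, 14, 19, 20, 39, 23, 21, 38, 45]
far_list = [45, 39, 23, 1, 38, 4, 5, 21, 20, 19, 2, 7, 11, 14]

for name, lst in [("sorted", sorted_list), ("close", close_list), ("far", far_list)]:
    print(name, deletions_to_sort(lst))
```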

An attempt:

• Proposed algorithm:
  – Pick random i and test that y_i ≤ y_{i+1}

• Bad input type:
  – 1,2,3,4,5,…,n/4, 1,2,…,n/4, 1,2,…,n/4, 1,2,…,n/4
  – Difficult for this algorithm to find “breakpoint”
  – But other algorithms work well

[Figure: plot of y_i against i]
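
A short sketch of why this first attempt struggles, assuming n is divisible by 4. The bad input has only 3 out-of-order adjacent pairs among n - 1, so a single random probe finds a breakpoint with probability 3/(n - 1), even though the list is very far from sorted. The helper names are illustrative.

```python
import random

def bad_input(n):
    """Four copies of 1..n/4 placed back to back (n assumed divisible by 4)."""
    block = list(range(1, n // 4 + 1))
    return block * 4

def count_breakpoints(values):
    """Number of adjacent positions i with values[i] > values[i + 1]."""
    return sum(1 for i in range(len(values) - 1) if values[i] > values[i + 1])

def adjacent_pair_test(values, trials, rng):
    """Return False as soon as a sampled adjacent pair is out of order."""
    for _ in range(trials):
        i = rng.randrange(len(values) - 1)
        if values[i] > values[i + 1]:
            return False
    return True

values = bad_input(1000)
print(count_breakpoints(values), "breakpoints among", len(values) - 1, "adjacent pairs")
```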

A second attempt:

• Proposed algorithm:
  – Pick random i<j and test that y_i ≤ y_j

• Bad input type:
  – n/m groups of m elements: m,m-1,m-2,…,1, 2m,2m-1,2m-2,…,m+1, 3m,3m-1,3m-2,…,
  – must pick i,j in same group
  – need at least (n/m)^{1/2} choices to do this

[Figure: plot of y_i against i]
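
The structure of this bad input can be checked by brute force on a small instance. The sketch below builds the grouped list (assuming m divides n) and lists every out-of-order pair; all of them fall inside a single group, which is why a random pair i<j rarely exposes a violation when groups are small. Function names are hypothetical.

```python
from itertools import combinations

def grouped_descending(n, m):
    """n/m groups of m elements: each group descending, groups in ascending order."""
    values = []
    for start in range(0, n, m):
        values.extend(range(start + m, start, -1))
    return values

def violating_pairs(values):
    """All index pairs i < j with values[i] > values[j] (quadratic, for small demos only)."""
    return [(i, j) for i, j in combinations(range(len(values)), 2)
            if values[i] > values[j]]

values = grouped_descending(12, 3)
pairs = violating_pairs(values)
print(values)
print(len(pairs), "violating pairs, all within a group:",
      all(i // 3 == j // 3 for i, j in pairs))
```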

A test that works

• The test: (for distinct y_i)
  – Test several times:
    • Pick random i
    • Look at value of y_i
    • Do binary search for y_i
    • Does the binary search find any inconsistencies? If yes, FAIL
    • Do we end up at location i? If not, FAIL
  – Pass if never failed

• Running time: O(log n)

• Why does this work?
  – If list is in order, then test always passes
  – If the test would pass when i and j are chosen, then y_i and y_j are in correct order
  – Since the test usually passes, most y_i’s are in the right order
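
The test can be sketched as follows, assuming distinct values and 0-based indexing. This simplified version folds the “inconsistency” check into a single condition: the binary search for y_i must end exactly at position i. The names `binary_search_lands_at` and `looks_sorted`, and the choice of the number of trials, are assumptions for illustration.

```python
import random

def binary_search_lands_at(values, i):
    """Binary-search for values[i] as if the list were sorted; True only if we arrive at i."""
    target = values[i]
    lo, hi = 0, len(values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if values[mid] == target:
            return mid == i
        if values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False  # the search walked away from position i

def looks_sorted(values, trials, rng):
    """Repeat the binary-search check at random positions; O(log n) queries per trial."""
    for _ in range(trials):
        i = rng.randrange(len(values))
        if not binary_search_lands_at(values, i):
            return False
    return True

rng = random.Random(0)
print(looks_sorted(list(range(1000)), 20, rng))
```

If the searches for y_i and y_j both succeed with i<j, then at the pivot where the two search paths split, y_i is at or below the pivot value and y_j is at or above it, so the two values are in the correct order. That is the second “why does this work” point in code form.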

Other properties

• Graph properties: e.g. small world property, other connectivity properties

• String properties

• Function properties

• Lots more!

Conclusions

• Sublinear time possible in many contexts
  – Relatively new area, lots of techniques
  – What else can you compute in sublinear time?
  – Other applications...?