Intensive Study Course
04/20/23 2/74
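Course Information
Course number: 21190120
Class time and place: Summer term 2012; Tuesdays (periods 9-10), Cao Guangbiao Building West 101; Thursdays (periods 3-4), Cao Guangbiao Building West 101
Lab: Sundays (periods 11-12), Software College computer lab
Exam date: June 11 (hard deadline)
Exam format: submit a course paper (survey)
Class hours / credits: 4-2 hours per week / 2-1 credits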
Office & Homepage
Office: Gongshang Building (工商楼) 215
My Homepage :http://www.cs.zju.edu.cn/people/yedeshi/
Course Home:
Examination
Grading Policies:
1) Homework: 15%
2) Quiz: 10% (5th week, May 15)
3) Presentation + poster session: 20% (submit slides to the TA)
   Videos may be taken
   Poster: during lab time
4) Programming project: 20% (2-3 programs)
5) Final survey: 35%
TA information
金淑贤: roles
Guide the grouping of students
Collect the homework, programs, and surveys
Review the homework and programs
Collect the scores of each presentation/poster
What is the position of algorithms in CS?
1. Linguists: How shall we talk to the machines?
2. Algorithms: What is a good method for solving a problem fast on my computer?
3. Architects: Can I build a better computer?
4. Sculptors of Machine Intelligence: Can I write a computer program that can find its own solution?
Algorithms in Computer Science
Algorithms
Hardware
Compilers, Programming languages
Machine learning, Statistics, Information retrieval, AI
Networking, Distributed systems, Fault tolerance, Security
Bioinformatics .......
MIT Undergraduate Programs
What is an algorithm?
(Oxford Dict.) Algorithm: A set of rules that must be followed when solving a particular problem.
From MathWorld: A specific set of instructions for carrying out a procedure or solving a problem, usually with the requirement that the procedure terminate at some point.
An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output.
What will CS be?
Computer Science: Past, Present, and Future
Ed Lazowska (Washington)
Computer Science is the new Math
Christos H. Papadimitriou (Berkeley)
Algorithm
Problem definition
Objective (very important)
Evaluation of the algorithm
Methods
Algorithm evaluation
Quality: how far is the solution from the optimal solution?
Cost: running time, space needed
Our goal is to design algorithms with high quality at low cost.
Reasonable times
Poly(|I|): time polynomial in |I|, where |I| is the size of the problem instance.
Input size: size(x) of an instance x with rational data is the total number of bits needed for the binary representation.
Time complexity
logarithmic time if T(n) = O(log n)
sub-linear time if T(n) = o(n)
linear time, or O(n) time
linearithmic time if T(n) = O(n log n); quasilinear time if T(n) = O(n log^k n)
polynomial time if T(n) = O(n^k) for some constant k
strongly polynomial time:
  the number of operations in the arithmetic model of computation is bounded by a polynomial in the number of integers in the input instance; and
  the space used by the algorithm is bounded by a polynomial in the size of the input.
weakly polynomial time: polynomial time but not strongly polynomial
Time complexity
Quasi-polynomial time if T(n) = 2^O((log n)^c) for some fixed c.
Sub-exponential time if T(n) = 2^o(n)
Exponential time if T(n) is upper bounded by 2^poly(n)
Hardness of problems (ordered from easy to hard)
Polynomial (e.g.: n^2, n log n, n^3, n^1000).
Quasi-polynomial (e.g.: n^(log n), n^(log^2 n), c^(log^7 n)).
Sub-exponential (e.g.: 2^√n, 5^(n^0.98)).
Exponential (e.g.: 2^n, 8^n, n!, n^n).
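The ordering of these classes is asymptotic, which a small calculation can make concrete. The sketch below is only an illustration: it picks one representative per class (n^2, n^(log n), 2^√n, 2^n), assumes base-2 logarithms, and compares the exponents log2 T(n) instead of the huge values themselves. The function name `log2_growth` is made up for this example.

```python
import math


def log2_growth(n: int) -> dict[str, float]:
    """Return log2 of T(n) for one representative function per class.

    Comparing exponents avoids building astronomically large numbers.
    Logarithms are taken base 2 (an assumption of this sketch).
    """
    lg = math.log2(n)
    return {
        "polynomial n^2": 2 * lg,               # log2(n^2) = 2 log n
        "quasi-polynomial n^(log n)": lg * lg,  # log2(n^(log n)) = (log n)^2
        "sub-exponential 2^sqrt(n)": math.sqrt(n),
        "exponential 2^n": float(n),
    }


if __name__ == "__main__":
    for n in (2**10, 2**20, 2**30):
        print(n, log2_growth(n))
```

For small inputs the order can differ: at n = 2^10 the quasi-polynomial representative is still larger than the sub-exponential one, and the listed order only takes over for larger n.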
Running time
Computer A is 100 times faster than computer B
Sort n numbers:
Computer A requires 2n^2 instructions
Computer B requires 50 n lg n instructions
n = 1,000,000:
Computer A: 2(10^6)^2 / 10^9 = 2000 seconds
Computer B: 50 * 10^6 lg 10^6 / 10^7 ~ 100 seconds
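The arithmetic on this slide can be checked with a few lines. This is a minimal sketch: the speeds of 10^9 and 10^7 instructions per second are read off the denominators above, and the function names are made up for illustration.

```python
import math


def seconds_a(n: int, speed: float = 1e9) -> float:
    """Computer A: 2 n^2 instructions at `speed` instructions per second."""
    return 2 * n**2 / speed


def seconds_b(n: int, speed: float = 1e7) -> float:
    """Computer B: 50 n lg n instructions at `speed` instructions per second."""
    return 50 * n * math.log2(n) / speed


if __name__ == "__main__":
    n = 1_000_000
    print(f"A: {seconds_a(n):.0f} s, B: {seconds_b(n):.0f} s")
```

Although B is 100 times slower, its n lg n algorithm finishes about 20 times sooner than A's quadratic one at this input size.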
Running time
Input size   n       n log n   n^2       n^3            2^n         n!
10           < 1 s   < 1 s     < 1 s     < 1 s          < 1 s       4 s
100          < 1 s   < 1 s     < 1 s     1 s            18 min      10^25 years
1,000        < 1 s   < 1 s     1 s       18 min         Very long   Very long
10,000       < 1 s   < 1 s     2 min     12 days        Very long   Very long
10^6         1 s     20 s      12 days   31,710 years   Very long   Very long
Sorting
Input: A sequence of n numbers <a1, a2, ..., an>
Output: A permutation <a'1, a'2, ..., a'n> such that a'1 <= a'2 <= ... <= a'n
Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
EX. of insertion sort
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9 done
Insertion sort
INSERTION-SORT (A, n)    ⊳ A[1 . . n]
    for j ← 2 to n
        do key ← A[j]
           i ← j - 1
           while i > 0 and A[i] > key
               do A[i+1] ← A[i]
                  i ← i - 1
           A[i+1] ← key
"pseudocode"
[Figure: array A[1 . . n] with indices i and j; the prefix to the left of j is sorted and key holds A[j]]
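The pseudocode translates almost line by line into a runnable form. The following is a sketch in Python using 0-based indexing (the slide uses A[1 . . n]); returning a sorted copy instead of sorting in place is a choice made here for easier testing.

```python
def insertion_sort(a: list) -> list:
    """Return a sorted copy of `a` (0-based version of INSERTION-SORT)."""
    a = list(a)  # work on a copy so the caller's list is untouched
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift every element of the sorted prefix that is larger than key.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a


if __name__ == "__main__":
    print(insertion_sort([8, 2, 4, 9, 3, 6]))
```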
Analyzing algorithms
Need a computational model
Random-access machine (RAM) model
Instructions are executed one after another. No concurrent operations.
Arithmetic: add, subtract, multiply, divide, remainder, floor, ceiling
Data movement: load, store, copy
Control: conditional/unconditional branch, subroutine call and return.
Each of these instructions takes a constant amount of time.
Running time
Running time: The running time of an algorithm on a particular input is the number of primitive operations or "steps" executed. Each line of pseudocode consists only of primitive operations and takes constant time.
Input size: the number of items, or the total number of bits.
Sometimes more than one number is needed, e.g. for a graph: the number of vertices and the number of edges.
Example: The input size of the sorting problem is n. The worst-case running time of insertion sort is O(n^2).
Running time
The running time depends on the input: an already sorted sequence is easier to sort.
Parameterize the running time by the size of the input, since short sequences are easier to sort than long ones.
Generally, we seek upper bounds on the running time, because everybody likes a guarantee.
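To see how strongly the running time depends on the input, one can count the element shifts insertion sort performs. The sketch below instruments a 0-based insertion sort; using the shift count as a stand-in for running time is an assumption of this example, and the function name is made up.

```python
def insertion_sort_shifts(a: list) -> int:
    """Return the number of element shifts insertion sort performs on `a`."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key
    return shifts


if __name__ == "__main__":
    for n in (10, 100, 1000):
        already_sorted = list(range(n))
        reversed_input = list(range(n, 0, -1))
        print(n, insertion_sort_shifts(already_sorted),
              insertion_sort_shifts(reversed_input))
```

An already sorted input needs no shifts at all, while a reversed input of size n needs n(n-1)/2, which is the quadratic upper bound.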
Map of Algorithm Design
[Diagram. Labels: Off-line problem; On-line problem; New problem; Polynomial; NP-C problem; Exact algorithm; Heuristic; Approximate algorithm; Improve cost (running time); Quality (approx. ratio)]
Course content
1. Mathematical foundations
   1.1 Algorithm basics
   1.2 Sums and set operations (Sets)
   1.3 Special numbers (Stirling numbers, Harmonic numbers, Eulerian numbers et al.)
2. Basic algorithms
   2.1 Divide-and-Conquer *
       2.1.1 Mergesort *
       2.1.2 Multiplication of natural numbers *
       2.1.3 Matrix multiplication
       2.1.4 Discrete Fourier transform and Fast Fourier transform
   2.2 Dynamic Programming
       2.2.1 Knapsack problem
       2.2.2 Longest increasing subsequence
       2.2.3 Sequence alignment
       2.2.4 Longest common subsequence
       2.2.5 Matrix-chain multiplication
       2.2.6 Max independent set in a tree
   2.3 Greedy algorithms
       2.3.1 Interval scheduling
       2.3.2 Set cover
       2.3.3 Matroids
   2.4 NP-completeness
       2.4.1 The classes P and NP
       2.4.2 NP-completeness and reducibility
       2.4.3 NP-complete problems *
   2.5 Approximation algorithms
       2.5.1 Vertex cover
       2.5.2 Load balancing
       2.5.3 Traveling salesman problem
       2.5.4 Subset sum problem
3. Applications of algorithms
   3.1 Local Search
       3.1.1 The Metropolis Algorithm and Simulated Annealing
       3.1.2 Local Search to Hopfield Neural Networks (Nash Equilibria)
       3.1.3 Maximum Cut Approximation via Local Search
   3.2 Graph Theory *
       3.2.1 Fundamentals of graph theory
       3.2.2 Linear Programming
             Network flow, matching in bipartite graphs and complete graphs
   3.3 Computational Geometry *
       3.3.1 Basic concepts and properties of line segments
       3.3.2 Segment intersection
       3.3.3 Convex hull
       3.3.4 The closest pair of points
       3.3.5 Polygon triangulation
   3.4 Randomized Algorithms
       3.4.1 Random variables and expectation
       3.4.2 A Randomized MAX-3-SAT
       3.4.3 Randomized Divide-and-Conquer
   3.5 Online Algorithms
       3.5.1 Online Skiing
       3.5.2 Online Hiring
* : optional content
Textbook: Introduction to Algorithms, Second Edition. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. The MIT Press, 2001. ISBN: 0262032937.
Recommended: Algorithm Design. Jon Kleinberg, Éva Tardos. Addison Wesley, 2005. ISBN: 0-321-29535-8.
(Jon Kleinberg: Rolf Nevanlinna Prize, 2006)
Reference books
Algorithms. S. Dasgupta, C.H. Papadimitriou, and U. V. Vazirani. May 2006.
Combinatorial Algorithms. Jeff Erickson. University of Illinois, Urbana-Champaign. Lecture Notes. Fall 2002.
Concrete Mathematics. Ronald L. Graham, Donald E. Knuth, Oren Patashnik. Addison-Wesley Publishing Company, 2005. ISBN: 0-201-14236-8.
Algorithms in Computer Science
P = NP?
Can we solve a problem efficiently?
Tradeoff between the quality of the solution and the running time:
Solve a problem optimally, though it might take a long time
Solve a problem approximately in a short time
$1,000,000 problem
P = NP? http://www.claymath.org/millennium/
Solved???!!!!
Algorithms in Computer Science
Selfish routing
Privacy preservation in databases
TSP
Ad auction
Perspective
We can find algorithms everywhere
They have been developed to ease our daily life
Train/airplane timetable scheduling
Routing
We live in the age of information: text, numbers, images, video, audio
Selfish routing
Pigou's Example
Suburb s, a nearby train station t.
Assume that all drivers aim to minimize the driving time from s to t.
[Figure: two roads from s to t; the upper road has cost C(x) = 1, the lower road has cost C(x) = x, with x in [0, 1]]
Selfish routing
We have good reason to expect all traffic to follow the lower road.
Socially optimal? ½ to the long, wide highway, ½ to the lower road.
Selfish behavior need not produce a socially optimal outcome.
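The gap between the selfish and the socially optimal outcome in Pigou's example can be worked out numerically. This is a small sketch assuming one unit of traffic, a fraction x of which takes the lower road with cost C(x) = x while the rest takes the highway with cost 1; the function name is made up.

```python
def average_cost(x: float) -> float:
    """Average travel time when a fraction x of the traffic uses the lower road.

    Drivers on the lower road each pay x; drivers on the highway each pay 1.
    """
    return x * x + (1 - x) * 1


if __name__ == "__main__":
    selfish = average_cost(1.0)   # everyone takes the lower road
    optimal = average_cost(0.5)   # half the traffic on each road
    print(selfish, optimal, selfish / optimal)
```

With everyone on the lower road the average cost is 1, while the even split gives 3/4, so the selfish outcome is worse by a factor of 4/3.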
Braess's Paradox
[Figure: network with nodes s, v, w, t. Edge costs: C(x) = x and C(x) = 1 along one route from s to t, C(x) = 1 and C(x) = x along the other.]
[Figure: the same network with an added link of cost C(x) = 0.]
Paradox thus shows that the intuitively helpful action of adding a new zero-cost link can negatively impact all of the traffic!
With selfish routing, network improvements can degrade network performance.
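A short calculation illustrates the paradox. The sketch assumes the standard layout of Braess's network, which the slide's figure does not fully specify in text: edges s→v and w→t cost C(x) = x, edges v→t and s→w cost 1, and the new zero-cost link runs from v to w. One unit of traffic is routed; all names are hypothetical.

```python
def path_costs(f_svt: float, f_swt: float, f_svwt: float) -> dict[str, float]:
    """Per-driver cost of each route, given the traffic fraction on each route.

    Assumed layout: s->v and w->t cost C(x) = x, v->t and s->w cost 1,
    and the added link v->w costs 0.
    """
    load_sv = f_svt + f_svwt  # traffic on edge s->v
    load_wt = f_swt + f_svwt  # traffic on edge w->t
    return {
        "s-v-t": load_sv + 1,
        "s-w-t": 1 + load_wt,
        "s-v-w-t": load_sv + 0 + load_wt,
    }


if __name__ == "__main__":
    # Without the link, traffic splits evenly and everyone pays 1.5.
    print(path_costs(0.5, 0.5, 0.0))
    # With the link, everyone switches to s-v-w-t and pays 2.
    print(path_costs(0.0, 0.0, 1.0))
```

Starting from the even split, the new route would cost only 1, so each driver is tempted to switch; once all have switched, every route costs 2 and nobody can improve by deviating.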
Link attack example
Re-identify the medical record of the governor of Massachusetts
MA collects and publishes sanitized medical data for state employees (microdata) [left circle]
Voter registration list of MA (publicly available data) [right circle]
• looking for governor's record
• join the tables:
  – 6 people had his birth date
  – 3 were men
  – 1 in his zipcode
• regarding the US 1990 census data
  – 87% of the population are unique based on (zipcode, gender, dob)
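The join described above can be sketched in a few lines. Everything in this example is hypothetical toy data, not the real Massachusetts records; the point is only that a public list with names and a "sanitized" table without names can be joined on (birthdate, sex, zipcode).

```python
# Toy data for illustration only; every record below is hypothetical.
MEDICAL = [  # sanitized microdata: names removed, quasi identifiers kept
    {"birthdate": "21/1/79", "sex": "male", "zipcode": "53715", "disease": "Flu"},
    {"birthdate": "10/1/81", "sex": "female", "zipcode": "55410", "disease": "Hepatitis"},
    {"birthdate": "1/10/44", "sex": "female", "zipcode": "90210", "disease": "Bronchitis"},
]
VOTERS = [  # public voter list: names plus the same quasi identifiers
    {"name": "Andre", "birthdate": "21/1/79", "sex": "male", "zipcode": "53715"},
    {"name": "Carol", "birthdate": "1/10/44", "sex": "female", "zipcode": "90210"},
    {"name": "Frank", "birthdate": "3/3/60", "sex": "male", "zipcode": "10001"},
]
QUASI_IDENTIFIERS = ("birthdate", "sex", "zipcode")


def link(voters, medical, keys=QUASI_IDENTIFIERS):
    """Return {name: disease} for voters matching exactly one medical record."""
    index = {}
    for record in medical:
        index.setdefault(tuple(record[k] for k in keys), []).append(record)
    matches = {}
    for voter in voters:
        hits = index.get(tuple(voter[k] for k in keys), [])
        if len(hits) == 1:  # a unique match re-identifies the individual
            matches[voter["name"]] = hits[0]["disease"]
    return matches


if __name__ == "__main__":
    print(link(VOTERS, MEDICAL))
```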
Privacy in microdata
The role of attributes in microdata:
explicit identifiers are removed
quasi identifiers can be used to re-identify individuals
sensitive attributes (may not exist!) carry sensitive information

Name    Birthdate   Sex      Zipcode   Disease
Andre   21/1/79     male     53715     Flu
Beth    10/1/81     female   55410     Hepatitis
Carol   1/10/44     female   90210     Bronchitis
Dan     21/2/84     male     02174     Sprained Ankle
Ellen   19/4/72     female   02237     AIDS
identifier: Name; quasi identifiers: Birthdate, Sex, Zipcode; sensitive: Disease
k-anonymity
k-anonymity: intuitively, hide each individual among k-1 others
each QI set of values should appear at least k times in the released microdata
linking cannot be performed with confidence > 1/k
sensitive attributes are not considered (going to revisit this...)
How to achieve this?
generalization and suppression
value perturbation is not considered (we should remain truthful to original values)
Privacy vs utility tradeoff:
do not anonymize more than necessary
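The defining condition, that each combination of quasi-identifier values appears at least k times, is easy to test mechanically. A minimal sketch follows, with rows represented as dictionaries; the function name and the row layout are assumptions of this example.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k: int) -> bool:
    """Check that every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


if __name__ == "__main__":
    rows = [
        {"sex": "male", "zipcode": "022**"},
        {"sex": "male", "zipcode": "022**"},
        {"sex": "female", "zipcode": "90210"},
    ]
    print(is_k_anonymous(rows, ("sex", "zipcode"), 2))      # the last row is unique
    print(is_k_anonymous(rows[:2], ("sex", "zipcode"), 2))
```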
Advertisement Auction
Auction: Dutch auction, Vickrey auction
Ad placement
k-anonymity example
Tools for anonymization:
generalization: publish more general values, i.e., given a domain hierarchy, roll-up
suppression: remove tuples, i.e., do not publish outliers
often the number of suppressed tuples is bounded

Original microdata:
Birthdate   Sex      Zipcode
21/1/79     male     53715
10/1/79     female   55410
1/10/44     female   90210
21/2/83     male     02274
19/4/82     male     02237

2-anonymous data:
             Birthdate   Sex      Zipcode
group 1      */1/79      person   5****
             */1/79      person   5****
suppressed   1/10/44     female   90210
group 2      */*/8*      male     022**
             */*/8*      male     022**
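The two tools can be applied to the slide's five rows in code. The sketch below is hand-tailored to reproduce the published table: the grouping rule (by zipcode prefix) and the masking helpers `mask_zip` and `mask_date` are assumptions made for illustration, not a general anonymization algorithm.

```python
from collections import Counter

ORIGINAL = [
    ("21/1/79", "male", "53715"),
    ("10/1/79", "female", "55410"),
    ("1/10/44", "female", "90210"),
    ("21/2/83", "male", "02274"),
    ("19/4/82", "male", "02237"),
]


def mask_zip(zipcode: str, keep: int) -> str:
    """Keep the first `keep` digits and replace the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def mask_date(date: str, level: int) -> str:
    """Level 1 hides the day; level 2 also hides the month and the last year digit."""
    _day, month, year = date.split("/")
    if level == 1:
        return f"*/{month}/{year}"
    return f"*/*/{year[0]}*"


def anonymize(rows):
    """Generalize the two groups from the slide and suppress everything else."""
    published = []
    for date, sex, zipcode in rows:
        if zipcode.startswith("5"):        # group 1: roll up day, sex, zipcode
            published.append((mask_date(date, 1), "person", mask_zip(zipcode, 1)))
        elif zipcode.startswith("022"):    # group 2: roll up date and zipcode
            published.append((mask_date(date, 2), sex, mask_zip(zipcode, 3)))
        # any other tuple is treated as an outlier and suppressed
    return published


if __name__ == "__main__":
    released = anonymize(ORIGINAL)
    print(released)
    print(min(Counter(released).values()) >= 2)  # 2-anonymity check
```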
TSP
Trucking company with a central warehouse
Each day, it loads up the truck at the warehouse and sends it around to several locations to make deliveries.
At the end of the day, the truck must end up back at the warehouse so that it is ready to be loaded for the next day.
To reduce costs, the company wants to select an order of delivery stops that yields the lowest overall distance traveled by the truck.
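For a handful of stops the best delivery order can be found by trying every order. The sketch below does exactly that; it is a brute-force illustration that takes (n-1)! steps, not a practical method, and the distance matrix is hypothetical, with location 0 playing the role of the warehouse.

```python
from itertools import permutations

# Hypothetical symmetric distances; location 0 is the warehouse.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


def shortest_tour(dist):
    """Return (length, tour) of a shortest round trip starting and ending at 0."""
    n = len(dist)
    best_length, best_tour = float("inf"), None
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


if __name__ == "__main__":
    print(shortest_tour(DIST))
```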
Pizza delivery
One can order a pizza for dinner by phone or via the internet
We want hot, fresh, and tasty pizzas
How should they deliver the pizzas upon receiving orders?
Immediately, or wait a few minutes for further orders from nearby places?
The Ski problem
The Ski problem [Karp 92]: Every day she goes skiing, a skier must decide whether to rent or buy skis, unless or until she decides to buy them. The skier doesn't know how many days she will go skiing before she gets tired of this hobby. The cost to rent skis for a day is 1 unit, while the cost to buy the skis is B units.
How can she save money?
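One well-known answer is the break-even strategy: rent for the first B - 1 days and buy on day B. The sketch below compares its cost with that of an optimal decision made in hindsight; it assumes B is a positive integer and a rental costs 1 unit per day, as stated above. The function names are made up.

```python
def break_even_cost(days: int, buy_price: int) -> int:
    """Cost of renting for buy_price - 1 days and buying on day buy_price."""
    if days < buy_price:
        return days                       # she stopped before ever buying
    return (buy_price - 1) + buy_price    # rentals so far plus the purchase


def optimal_cost(days: int, buy_price: int) -> int:
    """Cost of the best decision when the number of days is known in advance."""
    return min(days, buy_price)


if __name__ == "__main__":
    B = 10
    for days in (1, 5, 9, 10, 50):
        print(days, break_even_cost(days, B), optimal_cost(days, B))
```

The strategy never pays more than 2 - 1/B times the hindsight optimum, with the worst case occurring when she stops right after buying.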
Lost cow problem
A short-sighted cow (or assume it’s dark, or foggy, or ...) is standing in front of a fence and does not know in which direction the only gate in the fence might be. How can the cow find the gate without walking too great a detour?
How can two soldiers get together when lost on a battlefield?
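A classical strategy for the cow is doubling: walk 1 unit one way, return, walk 2 units the other way, return, then 4, and so on. The following sketch simulates it on a line; it assumes the gate is at least 1 unit away (otherwise the ratio to the direct distance is unbounded) and that the first excursion goes in the positive direction. The function name is made up.

```python
def lost_cow_distance(gate: float) -> float:
    """Total distance walked by the doubling strategy until the gate is reached.

    `gate` is the signed position of the gate; the cow starts at 0.
    """
    if gate == 0:
        return 0.0
    total = 0.0
    step = 1.0
    direction = 1  # +1 or -1, alternating after every excursion
    while True:
        if gate * direction > 0 and abs(gate) <= step:
            return total + abs(gate)  # gate found during this excursion
        total += 2 * step             # walk out and come back to the start
        step *= 2
        direction = -direction


if __name__ == "__main__":
    for gate in (1.0, -1.0, 4.5, -100.0):
        walked = lost_cow_distance(gate)
        print(gate, walked, walked / abs(gate))
```

Under these assumptions the cow walks less than 9 times the direct distance to the gate.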
Erdős project – shortest path
Paul Erdős(1913-1996) has an Erdős number of zero. If the lowest Erdős number of a coauthor is X, then the author's Erdős number is X + 1.
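The definition above is exactly a shortest-path distance in the coauthorship graph, so breadth-first search computes it. The sketch below uses a tiny hypothetical graph with placeholder author names A to E; only the rule "lowest coauthor number plus one" comes from the slide.

```python
from collections import deque


def erdos_numbers(coauthors: dict, root: str = "Erdős") -> dict:
    """Breadth-first search from `root`; returns {author: Erdős number}.

    Authors with no coauthorship path to `root` are left out of the result.
    """
    numbers = {root: 0}
    queue = deque([root])
    while queue:
        author = queue.popleft()
        for other in coauthors.get(author, ()):
            if other not in numbers:
                numbers[other] = numbers[author] + 1
                queue.append(other)
    return numbers


if __name__ == "__main__":
    # Hypothetical coauthorship graph, for illustration only.
    graph = {
        "Erdős": ["A", "B"],
        "A": ["Erdős", "C"],
        "B": ["Erdős"],
        "C": ["A", "D"],
        "D": ["C"],
        "E": [],
    }
    print(erdos_numbers(graph))
```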
Nevanlinna Prize winners
NAME                   YEAR   COUNTRY                 ERDŐS NUMBER
Robert Tarjan          1982   USA                     2
Leslie Valiant         1986   Hungary/Great Britain   3
Alexander Razborov     1990   Russia                  2
Avi Wigderson          1994   Israel                  2
Peter Shor             1998   USA                     2
Madhu Sudan            2002   India/USA               2
Jon Kleinberg          2006   USA                     3
Daniel Alan Spielman   2010   USA
Other famous people
Albert Einstein      1921      Physics            2
Chen Ning Yang       1957      Physics            4
Tsung-dao Lee        1957      Physics            5
John F. Nash         1994      Economics          4
Edmund S. Phelps     2006      Economics          4
Shing-Tung Yau       1982      China              2
Shiing Shen Chern    1983-84   China              2
Alan Turing                    computer science   5
John von Neumann               mathematics        3
David Hilbert                  mathematics        4
Donald E. Knuth                                   2
Extensions of shortest path
On k-skip Shortest Paths (SIGMOD 2011)
History of Algorithm
The word algorithm comes from the name of the 9th century Persian mathematician Abu Abdullah Muhammad ibn Musa al-Khwarizmi whose works introduced Arabic numerals and algebraic concepts. The word algorism originally referred only to the rules of performing arithmetic using Arabic numerals but evolved into algorithm by the 18th century. The word has now evolved to include all definite procedures for solving problems or performing tasks.
History (cont.)
The first case of an algorithm written for a computer was Ada Byron's notes on the analytical engine, written in 1842, for which she is considered by many to be the world's first programmer. However, since Charles Babbage never completed his analytical engine, the algorithm was never implemented on it. The problem of defining a "well-defined procedure" rigorously was largely solved with the description of the Turing machine, an abstract model of a computer formulated by Alan Turing, and the demonstration that every method yet found for describing "well-defined procedures" advanced by other mathematicians could be emulated on a Turing machine (a statement known as the Church-Turing thesis).
Why do you come here?
Requirement
Come to the class (*)
Ask questions
Thinking: Why is it OK now? How about other methods?
Kinds of analyses
Worst-case: (usually)
T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes)
T(n) = expected time of algorithm over all inputs of size n.
Need assumption of statistical distribution of inputs.
Best-case: (bogus)
Cheat with a slow algorithm that works fast on some input.
Uniform distribution
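Under a uniform distribution over all orderings of n distinct keys, the three kinds of analysis can be compared directly for insertion sort by enumerating every permutation. The sketch below counts element shifts as the cost measure, which is an assumption of this example; it is only feasible for very small n, and the function names are made up.

```python
from itertools import permutations


def shifts(a) -> int:
    """Number of element shifts insertion sort performs on the sequence `a`."""
    a = list(a)
    count = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            count += 1
        a[i + 1] = key
    return count


def case_analysis(n: int):
    """Return (worst, average, best) shift counts over all n! permutations."""
    counts = [shifts(p) for p in permutations(range(n))]
    return max(counts), sum(counts) / len(counts), min(counts)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, case_analysis(n))
```

For n = 5 the worst case is n(n-1)/2 = 10 shifts, the uniform average is n(n-1)/4 = 5, and the best case is 0, which shows why the best case says little about an algorithm.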
Performance Measures for On-line Algorithms
Competitive ratio
Max/Max ratio
Smoothed Competitiveness