45
Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Building a Science B ase for the Information A ge

  • Upload
    job

  • View
    26

  • Download
    0

Embed Size (px)

DESCRIPTION

Building a Science B ase for the Information A ge. John Hopcroft Cornell University Ithaca, NY. Time of change. The information age is a revolution that is changing all aspects of our lives. - PowerPoint PPT Presentation

Citation preview

Page 1: Building a Science  B ase for the Information  A ge

Building a Science Base for the Information Age

John HopcroftCornell University

Ithaca, NY

Xiamen University

Page 2: Building a Science  B ase for the Information  A ge

Xiamen University

Time of change

The information age is a revolution that is changing all aspects of our lives.

Those individuals, institutions, and nations who recognize this change and position themselves for the future will benefit enormously.

Page 3: Building a Science  B ase for the Information  A ge

Xiamen University

Computer Science is changing

Early years Programming languages Compilers Operating systems Algorithms Data bases

Emphasis on making computers useful

Page 4: Building a Science  B ase for the Information  A ge

Xiamen University

Computer Science is changing

The future years

Tracking the flow of ideas in scientific literature Tracking evolution of communities in social networks Extracting information from unstructured data

sources Processing massive data sets and streams Extracting signals from noise Dealing with high dimensional data and dimension

reduction

Page 5: Building a Science  B ase for the Information  A ge

Xiamen University

Computer Science is changing

Merging of computing and communication

The wealth of data available in digital form

Networked devices and sensors

Drivers of change

Page 6: Building a Science  B ase for the Information  A ge

Xiamen University

Implications for TCS

Need to develop theory to support the new directions

Update computer science education

Page 7: Building a Science  B ase for the Information  A ge

Xiamen University

A short view of the future

Page 8: Building a Science  B ase for the Information  A ge

Xiamen University

Digitization of medical records

Doctor – needs my entire medical record Insurance company – needs my last doctor

visit, not my entire medical record Researcher – needs statistical information but

no identifiable individual information

Relevant research – zero knowledge proofs, differential privacy

Page 9: Building a Science  B ase for the Information  A ge

Xiamen University

Page 10: Building a Science  B ase for the Information  A ge

Xiamen University

Zero knowledge proof

• Graph 3-colorability

• Problem is NP-hard - No polynomial time algorithm unless P=NP

Page 11: Building a Science  B ase for the Information  A ge

Xiamen University

Zero knowledge proof

I send the sealed envelopes. You select an edge and open the two

envelopes corresponding to the end points.

Then we destroy all envelopes and start over, but I permute the colors and then resend the envelopes.

Page 12: Building a Science  B ase for the Information  A ge

Xiamen University

Digitization of medical records is not the only system

Car and road – gps – privacy

Supply chains

Transportation systems

Page 13: Building a Science  B ase for the Information  A ge

Xiamen University

Page 15: Building a Science  B ase for the Information  A ge

Xiamen University

Page 16: Building a Science  B ase for the Information  A ge

Xiamen University

When will my bus arrive?

Page 17: Building a Science  B ase for the Information  A ge

Xiamen University

IN 4 TO 5 MINUTES

Page 18: Building a Science  B ase for the Information  A ge

Xiamen University

Tracking the flow of ideas in scientific literatureYookyung Jo

Page rank

Web

Link

GraphRetrievalQuerySearchText

WebPageSearchRank

WebChordUsage

IndexProbabilisticText

FileRetrieveTextIndex

DiscourseWordCenteringAnaphora

Page 19: Building a Science  B ase for the Information  A ge

Xiamen University

Topic evolution thread

• Seed topic :– 648 : making c program

type-safe by separating pointer types by their usage to prevent memory errors

• 3 subthreads :– Type : – Garbage collection : – Pointer analysis :

Yookyung Jo, 2010

Page 20: Building a Science  B ase for the Information  A ge

Xiamen University

Topic Evolution Map of the ACM corpus

Yookyung J o, 2010

Page 21: Building a Science  B ase for the Information  A ge

Xiamen University

In the past, sociologists could study group of a few thousand individuals.

Today with social networks we can study interaction among millions of individuals.

One important activity is how communities form and evolve.

Page 22: Building a Science  B ase for the Information  A ge

Xiamen University

• Early work Min cut – two equal size communities Conductance – minimizes cross edges

• Future work Consider communities with more external edges

than internal edges Find small communities Track communities over time Develop appropriate definitions for communities Understand the structure of different types of social

networks

Page 23: Building a Science  B ase for the Information  A ge

Xiamen University

Our view of a community

TCS

Me

Colleagues at Cornell

Classmates

Family and friendsMore connections outside than inside

Page 24: Building a Science  B ase for the Information  A ge

Xiamen University

On going research on finding communities

Page 25: Building a Science  B ase for the Information  A ge

Xiamen UniversitySpectral clustering with K-means.

Page 26: Building a Science  B ase for the Information  A ge

Xiamen University

Spectral clustering with K-means.

Page 27: Building a Science  B ase for the Information  A ge

Xiamen UniversitySpectral clustering with K-means.

Page 28: Building a Science  B ase for the Information  A ge

Xiamen University

Page 29: Building a Science  B ase for the Information  A ge

Xiamen University

Instead of two overlapping clusters, we find three clusters.

Page 30: Building a Science  B ase for the Information  A ge

Xiamen University

Instead of clustering the rows of the singular vectors, find the minimum 0-norm vector in the space spanned by the singular vectors.

The minimum 0-norm vector is, of course, the all zero vector, so we will require one component to be 1.

Page 31: Building a Science  B ase for the Information  A ge

Xiamen University

Finding the minimum 0-norm vector is NP-hard.

Use the minimum 1-norm vector as a proxy.

Page 32: Building a Science  B ase for the Information  A ge

Xiamen University

Page 33: Building a Science  B ase for the Information  A ge

Xiamen University

Page 34: Building a Science  B ase for the Information  A ge

Xiamen University

Page 35: Building a Science  B ase for the Information  A ge

Xiamen University

Page 36: Building a Science  B ase for the Information  A ge

Xiamen University

Page 37: Building a Science  B ase for the Information  A ge

Xiamen University

Page 38: Building a Science  B ase for the Information  A ge

Xiamen University

Minimum 1-norm vector is not an indicator vector.

By thresh-holding the components, convert it to an indicator vector for the community.

Page 39: Building a Science  B ase for the Information  A ge

Xiamen University

0 50 100 150 200 250 300 350 4000.4

0.5

0.6

0.7

0.8

0.9

1

Page 40: Building a Science  B ase for the Information  A ge

Xiamen University

Random walk

How long?

What dimension?

Page 41: Building a Science  B ase for the Information  A ge

Xiamen University

Krylov subspace

Find orthonormal basis.

Update subspace in each step rather than just the probability vector

Page 42: Building a Science  B ase for the Information  A ge

Xiamen University

Find minimum 1-norm vector in Krylov subspace.

Actually allow vector to be close to subspace.

Page 43: Building a Science  B ase for the Information  A ge

Xiamen University

Structure of communities

How many communities is a person in?Small, medium, large

How many seed points are needed to uniquely specify a community a person is in?Which seeds are good seeds?Etc.

Page 44: Building a Science  B ase for the Information  A ge

Xiamen University

What types of communities are there?

How do communities evolve over time?

Page 45: Building a Science  B ase for the Information  A ge

Xiamen University

This is an exciting time for computer science.

There is a wealth of data in digital format, information from sensors, and social networks to explore.

It is important to develop the science base to support these activities.