Network Science: A Theoretical and Practical Framework Alessandro Vespignani, School of Informatics & Katy Börner, School of Library and Information Science [email protected] [email protected] Networks and Complex Systems Talk, IUB, September 12th, 2005.

Network Science: A Theoretical and Practical Framework

Alessandro Vespignani, School of Informatics & Katy Börner, School of Library and Information Science

[email protected] [email protected]

Networks and Complex Systems Talk, IUB, September 12th, 2005.

Alessandro Vespignani & Katy Börner, Network Science: A Theoretical and Practical Framework, September 12th, 2005.

Alessandro’s slides

What, you are not part of the 150 people that aim to advance network science theory?

You want to apply network science to help heal patients, improve information access, increase the error and attack tolerance of networks, etc.?

Well, we are working on a practical framework also.

Practical framework – The problem

How to build bridges across all the many different

Terminologies, e.g.,
  Component, also called giant component in Math, percolation cluster by Physicists, and community of subjects by Sociologists.
  Rich get richer effect, also known as Matthew effect or cumulative advantage, modeled as preferential attachment.

Conceptualizations, e.g.,
  Sampling, measurement, or modeling.

Approaches to do and report science.
  With or without blackboards, computers, etc.
  Single person vs. large team research.

Value systems, e.g.,
  Are formulas and proofs needed or does it count if lives are saved?
  Is model validation based on empirical data truly necessary or does it suffice to create artificial life that exhibits life-like behavior?

…that are used in the diverse sciences that develop or apply network science?

Tower of Babel
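
The two terminology examples on this slide can be made concrete with a small sketch. The Python below is an illustration only, not code from the talk: preferential_attachment grows a network in which new nodes prefer already well-connected nodes (the rich get richer effect), and largest_component_size measures the biggest connected component. The function names, the parameters n, m and seed, and the choice to start from a small fully connected core are all assumptions made for this example.

```python
import random
from collections import deque


def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph on nodes 0..n-1 (hypothetical helper).

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears here once per link it has, so a uniform draw
    # from this list is a degree-proportional draw from the nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((target, new))
            endpoints.extend((target, new))
    return edges


def largest_component_size(n, edges):
    """Return the number of nodes in the largest connected component."""
    neighbors = {node: [] for node in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    seen = set()
    best = 0
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        best = max(best, size)
    return best
```

Because every new node attaches to nodes that are already present, a network grown this way forms a single component, so largest_component_size(n, preferential_attachment(n, m)) returns n. The component function becomes more informative on sampled or damaged networks, where the largest component can be much smaller than the network.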

Why there is hope

Human perceptual and cognitive abilities evolve very slowly. There are individual differences but rather minor ones.

Different areas of science have very different cultures. However, the cultures are learned and humans are able to learn throughout their life.

Crossing disciplinary boundaries is incredibly hard. You have to read and talk a lot in languages that sound totally alien and only slowly become familiar to you. You just can’t understand why anybody would do the things that seem to be widely accepted. However, after a few years you will understand.

Typically, all collaborators in an interdisciplinary project have to go far out of their ‘normal/safe way of doing things’ to truly gain/enjoy synergies. This can be scary. Many decide not to leave the safe harbor. Yet, humans are explorers.

Last but not least, most of today’s challenges require interdisciplinary collaboration. Those who go for the challenge have a higher chance to be among the ‘fittest’ of their generation.

How to make it work

Simple suggestions for how to deal with different

Terminologies:
  Watch out for and report terminology that seems to refer to the same concept.
  Do not invent new terms for existing and already named concepts.
  Attend all Mon talks – not only those in your area of research.

Conceptualizations:
  Read and talk across disciplinary boundaries.
  Contribute to and benefit from interdisciplinary reviews.
  Engage in interdisciplinary collaborations.

Approaches to do and report science:
  Read and talk across disciplinary boundaries.
  Contribute to and benefit from interdisciplinary reviews.
  Engage in interdisciplinary collaborations.

Value systems:
  Learn to understand and value other value systems than your own.

There must be researchers at IU that actually study these aspects. Their input on how to make network science work on campus and in general would be most appreciated.

Doing network science is tricky (I)

OK, we have a physicist, a biologist, and an information scientist who want to collaborate on increasing our understanding of the spreading of epidemics.

They are good friends. They are experts. Their skills complement each other. They got major funding. We expect amazing results.

They manage to agree on terminology, conceptualization, basic approach and success criteria of the project.

Doing network science is tricky (II)

The biologist brings data which cannot be read into any of the programs that the physicist or the information scientist ever used. A parser is written.

Measurement and modeling require a deep understanding of the data. Good thing they are friends and like to talk with each other a LOT. Too bad petrol prices are so high – travel budget goes to zero quickly.

Diverse new measurement and modeling techniques are implemented and tried. Too bad the original algorithm developers did not make them available. Well, they are not going to make them available either as this would require proper coding and documentation and they did not budget for this.

First results make clear that the current dataset is incomplete and will not solve the research question. Additional data needs to be acquired – takes at least a year given that patient data requires the completion and approval of all kinds of forms. There are some appropriate datasets online but they are not properly documented. Well, if we spent so much time to get our hands on this dataset, to clean and understand it, then we are not going to share it. Instead, we will ‘milk’ it to death. Data is power.

Doctors do not understand the results. Visualizations are employed to communicate results. The visualizations work wonders but nobody understands why or how they work. Well, we are not going to figure out why they worked – we need to start the next project.

Doing network science is tricky (III)

The project is successful. Papers are written and cited. The developed vaccination strategy is adopted in medical practice.

Old and new datasets as well as re-implemented and newly developed algorithms are stored on the investigators’ or, more likely, their graduate students’ hard disks. But only until nobody knows any more what the directory contains, more space is needed, or the hard disk fails.

The next person interested in using these algorithms will have to code them up again. What a waste of lifetime.

Research that would benefit from access to the datasets will most likely never find out that they exist, or there might not be a way to gain access.

Information on what researcher/practitioner is familiar with what dataset or algorithms is best acquired by using one’s social networks.

This is a rather inefficient way of ‘standing on the shoulders of giants’.

A data-code repository for network science

Diverse communities have created data and code repositories.

Standard libraries are tuned for performance and tested for reliability and hence using them results in more re-usable and robust software.

Open-source toolkits invite contributions from a broader community not just from one company or individual.

Anyone can provide an alternative implementation of an algorithm that may make different trade-offs and appeal to a different subset of users.

Open source facilitates peer-review at the algorithm level rather than based on pseudo code published in a research publication.

Repositories minimize the time spent re-implementing algorithms, and enable researchers to perform comparisons of existing and novel algorithms on a broader spectrum of implementations.

Learning modules teach valid combinations of algorithms, encourage algorithm comparison and algorithm modifications.

Some areas of Network Science deal with very large scale networks. Hence, we need a cyberinfrastructure that provides distributed access to data-code-computing resources.

SEI: NetWorkBench: A Large-Scale Network Analysis, Modeling and Visualization Toolkit for Biomedical, Social Science and Physics Research. NSF IIS-0513650 award (Katy Börner, Albert-Laszlo Barabasi, Santiago Schnell, Alessandro Vespignani & Stanley Wasserman, Craig Stewart (Senior Personnel), $1,120,926) Sept. 05 - Aug. 08.

NetWorkBench: A ‘Hubble Telescope’ for NetSci

The NetWorkBench is envisioned as a ‘macroscope’ for network scientists – a tool that helps you to see and understand the structure and dynamics of large-scale networks.

The NetWorkBench will provide access to networks from diverse domains for research and educational purposes (about 500 datasets in 2008).

Its software core will support the easy integration of new algorithms as well as their menu-driven usage. It intends to decrease the time it takes before a ‘normal’ user gets to see/use a new algorithm and to increase the reputation of algorithm developers that make useful code available at reasonable effort.

Think ‘Wikicode’ – utilizing the emerging global brain of people that love to share data, people that love to code, people that like to document code, and people that love to teach old and new algorithms.

But, how can one manage hundreds of datasets, algorithms, papers, and experts?

Semantic association networks might help as they support new ways of information access.

Semantic association networks

Interconnect ‘Datasets’, ‘Services’, ‘Publications’, ‘Authors’ and ‘Users’ and
  Facilitate the retrieval of all authors that worked with dataset x or all papers that used algorithm y;
  Ease the reuse of datasets and services, thus increasing the reproducibility of results;
  Enable dataset/algorithm/result comparisons at the data/code/implication level;
  Exploit data access and data origin logs to indicate the usefulness of resources and the reputation of authors.
And much more.
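
The retrieval examples on this slide (all authors that worked with dataset x, all papers that used algorithm y) can be sketched as queries over a small typed network. The Python below is a minimal illustration under stated assumptions, not the design described in the cited chapter: the class name AssociationNetwork, the relation labels used_in and written_by, and the sample node names are all hypothetical.

```python
class AssociationNetwork:
    """Typed nodes joined by labeled, undirected associations (illustrative)."""

    def __init__(self):
        self.node_type = {}  # node name -> type label such as "Dataset"
        self.links = {}      # node name -> set of (relation, other node)

    def add_node(self, name, node_type):
        self.node_type[name] = node_type
        self.links.setdefault(name, set())

    def associate(self, a, relation, b):
        for name in (a, b):
            if name not in self.node_type:
                raise KeyError(f"unknown node: {name}")
        # Store the link on both ends so it can be followed either way.
        self.links[a].add((relation, b))
        self.links[b].add((relation, a))

    def neighbors(self, name, node_type, relation):
        """Nodes of the given type linked to name by the given relation."""
        return sorted(
            other
            for rel, other in self.links[name]
            if rel == relation and self.node_type[other] == node_type
        )

    def authors_for_dataset(self, dataset):
        """All authors of publications that used the dataset."""
        authors = set()
        for paper in self.neighbors(dataset, "Publication", "used_in"):
            authors.update(self.neighbors(paper, "Author", "written_by"))
        return sorted(authors)

    def papers_for_algorithm(self, service):
        """All publications that used the algorithm (a 'Service' node)."""
        return self.neighbors(service, "Publication", "used_in")


def build_example():
    """Build a tiny network with made-up names, for illustration only."""
    net = AssociationNetwork()
    net.add_node("dataset_x", "Dataset")
    net.add_node("algorithm_y", "Service")
    net.add_node("paper_1", "Publication")
    net.add_node("paper_2", "Publication")
    net.add_node("author_a", "Author")
    net.add_node("author_b", "Author")
    net.associate("dataset_x", "used_in", "paper_1")
    net.associate("dataset_x", "used_in", "paper_2")
    net.associate("algorithm_y", "used_in", "paper_2")
    net.associate("paper_1", "written_by", "author_a")
    net.associate("paper_2", "written_by", "author_a")
    net.associate("paper_2", "written_by", "author_b")
    return net
```

With this sample data, build_example().authors_for_dataset("dataset_x") follows two hops (dataset to publications, publications to authors), while papers_for_algorithm("algorithm_y") needs only one. A ‘Users’ node type and access logs could be added in the same way, as further node types and relations.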

Katy Börner. (in press) Semantic Association Networks: Using Semantic Web Technology to Improve Scholarly Knowledge and Expertise Management. In Vladimir Geroimenko & Chaomei Chen (eds.) Visualizing the Semantic Web, Springer Verlag, 2nd Edition, chapter 11.

But, are people ready to share?

Design for ‘the survival of the fittest’. The better the tool the fitter the user.

Provide incentive structures that promote sharing. Wikipedia, Wikispecies, Wiki* are amazing examples that this can work.

Give people what they want and make it easy to share results:

Give algorithm developers a way to easily diffuse their code and related publications as part of a widely used cyberinfrastructure – increasing their reputation as researchers/programmers.

Give algorithm users a tool that is easy to use (i.e., it does not get in the way of the scientific question and is easy to learn), is flexible and extendable to read all kinds of datasets, can satisfy all kinds of network measurement/modeling/visualization needs, and can generate figures that make Science and Nature covers.

Research results are communicated in papers that contain pointers to any used NWB datasets or algorithms.

Any indicators that this might work?

Well, there are many communities that have successfully created their own cyberinfrastructures.

The one I am most familiar with is the InfoVis Cyberinfrastructure (IVC), http://iv.slis.indiana.edu/

IVC core

IVC core & plugins

IVC interface

Wizard-driven integration of new algorithms.

Menu-driven usage of algorithms.

Demo.
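
One common way to combine easy integration of new algorithms with menu-driven usage is a plugin registry. The Python below is a generic sketch of that idea and makes no claim about how the IVC is actually built: the names REGISTRY, register, menu and run, the menu paths, and the two sample algorithms are assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]
Algorithm = Callable[[List[Edge]], object]

# Hypothetical registry: menu path -> algorithm that takes an edge list.
REGISTRY: Dict[str, Algorithm] = {}


def register(menu_path: str):
    """Decorator that files an algorithm under a menu path."""
    def wrap(func: Algorithm) -> Algorithm:
        if menu_path in REGISTRY:
            raise ValueError(f"menu path already taken: {menu_path}")
        REGISTRY[menu_path] = func
        return func
    return wrap


@register("Analysis/Node degree")
def node_degree(edges: List[Edge]) -> Dict[str, int]:
    """Count how many links touch each node."""
    degree: Dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


@register("Analysis/Edge count")
def edge_count(edges: List[Edge]) -> int:
    """Count the links in the network."""
    return len(edges)


def menu() -> List[str]:
    """List the menu entries a user interface could display."""
    return sorted(REGISTRY)


def run(menu_path: str, edges: List[Edge]) -> object:
    """Run the algorithm the user picked from the menu."""
    return REGISTRY[menu_path](edges)
```

In this sketch a contributor adds an algorithm by writing one decorated function, and the interface never needs to change: it shows whatever menu() returns and calls run() with the user's selection.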

Socio-technical Challenges

What functionality should the ‘dream come true’ network science cyberinfrastructure provide? Is there any overlap in what sampling, measurement, modeling and visualizations the different sciences need?

What datasets would be most useful for NetSci research and education?

What data formats are most commonly used?

What incentives would make you contribute code to the NWB or use the NWB? Watch out for questionnaires!

What NWB core architecture and portal interface would work best?

If we can build a true dream tool that increases the productivity for network science research at IU then it is very likely to work for other researchers as well.

Acknowledgements

I would like to thank the students in the InfoVis Lab at IU and my collaborators for their contributions to the IVC.

Support comes from the School of Library and Information Science, Indiana University's High Performance Network Applications Program, a Pervasive Technology Lab Fellowship, a 21st Century Grant, an Outstanding Junior Faculty Award, an SBC (formerly Ameritech) Fellow Grant and National Science Foundation grants DUE-0333623, CHE 0524661, IIS-0238261, and IIS-0513650. We also acknowledge Equipment Grants by SUN Microsystems, the 21st Century Fund, and Indiana University.