Business Data Analytics
Lecture 12: Networks in Finance
MTAT.03.319
The slides are available under a Creative Commons license. The original owner of these slides is the University of Tartu.
Lectures 3, 4, 5, 7, 8 Vs. Lectures 11, 12
• Looking at other/not directly controlled platforms
• Twitter, blogs, tech posts, recommendation websites
• Looking at their own customers
• Subscription based data
Lecture 10: Brand Value Monitoring
Lecture 11: Networks in Finance
Lectures : 3, 4, 5, 7, 8
Lecture 11 Vs. Lecture 12
Lecture: Brand Value Monitoring
• Objective of the company is:
  • Trying to get a sense of customers’ emotions
• Trying to improve services based on feedback/complaints.
• Influence is limited.
Lecture: Networks in Finance
• Objective of the company is:
  • How can it increase sales?
  • How can it attract more customers?
  • How can it convince people to spend on its products?
• Influence is the objective: People are contacted to create influence.
• HR: Employees communication.
Enron Network
Source: Jana Diesner, Kathleen M. Carley. Exploration of Communication Networks from the Enron Email Corpus
Year: 2000 | Year: 2001
Customers’ spending behavior
• Intersection between social behavior and income levels.
• Localities (cell tower areas) with diverse network interactions tend to have higher economic development.
• People with higher diversity in social contacts tend to have higher incomes.
• A second line of investigation has focused on using homophily and social closeness to predict the products of interest to individuals
• Easily available data on prospects, such as demographics and sociographic factors often have limited ability to predict future spending behavior.
• Highly social people are also likely to earn higher wages, find better jobs, and live healthier lives.
• There is growing evidence that social behavior is a fundamental human characteristic that affects multiple aspects of human life.
Source: Vivek K. Singh, Laura Freeman, Bruno Lepri, Alex (Sandy) Pentland. Classifying Spending Behavior using Socio-Mobile Data
Networks in Finance
Understanding graph structures in business settings.
Network Analysis
• Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes (or vertices) and the connections between the elements or actors as links (or edges).
Source: Wikipedia
Understanding Networks through graph theory
• Terminologies and basics
  • Networks can be represented using graphs, G(N, E).
• Nodes (N): Set of entities
• Edges (E): Set of connections
• Examples:
  • Users in Facebook. User A is a friend of user B.
  • Users in a transactional network. Customers lending money to others.
  • Students in a homework network. Students working together on homework.
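As a minimal sketch of G(N, E), assuming an undirected friendship network with made-up users A to D (the names and ties are hypothetical), the node and edge sets can be stored directly and turned into an adjacency mapping:

```python
# Hypothetical friendship network: node names and ties are made up for illustration.
nodes = {"A", "B", "C", "D"}
edges = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}


def build_adjacency(nodes, edges):
    """Return a dict mapping each node to the set of its neighbors (undirected)."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: the tie goes both ways
    return adjacency


adjacency = build_adjacency(nodes, edges)
print(sorted(adjacency["C"]))  # neighbors of C
```

The later sketches in these notes reuse this "node to set of neighbors" layout because it makes degree and neighbor lookups one-liners.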
SNA: Basics
Nodes
• Vertex
• Actors
• Players
• People
• Things within the network
Edges
• Ties
• Links
• Relationships
• Interactions
To be Connected or Not ?
• A network could be disconnected.
• Consider an organization where some employees work as consultants.
• Consider the world trade network, where some countries trade only among themselves.
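Whether a network is disconnected can be checked by counting its connected components. The sketch below assumes an undirected graph stored as a dict of neighbor sets; the organization and its members are invented for illustration.

```python
from collections import deque


def connected_components(adjacency):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from the start node
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical organization: two teams plus one consultant with no recorded ties.
org = {
    "Ann": {"Bob"}, "Bob": {"Ann", "Cid"}, "Cid": {"Bob"},
    "Dee": {"Eve"}, "Eve": {"Dee"},
    "Consultant": set(),
}
components = connected_components(org)
print(len(components))  # more than 1 means the network is disconnected
```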
Terms such as network and graphs will be used interchangeably in the lecture.
Enron Network
Source: Jana Diesner, Kathleen M. Carley. Exploration of Communication Networks from the Enron Email Corpus
Year: 2000 | # Disconnected components: 96
Year: 2001 | # Disconnected components: 39
Categorization of Networks
Directed Vs. Undirected
• Directed
  • Ex: Twitter
• Undirected
  • Ex: Facebook
Weighted Vs. Unweighted
• Unweighted: All relations are equally important.
• Weighted: Some relations are more important than others.
  • Ex: Some streets are more important than others, based on traffic.
Local Vs. Global Concepts
Local
• Degree
• Centrality measures of nodes
• Local Clustering coefficient
Global
• Degree Distribution
• Diameter
• Average Path Length
• Density
• Global Clustering Coefficient
• Communities
• Network Topology/Models
• Network robustness
What is a Degree and Degree Distribution?
• Degree of a node is the number of friends, neighbors, or connections a node has.
• Degree distribution: number of nodes (Y axis) against number of neighbors (X axis).
• Log-log scale: power law, long tail, scale-free, Pareto, Zipf's law
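A minimal sketch of the two definitions above, assuming an undirected graph stored as a dict of neighbor sets (the small network is made up): the degree is the size of a node's neighbor set, and the degree distribution counts how many nodes share each degree.

```python
from collections import Counter


def degree_distribution(adjacency):
    """Return (degree per node, number of nodes per degree value)."""
    degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
    return degrees, Counter(degrees.values())


# Hypothetical network: one hub and several low-degree nodes.
network = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"},
    "c": {"hub", "d"}, "d": {"hub", "c"},
}
degrees, distribution = degree_distribution(network)
for k in sorted(distribution):
    print(k, distribution[k])  # X axis: number of neighbors, Y axis: number of nodes
```

Plotting these pairs on log-log axes is what reveals a long-tailed (power-law-like) shape in large real networks; a five-node toy graph is far too small to show it.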
Milgram Reloaded !
The navigation problem
Small world community.
The experiment setup (1967)
● One target (Massachusetts)
● Many originators. (Nebraska)
● Acquaintance chains of Letters
Output
● Six degrees of Separation
New version (2003) by Dodds et al.
● Multiple sources and targets
Image source: wikipedia
Outcome of the Experiment !
• “I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on this planet. The president of the United States. A gondolier in Venice. Fill in the names. . . . How every person is a new door, opening up into other worlds. Six degrees of separation between me and everyone else on this planet. But to find the right six people . . .” –John Guare, Six Degrees of Separation (1990)
What about in the age of Facebook?
Average Path Length, Diameter
Path in the network !
• Path between two nodes ni and nj
  • A collection of edges that, if traversed, takes you from node ni to node nj.
  • Each edge is traversed once.
• Path length: the number of edges in the path.
• Shortest path length:
  • Two nodes can have multiple paths.
  • The smallest among all the path lengths is called the shortest path length.
• Paths between nodes B and E
  • Path 1: <e1, e4>
  • Path 2: <e2, e6>
  • Path 3: <e5, e7, e6>
  • Path 4: <e5, e7, e6>
  • Path 5: <e5, e3, e4>
• Shortest path length: 2
  • Path 1 and Path 2
[Figure: example graph with nodes A, B, C, D, E and edges e1 to e7]
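The path listing above can be reproduced with a short enumeration. Since the figure itself is not available here, the mapping from edge labels to node pairs below is an assumption, chosen only so that the listed paths between B and E are valid; the function name is likewise hypothetical.

```python
# Assumed layout (the figure is not reproduced): edge label -> its two endpoints.
EDGES = {
    "e1": ("B", "A"), "e2": ("B", "C"), "e3": ("D", "A"), "e4": ("A", "E"),
    "e5": ("B", "D"), "e6": ("C", "E"), "e7": ("D", "C"),
}


def simple_paths(edges, source, target):
    """Enumerate paths (as lists of edge labels) that never revisit a node."""
    paths = []

    def extend(node, visited, used):
        if node == target:
            paths.append(list(used))
            return
        for label, (u, v) in edges.items():
            if node in (u, v):
                nxt = v if node == u else u
                if nxt not in visited:
                    extend(nxt, visited | {nxt}, used + [label])

    extend(source, {source}, [])
    return paths


paths = simple_paths(EDGES, "B", "E")
shortest_length = min(len(p) for p in paths)
shortest = [p for p in paths if len(p) == shortest_length]
print(shortest_length, shortest)
```

Under this assumed layout the enumeration also finds longer four-edge paths, which is why the shortest path length is defined as the minimum over all of them.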
Diameter and APL
• What is the diameter/APL of the network?
• Step 1: Find the shortest path between every pair of nodes
  • Shortest path between Mike and Bob
  • Shortest path between Mike and Emma
  • and so on for every other pair
• Step 2.1 (Diameter): the longest among all the shortest paths
• Step 2.2 (APL): the average of all the shortest path lengths, one per pair
Measures
• Diameter: the greatest distance between any pair of vertices.
  • How stretched is the network?
  • The maximum among the shortest path lengths over every pair of nodes.
• Average Path Length (APL): find the shortest path between all pairs of nodes, add up their lengths, and divide by the total number of pairs.
  • How many hops it takes on average for a message to reach its destination.
• In a real network like the internet, a short APL facilitates the quick transfer of information and reduces costs.
• A power grid network will have fewer losses if APL is minimized.
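The two steps (all-pairs shortest paths, then maximum or average) can be sketched as follows. This assumes a connected, undirected, unweighted graph; the friendship network is invented, reusing only the names Mike, Bob and Emma from the slide.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter_and_apl(adjacency):
    """Step 1: shortest path per pair. Step 2: take the maximum and the mean."""
    all_distances = {node: bfs_distances(adjacency, node) for node in adjacency}
    lengths = [all_distances[u][v] for u, v in combinations(sorted(adjacency), 2)]
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical network; Anna and Zoe are made-up names.
friends = {
    "Mike": {"Bob", "Anna"},
    "Bob": {"Mike", "Emma"},
    "Anna": {"Mike", "Emma"},
    "Emma": {"Bob", "Anna", "Zoe"},
    "Zoe": {"Emma"},
}
diameter, apl = diameter_and_apl(friends)
print(diameter, apl)
```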
Source and must read: https://en.wikipedia.org/wiki/Network_science
Facebook distance Distribution (2016)
Figure shows the distribution of averages for each person. The majority of the people on Facebook have averages between 2.9 and 4.2 degrees of separation.
Direction Matters
• In directed networks, direction matters.
• There is a path from node “a” to node “d”, but the reverse is not true.
Measures of tightness
• Density: Ratio of the number of edges present to the number of edges possible in the ideal (fully connected) case.
• Density is a Dyadic measure.
• It considers relation between two nodes only.
Source and must read: https://en.wikipedia.org/wiki/Network_science
Clustering Coefficient (Triadic measure)
Source: http://qasimpasta.info/data/uploads/sina-2015/calculating-clustering-coefficient.pdf
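As a rough sketch of the two tightness measures, assuming an undirected graph without self-loops: density divides the edges present by the N(N-1)/2 possible ones, and the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves linked. Returning 0 for nodes with fewer than two neighbors is a convention assumed here, and the four-node graph is made up.

```python
from itertools import combinations


def density(adjacency):
    """Dyadic measure: edges present divided by edges possible, N*(N-1)/2."""
    n = len(adjacency)
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return edge_count / (n * (n - 1) / 2)


def local_clustering(adjacency, node):
    """Triadic measure: share of neighbor pairs that are connected to each other."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return linked / (k * (k - 1) / 2)


# Hypothetical graph: a triangle A-B-C plus a pendant node D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(density(graph), local_clustering(graph, "A"))
```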
Interpretation of Density and Clustering Coefficient.
• High Density/CC
  • Even if some of the nodes disappear, information can still reach the other nodes.
• Low Density/CC
  • If some of the nodes disappear, the network might become disconnected.
Enron Network
Source: Jana Diesner, Kathleen M. Carley. Exploration of Communication Networks from the Enron Email Corpus
Year: 2000 | Density: 0.018 | # Disconnected components: 96
Year: 2001 | Density: 0.031 | # Disconnected components: 39
Social Capital
• Networks of relationships among people who live and work in a particular society, enabling that society to function effectively.
• Social capital refers to an individual’s social network and the resources embedded within the networks that can benefit the individual in terms of achieving their goals and facilitating their actions.
• It is context dependent
  • A company is looking for a sales manager or another Java expert.
Viral Marketing
• You identify a few leaders/nodes/users in a network in the hope that they will be able to reach most of the users in the whole network.
Mobile Network
[Figure: two call networks. Left: lengthy calls but to few users (nodes 1 to 3). Right: short calls but to a large number of users (nodes 1 to 6).]
Mobile Network: How to select influential users?
[Figure: the same two call networks, with the influential user highlighted.]
Which nodes are most ‘central’?
Definition of ‘central’ varies by context/purpose.
Local measure: degree
Relative to rest of network: closeness, betweenness, eigenvector (Bonacich power centrality)
How evenly is centrality distributed among nodes? Centralization…
Network centrality
Centrality: who’s important based on their network position.
In each of the following networks, X has higher centrality than Y according to a particular measure: indegree, outdegree, betweenness, closeness.
Network centrality
• Degree centrality
  • Centralization
• Betweenness centrality
• Closeness centrality
• Bonacich power centrality
One who has many friends is most important.
Degree centrality (undirected)
When is the number of connections the best centrality measure?
o people who will do favors for you
o people you can talk to
Normalized degree centrality: divide the degree by the max. possible, i.e. (N-1)
Degree centralization examples
Example: financial trading networks
• High centralization: one node trading with many others
• Low centralization: trades are more evenly distributed
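A small sketch of normalized degree centrality and of degree centralization for undirected graphs. The slides do not give a centralization formula, so the version below assumes Freeman's definition (sum of gaps to the maximum degree, divided by (N-1)(N-2)); the two trading networks are invented.

```python
def normalized_degree(adjacency):
    """Degree divided by the maximum possible degree, N - 1."""
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


def degree_centralization(adjacency):
    """Freeman's degree centralization (an assumed formula, not from the slides)."""
    n = len(adjacency)
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    max_degree = max(degrees)
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical trading networks: one dominant trader versus evenly spread trades.
star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
ring = {"a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d", "a"}}

print(normalized_degree(star)["hub"], degree_centralization(star))
print(degree_centralization(ring))
```

The star reaches the maximum centralization and the ring the minimum, which mirrors the high and low centralization cases described above.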
When degree isn’t everything
In what ways does degree fail to capture centrality in the following graphs?
• ability to broker between groups
• likelihood that information originating anywhere in the network reaches you…
Betweenness: another centrality measure
• intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
• who has higher betweenness, X or Y?
Betweenness centrality: definition
$C_B(i) = \sum_{j<k} g_{jk}(i) / g_{jk}$
where $g_{jk}$ = the number of shortest paths connecting j and k, and $g_{jk}(i)$ = the number of those paths that actor i is on.
Usually normalized by the number of pairs of vertices excluding the vertex itself:
$C_B'(i) = C_B(i) / [(n-1)(n-2)/2]$
Adapted from James Moody
Betweenness on toy networks
• Non-normalized version:
A - B - C - D - E
A lies between no two other vertices.
B lies between A and 3 other vertices: C, D, and E.
C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E).
Note that there are no alternate paths for these pairs to take, so C gets full credit.
$C_B(i) = \sum_{j<k} g_{jk}(i) / g_{jk}$
where $g_{jk}$ = the number of shortest paths connecting j and k, and $g_{jk}(i)$ = the number of those paths that actor i is on.
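The toy values for the chain A, B, C, D, E can be checked with a direct, unoptimized implementation of the formula. This is a sketch for small undirected graphs; it counts shortest paths with a breadth-first search and uses the fact that i lies on a shortest j-k path exactly when d(j,i) + d(i,k) = d(j,k).

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, source):
    """Return (distance, number of shortest paths) from source to reachable nodes."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                count[neighbor] += count[node]  # extend every shortest path to node
    return dist, count


def betweenness(adjacency):
    """Non-normalized betweenness: sum over pairs j<k of g_jk(i) / g_jk."""
    info = {node: bfs_counts(adjacency, node) for node in adjacency}
    scores = {node: 0.0 for node in adjacency}
    for j, k in combinations(sorted(adjacency), 2):
        dist_j, count_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adjacency:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, count_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += count_j[i] * count_i[k] / count_j[k]
    return scores


chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
scores = betweenness(chain)
n = len(chain)
normalized = {node: s / ((n - 1) * (n - 2) / 2) for node, s in scores.items()}
print(scores)
```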
Betweenness on toy networks
• Non-normalized version:
[Figure: toy network with nodes A, B, C, D, E, F]
Example
Nodes are sized by degree, and colored by betweenness.
Can you spot nodes with high betweenness but relatively low degree?
What about high degree but relatively low betweenness?
Closeness: another centrality measure
• What if it’s not so important to have many direct friends?
• Or be “between” others
• But one still wants to be in the “middle” of things, not too far from the center
Closeness centrality: definition
Closeness is based on the length of the average shortest path between a vertex and all other vertices in the graph.
Closeness centrality:
$C_C(i) = \left[ \sum_{j=1}^{N} d(i,j) \right]^{-1}$
Normalized closeness centrality:
$C_C'(i) = C_C(i) \cdot (N-1)$
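Closeness centrality: toy example
A - B - C - D - E
$C_C'(A) = \left[ \frac{\sum_{j=1}^{N} d(A,j)}{N-1} \right]^{-1} = \left[ \frac{1+2+3+4}{4} \right]^{-1} = \left[ \frac{10}{4} \right]^{-1} = 0.4$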
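The toy calculation for node A can be reproduced with a short sketch of normalized closeness, (N-1) divided by the sum of shortest-path distances. It assumes a connected, undirected, unweighted graph.

```python
from collections import deque


def closeness(adjacency, node):
    """Normalized closeness: (N - 1) / sum of shortest-path distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return (len(adjacency) - 1) / sum(dist.values())


chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
print(closeness(chain, "A"))  # end of the chain
print(closeness(chain, "C"))  # middle of the chain
```

The middle node C scores higher than the end node A, matching the intuition of being "in the middle of things".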
• Degree
  • number of connections
  • denoted by size
• Closeness
  • length of shortest path to all others
  • denoted by color
How closely do degree and betweenness correspond to closeness?
Bonacich power centrality
Finding experts !
• Goal:
  • We would like to find good newspapers.
  • Don’t just find newspapers. Find “experts” – people who link in a coordinated way to good newspapers.
• Idea:
  • Links as votes?
  • A page is more important if it has more incoming links.
Hubs and Authorities
Hubs
• pages that provide lots of useful links to relevant content pages (topic authorities).
• Points to a lot of other pages
• Example: Yahoo
Authorities
• Authorities are pages that are recognized as providing significant, trustworthy, and useful information on a topic.
• In-degree (number of pointers to a page) is one simple measure of authority.
• Example: Authority view on some subject
Reciprocity
• Directed network concept.
• Likelihood of double (mutual) links occurring.
• Can be studied in email, world trade, WWW, transportation networks, etc.
• Interpretation: Mutual links (in both directions) facilitate the transportation process.
• Measured as: (how many pairs are pointing to each other) / (total number of edges)
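A minimal sketch of the reciprocity ratio for a directed network. It assumes one common reading of the measure: every link whose reverse link also exists counts as reciprocated (so a mutual pair contributes two links), divided by the total number of directed links. The email edges are made up.

```python
def reciprocity(edges):
    """Share of directed links (u, v) for which the reverse link (v, u) also exists."""
    edge_set = set(edges)
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Hypothetical email network: (sender, receiver) pairs.
emails = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("d", "a")]
print(reciprocity(emails))
```

If mutual pairs should instead be counted once each, the numerator halves; which convention the slide intends is not stated.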
A lot of weak ties or some strong ties ?
• Relationships demand time but time is limited.
• So should we restrict ourselves to a small number of friends, or keep a large number of friends?
• Strong and weak ties are an intangible concept.
• In job searches, it is more important to have a lot of weak ties than a few strong ties.
• Triadic closure: A friend of a friend is also a friend (or will become one).
Source:
1) https://www.forbes.com/sites/jacobmorgan/2014/03/11/every-employee-weak-ties-work/#c38c3b631681
2) https://www.socialmediatoday.com/content/strong-and-weak-ties-why-your-weak-ties-matter
3) The Strength of Weak Ties. Mark Granovetter
Bridge and Weak ties
Source: https://info207.w.uib.no/2014/12/01/strong-and-weak-ties-in-social-networks-2/
A bridge is a weak tie as it helps in bridging the information gap.
Cut vertices and cut edges (Bridge)
• A cut vertex (or articulation point) is a vertex which, when removed with all its incident edges, leaves behind a subgraph with more connected components than were found in the original graph
• The removal of a cut vertex from a connected graph produces a subgraph that is not connected
• An edge whose removal produces a graph with more connected components than in the original graph is called a cut edge or bridge
Example
Find the cut vertices and cut edges in the graph below:
Example
Original graph:
Vertex b is a cut vertex:
Vertex c is a cut vertex:
Vertex e is a cut vertex:
Example
Cut edges are: {a, b} and {c, e}
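Cut vertices and cut edges can be found by brute force on small graphs: remove one vertex (or edge) at a time and check whether the number of connected components grows. The example's figure is not reproduced, so the graph below is a hypothetical one, constructed only so that it has the same cut vertices (b, c, e) and cut edges ({a, b}, {c, e}) as the example.

```python
def count_components(adjacency):
    """Number of connected components, via depth-first search."""
    seen, count = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return count


def cut_vertices(adjacency):
    """Vertices whose removal (with incident edges) increases the component count."""
    base = count_components(adjacency)
    result = []
    for vertex in adjacency:
        reduced = {n: nbrs - {vertex} for n, nbrs in adjacency.items() if n != vertex}
        if count_components(reduced) > base:
            result.append(vertex)
    return sorted(result)


def cut_edges(adjacency):
    """Edges (bridges) whose removal increases the component count."""
    base = count_components(adjacency)
    edges = {tuple(sorted((u, v))) for u in adjacency for v in adjacency[u]}
    result = []
    for u, v in edges:
        reduced = {n: set(nbrs) for n, nbrs in adjacency.items()}
        reduced[u].discard(v)
        reduced[v].discard(u)
        if count_components(reduced) > base:
            result.append((u, v))
    return sorted(result)


# Hypothetical graph: leaf a, triangle b-c-d, bridge c-e, triangle e-f-g.
graph = {
    "a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d", "e"}, "d": {"b", "c"},
    "e": {"c", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
print(cut_vertices(graph), cut_edges(graph))
```

Linear-time algorithms exist for larger graphs; the remove-and-recount approach is used here only because it mirrors the definitions directly.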
Clique/Complete Graph
• A completely connected network, where all nodes are connected to every other node. These networks are symmetric in that all nodes have in-links and out-links from all others.
Communities
• Nodes in a community are more densely connected with each other than with the rest of the network.
• Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.
Additional source: https://en.wikipedia.org/wiki/Modularity_(networks)
Zachary's karate club
• A university karate club
• Social network (SN) of a karate club studied by Wayne W. Zachary from 1970 to 1972
• Interactions of 34 members of a karate club.
• During the study, a conflict arose between the administrator (34) and the Instructor (1), which led to the split of the club into two.
“An Information Flow Model for Conflict and Fission in Small Groups" by Wayne W. Zachary.
Zachary's karate club (continued)
• Half of the members formed a new club around the instructor (1).
• The other half stayed with the president or administrator (34).
• Members from the other part found a new instructor or gave up karate.
Tale of 2 parties !
US Elections 2004
Each node is a political blog
The Political Blogosphere and the 2004 U.S. Election: Divided They Blog . Lada Adamic and Natalie Glance
Finding Communities
Types of communities
Community within a community
Discovering Social Circles in Ego Networks. Julian McAuley and Jure Leskovec
Modularity
• Modularity is a measure to assess the strength of a network’s structure.
• It was designed to measure the strength of division of a network into modules (also called groups, clusters or communities).
• Intuition behind the modularity function:
  • Given a network or graph G(N, E), it measures how well a set of nodes are connected with each other, compared to a random arrangement of the nodes.
• Two definitions are well known:
  • Louvain
  • Newman-Girvan
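To make the intuition concrete, here is a sketch of one standard form of modularity for an undirected, unweighted graph, Q = sum over communities of [L_c/m - (d_c/2m)^2], where m is the total number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. The formula is not given on the slides and is supplied here as an assumption; the two-triangle graph is made up.

```python
def modularity(adjacency, communities):
    """Modularity of a partition: within-community edge share minus its expected value."""
    m = sum(len(neighbors) for neighbors in adjacency.values()) / 2  # total edges
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in community for v in adjacency[u] if v in community) / 2
        degree_sum = sum(len(adjacency[u]) for u in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Hypothetical graph: two triangles joined by a single edge c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
q_split = modularity(two_triangles, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(two_triangles, [{"a", "b", "c", "d", "e", "f"}])
print(q_split, q_single)
```

Splitting along the single connecting edge scores higher than lumping everything into one community, which is the behavior a community detection method tries to exploit.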
Method 1: Louvain
• Two steps process until the convergence:
• 1st Step: Assignment of nodes to communities, favoring local optimizations of modularity.
• 2nd Step: Definition of a new coarse-grained network in terms of the communities found in the first step.
• These two steps are repeated until no further modularity-increasing reassignments of communities are possible.
• Pros:
  • Very fast; can identify communities in a network of millions of nodes in a few minutes.
  • Works through a hierarchy of communities.
• Cons: Only identifies very small or large communities.
Method 2: Walk Trap
• Approach based on random walks.
• If you perform random walks on the graph, then the walks are more likely to stay within the same community as there are only a few edges that lead outside a given community.
Method 3: Label propagation algorithm
• Every node is assigned one of k labels.
• The method then iteratively re-assigns labels to nodes in a way that each node takes the most frequent label of its neighbors in a synchronous manner.
• The method stops when the label of each node is one of the most frequent labels in its neighborhood.
• Fast but yields different results based on the initial configuration (which is decided randomly)
• Better to run it a large number of times (say, 1000 times for a graph) and then build a consensus labeling, which could be tedious.
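A rough sketch of label propagation follows. It deviates from the description above in two labeled ways: every node starts with its own unique label, and labels are updated asynchronously in a random order (one node at a time) rather than synchronously, a variant commonly used to avoid oscillation. The function name, the `seed` parameter and the example graph are hypothetical.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=0, max_rounds=100):
    """Asynchronous label propagation (an assumed variant, see the lead-in)."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # assumption: unique initial labels
    nodes = sorted(adjacency)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # random visiting order, so results depend on the seed
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[neighbor] for neighbor in adjacency[node])
            best = max(counts.values())
            candidates = sorted(label for label, c in counts.items() if c == best)
            if labels[node] not in candidates:
                labels[node] = rng.choice(candidates)  # ties broken at random
                changed = True
        if not changed:
            break  # every label is already among the most frequent neighbor labels
    return labels


# Hypothetical graph: two triangles joined by a single edge c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
labels = label_propagation(two_triangles, seed=1)
print(labels)
```

Running it with several seeds and comparing the partitions is a small-scale version of the consensus idea mentioned above.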
Single Vs. MultiLayer Networks
[Figure: individual (single) networks versus multilayer networks (holistic view), combined into a resultant network]
Single Layer Network
“This paper examines the degree to which the failure of one bank would cause the subsequent collapse of other banks. Using unique data on interbank payment flows [in the U.S.], the magnitude of bilateral federal funds exposures is quantified. These exposures are used to simulate the impact of various failure scenarios, and the risk of contagion is found to be economically small.”
Furfine (2003), Interbank Exposures: Quantifying the Risk of Contagion, JMCB
[Figure: layers of the interbank network: loans, forex, derivatives, securities, and the combined layer]
Source: Sebastian Poledna, Jose Luis Molina-Borboa, Serafín Martínez-Jaramillo, Marco van der Leij, Stefan Thurner. The multi-layer network nature of financial systemic risk and its implications
Customers’ spending behavior
• Intersection between social behavior and income levels.
• Localities (cell tower areas) with diverse network interactions tend to have higher economic development.
• People with higher diversity in social contacts tend to have higher incomes.
• A second line of investigation has focused on using homophily and social closeness to predict the products of interest to individuals
• Easily available data on prospects, such as demographics and sociographic factors often have limited ability to predict future spending behavior.
• Highly social people are also likely to earn higher wages, find better jobs, and live healthier lives.
• There is growing evidence that social behavior is a fundamental human characteristic that affects multiple aspects of human life.
Source: Vivek K. Singh, Laura Freeman, Bruno Lepri, Alex (Sandy) Pentland. Classifying Spending Behavior using Socio-Mobile Data. 2013 International Conference on Social Computing
Spread of Economic shock
Trading
Source: http://www.cepii.fr/PDF_PUB/wp/2013/wp2013-24.pdf
Viral Marketing
• You identify a few leaders/nodes/users in a network in the hope that they will be able to reach most of the users in the whole network.
Linear Threshold (LT) Model
• A node v has a random threshold $\theta_v \sim U[0,1]$
• A node v is influenced by each neighbor w according to a weight $b_{w,v}$ such that
$\sum_{w \text{ neighbor of } v} b_{w,v} \le 1$
• A node v becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active:
$\sum_{w \text{ active neighbor of } v} b_{w,v} \ge \theta_v$
Linear Threshold (LT) Model
• Different individuals have different thresholds.
• Individuals' thresholds may be influenced by many factors: social economic status, education, age, personality, etc.
• Relate the “threshold” to the utility one gets from participating in collective behavior or not: using a utility function, each individual calculates the cost and benefit of undertaking an action.
• The situation may change the cost and benefit of the behavior, so the threshold is situation-specific.
• The distribution of the thresholds determines the outcome of the aggregate behavior (for example, public opinion).
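The activation rule of the Linear Threshold model can be sketched as a simple round-based simulation. The function name, the `weights[v][w]` layout (influence of neighbor w on v), and the example network with its weights and thresholds are all assumptions made for illustration; thresholds are drawn uniformly at random only when none are supplied.

```python
import random


def linear_threshold(weights, seeds, thresholds=None, seed=0):
    """Spread activation until no inactive node reaches its threshold.

    weights[v][w] is b_{w,v}; the incoming weights of each node should sum to <= 1.
    """
    rng = random.Random(seed)
    if thresholds is None:
        thresholds = {v: rng.random() for v in weights}  # theta_v ~ U[0, 1)
    active = set(seeds)
    while True:
        newly_active = {
            v for v in weights
            if v not in active
            and sum(b for w, b in weights[v].items() if w in active) >= thresholds[v]
        }
        if not newly_active:
            return active  # stop: no further node can be activated
        active |= newly_active


# Hypothetical network with fixed thresholds so the run is reproducible.
weights = {
    "s": {},
    "u": {"s": 0.5, "x": 0.2},
    "x": {"s": 0.2, "u": 0.3},
    "y": {"x": 0.1},
}
thresholds = {"s": 0.9, "u": 0.4, "x": 0.45, "y": 0.6}
final_active = linear_threshold(weights, seeds={"s"}, thresholds=thresholds)
print(sorted(final_active))
```

In this run the seed activates u directly, x follows once both s and u are active, and y never reaches its threshold, which illustrates how the distribution of thresholds determines the aggregate outcome.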
Example
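[Figure: step-by-step activation under the LT model. Legend: inactive node, active node, threshold, active neighbors. Labeled nodes include U, X, Y, v and w; the process stops when no further node can be activated.]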
Summary
• Using network science
  • Graph theory (diameter, centrality)
  • Social science (strong and weak ties)
  • Physics (communities)
• Problem domains
  • Viral marketing
  • Sales (based on centrality)
  • Recommendation of products (using homophily)
Ideas for Master Thesis
1 a) Financial Interactions
1 b) Financial Interactions
1 c) Network of Global Corporate Control
Vitali S, Glattfelder JB, Battiston S (2011) The Network of Global Corporate Control. PLOS ONE 6(10): e25995. https://doi.org/10.1371/journal.pone.0025995
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0025995
2) Mobile Interactions
Demo time!
https://courses.cs.ut.ee/2019/bda/spring/Main/Practice