P2P Simulation and Reality
Application Technology Workshop: P2P and GRID, 28/01/2004
Sam Joseph
Laboratory for Interactive Learning Technology (LILT), Department of Information and Computer Sciences, University of Hawai'i at Manoa
Sam Joseph
Strategic Software Division, Graduate School of Information Science and Technology, University of Tokyo
Personal Profile
Founder of NeuroGrid project: http://www.neurogrid.net
Sub-editor on the P2PJournal: http://www.p2pjournal.com
MetaData subgroup leader for P2P research groups: http://www.irtf.org/charters/p2prg.html
Talk Contents
• What is a Simulation?
• Why Simulate P2P?
• Simulation Methodology
• P2P Simulation Issues
• The Dangers of Simulation
• Real P2P Systems
• Types of Simulator
• An Extendable Simulator
What is a simulation?
• A simulation is “an attempt to model a system in order to study it scientifically” (Law & Kelton, 2000)
• Real-world complexity often prevents direct mathematical analysis of the model
• Thus a numerical approach, or simulation, is required
• This requires an abstraction of the real system, since otherwise we would just be building the real system
• The central question is which abstractions to make, as one can accidentally abstract away essential details
• For example, is peer heterogeneity required in the simulation?
Why Simulate?
• Testing scalability to large numbers of peers requires … large numbers of peers
• Thus one motivation to simulate comes from the expense of running the real system
• Testing solutions to malicious peers requires … malicious peers
• And introducing malicious peers into a real system is somewhat socially irresponsible
• However, the crucial question is: are simulation studies relevant to real P2P systems?
Simulation Methodology
• All too often simulation "studies" involve building a model and using the results of a single run to obtain the "answer". (Law & Kelton, 2000)
• This pattern is replicated across P2P simulation studies
• Drawing valid and credible conclusions requires:
  • Careful assessment of assumptions
  • Appropriate probability distributions of starting parameters
  • Subjecting results to the appropriate statistical analysis
P2P Simulation Issues 1
Content Model
1. Representational complexity
  • document is represented by hash X
  • document is in category X and no other
  • document is related to “whales” and “dolphins”
  • document “defines” “whales” and “has illustrations of” “dolphins”
2. Vocabulary
  • whether users map fundamental concepts onto the same terms, e.g. I say “whale” and you say “kujira”, but we both mean the same marine mammal
3. Fundamental concepts
  • agreement about fundamental concepts; you say this marine mammal is food and I say it is sentient
  • content-centric or user-centric?
4. Dishonesty
  • e.g. you say this is a “revolutionary product” and I say this is “unsolicited junk”
Content Model
• Each content issue is subject to dynamic evolutionary processes, where users change opinions and strategies over time
• More on content modeling in P2P networks in Joseph & Hoshiai (2003)
Network state serialization
• Allows stopping and starting
• Danger of biasing statistical analysis
• Network Markup Language (NML)
Visualization, unit-testing
• Visualization greatly aids debugging
• Unit-testing is particularly important in an extendible framework
P2P Simulation Issues 2
Parameter Distributions
• Starting topology, content & query distributions and churn rates
• Determine from the real system where available
• Lv et al. (2002) showed different macro-behaviour depending on whether the topology was constructed using a Zipfian model or using real-world data
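When no real-world trace is available, a Zipfian model is one common way to generate starting content or query popularity. The following is a minimal sketch only, assuming popularity rank k is drawn with probability proportional to 1/k^s; the class name `ZipfSampler` and its parameters are hypothetical and are not part of the NeuroGrid simulator.

```java
import java.util.Random;

/** Hypothetical helper: samples ranks 1..n with P(rank = k) proportional to 1 / k^s. */
class ZipfSampler {
    private final double[] cdf;   // cumulative probabilities for ranks 1..n
    private final Random random;

    ZipfSampler(int n, double s, long seed) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        cdf = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
            cdf[k - 1] = sum;
        }
        // Normalise so that the last entry is exactly 1.0
        for (int k = 0; k < n; k++) cdf[k] /= sum;
        random = new Random(seed);
    }

    /** Returns a rank in 1..n, where rank 1 is the most popular item. */
    int nextRank() {
        double u = random.nextDouble();
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {                       // binary search for the first cdf[i] >= u
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] >= u) hi = mid; else lo = mid + 1;
        }
        return lo + 1;
    }
}
```

Fixing the seed makes a run repeatable; varying it gives the different selections from the same input distribution that the results analysis needs.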
Results Analysis
• Run multiple simulations starting with different selections from the same input probability distributions
• Present results indicating confidence intervals
• Or repeat the assessment of confidence intervals, after sets of additional simulations, until the specified precision is achieved
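The replication procedure above can be sketched as follows. This is an illustrative sketch, not the analysis code of any particular simulator: `ReplicationAnalysis`, `confidenceInterval` and `runUntilPrecise` are hypothetical names, the simulation is assumed to be a function from a random seed to a single output measure, and the interval is a 95% Student-t interval (with a normal approximation above 30 degrees of freedom).

```java
import java.util.Arrays;
import java.util.function.LongToDoubleFunction;

class ReplicationAnalysis {
    // Two-sided 95% Student-t critical values for 1..30 degrees of freedom
    private static final double[] T_975 = {
        12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
        2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
        2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042 };

    static double tCritical(int df) {
        return df <= 30 ? T_975[df - 1] : 1.96;   // normal approximation for large df
    }

    /** Returns {mean, halfWidth} of a 95% confidence interval over independent runs. */
    static double[] confidenceInterval(double[] samples) {
        int n = samples.length;
        if (n < 2) throw new IllegalArgumentException("need at least 2 replications");
        double mean = 0.0;
        for (double x : samples) mean += x;
        mean /= n;
        double ss = 0.0;
        for (double x : samples) ss += (x - mean) * (x - mean);
        double stdErr = Math.sqrt(ss / (n - 1) / n);   // sample std deviation / sqrt(n)
        return new double[] { mean, tCritical(n - 1) * stdErr };
    }

    /**
     * Adds replications in batches (one seed per run) until the half-width is at most
     * relPrecision * |mean| or maxRuns is reached. Returns {mean, halfWidth, runsUsed}.
     */
    static double[] runUntilPrecise(LongToDoubleFunction simulation,
                                    double relPrecision, int batch, int maxRuns) {
        if (batch < 2 || maxRuns < batch) throw new IllegalArgumentException("bad batch/maxRuns");
        double[] results = new double[0];
        long seed = 0;
        double[] ci;
        do {
            int old = results.length;
            results = Arrays.copyOf(results, old + batch);
            for (int i = old; i < results.length; i++) results[i] = simulation.applyAsDouble(seed++);
            ci = confidenceInterval(results);
        } while (ci[1] > relPrecision * Math.abs(ci[0]) && results.length < maxRuns);
        return new double[] { ci[0], ci[1], results.length };
    }
}
```

A call such as `runUntilPrecise(seed -> runOnce(seed), 0.05, 5, 1000)` would keep adding batches of five runs until the interval is within 5% of the mean, where `runOnce` stands for whatever executes one simulation.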
P2P Simulation Issues 3
Dangers of Simulation
Case study: Query Message Combination Protocol (QMCP)
• QMCP is a Gnutella protocol modification that combines multiple queries, which could lead to more efficient use of bandwidth (based on a 2001 study)
• However, network protocols change frequently: do older results about the Gnet still apply?
• Failing to consider lower network levels may leave you suggesting redundant things
  • e.g. replicating Nagle's algorithm in the overlay when it already exists in TCP/IP
Real P2P Systems
• Saroiu et al. (2001) Gnutella/Napster study:
  • Significant heterogeneity: bandwidth, latency and availability vary by between 3 and 5 orders of magnitude
  • Peers deliberately misreport information if there is an incentive to do so
• Clip2 showed that the Gnet follows a power law; Saroiu et al. show resistance to random failure, but fragmentation under directed attack
• Ripeanu et al. (2002) show Gnutella diverging from a power-law network
• Ge et al. (2002): the unregulated and transitory nature of P2P systems makes it difficult to evaluate assumptions in the real system
Types of Simulator
Hierarchy of approaches
• Numerical model
  • SimP2 (Kant & Iyer, 2003)
  • Queuing Model (Ge et al., 2002?)
• Flow-based simulation
  • Narses (Baker & Giuli, 2002)
• Event-based simulation
  • NeuroGrid (Joseph, 2003)
  • QueryCycle (Schlosser et al., 2002)
• Packet-based simulation
  • PLP2P (He et al., 2003)
  • NS-2
• Real system
NeuroGrid Simulator
Abstract Classes
• Keyword
• Document
• Message
• Node
• Network
• MessageHandler
Extending the above classes allows us to create different P2P networks
• Gnutella
• Freenet
• NeuroGrid
• Pastry
Action Event framework
[Figure: three timelines of timesteps 0 to 9, showing the actions queued at each timestep as execution proceeds]
Execution causes two actions to be inserted at timestep 3
Execution causes one action to be inserted at timestep 4, another at timestep 8
Execution causes two more actions to be inserted at timestep 8
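The behaviour described in these captions, where executing an action inserts further actions at later timesteps, can be sketched with a timestep-ordered schedule. This is a minimal illustration under assumed names (`SimAction`, `ActionSchedule`); it is not the NeuroGrid simulator's actual event framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical action: when executed it may insert further actions into the schedule. */
interface SimAction {
    void execute(int now, ActionSchedule schedule);
}

class ActionSchedule {
    // Actions waiting to run, keyed by timestep and kept in timestep order
    private final TreeMap<Integer, List<SimAction>> pending = new TreeMap<>();
    private int current = 0;
    private int executed = 0;

    void insert(int timestep, SimAction action) {
        if (timestep < current) throw new IllegalArgumentException("cannot schedule in the past");
        pending.computeIfAbsent(timestep, t -> new ArrayList<>()).add(action);
    }

    /** Runs all pending actions in timestep order, up to and including lastTimestep. */
    void run(int lastTimestep) {
        while (!pending.isEmpty() && pending.firstKey() <= lastTimestep) {
            current = pending.firstKey();
            // Detach this timestep's list so actions can safely insert new ones while we iterate
            List<SimAction> due = pending.remove(current);
            for (SimAction action : due) {
                action.execute(current, this);
                executed++;
            }
        }
    }

    int pendingAt(int timestep) {
        List<SimAction> list = pending.get(timestep);
        return list == null ? 0 : list.size();
    }

    int executedCount() { return executed; }
}
```

An action scheduled at timestep 1 that calls `schedule.insert(3, ...)` twice reproduces the first caption: after it runs, two actions are waiting at timestep 3.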
Conclusion
• P2P systems are characterized by many of the annoying real-life complexities that prevent simple analysis and simulation
• For example:
  • high turnover of peers
  • download & connection failures
  • large numbers of stochastically behaving peers
• Simplifications used for tractable simulations can lead to unrealistic behaviour
• Effective use of simulation studies requires a lot of work, but not as much as full implementation?
Questions?
Gnutella Search
[Figure: a query (Query-G067) flooding a network of nodes N001 to N007. Each node keeps a list of seen GUIDs; message copies carry TTL values falling from 3 to 0 and are annotated "Seen it", "Match", "Stop" and "LOOP".]
• Gnutella uses broadcast search
• The spread of the messages is limited by TTL and GUID
• TTL: Time To Live, the number of hops before a message is expired
• GUID: Globally Unique Identifier, allows nodes to identify loops
Abstract Class Extension
Extending the abstract classes implements P2P functions
[Figure: class diagram with four groups of labels: Keyword / SimpleKeyword / Keyword ID Hashtable; Document / SimpleDocument / Document ID Hashtable; Message / SimpleMessage / Message ID Hashtable; Node / SimpleNode / Node ID Hashtable]
• E.g. the Message abstract class contains Document and Keyword array variables
• SimpleMessage implements a second constructor, which is used when nodes forward messages

// Second constructor: builds the copy of a message that a node forwards
public SimpleMessage(Message p_message)
  throws Exception
{
  if(p_message == null) throw new Exception("Message is null");
  o_message_ID = p_message.getMessageID();  // GUID: the copy keeps the original message ID
  o_TTL = p_message.getTTL() - 1;           // TTL decrement
  o_keywords = p_message.getKeywords();
  o_document = p_message.getDocument();
  // etc …
}
processMessage()
• The Node abstract class has a GUID Hashtable

protected Hashtable o_seenGUIDs = new Hashtable(10);

• The Node abstract class has an abstract processMessage method

public abstract void processMessage(Message p_message, boolean p_start)
  throws Exception;

• SimpleNode implements this method

public void processMessage(Message p_message, boolean p_start)
  throws Exception
{
  if(p_message == null) throw new Exception("p_message is null");
  // Message seen? If its GUID is already in the Hashtable, drop it
  String x_previous = (String)(o_seenGUIDs.get(p_message.getMessageID()));
  if(x_previous != null)
    return;
  // Message ID of the incoming Message goes into the Hashtable
  o_seenGUIDs.put(p_message.getMessageID(),p_message.getMessageID());
  // etc …
Forwarding Messages
Also in the SimpleNode processMessage implementation:

Enumeration x_enum = o_conn_list.elements();
while(x_enum.hasMoreElements())
{
  // Forward to the next node
  x_temp_node = (Node)(x_enum.nextElement());
  // Create new message
  x_new_message = new SimpleMessage(p_message);
  x_new_message.setPreviousLocation(this);
  o_sending_message.put(x_temp_node,x_temp_node);
  x_temp_node.addMessageToInbox(x_new_message);
}
// etc …

When a message is forwarded, a new message object is created through the SimpleMessage constructor, which ensures that the GUID is maintained and the TTL is decremented
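The GUID check and TTL decrement can be seen working together in a small self-contained sketch of a broadcast search. It is an illustration only: `FloodNode`, `FloodSearch` and `Hop` are hypothetical names, delivery is modelled with a simple queue rather than per-node inboxes, and not sending a query back to the node it came from is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FloodNode {
    final String name;
    final List<FloodNode> neighbours = new ArrayList<>();
    final Set<String> seenGuids = new HashSet<>();   // GUIDs this node has already processed
    int received = 0;                                // message copies delivered to this node

    FloodNode(String name) { this.name = name; }

    static void connect(FloodNode a, FloodNode b) {
        a.neighbours.add(b);
        b.neighbours.add(a);
    }
}

class FloodSearch {
    /** One in-flight copy of the query. */
    private static final class Hop {
        final FloodNode target, from;
        final int ttl;
        Hop(FloodNode target, FloodNode from, int ttl) {
            this.target = target; this.from = from; this.ttl = ttl;
        }
    }

    /** Floods a query from start; returns the names of the nodes that processed it. */
    static Set<String> flood(FloodNode start, String guid, int ttl) {
        Set<String> reached = new HashSet<>();
        ArrayDeque<Hop> inFlight = new ArrayDeque<>();
        inFlight.add(new Hop(start, null, ttl));
        while (!inFlight.isEmpty()) {
            Hop hop = inFlight.poll();
            FloodNode node = hop.target;
            node.received++;
            if (!node.seenGuids.add(guid)) continue;   // GUID seen before: a loop, drop the copy
            reached.add(node.name);
            if (hop.ttl <= 0) continue;                // TTL expired: process but do not forward
            for (FloodNode next : node.neighbours) {
                // Each forwarded copy keeps the GUID and carries a decremented TTL
                if (next != hop.from) inFlight.add(new Hop(next, node, hop.ttl - 1));
            }
        }
        return reached;
    }
}
```

With a triangle of nodes, the second copy to arrive at any node is dropped by the GUID check, which is the "LOOP" case in the Gnutella search figure.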
NeuroGrid Search
[Figure: a query (Query-G067) routed through nodes N001 to N007. Each node has a seen-GUID list and a knowledge base (KB) mapping keywords to nodes, e.g. A – N003, B – N002, C – N003 and A – N004, B – N005, C – N005; messages carry TTL values from 3 down to 0 and are annotated "Match" and "Stop".]
• NeuroGrid nodes learn data location and forward accordingly
• Human networking analogy
NeuroGrid Nodes
• NeuroGrid nodes have MultiHashtables that associate a single key with a Vector of objects

// MultiHashtable used to store which documents are in this node (key = keyword)
protected MultiHashtable o_contents = new MultiHashtable();
// MultiHashtable used to store information about documents in other nodes (key = keyword)
protected MultiHashtable o_knowledge = new MultiHashtable();

• After a successful search, processMessage updates the knowledge base of the node that generated the query

// The keywords in the incoming message
for(int i=0;i<x_keywords.length;i++)
{
  // Document with that keyword present?
  x_docs = (Vector)(o_contents.get(x_keywords[i]));
  if(x_docs != null)
  {
    if(x_docs.contains(p_message.getDocument()))
    {
      if(Network.o_learning)
      {
        // Update original Node KB
        x_start_node = p_message.getStart();
        x_start_node.addConnection(this);
        x_start_node.addKnowledge(this,x_keywords);
      }
      break; // stop checking once we find a node
    }
  }
}
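The effect of this learning step on later searches can be illustrated with a small self-contained sketch. It is a simplification under stated assumptions, not the NeuroGrid implementation: `LearningNode` is a hypothetical class, plain `HashMap`s stand in for MultiHashtable, a match is "this node holds any document for the keyword", and a node forwards to known holders of a keyword if it has any, otherwise to all its connections.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class LearningNode {
    final String name;
    final Map<String, Set<String>> contents = new HashMap<>();         // keyword -> documents held here
    final Map<String, Set<LearningNode>> knowledge = new HashMap<>();  // keyword -> nodes known to match
    final Set<LearningNode> connections = new LinkedHashSet<>();

    LearningNode(String name) { this.name = name; }

    void addDocument(String keyword, String docId) {
        contents.computeIfAbsent(keyword, k -> new HashSet<>()).add(docId);
    }

    void addKnowledge(LearningNode node, String keyword) {
        knowledge.computeIfAbsent(keyword, k -> new LinkedHashSet<>()).add(node);
    }

    /** Nodes to forward to: known holders of the keyword if any, otherwise all connections. */
    Collection<LearningNode> candidates(String keyword) {
        Set<LearningNode> known = knowledge.get(keyword);
        return (known != null && !known.isEmpty()) ? known : connections;
    }

    /** Searches from this node; returns how many nodes were visited before stopping. */
    int search(String keyword, int ttl, boolean learning) {
        Map<LearningNode, Integer> ttlAt = new HashMap<>();   // also serves as the "seen" set
        Deque<LearningNode> queue = new ArrayDeque<>();
        ttlAt.put(this, ttl);
        queue.add(this);
        int visited = 0;
        while (!queue.isEmpty()) {
            LearningNode node = queue.poll();
            visited++;
            if (node != this && node.contents.containsKey(keyword)) {
                if (learning) {               // update the knowledge base of the querying node
                    connections.add(node);
                    addKnowledge(node, keyword);
                }
                return visited;               // stop at the first match
            }
            int remaining = ttlAt.get(node);
            if (remaining <= 0) continue;
            for (LearningNode next : node.candidates(keyword)) {
                if (!ttlAt.containsKey(next)) {
                    ttlAt.put(next, remaining - 1);
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

Running the same query twice from the same node shows the point of learning: the second search goes straight to the node that answered the first.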
Freenet Search
[Figure: a query (Query-G067) passed serially through nodes N001 to N007. Each node has a seen-GUID list and a KB mapping keys K002, K003 and K004 to nodes; messages carry TTL values between 10 and 20, with "Seen it" and "= match" annotations.]
• Freenet aggressively caches data while performing a serial search
• Routing uses document hashes
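A serial search with caching along the return path can be sketched as below. This is a heavily simplified illustration, not Freenet's real routing: `FreenetNode` is a hypothetical class, document hashes are small integers, and each neighbour is assumed to advertise a single `routingKey` so that requests go first to the neighbour whose key is numerically closest to the requested hash.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FreenetNode {
    final String name;
    final int routingKey;                                  // assumed: the key this node specialises in
    final Map<Integer, String> store = new HashMap<>();    // document hash -> data, including cached copies
    final List<FreenetNode> neighbours = new ArrayList<>();

    FreenetNode(String name, int routingKey) {
        this.name = name;
        this.routingKey = routingKey;
    }

    static void connect(FreenetNode a, FreenetNode b) {
        a.neighbours.add(b);
        b.neighbours.add(a);
    }

    /** Serial (one neighbour at a time) request; returns the data, or null if not found. */
    String request(int key, int hopsToLive, Set<FreenetNode> visited) {
        if (!visited.add(this)) return null;               // already tried on this request: avoid loops
        String data = store.get(key);
        if (data != null) return data;                     // held here, possibly as a cached copy
        if (hopsToLive <= 0) return null;
        // Try neighbours in order of how close their routing key is to the requested hash
        List<FreenetNode> order = new ArrayList<>(neighbours);
        order.sort(Comparator.comparingLong(n -> Math.abs((long) n.routingKey - key)));
        for (FreenetNode next : order) {
            data = next.request(key, hopsToLive - 1, visited);
            if (data != null) {
                store.put(key, data);                      // cache the data on the return path
                return data;
            }
        }
        return null;
    }
}
```

After one successful request every node on the path holds a copy, so a repeat request from the same node is answered locally.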