RAN Journal

Thursday, April 19, 2007

Evolution and Structure of the Internet

"Evolution and Structure of the Internet: A Statistical Physics Approach", by Romualdo Pastor-Satorras and Alessandro Vespignani. Cambridge University Press, 2004.

In my recent study on the structure of the Internet, I happened to read most of this book. It is a concise book on the structure of the Internet topology, its characterizing parameters, their statistical models, the topology-generation tools available in the field, and an analysis of their performance. It is an easy read for people who want to get an idea of the structure of the Internet, and an impressive one for people who want to understand it from a statistical point of view.

The Internet can be visualized as a connected graph, with routers as the nodes and their connections as the edges. I want to short-list the defining characteristics of the Internet topology; a short code sketch computing several of them follows the list:
  • degree: the number of edges branching from a node in the Internet topology (property of nodes). The Internet shows a heavy-tailed degree distribution, which means that while a huge number of nodes have very small degree, a small number of nodes have very large degree. This also reflects the "popular-get-more-popular" concept.
  • shortest path length: the number of edges on the shortest path between two end-points (property of node pairs). The Internet shows the small-world phenomenon, that is, the average shortest path length in the Internet is small.
  • betweenness: the number of shortest paths that pass through a particular node (property of nodes).
  • clustering coefficient: an index indicating how well a node's neighbors are connected among themselves (property of nodes).
  • scale-free behavior: a property of the topology as a whole. At whatever granularity we look at the Internet topology (AS topology, router topology, intra-AS router topology), its properties follow the same statistical model.
  • rich-club phenomenon: larger-degree nodes are connected to each other with high probability.
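Below is a minimal sketch, not from the book, of how these node-level metrics can be computed with the networkx library; the Barabási-Albert graph is only a synthetic stand-in for a measured Internet topology.

import networkx as nx
from collections import Counter

# Synthetic scale-free-ish toy graph standing in for a router-level map
G = nx.barabasi_albert_graph(n=1000, m=2, seed=42)

degree_distribution = Counter(dict(G.degree()).values())    # degree -> node count
avg_path_length = nx.average_shortest_path_length(G)        # small-world indicator
betweenness = nx.betweenness_centrality(G)                  # per-node betweenness
avg_clustering = nx.average_clustering(G)                   # clustering coefficient

print("maximum degree:", max(degree_distribution))
print("average shortest path length:", round(avg_path_length, 2))
print("highest betweenness:", round(max(betweenness.values()), 4))
print("average clustering coefficient:", round(avg_clustering, 3))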
The book also makes the following observations on the geographical characteristics of the Internet:
  • Router density is correlated with population density, and further with the wealth of a region.
  • Logical Internet distance (hop count) does not correlate well with geographical distance.
From the book, it is obvious that none of the existing Internet topology generators takes all of the above characteristics into consideration. It is important to go carefully through the documentation of the topology generators and choose one that correctly models the parameter(s) of your interest.


Wednesday, April 18, 2007

BID-Scheme

This is an entry after a very long time. After my internship at INRIA, France, my research direction suddenly shifted to bandwidth-based resource discovery. As a first step, I have been working on a landmarking scheme for distributed bandwidth prediction.

The bandwidth landmarking scheme I developed, which I call the BID-scheme, is inspired by landmarking schemes for network latency prediction and by another bandwidth prediction scheme called BRoute. I will make another, more detailed entry on BRoute. I build the node state for bandwidth prediction in a way similar to the latency-based landmarking schemes, but use a deduction scheme similar to the one used in BRoute.

The landmarks in my BID-scheme are not chosen nodes as in the latency-based landmarking schemes. Instead, the BID-scheme uses selected autonomous system (AS) domains as landmark domains (not landmark nodes). Another difference from the latency-based landmarking schemes is that, in the BID-scheme, nodes are not placed in a Cartesian space -- that is impossible because bandwidth is not an additive measure like latency (nor does it obey the triangle inequality). Instead, the nodes in the BID-scheme probe the landmark domains to construct BID tables. Each row in a BID table belongs to a landmark AS and records information about it: the distance to it in AS hop counts, the average bottleneck bandwidth observed in reaching it, and a Bloom filter that stores information about the landmark AS's probability of being on the route towards other ASes.
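As an illustration of the data structure (my own sketch with hypothetical names, not the exact format used in the scheme), a BID-table row could look like this in Python, with a small home-grown Bloom filter recording which destination ASes were observed to be reached through the landmark AS:

import hashlib
from dataclasses import dataclass, field

class BloomFilter:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0                          # bit field packed into an int

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array |= 1 << pos

    def __contains__(self, item):
        return all((self.array >> pos) & 1 for pos in self._positions(item))

@dataclass
class BIDRow:
    landmark_as: int                            # AS number of the landmark domain
    as_hops: int                                # distance to it in AS hops
    avg_bottleneck_bw: float                    # average bottleneck bandwidth (Mbit/s)
    transit_filter: BloomFilter = field(default_factory=BloomFilter)

# Hypothetical example: routes toward AS 3356 and AS 1299 were seen to cross AS 7018.
row = BIDRow(landmark_as=7018, as_hops=3, avg_bottleneck_bw=42.0)
row.transit_filter.add(3356)
row.transit_filter.add(1299)
print(1299 in row.transit_filter)               # True (modulo a small false-positive rate)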

Deduction of the bandwidth between two end-points works in two phases: first, guess the most probable AS domain (from the landmark set) to lie between the two end-points; then deduce the end-to-end bandwidth from the two segments -- source to middle AS and middle AS to sink. The Bloom filters from the BID tables of both end-nodes are used to determine the most probable middle AS, and the respective bandwidths are used to determine the end-to-end bandwidth.
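Here is a minimal sketch of the deduction step, reusing the hypothetical BIDRow above and assuming that the end-to-end bandwidth is approximated by the smaller of the two segment bandwidths (the bottleneck); this is my simplification, not the exact formula used in the scheme:

def deduce_bandwidth(src_rows, dst_rows, src_as, dst_as):
    """src_rows/dst_rows map landmark AS -> BIDRow for each end-point."""
    candidates = []
    for landmark, s_row in src_rows.items():
        d_row = dst_rows.get(landmark)
        if d_row is None:
            continue
        # A landmark is a good "middle AS" candidate if each side has seen it
        # on routes toward the other side's AS (Bloom-filter membership test).
        score = (dst_as in s_row.transit_filter) + (src_as in d_row.transit_filter)
        candidates.append((score, -(s_row.as_hops + d_row.as_hops), s_row, d_row))
    if not candidates:
        return None
    _, _, s_row, d_row = max(candidates, key=lambda c: (c[0], c[1]))
    return min(s_row.avg_bottleneck_bw, d_row.avg_bottleneck_bw)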

One definite advantage of the BID-scheme over BRoute is that the BID tables of the nodes can be used to create bandwidth identifiers (BIDs), which in turn enable BID-based routing, in other words, a bandwidth-based discovery scheme. This BID-based routing/discovery scheme is a direct derivative of DHT-based discovery schemes like Pastry.

I performed a simulation study of the scheme which showed promising results. We have submitted a paper on this research to GlobeCom 2007. A journal version is in preparation.


Sunday, July 16, 2006

Review: MSPastry

Miguel Castro, Manuel Costa and Antony Rowstron, "Performance and Dependability of Structured Peer-to-Peer Overlays," Technical Report MSR-TR-2003-94, Microsoft Research, 2003.

MSPastry (which probably stands for Microsoft Pastry!) is not a new Pastry implementation, but one that intensively refines the routing and route-maintenance techniques to make Pastry routing dependable. It is argued that, to provide dependable routing, the routing mechanism should be both consistent and reliable. Consistency is defined as the guarantee that a message with a key is delivered to the node responsible for that key, and to no other node. Reliability is defined as the guarantee that a message does not get lost on the route.

In addition to relying on the constrained gossiping proposed by the authors in a previous work (see my last blog entry), MSPastry adds some techniques to ensure route dependability. Obviously, ensuring route dependability introduces additional overhead, but the techniques proposed for MSPastry are self-tuning, adjusting their overhead according to the stability of the network. That is, when the system is fairly stable (little churn), the proposed mechanisms reduce their work and hence the additional overhead they impose.

MSPastry ensures consistent routing by keeping the leaf sets consistent. "The state in the routing table is important for performance but it is not necessary to ensure consistency." The key aspects of keeping the leaf set consistent are to (a) mark a node active (when joining) only after ensuring, by probing them, that all the nodes in its leaf set are active; and (b) maintain the leaf set with periodic heartbeats. However, to reduce the overhead of heartbeat messages, MSPastry nodes send heartbeats only to their immediate left neighbor. When a node does not receive a heartbeat from its right neighbor for a certain amount of time, it triggers a "suspect" routine to determine whether the right neighbor should still be in the leaf set.
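A minimal sketch (my own illustration, not code from the paper) of the leaf-set liveness idea: heartbeats go only to the left neighbor, and a node suspects its right neighbor when no heartbeat has arrived within the allowed interval.

import time

class LeafSetMonitor:
    def __init__(self, suspect_after=30.0):
        self.suspect_after = suspect_after      # seconds without a heartbeat
        self.last_heartbeat = {}                # right-neighbor id -> last time heard

    def on_heartbeat(self, neighbor_id):
        self.last_heartbeat[neighbor_id] = time.monotonic()

    def suspects(self):
        now = time.monotonic()
        return [n for n, t in self.last_heartbeat.items()
                if now - t > self.suspect_after]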

Route reliability is achieved by active probing and per-hop ack messages. Under active probing, a node periodically probes each entry in its routing table to ensure it is still alive. While a message is being routed, each node expects an ack from the node to which it forwarded the message. If the ack does not arrive, the node reroutes the message via another node from its routing table. The active-probing period in MSPastry is self-tuned according to the node failure rates observed in the network.
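A minimal sketch of the per-hop acknowledgement idea (not MSPastry's actual code; send and wait_for_ack are stand-ins for the overlay's transport primitives): forward the message to the best next hop and, if no ack arrives within the timeout, fall back to the next candidate from the routing table.

def forward_with_ack(message, next_hops, send, wait_for_ack, timeout=0.5):
    """next_hops: candidate nodes ordered best-first."""
    for hop in next_hops:
        send(hop, message)
        if wait_for_ack(hop, timeout):          # ack received: this hop now owns routing
            return hop
        # no ack: treat the hop as faulty and try the next-best alive entry
    raise RuntimeError("no reachable next hop for message")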

The paper shows, through both simulation and a real implementation, that MSPastry indeed provides dependable routing with low overhead.


Friday, July 14, 2006

Review: Proximity Neighbor Selection in Tree-Based Structured Peer-to-Peer Overlays

Miguel Castro, Peter Druschel, Y. Charlie Hu and Antony Rowstron, "Proximity Neighbor Selection in Tree-Based Structured Peer-to-Peer Overlays," Microsoft Technical Report MSR-TR-2003-52.

Structured P2P networks route queries based on randomly assigned node IDs. Therefore, the path a query takes is short in the node-ID space, but not necessarily in the network-delay space. The ratio of the network latency a message experiences in DHT-based routing to the actual delay between the source and destination is called stretch, and it is generally larger than 1 in DHT-based overlays. To reduce the stretch of queries, proximity neighbor selection (PNS) was suggested, which ensures that the entries in a node's routing table are close to the node in terms of network latency. However, strict PNS generally results in high overhead.
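Since stretch is just a ratio, a two-line sketch makes the definition concrete (the numbers are made up):

def stretch(overlay_hop_latencies_ms, direct_latency_ms):
    return sum(overlay_hop_latencies_ms) / direct_latency_ms

print(stretch([12.0, 8.0, 20.0], 25.0))   # 1.6: the overlay path is 60% longer in delay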

This paper proposes a heuristic for PNS in Pastry based on constrained gossiping, called PNS-CG. PNS-CG refines the original Pastry node join and routing-table maintenance. It also provides an intelligent algorithm for nearest-seed selection for node joins (which was done by expanding-ring search in the original Pastry).

In the new protocol, a node joining the Pastry overlay gossips with the nodes on the way to its final join position, so that it can fill the rows of its routing table from the information received from each node along the way. The new node also advertises itself to those nodes so that they can update their own routing tables. The other major part of PNS-CG is to periodically gossip with one random node from each row of the routing table, to update the routing table with more suitable (proximity-wise) and alive nodes.
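A minimal sketch of the proximity-neighbor-selection idea itself (not the PNS-CG algorithm): among the candidates eligible for a routing-table slot, i.e., those sharing the required id prefix, keep the one with the lowest measured latency.

def select_entry(known_nodes, prefix, measure_rtt):
    """known_nodes: node ids learned via gossip; measure_rtt: id -> latency in ms."""
    candidates = [n for n in known_nodes if n.startswith(prefix)]
    return min(candidates, key=measure_rtt) if candidates else None

# Hypothetical usage: fill the slot for prefix "3a" from gossiped candidates.
rtts = {"3a7f": 40.0, "3a12": 12.0, "3b99": 5.0}
print(select_entry(rtts.keys(), "3a", rtts.get))   # "3a12", the closest eligible node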

The paper compares this heuristic with perfect PNS and shows that the performance of the new heuristic is very much comparable.


Thursday, July 13, 2006

Review: Epidemic-style Management of Semantic Overlays for Content-Based Searching

Spyros Voulgaris and Maarten van Steen, "Epidemic-style Management of Semantic Overlays for Content-Based Searching," Euro-Par 2005, Aug 2005.

According to the authors, this is one of the pioneering works in creating semantic overlays using epidemic protocols. Semantic overlays are application-level interconnections of nodes in which semantically close neighbors are connected to each other. The semantic proximity measure depends on the application of the network; for example, in a file-sharing application, the number of common files two nodes share can determine it (this is the metric used in this paper).

The paper clearly analyses the architecture of an epidemic-protocol-based semantic network. Compared to earlier proposals (according to the authors), which assume a stable network, the epidemic-protocol-based architecture performs very well in environments with frequent churn. In this type of semantic overlay, each node maintains a set of nodes, called its semantic view, out of which it periodically selects a peer at random and exchanges its semantic information. Transitivity can reasonably be assumed among semantic neighbors, that is, if A is semantically close to B and B is semantically close to C, then A can be expected to be semantically close to C. This notion helps a node refine its semantic view by periodically gossiping with its semantic neighbors.
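A minimal sketch (my own illustration) of one gossip step under the paper's proximity metric, the number of files two nodes have in common: merge the peer's offered view into the local one and keep the semantically closest entries.

def proximity(my_files, peer_files):
    return len(set(my_files) & set(peer_files))      # shared-file count

def merge_view(my_files, current_view, offered_view, view_size=10):
    """Views map peer id -> that peer's file set; keep the view_size closest peers."""
    merged = {**current_view, **offered_view}
    ranked = sorted(merged, key=lambda p: proximity(my_files, merged[p]), reverse=True)
    return {p: merged[p] for p in ranked[:view_size]}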

However, this leads to semantic clustering, which prevents a node from discovering nodes in other semantic clusters that have recently become semantically closer. Therefore, this paper proposes a two-tier gossip architecture for constructing the semantic overlay, where the top tier, the actual semantic overlay, depends on a bottom random-peer-sampling overlay that can propose random nodes in the system to communicate with. The random-sampling overlay is based on an epidemic protocol as well.
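Continuing the sketch above, the two-tier idea can be shown by injecting, before each semantic gossip round, one peer proposed by the random-sampling layer, so the view can escape its current cluster when better matches exist elsewhere (again my own illustration, not the paper's protocol):

import random

def gossip_round(my_files, semantic_view, random_layer_peers, view_size=10):
    offered = dict(semantic_view)
    if random_layer_peers:
        peer_id, files = random.choice(list(random_layer_peers.items()))
        offered[peer_id] = files                 # candidate from the random overlay
    return merge_view(my_files, semantic_view, offered, view_size)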

The paper shows via simulations that the convergence of the system into semantic clusters is fast, and that the system can adjust itself to changes in the semantics due to node failures or changes in the file caches.


Review: The Case For a Hybrid P2P Search Infrastructure

Boon Thau Loo, Ryan Huebsch, Ion Stoica and Joseph M. Hellerstein, "The Case for a Hybrid P2P Search Infrastructure," IPTPS 2004, pp 141-150

The paper has two parts: in the first, it analyses the performance of the Gnutella network in discovering rare items; in the second, it proposes a hybrid overlay to address the problem in the Gnutella network (or, more generally, in an unstructured network).

The paper probes the Gnutella network from multiple vantage points and shows that Gnutella performs badly at discovering rare items. It points out that structured networks are the exact opposite in nature, performing better with rare items but introducing too much overhead for common items. This justifies a hybrid architecture combining unstructured and structured overlays. In the hybrid architecture, queries for common items are propagated in the unstructured overlay, while the structured overlay is used for queries for rare items.

The problem in this hybrid overlay is to identify which items are rare and which are not. The proposed architecture addresses this by having only the ultrapeer nodes in Gnutella (similar to the superpeers in KaZaA) join the structured overlay. These nodes can observe the results of queries generated by the leaf nodes and decide which items are rare. Leaf nodes use the unstructured overlay for queries and, if no result arrives within a certain time, relaunch the query on the structured overlay. The paper provides figures showing that search performance is improved in the hybrid.
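A minimal sketch of the lookup policy described above (flood_search and dht_search are stand-ins for the two overlays' lookup calls, not the paper's API):

def hybrid_search(query, flood_search, dht_search, deadline_s=5.0):
    results = flood_search(query, timeout=deadline_s)
    if results:                    # common item: flooding finds it cheaply
        return results
    return dht_search(query)       # rare item: fall back to the structured overlay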

Even though the paper talks about a hybrid of unstructured and structured networks, it considers a Gnutella-type unstructured network which, with its use of ultrapeers, is not a pure unstructured network. On the other hand, it can be argued that, considering just the ultrapeers, it is an unstructured network.


Wednesday, July 12, 2006

Review: C2: A new Overlay Network Based on CAN and Chord

Wenyuan Cai, Shuigeng Zhou, Weining Qian, Linhao Xu, Kian-Lee Tan and Aoying Zhou, "C2: a new overlay network based on CAN and Chord," To appear in Int. J. High Performance Computing and Networking.

C2 is a P2P overlay built on top of CAN with a routing-table notion borrowed from Chord. Therefore, it provides routing performance like Chord's (O(log N) route complexity), while at the same time showing good fault tolerance thanks to multi-path routing over CAN's multiple dimensions. Even though the paper is thorough in providing theoretical proofs of its performance, in my view the idea is nothing new, but almost an identical copy of the idea proposed in the "Expressway in CAN" paper.


Tuesday, May 16, 2006

From Rennes, France

I came to Rennes, France, for a three-month internship at INRIA (IRISA, Paris project). I will be here from May to July, working with Prof. Anne-Marie Kermarrec and Prof. Marin Bertier on a project related to leveraging overlapping structured and unstructured P2P networks.
