Wednesday, July 15, 2009

The Internet and its Topology

A while ago, I came across an article that discusses points related to network modeling and characterization [1], particularly of the Internet's physical topology.

The authors' motivation is what the article calls the power-law argument. In particular, they highlight important points researchers should focus on when performing similar studies. They also go further and challenge the now-traditional assumption that the Internet topology resembles a scale-free network. In this post, I briefly summarize the paper.

The paper recounts the now well-known and widely accepted scale-free Internet argument (i.e., that the Internet topology is well modeled by a network with a power-law node degree distribution). The counterarguments presented by the authors are rooted in the limitations of traceroute for accurately determining the physical topology, as it captures only router interfaces rather than physical boxes.
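
For reference, the power-law claim is usually written as below, where P(k) is the fraction of nodes with degree k and gamma is a positive exponent. The symbols are my own notation, not necessarily the paper's.

```latex
P(k) \propto k^{-\gamma}, \qquad \gamma > 0
```

On a log-log plot this relationship appears as a straight line with slope -gamma, which is why such plots are the usual evidence offered for it.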

In short, the work proposes focusing on the decision-making process an Internet service provider goes through when planning and deploying its physical infrastructure, as opposed to using traceroute traces to inspire models of the Internet topology. Moreover, the authors suggest that the right tool for this is constrained optimization, which allows one to formalize that decision-making process.

Other interesting points extracted from the paper:

a).
A high node degree implies low-capacity links; conversely, a low degree implies high-capacity links. This is because a router has limited capacity for processing traffic. Thus, the high-degree nodes would be at the edge of the network, as opposed to the core, which differs from what most traceroute studies claim.


b).
To avoid confusion and to emphasize the fact that preferential attachment is just one of many other mechanisms that is capable of generating scale-free graphs, we will refer here to the network models proposed in [2].
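
To make the preferential attachment mechanism mentioned in (b) concrete, here is a minimal sketch of a degree-proportional growth process. It is my own simplified illustration, not the exact model from [2] nor anything taken from the paper; the function names and the parameters `n`, `m`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=None):
    """Grow an undirected graph with n nodes (illustrative sketch).

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # Every node appears in `endpoints` once per incident edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = []
    # The first new node (index m) links to all m initial nodes.
    targets = list(range(m))
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
        # Choose m distinct targets for the next node to arrive.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


def degree_counts(edges):
    """Map each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    deg = degree_counts(preferential_attachment_graph(2000, 2, seed=42))
    # A few early nodes typically end up with far more links than the rest.
    print(sorted(deg.values(), reverse=True)[:5])
```

Note that nothing in this process knows about link capacity or router limits, which is exactly the kind of engineering constraint point (a) says a topology model should respect.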


Suggested principles on characterization and modeling, which, I think, are sufficiently general to be applied in fields other than network topology characterization:

1. Know your data:
The data used by Faloutsos, Faloutsos, and Faloutsos [3] was not intended to capture the physical topology, but rather to "get some experimental data on the shape of multicast trees one can actually obtain in the real Internet".


2. When modeling is more than data fitting:
If we wish to increase our confidence in a proposed model, we ought also to ask what new types of measurements are either already available or could be collected and used for validation. (By new they mean completely new types of data not involved whatsoever with the original modelling exercise).


3. Know your statistic:
Once we agree that the data set is problematic, we could try using a more robust statistic to avoid such mistakes.
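
As an illustration of what a more robust statistic might look like: the summary above does not name one, so the choice here is my assumption. A common option is the empirical complementary cumulative distribution of node degrees (the idea behind rank-size plots), which avoids the binning noise of a raw degree histogram. The function name `empirical_ccdf` is hypothetical.

```python
from collections import Counter


def empirical_ccdf(degrees):
    """Return sorted (k, fraction of nodes with degree >= k) pairs."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = []
    remaining = n  # number of nodes with degree >= current k
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf


if __name__ == "__main__":
    # Toy degree sequence, made up for the example.
    print(empirical_ccdf([1, 1, 2, 3]))  # [(1, 1.0), (2, 0.5), (3, 0.25)]
```

Plotting these pairs on log-log axes gives a curve that is monotone by construction, so apparent straight-line behavior is less likely to be an artifact of how the data were binned.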


[1] W. Willinger, D. Alderson, and J. C. Doyle, "Mathematics and the Internet: A source of enormous confusion and great potential", Notices of the AMS, vol. 56, no. 5, May 2009.

[2] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks", Science 286 (1999).

[3] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology", ACM SIGCOMM (1999).