Last fall, I worked for four months at Google in Zürich. It was a fun and enriching experience. The opportunity to learn and to contribute to technology used by millions of people daily is quite exciting.
Besides enjoying the Google life, during my days in Zürich I was invited by two artist friends, Silvan Käelin [1,2,3] and Philip Matesic [4,5,6], to give a talk about my PhD thesis research at Perla Mode, as part of the Theory Tuesdays project.
Philip organizes a weekly event at Perla Mode named Theory Tuesdays. The goal is to bring together artists, researchers (of multiple disciplines), and the public to discuss a variety of topics such as art, technology, society, and their intersection.
I received the invitation with surprise and interest. It was a new experience to talk to an audience completely outside my field of research. It was also a chance to receive feedback, from such a non-technical group, about what I think is a relevant topic of study.
The idea was to find a middle ground between what I have been investigating (i.e., techniques to assess the value of contributions in peer production systems [pdf]) and the broader topics that could interest the audience.
Therefore, it seemed appropriate to introduce the notion of peer production systems [7], hint at the questions I am interested in answering in this context, and ask the participants related questions: How do they perceive the value of the information they consume online? Do they perceive themselves as contributing to others by producing information online? What is their main incentive to do so? What aspects do they take into account to decide whether an information provider produces value for them?
I am glad that the "talk" turned into a lively conversation about all these questions and other aspects of online peer production. We covered topics from the basic notion of social production (and why it works so well in certain scenarios), touched on specifics about the utility of tagging (e.g., classification languages may emerge through collaboration), and discussed the intuition behind the techniques I am designing to assess the value of contributions in social tagging systems.
Although anecdotal, the discussion revealed two explicit trends in the perceived value of online peer-produced information: novelty and trust in the information producer. These aspects came up as crucial for users when assessing the value of peer-produced information. It is worth noting that the information consumer's own interest is an implicit aspect of the value assessment.
The observations are somewhat intuitive, but for me it was quite important and helpful to have a first-hand discussion with real users of the systems I study. It helps one tune the questions to ask and gauge the relevance of one's research. I hope to have more opportunities like this. Thanks to Silvan and Philip for the first one.
References:
[1] One Man
[2] Lagoa do Ouro
[3] Temps de Poussière (Time of Dust)
[4] An Bonus
[5] Mau Series
[6] To Don Pedro with Mr. Gonzalez
[7] Y. Benkler, The Wealth of Networks: How Social Production Transforms Markets and Freedom, Yale University Press, 2006.
Wednesday, December 01, 2010
Friday, October 01, 2010
The Small World of File Sharing
The IEEE TPDS has recently published a preprint of an interesting piece of work that I had the pleasure to collaborate on.
Adriana Iamnitchi, Matei Ripeanu, Elizeu Santos-Neto, Ian Foster, "The Small World of File Sharing," IEEE Transactions on Parallel and Distributed Systems, 28 Sept. 2010. IEEE Computer Society Digital Library. IEEE Computer Society.
Abstract:
Web caches, content distribution networks, peer-to-peer file sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who use shared data. In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data access patterns. We propose a novel perspective for analyzing data access workloads that considers the implicit relationships that form among users based on the data they access. We propose a new structure —the interest-sharing graph— that captures common user interests in data and justify its utility with studies on four data-sharing systems: a high-energy physics collaboration, the Web, the Kazaa peer-to-peer network, and a BitTorrent file-sharing community. We find small-world patterns in the interest-sharing graphs of all four communities. We investigate analytically and experimentally some of the potential causes that lead to this pattern and conclude that user preferences play a major role. The significance of small-world patterns is twofold: it provides a rigorous support to intuition and it suggests the potential to exploit these naturally emerging patterns. As a proof of concept, we design and evaluate an information dissemination system that exploits the small-world interest-sharing graphs by building an interest-aware network overlay. We show that this approach leads to improved information dissemination performance.
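The interest-sharing graph from the abstract can be sketched concretely. The snippet below is a minimal toy construction, assuming (user, item) access records and a common-item threshold that are my illustrative choices, not necessarily the paper's exact parameters:

```python
from collections import defaultdict
from itertools import combinations

def interest_sharing_graph(accesses, threshold=1):
    """Build a toy interest-sharing graph from (user, item) access records.

    Two users are connected by an edge if they accessed at least
    `threshold` items in common.
    """
    items_by_user = defaultdict(set)
    for user, item in accesses:
        items_by_user[user].add(item)

    edges = set()
    for u, v in combinations(sorted(items_by_user), 2):
        if len(items_by_user[u] & items_by_user[v]) >= threshold:
            edges.add((u, v))
    return edges

# Toy trace: users A and B both request file f1, so they share an edge.
log = [("A", "f1"), ("A", "f2"), ("B", "f1"), ("C", "f3")]
print(interest_sharing_graph(log))  # {('A', 'B')}
```

Small-world analysis would then look at the clustering coefficient and path lengths of this graph, which is straightforward once the edge set is built.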
Wednesday, August 13, 2008
The Anticommons
The New Yorker's Financial Page has an interesting article: The Permission Problem by James Surowiecki (the author of The Wisdom of Crowds).
James Surowiecki discusses the notion of anticommons as presented by Professor Michael Heller (The Gridlock Economy).
To illustrate the point, Surowiecki points out two extreme scenarios of the resource-sharing problem: i) the common good model, where the resource is deemed public and shared among individuals without any notion of individual ownership; and ii) the private property model, the notion of unlimited property, where the resource is owned by a subset of individuals, who may charge others who want to consume units of it.
The article says that, on the one hand, common goods may lead to the well-known tragedy of the commons: overuse. On the other hand, unlimited property rights may lead to exactly the opposite, waste of resources (or the tragedy of the anticommons).
The article has nice examples:
[...]
The commons leads to overuse and destruction; the anticommons leads to underuse and waste. In the cultural sphere, ever tighter restrictions on copyright and fair use limit artists’ abilities to sample and build on older works of art. In biotechnology, the explosion of patenting over the past twenty-five years—particularly efforts to patent things like gene fragments—may be retarding drug development, by making it hard to create a new drug without licensing myriad previous patents. Even divided land ownership can have unforeseen consequences. Wind power, for instance, could reliably supply up to twenty per cent of America’s energy needs—but only if new transmission lines were built, allowing the efficient movement of power from the places where it’s generated to the places where it’s consumed. Don’t count on that happening anytime soon. Most of the land that the grid would pass through is owned by individuals, and nobody wants power lines running through his back yard.
[...]
It seems to me that certain computational environments present an interesting middle ground between the two extremes discussed above. For example, Nazareno pointed a while ago to large-scale commons-based Wi-Fi: users who own an Internet connection may share the spare capacity in exchange for either using the spare capacity of others later or being paid for it. The insight behind this resource-sharing model is that people buy more Internet bandwidth (as with many other computational goods, if I may call them that) than they are able to use. Hence, resources are mostly underutilized. So, why not share the spare capacity in exchange for access to others' spare capacity in the future?
Finally, a question comes to my mind: besides the fact that certain resources bear extra capacity by definition (e.g., my CPU is often 99% idle), do any other intrinsic resource characteristics play a role in suggesting which model is suitable for sharing that resource?
Thursday, June 05, 2008
IPTV Viewing Habits and Netflix Player
IPTPS 2008 has an interesting paper on exploiting TV viewing habits to reduce the traffic that IPTV consumers generate on the ISP backbone.
"On Next-Generation Telco-Managed P2P TV Architectures" by Meeyoung Cha (KAIST), Pablo Rodriguez (Telefonica Research, Barcelona), Sue Moon (KAIST), and Jon Crowcroft (University of Cambridge).
In this paper the authors analyze the use of P2P content distribution techniques to reduce network overhead in an Internet Service Provider's IPTV infrastructure. To exploit the patterns of channel holding time, channel popularity, and the correlation between time of day and the number of viewers, the authors propose a locality-aware P2P content distribution scheme that reduces the traffic on the ISP backbone.
From the paper:
we ascertain the sweet spots and the overheads of server-based unicast, multicast, and serverless P2P and also show the empirical lower bound network cost of P2P (where cost reduction is up to 83% compared to current IP multicast distribution)
[...]
We believe that our work provides valuable insights to service providers in designing the next-generation IPTV architecture. Especially, it highlights that dedicated multicast is useful for few of the extremely popular channels and that P2P can handle a much larger number of channels while imposing very little demand for infrastructure.
This week I saw some news about the internals of the Netflix Player, and I immediately thought of the IPTPS paper as a possible optimization for it.
The Netflix Player is supposed to use a conventional broadband connection, as opposed to the well-provisioned IPTV architectures described by Cha et al. Perhaps the locality-aware P2P content distribution technique is even more interesting in the Netflix Player case.
Nevertheless, the viewing habits and interest sharing among Netflix users may differ dramatically from what is observed in the IPTV environment, which would affect the efficiency of the locality-aware P2P content distribution.
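To make the locality-aware idea concrete, here is a toy peer-selection sketch: it simply prefers chunk holders in the requester's ISP region before falling back to remote peers, so downloads stay off the backbone when possible. The function, the region labels, and the peer format are hypothetical illustrations, not the actual scheme from Cha et al.:

```python
import random

def pick_sources(requester_region, peers, k=3):
    """Pick up to k upload sources for a chunk, preferring peers in the
    same ISP region as the requester to keep traffic off the backbone.

    `peers` is a list of (peer_id, isp_region) tuples holding the chunk.
    """
    local = [p for p, region in peers if region == requester_region]
    remote = [p for p, region in peers if region != requester_region]
    random.shuffle(local)   # spread load among equivalent local peers
    random.shuffle(remote)  # and among remote fallbacks
    return (local + remote)[:k]

peers = [("p1", "zrh"), ("p2", "gva"), ("p3", "zrh"), ("p4", "ber")]
print(pick_sources("zrh", peers, k=2))  # both local peers, in some order
```

A real system would also weight peers by upload capacity and channel popularity, but the locality preference alone already captures the backbone-saving intuition.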
Friday, May 16, 2008
OurGrid 4.0 released
Good news from the South! OurGrid 4.0 is out.
In summary: OurGrid is an open source, free-to-join, peer-to-peer grid, where users trade computational resources. The loosely coupled computational infrastructure is ideal for the execution of embarrassingly parallel applications.
I am particularly glad about this release, as OurGrid has been a useful tool in my previous studies: I used it to analyze activity traces of content-sharing communities. OurGrid makes it easy to harness the idle time of our desktop machines and to monitor the progress of the computations.
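As an illustration of why this kind of trace analysis fits a loosely coupled grid so well, here is a minimal sketch of the embarrassingly parallel structure. It maps the per-file tasks onto a local process pool; on OurGrid each file would instead become one independent grid task. The file-counting analysis is a simplified stand-in for the real workload:

```python
from multiprocessing import Pool

def analyze_trace(path):
    """Count events (lines) in one trace file. Each file is processed
    independently: no task needs the output of any other, which is
    exactly what makes the workload embarrassingly parallel."""
    with open(path) as f:
        return path, sum(1 for _ in f)

def analyze_all(paths):
    """Fan the independent per-file tasks out over a process pool."""
    with Pool() as pool:
        return dict(pool.map(analyze_trace, paths))
```

Calling `analyze_all(["trace_day1.log", "trace_day2.log"])` (hypothetical file names) returns a mapping from each trace to its event count; swapping the pool for grid task submission changes nothing about the per-task logic.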
Next week I will definitely give the new version a try (we are still using version 3.3).
The new site looks great too. Congratulations, OurGrid Community! :-)
Tuesday, February 12, 2008
Interesting Articles: IPTPS 2008
For those interested in the convergence of online social networks and peer-to-peer systems, it is worth taking a look at some of the articles in the program of the International Workshop on Peer-to-Peer Systems (IPTPS).
