Tuesday, December 18, 2007

Content Sharing with Good Privacy Control

This is a follow up on the previous post.

Open Tag is a tool where users do have the option to select what is private and to which degree. Moreover, users may withdraw completely from the system by removing their activity traces.

Wednesday, December 12, 2007

Privacy is on the news!

These days, online privacy has attracted a great deal of attention in the technology media [3][4]. The topic may be an old concern in some technology and academic circles, but privacy control seems to have reached the general public's attention only recently, due to the surge of online social networks.

It was interesting to see that online service providers such as Ask.com are trying to sell privacy control as a feature that differentiates them from their competitors [1].

Although there is an interest in the market to raise some awareness about privacy and to allow users to control the access to their online footprints [2], there is a question whether this will attract more consumers or not.

Regardless, I think the user must have the option of fine-tuning the disclosure of his/her explicit and implicit online "footprints". Moreover, I believe that certain domains could benefit from systems designed around the high-level concept of online social networking, if better privacy control capabilities are put in place.


[1] Ask Eraser. http://sp.ask.com/en/docs/about/askeraser.shtml
[2] Attention Trust. http://www.attentiontrust.org.
[3] Will Privacy Sell? Slashdot.org. December 11, 2007.
[4] Evolving Privacy Concerns. MIT Technology Review. December 11, 2007.

Saturday, November 03, 2007


No need for one more drop in the bucket of news about Google Social API.

Instead, I will comment on a kind of old, but interesting, news.

An article by Roy Furchgott (Navigating With Feedback From Fellow Drivers) in the New York Times describes a new GPS-enabled device that is able to receive traffic information based on the aggregation of the information collected from other cars using the same device.

The rationale behind the Dash Express (as the device is named) is that there is a wealth of information that individual cars can generate. Moreover, if such information is aggregated it can become extremely useful for drivers. Examples are obviously related to traffic, but they can also be related to business around a certain area (such as parking space).

The idea is neat. However, the aggregation is currently done in a centralized fashion. I do not have a clear picture on whether this centralized component limits the scalability or not.

In any case, there seems to be an interesting research problem in enabling Dash-2-Dash communication: aggregating the information without relying on a central point, while maintaining its quality by coping with devices that report wrong data (maliciously or due to mechanical faults). In fact, there are some interesting results along these lines in the wireless sensor networks community.
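To make the "coping with wrong data" part concrete, here is a minimal sketch of how a single device might combine speed reports received from its peers. It is only an illustration under my own assumptions: the function name, the km/h reports and the `max_deviation` threshold are hypothetical, and this is not how the Dash Express actually works. The idea is simply to use the median as a reference and discard reports that stray too far from it.

```python
from statistics import median


def robust_speed_estimate(reports, max_deviation=0.5):
    """Estimate the speed on a road segment from peer reports.

    reports: speeds in km/h received from nearby devices (hypothetical input).
    max_deviation: fraction of the median beyond which a report is discarded
                   (an assumed threshold, chosen only for illustration).
    """
    if not reports:
        raise ValueError("need at least one report")
    center = median(reports)
    # Keep only the reports that are reasonably close to the median.
    kept = [r for r in reports if abs(r - center) <= max_deviation * center]
    if not kept:
        # Every report disagrees with the median; fall back to the median.
        return center
    return sum(kept) / len(kept)


# One device reports an absurd 300 km/h; it is filtered out.
print(robust_speed_estimate([48, 52, 50, 51, 300]))
```

A median-based filter tolerates a minority of faulty or malicious devices, but it does not help when most of the reports are wrong, which is part of what makes the decentralized version an interesting problem.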

Saturday, October 06, 2007

Internet OS

Today, Xcerion contacted me, just to let me know that I was not selected in the first batch of developers/beta testers of their Internet OS - XIOS/3.

Although the site contains more information than a couple of months ago, their description of the underlying technologies is, as expected, opaque.

It seems that I'll have to wait a bit more...

Saturday, September 15, 2007

On The Repeatability of Experiments

As someone with an experimental background, I have found this initiative by the organizers of PODS/SIGMOD quite interesting.

On their page about guidelines for research papers, the program committee presents not only the formatting guidelines, but also a set of steps to help the committee assess the "repeatability level" of experimental research.

From their page:

we attempt to establish that the code developed by the
authors exists, runs correctly on well-defined inputs,
and performs in a manner compatible with that presented
in the paper.

Friday, August 31, 2007

respice finem

These days, a passage from a book I read a while ago reoccurred to me several times. It sounds interesting and it seems to have a broad applicability in science (and in life, if you wish).

Look at the unknown. This is old advice; the corresponding 
Latin saying is: "respice finem." That is, look at the end.
Remember your aim. Do not forget your goal. Think of what is
required. Keep in mind what you are working for. Look at the
unknown. Look at the conclusion. The last two versions of
"respice finem" are specially adapted to mathematical problems,
to "problems to find" and to "problem to prove" respectively.


From: G. Polya. "How to Solve It: A New Aspect of Mathematical Method", page 123, 2nd Edition, 1973.

The only addendum, perhaps, is that the goal may change from time to time.

Saturday, July 21, 2007

Open Books!

Rainy day. A cup of Ethiopia Harar and a book.

Today, while drinking a good cup of coffee, I came across the Open Library Project. At first, I thought the book reader used by the Open Library Project was Greenstone (a tool developed by a digital library project at the University of Waikato, in New Zealand, with a long list of collaborators). As a matter of fact, it is not.

It is good to see that there are a lot of initiatives out there aiming to provide people with open access to books from a whole variety of topics. Projects like Project Gutenberg and Open Library are good examples that these efforts are up and running.

Seeing digital books as a component in the content generation/consumption ecosystem, it would be interesting to see the convergence of open libraries and urban-driven search tools. A good example is Google Books, which allows readers to find the closest library that has a particular title available.

Perhaps the next generation of such systems will also suggest the coffee shop that serves your favorite coffee to accompany the book you are searching for. :-)

Sunday, July 01, 2007

Happy 1st of July

For many reasons, my interest in virtual worlds is increasing every day. Currently, my attention is more on the large-scale distributed systems perspective than on the business perspective (which may be very interesting too!).

In the Scientific American blog, Christopher Mims comments on a talk delivered by Mitch Kapor (among a list of activities, he is also an investor in and chair of Second Life).

The most interesting aspect of this talk, which seems natural to me, is the suggestion that Kapor (also as a chair of the Mozilla Foundation) defends the creation of open standards for virtual worlds.

It would be cool to think of a protocol design to allow interoperability among virtual worlds. Although this seems to be a classical software engineering problem, the requirements posed by virtual worlds (including the demand for monetary transactions) may be unique enough to create new research opportunities.

By the way, happy 1st of July. :-)

Sunday, June 24, 2007

Digital Libraries and User Attention

This past week, I attended JCDL'2007 (Joint Conference on Digital Libraries) and CAMA'2007 (International ACM/IEEE Workshop on Contextualized Attention Metadata).

Since I cannot comment on every interesting paper that I saw and discussed (there were so many), I will point out two:

The first, which was also the very first presentation of the conference, was World Explorer: Visualizing Aggregate Data from Unstructured Text in Geo-Referenced Collections (presented by Rahul Nair). This is a cool tool built on top of Flickr's geotagging features. It is really nice to see how many applications are possible on top of online communities like Flickr and del.icio.us that incorporate tagging features.

The main opportunity explored by the authors is to use geo-references to cluster content. Beyond simple photo sharing, I think geotagging also opens up several research challenges/opportunities in designing applications for urban sensing.

The second paper was Can Social Bookmarking Enhance Search in the Web? (presented by Y. Yanbe).

The authors propose the introduction of what I would call a "Ranking Aggregator" mechanism between the user, Google PageRank and del.icio.us ranking. Their observation is that extremely fresh web pages tend to get a low PageRank, but they may have a fair number of bookmark occurrences in del.icio.us. Therefore, they propose a combination of both ranking schemes to improve the ranking of fresh pages and give the user a good mix of 'reputable' pages via PageRank and popular pages on del.icio.us. Actually, I wondered whether Google search history and user interest sharing could be combined to provide better personalized search results.

I also participated in an interesting workshop, CAMA'2007, organized by Erik Duval, Martin Wolpers and Jehad Najjar.

The first talk, by Seth Goldstein, was exciting, possibly because it showed the incredibly large number of business opportunities orbiting online social networks and how much value there is in online users' attention.

Joe Pagano from the Library of Congress presented some results on measuring the audience of a newly launched web site. His main finding was that more visitors come from blogs than from search engines, which reinforces the intuition that blogs influence how information is consumed on the Web.

Personally, a positive aspect of this workshop was identifying possible applications that may validate our preliminary studies on interest sharing in collaborative tagging communities. For example, an extension of Joe Pagano's work would be the application of our interest sharing graph to understand how these visitors relate to each other and whether they form sub-communities of interest.
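To illustrate the general idea of combining the two signals, here is a minimal sketch. It is my own simplification and not the authors' actual method: the function name, the linear `weight` and the normalization by the maximum value are assumptions made only for this example.

```python
def combined_rank(pages, weight=0.5):
    """Order pages by a weighted mix of PageRank and bookmark counts.

    pages: dict mapping url -> (pagerank, bookmark_count); hypothetical data.
    weight: share given to PageRank (an assumed knob, not from the paper).
    """
    if not pages:
        return []
    # Normalize each signal by its maximum so both lie in [0, 1].
    max_pr = max(pr for pr, _ in pages.values()) or 1
    max_bm = max(bm for _, bm in pages.values()) or 1

    def score(url):
        pr, bm = pages[url]
        return weight * pr / max_pr + (1 - weight) * bm / max_bm

    return sorted(pages, key=score, reverse=True)


pages = {
    "old": (0.9, 10),       # reputable, but rarely bookmarked
    "fresh": (0.1, 200),    # new page with many bookmarks
    "obscure": (0.05, 1),
}
print(combined_rank(pages))
```

With equal weights, the fresh but heavily bookmarked page moves ahead of the established one; with `weight=1.0` the ordering falls back to plain PageRank.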

Also, Erik Duval made a nice comment on the fact that the large number of unique users we have found in our investigation over CiteULike and Bibsonomy may still present rich information to recommendation systems, even though these users are not connected to any island of interest we depicted in the interest sharing graph (more details here).

This is a brief summary of what I have seen this past week. Now, a lot of ideas to refine and put in practice...

Monday, June 04, 2007

Teragrid, Group Theory, Rubik's Cube

The Teragrid has been used to help find the minimum number of moves needed to solve the Rubik's Cube.

What impressed me most was the reduction of the initial number of possible states that made the finding feasible. Very neat!

See more here: How many moves does it take to solve a Rubik's Cube?
By Matt Ford

Sunday, June 03, 2007

Issues on Online Reputation

Information Week has an interesting article on the issues surrounding the proliferation of web communities that are built around user-generated content and mechanisms to build user reputation.

Web Credibility: Hard Earned, Harder To Prove
by J. Nicholas Hoover

Friday, May 25, 2007

Problem Solving Skills

Some works by George Polya captured my attention recently. In part, this happened because some of his works provide very powerful models to characterize large-scale content/resource sharing communities (or systems, if you wish). For example, the Urn Model is suggested by Golder and Huberman as a model to explain collaborative tagging behavior.

Another interesting work, which I am reading now, is the book How to Solve It. This book presents a methodology to guide students in strengthening their problem solving skills. I wonder how different things would be if some undergrad professors based a little of their course mechanics on that material.
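For the curious, the basic urn process is easy to simulate. The sketch below is a minimal version under my own assumptions (two colors, one ball of each to start, one extra ball added per draw); the function name and parameters are hypothetical. In the tagging analogy, each color stands for a tag, and drawing a ball and returning it together with a copy corresponds to a user reusing a tag that is already popular.

```python
import random


def polya_urn(steps, colors=("red", "blue"), seed=None):
    """Simulate a simple Polya urn and return the final count per color.

    The urn starts with one ball of each color. At every step, a ball is
    drawn uniformly at random and returned together with one extra ball
    of the same color, so popular colors tend to become more popular.
    """
    rng = random.Random(seed)
    urn = list(colors)
    for _ in range(steps):
        urn.append(rng.choice(urn))
    return {c: urn.count(c) for c in colors}


counts = polya_urn(1000, seed=1)
total = sum(counts.values())
print({c: n / total for c, n in counts.items()})
```

Running it with different seeds gives different final proportions, but within a single run the proportions settle down after a while, which is the kind of stability the model is meant to explain in tag usage.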

Tuesday, April 24, 2007


Today, after a random walk on the web graph, I hit the website of a band that I like a lot: Aerotrio.

A recommendation: while listening carefully to their music, appreciate some photographs by Salvatore. At least, it seems a good combination to me.

It would be great to see a concert by Aerotrio someday again.

Thursday, March 22, 2007

News from the Distributed Storage World

A post on the FUSE mailing list captured my attention this week. It was about a Google FS-like distributed file system named Startfish.

In their FAQ, there are several claims (and bar charts) about good results on several performance metrics (e.g., read/write throughput and scalability) compared to the NFS, SAMBA and Lustre file systems. However, I think a deeper evaluation is needed to assess these claimed advantages.

Also, this seems to be a major milestone in the increasing number of FUSE-based file systems. It would be nice to try it out.

Wednesday, March 14, 2007

The Art of Tagging Libraries

It turns out that there is a system that allows users to tag virtual library catalogs.

Thanks to my fellows at OpenTag.

The system is called PennTags and it is developed by people at the University of Pennsylvania.

Thursday, March 08, 2007

The Art of Tagging

This seems a very interesting idea on how to extract the public's perception of objects, particularly works of art. Furthermore, it is also a clever way to understand the vocabulary actually used by the majority of the audience to describe the art works.

The Art Museum Social Tagging Project

There is a workshop paper about it in the program of WWW2006 Conference:

Investigating social tagging and folksonomy in art museums with steve.museum

An interesting application of the same idea would be to add tagging capabilities to online library catalogs, where users of public libraries could categorize items that they've read (or intend to) by using tags online. This could give a clue about the content of the item, a kind of highly compacted review.

Friday, March 02, 2007

Rethinking your role...

The BBC has published this article about a petition by European institutions requesting that all government funded research be easily available to the public - you can read the article here http://news.bbc.co.uk/2/hi/technology/6404429.stm.

In particular, the idea of providing open access to government funded research publications seems a sensible move. However, the whole picture is a bit more complex, I guess. The article nicely points out what may be the fundamental question in the "To Open or Not To Open" discussion, which is the role of researchers in disseminating information.

Friday, February 16, 2007

"High Fidelity"

A short break from popularity distributions and clustering coefficients.

The novel High Fidelity by Nick Hornby is guaranteed fun. The movie based on the novel, which is set in Chicago, is also great. I must also confess that I got the Top Five fever for a while after reading the book and watching the movie.

Thus, just for your fun consideration, dear reader, here is my top 5 ranking for cover versions of songs.

1. "Garota de Ipanema" by RUSH (originally by Tom Jobim and Vinicius de Moraes)
2. "What a Wonderful World" by Joey Ramone (originally by Louis Armstrong)
3. "Have a Cigar" by Foo Fighters (originally by Pink Floyd)
4. "Bullet the Blue Sky" by Sepultura (originally by U2)
5. "Hurt" by Johnny Cash (originally by NIN)

Now, a nice place to find those songs... Hype Machine, a nice tool for searching music and video content commented on blogs.


Saturday, February 03, 2007

Hemingway and FLP

Yesterday, I finished reading For Whom the Bell Tolls by Ernest Hemingway. The book is worth reading not only because it was written by a Nobel Prize winner; it is a very interesting and thought-provoking book about human nature. My impression is that the book becomes more and more exciting as it comes to the end.

Anyway, the main goal of this post is not solely to discuss my impressions of Hemingway's book. The fact is that, for several chapters, the main character faces a dilemma which is, in my opinion, the same challenge faced by those building reliable distributed systems. This challenge is explained by a great paper, well known as FLP. :-)

(Un)fortunately, Robert Jordan could not have access, at that time, to the groundbreaking paper by Fischer, Lynch and Paterson (Impossibility of distributed consensus with one faulty process) to shed some light (or darkness, perhaps) on his situation.

Definitely, both works are great references...

Sunday, January 21, 2007

The Power of Probability

An article published at Live Science defends the idea that it is possible that two snowflakes are alike. At least if you believe in probability.

The article says that it is possible to have two snowflakes alike, but with very low probability. Moreover, it totally depends on the way similarity is defined (number of crystals and shape, for example).

So, be careful when you use the sentence "no two snowflakes are alike".

For those who like macro photography, the pictures are very nice.

Wednesday, January 10, 2007

Zenit 12XP

A bit of photography now.

As a result of a complex conjunction of events, I decided to use my first reflex camera again for a couple of weeks.

It might sound odd, but there is a kind of unique experience in handling a fully mechanical camera as opposed to a modern one (film or digital), from the film loading process, which is more like a ritual than a technical process, to the feeling-based exposure settings.

I've got very nice experimental results with this camera in low light conditions. However, what impressed me most was the color saturation I got in a set of pictures.

There was an expedition where I took the same pictures with the Zenit 12XP and an EOS 300 (EOS Rebel in the USA) with the same negative film, developed at the same place.

I was planning a set of photographs of urban landscapes with snow. Unfortunately, the shutter mechanism presented some problems in the first frames of a film last weekend. Incredible sunset light tones were illuminating the buildings in my neighborhood, but the camera asked for retirement.

Great times we had together!