Wednesday, March 05, 2008

Making faces in virtual worlds

Today, a friend sent me a link to a device named Emotiv.

A rough description of it would be: a helmet that monitors your brain activity and converts it into signals that can be used in a variety of ways.

Although games might be the first thought, the Expressive application particularly reminded me of a previous post by Ian Foster on a Second Life hack that allows the manipulation of objects in virtual worlds as a response to some physical process.

Thursday, February 21, 2008

Hadoop - now in larger scales!

Yahoo! just reported their new deployment of a Hadoop-based application. The achievement is considered to be the world's largest Hadoop deployment in a production environment.

The scale of the application is quite impressive. They used Hadoop to process the Webmap, as part of their search engine architecture. From their post (check their website for a video with some discussion about Hadoop in this context):

* Number of links between pages in the index: roughly 1 trillion links
* Size of output: over 300 TB, compressed!
* Number of cores used to run a single Map-Reduce job: over 10,000
* Raw disk used in the production cluster: over 5 Petabytes

I can even see the difference in the quality of the search results now. :-)

Update: Greg Linden also posted about the new Hadoop cluster. It's nice that he puts the numbers above in perspective by comparing them to Google's infrastructure.

Tuesday, February 12, 2008

Interesting Articles: IPTPS 2008

For those interested in the convergence of Online Social Networks and Peer-to-Peer Systems, it is worth taking a look at some articles in the program of the International Workshop on Peer-to-Peer Systems (IPTPS).

Tuesday, January 29, 2008

Taking photography to new heights

In 1906, some panoramic pictures of San Francisco were taken after the big earthquake. These were not ordinary pictures. George Lawrence used kites to place a camera at the right spot to record the extent of the damage caused by the earthquake. Besides their historic value, the pictures are the outcome of a quite interesting engineering project.

Two years ago, Lawrence's project was revisited. Although they did not use kites this time, the picture is still impressive. They also have an interactive version that allows you to zoom in and see more details of the landscape.

Saturday, January 19, 2008

Scientific Data For All!

The Wired Blog is running a brief article about yet another Google initiative. The idea is to provide storage and, as far as I understood, free access to scientific data sets.

One interesting point of the article is the following:

(Google people) are providing a 3TB drive array (Linux RAID5). The array is provided in "suitcase" and shipped to anyone who wants to send they data to Google. Anyone interested gives Google the file tree, and they SLURP the data off the drive. I believe they can extend this to a larger array (my memory says 20TB).

It sounds exciting that in the near future we might have access to a long list of data sets, perhaps under a standard API. If you like buzzwords, this might be named (if it is not the case already) Science in the Cloud. Whatever name they give it, I think this initiative can bring a long list of advantages for the scientific community.

Finally, I wonder when the RFC for the "suitcase-based transport protocol" will be available (similar to RFC 1149). :-)

Tuesday, December 18, 2007

Content Sharing with Good Privacy Control

This is a follow-up on the previous post.

Open Tag is a tool where users do have the option to select what is private and to which degree. Moreover, users may withdraw completely from the system by removing their activity traces.

Wednesday, December 12, 2007

Privacy is in the news!

These days, online privacy has attracted a great deal of attention in the technology media [3][4]. Online privacy may be an old concern in some specific conversation/technology/academic circles, but privacy control was apparently dragged to the general public's attention more recently by the surge of online social networks.

It was interesting to see that online service providers such as Ask.com are trying to sell privacy control as a feature that differentiates them from their competitors [1].

Although there is an interest in the market to raise some awareness about privacy and to allow users to control the access to their online footprints [2], there is a question whether this will attract more consumers or not.

Regardless, I think the user must have the option of fine-tuning the disclosure of his/her explicit and implicit online "footprints". Moreover, I believe that certain domains could benefit from systems designed around the high-level concept of online social networking, if better privacy control capabilities are put in place.

References

[1] Ask Eraser. http://sp.ask.com/en/docs/about/askeraser.shtml
[2] Attention Trust. http://www.attentiontrust.org
[3] Will Privacy Sell? Slashdot.org. December 11, 2007.
[4] Evolving Privacy Concerns. MIT Technology Review. December 11, 2007.

Saturday, November 03, 2007

GPS-2-GPS

No need for one more drop in the bucket of news about Google Social API.

Instead, I will comment on a somewhat old, but interesting, piece of news.

An article by Roy Furchgott (Navigating With Feedback From Fellow Drivers) in the New York Times describes a new GPS-enabled device that is able to receive traffic information based on the aggregation of the information collected from other cars using the same device.

The rationale behind the Dash Express (as the device is named) is that there is a wealth of information that individual cars can generate. Moreover, if such information is aggregated it can become extremely useful for drivers. Examples are obviously related to traffic, but they can also be related to business around a certain area (such as parking space).

The idea is neat. However, the aggregation is currently done in a centralized fashion. I do not have a clear picture of whether this centralized component limits the scalability or not.

In any case, there seems to be an interesting research problem in enabling Dash-2-Dash communication without relying on a central point to aggregate the information, while still maintaining the quality of information by coping with devices that could report wrong data (maliciously or due to mechanism faults). In fact, there are some interesting results in the wireless sensor networks community along these lines.
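To make the "coping with wrong data" part concrete, here is a minimal sketch of one simple approach: each device aggregates the speeds reported by its peers with a median, which tolerates a minority of wildly wrong values. This is only an assumption for illustration; the function name, the threshold, and the sample values are hypothetical, and nothing here describes how the Dash Express actually works.

```python
from statistics import median

def robust_speed_estimate(reports, min_reports=3):
    """Estimate the speed (km/h) on a road segment from peer reports.

    Uses the median, so a minority of faulty or malicious reports
    cannot drag the estimate far from the honest values.
    Returns None when there are too few reports to trust.
    """
    if len(reports) < min_reports:
        return None
    return median(reports)

# Hypothetical reports collected directly from nearby devices.
honest = [42.0, 45.0, 40.0, 44.0]
faulty = [400.0]  # one device reporting an implausible value

estimate = robust_speed_estimate(honest + faulty)
print(estimate)
```

A plain average of the same reports would be pulled far above any honest value by the single faulty device, which is why a median (or a trimmed mean) is a common first choice in this setting.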

Saturday, October 06, 2007

Internet OS

Today, Xcerion contacted me, just to let me know that I was not selected for the first batch of developers/beta testers of their Internet OS - XIOS/3.

Although the site contains more information than a couple of months ago, as expected, their description of the underlying technologies is opaque.

It seems that I'll have to wait a bit more...

Saturday, September 15, 2007

On The Repeatability of Experiments

As someone with an experimental background, I have found this initiative by the organizers of PODS/SIGMOD quite interesting.

On their page about guidelines for research papers, the program committee presents not only the format guidelines, but also a set of steps to help the committee assess the "repeatability level" of experimental research.

From their page:


...
we attempt to establish that the code developed by the 
authors exists, runs correctly on well-defined inputs, 
and performs in a manner compatible with that presented 
in the paper.
...

Friday, August 31, 2007

respice finem

These days, a passage from a book I read a while ago has come back to me several times. It sounds interesting and it seems to have broad applicability in science (and in life, if you wish).

[...]

Look at the unknown. This is old advice; the corresponding 
Latin saying is: "respice finem." That is, look at the end. 
Remember your aim. Do not forget your goal. Think of what is 
required. Keep in mind what you are working for. Look at the
unknown. Look at the conclusion. The last two versions of 
"respice finem" are specially adapted to mathematical problems,
to "problems to find" and to "problem to prove" respectively.

[...]

From: G. Polya. "How to Solve It: A New Aspect of Mathematical Method", page 123, 2nd Edition, 1973.

The only addendum, perhaps, is that the goal may change from time to time.

Saturday, July 21, 2007

Open Books!

Rainy day. A cup of Ethiopia Harar and a book.

Today, while drinking a good cup of coffee, I came across the Open Library Project. At first, I thought the book reader used by the Open Library Project was Greenstone (a tool developed by a digital library project in New Zealand at the University of Waikato with a long list of collaborators). As a matter of fact, it is not.

It is good to see that there are a lot of initiatives out there aiming to provide people with open access to books on a wide variety of topics. Projects like Project Gutenberg and Open Library are good examples that these efforts are up and running.

Seeing digital books as a component in the content generation/consumption ecosystem, it would be interesting to see the convergence of open libraries and urban-driven search tools. A good example is Google Books, which allows readers to find the closest library that has a particular title available.

Perhaps the next generation of such systems will also suggest the coffee shop that serves your favorite coffee to accompany the book you are searching for. :-)

Sunday, July 01, 2007

Happy 1st of July

For many reasons, my interest in virtual worlds is increasing every day. Currently, my attention is on characterizing them from a large-scale distributed systems perspective, rather than from a business perspective (which may be very interesting too!).

In the Scientific American blog, Christopher Mims comments on a talk delivered by Mitch Kapor (among a list of activities, he is also an investor in and chair of Second Life).

The most interesting aspect of this talk, which seems natural to me, is the suggestion that Kapor (also as a chair of the Mozilla Foundation) defends the creation of open standards for virtual worlds.

It would be cool to think of a protocol design to allow interoperability among virtual worlds. Although this seems to be a classical software engineering problem, the requirements posed by virtual worlds (including the demand for monetary transactions) may be unique enough to create new research opportunities.

By the way, happy 1st of July. :-)

Sunday, June 24, 2007

Digital Libraries and User Attention

This past week, I attended JCDL'2007 (Joint Conference on Digital Libraries) and CAMA'2007 (International ACM/IEEE Workshop on Contextualized Attention Metadata).

Since I cannot comment on every interesting paper that I saw and discussed (there are so many), I will point out two:

The first, which was also the very first presentation of the conference, was World Explorer: Visualizing Aggregate Data from Unstructured Text in Geo-Referenced Collections (presented by Rahul Nair). This is a cool tool built on top of Flickr geotagging features. It is really nice to see how many applications are possible considering on-line communities like Flickr and del.icio.us that incorporate tagging features.

The main opportunity explored by the authors is to use geo-references to cluster content. Beyond simple photo sharing, I think geotagging also opens up several research challenges/opportunities in designing applications for urban sensing.

The second paper was Can Social Bookmarking Enhance Search in the Web? (presented by Y. Yanbe).

The authors propose the introduction of what I would call a "Ranking Aggregator" mechanism between the user, Google PageRank and del.icio.us ranking. Their observation is that extremely fresh web pages tend to get a low PageRank, but they may have a fair number of bookmark occurrences in del.icio.us. Therefore, they propose a combination of both ranking schemes to improve the ranking of fresh pages and give the user a good mix of 'reputable' pages via PageRank and popular pages on del.icio.us. Actually, I wondered whether Google search history and user interest sharing could be combined to provide better personalized search results.
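As a rough illustration of what such a "Ranking Aggregator" could look like, here is a minimal sketch that mixes a normalized PageRank score with a normalized bookmark count through a weighted sum. The weighted sum, the function name, the page names, and the numbers are all my own assumptions for illustration; this is not the combination scheme proposed in the paper.

```python
def combined_ranking(pagerank, bookmarks, weight=0.5):
    """Rank pages by a weighted mix of PageRank and bookmark counts.

    pagerank:  dict mapping page -> PageRank-like score
    bookmarks: dict mapping page -> number of bookmark occurrences
    weight:    share given to PageRank (1.0 means PageRank only)
    """
    pages = set(pagerank) | set(bookmarks)
    # Normalize each signal by its maximum so both fall in [0, 1].
    max_pr = max(pagerank.values(), default=0) or 1
    max_bm = max(bookmarks.values(), default=0) or 1
    scores = {}
    for page in pages:
        pr = pagerank.get(page, 0) / max_pr
        bm = bookmarks.get(page, 0) / max_bm
        scores[page] = weight * pr + (1 - weight) * bm
    # Highest score first; ties are broken by page name.
    return sorted(pages, key=lambda page: (-scores[page], page))

# Hypothetical data: a fresh page with a low PageRank but many bookmarks.
pagerank = {"old-reference": 0.9, "fresh-post": 0.1, "obscure": 0.2}
bookmarks = {"old-reference": 50, "fresh-post": 100, "obscure": 1}

by_pagerank_only = combined_ranking(pagerank, bookmarks, weight=1.0)
by_combination = combined_ranking(pagerank, bookmarks, weight=0.5)
print(by_pagerank_only)
print(by_combination)
```

With these made-up numbers, the fresh page is last when only PageRank counts and moves up once bookmarks are taken into account, which is the effect described above.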

I also participated in an interesting workshop, CAMA'2007, organized by Erik Duval, Martin Wolpers and Jehad Najjar.

The first talk, by Seth Goldstein, was exciting, possibly because it showed the incredibly large number of business opportunities orbiting online social networks and how much value there is in online users' attention.

Joe Pagano from the Library of Congress presented some results on measuring the audience of a newly launched web site. His main finding was that more visitors come from blogs than from search engines, which reinforces the intuition about the influence of blogs on information consumption on the Web.

Personally, a positive aspect of this workshop was to identify possible applications that may validate our preliminary studies on interest sharing in collaborative tagging communities. For example, an extension of Joe Pagano's work would be the application of our interest sharing graph to understand how these visitors relate to each other and whether they form sub-communities of interest.

Also, Erik Duval made a nice comment on the fact that the large number of unique users we have found in our investigation over CiteULike and Bibsonomy may still present rich information to recommendation systems, even though these users are not connected to any island of interest we depicted in the interest sharing graph (more details here).

This is a brief summary of what I have seen this past week. Now, a lot of ideas to refine and put into practice...

Monday, June 04, 2007

Teragrid, Group Theory, Rubik's Cube

The TeraGrid has been used to help find the minimum number of moves needed to solve the Rubik's Cube.

What impressed me most was the reduction of the initial number of possible states that made the finding possible. Very neat!

See more here: How many moves does it take to solve a Rubik's Cube?
By Matt Ford

Sunday, June 03, 2007

Issues on Online Reputation

InformationWeek has an interesting article on the issues surrounding the proliferation of web communities that are built around user-generated content and mechanisms to build user reputation.

Web Credibility: Hard Earned, Harder To Prove
by J. Nicholas Hoover

Friday, May 25, 2007

Problem Solving Skills

Some works by George Polya captured my attention recently. In part, this happened because some of his works provide very powerful models to characterize large-scale content/resource sharing communities (or systems, if you wish). For example, the Urn Model is suggested by Golder and Huberman as a model to explain collaborative tagging behavior.

Another interesting work, which I am reading now, is the book How to Solve It. This book presents a methodology to guide students through the process of strengthening their problem-solving skills. I wonder how different it would be if some undergrad professors based a little of their course mechanics on that material.
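For readers who have not met the urn model, here is a minimal sketch of the basic Polya urn process: draw a ball at random, put it back, and add one more ball of the same color. Reading colors as tags gives a "popular tags get more popular" dynamic. The tag names and parameters below are hypothetical, and the sketch does not reproduce how Golder and Huberman fit the model to real tagging data.

```python
import random
from collections import Counter

def polya_urn(initial_tags, steps, seed=None):
    """Simulate a basic Polya urn over tags.

    At each step one tag is drawn uniformly at random from the urn
    and an extra copy of it is added, so tags that are already
    frequent become more likely to be drawn again.
    """
    rng = random.Random(seed)
    urn = list(initial_tags)
    for _ in range(steps):
        urn.append(rng.choice(urn))
    return urn

# Hypothetical starting point: three tags applied once each to a bookmark.
urn = polya_urn(["python", "tutorial", "web"], steps=1000, seed=42)
counts = Counter(urn)
proportions = {tag: count / len(urn) for tag, count in counts.items()}
print(proportions)
```

Running it with different seeds gives different final proportions, but within a single run the proportions tend to settle after the early draws, which matches the intuition that the first few taggers of a resource shape how it is tagged later.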

Tuesday, April 24, 2007

Aerotrio

Today, after a random walk on the web graph, I hit the website of a band that I like a lot: Aerotrio.

A recommendation: while listening carefully to their music, appreciate some photographs by Salvatore. At least, it seems a good combination to me.

It would be great to see a concert by Aerotrio again someday.

Thursday, March 22, 2007

News from the Distributed Storage World

There was a post on the FUSE mailing list that captured my attention this week. It was about a Google FS-like distributed file system named Starfish.

In their FAQ, there are several claims (and bar charts) about good results on several performance metrics (e.g., read/write throughput, scalability, etc.) compared to the NFS, Samba and Lustre file systems. But I think a deeper evaluation is needed to assess these claimed advantages.

Also, this seems to be a major milestone in the growing number of FUSE-based file systems. It would be nice to try it out.

Wednesday, March 14, 2007

The Art of Tagging Libraries

It turns out that there is a system that allows users to tag virtual library catalogs.

Thanks to my fellows at OpenTag.

The system is called PennTags and it is developed by people at the University of Pennsylvania.

Mundaú - Distributed Computing (or not)