Mundaú - Distributed Computing (or not): 2006

Wednesday, December 13, 2006

BloggerFS

Last night I was playing around with FUSE, FUSE-J and the Blogger Data API. The result is a new FUSE-based filesystem named BloggerFS.

The idea is very simple: with BloggerFS, one can manipulate the posts on her blog as if they were files. The user just needs to provide a mount point, a blog address (it needs to be on Blogger) and a username/password pair.
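To make the "posts as files" idea concrete, here is a minimal in-memory sketch in Python. It is not the BloggerFS code and it talks to neither FUSE nor the Blogger Data API; the class and method names (BlogAsFiles, readdir, write, read, unlink) are hypothetical and only mirror the filesystem operations that such a mount would have to answer.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    body: str


@dataclass
class BlogAsFiles:
    """Hypothetical in-memory stand-in for a blog exposed as a directory."""
    posts: dict = field(default_factory=dict)

    def readdir(self):
        # Listing the mount point shows one file per post.
        return sorted(self.posts)

    def write(self, filename, body):
        # Writing a file creates the post or replaces its body.
        title = filename.rsplit(".", 1)[0]
        self.posts[filename] = Post(title=title, body=body)

    def read(self, filename):
        # Reading a file returns the body of the post.
        return self.posts[filename].body

    def unlink(self, filename):
        # Removing a file deletes the post.
        del self.posts[filename]


blog = BlogAsFiles()
blog.write("hello-world.txt", "First post")
blog.write("bloggerfs.txt", "Posts as files")
print(blog.readdir())
```

In a real mount, each of these methods would presumably be backed by a request to the blog service instead of a dictionary.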

I intend to publish the source code under the GPL as soon as the code is reasonably documented and ready for production use. It would be nice to hear about your experiences using it.

Cheers,
./Eli

Netherlands goes all digital

The government has completely switched television transmission from analog to digital. What surprised me was the main reason: only 74,000 households will be directly affected by this shift.

According to the news, cable TV represents 94% of the TV market in the Netherlands. This is pointed to as one more reason to increase competition by expanding the coverage of digital TV.

To receive the digital TV signal, users will have to buy a tuner, which is estimated to cost around $66.50. This seems much cheaper than the cost per household of maintaining the "free" analog TV: the government will save $200 per household per year with this shift, according to the information published at Chron.com (link below).

I find very interesting the role of the people who need to think about new formats for TV shows, movies "made for TV", news, and so on, given a new technology to transmit them. I had a very inspiring conversation about how Video Art has changed the way people make TV shows, and how the way people think about making a TV show is completely different from cinema. Interestingly, a decisive aspect is the technology that is going to be used for recording and transmitting.

Source: http://www.chron.com/disp/story.mpl/headline/world/4393351.html

Monday, November 27, 2006

First Impressions of Supercomputing & VTDC'06

I guess this is a late post. However, I still think it is relevant to say some words about my impressions of Supercomputing Conference 2006 (SC'06).

SC'06 was held in Tampa, FL this year. As it was my first time at the conference, I had some expectations about how large it would be. It was very good to see the mix of industry and academia there. The main challenge, maybe, is to measure how the demand posed by industrial-scale clients will drive the efforts of research labs and universities.

The First International Workshop on Virtualization Technology in Distributed Computing (VTDC'06) was held in conjunction with SC'06 and it was a great opportunity to meet researchers involved in virtualization technologies.

I spent most of the time attending the technical sessions, where there were a lot of interesting works and conversations. Here is a short list of papers in which I found neat ideas and important findings:

Sage A Weil et al. "CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data"

Zheng Zhang et al. "CycleMeter: Detecting Fraudulent Peers in Internet Cycle Sharing"

Sotomayor et al. "Overhead Matters: A Model for Virtual Resource Management". VTDC'06.

David Wolinsky et al. "On the Design of Virtual Machine Sandboxes for Distributed Computing in WOWs"

Hopefully see you at SC'07...

Cheers,
./Eli

Thursday, October 19, 2006

Citizen Collaborative Work

Recently, I have been reading some articles on policies and mechanisms for collaborative work in virtual environments like the Web, and I have found some interesting works on reputation schemes to improve the quality of collaborative content production.

A while ago I came across an article by Tom Cross on First Monday (Puppy smoothies: Improving the reliability of open, collaborative wikis), which proposes a simple mechanism to allow users to identify the parts of a Wiki article that are not yet "mature". The rationale behind the approach proposed by Cross is that text that survives a sufficiently long period of iterative and collaborative editing may be considered mature and accurate. Conversely, recently added text may contain inaccuracies. So, readers should be aware of which parts are still not mature.
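The rationale above can be sketched in a few lines of Python. This is only an illustration of the idea as summarized here, not the mechanism from the article: the Fragment type, the edits_survived counter and the min_edits threshold of 10 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    edits_survived: int  # article edits made since this fragment last changed


def immature(fragments, min_edits=10):
    """Return the fragments that have not yet survived enough editing rounds.

    min_edits is an arbitrary, hypothetical threshold.
    """
    return [f.text for f in fragments if f.edits_survived < min_edits]


article = [
    Fragment("Campina Grande is a city in Paraiba, Brazil.", edits_survived=42),
    Fragment("It has the best bike routes in the world.", edits_survived=1),
]
for text in immature(article):
    print("not mature yet:", text)
```

A reader-facing tool could then highlight the returned fragments so that recently added text stands out.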

Today I found a site that promotes collaborative content production, Bikely. The idea is to allow users to share their bike routes, add comments and share information about these routes. Basically, a use case for a combination of Wiki and Google Maps. :-)

In that context, I was wondering about a mechanism that could improve the quality of information about routes and of the comments on them. My first thought is that the assumption used by Cross may not apply here, since routes can go out of date over time, in contrast to text that matures. There is also the case where routes and comments are misleading. Anyway, I believe this is a good motivation to think about a reputation scheme with a nice application.

By the way, check out my routes:

- Flamingo - UFCG (Campina Grande, PB, Brazil).
- Chesnut Park - UC Davis (Davis, CA, US)

Cheers,
Eli

Wednesday, September 13, 2006

"I read banned books"

I have seen a new post on The Google Blog about a list of banned and challenged books from the 20th century. It was nice to see that a lot of my favorite books, which are on the list of the best novels of the 20th century, have been challenged or banned.

The interesting fact is that I do not usually decide whether to read a book based on magazine reviews and/or rankings of last week's best sellers. I'd rather collect good opinions in person, talking to my friends. However, most of the time I discover good authors by 'accident' (by watching movies that refer to particular authors or by listening to music that is inspired by some book). :-)

Furthermore, the list contains several books that I have already put on my shelf. Now, I wonder what would be banned today?

Keep reading! ;-)
./Eli

Tuesday, September 05, 2006

Fall is coming... Not really!

Alright, Fall is almost here, right? Not really. :-) I am in Brazil now, and here Spring starts late this month. Thus, the inspiring summer days in Ryerson Hall at the University of Chicago, working in collaboration with Kate Keahey from Argonne National Labs as part of an internship, are behind me now.

However, we will be able to remember our wonderful conversation about Shakespeare and Tom Stoppard plays, since our paper was accepted at Supercomputing'06. The work is entitled To Bid or Not To Bid: A Hybrid Market-Based Resource Allocation Framework. Actually, talking about those authors was motivated by the title that I suggested for the paper. I almost included 'Arcadia' and 'Romeo & Juliet' in the references. :-)

Now, I am part of the LSD family (Laboratorio de Sistemas Distribuidos) again, at least for the next 4 months. The idea is to be involved with the same research topics that I was looking into during the summer in Chicago, but the application is different. The goal is to improve the virtualization solution provided by OurGrid. I am very glad to be here after one year away and to meet my former advisors and friends.

So, I'm looking forward to being at Supercomputing in November. I hope to see most of my friends from Chicago there. Making new friends is part of the plan as well. :-)

Cheers,
Eli

Tuesday, August 08, 2006

Lollapalooza Days

This might seem off topic. However, I would like to comment on my experience at Lollapalooza.

I had the great opportunity to go to Lollapalooza last weekend. The festival took place in Grant Park in Downtown Chicago. A long list of bands played during three days of a good (sometimes, not so good) mixture of rhythms.

Lollapalooza was probably the biggest music festival that I have been to so far. I would say that the second in size was Rock In Rio III (Rio de Janeiro, Brazil). The infrastructure of Lollapalooza was perfect; my only complaint was the fact that they were not selling beer, only Budweiser. :-) The concerts were perfectly scheduled and I really enjoyed the whole environment around Grant Park and Downtown.

Last Monday, a friend sent me an article from MIT Technology Review about Technology @ Lollapalooza. This is just one more example that networked technology is part of our daily lives.

Although the piece of technology that I took the most advantage of was a portable fan with an attached water mister, during the very, very hot weather on Friday, it was interesting to see the high-tech tents packed.

Later, I was wondering whether one could take advantage of this huge live crowd to perform a kind of field experiment with sensor devices attached to people. Perhaps some of the data collected in this "live experiment" could be useful for future simulation-based studies.

Tuesday, August 01, 2006

The Backup Power

Everybody should know that backup is an important part of any IT infrastructure management plan. What about backing up DNA?

Well, this is what a NYTimes article by Richard Morgan ("Life After Earth: Imagining Survival Beyond This Terra Firma") suggests (in a science fiction mode) could be done to prevent human life from becoming extinct through natural or war disasters.

Despite the currently unrealistic, yet possible and interesting, description of the applicability of a DNA repository of every form of life on our beloved planet Earth, the article discusses how doomsday is sometimes understated.

However, instead of putting all the effort into developing technologies for "DNA backup" and storage, how about moving the focus from military development to sustainable development? I guess this would be a very good and real step toward reducing the risk of human extinction.

Anyway, reading this article reinforced my point of view that some priorities of some nations do not seem to me the right way to go. :-)

Cheers,
./Eli

Saturday, June 17, 2006

Greenwood Simulator

It's been a long time since I last posted. Alright, I guess I'll have more time for the rest of the summer to post some interesting thoughts about what really matters in life: People, Music, Photography and Computers. :-)

The good news is that I started a new solo project last Spring. After a laptop crash, I decided to publish the partially recovered source code that I had been developing, under the terms of an open source initiative.

The main goal is to develop a simulator (or something more like a simulation framework) to help researchers interested in having fun with data staging in large-scale, distributed storage environments.

GreenSim builds on top of the SimGrid toolkit. My aim is to provide a developer-oriented framework that can be customized for specific demands.

However, before downloading, you should be aware that the software is still in a pre-alpha version, so you should not expect it to be stable. I will keep committing changes and bug fixes and announcing releases here, as the internship activities allow.

In case you are interested in participating, please drop me a message and let's discuss your ideas. I would appreciate it!

GreenSim
http://greensim.sourceforge.net

Regards,
./Eli

Friday, May 26, 2006

Is Computer Science a science?

From time to time this question comes to my mind, and (almost automatically) I remember a statement by one of the most interesting people that I have met in Chicago, made while we were enjoying some beers with our advisor:

"Real sciences do not need to have the word science in their names. For example, Physics, Biology and Chemistry."

I should start this post by saying that I disagree with such a statement. :-)

Let us take a look at the definition of the word science according to Merriam-Webster Online (http://www.m-w.com):

1: the state of knowing : knowledge as distinguished from ignorance or misunderstanding

2 a : a department of systematized knowledge as an object of study (the science of theology) b : something (as a sport or technique) that may be studied or learned like systematized knowledge (have it down to a science)

3 a : knowledge or a system of knowledge covering general truths or the operation of general laws especially as obtained and tested through scientific method b : such knowledge or such a system of knowledge concerned with the physical world and its phenomena : NATURAL SCIENCE

4 : a system or method reconciling practical ends with scientific laws (culinary science)

To avoid being biased, and to avoid a situation where one could tell me that I do not know the color of the sky, I will cite a second source. No, the second source is not Britannica; it is Wikipedia (http://en.wikipedia.org/wiki/Science).

[...]
refers to the system of acquiring knowledge based on empiricism, experimentation, and methodological naturalism.
[...]

Therefore, I think that Computer Science fits both definitions very well and should be considered a real science. Perhaps a good next question is: what is a good scientific method in computer science and what is not?

However, this is a discussion for a second round of beers. :-)

Cheers,
Eli

Friday, May 19, 2006

The Spam World Map

I have found an interesting tool based on the Google Maps API and the Host IP Info API. The tool basically translates domain names to geographic locations and shows them on the map. The idea is pretty simple, but it is nice. :-)

Here is the link: http://map.butterfat.net/emailroutemap/

It was a very interesting finding because some days ago I was poetically thinking about exploiting geographical location and network information/usage patterns for some particular cases that I am investigating now. One of the results that I am particularly interested in is related to a time-zone-aware scheduling approach.

Cheers,
Eli

Wednesday, April 12, 2006

IP Design Principles

Recently, I found a very interesting article that revisits the discussion of the foundations of IP packet switching versus circuit switching.

I particularly liked the article's structure and agree with some of the points made by the authors, mainly on the question of whether there is an approach to guarantee non-trivial QoS levels over IP other than overprovisioning the links.

Yet, the most important aspect of the article is the attempt to unveil new perspectives for both packet switching and circuit switching.

The paper is short, contains good references and it is definitely worth reading.

Is IP going to take over the world (of communications)?

Regards,
Eli

Thursday, March 23, 2006

Storage Affinity Simulator

Due to the several e-mails that I received asking for the simulator that I used to evaluate the Storage Affinity scheduling heuristic, I decided to make the source code available for download here.

Please read the WARNING file inside.

StorageAffinity_Sim.tar.bz2

Enjoy,
./Eli

Friday, March 17, 2006

Xen and The Art of Hyper-Threading

The quarter is technically over. So I decided to post an abstract with some findings from one of the projects in which I had a lot of fun during almost all this time that I have not posted here. :-)

The Performance Impact of Information Gap between Virtual and Physical CPU Capabilities

Theo Hebert, Elizeu Santos-Neto, Andy Seidel

Virtualization is a technique used to create software abstractions of physical hardware architectures to enable more flexible resource utilization and more efficient application execution. Recently, virtualization technologies have regained popularity and created a whole new set of opportunities that span from software development environments to planetary-scale service deployment. However, during the virtualization process, information about architectural components can be lost. This information gap regarding the physical and virtual hardware capabilities may cause performance penalties in the applications executing on top of virtual hardware. In this work we show that such a lack of accuracy exists in the virtual CPU provided by the Xen virtual machine monitor. We focused our investigations on Hyper-Threading enabled processors. We also describe our solution to alleviate the performance penalty and discuss the performance impact of the information gap.

Thursday, January 26, 2006

CPU Inheritance Scheduling

Recently, I remembered a topic to which I dedicated a lot of my attention at the end of my undergrad course: CPU Inheritance Scheduling.

The main motivation for Inheritance (Hierarchical or Loadable, if you prefer) Scheduling is the assumption that it is hard for a single scheduling policy to fulfill the requirements posed by several different target applications.

Therefore, the idea is to allow general-purpose systems to easily implement multiple scheduling policies and, furthermore, to have the schedulers organized in a hierarchy, so that it is possible to reuse the whole scheduling policy logic.

As an example of different application requirements in the same system, interactive applications (e.g. a text editor, image editing) have a natural demand for responsiveness, while batch applications (e.g. compiling a kernel, running simulations) demand throughput.
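A toy Python sketch of such a hierarchy follows. It illustrates hierarchical, multi-policy scheduling in general and is not the mechanism of any particular paper or operating system: the class names, the task names and the 2:1 split of time slots between the interactive and batch classes are all assumptions made for the example.

```python
from collections import deque


class RoundRobin:
    """Leaf policy for interactive tasks: rotate so every task gets a turn."""

    def __init__(self, tasks):
        self.queue = deque(tasks)

    def pick(self):
        if not self.queue:
            return None
        task = self.queue.popleft()
        self.queue.append(task)
        return task


class Fifo:
    """Leaf policy for batch tasks: keep running the oldest job."""

    def __init__(self, tasks):
        self.queue = deque(tasks)

    def pick(self):
        return self.queue[0] if self.queue else None


class Root:
    """Parent scheduler: hands time slots to child schedulers in a fixed pattern."""

    def __init__(self, slots):
        self.slots = deque(slots)

    def pick(self):
        # Try each slot once; skip children that have nothing to run.
        for _ in range(len(self.slots)):
            child = self.slots[0]
            self.slots.rotate(-1)
            task = child.pick()
            if task is not None:
                return task
        return None


interactive = RoundRobin(["editor", "image-tool"])
batch = Fifo(["kernel-build"])
# Hypothetical split: two slots for interactive work, one for batch work.
root = Root([interactive, interactive, batch])
timeline = [root.pick() for _ in range(6)]
print(timeline)
```

Each leaf keeps its own policy logic, and the root only decides which child gets the next slot, so a new policy can be added as another child without touching the existing ones.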

Back in the late 90's, the first reference that I found about it, and that captivated me, was a paper written by Bryan Ford and Sai Susarla (CPU Inheritance Scheduling). In this article, the authors describe the design and implementation of a thread scheduling framework that supports multi-policy scheduling in the FreeBSD system.

Later, an approach that provides even more flexibility was presented by George Candea and Michael B. Jones (Vassal: Loadable Scheduler Support for Multi-Policy Scheduling). In this case, they provide the ability to dynamically load scheduling policies. In contrast with Bryan Ford's paper, the Vassal strategy is better in the sense that it is not necessary to rely on the scheduling policies made available by the operating system. One could request the loading of her own scheduling policy instead. Obviously, this would require the necessary privileges, which in turn limits the flexibility.

So, how about a system based on virtual machines, where possibly harmful user activities will not influence other users? Well, my first thought is that it might be interesting to a certain degree; however, it remains an open question to me.

Maybe there will be more posts on this soon, if I get a break from course projects. :-)

Tuesday, January 03, 2006

In the last Computer Architecture class, it was mentioned that it is easier to find information about the evolution of processors (e.g. number of transistors, performance, etc) than finding the equivalent information about disk and/or memory. This is very interesting because I was wondering about something similar approximately a month ago.

The question in my head was: "Why is there no TOP500 ranking of High Performance Storage infrastructures?".

They have been publishing www.top500.org, highlighting processing power, for years, but few details are included about the storage systems that come together with these powerful machines.

Recently, I have found that the IEEE Computer Society Mass Storage Systems Technical Committee is sponsoring an initiative to develop the TOP100io (http://top100io.org/) which is still in the "Call for Contributors" phase.

Particularly, considering that several distributed computational architectures are built today to tackle huge data intensive problems, this ranking might be relevant in guiding future research on high performance storage systems design and performance analysis.

Monday, January 02, 2006

I used to post some comments about articles that I read in my CiteULike library.

The purpose of this blog is to be a place where preliminary ideas and opinions about research will be published.

I would appreciate receiving comments about the information posted here.

Cheers,
Eli
