The scale of the application is quite impressive. They used Hadoop to process the Webmap as part of their search engine architecture (a sketch of what a Map-Reduce job of this kind can look like appears after the list below). From their post (check their website for a video with some discussion about Hadoop in this context):
* Number of links between pages in the index: roughly 1 trillion links
* Size of output: over 300 TB, compressed!
* Number of cores used to run a single Map-Reduce job: over 10,000
* Raw disk used in the production cluster: over 5 Petabytes
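For readers who haven't seen Hadoop code: here is a minimal, self-contained sketch of what a Map-Reduce job over link data could look like. This is purely illustrative, not the actual Webmap code; the input format (tab-separated `sourceUrl` / `targetUrl` edge lines) and the class names are assumptions of mine. It counts the outgoing links per page:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LinkCount {

  // Mapper: for every "source<TAB>target" edge line, emit (source, 1).
  public static class EdgeMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text source = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split("\t");
      if (fields.length == 2) {   // skip malformed lines
        source.set(fields[0]);
        context.write(source, ONE);
      }
    }
  }

  // Reducer: sum the counts to get the out-degree of each page.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
        Context context) throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "link count");
    job.setJarByClass(LinkCount.class);
    job.setMapperClass(EdgeMapper.class);
    job.setCombinerClass(SumReducer.class);   // safe: summing is associative
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The combiner simply reuses the reducer, which is safe here because summing partial counts is associative; at the scale quoted above, that kind of map-side aggregation is what keeps the shuffle between mappers and reducers manageable.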
I can even see the difference in the quality of the search results now. :-)
Update: Greg Linden also posted about the new Hadoop cluster. It's nice that he puts the numbers above in perspective by comparing them to Google's infrastructure.