December 16, 2010

Mozilla Open Data Visualization Competition

The following are the results of my analysis of the data from the Mozilla Open Data Visualization Fall 2010 contest. For the analysis I downloaded witl_small.tar.gz (from "A Week in the Life of a Browser - Version 2"), which contains a sample of the data.

I did not download the full set because my bandwidth has been poor for some time and I was also running out of time. (The queries can be run on the FULL data, though.) Mozilla has provided many attributes related to the various activities in Firefox (and there are SO MANY of them!). Since I stumbled on this contest pretty late in the game, I was unable to analyse ALL the attributes/dimensions. I preferred tackling a few questions in good detail to analysing many dimensions without much depth.

So my analysis consists of the following 4 visualizations, each of which tries to answer a different question.

Tools used: Protovis, Highcharts, Python, SQLite3 (Excel was used for preliminary analysis and data cleansing)

1) What is the Web usage pattern of people in different age groups?
In other words, what is the average number of hours spent on the Web by someone who is 30 years old?
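To make the first question concrete, here is a minimal sketch of a per-age-group average using Python's built-in sqlite3 module. It assumes a hypothetical `survey` table with one row per respondent, an `age_group` label and a numeric `hours_per_day`; the actual table and column names in witl_small.tar.gz may differ, and if hours were recorded as ranges they would first need mapping to numbers.

```python
import sqlite3

# Hypothetical schema: the real WITL survey export may use different
# table and column names; adapt the query to the actual data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS survey (
    user_id       INTEGER PRIMARY KEY,
    age_group     TEXT,   -- e.g. '18-25'
    hours_per_day REAL    -- self-reported daily hours on the Web
)
"""


def avg_hours_by_age(conn: sqlite3.Connection) -> dict:
    """Return {age_group: mean daily Web hours}, ignoring unanswered rows."""
    rows = conn.execute(
        """
        SELECT age_group, AVG(hours_per_day)
        FROM survey
        WHERE hours_per_day IS NOT NULL   -- skip respondents who left it blank
        GROUP BY age_group
        ORDER BY age_group
        """
    ).fetchall()
    return {age: avg for age, avg in rows}
```

The grouping and averaging happen inside SQLite, so the same query works unchanged on the full data set.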


2) Is there a correlation between the number of years a person has used Firefox and the number of hours they spend on the Web daily?
In other words, do people who have used Firefox for 3-5 years or more spend more hours on the Web each day?
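One way to put a number on the second question is Pearson's correlation coefficient. The sketch below assumes a hypothetical `survey` table with a `tenure` column holding bucket labels and a numeric `hours_per_day`. Only the `3-6m` label appears in this post; the other bucket labels and the midpoint-in-years values in `TENURE_YEARS` are illustrative assumptions, not values from the data set.

```python
import math
import sqlite3

# Hypothetical mapping from tenure buckets to approximate years (bucket
# midpoints). Labels other than '3-6m' and all the numbers are assumptions.
TENURE_YEARS = {
    "<3m": 0.125,
    "3-6m": 0.375,
    "6m-1y": 0.75,
    "1-2y": 1.5,
    "3-5y": 4.0,
    ">5y": 6.0,
}


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def tenure_hours_correlation(conn: sqlite3.Connection) -> float:
    """Correlate approximate years with Firefox against daily Web hours."""
    rows = conn.execute(
        "SELECT tenure, hours_per_day FROM survey "
        "WHERE tenure IS NOT NULL AND hours_per_day IS NOT NULL"
    ).fetchall()
    # Drop rows whose tenure label is not in the (hypothetical) mapping.
    pairs = [(TENURE_YEARS[t], h) for t, h in rows if t in TENURE_YEARS]
    xs = [years for years, _ in pairs]
    ys = [hours for _, hours in pairs]
    return pearson(xs, ys)
```

Because the tenure buckets are ordinal, a rank-based measure such as Spearman's correlation could be a reasonable alternative to Pearson's.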


3) What kind of bookmark activity do people engage in, depending on how many years they have used Firefox? (We analyse *only* those who use at least one of the bookmark features.)
That is, how is activity spread among the bookmark operations (creating/choosing/modifying)?

(In the chart above, the 3-6m column is empty because the sample contains no data for it; I hope the full data set does.)
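The bookmark breakdown amounts to counting operations per tenure group and dividing by each group's total, which gives the proportions for each column of a stacked chart. This sketch assumes a hypothetical `bookmark_events` table with one row per bookmark action (`user_id`, `tenure`, `operation`); since only users who actually used a bookmark feature produce rows, the restriction to bookmark users comes for free.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: one row per bookmark action. Table, column and
# operation names are assumptions, not taken from the WITL data set.
BOOKMARK_SCHEMA = """
CREATE TABLE IF NOT EXISTS bookmark_events (
    user_id   INTEGER,
    tenure    TEXT,   -- years-with-Firefox bucket, e.g. '3-6m'
    operation TEXT    -- e.g. 'create', 'choose', 'modify'
)
"""


def bookmark_op_shares(conn: sqlite3.Connection) -> dict:
    """Return {tenure: {operation: share}}; shares within a tenure sum to 1."""
    rows = conn.execute(
        """
        SELECT tenure, operation, COUNT(*)
        FROM bookmark_events
        GROUP BY tenure, operation
        """
    ).fetchall()

    # Total bookmark actions per tenure group (the height of each full column).
    totals = defaultdict(int)
    for tenure, _, n in rows:
        totals[tenure] += n

    # Each operation's fraction of its tenure group's total.
    shares = defaultdict(dict)
    for tenure, op, n in rows:
        shares[tenure][op] = n / totals[tenure]
    return dict(shares)
```

A tenure group with no rows in the sample, such as 3-6m, simply does not appear in the result.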

4) How do different age groups use the various Firefox features?
Note: this chart is meant to be read vertically, i.e., for a given feature, say Private Mode, which age group uses it most often? Looking at the Privatemode column, the 18-25 age group has the darkest color, which means it is the age group that uses this feature most often. In other words, the color gradient from lightest to darkest encodes least to most frequent use.
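Reading the chart vertically means each feature column is shaded on its own scale. Below is a minimal sketch of that column-wise normalization, assuming hypothetical inputs: `uses[feature][age_group]`, a count of feature uses, and `group_sizes[age_group]`, the number of users in each group. Dividing by group size is an added assumption; it keeps a large age group from looking darker just because it has more users.

```python
def column_shades(uses: dict, group_sizes: dict) -> dict:
    """Scale each feature column independently to shades in [0, 1].

    uses[feature][age_group]: hypothetical count of times a feature was used.
    group_sizes[age_group]:   hypothetical number of users in that age group.
    1.0 marks the age group with the highest per-user rate for that feature.
    """
    shades = {}
    for feature, by_group in uses.items():
        # Per-user rate, skipping groups with unknown or zero size.
        rates = {
            group: count / group_sizes[group]
            for group, count in by_group.items()
            if group_sizes.get(group)
        }
        peak = max(rates.values(), default=0)
        # Divide by the column's own maximum so the chart is read vertically.
        shades[feature] = {
            group: (rate / peak if peak else 0.0) for group, rate in rates.items()
        }
    return shades


def darkest_group(shades: dict, feature: str) -> str:
    """Age group with the darkest cell (highest relative usage) for a feature."""
    column = shades[feature]
    return max(column, key=column.get)
```

Because every column is scaled by its own maximum, shades can be compared within a column but not across columns, which is exactly why the chart is read vertically.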



[I have deliberately avoided explaining my interpretations, as I believe the numbers speak for themselves. However, I am happy to clarify anything in the charts.]

My 2nd entry to this competition can be found here.
My 3rd entry to this competition can be found here.

2 comments:

Eugene Tjoa said...

Hi,

Thanks for your interesting article.

I see two stacked column charts in your article. Did you consider using several non-stacked charts instead of a single stacked chart (6 charts in the case of the first chart)? If you put them all on the same horizontal line, you can compare the distribution within a chart, but also compare the distribution across all charts. For example, you can then compare the height of the green bar with the height of the purple bar within a chart, or you can compare the heights of all the green bars.

Cheers,
Eugene

Venkat said...

Hi Eugene,

That would make the baseline of the chart pretty long. For now, since this is a blog, I have just pasted an image of the chart. I have a web app that I need to host on GAE (Google App Engine) sometime; that would give you a better 'feel' for the chart, with the hover tips etc.

I hope the web app is not a high priority; I am working on a few more visualization cases that I will be putting in my next blog post.

Thanks.