Which groups of people use Twitter?

Many people post on Twitter, and it is possible to identify groups of users among them (for example, members of political parties or followers of Justin Bieber). This project will cluster Twitter authors and investigate whether interesting groups can be found (answer: they can).
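Clustering authors first requires a representation for each of them. The sketch below is one minimal option, not the project's prescribed method. It assumes each author is described by the words, hashtags and @mentions in their tweets. It aggregates (author, text) pairs into per-author term counts and compares two authors by cosine similarity. The tokeniser and the `author_profiles` and `cosine` helpers are hypothetical illustrations.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokeniser: lower-cased words, keeping #hashtags and @mentions intact.
TOKEN_RE = re.compile(r"[#@]?\w+")


def author_profiles(tweets):
    """Aggregate (author, text) pairs into one term-count vector per author."""
    profiles = defaultdict(Counter)
    for author, text in tweets:
        profiles[author].update(TOKEN_RE.findall(text.lower()))
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, given `[("alice", "Vote #labour today"), ("bob", "#labour rally tonight"), ("carol", "New @justinbieber song!")]`, Alice and Bob share only `#labour`, so their similarity is 1/3, while Alice and Carol share no terms and score 0. On real data, very common words would usually be down-weighted (for example with TF-IDF) so that they do not dominate the similarity.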

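Clustering large numbers of items is computationally intensive, so Canopy clustering might be used to speed up the process. A cheap distance measure first groups the items into overlapping canopies, and the expensive clustering step then only needs to compare items that share a canopy.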
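The canopy step can be sketched in Python as follows. This is a minimal, illustrative version that follows the two-threshold scheme described by McCallum, Nigam and Ungar. It is not Mahout's implementation or API. It assumes each author is represented as a set of terms and uses Jaccard distance as the cheap measure. The names `canopy` and `jaccard_distance`, and the threshold values, are hypothetical choices.

```python
def jaccard_distance(a, b):
    """Cheap distance between two term sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def canopy(points, t1, t2, distance=jaccard_distance):
    """Group points into (possibly overlapping) canopies.

    points: dict mapping a point id (e.g. an author) to its feature set.
    t1: loose threshold; every point closer than t1 to a centre joins its canopy.
    t2: tight threshold (0 < t2 <= t1); points closer than t2 to a centre are
        removed from the candidate list and can no longer become centres.
    Returns a list of (centre_id, member_ids) pairs.
    """
    if not 0 < t2 <= t1:
        raise ValueError("require 0 < t2 <= t1")
    remaining = list(points)  # candidate centres, in insertion order
    canopies = []
    while remaining:
        # Deterministic choice for reproducibility; the original picks at random.
        centre = remaining.pop(0)
        # Membership is measured against every point, so canopies may overlap.
        members = [pid for pid in points if distance(points[centre], points[pid]) < t1]
        # Points tightly bound to this centre cannot start their own canopy.
        remaining = [pid for pid in remaining if distance(points[centre], points[pid]) >= t2]
        canopies.append((centre, members))
    return canopies
```

For instance, with Alice `{"vote", "#labour", "today"}`, Bob `{"#labour", "rally", "tonight"}`, Dave `{"vote", "#labour", "tonight"}`, Carol `{"new", "@justinbieber", "song"}` and Erin `{"@justinbieber", "song", "tour"}`, calling `canopy(points, t1=0.9, t2=0.85)` yields one canopy centred on Alice containing Alice, Bob and Dave, and one centred on Carol containing Carol and Erin. An expensive method such as k-means would then compare only authors that share a canopy. In practice, the thresholds would have to be tuned on the corpus.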

The Apache Mahout toolkit supports Canopy clustering (as well as other methods, such as k-means) and would be a good place to start.

This project will use the Edinburgh Twitter Corpus (97 million tweets) as its data.