Finding breaking news in Twitter

Twitter contains posts on a huge range of topics, most of which are irrelevant to news. Breaking news can be found on Twitter by clustering Tweets into groups according to their similarity to one another. This links related Tweets into threads, and each thread can then be inspected to see whether it looks like a news story.
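As a rough illustration of threading by similarity, the sketch below links each Tweet to its most similar earlier Tweet, or starts a new thread if nothing is similar enough. The bag-of-words representation, the cosine measure, the threshold of 0.5 and the function names are all assumptions made for this example; the project itself does not specify them.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split Tweet."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def thread_tweets(tweets, threshold=0.5):
    """Return one thread id per Tweet (threshold is an assumed value)."""
    vectors = []
    thread_ids = []
    next_thread = 0
    for text in tweets:
        vec = vectorize(text)
        # Find the most similar Tweet seen so far (exhaustive comparison).
        best_sim, best_idx = 0.0, None
        for i, earlier in enumerate(vectors):
            sim = cosine(vec, earlier)
            if sim > best_sim:
                best_sim, best_idx = sim, i
        if best_idx is not None and best_sim >= threshold:
            # Similar enough: join the thread of the closest earlier Tweet.
            thread_ids.append(thread_ids[best_idx])
        else:
            # Nothing similar enough: this Tweet starts a new thread.
            thread_ids.append(next_thread)
            next_thread += 1
        vectors.append(vec)
    return thread_ids
```

Because every Tweet is compared with every earlier one, the cost grows quadratically with the number of Tweets, which is impractical for a corpus of this size.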

This project will use the Edinburgh Twitter Corpus (97 million Tweets) as its data.

Locality-sensitive hashing can be used to thread Tweets efficiently, because it avoids comparing each Tweet with every other Tweet.
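One possible form of locality-sensitive hashing is sketched below: each Tweet is projected onto a few random hyperplanes, and the signs of the projections form a short bit signature. Tweets with a high cosine similarity are likely to share a signature, so candidate matches only need to be sought within the same bucket. The choice of the random-hyperplane scheme, the 8-bit signature and the names `signature` and `bucket_tweets` are assumptions for illustration, not something the project prescribes.

```python
import random
from collections import Counter, defaultdict


def signature(text, num_bits=8, seed=0):
    """Random-hyperplane signature of a Tweet's bag-of-words vector."""
    counts = Counter(text.lower().split())
    bits = []
    for bit in range(num_bits):
        projection = 0.0
        for token, count in counts.items():
            # Each (hyperplane, token) pair gets a reproducible Gaussian
            # weight, so no vocabulary has to be built in advance.
            weight = random.Random(f"{seed}:{bit}:{token}").gauss(0.0, 1.0)
            projection += count * weight
        # Record which side of the hyperplane the Tweet falls on.
        bits.append(projection >= 0.0)
    return tuple(bits)


def bucket_tweets(tweets, num_bits=8, seed=0):
    """Group Tweet indices by signature; similar Tweets tend to collide."""
    buckets = defaultdict(list)
    for index, text in enumerate(tweets):
        buckets[signature(text, num_bits, seed)].append(index)
    return buckets
```

A single hash table can miss similar Tweets that fall on opposite sides of one hyperplane. A common remedy is to build several tables with different seeds and compare a Tweet against the union of its buckets.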