Students will work in groups on a project. Each project uses Hadoop to solve a natural language processing problem.

Each group will be largely self-directed, but can call upon the organisers for help.

Students can also propose their own projects. Contact us if you want to do this.

Project 1: Finding breaking news in Twitter

Can we find breaking news stories in Twitter?

Project 2: Which groups of people use Twitter?

Efficiently cluster authors on Twitter into interesting groups.

Project 3: Computing Kneser-Ney smoothed language models

Compute a Kneser-Ney (K-N) smoothed language model using large volumes of data.

Project 4: PageRank for Twitter users

Given the complete follower graph of Twitter users, compute PageRank for each author. Who has the highest PageRank?
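As a rough illustration of the computation in Project 4, here is a minimal single-machine sketch of PageRank by power iteration. It is not a Hadoop implementation. The function name `pagerank`, the toy graph, the damping factor of 0.85 and the fixed iteration count are all assumptions made for this example; each follower is assumed to "vote" for the users they follow.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `follows` maps each user to the list of users they follow
    (a hypothetical input format chosen for this sketch).
    """
    users = set(follows)
    for followees in follows.values():
        users.update(followees)
    n = len(users)
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over everyone.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        # Every user splits their damped rank equally among the users they follow.
        for user, followees in follows.items():
            if followees:
                share = damping * rank[user] / len(followees)
                for followee in followees:
                    new_rank[followee] += share
        rank = new_rank
    return rank


# Toy follower graph (made-up users, for illustration only).
toy_graph = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol", "alice"],
}
ranks = pagerank(toy_graph)
top_user = max(ranks, key=ranks.get)
```

On a cluster, the inner loop corresponds naturally to one map step (each user emits a share of rank to every followee) and one reduce step (each user sums the shares received), repeated once per iteration.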

Project 5: Distributed discriminative supervised machine learning

Discriminative machine learning approaches often produce the best performance for many problems. Can we get them to run using gigantic amounts of data?

Project 6: Finding useful information in a sea of garbage

Often we have massive volumes of low-grade training material (e.g. data from the Web). Can we spot items in it that are likely to improve performance of some task (e.g. reduce language model perplexity)?
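One simple way to picture the idea in Project 6 is to score each candidate item by its perplexity under a small model trained on trusted text, and to prefer the lowest-scoring items. The sketch below uses an add-one smoothed unigram model. This is only one possible heuristic, chosen here for brevity, and the helper names and toy sentences are assumptions made for the example.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count whitespace-separated words in a list of trusted sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Per-word perplexity under an add-one smoothed unigram model.

    Assumes the sentence contains at least one word.
    """
    words = sentence.split()
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


# Toy data (made up for illustration only).
in_domain = ["the cat sat on the mat", "the dog sat on the rug"]
candidates = ["the cat sat on the rug", "zxqv lorem buy pills now"]

counts, total = train_unigram(in_domain)
# Lowest perplexity first: these items look most like the trusted text.
ranked = sorted(candidates, key=lambda s: perplexity(s, counts, total))
```

A real system would use a much stronger language model and would have to decide where to cut the ranked list, but the scoring step is independent for each item and so parallelises easily.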

Project 7: Clustering words into classes

Cluster words into useful classes using the Exchange Algorithm.

Project 8: Your own project

Don't like our suggestions? Then simply propose your own! Contact us and we will try to arrange for this to happen.