User:TheFearow/PopularWords

I am currently compiling statistics on the most frequently used words. At the moment I am concentrating on titles, partly because a dump of the titles is only a 20 MB download, whereas the dump of the articles is just over 2 GB.

Data Source
I am using the slightly outdated database dumps, because screen scraping all 1.8 million entries, even at 100 entries per page, would take about 18,000 page views (which I'm not sure I would be loved for).

Processing
I am doing the processing with a custom-written Java application. I will consider publishing the source at a later date, once I have worked out the bugs and tidied it up.
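
As a rough illustration of the kind of counting involved (this is not the actual application, whose source is not yet published), a minimal sketch in Java might look like the following. The class name `TitleWordCounter`, the methods `countWords` and `mostPopular`, the `minLength` parameter and the assumption that words within a title are separated by underscores or spaces are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: counts word frequencies in a list of titles.
public class TitleWordCounter {

    // Counts how often each word occurs across the given titles.
    // Assumption: words are separated by underscores or whitespace.
    // Words are lower-cased; words shorter than minLength are skipped.
    public static Map<String, Integer> countWords(List<String> titles, int minLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            for (String word : title.toLowerCase(Locale.ROOT).split("[_\\s]+")) {
                if (!word.isEmpty() && word.length() >= minLength) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns up to 'limit' entries, highest count first,
    // with ties broken alphabetically by word.
    public static List<Map.Entry<String, Integer>> mostPopular(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up sample titles, for demonstration only.
        List<String> titles = Arrays.asList("List_of_rivers", "List_of_long_rivers", "River");
        // A minimum length of 6 keeps only words over 5 characters.
        for (Map.Entry<String, Integer> e : mostPopular(countWords(titles, 6), 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For a real run the titles would be read line by line from the dump file rather than held in a hard-coded list, and a `minLength` of 1 would count every word regardless of length.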

Results
The results will be on the following pages:
 * User:TheFearow/PopularWords/Title
 * User:TheFearow/PopularWords/TitleBig (same as above, but only words longer than 5 characters)