Wikipedia:List of Wikipedians by number of edits/How to generate the lists

This page explains how to generate the following lists.


 * en:Wikipedia:List of Wikipedians by number of edits
 * en:Wikipedia:List of Wikipedians by number of recent edits
 * ja:Wikipedia:編集回数の多いウィキペディアンの一覧
 * zh:Wikipedia:最多贡献的用户

Preconditions

 * A computer system (e.g. a personal computer) that can run Java, such as:
   * UNIX or a Unix-like system (including GNU/Linux and Mac OS X)
   * Windows
   * etc.
 * The Java Development Kit (JDK) and Java Runtime Environment (JRE), Java SE 5.0 or later, are installed on your computer.
 * The following Java programs are compiled and deployed on the computer.

Instructions

 * Download the latest database dump from http://download.wikimedia.org/.
   * The following files are required:
     * user_groups.sql.gz
     * stub-meta-history.xml.gz
   * You can follow the progress of Wikipedia dumps through the RSS feed, e.g. enwiki-latest-user_groups.sql.gz-rss.xml at http://download.wikimedia.org/enwiki/latest/.
 * Run the Java program(s) to generate a list.
 * Upload the generated list.
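
The dump files used in the commands on this page are named <wiki>-<date>-<file>, for example enwiki-20080501-stub-meta-history.xml.gz. As a minimal sketch, the hypothetical helper below builds those names for the two required files. The class and method names are illustrative and not part of the programs described here, and the dated URL layout (http://download.wikimedia.org/<wiki>/<date>/) is an assumption modelled on the latest/ directory mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds dump file names such as enwiki-20080501-user_groups.sql.gz. */
final class DumpFileNames {
    // The two files listed as required in the instructions.
    static final String[] REQUIRED = { "user_groups.sql.gz", "stub-meta-history.xml.gz" };

    /** wiki is e.g. "enwiki"; date is the dump date as yyyyMMdd, or "latest". */
    static String fileName(String wiki, String date, String file) {
        return wiki + "-" + date + "-" + file;
    }

    /** Assumed URL layout: http://download.wikimedia.org/<wiki>/<date>/<file name>. */
    static List<String> urls(String wiki, String date) {
        List<String> result = new ArrayList<String>();
        for (String file : REQUIRED) {
            result.add("http://download.wikimedia.org/" + wiki + "/" + date + "/"
                    + fileName(wiki, date, file));
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the URLs of both required files for one dump.
        for (String url : urls("enwiki", "20080501")) {
            System.out.println(url);
        }
    }
}
```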

The case of the lists of the English Wikipedia
 * Download from: http://download.wikimedia.org/enwiki/
 * Copy and paste List of Wikipedians by number of edits/Anonymous into your text editor, and save as " ".
   * Examples of the saved file:
       1. User:Mikkalai
       2. User:Haemo
       3. User:Jeffrey O. Gustafson
       .
       .
       .
     or
       User:Mikkalai
       User:Haemo
       User:Jeffrey O. Gustafson
       .
       .
       .
 * Copy and paste List of Wikipedians by number of edits/unflagged bots into your text editor, and save as " ".
   * Examples of the saved file:
       1. Bluebot
       2. AntiVandalBot
       3. MartinBot
       .
       .
       .
     or
       Bluebot
       AntiVandalBot
       MartinBot
       .
       .
       .
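
The example files in this section come in two layouts, numbered ("1. User:Mikkalai") or plain ("User:Mikkalai"), and both carry the same names. A program reading such a file therefore only has to drop the optional numbering and the optional "User:" prefix. The following is a minimal sketch of that normalization; the class and method names are hypothetical, and the actual programs may read the files differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for a saved name list, in numbered or plain form. */
final class NameListParser {

    /** Returns the bare user name, or null for blank lines and "." filler lines. */
    static String parseLine(String line) {
        // Drop a leading "12. " if present; the required space keeps names like 1.2.3.4 intact.
        String s = line.trim().replaceFirst("^\\d+\\.\\s+", "");
        if (s.startsWith("User:")) {
            s = s.substring("User:".length()).trim(); // drop the namespace prefix
        }
        if (s.isEmpty() || s.matches("[.\\s]+")) {
            return null;
        }
        return s;
    }

    /** Parses every line and keeps only the lines that carry a name. */
    static List<String> parse(List<String> lines) {
        List<String> names = new ArrayList<String>();
        for (String line : lines) {
            String name = parseLine(line);
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("1. User:Mikkalai", "2. User:Haemo", ".", "Bluebot")));
    }
}
```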

The case of en:Wikipedia:List of Wikipedians by number of edits
 * Run the Java program as follows:
java -Xmx1500m -Dbegin.date=2008-04-01 -Dend.date=2008-04-30 -Dlimit=4000 WikipediansByNumberOfEdits_en enwiki-20080501-stub-meta-history.xml.gz enwiki-20080501-user_groups.sql.gz > result.txt

The case of en:Wikipedia:List of Wikipedians by number of recent edits
 * Run the Java program as follows:
java -Xmx1500m -Dbegin.date=2008-04-01 -Dend.date=2008-04-30 -Dlimit=5000 WikipediansByNumberOfRecentEdits_en enwiki-20080501-stub-meta-history.xml.gz enwiki-20080501-user_groups.sql.gz > result.txt
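
The -Dbegin.date and -Dend.date options on the command line reach a Java program as system properties. As a rough illustration of how a date-windowed count over stub-meta-history.xml.gz could work, the sketch below streams the XML and counts revisions per <username>. It is not the source of WikipediansByNumberOfRecentEdits_en: the class name is hypothetical, it needs Java 6 or later for the StAX API, and it ignores the user_groups file, anonymous (IP) contributors and the name lists described above.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustrative only: a hypothetical stand-in, not one of the real programs. */
final class RecentEditCountSketch {

    /** Counts revisions per <username> whose <timestamp> date lies in [begin, end] (yyyy-MM-dd). */
    static Map<String, Integer> count(InputStream xml, String begin, String end)
            throws XMLStreamException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        String date = "";
        while (reader.hasNext()) {
            if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            String element = reader.getLocalName();
            if (element.equals("timestamp")) {
                // Timestamps look like 2008-04-15T12:34:56Z; keep the date part.
                date = reader.getElementText().substring(0, 10);
            } else if (element.equals("username")) {
                String user = reader.getElementText();
                // ISO dates compare correctly as plain strings.
                if (date.compareTo(begin) >= 0 && date.compareTo(end) <= 0) {
                    Integer old = counts.get(user);
                    counts.put(user, old == null ? 1 : old + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // -Dbegin.date=... and -Dend.date=... arrive as system properties.
        String begin = System.getProperty("begin.date", "0000-00-00");
        String end = System.getProperty("end.date", "9999-12-31");
        try (InputStream in = new GZIPInputStream(new FileInputStream(args[0]))) {
            System.out.println(count(in, begin, end));
        }
    }
}
```

Once compiled, the sketch could be started in the same style as the command above, for example: java -Dbegin.date=2008-04-01 -Dend.date=2008-04-30 RecentEditCountSketch enwiki-20080501-stub-meta-history.xml.gz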

Using awk
The per-user counts can also be produced with awk alone (Java is not needed). Here input stands for the uncompressed stub-meta-history XML file:

mawk -v startdate=2005-01-01 -v enddate=2011-01-31 '{sub(/^[ \t]+/,"")} /<timestamp>/{gsub(/<[^>]*>/,""); date=substr($0,1,10); next} /<username>/{gsub(/<[^>]*>/,""); totcount[$0]++; if ((date >= startdate) && (date <= enddate)) periodcount[$0]++} END{for (u in periodcount) print "| | User:" u " || " periodcount[u]+0 " || " totcount[u] "\n|-"}' input
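
The awk loop over periodcount visits users in no particular order, so its rows still have to be sorted before they form a ranked list. The following is a minimal sketch of that ranking step, assuming the counts are already held in a map; the class name is hypothetical and the limit parameter only mirrors the -Dlimit option used in the Java commands on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ranking step: highest edit count first, cut off after limit entries. */
final class RankingSketch {

    /** Returns up to limit user names, highest count first; ties are broken by name. */
    static List<String> top(final Map<String, Integer> counts, int limit) {
        List<String> users = new ArrayList<String>(counts.keySet());
        Collections.sort(users, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCount = counts.get(b).compareTo(counts.get(a)); // descending by count
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return new ArrayList<String>(users.subList(0, Math.min(limit, users.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("A", 5);
        counts.put("B", 9);
        counts.put("C", 5);
        System.out.println(top(counts, 2));
    }
}
```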

The case of ja:Wikipedia:編集回数の多いウィキペディアンの一覧
 * Download from: http://download.wikimedia.org/jawiki/
 * Run the Java program as follows:
java -Xmx500m -Dbegin.date=2008-04-01 -Dend.date=2008-04-30 -Dlimit=200 WikipediansByNumberOfRecentEdits_ja jawiki-20080501-stub-meta-history.xml.gz jawiki-20080501-user_groups.sql.gz > result.txt

The case of zh:Wikipedia:最多贡献的用户
 * Download from: http://download.wikimedia.org/zhwiki/
 * Run the Java program as follows:
java -Xmx500m -Dbegin.date=2008-04-01 -Dend.date=2008-04-30 -Dlimit=500 WikipediansByNumberOfRecentEdits_zh zhwiki-20080501-stub-meta-history.xml.gz zhwiki-20080501-user_groups.sql.gz > result.txt