User:Doctor Boogaloo/Perl

Here is some Perl code that I have written to automate checking (not editing) of Wikipedia. Please see the notes at the foot of this page. I'm not a lawyer, but you use this code at your own risk, and I bear no responsibility for anything that you may do with it. In particular, whenever you send requests to web servers programmatically, take care not to flood them with requests and inadvertently cause a denial of service. If you use the code below, it is your responsibility to ensure that you do not do this.

All scripts that use this code should begin as follows:

use strict;
use Net::IP;
require LWP::UserAgent;
use Math::BigInt;

Code snippets are at the top, and the required subroutines follow.

Contributions by IP range
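This will produce output in the form shown below. The first pair of numbers is a progress indicator (position in the range / size of the range). Next comes the IP address that has contributed something, the total number of edits it has made to Wikipedia, and the number of distinct articles it has edited, followed by a list of those articles. Addresses with no contributions are skipped, which is why the progress numbers jump. The counts are taken from at most the 1000 most recent contributions, the default limit in getContributions.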

1/256) 71.146.148.0: 1 edits in total, 1 articles edited.
   Shi'a Islam
2/256) 71.146.148.1: 2 edits in total, 1 articles edited.
   Chevrolet
68/256) 71.146.148.67: 2 edits in total, 1 articles edited.
   Stress testing
75/256) 71.146.148.74: 1 edits in total, 1 articles edited.
   Talk:Tie (music)

my $minIP   = "71.146.148.0";
my $maxIP   = "71.146.148.255";
my $ipRange = "$minIP - $maxIP";
my $ip      = Net::IP->new($ipRange) or die Net::IP::Error();
my $ipCount = $ip->size;    # a Math::BigInt
my $ipPosition = 0;

do {
    $ipPosition++;
    my ($totalContribCount, $distinctArticleEditCount, $contribsByDate, $contribCountByArticle)
        = getContributions($ip->ip);
    if ($totalContribCount > 0) {
        print "$ipPosition/$ipCount) " . $ip->ip
            . ": $totalContribCount edits in total, $distinctArticleEditCount articles edited.\n";
        foreach my $article (keys %$contribCountByArticle) {
            print "   $article\n";
        }
    }
} while (++$ip);    # Net::IP overloads ++ to step to the next address; false at the end of the range
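Net::IP does the range arithmetic for you. If you want to see what that involves, or cannot install Net::IP, the following is a minimal core-Perl sketch for IPv4 only. It treats each address as a 32-bit integer, counts from the low end to the high end, and converts back to dotted-quad form. The subroutine names ipToInt, intToIp and ipRange are hypothetical and are not part of Net::IP.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
sub ipToInt {
    my $ip = shift;
    my @octets = split /\./, $ip;
    die "Not an IPv4 address: $ip\n"
        unless @octets == 4 && !grep { !/^\d+$/ || $_ > 255 } @octets;
    # Arithmetic rather than bit operators, so string octets are numified safely.
    return (($octets[0] * 256 + $octets[1]) * 256 + $octets[2]) * 256 + $octets[3];
}

# Hypothetical helper: convert a 32-bit integer back to dotted-quad form.
sub intToIp {
    my $n = shift;
    return join '.', map { ($n >> $_) & 0xFF } 24, 16, 8, 0;
}

# Hypothetical helper: list every address from $min to $max inclusive,
# carrying across octet boundaries (e.g. 10.0.0.255 -> 10.0.1.0).
sub ipRange {
    my ($min, $max) = @_;
    my ($lo, $hi) = (ipToInt($min), ipToInt($max));
    die "Range is reversed: $min - $max\n" if $lo > $hi;
    return map { intToIp($_) } $lo .. $hi;
}
```

With these, `my @ips = ipRange($minIP, $maxIP);` gives the 256 addresses of the example range, and the progress indicator is simply the array index plus one over `scalar @ips`.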

Subroutines
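This subroutine uses LWP::UserAgent to retrieve a page programmatically. You pass in a complete URL and get back the page content; the script dies with the HTTP status line if the request fails. Replace the agent string with one that contains your own Wikipedia user name.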

sub getPage {
    my $url = shift;
    my $ua  = LWP::UserAgent->new(agent => "Some_String_That_Contains_Your_Wikipedia_UserName");
    $ua->timeout(60);
    my $response = $ua->get($url);
    die $response->status_line unless $response->is_success;
    return $response->content;
}
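One way to follow the advice at the top of this page is to enforce a minimum delay between requests. The sketch below assumes a one-second interval, which is an illustrative choice rather than a documented Wikipedia limit; the subroutine name throttle and the variables $MIN_INTERVAL and $lastRequest are hypothetical. It uses Time::HiRes (a core module) for fractional-second timing.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Assumed minimum number of seconds between two requests.
my $MIN_INTERVAL = 1;
my $lastRequest  = 0;

# Hypothetical helper: call immediately before each request. It sleeps just
# long enough that successive calls are at least $MIN_INTERVAL seconds apart
# and returns the number of seconds it slept (0 if no wait was needed).
sub throttle {
    my $wait = $lastRequest + $MIN_INTERVAL - time;
    sleep $wait if $wait > 0;
    $lastRequest = time;
    return $wait > 0 ? $wait : 0;
}
```

To use it, add `throttle();` as the line before `$ua->get($url)` in getPage, so that every caller (getContributions, getContributors and the IP-range loop) is rate-limited in one place.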

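This subroutine gets the contributions of a specific user (an account name or an IP address), up to $max of them (default 1000). It returns the total number of contributions found, the number of distinct articles edited, a hash reference mapping each edit's timestamp to the article edited, and a hash reference mapping each article to the number of edits made to it.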

sub getContributions {
    my $user = shift;
    my $max  = shift || 1000;
    my $totalContribCount = 0;
    my %contribsByDate;
    my %contribCountByArticle;
    # Wikipedia redirects http to https, so LWP needs LWP::Protocol::https either way.
    my $url  = "https://en.wikipedia.org/w/index.php?title=Special:Contributions&limit=${max}&target=$user";
    my $text = getPage($url);
    $text =~ s/\n//g;
    $text =~ s/\.27/\'/g;
    # Walk through the <li>...</li> list items one at a time.
    while ($text =~ m/\<li[^>]*\>(.*?)\<\/li\>/g) {
        my $match = $1;
        # Each entry starts with its timestamp, e.g. "12:34, 5 June 2007".
        if ($match =~ m/^(\d\d:\d\d, \d+ \w+ \d\d\d\d)(.*)/) {
            $totalContribCount++;
            my ($date, $rest) = ($1, $2);
            # Skip the two parenthesised links; the next link is the article title.
            if ($rest =~ m/\s*\(.*?\)\s*\(.*?\).*?\<a [^>]*\>(.*?)\<\/a\>/) {
                my $article = $1;
                $contribsByDate{$date} = $article;
                $contribCountByArticle{$article}++;
            }
        }
    }
    my $distinctArticleEditCount = scalar(keys %contribCountByArticle);
    return ($totalContribCount, $distinctArticleEditCount, \%contribsByDate, \%contribCountByArticle);
}
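The URLs in getContributions and getContributors interpolate the user name or page title directly, which goes wrong for names containing characters such as `&` or `#`. A minimal sketch of an escaping helper follows, assuming the name is a Perl character string. It keeps the MediaWiki convention of underscores for spaces and percent-encodes everything outside a conservative safe set. The names wikiTitleForUrl and contributionsUrl are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: make a user name or page title safe for a query-string
# parameter. Spaces become underscores; other characters outside a
# conservative safe set are percent-encoded as UTF-8 bytes.
sub wikiTitleForUrl {
    my $title = shift;
    $title =~ s/ /_/g;
    utf8::encode($title);    # assumes $title is a character string
    $title =~ s/([^A-Za-z0-9_.~:\/()\-])/sprintf("%%%02X", ord($1))/ge;
    return $title;
}

# Hypothetical helper: build the Special:Contributions URL used by getContributions.
sub contributionsUrl {
    my ($user, $max) = @_;
    $max ||= 1000;
    return "https://en.wikipedia.org/w/index.php?title=Special:Contributions"
         . "&limit=$max&target=" . wikiTitleForUrl($user);
}
```

For example, `wikiTitleForUrl("AT&T")` gives `AT%26T`, so the `&` is no longer mistaken for the start of a new parameter.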

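This subroutine gets the contributors to a specific page from up to $max entries (default 1000) of its revision history. It returns a hash reference mapping each contributor's name to a list of the timestamps of their edits.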

sub getContributors {
    my $page = shift;
    my $max  = shift || 1000;
    $page =~ s/ /_/g;    # MediaWiki URLs use underscores for spaces
    my %contributors;
    my $url  = "https://en.wikipedia.org/w/index.php?title=$page&limit=${max}&action=history";
    my $text = getPage($url);
    $text =~ s/\n//g;
    $text =~ s/\.27/\'/g;
    while ($text =~ m/\<li[^>]*\>(.*?)\<\/li\>/g) {
        my $match = $1;
        # Two parenthesised links, two further tags, then the timestamp link
        # and (after an optional wrapping tag) the user link.
        if ($match =~ m/\(.*?\)\s*\(.*?\)\s*\<[^>]*\>\s*\<[^>]*\>\s*\<a [^>]*\>(.*?)\<\/a\>\s*(?:\<[^>]*\>\s*)*?\<a [^>]*\>(.*?)\<\/a\>/) {
            my ($date, $contributor) = ($1, $2);
            push @{ $contributors{$contributor} }, $date;
        }
    }
    return \%contributors;
}
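The hash reference returned by getContributors can be turned into a ranked list of editors. The following is a minimal sketch, assuming you want the most active contributor first with ties broken alphabetically; the subroutine name summariseContributors is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the hash reference returned by getContributors
# (contributor name => array of edit timestamps) and returns one
# "Name: N edits" line per contributor, most edits first, ties by name.
sub summariseContributors {
    my $contributors = shift;
    my @names = sort {
        @{ $contributors->{$b} } <=> @{ $contributors->{$a} }    # array in scalar context = edit count
            or $a cmp $b
    } keys %$contributors;
    return map { sprintf "%s: %d edits", $_, scalar @{ $contributors->{$_} } } @names;
}
```

Usage: `print "$_\n" for summariseContributors(getContributors("Chevrolet"));`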