User:Rich Farmbrough/Disambig scripts

Scripting
Download the database.

Note these scripts work with the SQL databse dumps, not with the XML dumps. Create this perl script dab.pl


 * 1) !/usr/bin/perl

while (<>) { @lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/; foreach $line (@lines){ $line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/; $space=$1; $name=$2; $text=$3; if ($space==0) { if ($text =~ m/\{\{disambig\}\}/){ print $name, "\n"; }			elsif ($text =~ m/\{\{msg:disambig\}\}/){ print $name, "\n"; }		}		} }

run

perl dab.pl ddddddddd_cur_table.sql > dab.txt

Where dddddd is the appropriate date (Takes a few minutes, I didn't time it.)

then create countdab.pl

%dab=; open (DAB,"dab.txt"); while (){ chomp; $dab{$_}=0; } $i=0; while (<>) { @lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/; foreach $line (@lines){ $line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/; if ($1==0){ $_=$3;			@links= /\[\[(.*?)(?:\||\]\])/g;
 * 1) !/usr/bin/perl

foreach $link (@links){ if ( exists $dab{$link} ) { $dab{$link}++; }			}		}	}	print STDERR ".",++$i; } $i=0; foreach $key (sort { $dab{$b} <=> $dab{$a} } keys %dab) { print "# ", $key," (links to ",$dab{$key}," articles)\n"; if ($i++>200) {last;} }

and run with something like

perl dabcount.pl dddddddddd_cur_table.sql > count.txt

where dddddddddd is the appropriate date. (takes about twenty minutes)

and you have your result. It's not perfect because it ignores nowiki, comments etc. but for a disambiguation league table it's good enough.