Wikipedia talk:WikiProject Oregon/Readership

Data extraction technique
Thanks to instigation by Pete, this script creates the table on this article.

 #!/bin/bash
 # Fetch the view count for article $1 in month $2 (YYYYMM) from stats.grok.se.
 wpor() {
     wget "http://stats.grok.se/en/$2/$1" -O - 2>/dev/null |
         grep " has been viewed " |
         sed 's#.* has been viewed \([0-9]*\).*#\1#;'
 }
 # Table header
 echo '{| class="wikitable sortable"'
 echo '! article !! importance !! rating !! Dec 2007 !! Jan 2008 !! Feb 2008 !! Mar 2008'
 # One table row per input line: article, importance, rating
 dates="200712 200801 200802 200803"
 while read article importance rating
 do
     x=""
     for month in $dates; do
         y=$(wpor $article $month)
         x="$x || $y"
     done
     echo "|-"
     echo "| $article || $importance || $rating $x"
 done
 echo "|}"

It is fed input which came from the table generated by the automatic rating thingy which appears on the project page. I got the data from there, but for the life of me I can't figure out where that is now. The beginning of the data looks like

 Oregon_State_Capitol   Top     FA
 1980_eruption_of_Mount_St._Helens       Mid     FA
 1984_Rajneeshee_bioterror_attack        Mid     FA
 D._B._Cooper    Mid     FA

It was mildly reformatted from the magically generated article. —EncMstr (talk) 03:55, 21 April 2008 (UTC)
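The extraction step inside wpor can be tried without network access. What follows is a minimal sketch, assuming the stats page contains a phrase of the form "has been viewed N times"; the sample page text and the helper name extract_views are made up for illustration and are not part of the script above.

```shell
#!/bin/bash

# Hypothetical helper: the same grep/sed filter that wpor applies,
# but reading the page text from stdin instead of from wget.
extract_views() {
    grep " has been viewed " |
        sed 's#.* has been viewed \([0-9]*\).*#\1#;'
}

# Made-up sample page; the real page layout is an assumption.
sample_page='<html>
<p>Oregon_State_Capitol has been viewed 4321 times in 200803.</p>
</html>'

views=$(printf '%s\n' "$sample_page" | extract_views)
echo "$views"
```

Lines without the phrase are dropped by grep, so a page that fails to load yields an empty string rather than a number.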

summary
This is a summary of the steps detailed below which create an update of this (Readership) page:
 * 1) Edit Version 1.0 Editorial Team/Oregon articles by quality/1.
 * 2) Copy and paste the wikitext into Vim hosted on Linux
 * 3) Execute the search and replace command (below), change "^I" to tab characters if necessary
 * 4) Remove header and trailer lines
 * 5) Save the resulting data as "file"
 * 6) Execute the script below, saved as "wpor", with ./wpor < file > result
 * 7) Copy and paste "result" into the article.  Preview, then fix any UTF-8 character problems revealed as redlinked articles

gory detail
The format of the article containing assessments Version 1.0 Editorial Team/Oregon articles by quality/1 has changed. The wikisource of that article is trimmed to exclude the header and trailer text, then fed through these vim commands to produce the article table (which is demonstrated above):

 :%s/{{assessment | page=\[\[\(.*\)]].*importance={{\(.*\)-Class.*class={{\(.*\)-Class.*/\1^I\2^I\3      (for most entries)
 :%s/^{{assessment | page=\[\[\(.*\)]].*class={{\(.*\)-Class.*/\1^I#na^I\2                               (for unknown importance entries)

The first vim command transforms a line like

 {{assessment | page=[[Berkeley Lent]] | importance={{Mid-Class}} | date=June 4, 2007 | class={{Start-Class}} | version= | comments= }}

into

 Berkeley Lent  Mid     Start

The second command transforms a line which has ... | importance= | date=... into

 Berkeley Lent  #na     Start
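For a non-interactive run, the same two substitutions can be sketched with sed. This is an equivalent under stated assumptions, not the method used above: it presumes a sed with basic regular expressions, page names wrapped in [[...]] as the vim patterns require, and a hypothetical helper name assessments_to_tsv. The second sample line ("Example article") is invented to show the empty-importance case.

```shell
#!/bin/bash

# Hypothetical helper: convert assessment lines on stdin into
# "article<TAB>importance<TAB>rating" lines, mirroring the two vim commands.
assessments_to_tsv() {
    local tab=$'\t'
    sed -e "s/{{assessment | page=\[\[\(.*\)]].*importance={{\(.*\)-Class.*class={{\(.*\)-Class.*/\1${tab}\2${tab}\3/" \
        -e "s/^{{assessment | page=\[\[\(.*\)]].*class={{\(.*\)-Class.*/\1${tab}#na${tab}\2/"
}

# First line is the example from the text; the second line is made up.
sample='{{assessment | page=[[Berkeley Lent]] | importance={{Mid-Class}} | date=June 4, 2007 | class={{Start-Class}} | version= | comments= }}
{{assessment | page=[[Example article]] | importance= | date=June 4, 2007 | class={{Stub-Class}} | version= | comments= }}'

tsv=$(printf '%s\n' "$sample" | assessments_to_tsv)
printf '%s\n' "$tsv"
```

Order matters here as it does in vim: once the first expression has rewritten a line, it no longer starts with {{assessment, so the second expression only touches lines with an empty importance.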

The result is fed into the script below as stdin (that is, < file):

 #!/bin/bash
 # Fetch the view count for article $1 in month $2 (YYYYMM) from stats.grok.se.
 wpor() {
     wget "http://stats.grok.se/en/$2/$1" -O - 2>/dev/null |
         grep " has been viewed " |
         sed 's#.* has been viewed \([0-9]*\).*#\1#;'
 }
 declare -a monthlist
 monthlist=(200712 200801 200802 200803 200804 200805)
 n=${#monthlist[@]}
 declare -a coltotals
 # Table header
 echo '{| class="wikitable sortable" style="text-align:right"'
 echo '! article !! importance !! rating !! Dec 2007 !! Jan 2008 !! Feb 2008 !! Mar 2008 !! Apr 2008 !! May 2008 !! Total'
 for (( m = 0; m < n; ++m )); do
     coltotals[$m]=0
 done
 rowcount=0
 # One table row per tab-separated input line, accumulating row and column totals
 while IFS=$'\t\n' read article importance rating
 do
     wikiarticle=`echo $article | tr " " "_"`
     #echo "article $article=$wikiarticle, importance $importance, rating $rating"
     x=""
     linetot=0
     for (( m = 0; m < n; ++m )); do
         y=$(wpor $wikiarticle ${monthlist[$m]})
         : $((linetot = linetot + y))
         x="$x || $y"
         coltotals[$m]=$(( coltotals[$m] + y ))
     done
     echo "|-"
     echo "| $article || $importance || $rating$x || $linetot"
     : $((rowcount = rowcount + 1))
 done
 # Final row: column totals and grand total
 linetot=0
 x=""
 for (( m = 0; m < n; ++m )); do
     y=$(( coltotals[$m] ))
     : $((linetot = linetot + y))
     x="$x || $y"
 done
 echo "|-"
 echo "| __Total__ $rowcount articles || || $x || $linetot"
 echo "|}"

This script is based on the old one, but calculates row and column totals. Also, its output is directly suitable for inclusion, whereas the old one needed some text tweaking. The only current glitch is that some extended UTF-8 characters are munged, affecting about 6 article names at present. —EncMstr (talk) 21:14, 3 June 2008 (UTC)
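The row and column totals can be checked offline by swapping the network call for a stub. A minimal sketch, assuming made-up view counts: fake_wpor and its numbers are hypothetical stand-ins for wpor, and only two months are used to keep the arithmetic easy to follow.

```shell
#!/bin/bash

# Hypothetical stand-in for wpor: returns a fixed, made-up count per month
# instead of querying stats.grok.se.
fake_wpor() {
    case "$2" in
        200712) echo 10 ;;
        200801) echo 20 ;;
        *)      echo 0 ;;
    esac
}

monthlist=(200712 200801)
n=${#monthlist[@]}
coltotals=()
for (( m = 0; m < n; ++m )); do
    coltotals[m]=0
done

rowcount=0
grandtotal=0
rows=""
# Read tab-separated rows; process substitution keeps the loop in this shell,
# so the totals are still set after the loop ends.
while IFS=$'\t' read -r article importance rating; do
    linetot=0
    for (( m = 0; m < n; ++m )); do
        y=$(fake_wpor "$article" "${monthlist[m]}")
        linetot=$(( linetot + y ))
        coltotals[m]=$(( coltotals[m] + y ))
    done
    rows+="$article=$linetot"$'\n'
    rowcount=$(( rowcount + 1 ))
    grandtotal=$(( grandtotal + linetot ))
done < <(printf 'Oregon_State_Capitol\tTop\tFA\nD._B._Cooper\tMid\tFA\n')

printf '%s' "$rows"
echo "columns: ${coltotals[*]}  rows: $rowcount  total: $grandtotal"
```

The grand total should equal both the sum of the row totals and the sum of the column totals, which makes a quick sanity check on a generated table.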

Kudos!
Wow EncMstr, thanks for the major expansion of data, and all the documentation of how you did it! -Pete (talk) 23:59, 3 June 2008 (UTC)