User:The Transhumanist/Sandbox203

Mystery
Please solve this mystery if you can...

On September 23rd, traffic to Portal:James Bond doubled, and has stayed at the new level since then. I can't figure out what happened.

See http://stats.grok.se/en/201109/Portal%3AJames_Bond

Traffic to Outline of James Bond stayed the same (though it was already at the higher level), which leads me to suspect changes made somewhere in Wikipedia.

See http://stats.grok.se/en/201109/Outline%20of%20James_Bond

I'd like to find out what happened, in case it reveals helpful link placement tips that can double the traffic to outlines too!

I look forward to your reply on my talk page. The Transhumanist 23:24, 5 October 2011 (UTC)

Stats
Items categorised as outlines get a total of about 500,000 hits per month.

James Bond: Most likely someone added either a lot of portal links or talk page banners on that day.

Rich Farmbrough, 11:47, 7 October 2011 (UTC).

You need a list of pages as well.

You also need Perl; I recommend Strawberry for Windows.


 * Rich Farmbrough, 17:48, 7 October 2011 (UTC).

Fishing...
I love that little statistic you provided. It's a fish. I need more.

500,000 per month, that's 6 million per year! My guess was 5.

'Give someone a fish and you feed him for a day; teach the person to fish and you feed him for a lifetime.' (See Distributism).

Please teach me how to fish. How did you generate that stat?

I look forward to your reply. The Transhumanist 17:28, 7 October 2011 (UTC)

Can AWB access arguments?
That is, in its search replace commands, is there a way to specify the name of the current wikipedia page it is working on?

I'd like it to be able to insert that name, as the name and not as a sticky variable (I need the actual title of the page).

I'm guessing that regex can then be used to assign that name to a variable for modification.

The trouble I'm running into is that I often need to use the subject's name in replace strings, but the pages' names are "Outline of subject". This renders MediaWiki's variables useless. So if I can access the pagename within AWB, I think I can solve this problem using regex's variable manipulation.

Can AWB do this, and if so, how?

Also, if AWB can do this, where is the documentation on it? I'm sure there'd be other useful things in there.

I look forward to your reply. The Transhumanist 19:39, 6 January 2012 (UTC)

P.S.: I'd like to do this all in one pass, if possible.
 * Not sure if this is what you were looking for, but AWB and Wikipedia have certain "magic words" like PAGENAME, NAMESPACE, %%KEY%% and others. If you look at the magic words link under variables, I think that will help. --Kumioko (talk) 20:25, 6 January 2012 (UTC)
 * Yes, I know. But when you just want the subject name, using PAGENAME in this situation produces "Outline of subject", which necessitates a second pass with AWB to get rid of "Outline of". There are about 540 outlines, so an additional pass even with AWB is pretty cumbersome and time-consuming.  I'm looking for a more efficient way to do this.  Thank you though.  Every input helps.  Can you think of anything else? Where are the advanced features documented? The Transhumanist 01:30, 7 January 2012 (UTC)
 * %%title%%
 * Rich Farmbrough, 16:40, 7 January 2012 (UTC).


 * Thank you. I'll try it. The Transhumanist 19:21, 7 January 2012 (UTC)
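For comparison, the title manipulation asked about in this thread can be sketched in Perl. This is not AWB syntax, and the helper name `subject_from_title` is made up for illustration; it only shows the regex step of dropping the leading "Outline of " once the page title is available as a string.

```perl
use strict;
use warnings;

# Derive the subject from an outline's page title by dropping a leading
# "Outline of ". Titles that do not start that way are returned unchanged.
sub subject_from_title {
    my ($title) = @_;
    (my $subject = $title) =~ s/^Outline of //;
    return $subject;
}

print subject_from_title('Outline of geography'), "\n";
```

The same pattern could in principle be applied to whatever value a title keyword expands to, but whether AWB allows that in a single pass is not something this sketch demonstrates.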

Text editor?
What (free) text editor do you recommend for editing perl scripts? The Transhumanist 21:47, 10 December 2011 (UTC)
 * I use mainly VIM, as recommended by Anomie (also vi, notepad and the command line editor), I also have Perl IDE but I haven't done much with it. The main problem with VIM is that it doesn't cope with Unicode. Rich Farmbrough, 23:25, 10 December 2011 (UTC).


 * I'll try 'em. Thank you. The Transhumanist 00:47, 11 December 2011 (UTC)

Perl text editor
Do you know of any (copyleft) text editors and/or word processors written in perl? I'd like to familiarize myself with how they work. The Transhumanist 21:49, 10 December 2011 (UTC)
 * No idea on this one. Rich Farmbrough, 23:25, 10 December 2011 (UTC).

These aren't Perl specific, but try taking a look at Notepad++ here and Scintilla here. They may lead you to some helpful information. You can also check out Sourceforge for some good stuff written in Perl. All three of these are free open source software related. --Kumioko (talk) 00:07, 11 December 2011 (UTC)
 * Thank you. I'll take a look. The Transhumanist 00:41, 11 December 2011 (UTC)

Re: a little challenge
Previously, you wrote:


 * In fact a little challenge:
 * get the stats for the previous year for one page
 * output the data in a format suitable for a wiki-page - using a by-month table and a year total on the right.
 * do the same for a list of pages
 * We could build this into a little bot.

I saw how to do #1 and #3 in your initial ("Stats") script. How do you do #2? The Transhumanist 22:01, 11 December 2011 (UTC)
 * OK so by "Previous year" I meant Dec 2010, Jan 2011, Feb 2011....
 * To output the data in Wiki-format you just need to use the print command. Perl is generally very forgiving about print:


 * print "{|\n!December\n|-\n...";

(Note: both types of quotes work, but they are subtly different: \n and variables are only expanded inside double quotes.)
 * print "|$number";


 * You might need to use a for loop.


 * Rich Farmbrough, 22:11, 11 December 2011 (UTC).
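A minimal sketch of the print-with-a-loop idea, assuming the monthly counts have already been collected. The function name `monthly_table`, the `wikitable` class and the numbers are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Build a one-row wikitable: one column per month plus a year total on the right.
# $months is an array ref of labels, $views an array ref of counts (same order).
sub monthly_table {
    my ($page, $months, $views) = @_;
    my $total = 0;
    $total += $_ for @$views;

    my $out = "{| class=\"wikitable\"\n";
    $out .= "! Page\n";
    $out .= "! $_\n" for @$months;
    $out .= "! Total\n";
    $out .= "|-\n";
    $out .= "| $page\n";
    $out .= "| $_\n" for @$views;
    $out .= "| $total\n";
    $out .= "|}\n";
    return $out;
}

# Made-up numbers, purely for illustration.
print monthly_table('Outline of geography',
                    ['Dec 2010', 'Jan 2011', 'Feb 2011'],
                    [100, 120, 90]);
```

Returning the table as a string rather than printing inside the function makes it easy to send the same text to the screen or to a file later.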

But how do you put the data in a file ("print" just displays it on the screen, right?), and then how do you place it in a page on Wikipedia? The Transhumanist 03:21, 16 December 2011 (UTC)
 * One thing at a time. If you open a file for output then you can print to it.


 * open MYPAGE, ">mypage,txt";
 * print MYPAGE "Some words and a newline.\n";
 * close MYPAGE;
 * Rich Farmbrough, 18:06, 17 December 2011 (UTC).

Nice. By the way, was that supposed to be "mypage.txt" (mypage dot txt)?

Thank you for the tip. I'm now reading the Input and Output chapter of the Llama book.

And I found the documentation on (which you used in the initial script).

Okay, here's my next question...

Now that you have content in a file, how do you place that content on a Wikipedia page? The Transhumanist 23:53, 17 December 2011 (UTC)

Re:Stats
Okay, I've been studying Perl, and today I finally took a crack at the script you sent me:

It's a command with the syntax

You use, because that's the module where " " is.

The " " lines set literal variables to the values provided.

is a looping command, and in this case works on the default variable. The default here appears to be each successive entry in the list specified.

The angle brackets  turn the script into a command that is executable from the command prompt in the same way that a Unix command is.

In the loop, you substitute all spaces for underscores, to make the entries work in URLs.

Then you print the current entry to the screen, but  would have done the same thing.

You follow that with pulling in the output from toolserver. For example http://stats.grok.se/en/201109/Outline_of_geography. In the same operation, you assign the output to the variable.

Then you employ the bind operator to specify a pattern (regular expression) match from toolserver's output (taking the match from the content of the  variable), for the purpose of using the automatic match variable. The  matches digits and the   means one or more of them in a row.

Then you assign the matched string to  using a cumulative numeric assignment operator. Because it's a numeric operator, Perl automatically strips out the non-numerical stuff from the string (well, not quite, the stuff on the left of the numbers is set to zero, while the stuff on the right is dropped).

Basically, you've scraped the monthly page views from toolserver's output.

Then you print that value to the screen and advance to a new line.

And the loop repeats on the next item in the list.

When the loop is done, you repeat the final total at the end.
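The walkthrough above can be sketched roughly as follows. This is not the original script, which is not reproduced on this page: the helper names `fetch_stats_page` and `extract_views`, the canned HTML and the exact phrase matched are assumptions for illustration, and the network fetch is replaced with fixed text so the sketch runs offline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page fetch. The real script pulled HTML from
# stats.grok.se; here canned text is returned so the sketch runs offline.
sub fetch_stats_page {
    my ($title) = @_;
    my %canned = (
        'Outline_of_geography' => '<p>Outline_of_geography has been viewed 4521 times in 201109.</p>',
        'Outline_of_Perl'      => '<p>Outline_of_Perl has been viewed 300 times in 201109.</p>',
    );
    return $canned{$title} // '';
}

# Pull the first "viewed N" number out of the page text; 0 if none is found.
sub extract_views {
    my ($html) = @_;
    return $html =~ /viewed (\d+)/ ? $1 : 0;
}

my $total = 0;
for my $entry ('Outline of geography', 'Outline of Perl') {
    (my $title = $entry) =~ s/ /_/g;     # spaces become underscores for the URL
    my $views = extract_views(fetch_stats_page($title));
    $total += $views;                    # cumulative numeric assignment
    print "$title $views\n";
}
print "Total: $total\n";
```

Capturing the digits with parentheses and reading `$1` avoids relying on Perl's string-to-number conversion to discard the surrounding text.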

I'm ready for my next one. Please send me another simple but useful Wikipedia-related script. The Transhumanist 01:48, 9 December 2011 (UTC)

P.S.: Thank you for the Strawberry recommendation. It works fine.

P.P.S.: is there a collection of perl scripts on Wikipedia somewhere?


 * Good work. The angle brackets actually take the next line of input.  If you ran this without the list file, the script would take input from the command line, one item at a time. The input from the angle brackets is automatically assigned to $_.  (As you can see, Perl does a lot of stuff automatically for us.) I'll ferret around for something tomorrow, and see what I can find.


 * I'm not sure if there's much simple perl floating around, perhaps we should start a library. But there are quite a few bots, Anomie's code is rather beautiful, if a little obscure. Rich Farmbrough, 02:32, 9 December 2011 (UTC).


 * In fact a little challenge:
 * get the stats for the previous year for one page
 * output the data in a format suitable for a wiki-page - using a by-month table and a year total on the right.
 * do the same for a list of pages
 * We could build this into a little bot.
 * Rich Farmbrough, 02:36, 9 December 2011 (UTC).


 * Yes, I'm interested.


 * In a similar vein, a script or bot that I have great need for is one that builds a chart (similar to this) of subjects, with columns showing comparatively the monthly traffic for the outline, portal, and category corresponding to each subject listed. It could take input from a list similar to the script you sent me.


 * Is that something you'd be interested in helping to create? The Transhumanist 03:50, 9 December 2011 (UTC)


 * Of course, that is where we started, wasn't it? Rich Farmbrough, 11:12, 9 December 2011 (UTC).


 * What's the plan? To pass code back and forth, or wiki-develop it on a project page? The Transhumanist 00:45, 11 December 2011 (UTC)

Outline of Perl
Here's a new outline.

You could help us Perl newbies by adding anything you think would be helpful. The Transhumanist 19:21, 7 January 2012 (UTC)

perl table construction script
I don't have a clue where to start.

I'd like the table to list subjects down the left, with columns for traffic on the right. One traffic column for the corresponding outline, category, and portal for comparison purposes.

And totals at the bottom of each column.

If you whip something up, I'm sure I could help refine it.

I look forward to any perl code you can throw at me. The Transhumanist 02:29, 3 January 2012 (UTC)

P.S.: Happy New Year!


 * OK, so here's the (untested) basics in pseudo-Perl. (There are two approaches: storing everything and then making the table, or making the table line by line. Both have advantages; the latter is simpler.)
 * let us suppose we have a config file with the subject, outline, cats and portals listed thus:


 * (We could just have the word "Stamford" - if we could be sure that all three entities follow the naming convention.)


 * Rich Farmbrough, 11:01, 3 January 2012 (UTC).


 * I'll see if I can figure out how it works. Thank you!  The Transhumanist 21:24, 4 January 2012 (UTC)
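A minimal sketch of the line-by-line approach, assuming the view counts for each subject are already in hand. The data layout (`[outline, category, portal]` per subject), the function name `comparison_table` and the numbers are assumptions for illustration; the config file format from this thread is not preserved here.

```perl
use strict;
use warnings;

# Made-up monthly view counts per subject: [outline, category, portal].
my %traffic = (
    'Geography' => [4521, 310, 980],
    'Stamford'  => [120,  15,  40],
);

# Build the table row by row, keeping a running total for each column.
sub comparison_table {
    my ($data) = @_;
    my @totals = (0, 0, 0);
    my $out = "{| class=\"wikitable\"\n! Subject !! Outline !! Category !! Portal\n";
    for my $subject (sort keys %$data) {
        my @row = @{ $data->{$subject} };
        $totals[$_] += $row[$_] for 0 .. 2;
        $out .= "|-\n| $subject || " . join(' || ', @row) . "\n";
    }
    $out .= "|-\n! Total !! " . join(' !! ', @totals) . "\n|}\n";
    return $out;
}

print comparison_table(\%traffic);
```

Subjects are sorted alphabetically here only so the output order is predictable; a real list would probably keep the order of the input file.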

A couple questions...
In the annotation Perl script you wrote...

What does $page do?

I tried opening a file into $page, and it didn't work:

I used the following script to test the behavior of :

It just produced blank space.

I tried the above script without the "open" line, providing "Outline.txt" as a command line argument, and it still didn't work.

I use regex all the time, but file handling in perl has me stumped.

The Transhumanist 01:25, 19 March 2012 (UTC)
 * Yes, that's not how I would open a file - if I wrote that it was very strange.


 * will open the file.
 * Then you need to read from it. Something like:

perlmonks
 * Then you are good to go. Rich Farmbrough, 02:10, 19 March 2012 (UTC).


 * Or just
 * TMTOWTDI. Rich Farmbrough, 02:17, 19 March 2012 (UTC).

Is it normal for beginner Perl students' heads to spin? (Mine is spinning). :)

You provided the following script fragment in a previous thread:

There is definitely something missing, because the script does not work when run, even when I replace the guts with

I don't understand "$page". It's not defined in the script, and I don't know how to define it from the examples you just provided.

It doesn't appear that this fragment can be dropped into the new script you provided above.

What is missing? The Transhumanist 03:34, 19 March 2012 (UTC)


 * Yes $page is the contents of the page. How you get the contents of the page is another matter. Remember data types in perl are somewhat flexible - $page does not need to be defined unless you
 * which you probably should. So in the example I gave (gluing it together)

the text was loaded from a file. There would need to be a subroutine to get the Wikipage $bulleted.

Rich Farmbrough, 03:34, 19 March 2012 (UTC).

I swapped out the guts to test the file handling portion...

...and it didn't work.

What did I do wrong? The Transhumanist 03:54, 19 March 2012 (UTC)

P.S.: is there supposed to be a "." after "$page"? -TT


 * Yes ".=" appends so it's the same as "$page = $page . $_;"


 * The critical difference is that the while loop in my code has a match in it. While that match is true it loops.


 * Assuming there is some text in your "Outline.txt" file, your code will stay in the while loop forever, printing whatever is in the $_ variable. If $page evaluates to false (which probably means an empty file), then it will just finish.

would be all that was needed. Rich Farmbrough, 04:04, 19 March 2012 (UTC).
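The difference discussed in this thread can be sketched as follows: the loop condition reads a line each time round, so it ends when the file runs out. The helper name `read_all` is made up, and an in-memory filehandle stands in for "Outline.txt" so the example is self-contained.

```perl
use strict;
use warnings;

# Read everything from an already-open filehandle into one scalar,
# appending each line with ".=" as described above.
sub read_all {
    my ($fh) = @_;
    my $page = '';
    while (my $line = <$fh>) {   # the loop ends when there are no more lines
        $page .= $line;          # same as: $page = $page . $line;
    }
    return $page;
}

# An in-memory filehandle stands in for a real file on disk.
my $sample = "* [[Photography]]\n* [[Sculpture]]\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $page = read_all($fh);
close $fh;
print $page;
```

With a real file, the second argument pair to `open` would be the mode and the filename instead of a reference to a string.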

Pagestats doesn't appear to work anymore
It looks like they changed the output at http://stats.grok.se...

I tried running this script again (last time was in September), to get a new total for outline traffic, and it doesn't seem to work right. Just returns zeros now.

On the command line I specified a file that is a list with bare unbracketed article names, one article name per line.

Does the script work for you?

I look forward to your reply. The Transhumanist 00:53, 23 March 2012 (UTC)

I found the problem. Solved by removing " times in" from the script.

The new total is 586,206. The Transhumanist 01:33, 23 March 2012 (UTC)
 * Excellent! Rich Farmbrough, 01:34, 23 March 2012 (UTC).

Viewing outlines with or without annotations
The next improvement I'd like to tackle is to provide some way to toggle an outline's annotations off/on (all at the same time) while viewing the outline in Wikipedia!!!

For example...

The user is browsing Wikipedia and has just arrived at an outline page. It's fully annotated, but he wants to look at the page uncluttered by the annotations.

How could we make it so that all he has to do is press a hot key to make (all of) the annotations disappear?

And then reappear by pressing a different hot key.

What are the possible approaches to implementing this?

Sincerely, The Transhumanist 23:54, 15 March 2012 (UTC)
 * Hmm, well, the two that spring to mind are using the collapse functionality (navboxes can be set to collapse if there is more than one of them, so presumably this can be brought under control using the same technology, which I assume is CSS) or JavaScript. The JavaScript code would need to be installed as default, whereas CSS can be somewhat standalone, I think, although the preference is for having it all centrally stored. Rich Farmbrough, 02:30, 16 March 2012 (UTC).


 * It sounds like JavaScript might be the best approach. Though it would be nice to have the functionality built in at the browser level (via add-on). Do you know any add-on programmers?  The Transhumanist 23:14, 16 March 2012 (UTC)
 * I don't, that I know of. And I have to disagree: add-ins are great, but not for something you want to be standard WP functionality. However, the two tasks become very similar if you use Scriptish. Rich Farmbrough, 12:18, 23 March 2012 (UTC).

BTW, I'm stuck on the thread preceding this one (extract/insert annotations). I posted a bunch of new questions up there for you (I mention them here just in case you missed them). The Transhumanist 23:14, 16 March 2012 (UTC)

Data extraction & insertion
Let's say I have the wikicode file "Outline of Stamford" saved on my computer, and I want a program that goes through the outline, finds the first bulleted entry lacking an annotation, pulls the article from Wikipedia for the subject in the entry, extracts the first two sentences of the lead paragraph, then inserts those two sentences as the annotation for that entry, then repeats for the next missing entry, until the all the entries have annotations.

This would be very helpful, as it would save tons of manual cutting and pasting.

How would you go about doing that with perl?

The Transhumanist 22:05, 4 January 2012 (UTC)


 * I'm not sure what "un-annotated" means but at a guess you could use something like:

Here the handwaving is in the assumption that the Wikipedia articles are well-formed, and not exceptional. Rich Farmbrough, 22:27, 4 January 2012 (UTC).


 * You would need to get the source of the article. You need a module for that, which comes with examples. MediaWiki::API I think is the name. Rich Farmbrough, 23:33, 4 January 2012 (UTC).

Entries in outlines look like this:


 * Architecture – art and science of designing buildings.
 * Crafts – activities and hobbies that are related to making things with one's hands and skill.
 * Drawing – visual art that makes use of any number of drawing instruments to mark a two-dimensional medium. As a verb, it is the act of making marks on a surface so as to create an image, form or shape. As a noun, it is the image produced, or the visual art form itself.
 * Film – also called a movie or motion picture, is a series of still or moving images. It is produced by recording photographic images with cameras, or by creating images using animation techniques or visual effects. The process of filmmaking has developed into an art form and industry.
 * Painting – the practice of applying paint, pigment, color or other medium to a surface (support base) with a brush or other objects. The term describes both the act and the result of the action.
 * Photography
 * Sculpture –

Concerning list entries, an annotation is a dashed comment.

The entries "Photography" and "Sculpture" above lack annotations. Would the program you wrote above home in on those and add an annotation for each? The Transhumanist 03:41, 5 January 2012 (UTC)
 * It would pick up the first, fail on the second for two reasons: it would count the endash as an annotation, and there's no following list item. Rich Farmbrough, 11:13, 5 January 2012 (UTC).
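A minimal sketch of a line-by-line check that avoids both failure modes just described. It assumes the entries appear in wikicode as bulleted links such as `* [[Photography]]`, and the function name `unannotated_entries` is made up for illustration; it is not the code from earlier in this thread.

```perl
use strict;
use warnings;

# Return the link targets of bulleted entries that have no text after the link,
# treating a bare trailing en dash (U+2013) or hyphen as "no annotation".
sub unannotated_entries {
    my ($page) = @_;
    my @missing;
    for my $line (split /\n/, $page) {
        push @missing, $1
            if $line =~ /^\*+\s*\[\[([^\]|]+)(?:\|[^\]]*)?\]\]\s*[\x{2013}-]?\s*$/;
    }
    return @missing;
}

my $outline = join "\n",
    "* [[Architecture]] \x{2013} art and science of designing buildings.",
    "* [[Photography]]",
    "* [[Sculpture]] \x{2013}";

print "$_\n" for unannotated_entries($outline);
```

Testing each line on its own means the last entry does not need a following list item, and allowing an optional dash before the end of the line catches entries like "Sculpture".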

I'm stuck!
(I had to return the programming books to the library).

I don't know what to do to be able to use the while loop you provided above on an outline.

That is, how do you make it read the outline file into the $page variable?

Also, what did you mean by "handwaving"?

Once the annotations are inserted, how do I save the outline back to disk?

When this script becomes fully operational, I expect it will do more than 50% of the work on outlines. Because inserting annotations by hand is tedious as hell, and all of the outlines have entries that need annotations. We're talking tens of thousands of annotation insertions. I can't stress how helpful this tool will be.

How fast do you think it could insert 100 annotations?

I look forward to your reply. The Transhumanist 23:49, 16 March 2012 (UTC)


 * The while loop will run while something is in the $page that consists of a newline followed by a bulleted link with nothing after it on the line.


 * So this is done... wait, didn't we do this? Depending on where the file is, either by reading it from disk as we discussed, or by loading it from Wikipedia.


 * "Handwaving" means the bit of the argument that is glossed over. Often it is a good idea to simply not worry about some problems until they can be actually met with (like developing the internal combustion engine, without worrying too much about people getting lost in strange towns), but sometimes this can be disastrous (like setting out across the desert without planning your water consumption).


 * Once the text is completed you can save with


 * open FILE, ">:", "somefilename.txt" or die;
 * print FILE $page;
 * close FILE;

Rich Farmbrough, 00:39, 27 March 2012 (UTC).

Think of it as a change in routine, and a change of pace...
The block may be a blessing in disguise.

This may give editors who have had a hard time keeping your attention the opportunity to converse with you on a more meaningful level (i.e., not rushed).

Why would we want to?

Because you are an expert on many aspects of Wikipedia.

This vacation gives you valuable time to share your expertise and experience with other Wikipedians.

Personally, I have many questions for you... – The Transhumanist 03:55, 1 April 2012 (UTC)

Hey
I just took a look at your user page, to see if it provides any info on the types of questions you would be able to answer, and I noticed you're from London. Half my family tree lives around there.

I haven't been to London since 1997. Almost got killed jaywalking 3 times, due to looking the wrong way before crossing. I guess it's not "jaywalking" over there, because it's legal; for you it's just crossing the street. I think it's cool that you have the right to cross the street. Here we are subject to getting ticketed by the police if we cross anywhere other than at an intersection.

By the way, that you drive on the other side of the street over there makes it easy to spot foreigners. I noticed many of them looking the wrong way.

I also learned that clotted cream tea is not tea with clotted cream in it. :) The Transhumanist 04:46, 1 April 2012 (UTC)
 * Mm, worth going to Devon and Cornwall just for the cream teas. They are good elsewhere but that is the home of the cream tea. Rich Farmbrough, 16:15, 1 April 2012 (UTC).


 * Incidentally the Magna Carta had a provision for access to the highway I believe. Other rights, of course, have been eroded massively over the last 20-30 years.  Notably extra-territoriality, retroactive legislation, double jeopardy, right to silence, the rights of the second chamber and just about anything that might be construed as "fundamental" has been thrown to the wolves of political opportunism.  The few that have been saved have been as much as a result of the political opportunism of the opposition of the day as principled resistance by backbenchers.  Of course historically it was ever thus, but the extremism of recent events, considering that we are not in the straitened circumstances of previous eras, is telling. Rich Farmbrough, 16:49, 1 April 2012 (UTC).

Wow, that's a lot of edits
Still number 1, I see. And closing in on the 1,000,000 edit mark. The Transhumanist 05:25, 1 April 2012 (UTC)
 * Yes, but blocked... one could hypothesise a link, between people who do very similar edits to me, and call for me to be blocked using Freudian analysis, but that would be unkind (if funny). Rich Farmbrough, 16:04, 1 April 2012 (UTC).

What is the most advanced operation you've used AWB for?
And how did you do it? The Transhumanist 05:38, 1 April 2012 (UTC)
 * Hm, well, I did use it for checksum calculations and ISBN hyphenation. Basically I wrote a Perl program to write a program to write the rulebase. The hyphenation was just a large number of rules, but the calculations involved implementing a partial arithmetic parser in regular expressions, including full addition and multiplication tables modulo 11 (and possibly 10 as well). Rich Farmbrough, 16:03, 1 April 2012 (UTC).
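For reference, the modulo-11 arithmetic behind an ISBN-10 check digit can be sketched directly in Perl. This is ordinary arithmetic, not the regex-based rulebase described above, and the function name `isbn10_ok` is made up for illustration.

```perl
use strict;
use warnings;

# True if $isbn (hyphens and spaces allowed) is a valid ISBN-10: the digits
# weighted 10 down to 1 must sum to a multiple of 11, with "X" worth 10.
sub isbn10_ok {
    my ($isbn) = @_;
    (my $s = uc $isbn) =~ s/[-\s]//g;
    return 0 unless $s =~ /^[0-9]{9}[0-9X]$/;
    my @chars = split //, $s;
    my $sum = 0;
    for my $i (0 .. 9) {
        my $value = $chars[$i] eq 'X' ? 10 : $chars[$i];
        $sum += $value * (10 - $i);
    }
    return $sum % 11 == 0 ? 1 : 0;
}

print isbn10_ok('0-306-40615-2') ? "valid\n" : "invalid\n";
```

For 0-306-40615-2 the weighted sum is 132, which is 12 times 11, so the number passes.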

What do your bots consist of?
I.e., what are they made of (what languages, programs, etc.)? The Transhumanist 05:01, 1 April 2012 (UTC)
 * I use perl and AWB. For example the main bot runs on perl (because I was being blocked for using AWB), but if I have a one-off job it is often quicker to use AWB.  Even there though I use perl to write some of the rules. Rich Farmbrough, 15:58, 1 April 2012 (UTC).

Can AWB be used to remove redlinks from a list?
(For example, see: Outline of Mozambique)

How? The Transhumanist 05:44, 1 April 2012 (UTC)


 * Kinda.

Use the list maker to make a list of "Links on page (redlinks only)".

Save the list to a text file.

Replace the carriage returns in the text file with "|". Copy the content.

Create a normal rule that replaces \[\[(pasted list)\]\] with $1, where "pasted list" is the "|"-separated content you copied.

Run it against the page in question.

Rich Farmbrough, 16:13, 1 April 2012 (UTC).
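The same idea can be sketched in Perl, assuming the redlinked titles are already in a list. The function name `unlink_redlinks` and the sample titles are made up; piped links such as `[[Title|label]]` are deliberately not handled in this sketch.

```perl
use strict;
use warnings;

# Turn [[Title]] into plain Title for every title in @$redlinks,
# leaving all other links alone.
sub unlink_redlinks {
    my ($text, $redlinks) = @_;
    # Escape regex metacharacters in each title, then join with "|".
    my $alternation = join '|', map { quotemeta } @$redlinks;
    return $text if $alternation eq '';
    $text =~ s/\[\[($alternation)\]\]/$1/g;
    return $text;
}

my $wikitext = "* [[Maputo]]\n* [[Sport in Examplestan]]\n";
print unlink_redlinks($wikitext, ['Sport in Examplestan']);
```

Escaping each title matters because titles can contain parentheses and other characters that would otherwise be read as regex syntax.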

Do you have a perl script...
...that opens a file, does something to it, and then saves it under a new filename?

I need to see how that is done. The Transhumanist 05:51, 1 April 2012 (UTC)


 * Er... so I think we covered this? Something like.

open FILE, "<:encoding(UTF-8)", "oldfile" or die; while (<FILE>) { $text .= $_ } close FILE;

$text =~ s/e/z/; # replace e with z to even up letter usage across the universe a little
# do some stuff

open FILE, ">:encoding(UTF-8)", "newfile" or die; print FILE $text; close FILE;

Rich Farmbrough, 16:08, 1 April 2012 (UTC).
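A fuller sketch of the same read, edit, save-as pattern, using lexical filehandles and a throwaway directory so it can be run without touching real files. The function name `copy_with_edit` and the file names are illustrative assumptions, and the substitution uses /g (every "e"), unlike the single replacement above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Read $old, apply a substitution, write the result to $new.
sub copy_with_edit {
    my ($old, $new) = @_;
    open my $in, '<:encoding(UTF-8)', $old or die "Cannot read $old: $!";
    my $text = '';
    while (my $line = <$in>) {
        $text .= $line;
    }
    close $in;

    $text =~ s/e/z/g;    # placeholder edit: every "e" becomes "z"

    open my $out, '>:encoding(UTF-8)', $new or die "Cannot write $new: $!";
    print {$out} $text;
    close $out or die "Cannot close $new: $!";
    return $text;
}

# Demo in a temporary directory that is removed when the program exits.
my $dir = tempdir(CLEANUP => 1);
my $old = File::Spec->catfile($dir, 'oldfile.txt');
my $new = File::Spec->catfile($dir, 'newfile.txt');
open my $fh, '>:encoding(UTF-8)', $old or die "Cannot create $old: $!";
print {$fh} "three sheep\n";
close $fh;

print copy_with_edit($old, $new);
```

Checking the result of `close` on the output file is a cheap way to notice a full disk or similar write failure.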

I hate semicolons!
It took me over an hour to realize my script didn't work because a semicolon was missing from the end of a line. The Transhumanist 19:53, 2 April 2012 (UTC)
 * If I had a shilling for every time I'd done that I'd be Rich! Rich Farmbrough, 20:19, 2 April 2012 (UTC).


 * And that, you are. – The Transhumanist 20:07, 3 April 2012 (UTC)

How do I pull a file into a scalar?
I opened a file, and tried to define a variable to be the contents of the filehandle, like this:

But it just prints out the filehandle!

What am I doing wrong? The Transhumanist 19:53, 2 April 2012 (UTC)
 * 


 * this will pull the next record in scalar context. If you set the record separator appropriately you should get the whole file. (The special variables $/ and $\ are the input and output record separators.)

It is not clear from what you've said how to use the record separators. What should the line look like? Like this...?

I'm trying to be able to use the following line of code to search a file for a string. If it's in there, I want the program to run a subroutine. If it's not in there I want the program to run a different subroutine.

I'm kinda stuck. The Transhumanist 21:27, 2 April 2012 (UTC)

OK, the angle brackets work, but it only prints out one line from the file. If what you meant was to put record separators in the file, then how do you search files without preprocessing every single file with the insertion of record separators? What if I want to search a file that's not a list and still be able to use the file for something else? The Transhumanist 22:22, 2 April 2012 (UTC)
 * When Perl reads the file, it uses \n or \r\n or \n\r as the record separator (depending on OS) - this is stored in the variable $/ . If you set $/ to the end of file marker (or some string you will not encounter) I would expect it would read the whole file. Other methods are using binary mode, or reading a line at a time:

while (<FILE>) {$text .= $_}


 * Rich Farmbrough, 22:54, 2 April 2012 (UTC).

I found something called "local" that seems to do the trick:

Though I'm not exactly sure why this works. The Transhumanist 23:18, 2 April 2012 (UTC)


 * It works because it makes the value of "$/" undefined -- a value that can never occur in a file. The "local" keyword does this as a side effect of its main purpose (controlling variable scope); you can get the same effect by assigning "undef" to the global copy of "$/", as in "$/ = undef;". --Carnildo (talk) 01:44, 3 April 2012 (UTC)
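A minimal sketch of the "local" technique explained above, wrapped in a helper so the change to $/ is confined to one block. The helper name `slurp` is made up, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Read a whole filehandle into one scalar by undefining the input record
# separator $/ inside this sub; the previous value returns when the sub ends.
sub slurp {
    my ($fh) = @_;
    local $/;                 # undef: no separator, so one read returns everything
    my $content = <$fh>;
    return $content;
}

my $data = "line one\nline two\nline three\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $text = slurp($fh);
close $fh;

# With the whole file in one scalar, a single match can search all of it.
print "found it\n" if $text =~ /line two/;
```

Because the change is scoped, later reads elsewhere in the program still return one line at a time.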

Checking each item from one list against another list
I have two lists. list1.txt and list2.txt.

I can't believe I'm still in the file IO. I haven't even gotten to the guts of the program yet. Frustrating! The Transhumanist 00:28, 3 April 2012 (UTC)

OK, local is creating a scoped version of $/ that is undefined. I haven't tried this but I suppose it works, and rather nicely in a way, since if you were using this in a block the default value of $/ would come back when you leave the block.

Now the problem you have is that you will slurp file 2 the same way you slurped file 1. So you need something like

ATB. Rich Farmbrough, 00:55, 3 April 2012 (UTC).
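One way to check each item of one list against another, once both files have been read into strings, is a hash lookup. This is a sketch under assumptions, not the code from this thread (which is not preserved here): the helper names `items` and `compare_lists` and the sample lists are made up.

```perl
use strict;
use warnings;

# Split list text (one item per line) into items, dropping blank lines.
sub items {
    my ($text) = @_;
    my @items;
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        push @items, $line if length $line;
    }
    return @items;
}

# Sort each item of list 1 into "found in list 2" or "missing from list 2".
sub compare_lists {
    my ($list1, $list2) = @_;
    my %in_list2 = map { $_ => 1 } items($list2);
    my (@found, @missing);
    for my $item (items($list1)) {
        if ($in_list2{$item}) { push @found, $item }
        else                  { push @missing, $item }
    }
    return (\@found, \@missing);
}

my ($found, $missing) = compare_lists("Geography\nPerl\nStamford\n", "Perl\nGeography\n");
print "found: @$found\n";
print "missing: @$missing\n";
```

The two `push` branches are where a "found" subroutine and a "not found" subroutine could be called instead.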

With bare bones subroutines...
The print functions show that the program actually works.

Now I have the places to put the guts. Thank you!

By the way, what is this part of the program called, an IO skeleton? The Transhumanist 06:01, 3 April 2012 (UTC)

Installing Wikipedia locally
Is Wikipedia downloadable?

Do you have it installed on your computer? The Transhumanist 01:02, 3 April 2012 (UTC)
 * You can download the content (see database dumps on my user page) and the software (www.mediawiki.org). I have both, but not the content loaded into the software. Rich Farmbrough, 01:21, 3 April 2012 (UTC).


 * Cool. What does loading the content into the software entail?


 * I'm thinking that testing programs on a local copy of Wikipedia could be useful. Having access off-line would also be nice (my Internet access is sporadic).


 * I'm curious as to what use the database dump is without having it loaded into MediaWiki. What do you use it for?  The Transhumanist 06:14, 3 April 2012 (UTC)


 * It's XML so you can write perl to scan it very easily. Also AWB has facilities to scan it. It's useful for identifying problem articles, making reports, doing statistics and extracting data. Rich Farmbrough, 12:50, 3 April 2012 (UTC).