User:Yurik/Query API/User Manual

 Attention Query API users: the Query API has been replaced by the official API and is completely disabled on Wikimedia projects!

Overview
Query API provides a way for your applications to query data directly from the MediaWiki servers. One or more pieces of information about the site and/or a given list of pages can be retrieved, in either a machine-readable format (xml, json, php, wddx) or a human-readable one. More than one piece of information may be requested with a single query.
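For example, a single request of roughly the following shape would ask for both general page information and the list of links, in XML. This is an illustrative URL only, not a definitive reference: the pipe-separated <tt>what</tt> value is an assumption based on how the successor api.php combines properties.

http://en.wikipedia.org/w/query.php?format=xml&what=info|links&titles=Main%20Page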


 * Note: Query API is being migrated into the new API interface. Please use the new API, which is now a part of the standard MediaWiki engine.


 * New API live: http://en.wikipedia.org/w/api.php
 * Query API live: http://en.wikipedia.org/w/query.php
 * View the Source Code

Installation
These notes cover my experience - Fortyfoxes 00:50, 8 August 2006 (UTC) - of installing query.php on a shared virtual host, and may not apply to all setups. I have the following configuration:


 * MediaWiki: 1.7.1
 * PHP: 5.1.2 (cgi-fcgi)
 * MySQL: 5.0.18-standard-log

Installation is fairly straightforward once you grasp the principles. Query.php is not like other documented "extensions" to MediaWiki - it does its own thing, and does not need to be integrated into the overall environment so that it can be called from within wiki pages - so there is no registering it in LocalSettings.php (my first mistake).

Installation Don'ts
Explicitly - do *NOT* place a <tt>require_once( "extensions/query.php" );</tt> line in LocalSettings.php!

Installation Do's
All Query API files must be placed two levels below the main MediaWiki directory. For example:

/home/myuserName/myDomainDir/w/extensions/botquery/query.php

where the directory "w/" is the standard MediaWiki directory, named in such a way as not to clash - i.e. not "MediaWiki" or "Wiki". This allows easier redirection with .htaccess for tidier URLs.
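For instance, the relevant part of the tree would look roughly like this (the user and domain directory names are just this example's; only the placement two levels below the MediaWiki root matters):

/home/myuserName/myDomainDir/w/
    LocalSettings.php
    extensions/
        botquery/
            query.php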

Apache Rewrite Rules and URLs
This is not required, but might be desirable if you want shorter URLs for debugging.
 * In progress - have to see how pointing a subdomain (wiki.mydomain.org) at the installation affects query.php!

Short URLs with a symlink
Using the conventions above:

$ cd /home/myuserName/myDomainDir/w   # change to the directory containing LocalSettings.php
$ ln -s extensions/botquery/query.php .
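Assuming the web server is willing to follow symlinks, query.php is then reachable directly under the "w/" directory, e.g. as /w/query.php.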

Short URLs the proper way
If you've got permission to edit the "httpd.conf" file (the Apache server configuration file), it's much better to create an alias for "query.php". To do that, just add the following line to the aliases section of "httpd.conf":

Alias /w/query.php "c:/wamp/www/w/extensions/botquery/query.php"

Of course, the path could be different on your system. Enjoy. --CodeMonk 16:00, 27 January 2007 (UTC)

Python
This sample uses the simplejson library found here.

import simplejson, urllib, urllib2

QUERY_URL = u"http://en.wikipedia.org/w/query.php"
HEADERS = {"User-Agent": "QueryApiTest/1.0"}

def Query(**args):
    args.update({
        "noprofile": "",    # Do not return profiling information
        "format": "json",   # Output in JSON format
    })
    req = urllib2.Request(QUERY_URL, urllib.urlencode(args), HEADERS)
    return simplejson.load(urllib2.urlopen(req))

# Request links for Main Page
data = Query(titles="Main Page", what="links")

# If it exists, print the list of links from 'Main Page'
if "pages" not in data:
    print "No pages"
else:
    for pageID, pageData in data["pages"].iteritems():
        if "links" not in pageData:
            print "No links"
        else:
            for link in pageData["links"]:
                # To safely print unicode characters on the console,
                # use 'cp850' for Windows and 'iso-8859-1' for Linux
                print link["*"].encode("cp850", "replace")
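For orientation, the decoded JSON has roughly the following shape (an illustrative sketch only - the page id and link titles here are made up):

{
    "pages": {
        "15580374": {
            "title": "Main Page",
            "links": [
                { "*": "Some linked page" },
                { "*": "Another linked page" }
            ]
        }
    }
}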

Ruby
This example prints all the links on the Ruby (programming language) page.

require 'net/http'
require 'yaml'
require 'uri'

@http = Net::HTTP.new("en.wikipedia.org", 80)

def query(args = {})
  options = { :format => "yaml", :noprofile => "" }.merge args
  url = "/w/query.php?" << options.collect { |k, v| "#{k}=#{URI.escape v}" }.join("&")
  response = @http.start do |http|
    request = Net::HTTP::Get.new(url)
    http.request(request)
  end
  YAML.load response.body
end

result = query(:what => 'links', :titles => 'Ruby (programming language)')
if result["pages"].first["links"]
  result["pages"].first["links"].each { |link| puts link["*"] }
else
  puts "no links"
end

Browser-based
You want to use the JSON output by setting <tt>format=json</tt>. However, until you've figured out which parameters to supply query.php with and where the data will be in the response, you can use <tt>format=jsonfm</tt> instead.

Once this is done, you <tt>eval</tt> the response text returned by query.php and extract your data from it.
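Keep in mind that <tt>eval</tt> executes whatever the server sends back, so only do this with a server you trust.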

JavaScript
// this function attempts to download the data at url.
// if it succeeds, it runs the callback function, passing
// it the data downloaded and the article argument
function download(url, callback, article) {
    var http = window.XMLHttpRequest ? new XMLHttpRequest() :
               window.ActiveXObject ? new ActiveXObject("Microsoft.XMLHTTP") : false;
    if (http) {
        http.onreadystatechange = function() {
            if (http.readyState == 4) {
                callback(http.responseText, article);
            }
        };
        http.open("GET", url, true);
        http.send(null);
    }
}

// convenience function for getting children whose keys are unknown,
// such as children of pages subobjects, whose keys are numeric page ids
function anyChild(obj) {
    for (var key in obj) {
        return obj[key];
    }
    return null;
}

// tell the user a page that is linked to from article
function someLink(article) {
    // use format=jsonfm for human-readable output
    var url = "http://en.wikipedia.org/w/query.php?format=json&what=links&titles=" + escape(article);
    download(url, finishSomeLink, article);
}

// the callback, run after the queried data is downloaded
function finishSomeLink(data, article) {
    try {
        // convert the downloaded data into a javascript object
        eval("var queryResult=" + data);
        // we could combine these steps into one line
        var page = anyChild(queryResult.pages);
        var links = page.links;
    }
    catch (someError) {
        alert("Oh dear, the JSON stuff went awry");
        // do something drastic here
    }
    if (links && links.length) {
        alert(links[0]["*"] + " is linked from " + article);
    }
    else {
        alert("No links on " + article + " found");
    }
}

someLink("User:Yurik");

How to run javascript examples
In Firefox, drag the JSENV link (the second one) on this site to your bookmarks toolbar. While on a wiki site, click the bookmark, copy/paste the code into the debug window, and click Execute at the top.

Perl
This example was inherited from MediaWiki perl module code by User:Edward Chernenko.


 * Do NOT get MediaWiki data using LWP. Please use a module such as MediaWiki::API instead.

use LWP::UserAgent;

sub readcat($) {
    my $cat = shift;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get(
        "http://en.wikipedia.org/w/query.php?format=xml&what=category&cptitle=$cat");
    return unless $res->is_success;
    $res = $res->content;

    my (@pages, @subs);

    # good for the MediaWiki module, but ugly as an example!
    # it should _parse_ the XML, not match known parts...
    while ($res =~ /(?<=<page>).*?(?=<\/page>)/sg) {
        my $page = $&;
        $page =~ /(?<=<ns>).*?(?=<\/ns>)/;
        my $ns = $&;
        $page =~ /(?<=<title>).*?(?=<\/title>)/;
        my $title = $&;
        if ($ns == 14) {
            # strip the "Category:" prefix from subcategory titles
            my @a = split /:/, $title;
            shift @a;
            $title = join ":", @a;
            push @subs, $title;
        }
        else {
            push @pages, $title;
        }
    }
    return (\@pages, \@subs);
}

my ($pages_p, $subcat_p) = readcat("Unix");
print "Pages:         " . join(", ", sort @$pages_p) . "\n";
print "Subcategories: " . join(", ", sort @$subcat_p) . "\n";

C# (Microsoft .NET Framework 2.0)
The following function is a simplified code fragment of the DotNetWikiBot Framework.
 * Attention: This example needs to be revised to remove RegEx parsing of the XML data. There are plenty of XML, JSON, and other parsers available or built into the framework. --Yurik 05:44, 13 February 2007 (UTC)

using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
using System.Net;
using System.Web;

/// This internal function gets all page titles from the specified
/// category page using the "Query API" interface. It gets titles portion by portion.
/// It gets subcategories too. The result is contained in the "strCol" collection.
/// <param name="categoryName">Name of category with prefix, like "Category:...".</param>
public void FillAllFromCategoryEx(string categoryName)
{
    string src = "";
    StringCollection strCol = new StringCollection();
    MatchCollection matches;
    Regex nextPortionRE = new Regex("<category next=\"(.+?)\" />");
    Regex pageTitleTagRE = new Regex("<title>([^<]*?)</title>");
    WebClient wc = new WebClient();
    do
    {
        // "site" is the surrounding DotNetWikiBot object; this is a fragment
        Uri res = new Uri(site.site + site.indexPath + "query.php?what=category&cptitle=" +
            categoryName + "&cpfrom=" + nextPortionRE.Match(src).Groups[1].Value + "&format=xml");
        wc.Credentials = CredentialCache.DefaultCredentials;
        wc.Encoding = System.Text.Encoding.UTF8;
        wc.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
        wc.Headers.Add("User-agent", "DotNetWikiBot/1.0");
        src = wc.DownloadString(res);
        matches = pageTitleTagRE.Matches(src);
        foreach (Match match in matches)
            strCol.Add(match.Groups[1].Value);
    }
    while (nextPortionRE.IsMatch(src));
}
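Note how the portioning works: when more category members remain, query.php returns a <tt>category</tt> element whose <tt>next</tt> attribute is fed back as <tt>cpfrom</tt> on the following request; the loop stops once no <tt>next</tt> attribute appears.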

Chicken Scheme
;; Write a list of html links to the latest changes
;;
;; NOTES
;; http:GET takes a URL and returns the document as a character string
;; SSAX:XML->SXML reads a character-stream of XML from a port and returns
;; a list of SXML equivalent to the XML.
;; sxpath takes an sxml path and produces a procedure to return a list of all
;; nodes corresponding to that path in an sxml expression.
;;
(require-extension http-client)
(require-extension ssax)
(require-extension sxml-tools)

(define sxml
  (with-input-from-string
      (http:GET "http://en.wikipedia.org/w/query.php?what=recentchanges&rchide=bots&format=xml&rclimit=200")
    (lambda ()
      (SSAX:XML->SXML (current-input-port) '()))))

(for-each
 (lambda (x) (display x) (newline))
 (map (lambda (x)
        (string-append "<a href=\"http://en.wikipedia.org/w/index.php?title="
                       (cadr x) "&diff=cur\">" (cadr x) "</a>"))
      ((sxpath "yurik/pages/page/title") sxml)))