User:Yurik/Query API/User Manual

Attention Query API users:

The Query API has been replaced by the official API and is completely disabled on Wikimedia projects!

Overview

The Query API lets your applications query data directly from the MediaWiki servers. A single query can retrieve one or more pieces of information about the site and/or a given list of pages. Results may be returned in a machine-readable format (xml, json, php, wddx) or a human-readable one.
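
As a rough illustration of requesting several pieces of information in one query, the Python 3 sketch below builds a query.php URL. The helper name build_query_url is hypothetical, and joining several what or titles values with "|" is an assumption about the parameter syntax rather than something stated on this page. The parameter names what, titles and format are the ones used in the Usage examples.

```python
from urllib.parse import urlencode

# Endpoint listed on this page; the Query API is now disabled there.
QUERY_URL = "http://en.wikipedia.org/w/query.php"

def build_query_url(what, titles=(), fmt="json", base=QUERY_URL):
    """Return a query.php URL (hypothetical helper, not part of the Query API)."""
    params = {
        "format": fmt,
        # Assumption: several requested items are joined with "|".
        "what": "|".join(what),
    }
    if titles:
        # Assumption: several page titles are also joined with "|".
        params["titles"] = "|".join(titles)
    return base + "?" + urlencode(params)

print(build_query_url(["links", "recentchanges"], titles=["Main Page"]))
```

Building the query string with urlencode keeps spaces and special characters in page titles from breaking the URL.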

Note: Query API is being migrated into the new API interface. Please use the new API, which is now a part of the standard MediaWiki engine.

New API live: http://en.wikipedia.org/w/api.php
Query API live: http://en.wikipedia.org/w/query.php
View the Source Code

Installation

These notes cover my experience - Fortyfoxes 00:50, 8 August 2006 (UTC) - of installing query.php on a shared virtual host [1], and may not apply to all setups. I have the following configuration:

MediaWiki: 1.7.1
PHP: 5.1.2 (cgi-fcgi)
MySQL: 5.0.18-standard-log

Installation is fairly straightforward once you understand the principles. Query.php is not like other documented "extensions" to MediaWiki: it does its own thing and does not need to be integrated into the overall environment so that it can be called within wiki pages, so there is no registering with LocalSettings.php (my first mistake).

Installation Don'ts

Explicitly - do *NOT* place a require_once( "extensions/query.php" ); line in LocalSettings.php!

Installation Do's

All Query API files must be placed two levels below the main MediaWiki directory. For example:

/home/myuserName/myDomainDir/w/extensions/botquery/query.php

where the directory "w/" is the standard MediaWiki directory, named so that it does not clash with anything else, i.e. not MediaWiki or Wiki. This allows easier redirection with .htaccess for tidier URLs.
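
The layout rule can be checked mechanically. The Python 3 sketch below is only an illustration: both helper names are hypothetical, and the default subdirectory simply mirrors the example path above.

```python
from pathlib import PurePosixPath

def query_php_path(mediawiki_dir, subdir="extensions/botquery"):
    """Return where query.php is expected to live under the MediaWiki directory."""
    return PurePosixPath(mediawiki_dir) / subdir / "query.php"

def is_two_levels_below(mediawiki_dir, script_path):
    """True if script_path sits exactly two directories below mediawiki_dir."""
    try:
        relative = PurePosixPath(script_path).relative_to(mediawiki_dir)
    except ValueError:
        return False  # script_path is not inside mediawiki_dir at all
    # e.g. ("extensions", "botquery", "query.php"): two directories plus the file
    return len(relative.parts) == 3

root = "/home/myuserName/myDomainDir/w"
print(query_php_path(root), is_two_levels_below(root, query_php_path(root)))
```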

Apache Rewrite Rules and URLs

This is not required, but shorter URLs can be easier to debug.

In progress - have to see how pointing a subdomain (wiki.mydomain.org) at the installation affects query.php!

Short URLs with a symlink

Using the conventions above:

$ cd /home/myuserName/myDomainDir/w # change to directory containing LocalSettings.php
$ ln -s extensions/botquery/query.php .

Short URLs the proper way

If you have permission to edit the "httpd.conf" file (the Apache server configuration file), it is much better to create an alias for "query.php". To do that, add the following line to the aliases section of "httpd.conf":

Alias /w/query.php "c:/wamp/www/w/extensions/botquery/query.php"

Of course, the path could be different on your system. Enjoy. --CodeMonk 16:00, 27 January 2007 (UTC)

Usage

Python

This sample is written for Python 2 and uses the simplejson library.

import simplejson, urllib, urllib2

QUERY_URL = u"http://en.wikipedia.org/w/query.php"
HEADERS = {"User-Agent"  : "QueryApiTest/1.0"}

def Query(**args):
    args.update({
        "noprofile": "",      # Do not return profiling information
        "format"   : "json",  # Output in JSON format
    })
    req = urllib2.Request(QUERY_URL, urllib.urlencode(args), HEADERS)
    return simplejson.load(urllib2.urlopen(req))

# Request links for Main Page
data = Query(titles="Main Page", what="links")

# If exists, print the list of links from 'Main Page'
if "pages" not in data:
    print "No pages"
else:
    for pageID, pageData in data["pages"].iteritems():
        if "links" not in pageData:
            print "No links"
        else:
            for link in pageData["links"]:
                # To safely print Unicode characters on the console, use 'cp850' on Windows or 'iso-8859-1' on Linux
                print link["*"].encode("cp850", "replace")
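
For readers on Python 3, the response-walking part of the sample can be sketched without any network access. The function name extract_links and the sample dictionary are made up for illustration; the shape (a "pages" mapping keyed by page ID, each page holding a "links" list whose entries keep the title under "*") is the one the sample above relies on.

```python
def extract_links(data):
    """Return {page_id: [link titles]} from a parsed query.php JSON response."""
    result = {}
    for page_id, page_data in data.get("pages", {}).items():
        # Each link is a dict whose "*" key holds the target title.
        result[page_id] = [link["*"] for link in page_data.get("links", [])]
    return result

# Hand-written sample in the shape the example above expects (not real output).
sample = {
    "pages": {
        "12345": {"title": "Main Page", "links": [{"*": "Wikipedia"}, {"*": "Encyclopedia"}]},
        "67890": {"title": "Empty page"},
    }
}
print(extract_links(sample))
```

Using .get with a default keeps a missing "pages" or "links" key from raising, which replaces the explicit "not in" checks of the original sample.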

Ruby

This example prints all the links on the Ruby (programming language) page.

 require 'net/http'
 require 'yaml'
 require 'uri'
 
 @http = Net::HTTP.new("en.wikipedia.org", 80)
 
 def query(args={})
   options = {
     :format => "yaml",
     :noprofile => ""
   }.merge args
   
   url = "/w/query.php?" << options.collect{|k,v| "#{k}=#{URI.escape v}"}.join("&")
   
   response = @http.start do |http|
     request = Net::HTTP::Get.new(url)
     http.request(request)
   end
   YAML.load response.body
 end
 
 result = query(:what => 'links', :titles => 'Ruby (programming language)')
 
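 # The Python example treats "pages" as a mapping keyed by page ID,
 # so accept either a hash or a list here.
 pages = result["pages"]
 page = pages.respond_to?(:values) ? pages.values.first : pages.first
 if page && page["links"]
   page["links"].each{|link| puts link["*"]}
 else
   puts "no links"
 end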

Browser-based

You will want the JSON output, selected by setting format=json. However, until you have figured out which parameters to supply to query.php and where the data will be in the response, you can use format=jsonfm instead, which gives human-readable output.

Once this is done, you eval the response text returned by query.php and extract your data from it.

JavaScript

// this function attempts to download the data at url.
// if it succeeds, it runs the callback function, passing
// it the data downloaded and the article argument
function download(url, callback, article) {
   var http = window.XMLHttpRequest ? new XMLHttpRequest()
     : window.ActiveXObject ? new ActiveXObject("Microsoft.XMLHTTP")
     : false;
  
   if (http) {
      http.onreadystatechange = function() {
         if (http.readyState == 4) {
            callback(http.responseText, article);
         }
      };
      http.open("GET", url, true);
      http.send(null);
   }
}

// convenience function for getting children whose keys are unknown
// such as children of pages subobjects, whose keys are numeric page ids
function anyChild(obj) { 
   for(var key in obj) {
      return obj[key];
   }
   return null; 
}

// tell the user a page that is linked to from article
function someLink(article) {
   // use format=jsonfm for human-readable output
   var url = "http://en.wikipedia.org/w/query.php?format=json&what=links&titles=" + encodeURIComponent(article);
   download(url, finishSomeLink, article);
}

// the callback, run after the queried data is downloaded
function finishSomeLink(data, article) {
   try {
      // convert the downloaded data into a javascript object
      eval("var queryResult=" + data);
      // we could combine these steps into one line
      var page = anyChild(queryResult.pages);
      var links = page.links;
   } catch (someError) {
      alert("Oh dear, the JSON stuff went awry");
      // do something drastic here
   }
   
   if (links && links.length) {
      alert(links[0]["*"] + " is linked from " + article);
   } else {
      alert("No links on " + article + " found");
   }
}

someLink("User:Yurik");

How to run JavaScript examples

In Firefox, drag the JSENV link (the second one) at this site to your bookmarks toolbar. While on a wiki site, click the button and copy/paste the code into the debug window. Click Execute at the top.

Perl

This example was derived from the MediaWiki Perl module code by User:Edward Chernenko.

Do NOT get MediaWiki data using LWP. Please use a module such as MediaWiki::API instead.

use LWP::UserAgent;
sub readcat($)
{
   my $cat = shift;
   my (@pages, @subs);   # keep results local so repeated calls do not accumulate
   my $ua = LWP::UserAgent->new();
 
   my $res = $ua->get("http://en.wikipedia.org/w/query.php?format=xml&what=category&cptitle=$cat");
   return unless $res->is_success();
   $res = $res->content();
 
   # good for MediaWiki module, but ugly as example!
   # it should _parse_ XML, not match known parts...
   while($res =~ /(?<=<page>).*?(?=<\/page>)/sg)
   {
       my $page = $&;
       $page =~ /(?<=<ns>).*?(?=<\/ns>)/;
       my $ns = $&;
       $page =~ /(?<=<title>).*?(?=<\/title>)/;
       my $title = $&;
 
       if($ns == 14)
       {
          my @a = split /:/, $title; 
          shift @a; $title = join ":", @a;
          push @subs, $title;
       }
       else
       {
          push @pages, $title;
       }
   }
   return(\@pages, \@subs);
}
 
my($pages_p, $subcat_p) = readcat("Unix");
print "Pages:         " . join(", ", sort @$pages_p) . "\n";
print "Subcategories: " . join(", ", sort @$subcat_p) . "\n";
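
As the comment in the code says, the response should really be parsed as XML. A minimal Python 3 sketch of that idea follows. The function name and the sample document are hypothetical: the page, ns and title elements are the ones the Perl regular expressions look for, and the yurik/pages wrapper is assumed from the sxpath expression in the Chicken Scheme example rather than taken from real output.

```python
import xml.etree.ElementTree as ET

CATEGORY_NS = 14  # namespace number the Perl example treats as a subcategory

def split_category_members(xml_text):
    """Return (pages, subcategories) parsed from a what=category XML response."""
    pages, subcats = [], []
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title") or ""
        ns = int(page.findtext("ns") or 0)
        if ns == CATEGORY_NS:
            # Drop the namespace prefix, mirroring the Perl code.
            subcats.append(title.split(":", 1)[-1])
        else:
            pages.append(title)
    return pages, subcats

# Hand-written sample document (an assumption, not real query.php output).
SAMPLE = """<yurik><pages>
  <page><ns>0</ns><title>Unix</title></page>
  <page><ns>14</ns><title>Category:Unix variants</title></page>
</pages></yurik>"""

print(split_category_members(SAMPLE))
```

A real parser also decodes entities such as &amp;amp; in titles, which the regular-expression approach leaves untouched.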

C# (Microsoft .NET Framework 2.0)

The following function is a simplified code fragment from the DotNetWikiBot Framework.

Attention: This example needs to be revised to remove RegEx parsing of the XML data. There are plenty of XML, JSON, and other parsers available or built into the framework. --Yurik 05:44, 13 February 2007 (UTC)

using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
using System.Net;
using System.Web;

/// <summary>This internal function gets all page titles from the specified
/// category page using "Query API" interface. It gets titles portion by portion.
/// It gets subcategories too. The result is contained in "strCol" collection. </summary>
/// <param name="categoryName">Name of category with prefix, like "Category:...".</param>
public void FillAllFromCategoryEx(string categoryName)
{
    string src = "";
    StringCollection strCol = new StringCollection();
    MatchCollection matches;
    Regex nextPortionRE = new Regex("<category next=\"(.+?)\" />");
    Regex pageTitleTagRE = new Regex("<title>([^<]*?)</title>");
    WebClient wc = new WebClient();
    do {
        // URL-encode the category name so spaces and "&" survive the query string
        Uri res = new Uri(site.site + site.indexPath + "query.php?what=category&cptitle=" +
            HttpUtility.UrlEncode(categoryName) + "&cpfrom=" +
            nextPortionRE.Match(src).Groups[1].Value + "&format=xml");
        wc.Credentials = CredentialCache.DefaultCredentials;
        wc.Encoding = System.Text.Encoding.UTF8;
        wc.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
        wc.Headers.Add("User-agent", "DotNetWikiBot/1.0");
        src = wc.DownloadString(res);
        matches = pageTitleTagRE.Matches(src);
        foreach (Match match in matches)
            strCol.Add(match.Groups[1].Value);
    }
    while (nextPortionRE.IsMatch(src));
}
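
The same portion-by-portion loop can be sketched with an XML parser instead of regular expressions. This Python 3 version is an illustration only: category_titles and the fetch hook are hypothetical names, the sample responses are hand-written, and the assumption that continuation is signalled by a category element with a next attribute comes from nextPortionRE in the C# code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def category_titles(category, fetch, base="http://en.wikipedia.org/w/query.php"):
    """Collect all titles in a category, following the "next" continuation.

    fetch is any callable that takes a URL and returns the XML response text
    (a hypothetical hook, so the loop can be exercised without a network).
    """
    titles, cpfrom = [], ""
    while True:
        url = base + "?" + urlencode(
            {"what": "category", "cptitle": category, "cpfrom": cpfrom, "format": "xml"})
        root = ET.fromstring(fetch(url))
        titles.extend(t.text or "" for t in root.iter("title"))
        # Assumption: a <category next="..."/> element marks a further portion.
        nxt = next((c.get("next") for c in root.iter("category") if c.get("next")), None)
        if not nxt:
            return titles
        cpfrom = nxt

# Hand-written two-portion sample standing in for the server.
FIRST = '<yurik><category next="B" /><pages><page><title>A</title></page></pages></yurik>'
SECOND = "<yurik><pages><page><title>B</title></page></pages></yurik>"

def fake_fetch(url):
    return SECOND if "cpfrom=B" in url else FIRST

print(category_titles("Unix", fake_fetch))
```

Passing the download function in as an argument keeps the paging logic separate from HTTP details such as the User-agent header.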

PHP

 // Please remember that this example requires PHP5.
 ini_set('user_agent', 'Draicone\'s bot');
 // This function returns a portion of the data at a url / path
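 function fetch($url, $start, $end) {
     $page = file_get_contents($url);
     if ($page === false) {
         return false;
     }
     // Locate the first $start marker and the first $end marker after it
     $from = strpos($page, $start);
     $to = ($from === false) ? false : strpos($page, $end, $from);
     if ($to === false) {
         return false;
     }
     // Return everything from $start through $end, inclusive
     return substr($page, $from, $to - $from + strlen($end));
 }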
 // This grabs the RC feed (-bots) in xml format and selects everything between the pages tags (inclusive)
 $xml = fetch("http://en.wikipedia.org/w/query.php?what=recentchanges&rchide=bots&format=xml","<pages>","</pages>");
 // This establishes a SimpleXMLElement - this is NOT available in PHP4.
 $xmlData = new SimpleXMLElement($xml);
 // This outputs a link to the current diff of each article
 foreach($xmlData->page as $page) {
     echo "<a href=\"http://en.wikipedia.org/w/index.php?title=". urlencode($page->title) . "&amp;diff=cur\">". htmlspecialchars($page->title) . "</a><br />\n";
 }
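
Titles can contain characters that are special in both URLs and HTML, so the link-building step deserves some care. The Python 3 sketch below shows one way to do it. The function name and the one-page sample are hypothetical, and diff=cur follows the Chicken Scheme example further down.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import quote

INDEX_URL = "http://en.wikipedia.org/w/index.php"

def diff_links(xml_text):
    """Return one HTML link per title element in a recentchanges XML response."""
    links = []
    for title in ET.fromstring(xml_text).iter("title"):
        text = title.text or ""
        # Percent-encode the title for the query string, then escape for HTML.
        href = f"{INDEX_URL}?title={quote(text, safe='')}&diff=cur"
        links.append(f'<a href="{html.escape(href)}">{html.escape(text)}</a><br />')
    return links

# Hand-written sample (an assumption, not real query.php output).
SAMPLE = "<pages><page><title>Tom &amp; Jerry</title></page></pages>"
print("\n".join(diff_links(SAMPLE)))
```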

Chicken Scheme

;; Write a list of html links to the latest changes
;;
;; NOTES
;; http:GET takes a URL and returns the document as a character string
;; SSAX:XML->SXML reads a character-stream of XML from a port and returns
;; a list of SXML equivalent to the XML.
;; sxpath takes an sxml path and produces a procedure to return a list of all
;; nodes corresponding to that path in an sxml expression.
;;
(require-extension http-client)
(require-extension ssax)
(require-extension sxml-tools)
;;
(define sxml
  (with-input-from-string
    (http:GET "http://en.wikipedia.org/w/query.php?what=recentchanges&rchide=bots&format=xml&rclimit=200")
    (lambda ()
      (SSAX:XML->SXML (current-input-port) '()))))
(for-each (lambda (x) (display x)(newline))
  (map
    (lambda (x)
      (string-append
        "<a href=\"http://en.wikipedia.org/w/index.php?title="
        (cadr x) "&diff=cur\">" (cadr x) "</a><br/>"))
    ((sxpath "yurik/pages/page/title") sxml)))