Wikipedia:Bots/Requests for approval/OwensQueryBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.

OwensQueryBot
Operator:

Automatic or Manually assisted: Manual

Programming language(s): Python

Source code available: Yes

Function overview: Perform real-time read-only queries

Links to relevant discussions (where appropriate):

Edit period(s): Non-editing

Estimated number of pages affected: Non-editing

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): N

Function details: I'm developing OwensQueryBot as a read-only query bot. I hope to develop real-time statistical sampling methods using the bot, and perhaps to compare them with some of the output at grok.se. Therefore, using the database dump from 2008 wouldn't be useful for the real-time statistical applications I hope to develop. Furthermore, since my goal is to develop parsimonious sampling methods, this bot will not use much bandwidth (in fact, if designed properly, it should use hardly more bandwidth than a human user). I plan to build the code on the pywikipedia framework and to make it all open-source on this page, because I hope that other statisticians and interested parties will use my code to develop their own samplers.
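A minimal sketch of the kind of real-time sampling query described above, assuming the standard MediaWiki API endpoint for English Wikipedia and its `list=random` module. The user-agent string and the function names are hypothetical. URL building and response parsing are kept apart from the network call so the logic can be checked offline.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical user agent; a real bot should name itself and give contact details.
USER_AGENT = "OwensQueryBot-sketch/0.1 (hypothetical contact)"


def random_sample_url(n, namespace=0):
    """Build a read-only API URL requesting n random pages from one namespace."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": namespace,
        "rnlimit": n,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def titles_from_response(payload):
    """Pull the page titles out of a decoded list=random JSON response."""
    return [page["title"] for page in payload["query"]["random"]]


def fetch_random_titles(n, namespace=0):
    """Run one sampling request against the live API (needs network access)."""
    request = urllib.request.Request(
        random_sample_url(n, namespace), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        return titles_from_response(json.load(response))
```

Calling `fetch_random_titles(5)` would return five article titles; `n` has to stay within the API's per-request limit for `rnlimit`, which depends on the account type.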

Discussion
My understanding is that if the bot will not make a single edit outside of its own user space, it does not need BAG approval. harej 02:11, 30 March 2010 (UTC)
 * Yes, this is what I've been told before, but when I try to run pywikipedia I get rejected immediately for not having the bot flag. Can you suggest another route? Perhaps another set of code that doesn't call Wikipedia's API with edit requests? Owensmartin (talk) 02:14, 30 March 2010 (UTC)
 * Since the purpose of the bot account would be to make the most use of the Wikipedia bot infrastructure, and it will not be approved to make any edits, I will grant this bot full approval in seven days unless there is an objection (which I cannot foresee but you never know). harej 02:15, 30 March 2010 (UTC)
 * Thanks! In the meanwhile I'll make every effort to develop python code that won't need the bot approval. I'm certain there's a workaround somewhere but it isn't immediately obvious to me. Owensmartin (talk) 02:29, 30 March 2010 (UTC)


 * What will it be querying to compare with the grok.se site? stats.grok.se has pageview data, which isn't available through the API or anywhere on the site. Also, if all you want to do is make API queries, pywikipedia is probably the worst possible framework. AFAIK, pywikipedia is the only heavily-used framework that still relies on screen-scraping for many functions (it's also probably the only one that imposes arbitrary limits, such as requiring that the account actually have a bot flag before it will run). The Creating a bot page lists other frameworks.
 * I would also note that there's a full history dump available from earlier this year and a "current versions only" dump from 2 weeks ago. Mr.Z-man 02:56, 30 March 2010 (UTC)
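Working from a dump instead of live queries mostly means streaming a large XML file. A minimal sketch, assuming the MediaWiki XML export format (one `<page>` element per page, each holding a `<title>` and one or more `<revision>`/`<text>` elements) and using the standard library's `xml.etree.ElementTree.iterparse`; `iter_pages` and `SAMPLE_DUMP` are hypothetical names, and the sample is a toy two-page export.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that MediaWiki export XML puts on each tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text_length) for each <page> in a dump stream.

    In a full-history dump a page has many revisions; this sketch keeps the
    length of the last <text> seen, i.e. the last revision listed in the file.
    """
    title, length = None, 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            length = len(elem.text or "")
        elif name == "page":
            yield title, length
            title, length = None, 0
            elem.clear()  # release the finished page's children


# Toy export used to exercise the parser.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Alpha</title><revision><text>hello</text></revision></page>
  <page><title>Beta</title><revision><text /></revision></page>
</mediawiki>"""

print(list(iter_pages(io.BytesIO(SAMPLE_DUMP))))  # [('Alpha', 5), ('Beta', 0)]
```

A compressed dump can be streamed the same way without unpacking it first, by passing the file object from `bz2.open(path, "rb")`, where `path` is whichever dump file was downloaded locally.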

Pywikipedia does not "immediately reject" you for not having a bot flag; you're simply misunderstanding the message it gives you. You can still run scripts; they'll produce some warnings here and there, but you should be able to ignore them safely. The only use I see for this account is making large API queries, because bot accounts can request up to 5000 results per query, while normal users are limited to 500. Mr.Z-man, Pywikipedia does actually use API queries in some of its functions (if you know which ones to use). At any rate, for what you want to do, I recommend the API and JSON. You don't need much more; Python works well with JSON, and API queries are very easy to do (not to mention fast). Once you start writing the bot and querying the wiki, check whether the API's maximum query limits are affecting you. If they are, this approval can be carried out: we'll give you a bot flag, and the bot can log in to the API and query with the higher limits. If you are not limited by the API's maximum query limits, however, this bot account is not necessary; you can query Wikipedia's API the normal way, without logging in. (Remember to set a user agent!) The Earwig (talk) 03:43, 30 March 2010 (UTC)
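A minimal sketch of the API-plus-JSON approach recommended above, assuming the current MediaWiki API's `continue` mechanism for paging through large result sets (the 2010-era API signalled continuation differently, with `query-continue`). `query_all`, `http_fetch`, and the user-agent string are hypothetical; the fetch function is passed in so the paging logic can be tested without a network.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical user agent; always identify your client to the API.
USER_AGENT = "OwensQueryBot-sketch/0.1 (hypothetical contact)"


def http_fetch(params):
    """GET one API request with a descriptive User-Agent and decode the JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def query_all(params, fetch=http_fetch):
    """Yield each result batch, following the API's 'continue' blocks until done."""
    request = dict(params, action="query", format="json")
    while True:
        data = fetch(request)
        yield data
        if "continue" not in data:
            break
        # Merge the continuation parameters into the next request.
        request = dict(request, **data["continue"])
```

For example, `query_all({"list": "allpages", "aplimit": "max"})` asks for as many titles per batch as the account is allowed (500 for a normal user and 5000 with a bot flag, per the comment above), so counting the batches a job needs is one way to see whether the higher limit would matter.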


 * Gents, thanks for keeping this discussion going. Indeed, I figured that using Python and JSON would be the easiest way to collect, manage, and query data, but I was hoping not to reinvent the wheel in putting it all together. pywikipedia is loaded with functions, when all I really want is to be able to pass the usual SQL queries to the API. Are, for example, mwclient or wikitools more succinct and useful for my purposes? Am I on the right track? Owensmartin (talk) 17:05, 30 March 2010 (UTC)


 * If you're doing non-textual queries, a toolserver account may be an appropriate mechanism for your needs; SQL is much quicker than API queries. Josh Parris 13:11, 10 April 2010 (UTC)
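To illustrate the kind of non-textual query meant here, a sketch using Python's built-in `sqlite3` as a local stand-in for a database replica. The table mirrors a few real columns of MediaWiki's `page` table (`page_namespace`, `page_title`, `page_is_redirect`, `page_len`), but the rows are invented toy data and `make_demo_db`/`namespace_summary` are hypothetical helpers; on an actual replica, a SELECT like this one would run against the live table.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with a few columns of MediaWiki's page table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, "
        "page_title TEXT, page_is_redirect INTEGER, page_len INTEGER)"
    )
    # Toy rows for illustration only.
    conn.executemany(
        "INSERT INTO page (page_namespace, page_title, page_is_redirect, page_len) "
        "VALUES (?, ?, ?, ?)",
        [(0, "Alpha", 0, 100), (0, "Beta", 0, 300), (0, "Gamma", 1, 20), (1, "Alpha", 0, 50)],
    )
    return conn


def namespace_summary(conn):
    """Count non-redirect pages and their mean length in bytes per namespace."""
    return conn.execute(
        "SELECT page_namespace, COUNT(*), AVG(page_len) FROM page "
        "WHERE page_is_redirect = 0 GROUP BY page_namespace ORDER BY page_namespace"
    ).fetchall()


print(namespace_summary(make_demo_db()))  # [(0, 2, 200.0), (1, 1, 50.0)]
```

The aggregate is computed entirely by the database, which is why such queries are much cheaper than paging through the same information with API requests.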

This has gone very quiet. Are you still intending to pursue this? Josh Parris 14:40, 4 May 2010 (UTC)
 * Thanks, Josh. Indeed, I've been looking for ways to process the dump directly instead of making API queries, and I think I've found one. So for now, let's shelve this bot request. Thanks. Owensmartin (talk) 18:57, 7 May 2010 (UTC)
 * Thank you for that development. I have withdrawn this request per the above comment; feel free to reopen it if new details emerge. The Earwig (talk) 20:37, 7 May 2010 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.