Wikipedia:Bots/Requests for approval/MetrikiBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Request Expired.

MetrikiBot
Operator:

Time filed: 20:15, Wednesday February 29, 2012 (UTC)

Automatic, Supervised, or Manual: Manual

Programming language(s): Java

Source code available: No, not fully written yet.

Function overview: I'm trying to write a bot that downloads page history information for data mining as part of my MS research.

Links to relevant discussions (where appropriate):

Edit period(s): No editing will be done; we are only downloading information and plan to use periodic batch runs.

Estimated number of pages affected: No pages will be edited; we estimate downloading page histories for hundreds of pages.

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N):

Function details: We are not editing any pages. We are interested in high-volume downloads without running into the limit of 500 revisions returned per request.

Discussion

 * This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT ⚡ 20:24, 29 February 2012 (UTC)

Copied from pre-rename BRFA
Can you use a database dump? — HELL KNOWZ  ▎TALK 21:26, 28 February 2012 (UTC)

No, we don't have terabytes of space available. We are looking to download a representative sample of version histories from hundreds, not thousands, of examples.--Metriki (talk) 21:31, 28 February 2012 (UTC)
 * This bot has edited its own BRFA page. Bot policy states that the bot account is only for edits on approved tasks or trials approved by BAG; the operator must log into their normal account to make any non-bot edits. AnomieBOT ⚡ 21:33, 28 February 2012 (UTC)
 * Note 2: Presumably the bot would be named MetrikiBot or similar, but the user is new and might not have expected the BRFA process to use the name of the bot rather than the name of the user. Headbomb {talk / contribs / physics / books} 22:17, 28 February 2012 (UTC)
 * I have it downloaded, and it takes about 1.6 TB with the full revision history for the English Wikipedia, excluding talk pages. If you don't need the full revision history, it drops to about 400-600 GB and gets smaller as you strip out things you don't need (templates, for example). Also, take a look here. I'm not sure if a bot of this type would be allowed. 71.163.243.232 (talk) 02:42, 29 February 2012 (UTC)
 * Kumioko, if you're going to retire, retire. Or at the very least don't disrupt BRFAs by making BS claims about what BAG allows or does not allow for bots. Plenty of bots like this were approved in the past, and plenty will be in the future too. Headbomb {talk / contribs / physics / books} 14:04, 29 February 2012 (UTC)
 * I concur with Headbomb that this bot's purpose is allowed as soon as the requester confirms a new account name.  MBisanz  talk 17:59, 29 February 2012 (UTC)
 * Kumioko may have a point: according to Robot policy, large batched downloads via the API are not necessarily a good thing to do. Might Special:Export work better for your purpose? Anomie⚔ 21:12, 29 February 2012 (UTC)
 * There's a limit on Special:Export of 1000 revisions; I'd imagine that the vast majority of pages don't butt heads against that, and the odd ones that do could pull the remainder down via API calls. Josh Parris 04:53, 1 March 2012 (UTC)

This has gone quiet. Are you still interested in pursuing this BRfA? Josh Parris 23:02, 7 March 2012 (UTC)
 * Note: there are several ways around this. "Banging heads against the 500 revision limit" is just a question of understanding the API properly. Moreover, if more-or-less random pages are needed, you can download one chunk of the full dump and use that. It would be useful to understand the research question. Rich Farmbrough, 16:50, 14 March 2012 (UTC).


 * (To clarify, Zack implies that he needs "the history", but there's a hint that he means the contents of the history pages rather than the full set of revisions.) Rich Farmbrough, 16:54, 14 March 2012 (UTC).
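The point that the 500-revision cap is "a question of understanding the API properly" refers to query continuation: each response carries parameters that are passed back to fetch the next batch. A minimal sketch in Java (the bot's stated language) follows. It assumes the current-style MediaWiki `continue` object; older API versions, including those in use in 2012, returned a differently shaped `query-continue` element instead. The `RevisionPager` class, the `Batch` holder and the injected `fetch` function are hypothetical; a real bot would supply a `fetch` that performs the HTTP GET against `api.php` and parses the JSON response.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper that pages through a page's revision history via API continuation. */
public class RevisionPager {

    /** One API response: revision ids plus the "continue" parameters (empty when finished). */
    public static final class Batch {
        final List<Long> revids;
        final Map<String, String> cont;

        public Batch(List<Long> revids, Map<String, String> cont) {
            this.revids = revids;
            this.cont = cont;
        }
    }

    /** Builds a prop=revisions query string, merging in any continuation parameters. */
    public static String buildQuery(String title, Map<String, String> cont) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("format", "json");
        params.put("prop", "revisions");
        params.put("titles", title);
        params.put("rvprop", "ids|timestamp|user|comment"); // metadata only, no page text
        params.put("rvlimit", "max");                        // largest batch this account may request
        params.putAll(cont);                                 // e.g. rvcontinue=... and continue=||
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Requests batch after batch until a response carries no continuation parameters. */
    public static List<Long> fetchAll(String title, Function<String, Batch> fetch) {
        List<Long> all = new ArrayList<>();
        Map<String, String> cont = new LinkedHashMap<>();
        while (true) {
            Batch batch = fetch.apply(buildQuery(title, cont));
            all.addAll(batch.revids);
            if (batch.cont.isEmpty()) {
                return all;
            }
            cont = batch.cont; // feed the continuation values back into the next request
        }
    }
}
```

Because the loop repeats until no continuation parameters come back, the per-request cap limits only the batch size, not the total number of revisions retrieved; for hundreds of pages this stays a modest number of requests.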

Bored now. Josh Parris 02:04, 15 March 2012 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.