Wikipedia:Bots/Requests for approval/HiTeCBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.

HiTeCBot
Operator: Vigsterkr

Automatic or Manually Assisted: Automated

Programming Language(s): Python Wikipediabot Framework, C++

Function Summary: Automated categorization of articles.

Edit period(s) (e.g. Continuous, daily, one time run): no editing required

Edit rate requested: -

Already has a bot flag (Y/N): N

Function Details: As part of ongoing research at my university, we would like to apply our hierarchical text categorizer (HiTeC, see: http://categorizer.tmit.bme.hu/) to Wikipedia. This would require retrieving the whole category structure of Wikipedia (currently just the English version), storing it in our own format, and retrieving a given number of articles to use as a training dataset for HiTeC. As a result, we could provide automated categorization for new and currently uncategorized articles. We may also be able to give more relevant results for a simple search query than an index-based search engine; this remains to be verified after applying HiTeC to Wikipedia (see the requirements above).

Discussion
Do you know about database dumps? These give you access to all of Wikipedia without clogging up the servers by retrieving all the information you want. :: maelgwn - talk 01:28, 18 October 2007 (UTC)
 * If it's not editing, and therefore doesn't need to get data at runtime, this BRFA isn't needed and may as well be denied? Reedy Boy
 * I would say so. Unfortunately, I didn't know that database dumps existed before I made the request. Sorry. Vigsterkr
 * No problem. =) Reedy Boy 09:21, 19 October 2007 (UTC)
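As an illustration of the offline approach suggested above, here is a minimal sketch (not part of the original request) of pulling category links out of a MediaWiki XML export instead of querying the live site. It assumes a dump with one revision per page, as in the pages-articles dumps, and only recognizes literal [[Category:...]] links in the wikitext; categories added through templates are not seen. Running it over category pages (namespace 14) yields their parent categories, i.e. the edges of the category hierarchy. The function names `extract_categories` and `local_name` are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches [[Category:Name]] or [[Category:Name|sort key]] (prefix is case-insensitive).
# A leading colon, as in [[:Category:Name]], is an ordinary link and is not matched.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def extract_categories(source):
    """Stream a MediaWiki XML export and map each page title to its category names.

    Assumes one <revision> per <page>; pages without category links are omitted.
    """
    categories = defaultdict(list)
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text" and title is not None:
            for match in CATEGORY_LINK.finditer(elem.text or ""):
                categories[title].append(match.group(1).strip())
        elif name == "page":
            title = None
            elem.clear()  # release the finished page's contents to keep memory low
    return dict(categories)


if __name__ == "__main__":
    # Tiny hand-written sample in the export format, for demonstration only.
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
      <page><title>Example</title>
        <revision><text>Text. [[Category:Examples]]</text></revision>
      </page>
    </mediawiki>"""
    print(extract_categories(io.BytesIO(sample)))
```

Streaming with `iterparse` and clearing each finished page keeps memory roughly bounded, which matters because a full English dump is far too large to load as a single tree.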


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.