Wikipedia:Help desk/Archives/2009 April 19

= April 19 =

Scanning a database dump
Moved from Village pump (technical).

After freeing up about 80GB of disk space, I've downloaded a copy of the March 13 database dump (enwiki-latest-pages-articles.xml.bz2) and have uncompressed it to XML using WinRAR. I'd like to scan it to identify, count, and possibly extract articles whose content matches any of several regular expressions and would like to know the most direct path to doing so. Is it mandatory to set up and load the data into MySQL? Is there a way to extract all of the articles to text files and scan them there? Is there a tool for scanning them directly in XML and extracting the articles with hits? What's the best approach for a newbie? I suspect it has been asked before, but my archive search did not yield a clear answer. There's a wealth of information scattered about, but a bit much for me to digest all at once. A little guidance would be much appreciated. I'm running Vista (wish I were running Linux). Thank you. -- Tcncv (talk) 20:53, 18 April 2009 (UTC)
 * This Help desk is for questions about using Wikipedia. Your question is about researching Wikipedia. You did not mention why you want to do regular expression searches on Wikipedia's content. Therefore we can't be sure whether your step is the best path to whatever your goal is. Lots of people do lots of searches on Wikipedia's content; maybe someone has already set up a tool somewhere to do just the kind of search you want. Your question does not sound like a "newbie" question. Rather, it sounds like something only an experienced Wikipedia user would think about trying. What you need to know is probably somewhere in the links under these entries in the Editor's index:
 * WP:EIW
 * I'm pretty sure that anyone who knows how to do what you are asking how to do probably had to read lots of manuals to figure it out. But I could be wrong; maybe there is some simple way. --Teratornis (talk) 01:31, 19 April 2009 (UTC)


 * I forgot to say: thanks for the links. I didn't know about the EIW page before.  It gives me some good information, but not entirely what I'm looking for.  However, I'm sure I'll return there to see what other good information it leads to.  -- Tcncv (talk) 04:21, 19 April 2009 (UTC)


 * I didn't mean to imply that I was new to Wikipedia or to databases or programming; I'm experienced in all – especially the latter two. What I am new at is Wikipedia database dumps.  I'm not much of a writer, so I would like to contribute from the technical side.  Presently, I would like to learn the tools needed to scan the dumps for pages having poorly formed dates (as described here).  This is not something I can do through a search engine, and since I may need to test and tweak the queries a bit, it is not something I can just put in a request for.  So I'm currently looking at setting up a WAMP package, populating the database using mwdumper or one of the other tools on mw:Manual:Importing XML dumps, and then running queries similar to   against the database.  So my questions are:  Am I on the right track?  Should I further investigate tools that scan the XML directly, like ParseMediaWikiDump?  I am not sure if this question should be asked here, at the Village Pump, or in some other forum.  -- Tcncv (talk) 03:49, 19 April 2009 (UTC)
 * It sounds like you are on a track that will work. You should definitely try setting up MediaWiki under a WAMP (see mw:Manual:Wiki on a stick), even if you don't end up using it, because the exercise is instructive. You will learn about the MediaWiki database schema and other stuff you don't get to see on Wikipedia as an ordinary user. I don't know if it is absolutely necessary to set up your own mirror of Wikipedia, but that is one method that would probably work. Parse::MediaWikiDump sounds interesting too. If you are comfortable with Perl, that might be simpler than setting up a Wikipedia mirror. You cannot use a conventional for-the-masses search engine, to be sure, but someone may have set up a Web-based tool that can do regexp searches against Wikipedia's content. Wikipedia is turning into a nice large corpus of data that lots of people want to search in lots of ways, and some people are trying to cater to some of these needs in ways that conventional search engines do not. Google doesn't provide regexp searches on the whole Web because (I guess) that would be too expensive. But Wikipedia is small enough (compared to the whole Web) that it might be feasible for someone to play around with. When someone sets up an experimental Wikipedia-only search engine, obviously they have to provide more features than Google does, or what's the point? So regexp's might be in play. That's certainly something I would think about if I were setting up a Wikipedia search engine. You might as well see what's there. If you do determine that you have to set up your own mirror, read all the links I gave you about queries and research. Among the people who research Wikipedia, I am virtually certain there are some who have done pretty much what you want to do. Find out where they hang out. Look at the histories of the pages about Wikipedia research to see who is editing them, and ask them who to ask. 
It's unlikely that someone who has done this sort of thing is reading the Help desk, but you never know. Keep checking this question for followups until it goes into the archive. And good luck. --Teratornis (talk) 06:30, 19 April 2009 (UTC)
 * Thank you. I'm up to the point where I've got a XAMPP and MediaWiki installed and am using mwdumper piped to mysql to populate the database.  Currently it's running at about 1,000 pages per minute, which works out to about a 48 hour load for 2.8 million pages, assuming it doesn't slow down more.  I thought I read somewhere that it might load faster if I dropped the table indexes, but I don't know how to do that in mysql (yet).  If anyone has suggestions for speeding this up, I'd love to hear from you.  -- Tcncv (talk) 07:15, 19 April 2009 (UTC)
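
As a rough check on the load-time estimate above, the arithmetic can be sketched in Python. The function name `load_hours` is made up for illustration, and the sketch assumes the import rate stays constant for the whole run, which it may well not.

```python
def load_hours(total_pages, pages_per_minute):
    """Estimated import time in hours at a constant rate (an assumption)."""
    minutes = total_pages / pages_per_minute
    return minutes / 60


# Figures quoted in the thread: 2.8 million pages at about 1,000 pages per minute.
hours = load_hours(2_800_000, 1_000)
print(f"{hours:.1f} hours")  # prints "46.7 hours"
```

That is a little under the 48 hours quoted, so the estimate holds as long as the rate does not drop.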

Try out CATSCAN. It's an easy to use SQL Searching Tool.--Manuel-aa5 (talk) 06:48, 19 April 2009 (UTC)
 * Nice tool. It's not quite what I need, but I'm sure I'll find a use for it in the future.  For now, I need to search the body text of all pages for various regular expressions, something I'm sure is too costly to do on the live database via the tool server.  -- Tcncv (talk) 07:15, 19 April 2009 (UTC)


 * The database scanning function of AutoWikiBrowser should do what you want. Graham 87 10:52, 19 April 2009 (UTC)


 * Thank You! That is exactly what I'm looking for. -- Tcncv (talk) 13:19, 19 April 2009 (UTC)
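
For anyone who would rather scan the XML directly instead of loading MySQL, as asked at the top of this thread, here is a minimal sketch in Python using only the standard library. It is not how AutoWikiBrowser or Parse::MediaWikiDump work internally. The names `scan_dump`, `scan_file`, `local_name`, `SAMPLE` and `ORDINAL_DATE` are hypothetical, the date pattern is only an illustration of one kind of poorly formed date, and the sketch assumes the usual dump layout of `<page>` elements containing a `<title>` and a `<revision><text>`.

```python
import bz2
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for one kind of poorly formed date, e.g. "4th of July".
ORDINAL_DATE = r"\b\d{1,2}(?:st|nd|rd|th) of [A-Z][a-z]+"


def local_name(tag):
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(stream, patterns):
    """Yield (title, matching_patterns) for each page whose text matches."""
    compiled = [re.compile(p) for p in patterns]
    root = None
    title, text = None, ""
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if root is None:
            root = elem  # the first "start" event is the document root
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            hits = [c.pattern for c in compiled if c.search(text)]
            if hits:
                yield title, hits
            title, text = None, ""
            root.clear()  # drop finished pages so memory use stays flat


def scan_file(path, patterns):
    """Stream a .xml.bz2 dump from disk without uncompressing it first."""
    with bz2.open(path, "rb") as stream:
        yield from scan_dump(stream, patterns)


# Tiny in-memory stand-in for a dump, for trying the scanner out.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Good</title>
    <revision><text>Born 4 July 1900.</text></revision>
  </page>
  <page>
    <title>Bad</title>
    <revision><text>Born on the 4th of July 1900.</text></revision>
  </page>
</mediawiki>"""

# Reports only the page titled "Bad".
for title, hits in scan_dump(io.BytesIO(SAMPLE), [ORDINAL_DATE]):
    print(title, hits)
```

Against the real dump, `scan_file("enwiki-latest-pages-articles.xml.bz2", [ORDINAL_DATE])` should stream through the file page by page. The parser never holds the whole document in memory, which matters for a dump of this size.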

Template
If an article's citations all are to the same source, for example that they all go to the company's own website - what template do you add? Fanoftheworld (talk) 00:18, 19 April 2009 (UTC)
 * Onesource, or perhaps Primary sources in that specific case. Algebraist 00:22, 19 April 2009 (UTC)
 * Thanks. Fanoftheworld (talk) 00:34, 19 April 2009 (UTC)
 * Or possibly self-published. – ukexpat (talk) 01:36, 19 April 2009 (UTC)

what does this mean?
"The time has come", said the walrus. —Preceding unsigned comment added by 74.249.138.245 (talk) 00:33, 19 April 2009 (UTC)
 * Have you tried Wikipedia's Reference Desk? They specialize in knowledge questions and will try to answer just about any question in the universe (except how to use Wikipedia, since that is what this Help Desk is for). Just follow the link, select the relevant section, and ask away. I hope this helps. Algebraist 00:34, 19 April 2009 (UTC)
 * The Walrus and the Carpenter. DuncanHill (talk) 00:40, 19 April 2009 (UTC)

Recover password
Hello - I was banned some time ago for an outburst etc, and would like to return to normal editing - but have forgotten my password, and did not supply an e-mail for the account - how can I get back in?77.86.67.245 (talk) 02:38, 19 April 2009 (UTC)
 * If you did not provide an email and have forgotten your password, then unfortunately, there's nothing with which we can help you. You'll have to create a new account. However, I feel that I must officially warn you that creating accounts to evade a block is not allowed. TN X Man  02:40, 19 April 2009 (UTC)
 * I'm not blocked as far as I can tell - the page is clear of warnings and other stuff - can I reuse the username as it is obviously not in use - it is "HappyVR" ?77.86.67.245 (talk) 02:44, 19 April 2009 (UTC)
 * Actually, according to the block log for that account, you have been blocked indefinitely. Also, there has been no activity there since 2006. So, we can't really know that you're the original owner of that account. Sorry, but it looks like you'll have to register a new account. Please bear in mind what I said earlier though about creating accounts to get around blocks. TN X Man  02:51, 19 April 2009 (UTC)
 * I'm still blocked. Oh. Can I appeal that - and where? And if the appeal works will someone be able to recover the password? 77.86.67.245 (talk) 02:53, 19 April 2009 (UTC)
 * You can appeal the block (and thus create a new account with no repercussions) by discussing the issue with the blocking admin. You can drop Redvers a note on his talk page. Like I mentioned earlier though, there's no way to unlock this account since you have forgotten the password and didn't enable email. TN X Man  03:04, 19 April 2009 (UTC)
 * I have put an unblock request on the user page, User:HappyVR, as my talk page is still blocked. I understand that I possibly could re-use the same user-name - without the contributions - as the account is defunct - if anyone can direct me to instructions on this I would appreciate it, though I will wait for an unblock response first.77.86.67.245 (talk) 03:13, 19 April 2009 (UTC)
 * Tnxman307 said that in order to appeal the block on your old account, you must discuss the issue with Redvers on his talk page before creating a new account. N Waldner (talk) 04:44, 19 April 2009 (UTC)

print
I can't get my computer to print a page from Wikipedia. Is this my computer or does Wikipedia not let the user print the information? —Preceding unsigned comment added by 68.12.33.72 (talk) 03:23, 19 April 2009 (UTC)
 * You should be able to print. There's a link on the left side bar saying "Printable version". Go to that and try to print.  C h a m a l  talk 03:26, 19 April 2009 (UTC)


 * Chamal_N— Printable version does not do what you seem to think it does: see Help:Printable. 68.12.33.72— By "a page", do you mean a certain page or any page? --Gadget850 (talk) 11:22, 19 April 2009 (UTC)


 * Nope, I merely directed him to the printable version thinking that the normal version doesn't print because of some problem with formatting etc that is not compatible with the printer. I didn't tell him that that is the version to print, but just to try with it :) I'm not a tech wizard though, I don't know if it would work or not.  C h a m a l  talk 11:28, 19 April 2009 (UTC)

A question?
Last night I posted a new page, and the page actually did appear today when I googled it, but only within the Wikipedia website and not yet officially on Google (I can't find it when I google it). How long does it take for the page to be officially posted, or did I by any chance miss a step?

Many thanks Jelena —Preceding unsigned comment added by Enalej011 (talk • contribs) 04:22, 19 April 2009 (UTC)


 * Usually takes 72 hours. There's nothing on Wikipedia or its users' end to make this happen. It's all on Google. Someguy1221 (talk) 04:24, 19 April 2009 (UTC)


 * And let's not forget that we are here to build an encyclopedia, not compete in a contest for Google rankings. – ukexpat (talk) 17:23, 19 April 2009 (UTC)

Renaming a photo that has mis-identified an animal
There are a few pics in the "angora goat" section that have been mis-identified and are not angora goats. just wanted to change them so they are correct. —Preceding unsigned comment added by 74.132.241.19 (talk) 05:18, 19 April 2009 (UTC)
 * What photos? To get the name of an image file, click on it to see the image file page. For example, the topmost image in the Angora goat article is File:Bounce.JPG. If you disagree with the identity of an object in an image file, look at the image file page to see the user who uploaded the file, and leave a message on the user's Talk page with your best explanation of what you think is wrong and what would be right. --Teratornis (talk) 06:37, 19 April 2009 (UTC)

Presence
User:Stevertigo has moved Presence to Presence (Led Zeppelin album) despite there being clear consensus on the Talk page to let the Led Zeppelin album stay Presence. MegX (talk) 05:21, 19 April 2009 (UTC)


 * Problem has been raised at WP:AN/I and a possible solution obtained. Please do not bring up the same issue on several pages; this can be taken as forum shopping.  C h a m a l  talk 06:36, 19 April 2009 (UTC)

Trouble with a certain page
I've been trying lately to clean up the page, List of Monster Buster Club characters, but it seems to be a losing battle. It's actually one of the worst wiki pages I've ever seen, but whenever I try to clean something up, or remove unnecessary info, it gets reverted (usually by one particular user). I'm really not sure what the best course of action is. Goodbye Galaxy (talk) 05:54, 19 April 2009 (UTC)


 * You sound on the way to getting into an edit war. DON'T. Have you tried explaining your view on the page's Discussion page? If the other user comes to the party, you might be able to sort it out there, in a civil manner. As a next step, invite other users to comment. An "outside" view might help you resolve your differences on what is necessary information. See Requests for comment for details. It also includes further steps to take. KoolerStill (talk) 06:41, 19 April 2009 (UTC)
 * I've seen worse articles, but I suppose it's not too shocking that an article about a children's cartoon might attract a different type of editor than you'd likely meet in articles about scholarly or scientific topics. If you see a particular editor who is violating policies or guidelines, check out the editor's user talk page to see if the editor has accumulated any warnings, blocks, etc. Get an idea of whether the editor has a pattern of problems with other editors. If you can't work things out by discussing on either the article's Talk page or a user's talk page, the next step would be to explore the options under WP:EIW such as the requests for comment. I second the above advice to avoid getting into a revert war. See WP:3RR. --Teratornis (talk) 06:49, 19 April 2009 (UTC)
 * In particular you should not be reasserting a proposed deletion. Proposed deletion is for uncontroversial deletions. When someone rejects your prod, it obviously is not uncontroversial. —teb728 t c 07:06, 19 April 2009 (UTC)

Foreign language content
Are edits like this: to be undone with a proper edit summary? This is en-WP, but I don't want to seem xenophobic. Thanks.  Tide  rolls  10:35, 19 April 2009 (UTC)

No, text dumping in foreign languages should be reverted instantly unless somebody adds a translation tag which indicates they are in the middle of translating. thanks. Dr. Blofeld       White cat 10:40, 19 April 2009 (UTC)


 * Many thanks.  Tide  rolls  10:43, 19 April 2009 (UTC)

Can't access Latin symbols
Hi. You know when editing you have the choice to go between wiki markup and Latin symbols etc. For some reason mine is stuck on wiki markup, and when I click Latin to access a foreign letter such as é it doesn't work. Could somebody please explain why it isn't working? It's been like this for a few days and I need it now because I'm drawing up templates for Guinea which have a lot of them in. Dr. Blofeld       White cat 10:38, 19 April 2009 (UTC)
 * Mine is working fine (Chrome on WinXP). ‡œώЩםظʘ DuncanHill (talk) 10:40, 19 April 2009 (UTC)

I've tried it on Chrome and Internet Explorer (WinXP) and it doesn't make any difference. If I click Latin or Greek in the box to change from markup, it's as dead as a dodo. Dr. Blofeld       White cat 11:01, 19 April 2009 (UTC)
 * Are they greyed out or can you highlight them? I ask because I find it almost as fast to highlight a symbol, copy and then  paste.--Fuhghettaboutit (talk) 11:42, 19 April 2009 (UTC)


 * If you actually have the pulldown, then JavaScript is working. Didn't you have some other issues recently? I suggest you go to Special:Preferences and restore default settings. --Gadget850 (talk) 11:54, 19 April 2009 (UTC)

Yes, I did have some problems. For instance, say I clicked edit this page: at the bottom there is the insert column which contains wiki markup, Latin etc. When I open that box and click Latin, for instance, it is just dead. None of them are greyed out; it just isn't responding. Dr. Blofeld       White cat 14:22, 19 April 2009 (UTC)


 * What skin are you using- monobook? What browser? Try asking this at WP:VPT. ---— Gadget850 (Ed)   talk 17:01, 19 April 2009 (UTC)
 * BTW- this is called edittools. ---— Gadget850 (Ed)   talk 18:11, 19 April 2009 (UTC)

Where do I go to ask someone to translate an article from another language Wikipedia?
I'm looking for someone who could translate de:Nibiru into English. Do you know where I could find someone?  Serendi pod ous  14:34, 19 April 2009 (UTC)
 * I believe that this list should help you out. I believe that User:Lectonar is an active user, but I couldn't swear to it. TN X Man  14:49, 19 April 2009 (UTC)

how do I ..
How do I rename a file/photo that I have uploaded with a slightly wrong name? (Off2riorob (talk) 14:39, 19 April 2009 (UTC))


 * You can ask an administrator to rename it. Or you can upload it again under the correct name.  C h a m a l  talk 14:45, 19 April 2009 (UTC)

thank you (Off2riorob (talk) 14:49, 19 April 2009 (UTC))


 * Pages in the image (and category) namespace cannot be moved. To change the name of an image, you need to upload it again, and copy the image description. After you do that, mark the image for deletion by placing db-g7 on it. However, it appears that you are marking this as in the public domain. If it is, please do not upload it here at all. Instead, upload it to the Wikimedia Commons, so that all Wikimedia projects have access to the image. Simply sign up for an account there (note that there is no autoconfirmed wait time there as there is here in order to upload an image). Cheers.--Fuhghettaboutit (talk) 14:57, 19 April 2009 (UTC)


 * right! will do. (Off2riorob (talk) 15:03, 19 April 2009 (UTC))


 * Hmmm... I was under the impression that this was possible, until Fuhghettaboutit apparently corrected that. I thought this was enabled by a recent software change... I seem to remember even reading a signpost article or something similar on this. Do we have any more info on this?  C h a m a l  talk 15:16, 19 April 2009 (UTC)


 * Right, answer found: Wikipedia Signpost/2009-03-23/Technology report. Apparently it was enabled and then disabled because of bugs. I should keep up with the news. I deserve a trout for that :P  C h a m a l  talk 15:22, 19 April 2009 (UTC)

Don't punish yourself too harshly, all's well that ends well. (Off2riorob (talk) 15:33, 19 April 2009 (UTC))

Making a new page!!!!??
How do I do it??? —Preceding unsigned comment added by Clay Bolton (talk • contribs) 15:21, 19 April 2009 (UTC)

Before creating an article, please search Wikipedia first to make sure that an article does not already exist on the subject. Please also review a few of our relevant policies and guidelines which all articles should comport with. As Wikipedia is an encyclopedia, articles must not contain original research, must be written from a neutral point of view, should cite to reliable sources which verify their content and must not contain unsourced, negative content about living people. Articles must also demonstrate the notability of the subject. Please see our subject specific guidelines for people, bands and musicians, companies and organizations and web content and note that if you are closely associated with the subject, our conflict of interest guideline strongly recommends against you creating the article.

If you still think an article is appropriate, see Help:Starting a new page. You might also look at Wikipedia:Your first article and Wikipedia:How to write a great article for guidance, and please consider taking a tour through the Wikipedia:Tutorial so that you know how to properly format the article before creation.

 tempo di valse  [☎]  15:32, 19 April 2009 (UTC)

OK, assuming all that is complete, where will I actually find where I write and upload pictures etc.? —Preceding unsigned comment added by Clay Bolton (talk • contribs)


 * Well, to start writing, simply type the title of the article you'd like to write in the search box at the top-left area of your screen (you might have to scroll up to the top to see it). Then click "Go". If the article you want to create doesn't yet exist, then you can click the "create this page" link in the search results, up at the top. Then, an edit window will appear. Type away! When you're finished, just click the "save page" button. For inserting images, check out this link, which should be of help: IMAGE. Hope this helped.  tempo di valse  [☎]  16:11, 19 April 2009 (UTC)

Creating a new case under the Wikipedians by alma mater category.
I have tried now on two occasions, and have not been able to add to the category Wikipedians by alma mater. The goal is addition of the California State University, Fullerton to the category. I have not been successful in locating instructions for this process. I will be glad to do the work, if I can determine how it should be performed. William R. Buckley (talk) 15:48, 19 April 2009 (UTC)
 * I believe this is done by pasting in to the page. I have done this by looking at the related categories. I went ahead and made the edit, let me know if this is what you wanted.  TN X Man  16:10, 19 April 2009 (UTC)

Reliable sources
Are these sources OK to use for factual citations? The Red Peacock (talk) 17:00, 19 April 2009 (UTC)

http://www.highbeam.com/doc/1P1-72035237.html www.huffingtonpost.com/


 * Please ask over at Reliable sources/Noticeboard. ---— Gadget850 (Ed)   talk 17:02, 19 April 2009 (UTC)


 * Highbeam usually doesn't list full articles unless you've signed into an account. Whenever possible, it's better to find the original source they took it from. - Mgm|(talk) 18:02, 19 April 2009 (UTC)

How to remove an "Additional Citations Needed" tag???
Today I edited the article on Frank Wu, which was tagged "additional citations". The citations were readily available and took maybe half an hour to find and add. Is it now MY responsibility to remove the TAG, or does someone else do that? VentnorNJ (talk) 21:15, 19 April 2009 (UTC)
 * No-one has a responsibility to remove such tags. If you think the tag is no longer appropriate, feel free to remove it. Algebraist 21:17, 19 April 2009 (UTC)


 * If you referenced all, or most, of the article, then you might as well remove it; if further facts remain unreferenced, then leave it. As said above, it's not your responsibility though. FengRail (talk) 00:33, 20 April 2009 (UTC)

complete subst
Resolved. Hi, what is the quickest way to completely subst nested templates to the corresponding wikicode? Repeated subst->show changes->subst gets tedious pretty quick. Thanks in advance. decltype (talk) 21:35, 19 April 2009 (UTC)
 * I think you want Special:ExpandTemplates. Algebraist 21:39, 19 April 2009 (UTC)
 * That may very well be the case. Thanks! decltype (talk) 21:44, 19 April 2009 (UTC)

Where's the tag you use when you see someone who is apparently the subject of the biography editing his/her own biography?
I know I've seen this tag before. I frequently see this issue on my copyediting rounds, and while it isn't always a big deal, sometimes it seems worth noting, especially in articles where there are neutrality or other content-related issues. I had the tag at one point, but can't find it. While I'm on that topic - is there a main list of the various tags that produce banners for articles? Thanks in advance.--Levalley (talk) 22:04, 19 April 2009 (UTC)
 * I believe you mean COI. I think Template messages is the best answer to your second question, unless you want to look directly at the templates transcluding Ambox. Algebraist 22:09, 19 April 2009 (UTC)


 * There's also autobio. decltype (talk) 22:18, 19 April 2009 (UTC)


 * And there's autobio-warn as well, for warning people who seem to be editing their own biographies.  tempo di valse  [☎]  22:30, 19 April 2009 (UTC)

editing errors
I tried to edit the tiny reference you had for the Auto Red Bug page. I wanted to check wikipedia for a full article, but none was available. Then I googled "the red bug" and found genuine articles online.

It is worthy of a full article, but as I went in to perhaps enhance or add a reference, I well... apparently didn't do it right and it got flagged. I am now rather disenchanted with the idea of being a Wikipedia editor/contributor since it is about as easy as the last time I tried to be involved in a wiki. I had forgotten how frustrating they were unless you practically think in HTML or whatever wikis are formatted in.

Sorry if I messed things up and I don't think I'll be bothering you again. I was just trying to add the link

http://www.oldwoodies.com/feature-redbug.htm

For folks that wanted to reference "the red bug" or any such. Fun little info, but not much worth trying to learn to edit a wiki even if I can write etc.

Sorry if I irritated anyone. —Preceding unsigned comment added by Stopher Hambone (talk • contribs) 23:00, 19 April 2009 (UTC)


 * I've fixed the edit at Auto Red Bug for you. I appreciate it can be difficult to edit pages at first if you're unfamiliar with Wikipedia. You might be pleased to know that there is a project to improve usability at Wikipedia, so hopefully it will be easier to edit Wikipedia in the future. For now, you might be interested in reading Cheatsheet, which should give you the basic information on how to edit pages if you change your mind and return to Wikipedia. Tra (Talk) 23:29, 19 April 2009 (UTC)
 * I would be very curious to know exactly how the usability project could help in this instance. --Teratornis (talk) 23:41, 19 April 2009 (UTC)

Auto-assessing bots
Resolved. Does anyone know if there is still a bot that will auto-assess any articles that use stub templates? I thought there was one but can't quite find it. -Optigan13 (talk) 23:11, 19 April 2009 (UTC)
 * You might find something in WP:EIW or WP:EIW. --Teratornis (talk) 23:43, 19 April 2009 (UTC)
 * This search might provide some clues. --Teratornis (talk) 23:44, 19 April 2009 (UTC)
 * Thanks, I found a few options. -Optigan13 (talk) 00:38, 20 April 2009 (UTC)