MediaWiki talk:Robots.txt

Some suggestions
Perhaps these should be included?


 * Biographies of living persons/Noticeboard
 * Conflict of interest/Noticeboard
 * Fringe theories/Noticeboard

Along with the associated talk pages and all archives? rootology ( C )( T ) 13:17, 13 September 2008 (UTC)


 * Why Fringe? The usual reason for search exclusion is potential harm to real-life identifiable people. Can you give an example where googling a person leads back to Fringe? Dragons flight (talk) 15:36, 13 September 2008 (UTC)

Administrator's Noticeboard
I strongly disagree with the inclusion of all of AN. Very few AN discussions involve people identifiable in real life. In previous discussions I asked for examples where a Google search on just someone's name showed a major result from AN (top 20 or so) and no one could give me any examples. Being able to search AN is useful, and I don't believe there is sufficient evidence that content on AN creates harm for people in order to justify this inclusion. Dragons flight (talk) 15:21, 13 September 2008 (UTC)


 * Indeed, someone tried to do this previously through another method and it was rejected. WP:AN discussions are often important later on; people need to be able to find them even if not certain of their location in the archives. Geni 21:25, 13 September 2008 (UTC)

User and user talk
Can you disallow my user and user talk? NonvocalScream (talk) 16:40, 13 September 2008 (UTC)
 * Please just use NOINDEX on them. Also note, user-talk pages are automatically noindexed. - Rjd0060 (talk) 16:42, 13 September 2008 (UTC)
 * So, if User:Geo Swan, or User:Geo Swan/Guantanamo had a NOINDEX, is that supposed to automatically apply to all subdirectories? Geo Swan (talk) 00:10, 11 December 2009 (UTC)
 * Most likely.--174.53.247.29 (talk) 17:28, 13 August 2012 (UTC)

Bugzilla entries
Is there a reason to keep the bugzilla links? As far as I can see, they are in the original robots.txt to show why various entries have been added and who requested them, etc. Here we don't need to file bugzilla reports, of course, we can use the talk page or just edit the page ourselves, so I don't really see why we should keep those links. We should probably use the comments to explain why the various pages are in the list instead. --Conti|✉ 01:25, 15 September 2008 (UTC)
 * I think they should be kept for the same reason that we have edit history and edit summaries, and talk page discussion histories. The links point to those bugzilla discussions. I think it would be a mistake to remove that history. They're references to reasons for the additions. (Which may be useful for future discussions.) - jc37 01:45, 15 September 2008 (UTC)
 * Hmm, couldn't we just link to the original robots.txt instead, then? My point is that this page will be edited again and again, and soon enough those bugzilla entries say one thing, and our local robots.txt says another thing. http://bugzilla.wikimedia.org/show_bug.cgi?id=12111 is, for example, about the German de:Wikipedia:Checkuser, and has nothing to do with what can be seen in the local list below the bugzilla URL. --Conti|✉ 13:29, 15 September 2008 (UTC)
 * This page goes in place of the default robots.txt. Mr.Z-man 16:18, 15 September 2008 (UTC)

Arbitration Committee Elections December 2008/Vote
This page (and its subpages and related pages) are being indexed by Google and probably shouldn't be. --MZMcBride (talk) 00:56, 8 September 2009 (UTC)

WikiProject_Deletion_sorting?
Does WikiProject_Deletion_sorting need to be here? It only contains current and very recently closed deletion discussions, and I was surprised when I couldn't find one by Googling for e.g. 'deletion sorting china'. --Apoc2400 (talk) 23:02, 10 January 2010 (UTC)

Update and addition
Could someone update TFD, as it has been renamed to "Templates for discussion" (though I'm not sure it's really needed, as templates would hardly end up as the number 1 Google hit for some person), and add Files for deletion and Possibly unfree files, where I see more danger than in templates? An image of oneself might be something to avoid in Google results. However, file deletion discussions don't seem to be too popular (e.g. they weren't in Xfd today until recently), so no one has added them yet. In this table, you'll find the new syntax:

Thank you, --The Evil IP address (talk) 20:27, 4 April 2010 (UTC)
 * ✅, as well as the talk pages. Nakon  21:12, 4 April 2010 (UTC)

Syntax highlighting
Could you replace the  tag with  20:04, 1 March 2015 (UTC)
 * No kidding. And nobody suggested otherwise; in fact, the opposite. :-) --MZMcBride (talk) 22:41, 1 March 2015 (UTC)

Exclusion of sandbox content...
It is noted that a number of users sensibly use their userspace to develop article drafts and to create sandbox content for test edits.

It would therefore be appreciated if consideration be given to adding such sandboxes and drafts to the exclusions here.

The alternative is to place a user sandbox or userspace draft manually, which I've been informed upsets people who like to treat their userspace with a degree of privacy. Sfan00 IMG (talk) 12:02, 26 April 2015 (UTC)

Disallow /?title=
Lately a large part of my Google searches give URLs like https://en.wikipedia.org/?title=Denmark and https://en.wikipedia.org/?title=Woman while our preferred /wiki/ URL is not listed. I guess it's removed as duplicate content of /?title=. Is it possible to disallow /?title= without a big risk of it not being replaced by another URL like /wiki/ ? So far I only see /?title= for en.wikipedia.org so I haven't posted to meta:MediaWiki talk:Robots.txt. PrimeHunter (talk) 15:02, 22 June 2015 (UTC)


 * I've been unable to reproduce. Is this still an issue? Mdann52 (talk) 11:47, 5 October 2015 (UTC)
 * I haven't noticed it in a long time and couldn't reproduce now so I guess Google has either fixed it or their bot no longer finds such url's. The specific search claims "About 939,000 results" but there are only 55 when the last result page is clicked.  doesn't seem to work. PrimeHunter (talk) 12:06, 5 October 2015 (UTC)

Protected edit request on 5 July 2015
Per Village pump (proposals)/Archive_126, there is a consensus to disable indexing for userspace. This is easiest done by adding  immediately below the last entry in the list.

Thanks,

Mdann52 (talk) 10:40, 5 July 2015 (UTC)


 * Ehm, that seems rather drastic to me... Also note that NOINDEX != Disallow, these days. Dan, any thoughts on this ? —Th e DJ (talk • contribs) 22:53, 5 July 2015 (UTC)
 * For changing an entire namespace, it's better to change wgNamespaceRobotPolicies in the server config, btw. Then you can set noindex, follow for instance. For that, file a Phabricator ticket. —Th e DJ (talk • contribs) 23:11, 5 July 2015 (UTC)

Thanks for the ping, TheDJ! Is the indexing of user space a recent change? In the distant past I added the __INDEX__ magic word to my user page since I didn't mind having it indexed by search engines, which would imply that something's changed since then. If there's some other cause of this then it'd be good to know what it is rather than piling on quick hacks on top of some other problem, as this doesn't seem emergent enough to require immediate action. I'll start a thread on wikitech-l to see if anyone knows. With respect to this specific request, I have no issue with this request as the __NOINDEX__ magic word doesn't affect our search functionality at all, so you can still patrol the projects in that way. Per TheDJ's recommendation, we can do this via a configuration change; I can have an engineer in the Search Team take a look at that after some initial investigation is performed. --Dan Garry, Wikimedia Foundation (talk) 17:11, 6 July 2015 (UTC)
 * I've tracked this in phabricator as T104797. Mdann52 (talk) 17:15, 6 July 2015 (UTC)
 * I've started a wikitech-l thread to ask whether something's changed on our end. --Dan Garry, Wikimedia Foundation (talk) 17:33, 6 July 2015 (UTC)

Comment vs code (Wayback Machine)
Isn't this bit treated as a comment rather than a rule since it's preceded by #'s?

86.90.39.63 (talk) 22:32, 3 October 2015 (UTC)
 * Yes. It seems to be disabled on purpose.  22:36, 3 October 2015 (UTC)
 * There is a patch for review to change it to this:

User-agent: archive.org_bot
Disallow: /wiki/User:
Disallow: /wiki/Benutzer:
 * See T104949 and gerrit diff. PrimeHunter (talk) 22:54, 3 October 2015 (UTC)

Protected edit request on 26 February 2016
YO

1.23.216.65 (talk) 19:27, 26 February 2016 (UTC)
 * malformed request. — xaosflux  Talk 19:59, 26 February 2016 (UTC)

immediate edit request
The Disallow: /wiki/Wikipedia:Long_term_abuse section needs to be updated to refer to Long-term abuse due to a mass rename of the entire project years ago. The difference is a hyphen, but search engines are now picking up on the reports which previously were excluded. The same goes for the Disallow: /wiki/Wikipedia:Abuse_reports/ section, which was renamed to Abuse response years ago. It might be better to keep both the old and new names, because there are some straggler subpages on both names. The corresponding talk pages and subpages would also need to be updated. Pteroinae (talk) 07:08, 3 April 2016 (UTC)
 * ✅ — xaosflux  Talk 03:00, 4 April 2016 (UTC)
 * Sorry, forgot my first account's password. Abuse reports is also listed on the NOINDEX page, but that project too has been renamed to Abuse response and some of the reports were moved and some weren't. As with Long-term abuse, it seems two sets of entries are needed because there are some straggler subpages. Thanks. Pteroinae alternate (talk) 06:22, 7 April 2016 (UTC)

Split for discussion
Could we also perhaps add Wikiquette assistance and its subpages/talkpages to NOINDEX because it's materially similar to the other noticeboards already NOINDEXed and could out/pose a privacy concern to those being discussed (or were being discussed, since the place is inactive)? Pteroinae (talk) 07:08, 3 April 2016 (UTC)
 * I've split this for further discussion - a community consensus must be demonstrated first. — xaosflux  Talk 03:00, 4 April 2016 (UTC)
 * ✅ — xaosflux  Talk 23:05, 10 April 2016 (UTC)

Disallow: /wiki/Wikipedia:Archive.is_RFC_4
Could you please add

Disallow: /wiki/Wikipedia:Archive.is_RFC
Disallow: /wiki/Wikipedia_talk:Archive.is_RFC

These RFC pages were mistakenly not placed under the already disallowed folders:

Disallow: /wiki/Wikipedia:Requests_for_comment/
Disallow: /wiki/Wikipedia_talk:Requests_for_comment/

PS. I added RFC_5. There is no such page yet, it is just to avoid the extra work when it will be created.

PPS. I read the robots.txt spec and removed lines like "Disallow: /wiki/Wikipedia:Archive.is_RFC_4"; "Disallow: /wiki/Wikipedia:Archive.is_RFC" should cover all pages with this prefix, including "/wiki/Wikipedia:Archive.is_RFC_4". Only two lines are needed. — Preceding unsigned comment added by 78.139.174.106 (talk) 13:30, 26 May 2016 (UTC)
 * How about just MOVING these pages? — xaosflux  Talk 17:24, 26 May 2016 (UTC)
 * That would not be enough. The pages have been moved, but the old (indexable) location still has the content, not the redirect.
 * Although the new location is protected by robots.txt and you are not able to archive the page using the new URL: http://web.archive.org/save/https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archive.is_RFC
 * You are still able to archive the page this way: http://web.archive.org/save/https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC
 * Ironically, the Wikipedia servers do not redirect robots to the new location of the MOVED pages; they serve the same content at both the old and new locations. Only meat users are redirected.
 * In the robot's eyes it is not a MOVE but a COPY.
 * 78.139.174.106 (talk) 17:58, 26 May 2016 (UTC)
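
The prefix claim in the PPS above can be checked locally. This is a minimal sketch using Python's standard urllib.robotparser, which applies simple prefix matching; real crawlers (Googlebot, archive.org_bot) may differ in details, so treat it as an approximation only. The helper name blocked and the page Wikipedia:Some_other_page are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rules proposed in the request above.
rules = """\
User-agent: *
Disallow: /wiki/Wikipedia:Archive.is_RFC
Disallow: /wiki/Wikipedia_talk:Archive.is_RFC
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

BASE = "https://en.wikipedia.org"


def blocked(path: str) -> bool:
    """Return True if a generic crawler may not fetch the given path."""
    return not parser.can_fetch("*", BASE + path)


for path in (
    "/wiki/Wikipedia:Archive.is_RFC",
    "/wiki/Wikipedia:Archive.is_RFC_4",
    "/wiki/Wikipedia_talk:Archive.is_RFC_5",
    "/wiki/Wikipedia:Some_other_page",  # hypothetical page, not covered
):
    print(path, "blocked" if blocked(path) else "allowed")
```

Under this parser, the two prefix rules cover the numbered RFC pages without listing each one separately.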


 * 1) Look, ma: I created a redirect to /wiki/MediaWiki:Spam-blacklist on a talk page https://en.wikipedia.org/w/index.php?title=User_talk:178.137.146.212&redirect=no
 * 2) IP talk pages have NOINDEX.
 * 3) /wiki/MediaWiki:Spam-blacklist is protected in robots.txt.
 * 4) The resulting page https://en.wikipedia.org/wiki/User_talk:178.137.146.212 has all the content of /wiki/MediaWiki:Spam-blacklist, and it has no NOINDEX and is not protected by robots.txt. You can archive it: http://web.archive.org/save/https://en.wikipedia.org/wiki/User_talk:178.137.146.212 or submit it to Google, whatever. 178.137.146.212 (talk) 04:26, 27 May 2016 (UTC)


 * The RFC pages could add NOINDEX but there was disagreement about that at Requests for comment/Archive.is RFC 4. I added NOINDEX to User talk:178.137.146.212 [//en.wikipedia.org/w/index.php?title=User_talk:178.137.146.212&diff=722324558&oldid=722291922] but that apparently doesn't prevent indexing when it redirects to a page without NOINDEX. PrimeHunter (talk) 11:08, 27 May 2016 (UTC)


 * You are absolutely right, NOINDEX (placed on the content pages, not on pages with #REDIRECT) is technically the solution. NOINDEX has been on those pages for years. But yesterday an admin removed the NOINDEX from all the pages for no good reason: Wikipedia_talk:Requests_for_comment/Archive.is_RFC_4, Requests_for_comment/Archive.is_RFC_4. If I put NOINDEX back, she or he undoes my changes. 78.139.174.106 (talk) 14:40, 27 May 2016 (UTC)


 * I tend to think that even robots.txt is not a solution here. Even with those lines added to robots.txt, one is still able to create a redirect page (or a huge farm of such pages) in her or his user space and thus circumvent robots.txt. The solution is expected to lie in fixing the MediaWiki code, as using a redirect to circumvent robots.txt makes many of the solutions above futile: you wanted to prevent User_talk:Jimbo_Wales from being archived on the Wayback Machine? Anyone can create an indexable redirect page (actually not a redirect, but a live mirror in the robot's eyes) and save it instead. 78.139.174.106 (talk) 14:46, 27 May 2016 (UTC)

Google thinks it's cute, we need to blacklist Wikipedia%3AArticles_for_deletion%2F
Google results for "Sarah Beck Mather" bring up this link: Sarah Beck Mather, which is technically not disallowed, because the slash is escaped. If I am not reading this correctly, I'd like a pointer to what's actually happening, but if I'm right, please add %2F counterparts for the appropriate rules with slashes.

Note: This was brought up in the #wikipedia-en-help channel on IRC, and while the article is now blanked (thanks, User:DragonflySixtyseven), other AfDs may be indexed in this manner, against our wishes.

Thanks! --MarkTraceur (talk) 16:20, 8 December 2016 (UTC)


 * I think you are absolutely right, and other AfDs are indeed getting indexed in this manner. Mz7 (talk) 22:54, 31 December 2016 (UTC)
 * With that being said, I'm not quite sure if simply disallowing  would fix this, however. In my admittedly very basic understanding of this, by escaping the slash, we are now referring to the page   as a subpage of , instead of   as a subpage of  . In other words, for this to work, we would have to add   to the robots.txt in order to pull it from Google. It would be easier if the MediaWiki developers could somehow prevent our URLs from being able to escape that slash with a %2F. Mz7 (talk) 23:15, 31 December 2016 (UTC)

I was just coming here to say/note the same. This search has the result:

Wikipedia:Articles for deletion/Anil Dash - Wikipedia https://en.wikipedia.org/wiki/Wikipedia%3AArticles_for_deletion%2FAnil_Dash This page is an archive of the discussion about the proposed deletion of the article below. This page is no longer live. Further comments should be made on the ...

We currently specify these lines:

Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia%3AArticles_for_deletion/

These lines do not match "Wikipedia%3AArticles_for_deletion%2FAnil_Dash".

Do we care about the root page (i.e., Articles for deletion) being indexed? If not, we could just remove the trailing slashes from these two rules, which would then catch the Anil Dash case and others.

Otherwise, we'll need to add more permutations to the list of disallow directives, which is kind of gross. In either case, we need to act here. --MZMcBride (talk) 16:26, 9 January 2017 (UTC)
 * Should we change the second line to:

Disallow: /wiki/Wikipedia%3AArticles_for_deletion%2F
 * ? Legoktm (talk) 02:07, 10 January 2017 (UTC)
 * I feel like changing the second line as you suggest will just result in future issues for "/wiki/Wikipedia%3AArticles_for_deletion/". We don't aggressively normalize these URLs.
 * We could manually mark pages such as Articles for deletion/Anil Dash as noindex with a bot/script. And/or we could change MediaWiki to programmatically mark all pages with a specified prefix as noindex in their HTML outputs. (We already do this at a namespace level, but we could do it at a page title prefix level as a step further.)
 * I don't think there's much value in indexing the root Articles for deletion page. I think the simplest solution is to remove the trailing "/"s from the existing rules. The harm in having a single false negative is surely outweighed by the harm of having many false positives. --MZMcBride (talk) 04:14, 10 January 2017 (UTC)
 * OK, done. I suppose we should do the same for the rest of the XFD types? What about all the other rules? Legoktm (talk) 04:29, 10 January 2017 (UTC)
 * In skimming the list again, I think it's fine to remove all the trailing slashes. In some cases, such as "Disallow: /wiki/Wikipedia:Copyright_problems", we've already done this. For the cases where we're currently including the trailing slash, for example "/wiki/Wikipedia:Neutral_point_of_view/Noticeboard/", I don't think there's any real value in indexing the root noticeboard page. In cases like this, by including the trailing slash, we're actually allowing the current content to be indexed. If we've decided to not index the archives and other subpages of noticeboards, I don't really see how encouraging indexing of the current open topics makes sense. This is also true of cases where the root page transcludes content from subpages. --MZMcBride (talk) 05:58, 10 January 2017 (UTC)

Hi Od Mishehu and Legoktm and any other passing admin. Thanks for the recent edits. Can someone please remove the trailing slashes from the other rules? I'm worried about cases like this search, which have  in the results. --MZMcBride (talk) 05:11, 11 January 2017 (UTC)
 * I think I got them all. Lemme know if you need anything else ^demon[omg plz] 01:50, 12 January 2017 (UTC)
 * Cool, thank you. We may still see issues at some point with pages such as Reliable sources/Noticeboard in search results, if the "/" gets converted to "%2F", but I'm not sure the risk is worth adding more variant directives. --MZMcBride (talk) 07:05, 12 January 2017 (UTC)
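
The effect of the trailing slash discussed in this section can be sketched with a literal prefix match. This is a simplification that assumes the crawler compares the requested path against each Disallow value character by character and does not treat "%2F" as "/"; the function name is_disallowed is made up for illustration and is not any crawler's actual code.

```python
def is_disallowed(path: str, rules: list[str]) -> bool:
    """Return True if any rule is a literal prefix of the path."""
    return any(path.startswith(rule) for rule in rules)


# Rules as they were before the change, with trailing slashes.
with_slash = [
    "/wiki/Wikipedia:Articles_for_deletion/",
    "/wiki/Wikipedia%3AArticles_for_deletion/",
]
# Rules after removing the trailing slashes.
without_slash = [rule.rstrip("/") for rule in with_slash]

escaped = "/wiki/Wikipedia%3AArticles_for_deletion%2FAnil_Dash"
plain = "/wiki/Wikipedia:Articles_for_deletion/Anil_Dash"
root = "/wiki/Wikipedia:Articles_for_deletion"

for label, rules in (("with slash", with_slash), ("without slash", without_slash)):
    for path in (escaped, plain, root):
        print(label, path, is_disallowed(path, rules))
```

With the trailing slash, only the unescaped subpage matches; without it, the escaped subpage and the root page match as well, which is the trade-off accepted above.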

Archive Team's view on robots.txt
Hi. I found this piece interesting: . --MZMcBride (talk) 07:06, 12 January 2017 (UTC)

Protected edit request on 30 January 2017
My page Ujwal Ghimire needs indexing so search engines find it. Please add Indexing. Thanks --Rohkum (talk) 18:36, 30 January 2017 (UTC)
 * Not done: The article is not noindexed due to Robots.txt. — JJMC89 (T·C) 19:58, 30 January 2017 (UTC)

Sandbox modules
Please add:
Disallow: /wiki/Module:Sandbox
Disallow: /wiki/Module%3ASandbox
Unlike normal templates, Scribunto modules only work in the Module namespace, so what would otherwise be created in the User namespace gets created under Module:Sandbox/. Nardog (talk) 10:28, 2 January 2019 (UTC)
 * ✅ — xaosflux  Talk 16:40, 2 January 2019 (UTC)

Also add:
Disallow: /wiki/Template:TemplateStyles sandbox
Disallow: /wiki/Template%3ATemplateStyles sandbox
for a similar reason. Nardog (talk) 05:30, 6 September 2020 (UTC)
 * Added that in the same section. Jo-Jo Eumerus (talk) 06:55, 10 September 2020 (UTC)
 * Thanks—and, my bad, it should have been underscores instead of spaces ( →  ). Apologies for the inconvenience. Nardog (talk) 11:09, 12 September 2020 (UTC)
 * Done, thus. Jo-Jo Eumerus (talk) 11:16, 12 September 2020 (UTC)
 * Thanks! Nardog (talk) 11:19, 12 September 2020 (UTC)
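
The pattern used in this section (one rule with ":" and one with "%3A", underscores instead of spaces) can be generated mechanically. A minimal sketch; the helper name disallow_lines is made up, and it only handles the two substitutions mentioned here, not full URL encoding.

```python
def disallow_lines(title: str) -> list[str]:
    """Build the plain and %3A-escaped Disallow rules for a page title."""
    path = title.replace(" ", "_")  # URLs use underscores, not spaces
    return [
        f"Disallow: /wiki/{path}",
        f"Disallow: /wiki/{path.replace(':', '%3A')}",
    ]


for line in disallow_lines("Template:TemplateStyles sandbox"):
    print(line)
```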

Protected edit request on 19 August 2020
Please add  and   per Village_pump_(proposals)/Archive_169. (I know the discussion is from a month ago, but I did not know if there was consensus for the move, but looking a second time, it appears that there is a rough consensus to deindex article talk pages.) Aasim 05:57, 19 August 2020 (UTC)
 * ❌ not doing this, for many reasons.  For something extremely broad like this: that RfC was never really "closed"; it was also not well-attended; it was not well-advertised; finally - entire namespace indexing control should be done with meta tags and the   parameters - which will require a phab request - which will also be requiring a well-attended, strongly supported discussion. —  xaosflux  Talk 14:15, 19 August 2020 (UTC)

Syntax validator in comments
Someone may want to remove the syntax validator URL from the comments, as it now redirects to a completely different site. Trivialist (talk) 16:06, 29 May 2021 (UTC)
 * ✅ — xaosflux  Talk 18:14, 29 May 2021 (UTC)

COIBot report
COIBot is creating reports related to spamming/link abuse which are currently NOINDEXed by the addition of a template. User:Asartea suggested to have them added here, therefore: can the following 4 pages and subpages of them (thousands of reports) be NOINDEXed through robots.txt please: WikiProject Spam/COIReports, WikiProject Spam/LinkReports, WikiProject Spam/UserReports, and WikiProject Spam/PageReports? Dirk Beetstra T C 12:48, 23 January 2022 (UTC)
 * Is this such a good idea? I thought that robots.txt, unlike NOINDEX, doesn't prevent the page from showing in Google search results. It only prevents the content from showing, but there will be still be a link. The owner of someinnocentsite.com is not going to want to be associated with "Spam". From : Google can't index the content of pages which are disallowed for crawling, but it may still index the URL and show it in search results without a snippet Suffusion of Yellow (talk) 20:10, 23 January 2022 (UTC)
 * Ah, I had missed that was the case. In that case it's probably best to indeed continue to use NOINDEX (although maybe via another template), although we do use Robots.txt for, say, XfD -- Asartea   Talk  |  Contribs  20:52, 23 January 2022 (UTC)
 * FWIW I can't find a single example of a Google search actually showing an AFD discussion. I could have sworn this came up before, though. Suffusion of Yellow (talk) 21:11, 23 January 2022 (UTC)
 * hmm, OK. I've had reports showing up in Google in the past, even though we already have the path WikiProject Spam in robots.txt. That was because some reports did not have NOINDEX (they, as SoY says, show up without a snippet). I've once or twice asked Google (after NOINDEXing the report) to remove the report from their results. Maybe it's better as is, but I am willing to consider another template (User:COIBot/noindex) in the future. Note that it may need a change in the code of the bot; I'm not sure if it is all regulated through m:User:COIBot/Settings and User:COIBot/Settings, but that is quick enough to figure out. Dirk Beetstra T  C 05:29, 24 January 2022 (UTC)
 * But if robots.txt forbids access, how can the GoogleBot (or any well-behaved bot) even see the  in the NOINDEXed page? It's not supposed to access it. Suffusion of Yellow (talk) 20:46, 24 January 2022 (UTC)
 * Yeah, that's a good question; shouldn't the existing Wikiproject:Spam line just auto-forbid the COIBot pages anyway? -- Asartea   Talk  |  Contribs  20:56, 24 January 2022 (UTC)
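
The interaction raised in this thread can be written out as a small decision function. This is only a sketch of the behaviour described in the quote above (a disallowed URL may still be listed without a snippet) and of the point that a well-behaved bot never fetches a disallowed page, so it cannot see a NOINDEX marker on it. The function name search_outcome and the outcome strings are made up for illustration.

```python
def search_outcome(blocked_by_robots: bool, page_has_noindex: bool) -> str:
    """Describe what a well-behaved search crawler may do with a page."""
    if blocked_by_robots:
        # The page is never fetched, so any NOINDEX on it goes unseen.
        return "URL may be listed without a snippet"
    if page_has_noindex:
        # The page is fetched, the marker is read, and the page is dropped.
        return "not indexed"
    return "indexed with content"


for blocked in (True, False):
    for noindex in (True, False):
        print(blocked, noindex, "->", search_outcome(blocked, noindex))
```

Under these assumptions, NOINDEX only takes effect on pages that robots.txt does not block.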