Wikipedia talk:Search engine indexing/NOINDEX of noticeboards

=Moved from WP:AN=

NOINDEX on various noticeboards and archives proposal
I have added NOINDEX to Template:Administrators' noticeboard navbox all. --Random832 (contribs) 03:36, 20 August 2008 (UTC)


 * I've removed it until we get consensus. At minimum if we are going to add anything like that we need to add a big fat pointer on the template that there is a search tool on the toolserver. And we need a hell of a lot of assurance that that tool won't go down as tools so often do. I suggest putting a discussion about this on AN rather than here which isn't as likely to be noticed. JoshuaZ (talk) 03:43, 20 August 2008 (UTC)


 * My intent was not to open a discussion. It's unacceptable for these to remain visible to google. And there IS a "big fat pointer on the template that there is a search tool on the toolserver", there are in fact four links to that tool; perhaps you noticed them when you made the edit? --Random832 (contribs) 16:43, 20 August 2008 (UTC)

(above was moved from WT:AN --Random832 (contribs) 16:45, 20 August 2008 (UTC))

I was thinking it may be a good idea to do this. There is a LOT of negative information on living people by name liberally over the years scattered across these--sometimes just plain bad, sometimes in good faith discussions, but all the same findable by search engine. Yes, I know that our internal search somewhat sucks still, but the benefit of our own searching isn't as valuable as not screwing people by their names being found in negative connotations on this site buried in some archive. If the search function is too busted for some, we have lots of very skilled people that can fix it if they wanted to spend the time on it. So, simple proposal. on every notice page plus archives/talk on the header today:

Administrators' • Incidents • ArbCom enforcement • Biographies • Conflict of interest • Ethnic and cultural conflicts • Fiction • Fringe theories • Neutral point of view • Original research • Reliable sources

At a dead minimum, the ones in "red" to start as they're most likely to touch on BLPs. rootology ( T ) 13:36, 20 August 2008 (UTC)
 * I'd agree to do this, but I think this proposal would be better at the Village Pump. At least make a note there, if you haven't already. - Rjd0060 (talk) 14:03, 20 August 2008 (UTC)
 * WT:AN pointed people here, but a reference at VP wouldn't be bad either.  MBisanz  talk 14:05, 20 August 2008 (UTC)


 * I think we need to get the gorram search function fixed before we start noindexing the whole place. – xeno  ( talk ) 14:09, 20 August 2008 (UTC)
 * Agree strongly with Xeno. I have no objections to no-indexing if a) we have a working search function b) can guarantee that tool will stay functioning and c) add a prominent note at the top of the relevant pages about how to search for people who aren't aware of it. c is the easy step. a and b need to happen first. JoshuaZ (talk) 15:32, 20 August 2008 (UTC)


 * NOINDEX them all, and add the Community Sanction Noticeboard to the list. I had added the NOINDEX tags to them last night, but have since been reverted because it "makes it harder to search the archives along and reduces general levels of transparency." This response indicates a need for improving the internal search function, and has nothing to do with transparency.
 * We are depending on Google and other external search engines to do work that needs to be internal. Can you imagine any other responsible organisation using a publicly available search function to document concerns about clientele (in our case, subjects of articles) or personnel (in our case, editors)?
 * On a daily basis, editors complain about "incivility" on any number of noticeboards and talk pages. People get blocked, sometimes even banned, for saying unkind things about other editors (or in some cases, about subjects of articles); we are told that the validity of their words does not excuse the lack of "civility"...and yet we as a community do not apply the same standards to the encyclopedia.
 * There is an attitude amongst many individuals that people who get banned or blocked "deserve" to be named and shamed publicly, and it is the blocked/banned individual's "fault" that Google searches turn up pages suggesting they behaved unacceptably on a top-10 website. The veracity or validity of the complaint is irrelevant to whether or not these posts are searchable outside of Wikipedia.
 * The real life identity of a very significant segment of our editing population is easily linked to their Wikipedia activities, either directly (real-life name as username) or indirectly (by making real-life name available on userpage, etc.) Few of these individuals made that information available expecting to be publicly castigated for failing to follow the rather arcane behavioural rules of Wikipedia. We keep these complaints about editors on pages that often rank highly in search engines despite the fact that many of them relate to editors who are easily identifiable in real life.
 * This information is available to current and future employers, colleagues, clients, police and other security forces, and so on. Is this the kind of thing we want to have following our teenage editors who go on to mature behaviour? Is this what should happen to academics who have spent years in the parry-and-thrust of more direct debate than is permitted by our "civility" policy? Do we want people to be branded "troublemakers" in the outside world because they just don't fit in here?
 * Discussions assessing the "verifiability" of negative information about the subjects of our articles are spread all over the place, and again are searchable outside of Wikipedia.
 * For whom are we trying to make things transparent? Our editors? The information is searchable within Wikipedia already; if people can't find it, improve the search function or help them learn how to use the current one. (I have never had to resort to Google to find information on Wikipedia, and I am hardly a genius when it comes to searching.) Why does the world at large need to know that User So-and-so was blocked for being rude to User Such-and-such, after a 20kb discussion on some noticeboard? It has nothing to do with the quality of the product - the encyclopedia.
 * Our current system highlights the negative editor information (messages on user and user talk pages, noticeboards, etc) over and above any positive editor information (contribution histories, key articles, etc.). It's time that we as a community model the behaviour we expect from our editors. With indexing of noticeboards, our behaviour management process includes promotion of pejorative information about individual editors; we know these pages are highly ranked but we allow them to be widely available, despite the fact that individual editors are frequently blocked/banned for identical behaviour.
 * Summary - Fix the problem - our internal search function - instead of publicly smearing the subjects of our articles and the editors who produce them. --Risker (talk) 15:58, 20 August 2008 (UTC)
 * Agree with Risker (and with several others further up); NOINDEX now, and work on fixing the internal search next. Personally, I am of the belief that the only pages in Wikipedia that should be indexed are article pages and category pages. Everything else is internal workings that does not need to be catalogued by Google/Yahoo/whatever search engine.  Horologium  (talk) 16:05, 20 August 2008 (UTC)
 * It will be incredibly damaging to internal functioning if we don't have a search function. I agree completely with the sentiment but it isn't acceptable unless we have a search function. I also strongly object to Risker's claims that anyone here thinks that blocked editors "deserve" to be "shamed". This is a straw-man argument which no one has ever claimed but is repeatedly brought up. JoshuaZ (talk) 16:10, 20 August 2008 (UTC)
 * We have a search function, JoshuaZ, it can definitely use improvement, but it does work and it does pull up everything I have ever looked for, including information on noticeboards. I was able to do a very indepth summary of evidence using information from noticeboards, for the Tango RFAR without once resorting to an external search engine. Having this information widely available is not necessary, even with today's search engine. Removing the ability to search externally will promote the improvement of our internal search function because it becomes a high priority. Risker (talk) 16:26, 20 August 2008 (UTC)
 * Addendum to reply to the other part of JoshuaZ's comment: Please look on this very page to the thread entitled "Greg Kohs aka MyWikiBiz" for some examples (and no, I have no opinion on whether or not he should be unblocked). There are others right here too, including the discussion of removal of a permission from a reformed but formerly blocked editor. This is the kind of stuff I am talking about. Should the discussions happen? Yes, I think so. Should anyone google searching for the name "Greg Kohs" get to this page or its archive (or the archives of those other pages listed in the thread)? No, they should not.  Risker (talk) 17:35, 20 August 2008 (UTC)
 * Also note that we have another search specifically for the noticeboards, which is linked in the navigation box. --Random832 (contribs) 16:41, 20 August 2008 (UTC)
 * Ah. I did not know about that. That seems to be functional for AN. That at least takes away the AN, ANI, 3RR and CN archives but not the other noticeboards. I'd also strongly prefer that that link was much larger. In any event, I have no objection to putting the Noindex into the Template for the noticeboards. But we need a better search function to use it on the noticeboards other than those 4. (I also think we should wait to get a bit more input in general before taking this large a step.) JoshuaZ (talk) 16:49, 20 August 2008 (UTC)
 * NOINDEX must come first to end the harm being done, and perhaps by preventing external search engines from indexing non-article spaces, Necessity will enter the scene trailed by her child Invention, and we will see in short order a leap in internal search functionality. What short-term difficulty some administrators may have with searching non-article space is far outweighed by the ethical obligation to reduce people's exposure to the distorting effects of search engine publicity. alanyst /talk/ 16:25, 20 August 2008 (UTC)
 * NOINDEX is far more important than improving the internal search function and should come first. Anything we really need to find can be found via search and what links here.  There is today no need to use external searches to find relevant internal data, it is merely a habit that many have acquired along the way.  If a specific location becomes challenging to search, have someone build or modify a toolserver tool.  GRBerry 17:06, 20 August 2008 (UTC)


 * Oppose. We shouldn't be broadly disabling useful functionality for the sake of a few identified people who might prefer Google had a little less to say about them.  I understand blocking AFDs and certain focused discussions, but blocking entire noticeboards goes too far for me.  Even if we have an internal search as good as Google (and let's be honest, we aren't there yet), I'd still want to maintain Google functionality for people who prefer that interface and the broader comparisons it allows.  Most of what is discussed at AN is not harmful to identifiable people, and of that portion which is, a significant fraction is no more harmful than they deserve (if someone is a consummate trouble maker all across the web, there is no reason for us to conceal that fact).  The discussions of identifiable people under circumstance that might well warrant redaction are sufficiently few and far between that I can't see how that justifies mangling the searchability of all the other noticeboard content.  This simply doesn't pass a balancing test of justification versus negative impact.  Dragons flight (talk) 17:29, 20 August 2008 (UTC)
 * "[A] significant fraction is no more harmful than they deserve ..." Thank you for proving my point, Dragons flight. We aren't here to punish people, even if they are complete jerks, or totally incompetent. We might block them or ban them because they cannot work within our system. Overtly and consciously publishing their misdeeds is the internet equivalent of The Scarlet Letter.  Risker (talk) 17:41, 20 August 2008 (UTC)
 * And your point? You seem to want to protect everyone, even complete trolls, from the justifiable and predictable consequences of their actions, but you have no concern for protecting the rest of the internet from them.  We don't go out of our way to publicly chastise people, but neither should we go out of our way to protect them.  Dragons flight (talk) 17:46, 20 August 2008 (UTC)


 * Comment So I'm not asking for links to BLP/copyvio/oversight material but where is the demonstrated harm that would cause us to want to eliminate a helpful means to search wikipedia? Protonk (talk) 17:31, 20 August 2008 (UTC)

I will be posting at length to this discussion tonight. At least, I would like to, if I can locate a particular location where the central discussion is located. Is it here or there or where? Newyorkbrad (talk) 17:54, 20 August 2008 (UTC)
 * This appears to be the central discussion. JoshuaZ (talk) 17:56, 20 August 2008 (UTC)
 * If you want a centralised discussion Brad, you'll have to make one. What to NOINDEX isn't just an admin issue.  Why not use What to noindex?  Wily D  17:58, 20 August 2008 (UTC)

Challenge

 * How about a challenge? Maybe I'm wrong about this issue, but before I might concede that, I'd like a demonstration that a real problem exists.  Give me four or five examples of people who if you put their real name (and only their name) into Google then you get a negative noticeboard discussion within the first 20 or 30 hits.  If having these archives is really profoundly destructive then such examples ought to be easy to find, right?  I've already tried several well known trolls and none had a noticeboard hit in their first 30 google hits.  Dragons flight (talk) 18:20, 20 August 2008 (UTC)
 * Dragons flight, hope you don't mind that I made this a separate heading. I thought you asked a fair question here, but that googling the names of known trolls wouldn't necessarily show the potential effects on an ordinary editor, so I googled a bunch of editors with rollbacker privileges who use their real life names; hypothetically, these would be average users with a pretty good record, and who had been active since January, or they wouldn't have had this permission. The selection was random and I didn't keep track of names, and I only used names where the user page gave some RL details so a potential RL searcher would be able to verify they were the same person. I only checked 20 names; I was starting to feel kind of creepy looking up fellow editors this way. Aside from discovering that way too many editors have myspace and LinkedIn and Facebook pages, I found there were quite a few mirrors of user pages out there. I looked only at the first 30 results. The key findings were as follows:
 * 20/20 included both en.wp user and user talk pages. A couple of them had questionable content, however one could argue that the content of these pages was within the control of the user.
 * 1/20 included another user's talk page, where the editor I was looking for was leaving a Level 4 warning for vandalism on a page relating to pornography.
 * 1/20 included the Arbcom RFC from this June/July. The user I was looking for had some heated comments on the page. As anyone who has read the page will know, there are plenty of heated and pointy comments made by and about other users, not to mention the Arbitration Committee as a body.
 * 1/20 included a user RFC about the editor in question.
 * Summary - this small random sample of users in good standing revealed a few instances that might reflect badly on the user but were within the user's control; one instance that could be perceived to reflect badly on the project as a whole; and one instance in which the content was likely to reflect negatively on the user. The porn warning sort of falls between "within one's control" and "reflects negatively", depending on who is doing the looking and how they interpret it.
 * I have to say I was genuinely shocked to see the user RFC so highly ranked and easily accessible. This is definitely not the kind of thing we want to make available to the general public, in my mind. Risker (talk) 04:02, 21 August 2008 (UTC)
 * Thank you Risker. I think this is a very helpful direction for the discussion.  With respect to user pages, I certainly agree that people should be responsible for their own user pages.  (Incidentally, that includes allowing them to add NOINDEX if they want to.)  On the User RFC issue, I think that is perhaps the best example of a place where applying NOINDEX across the entire class really would capture a large amount of contentious content, with almost certainly a lower false positive rate than even places like XfD.  So I would be happy to go along with filtering user RFCs.  I hope that others will build on this kind of analysis when identifying which areas of Wikipedia are actually problematic generally versus places where problematic content only occasionally appears.  Dragons flight (talk) 19:59, 22 August 2008 (UTC)

=New Discussion= New discussion may start here. Protonk (talk) 18:24, 20 August 2008 (UTC)

noindex notice boards now, Comment - If we don't we (collectively) will drag our feet about fixing our internal search and tool issues. If we noindex now, then folks that search those pages a lot (arbs, checkusers, otrs helpers) will push the folks who like to develop tools to work on it more. --Rocksanddirt (talk) 18:50, 20 August 2008 (UTC)
 * I don't agree with the idea of disabling functionality in order to force a group of people to work on something. If this is based on consensus then those people should be able to just revert the tag placement and not build new search tools.  We can't force a change in the job queue or demand a dev change by this means. Protonk (talk) 19:10, 20 August 2008 (UTC)

A modest suggestion
I would like for everyone who needs to find something on the noticeboard archives in the near future to try the toolserver search engine first, then the internal search, and if they're unable to find what they're looking for in either, point out what deficiencies they noted in it, so that it can be improved. Like any software project, it needs direction - I myself just submitted a feature request for a boolean OR operator for search terms. Similarly, anyone searching for anything else should try the internal search first. --Random832 (contribs) 22:06, 20 August 2008 (UTC)
 * I do, usually. It's terrible, really. -- Relata refero (disp.) 20:53, 21 August 2008 (UTC)

The mailing list post that helped start this, and some current thoughts
The subject of "No-Indexing" various pages on Wikipedia has been raised in a variety of forums over a period of several years. "No-Indexing", in this context, means designating Wikipedia pages with a code such that they are freely readable, editable, and searchable within Wikipedia, but do not appear in search results on Google and other search engines. The forums in which various "No-Indexing" proposals have been made have included on-wiki, Bugzilla, Wiki-related mailing lists, and external sites critical of Wikipedia (principally "Wikipedia Review"). A variety of justifications have been given for these proposals, and a variety of suggestions made as to their scope.  There have also been extensive discussions of the best technical means by which to implement the change, a matter outside my fields of expertise.

In April, I started a thread on the WikiEn-l mailing list to discuss this issue. Extensive discussion took place, but no conclusion was reached (in part because I was, for several months, forced to stop participating in Wikipedia and no one else was forcefully pressing the issue). In July, I reopened the discussion on the mailing list with the following post:


 * ''A couple of months ago, I raised on this list the issue of "no-indexing" Wikipedia pages outside the mainspace, principally including project-space pages such as XfDs, AN/ANI, RfA's, RfAr's, and the like, but possibly including userspace as well. By no-indexing, I refer to coding these pages such that they will not be picked up by Google or other search engines.


 * ''The desirability of this change has been noted by many people, including very experienced Wikipedians. As we all know, the popularity of Wikipedia and the intensive number of internal links means that when a Wikipedia page contains the name of a living individual, then unless the person is either extremely notable or happens to have a common name, that page will almost inevitably become a high-ranking, if not the highest ranking, search engine result for that individual.  This raises issues enough when the search result is a BLP or other mainspace article, but it is totally unacceptable when the high-ranking result destined to follow the individual around forever is something like:


 * ''An AfD deciding to delete an article about a person because of her perceived lack of any sufficiently notable or meaningful accomplishments in life (these can be courtesy-blanked on request, but how many subjects know how even to ask); or
 * ''An RfA, involving a contributor who happens to edit under his real name, which fails because the user was deemed unqualified for adminship; or
 * ''An arbitration case, in which an editor was severely criticized or even banned for violations of Wikipedia policy - regrettable, but not something for which it would serve any purpose to tar the person's RL reputation forever; or
 * ''A long and heated discussion in an ancient ANI thread, again involving a contributor who edits using her name, involving some ancient wiki-grievance long forgotten ... until the contributor applies for a scholarship or a job and someone Googles her name; or
 * ''An ArbCom election in which the user came in 17th place; or
 * ''An SSP report in which a user editing under a new name is indelibly linked to a username based on his real name, which he chose to abandon months or years earlier because of precisely these very concerns; or
 * ''A discussion on ANI noticeboard of defamatory or privacy-invading material in a BLP or other article, which it is rightfully decided to delete from the article itself ... except it remains preserved in the noticeboard discussion (I do see that this aspect of the problem has been addressed on the BLP noticeboard archives, but this type of discussion occurs on ANI and elsewhere as well); or
 * ''Various other places where these issues, involving both article subjects and Wikipedia contributors, continue to arise on a frequent basis.


 * ''It has been observed that being named on Wikipedia, whether for legitimate reasons or otherwise, has a powerful potential to damage a person's life. (See for example the BLP policy and its talkpage, the ArbCom decisions in RfAr/Badlydrawnjeff and RfAr/Footnoted quotes, or discussion on various criticism sites.)  As noted, this raises a troublesome enough suite of issues when the person in question has been accurately discussed in the encyclopedia itself.  It is really not acceptable when it occurs as a happenstance of an ancillary discussion of an article subject or of a contributor (even a misbehaving or a now-unwelcome contributor).


 * ''I have read more than enough complaints from people who have found themselves in many of the unfortunate situations I describe here. If they are Wikipedians, they sometimes come to rue the day they ever thought of contributing, much less contributing under a name linked to their real identity.  If they are article subjects with no particular connection to Wikipedia, they must surely find the situation maddening.  By comparison, the benefits to the general public of being able to read through internal Wikipedia discussions of this nature as the result of a casual Google search must be reckoned, at the best, as slight.


 * ''In the prior thread, I believe there was significant support for implementing coding necessary to cause "no-indexing" of projectspace and possibly userspace and other-space pages. The main counter-arguments were:


 * ''That some project-space pages DO warrant indexing. An example that was given was the notability policy or the BLP policy.  The solution to this is to have a "yes-index" feature that would override the no-index code on a particular project-space page where indexing was agreed to be affirmatively desirable.  Community discussion could come up with a list of those particular pages in a week or so.
 * ''That Wikipedia currently lacks a top-quality internal search capability, and therefore we need to be able to use external search engines such as Google to perform administrator functions and the like. There is some merit to this observation; I certainly have used Google to hunt down references I remembered when I was writing arbitration decisions, for example.  But internal administrative convenience is not a good argument to disregard real harm that we are inadvertently causing to specific individuals.  The developers can and probably should be tasked, as a high priority, with improving the search capabilities; but it has been too long since the problems I have described in this e-mail were identified, and it is time they were solved.
 * ''The most cynical response has been that Wikipedia thrives on Google-rank created by internal links and is not going to do anything that would lessen its page-ranks, whether out of pride or for some conjectured eventual mercenary reason. Actually, this was not a counter-argument presented on Wikien; it's a cynical speculation about motivations that was presented on a criticism site.  I give it no credence, but it would be easy enough to disprove once and for all.


 * Wikipedia and its community are often criticized for irresponsibly neglecting the negative effects of the project on some of its subjects and some of its contributors. We have here an opportunity to take an incremental but meaningful step toward addressing a group of related, significant concerns.  I would like to urge that the on-again, off-again discussion of this proposal proceed to a conclusion either here or on-wiki and that some definitive action be taken in the near future.

After further discussion, most (though not all) of the participants in the mailing-list thread agreed that some form of No-Indexing needed to be implemented. Interestingly enough, and to my surprise and to an extent my embarrassment, it transpired that No-Indexing of all RfA, RfAr, and XfD pages had actually been implemented several months earlier! (For example, the RfA archives have been No-Indexed since September 2007.) To the best of my knowledge, no one particularly noticed, and certainly no one complained about, this nearly year-old change, a fact that in and of itself suggests that concerns about the impact of No-Indexing on project administration are overblown.

I had always anticipated that any conclusions reached in the WikiEn-l thread would be taken on-wiki for discussion and implementation, although because I still was not back to participating on Wikipedia I did not commence such a discussion myself. It appears that coding has now been done to allow No-Indexing to take place. I note that there have been discussions on-wiki and on an external site as to whether the technical methods being suggested are optimal. I have no views on this particular sub-issue.

There remains no value to Wikipedia to having our internal discussions visible as high-ranking Google searches for real people, to the obvious detriment of our contributors, former contributors, and most important our article subjects or former article subjects. While acknowledging that there may be some counter-arguments focused primarily on internal project administration (for example, I know I have used Google searches to find evidence and precedents in drafting arbitration decisions), they are heavily outweighed by our need to treat our editors, former editors, and subjects in a fashion that does not unnecessarily spill with effects, potentially profoundly negative effects, into their off-wiki lives. Wikipedia, as I have written before, has grown beyond the initial influence anyone could have calculated it could have, and we must, at all times, remain conscious of and prepared to mitigate any negative external effects it may have on the real-world lives of our contributors and our subjects. (It is occasionally suggested that NPOV or any other consideration militate against caring or doing anything about such things. It remains my view that it would be intolerable to operate the world's most powerful interactive and participatory website in that fashion.)

In addition to RfA, RfAr, and XfD, which as noted were already No-Indexed several months ago without incident, the following project-space pages and archives should be put into the No-Index designation immediately:
 * DRV (which was not No-Indexed when XfD was, apparently for technical reasons because the pages are configured differently; there have been several reports of OTRS complaints involving DRV results);
 * SSP and RfCU;
 * AN, AN/I, and AN/3;
 * User-conduct RfC's;
 * WQA and the former PAIN and CSN archives; and
 * Any other project-space pages of a similar ilk.

These to me are the easiest calls. This change should have been made long ago. Beyond that, there is a case that the easiest thing to do would be to exclude project-space altogether. A compromise would be to have project-space primarily No-Indexed but to allow a page (e.g., a policy) to be opted into indexing on an individual basis.

A second question is whether userspace should be No-Indexed. I can see arguments on both sides of this question, although in general, userspace is likely to have little content worth searching. A compromise could be to allow a given user to opt in or out of allowing searching for his or her userpage and talkpage, although one could debate whether the pages should be presumptively "out" of indexing with the ability to opt in, or presumptively "in" with the ability to opt out. Because 95%+ of users will ignore the issue and be governed by whatever the default policy is, this is an important choice.

A third question is whether article talkpages should be No-Indexed. This has not yet received detailed discussion to my knowledge.

Finally, a fourth question is whether and how the No-Index capability should be extended to mainspace. Unlike the ancillary types of pages found in project (Wikipedia) space and userspace, the whole point of the encyclopedia articles themselves is to be widely available as a public resource. Easy No-Indexing of any given mainspace page on a whim would not be helpful toward that purpose. On the other hand, there is a significant discussion to be had on how No-Index can combine with other improvements to address the significant open issues that remain surrounding BLP's and articles of similar import, in conjunction with flagged revisions (what is the status of that these days?) and the like. The fact that anyone can create an article about anyone else, and it will almost instantly become visible online (sometimes within literally a few minutes of creation) no matter how objectionable its contents might be, remains a serious concern and one that we must address. (One proposal that has been made, which makes some sense, is that any new article that is primarily a BLP would be automatically No-Indexed until it is reviewed and a revision is flagged.) The No-Indexing of project-space and potentially userspace pages need not await what I anticipate will be a more complex discussion, and support for No-Indexing WP:SSP, for example, does not constitute support for No-Indexing some mainspace articles. They are separate issues, and should be treated separately, and perhaps successively. But this, too, is a discussion we ought to have.

I apologize, as is my wont, for the length of this post; but these issues are extremely important ones. I also apologize if I have missed any related discussions that might have taken place in other areas of the site. Newyorkbrad (talk) 23:05, 20 August 2008 (UTC)


 * Let me stand up and say I have been aware of and personally frustrated by both the RfAr and XfD blocks when searching for results I remembered but could not easily locate using the internal search. I can't recall ever being bothered by the RfA block, but that's not to say that I can't imagine hypothetical situations when it might be troublesome as well.  I can understand the argument that RfAr and XfD have the potential to be sufficiently embarrassing that a blanket prohibition is warranted (though I'm not sure I agree with the argument), but I certainly reject the view that these actions were taken without any negative impact on the community.


 * Having project-space searchable by Google is a good thing for editors. Personally, I think it is hard to argue with that position.  Editors often have the desire to find old remembered conversations, and as far as I know no one is opposed to the internal search engine indexing these pages.  At the same time, I agree with you that having project-space searchable by everyone else is potentially negative and embarrassing for some people.


 * For me then it is fundamentally a question of balancing. How do we maximize the good of having project-space easily accessible to editors, while also minimizing the negative impact that may be created for some people on the outside?  I support NOINDEX-ing, blanking, and in some cases even deleting expressly negative content that is associated with personally identifiable individuals.  However, I do not support applying NOINDEX so broadly that it takes out everything else in the process.


 * Applying NOINDEX (or stronger measures) on a case by case basis is fine with me. But when we discuss applying NOINDEX to AN or all of project-space, we have to consider whether we are trying to swat a fly with an elephant gun.  What fraction of the entries on a typical day's AN or AN/I involve individuals identified by their real name?  By my estimate it is quite a small fraction.  Other parts of project-space (e.g. policies, guidelines, style guides, and their associated talk pages) are even less likely to have negative discussions of identifiable individuals.  Yes, there are more problematic places, SSP or XfD perhaps, where an entire page can be dedicated to the evils of one individual, but most of project-space is not constructed that way, and in practice most of the people we have disputes about are not identifiable in the real world.


 * The other concern is that I have a long memory. At times, I've searched for and cited AN/I actions that were more than 3 years old.  Though doing those searches is getting easier internally, they still seem to be faster and substantially more accurate through Google than internally.  I'd rather not lose the ability to do that without a showing of just cause.


 * In the previous section, I posed a challenge: Give several examples of people who when you search on just their real name in Google you get negative AN or AN/I results in the first 20 or 30 entries? I've tried and haven't been able to do it for anyone.  I can sometimes see other pages, such as an SSP or XfD, but I've yet to find an example where Google gives prominence to AN.  I'm sure some examples exist, but as far as I can see those examples aren't common enough to justify applying NOINDEX to all of AN or all of project-space.


 * I realize that a balancing test is intrinsically subjective, and different individuals may come to different conclusions about what the balance between fast searching and protection of others should be. But for me the downside of NOINDEX is sufficiently large that I don't want to apply it to whole classes of pages without good evidence that a substantial fraction of the content being addressed is precisely the kind of negative, personal commentary that we all agree should be targeted.  Dragons flight (talk) 00:43, 21 August 2008 (UTC)

I said it on the mailing list, and I'll say it again: we have courtesy blanking, which is a surgical tool where this is a blunt instrument. As WP gets larger, archives longer and noticeboards proliferate, I need google to know when a particular account has been discussed at AN/I or when a particular source has been OKd by consensus, or whether similar issues have been raised to an ongoing AfD. Large-scale noindexing gets in the way of basic editing. If we are concerned that a particular section is devoted to the sins of a clearly-identifiable living person, let us become more aggressive about courtesy blanking. I am pretty aggressive already. -- Relata refero (disp.) 20:59, 21 August 2008 (UTC)

Noindexing user talk space and other thoughts
Back in April, I proposed that user talk pages should be blocked from search engines because of templates like Template:Nn-warn, which show the name of an article that has been marked for speedy deletion. (The discussion referred to in the village pump page is now archived here). For example, there is no reason to have User talk:Manallackt be indexed by search engines, particularly as it's the top result on a Google search for Thomas Manallack. There are hundreds or maybe thousands of cases just like that one.

As for noindexing the Wikipedia namespace, I think that pages where negative content about people or companies is regularly posted and archived, like the admin noticeboards and deletion review, should be noindexed. However, keep in mind that admin noticeboard archives are unlikely to come up in searches because they're so big, and they cover so many topics. I've found a Google search for a person's real name which coughs up an admin noticeboard subpage on the second result, but that's only because it's in the title of the subpage.

However there are several pages in the Wikipedia namespace that IMO should be indexable by search engines. These include the FAQs, village pump, Wikipedia news pages (Wikipedia Signpost, Goings-on, and Announcements), and all of Wikipedia's policies, guidelines, and essays. Pages like the help desk and reference desks should be indexed by search engines because they contain useful information for new editors or people curious about a topic. For example, I wanted this question I posted to the Reference desk to be highly ranked on search engines, and it's currently #23 on a Google search for desmense, which is not bad at all.

Just my two cents. Graham 87 09:17, 21 August 2008 (UTC)


 * The user talk namespace is no longer indexed. Graham 87 05:37, 13 September 2008 (UTC)

Challenge: finding old discussions
OK. I clicked on a random AN or ANI archive, and I found a report that led to a 1-year block of a reincarnation of an Arbcom-banned user. The block log is here. There are many hundreds of block log entries like this that don't give sufficient details to track down the discussion that led to the block. If you ever have to do such a search (possibly on one of your own blocks where you cannot remember the details), would the internal search engine work as well as Google? A prize to the first person to find the discussion that led to that block. Preferably, start with using the internal search engine, and do say if you had to use Google to find the discussion. Actually, I can answer that now: both this (Google) and this (internal search) get the desired results. By the way, I stumbled across Administrators' noticeboard/Archives (is this still being maintained?) - here is another challenge - try and find the AN discussion that led to that page being set up - there is an easy way to do this, but I want to see if anyone else can work it out. Carcharoth (talk) 21:42, 21 August 2008 (UTC)


 * Actually I think a less specific "Arius Heresiarch" is sufficient? I agree with others that our internal search sucks and that the loss of utility would be not insignificant. The important point, though, is that our convenience is less important than doing the Right Thing by very many people who may have been naive, stupid, misguided or even malicious, but are suffering the Curse of Google at Wikipedia's hands.  I think I am not soft on abusers of the project, but I would give them anonymity like a shot especially if it would help them bury the hatchet and leave us in peace. Guy (Help!) 22:46, 21 August 2008 (UTC)
 * Guy from the aspect of "abusive" users pretty much sums up one of the key reasons this is a good idea. rootology  ( T ) 01:38, 22 August 2008 (UTC)
 * May I once again respectfully murmur the words "courtesy blanking"? -- Relata refero (disp.) 05:44, 22 August 2008 (UTC)
 * Courtesy blanking is really only useful for an entire page, where one only has to go back one edit to see content. It does, however, make it impossible to search out the content that is now no longer on the page; neither internal nor external search engines will locate past revisions. One can still search for a known title; however, often what is being searched for is a phrase within the (hypothetically) blanked page/section. It's also essentially unworkable for sections of pages. Risker (talk) 06:31, 22 August 2008 (UTC)
 * Courtesy blanking is fine in its place but hiding the debates entirely can make some discussions very difficult. Of course our convenience is less important than our impact on real people. 82.132.136.214 (talk) 07:07, 22 August 2008 (UTC)
 * I'm not certain at all why sections are unblankable. I do it with sections of talkpages, with a link to the diff of the deletion. -- Relata refero (disp.) 06:55, 23 August 2008 (UTC)
 * Let's take WP:ANI as an example. Because of the thousands of revisions to ANI over time, it is not possible to delete an edit; this restriction is written into the software (confirmed with one of the developers). It is possible to oversight edits, but that is not the objective. Sections are open for 24 hours after last posting; they're current threads, and some that contain negative personal information or discuss BLP subjects have been debated for days. That is certainly long enough for web crawlers to pick them up and make them available on Google and other search engines. Most of the other noticeboards have the same issue; their long history makes them undeletable by anyone but a developer. The deletion restrictions don't apply to the archives, but blanking of sections makes that section unsearchable and can result in improperly formed archives. Risker (talk) 07:41, 23 August 2008 (UTC)
 * Given the rarity of the problem, I am happy with making only those sections unsearchable, as opposed to effectively making all sections unsearchable.
 * About the fact that BLP subjects are discussed for days, I agree that's a problem, and I prefer that we be more restrictive in our on-wiki posting. That would be a better solution. I don't see why deletion is relevant, since we only need to blank to not be picked up by crawlers. -- Relata refero (disp.) 09:34, 23 August 2008 (UTC)
 * No. Being more restrictive about on-wiki posting is an awful idea. We have enough trouble actually being able to discuss stuff to the extent we need to as is. JoshuaZ (talk) 15:53, 26 August 2008 (UTC)
 * For the record, the noticeboard archive discussion is here... which I found by looking at the person who made the first edit to the Archive page and then checking their other contributions for that month. We have a lot of tools other than just 'search' after all. --CBD 11:53, 22 August 2008 (UTC)
 * Good method! :-) I was thinking more of looking at "what links here" for Wikipedia:Administrators' noticeboard/Archives, reducing that list to redirects only, then looking at "what links here" for that single redirect. The AN archive on that list contains the discussion: see here. Of course, finding redirects was difficult before that change to the "what links here" display functionality. A better way to find redirects due to a page move is to go to the history and look at the early history: first edit (as CBD did) and any page moves. In conjunction with improving the internal search engine, improved software tools to follow such links and filter them, would help. See below for one thing I did that shows the utility of indexing AfDs. Carcharoth (talk) 19:44, 22 August 2008 (UTC)

Improving internal search capabilities
I adhere to my view that our taking the progressive steps discussed above should not await improvements intended for the internal convenience of Wikipedia administration. Nonetheless, it's become clear that many people believe our internal search capabilities require improvement. Without getting to an excessive level of coding detail, what would need to be done to effect the needed improvements? Newyorkbrad (talk) 01:46, 22 August 2008 (UTC)
 * A workable full text index search that gives as much utility as Google. Actually the internal search is better than it was in the past. But I stand by my statement above that our convenience is less important than our impact on living individuals.  I would go further and implement NOINDEX in mainspace for biographies under deletion debate or where there is dispute over content.  OTRS is full of tickets from people who have been negatively impacted by what Wikipedia says about them, and by Wikipedia's high position in Google rankings. 82.132.136.214 (talk) 07:05, 22 August 2008 (UTC)
 * Along those lines, I'd like to add NOINDEX to to take all article deletion discussions out of external search engines. I can't see that being much of an 'internal search' issue... how often is someone trying to find a deletion discussion about an article they don't know the name of? As to improving our internal search to be equivalent to Google's capabilities, not a small job. That said, the namespace selectivity of our internal search already provides some advantage over Google (yes, you can type the namespace into Google as a keyword, but it produces a lot of false positives)... if that could be further enhanced to a prefix level search (i.e. search everything which starts with 'Wikipedia:Articles_for_deletion/') I'd think it would answer most of the concerns. Alternatively, something like this tool could be expanded to cover the prefixes of all the common processes rather than just the noticeboards it searches currently. --CBD 11:07, 22 August 2008 (UTC)
 * Yes, a feature to search for, say, pages within Special:Prefixindex/Wikipedia:Articles for deletion would be a neat idea. But there's no need to put NOINDEX on AfD discussions as they're already blocked as discussed above and at bug 4776. Wikipedia's Robots.txt file gives details about each entry that is blocked from search engines. Graham 87 13:04, 22 August 2008 (UTC)
 * About a year ago, I created User:Carcharoth/List of AfDs compact index. It's not a search of AfD, and it only indexes letters, numbers and special characters, but it does make some attempt to break things up into reasonable chunks. A proper index would list everything alphabetically in blocks of a set size, allowing people to jump in at any point. The current set-up allows you to jump in at any point in the alphabetical list, but doesn't allow you to get a feel for how far you are from another point. The AfD index I did using Special:PrefixIndex doesn't really solve anything, but maybe it will suggest something to someone else? Carcharoth (talk) 19:52, 22 August 2008 (UTC)
 * Also related to searching AfDs are topics like all the "in popular culture" AfDs. See WikiProject Deletion sorting/Popular culture for an example of an incomplete list. Obviously not all AfDs have "in popular culture" in the titles (and umbrella nominations may not even mention the article in question in the title). Still, what would be really useful is some way to find all AfD titles with the phrase "in popular culture" in them. I remember once talking about this on a talk page somewhere. I am currently searching for (and failing to find) that old discussion and trying not to use Google to find it! (Update: I did try Google, but it seems NOINDEX has already taken effect and means I can't find the old discussion I was thinking of). I think it was based on a search like this: "articles for deletion in popular culture" limited to 'Wikipedia' namespace, which gets 5616 hits. Doing "Wikipedia:Articles for deletion in popular culture" also gets 5616 hits. In other words, the search finds hundreds of valid results, but many results that are not relevant as well. To be fair, it would be trivial to list all AfDs with "in popular culture" in their titles using other methods (eg. WP:BOTREQ). Doing such a search for WikiProjects can be very useful, though. I once did such a search to try and complete the list at Wikipedia talk:WikiProject Middle-earth/AfD. The method used can be seen at User:Carcharoth/Finding AfDs. This only works if an article got redirected or not deleted (ie. if the page title still exists). The method is to take a large list of all the articles and redirects in a particular topic area, and to then stick "Wikipedia:Articles for deletion/" in front of the titles. Anything with blue links points to a deletion debate. Still doesn't guarantee finding all the relevant debates, but various searches should fill in most of the gaps. Finding PRODs and speedies and deleted articles is more challenging. That probably requires a complete trawl through Special:Log/delete... (an absolutely massive undertaking). Carcharoth (talk) 01:10, 23 August 2008 (UTC)
 * See also the following threads (or sometimes just posts by me): the three sections starting from here. Carcharoth (talk) 01:17, 23 August 2008 (UTC)
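
For anyone who wants to try the title-based approaches discussed in this section offline, here is a rough sketch in Python. It only manipulates lists of page names: it does not query the wiki, and the function names and sample titles are made up for illustration. It shows the "stick the AfD prefix in front of each title" method, a prefix-level filter, and a phrase-in-title filter; checking which of the candidate names actually exist (the "blue links") would still have to be done on-wiki.

```python
AFD_PREFIX = "Wikipedia:Articles for deletion/"


def afd_candidates(titles, prefix=AFD_PREFIX):
    """Build the deletion-debate page name that would correspond to each title."""
    return [prefix + title.strip() for title in titles if title.strip()]


def titles_with_prefix(pages, prefix):
    """Prefix-level filter: keep only page names that start with the prefix."""
    return [page for page in pages if page.startswith(prefix)]


def titles_containing(pages, phrase):
    """Keep page names whose title contains the phrase, ignoring case."""
    needle = phrase.lower()
    return [page for page in pages if needle in page.lower()]


# Hypothetical sample data, not real page names.
sample_pages = [
    "Wikipedia:Articles for deletion/Example topic in popular culture",
    "Wikipedia:Articles for deletion/Another example",
    "Wikipedia:Village pump (example)",
]

afds = titles_with_prefix(sample_pages, AFD_PREFIX)
pop_culture = titles_containing(afds, "in popular culture")
```

Filtering on the title alone avoids the irrelevant hits that a full-text search for the same phrase returns, at the cost of missing umbrella nominations whose titles do not mention the article.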

Collapse boxes
I noticed recently that collapsed boxes (like those seen at DRV) make it harder to find things when searching on a page, or indeed when trying to link to sections within a collapsed box section. This led me to wonder whether collapse boxes also mess up search engines, both external (eg. Google) and internal ones? Does anyone know? For the record, collapse boxes also mess up searches and links in page histories. See User talk:FayssalF for the classic example (try linking to a discussion on his talk page or an old discussion in his talk page history - maybe some script genius can force the link to automatically "open" the collapsed boxes, but I can't do it). The same thing appears to have been done at DRV, possibly for the same reason (though there you can still link to the discussion just before they get collapsed). Carcharoth (talk) 17:26, 25 August 2008 (UTC)
 * Ah. Looks like collapse boxes don't do anything magic. See here. Carcharoth (talk) 17:29, 25 August 2008 (UTC)
 * Yep, collapsible boxes only hide it on the page, not in the html code, so a search engine will see right through it.  MBisanz  talk 16:57, 4 September 2008 (UTC)
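
A small illustration of the point above, using only the Python standard library. The HTML snippet is invented for the example and is not the markup MediaWiki actually emits for collapse boxes; the assumption is simply that the box is hidden with styling rather than removed from the page source. A parser that reads the markup, as a crawler would, still sees the hidden text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node in the markup, regardless of CSS styling."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def text_in_markup(html):
    """Return all text present in the HTML source, joined by spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical markup: the second div is hidden from readers by styling only.
SAMPLE = (
    '<div class="collapsible">'
    '<div class="header">Closed discussion</div>'
    '<div class="content" style="display:none">Text inside the collapsed box</div>'
    "</div>"
)

extracted = text_in_markup(SAMPLE)
```

This says nothing about how any particular search engine weights hidden text; it only shows that the text is still there to be read.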

Search engine indexing updates
See Wikipedia:Administrators' noticeboard#Search engine indexing updates. Graham 87 06:30, 13 September 2008 (UTC)

MediaWiki:Robots.txt
Note that we now have MediaWiki:Robots.txt, so now admins can customise the robots.txt without needing to file a bug on bugzilla and have the devs change it. --bainer (talk) 06:43, 13 September 2008 (UTC)
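
For anyone editing that page who wants to check what a given set of rules would block before saving, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent and the `example.org` URLs are illustrative assumptions, not a copy of Wikipedia's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; not the real contents of MediaWiki:Robots.txt.
RULES = """\
User-agent: *
Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/User_talk:
""".splitlines()


def is_crawlable(url, rules=RULES, agent="ExampleBot"):
    """Return True if the given rules allow the agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)


afd_allowed = is_crawlable(
    "https://example.org/wiki/Wikipedia:Articles_for_deletion/Example"
)
help_allowed = is_crawlable("https://example.org/wiki/Help:Contents")
```

Note that robots.txt rules match on URL path prefixes, so a rule like the second one covers every page whose path begins with `/wiki/User_talk:`. Well-behaved crawlers honour these rules, but nothing enforces them.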