Wikipedia:Send in the clones

There are now a large number of clones of Wikipedia's content on the World Wide Web, with various degrees of licence compliance. This is fine if they are in compliance with the GFDL; indeed, it was one of the original goals of the project. Some of them are high-quality mirrors, whilst others are poor-quality bags of spamdexing search-engine fodder.

What seems clear is that many of these clones are using search engine optimization techniques to achieve higher rankings on search engines than the original Wikipedia pages.

The question is: should we start to try to compete with these sites in terms of search engine rankings?

Option 1: Yes, we must, or we will lose traction
Wikipedia needs to keep its traffic up to maintain its momentum, to continue growing (both in breadth and depth) its userbase, editorbase, and content. Most mirrors are significantly out of date. The majority fail to properly comply with the GFDL. Some wrap Wikipedia's content with questionable ads or associate it with content that doesn't match our high NPOV standards. Many selectively include articles that support their agenda and omit those that don't. More traffic brings us more editors - we're still very short of editors with knowledge outside the western world, of medicine, fine arts, the "soft" sciences and in many other areas. Many current editors came to Wikipedia from internet searches - if Wikipedia continues to slide down the search result rankings, our growth may slow. Our userbase and editorship may even decline. It won't matter how good an encyclopedia we've written when all the casual internet user ever finds is some old, incomplete mirror. The longer we wait to fix things, the more deeply entrenched the mirrors become.


 * more to add

Option 2: No, we need not compete with our own clones
Our purpose is to write the best encyclopedia we can, for distribution by ourselves and others, electronically and otherwise. Our purpose isn't to get good Google rankings, high Alexa ratings, or lots of traffic. Compliant mirrors help us in our goal to educate and inform; non-compliant mirrors should be encouraged, pressured, and cajoled into becoming compliant.

If we do directly compete, we shouldn't use their questionable tactics (e.g. putting all the subheadings in the page title as http://encyclopedia.thefreedictionary.com does). We should just do "good" optimisation; if search engines don't report our pages, we should take that as a cue to improve the navigability of our site.


 * more to be written

Option 3: Yes we should, but only when we can bear the traffic
We agree that most internet users are best served by getting Wikipedia content "from the horse's mouth", and Wikipedia certainly needs more visitors and editors. In practice, however, we're constantly limited by our infrastructure. We should therefore try to improve the accessibility and search-engine friendliness of our site without "heroic measures". Instead, improve it in a moderate, controlled way, so that our infrastructure growth can keep pace with the demands placed upon it by our success.

Other options?

 * 1) We could ensure that we use all the search engine optimization (SEO) techniques that are ethically acceptable and improve our user experience, e.g. proper use of META tags
 * 2) We could try to negotiate something with Google that would put Wikipedia ahead of its clones in search results

How do we compete?
Clearly we should only use legitimate tactics to try to increase our ratings with search engines. This generally means making sure that our information architecture is clean and our design is simple, well laid out, and easy to navigate. All of these are good goals we should be working on in any case.

Keywords
Currently, the meta tags added to Wikipedia articles list ten arbitrarily chosen internal links from their article. For example, this article currently includes:


 * 

The meta tags for Wikipedia include:


 * 

The keywords are usually alphabetized, but, as with this article, there are some exceptions. The keywords generally exclude most of the alphabet. No effort has been made to ensure links containing commas are treated as multiple keywords (as they are recognized by browsers); Rex, North Carolina has been inserted as one keyword, but will be interpreted as the two keywords Rex and North Carolina (this is probably beneficial, however). This leads to poor optimization if the components of a list are also linked separately:


 * <meta name="KEYWORDS" content="Dublin Core,Google,GFDL,Meta tag,North Carolina,Wiki,Yahoo,Toad,Spamdexing,Rex, North Carolina" />
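A minimal sketch of the problem just described, assuming a consumer that naively splits the KEYWORDS value on commas (as browsers and indexers are said to):

```python
# Sketch: how a comma-splitting consumer interprets the tag above.
# The multi-word link title "Rex, North Carolina" dissolves into
# two separate keywords.
content = ("Dublin Core,Google,GFDL,Meta tag,North Carolina,"
           "Wiki,Yahoo,Toad,Spamdexing,Rex, North Carolina")

keywords = [k.strip() for k in content.split(",")]
print(keywords[-2:])  # ['Rex', 'North Carolina']
```

Note that the intact keyword "Rex, North Carolina" never reaches the index at all.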

Edits which neither add nor remove links often change an article's keywords, with unpredictable results.

Possible improvements

 * including names of pages which link to the current page
 * including names of categories which the page belongs to
 * ordering the keywords according to their articles' size or popularity as measured by Wikipedia's usage statistics
 * ordering the keywords according to how often users follow each link from the article
 * restricting the keywords to the article's namespace (i.e. User:Ahoerstemeier should not be a keyword for the Wikipedia: namespace)
 * alphabetizing the keywords after extracting them from the article, so the entire alphabet may be represented, including relevant links which occur in the latter part of the alphabet (i.e., wiki should be a keyword for Wikipedia)
 * This is T2570 on Wikimedia Phabricator.
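For illustration only (this is not MediaWiki's actual code), two of the suggestions above, restricting keywords to the article's namespace and alphabetizing before truncating, could look like this sketch; the function name and the ten-keyword limit are assumptions:

```python
# Sketch of two proposed improvements: drop links outside the main
# namespace, then alphabetize before truncating, so relevant links
# late in the alphabet still get represented.
def select_keywords(links, limit=10):
    # keep only main-namespace links (no "User:", "Wikipedia:", etc.)
    main_ns = [l for l in links if ":" not in l]
    return sorted(main_ns, key=str.lower)[:limit]

links = ["Zebra", "User:Ahoerstemeier", "Wiki", "apple", "Toad"]
print(select_keywords(links))  # ['apple', 'Toad', 'Wiki', 'Zebra']
```

With this ordering, a late-alphabet link such as "Wiki" survives truncation instead of being cut along with the rest of the tail.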

Description
The first paragraph (or perhaps first sentence) of an article could be included as the value of a "description" meta tag.
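A sketch of how that might work; the 250-character cap, the helper name, and the word-boundary truncation are all assumptions rather than existing MediaWiki behaviour:

```python
# Sketch: emit an article's first paragraph (truncated) as the value
# of a "description" meta tag. Escaping is needed so quotes in the
# text cannot break out of the attribute.
import html

def description_tag(article_text, max_len=250):
    first_para = article_text.strip().split("\n\n")[0]
    if len(first_para) > max_len:
        # cut at a word boundary and mark the truncation
        first_para = first_para[:max_len].rsplit(" ", 1)[0] + "..."
    return '<meta name="description" content="%s" />' % html.escape(first_para, quote=True)

print(description_tag("A toad is an amphibian.\n\nToads are found worldwide."))
# <meta name="description" content="A toad is an amphibian." />
```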

Others
For example, including http://www.geourl.org headers where appropriate.

Is there a list of standards for meta tags anywhere, or is this pretty much ad hoc? anthony (see warning) 14:48, 11 Aug 2004 (UTC)
 * How about Dublin Core? [ alerante | "" 15:05, 15 Aug 2004 (UTC) ]

Meta-tags and Google
Note, however, that Google is supposed to ignore meta tags completely. -- The Anome 14:58, 9 Aug 2004 (UTC)
 * Well it certainly reads the descriptions, because if your search term is in the description it displays the description along with the search results. We don't know if it uses them for placement, but it might, even if they only contribute to "keyword density". Richard Taylor 14:41, 11 Aug 2004 (UTC)

Have the mirrors link to us
How about this: it's legal to mirror WP, but we do not have to make it easy for the 'non-compliant' ones. There could be a blacklist of the IPs of the bots they use. To these bots, rather than blocking them (which would be pointless), feed an additional statement "the original page is here", with a link to WP. Now I know that some of these mirrors filter out anything even containing the string "wiki". In order for the message not simply to be filtered out, the statement would have to be rephrased/replaced at irregular intervals. Also, the link to WP could be a link to a naked IP, or rather to a domain with a friendly sysop who is prepared to place a redirect (so the links will not be recognized by the mirrors as WP-related). This would keep them on their toes if they really don't want to acknowledge that they are just mirroring. dab 13:45, 27 Nov 2004 (UTC)

Include Names of the Article in Interlang Links
Currently, if someone links from, say, English Wikipedia to Japanese Wikipedia with the same version of the article, all you'll get is a link that says: 日本語. What we should do is have the link say 日本語 - フィクション (language - title of article in the language). More contextual, and helps boost rank for that title via context-sensitive links.
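A small sketch of building the proposed label; the autonym table here is a hypothetical stand-in for whatever language data MediaWiki keeps internally:

```python
# Sketch of the richer interlanguage link text: show both the
# language's autonym and the target article's title, so the link
# carries the title as context for search engines.
AUTONYMS = {"ja": "日本語", "de": "Deutsch", "fr": "Français"}  # stand-in table

def interlang_label(lang_code, article_title):
    return "%s - %s" % (AUTONYMS[lang_code], article_title)

print(interlang_label("ja", "フィクション"))  # 日本語 - フィクション
```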

Methods we can't/shouldn't use
It's at least enlightening, if not useful, to note some of the methods clones have used to increase their search rankings that are either technically unfeasible for Wikipedia, or undesirable for other reasons.


 * Vastly increasing the number of inter-article references. Although Google et al. value links from one site to another more than links within a site, a huge pattern of non-random links within the site to the same article may still increase that article's ranking. Many WP clones have large sidebars or footers on every page that link to many other articles, sometimes by alphabetic proximity or category, sometimes by "what links here". This may be a useful feature, but would probably be an unacceptable performance drain on WP servers (clones can prepare it all in advance when they download the data, but our articles are dynamic). ←Hob 08:01, 2004 Sep 7 (UTC)
 * This should not be rejected. We can use such approaches. The Squid caches serve almost 80% of all hits to the site and a higher percentage of those from search engines. Placing the leading what links here items in the sidebar is one which seems eminently possible. Placing some items from the same categories would also be doable. They would increase database load, but probably not enough to be very problematic, assuming that the queries were efficient. Jamesday 21:38, 8 Sep 2004 (UTC)
 * I agree with Jamesday, there's nothing dirty about this (although including more than one link in each direction alphabetically wouldn't be such a great idea). Of course, simply taking "what links here" off of robots.txt would accomplish pretty much the same thing. anthony 警告 22:34, 23 Nov 2004 (UTC)
 * Proliferating subdomains. Whether or not this is still the case, at one point Google's algorithm considered subdomains to be different sites: e.g., bar.foo.com and baz.foo.com would be two sites, while foo.com/bar and foo.com/baz would be two pages on the same site. If they're separate sites, they can give each other a bigger ranking boost; also, having the search term in the domain name may boost the ranking. At least one clone, wikiverse.org, puts every article on a separate subdomain. Besides looking funny, this would probably require a major redesign to Wikimedia software. ←Hob 08:01, 2004 Sep 7 (UTC)
 * Google is aware of this technique and I've speculated that these links are reducing our rank rather than increasing it - there was a clear drop in rankings when Google implemented new measures to reduce the benefit of this method of site promotion. However, this is speculation. Jamesday 21:38, 8 Sep 2004 (UTC)
 * While I don't think we should do this solely to try to boost google rank, and I don't think we should make every single article a different subdomain, I think using separate subdomains for different topics within the same database is acceptable. It would be one way to compromise between the people who want to split out things like rambot articles and those who think they should stay in Wikipedia, for instance.  anthony 警告 22:37, 23 Nov 2004 (UTC)
 * Minimizing links to content on other sites. For clones, this generally means keeping the number of links back to Wikipedia to an absolute minimum, and using nondescriptive text for all such links (i.e. "this article is based on Wikipedia: click here" boosts WP's ranking for "toad" less than "this article is based on Wikipedia: toad"). If clones did not use these techniques (or even more evasive ones, such as using Javascript instead of real links), their copied articles would boost WP's rankings as much as their own and they'd lose much of their advantage. We can't do much about this, although we can be more aggressive in requiring mirrors and forks to at least link to us from every article rather than on a single acknowledgement page. ←Hob 08:01, 2004 Sep 7 (UTC)
 * Neither a link here from one page nor a link from every page complies with the GFDL requirements, so how would this be enforced? Remember that completely complying with the GFDL involves no link at all - do we want to encourage that? Perhaps - if Google is penalising it. Jamesday 21:38, 8 Sep 2004 (UTC)
 * I'm not sure how this relates to being something that Wikipedia should or shouldn't do. Personally, as a mirror, I refuse to participate in Wikipedia's googlebombing scheme, and so I link to wikipedia using javascript.  I don't do this for normal external links, only for the links which some claim are mandated by the GFDL (they aren't actually mandated by the GFDL, but it's easier to give in than to fight that, besides, having the link is useful when I'm browsing McFly and come across an article I want to edit).  anthony 警告 22:40, 23 Nov 2004 (UTC)

Finding of facts
Please make sure these are facts


 * 1) For a typical Google search, Wikipedia ranks lower than many of the other GFDL licensees of the content.
 * 2) Search engine traffic is an effective source of new contributors.
 * 3) New blood can offer contributions that old blood can't.
 * 4) Wikipedia has always grown very fast, and is continuing to do so, regardless of Google rank for particular articles.
 * 5) Most other GFDL licensees attempt to minimise their GFDL obligations, usually are less compliant with the GFDL requirements than the Wikimedia Foundation, and often don't follow the non-compliant suggestion of a link to the Wikimedia project either.
 * 6) The other GFDL licensees are assisting with achieving a key goal of the project.

Comments
We don't yet know how we might go about option 1. For instance, I am not sure that becoming a spamdexer ourselves is a great idea, but using contacts that Jimbo, for example, has with Google might be extremely helpful if we want to change the results from that search engine.


 * I don't know how much truth there is to "Fact 1". I tried a couple of random searches of articles on WP:FA, and if there was wikipedia information in the first 10 hits, wikipedia almost always came up higher than clones:
 * "British East India Company" (WP is 2nd, after another legitimate source)
 * update, wikipedia is #1 --Msoos 14:10, 21 October 2005 (UTC)
 * "LZW" (WP is 4th, after other legitimate sources)
 * "Horatio Nelson" (WP is 4th, after other legitimate sources)


 * Wikipedia has since dropped to ninth, following mirrors WordIQ.com, fact-index.com, explanation-guide.info, and the extremely weird http://horatio-nelson-1st-viscount-nelson.wikiverse.org/. This highlights a much more serious problem: a site with unique content (http://freemasonry.bcy.ca/, which contains biographical information about Freemasons) is seventh, totally obscured by redundant Wikipedia articles.  -- η [[Image:Venus symbol (blue).gif|♀]] [ υωρ ] 09:33, 25 Sep 2004 (UTC)


 * I read an article recently (outside of Wikipedia, within the last 2 weeks - I wish I had the link still) about how someone found a way to bomb Google's search engine for higher page ranking. The author noticed that a number of sites returned by Google did not contain any of the words in the search query.  By investigating this puzzling occurrence, he noticed that the anchor (a href...) tags contained TITLE attributes listing keywords.  Also, in image (img src...) tags the ALT attribute was used the same way.  Doing so doesn't improve the page ranking of the current site, but it does improve that of the site it links to (if in a different domain).  This may already be covered in Google bomb. Guy M. 00:00, Jan 9, 2005 (UTC)


 * This might be a small problem, but it certainly is not that widespread yet. I know that a lot of wikipedia contributors are incensed to see so many clones that do not give proper credit, but I don't think we need to do any SEO ourselves at this time. -- DropDeadGorgias (talk) 15:33, Aug 2, 2004 (UTC)


 * I believe that Google may penalize content that is found in many near-identical copies on the net. It might be a good idea to let Google know that where other sites are found that mirror Wikipedia's content, that we are the original source, and not part of a spamdexing network. -- The Anome 09:30, 3 Aug 2004 (UTC)


 * If you search for "vote swapping united kingdom" you will notice that Wikipedia is not in the first pages. However, more clearly, try searching for a unique phrase in the same page, "greater chance of winning in their district exchange": you can see exactly one result; wikipedia is ruled out as a "similar page". Duplicate material is simply excluded. Interestingly, the first listed clone has a direct link back to the original live article, which can be seen as above average. Incidentally, I just found this discussion whilst trying to find out why Wikipedia isn't listed.  Mozzerati 20:08, 2004 Sep 26 (UTC)


 * I'm a noob to Wikipedia cloning but as a concerned netizen may I say, Aaargh! I was doing a couple of math searches from google.ie involving specific math terms like hyperreal and so on. Wikipedia appears 7th below (i hate even using their names now that I have become acquainted with their tactics...) thebestlinks, freedictionary (twice), asinah, tutorgig and informationblast in the hits. What I find annoying is that I thought at each click I was getting new content and it took a while for me to twig that Wikipedia was being cloned. Mirroring is fine if links to Wikipedia are given and respect is shown where respect is due but these sites all have Google ads and their raison d'etre stinks somewhat. I thought the whole point about the content being free was that mere mortals such as myself could use it no probs but cloning the entire site goes against this premise, ethos, whatever. Anyhow, I think you all should be serious about your google ranking, love it or hate it Google rules the web but furthermore I think you should try to get the Clones silenced because they not only affect Wikipedia they force Googlers to wade through duplicate info. This is not necessarily a Wikipedia-only problem. -- unregistered happy wiki user.


 * Hi.. thanks for your comment; if you do see something like this, please simply complain to google by pressing the button which is normally at the bottom right of your first results page ("Dissatisfied? Help us improve"). Google are the people who are probably best equipped to fix the problem.  The more users complain, the more they know to take it seriously.
 * Oh, and as far as it goes, I'm happy to have all my work cloned as much as possible as long as the GFDL license is followed. Much better that than have it lost if wikipedia ever disappeared or got closed down.  Mozzerati 21:30, 2005 Mar 8 (UTC)


 * Unfortunately, Google has done a poor job of fixing this problem, even though they have been informed of it. The exclusion of duplicate content is really hit or miss. For some searches, Wikipedia clones will make up 6 to 8 out of the top 10 results. It's gotten so bad that sometimes I have to specifically exclude Wikipedia from my searches. Wikipedia may now be the worst offender on the Internet for this, though eBay, Cnet and Epinions are terrible when searching for products. I don't care about the cloning of my work, but burdening the Internet like this is inappropriate. Also, it is unfair for Google to be forced to cleanup Wikipedia's mess (Google may intentionally or unintentionally "punish" Wikipedia when/if they fix the problem by removing or lowering it in search results). Finally, the issue may cause resentment for Wikipedia among Internet users (I know I hate eBay for its clones).
 * I'm not sure what the solution should be. Of course, the clones could be stopped completely (my favorite), but maybe there are better solutions. Perhaps Wikipedia's clones should be instructed to turn off search engine indexing for the Wikipedia portions of their websites. That would get the information out there and provide content for the sites, but prevent the clogging of search engines. Maybe the herd could be trimmed by stopping those websites that violate the license, requiring approval of a website before cloning or collecting a fee for doing so. I don't know if that violates Wikipedia's license/spirit, though. Kjkolb 10:18, July 11, 2005 (UTC)

Fact 1
I've tried to do my own small part for this. I hit "random page" a few times and googled the title of the result. I've charted the results below:
 * 1) Tanaka Memorial--thefreedictionary.com places 4th and 15th; Wikipedia is not in the top 30.
 * 2) Kommando Spezialkräfte--thefreedictionary.com places 1st in an English search; Wikipedia is not in the top 20
 * 3) Edmund Pevensie--neither one shows up
 * 4) 1749 in music--thefreedictionary.com dominates the top ten, with a few scattered hits from other clones; Wikipedia clocks in at #17.
 * 5) Rex, North Carolina--the first encyclopedia hit, #8, is from www.fact-index.com. thefreedictionary.com is #29. Wikipedia is not in the top 60.

I think it's safe to say that fact one is largely true. Meelar (talk) 19:34, 2004 Aug 3 (UTC)


 * Interestingly enough, on Yahoo!, Wikipedia comes up fairly high for all of these, even Edmund Pevensie. OK, fine, bring on the SEO, as long as it's tasteful... - DropDeadGorgias (talk) 20:21, Aug 3, 2004 (UTC)


 * Two issues:


 * 1) Didn't we reach some sort of agreement with Yahoo? What are the details of that? Could we do something similar with Google?
 * 2) What is meant by "tasteful" SEO (or not)? Maybe I'm not tech-literate enough, but as long as your tactics still result in a relevant result (and those of thefreedictionary seem to), then what's wrong with it?


 * Best, Meelar (talk) 20:25, 2004 Aug 3 (UTC)


 * A lot of SEO tricks these days are nothing more than dirty tricks used to deceive search engines, e.g. redirecting known search bots to pages with lots of relevant keywords, or spamming blogs with links to the targeted page. When the search engines find out, there's a chance they could remove from their results a site which is known to be doing this. Johnleemk | Talk 09:43, 4 Aug 2004 (UTC)


 * The Yahoo deal does not involve Yahoo inflating the rankings of Wikimedia sites. It's Yahoo getting regular updates of Wikimedia-hosted content and feeds, with more work on feeds for the future. If Google becomes a portal site we could make a similar deal with Google. I've used my own contacts with another largish portal site to get contacts for Jimbo there. Jamesday 21:38, 8 Sep 2004 (UTC)

Another example. Quantum circuit. I couldn't find the wikipedia article anywhere on Google, but the WordIQ clone was number 3. It sucked. None of the TeX was rendered. CSTAR 19:02, 9 Aug 2004 (UTC)
 * I finally did find it --- on page 11! CSTAR 23:48, 9 Aug 2004 (UTC)
 * I just did a search for quantum circuit (no quotes) and Wikipedia was the top-ranked encyclopedia site, at number 13. The highest other licensee was at 68. Searching for "quantum circuit" en was at number 4. The highest other licensee was at 53. Worth us considering that Google does regular "google dances" and may have tweaked this. Or not.:) Jamesday 21:38, 8 Sep 2004 (UTC)

maybe
How about we let Google handle this: each person can go and verify the listing details for a handful of pages, and use the complaint link for each one where Wikipedia ranks lower than the clones. If a lot of people did this, then perhaps Google might sit up and take notice of this. Dysprosia 10:00, 4 Aug 2004 (UTC)
 * Google says: "The only sites we omit are those we are legally compelled to remove or those maliciously attempting to manipulate our results." So in the case of sites "maliciously" manipulating the results that might work, but most of the sites in question are not doing that. anthony (see warning) 13:01, 11 Aug 2004 (UTC)
 * I believe Dysprosia is referring to the "Dissatisfied? Help us improve" link at the bottom of all Google's search results pages, which let you make general comments on results. They say: "Thanks for helping us improve our search. While we aren't able to respond directly to comments submitted with this form, the information will be reviewed by our quality team. (...) Please tell us what specific information you were seeking. Also tell us why you were dissatisfied with the search results. (...) Were you looking for a specific URL that wasn't listed in the search results? If so, please enter the URL here:". Jomel 09:35, 12 Aug 2004 (UTC)
 * But what can the google engineers do about it to improve the situation without introducing any Wikipedia-specific code (which they have said they won't do)? To borrow what someone else said on the pump, from google's perspective many of these clones are actually better results than Wikipedia.  They generally present the information in a way best suited for someone who wants to read an encyclopedia, rather than build an encyclopedia.  anthony (see warning) 13:44, 23 Aug 2004 (UTC)
 * Maybe. Google also has an interest in encouraging the development of good resources and that interest is best served by rewarding the primary sites instead of spam or other licensee sites. It may be that Google has improved its handling of this in one of its more recent updates. Or not. Jamesday 21:38, 8 Sep 2004 (UTC)
 * One thing I think we all agree on is that google shouldn't be displaying the clones and the original all on the same results page (unless maybe someone clicked on "see clones" or something). But I assume Google is aware of this problem, it's just a hard problem to solve (as clones are never exact duplicates). anthony 警告 22:46, 23 Nov 2004 (UTC)
 * The primary purpose of many of the cloned sites is to display Google Ads. The sites make money for Google, so removing them, or reducing their ranking would not be entirely in Google's interest, there is a greater effect from their point of view than simply removing duplicate content from the results. 80.4.3.240 13:55, 9 Oct 2004 (UTC)

Another problem?
Look at these search results: Wikipedia comes up first in the results, but there is no blurb or summary... If I were a user, I would think that that was a paid search result. The wikipedia clones come up with normal blurbs. What is causing this? - DropDeadGorgias (talk) 15:45, Aug 7, 2004 (UTC)


 * The real answer can only be given by google - but I suspect these happen when the google bot has problems crawling our site, and thus just notes the existence of the URL, but cannot store the content (maybe it received a timeout). andy 19:46, 7 Aug 2004 (UTC)


 * Wikipedia error pages have meta tags which request google not to cache the results. If the DB is acting flaky, google might download the page with such a tag.  It's unclear how long google waits before attempting to redownload the page. anthony (see warning) 13:04, 11 Aug 2004 (UTC)


 * OK, that explains the empty listing. But at least for my pet google check (monthon) it seems it takes a month till it is updated - and apparently the google bot has a preference for checking at the wrong time: after one month without a cached copy it worked and even was the top hit, but about a week later it was back to the empty one. andy 10:16, 12 Aug 2004 (UTC)


 * It could be that google checks cached pages more frequently. But really this is all just speculation.  I'm not really sure what could be done to solve it, either, other than of course to have better uptime.  It also could be that this isn't the problem at all.  It's probably best for someone to contact google and ask them about it, preferably someone who can actually get through to someone who knows what they're talking about. anthony (see warning) 13:36, 23 Aug 2004 (UTC)


 * Our google guardian angel must be watching, because this problem has been resolved. --DropDeadGorgias (talk) 02:56, Aug 25, 2004 (UTC)


 * As of today, en is number one complete with a summary. Jamesday 21:38, 8 Sep 2004 (UTC)


 * As of today, en is number one with no summary. anthony 警告 22:48, 23 Nov 2004 (UTC)

Wikipedians must link!
I entirely agree about the annoyance of the clones. Perhaps Wikipedia should introduce a formal notice to wikipedians advising them to link to Wikipedia (and to their own articles within) from their own personal websites. I believe this is one of the best ways for a site to improve its Google rankings. --mervyn 09:02, 9 Aug 2004 (UTC)


 * I think this is a great idea! I just did this myself; I now have links to all of the articles I have made significant contributions to on my web page.  Samboy 19:02, 9 Aug 2004 (UTC)


 * These types of google bombs do work. See the talk page for the full discussion. anthony (see warning) 14:22, 11 Aug 2004 (UTC)


 * There's no law against linking to appropriate Wikipedia articles when you write elsewhere on the net. It's useful and effective. If you're a blogger, take particular note...:) Jamesday 21:38, 8 Sep 2004 (UTC)

Fact 5
I agree that almost all clones don't give anything back, but there are interesting counterexamples. One is this mirror of the German Wikipedia. The site operators don't use MediaWiki, but wrote their own software, including a search engine (in Free Pascal) which they claim is much faster and more powerful than WP's MySQL search. And they are offering to release it under the GPL for Wikipedia to use.

Others at least add useful functionality which might inspire WP back. For example encyclopedia.thefreedictionary.com uses the nice idea of displaying a part of a linked article as a tool-tip when the mouse cursor hovers over the link. regards, High on a tree 01:30, 25 Aug 2004 (UTC)

Many of the other licensees don't give much back but I wouldn't be surprised if some have provided donations or good ideas which we can learn from. Their producers may also be contributors here. They do all provide some load offloading from us and a useful resource for the times when we're unavailable. Jamesday 21:38, 8 Sep 2004 (UTC)

I feel that I've given something back with the 6,556 edits I've made to the main namespace, many of which were based on scripts I've run on my local copy of the database (see also User:Anthony DiPierro/Broken categories for something I've created based on that local copy). I think there's a lot of room for the mirrors and Wikipedia to work hand in hand. It bothers me that so many people see the mirrors as competition. I have no intention of competing with Wikipedia. In fact, I think Wikipedia should run a mirror of itself. anthony 警告 23:24, 23 Nov 2004 (UTC)

A potential solution: persuade clones to include editing links
Why are clones bad? One reason: they reduce the number of potential editors per page view of Wikipedia material. We could solve this if we persuaded the clones to include links to the Wikipedia editing page. It would seem to be in their interest, because it would lead to improved content. --erauch 04:53, 28 Aug 2004 (UTC)


 * "In their interest" for some, not for others. The ones that are just trying to pull page hits for banner ads presumably don't care what the content is, and won't be persuaded. And even the good guys probably don't want to call attention to the fact that their snapshots of articles are days/weeks/months old. "Hey, I clicked this thing to add some more facts, and it says 20 other people already added them 6 weeks ago! This encyclopedia is broken!" ←Hob 08:10, 2004 Sep 7 (UTC)


 * It's worth trying, though. Some of the larger ones, which pull in a lot of the page views, probably do care. --Erauch 14:17, 7 Sep 2004 (UTC)


 * Well, everything is worth trying. But if their main concern is page views, they're probably not eager to do anything that encourages people to use our site rather than theirs. Once someone gets turned on to WP editing, they're likely to stay on WP rather than the clone site. ←Hob 03:16, 2004 Sep 8 (UTC)


 * Most of their page views come from random web searchers, so it wouldn't have much of an impact on page views. But better content would produce more page views. --Erauch 03:46, 8 Sep 2004 (UTC)


 * This is something I plan to do at some point. I think it is in my best interest, as a mirror site, to encourage people to edit on Wikipedia.  Sure, maybe some people will get hooked on Wikipedia as a result, but even this is a good thing.  I'd rather the person gets hooked on Wikipedia than not, and I'm not interested in merging contributions made on my site with those made on Wikipedia.  The only stopping point right now is that I want to implement it so that people can edit my site for articles which have been deleted from Wikipedia, but the edit link goes to Wikipedia for articles which have not been deleted.  So this involves finding some way to parse Deletion log.  Providing some way for me to easily do this would greatly facilitate things. anthony 警告 22:55, 23 Nov 2004 (UTC)

Possible reason why Wikipedia is forced down
Not that I know anything but... is it possible that the fact that the mirrors link to Wikipedia causes a reverse effect of a Google bomb, since it is exactly the same content? Someone stated before that because the Wikipedia article is exactly the same, it turned up as a similar page. Is it possible that because it is turning up as a similar page so often, it is pushed down in PageRank? I'm only exploring an idea... don't know if this is true. --AllyUnion (talk) 06:43, 1 Dec 2004 (UTC)

Google will be forced to fix this eventually
As Wikipedia content proliferates, Google users are going to get more and more annoyed when they do a search and find 15 URLs of cloned material in the top 30 results. As a result, Google will have no choice but to fix this problem eventually (and I doubt that their fix could be anything except to push WP clones far down in the rankings). Moreover, in the long run external sites are going to link preferentially to Wikipedia, which will push its rankings up. Until then, relax and remember that imitation is the sincerest form of flattery. —Steven G. Johnson 21:56, Dec 3, 2004 (UTC)


 * Our Main Page already has a PageRank of 9, while http://www.answers.com/library/Encyclopedia has only 7. Seahen 11:10, 21 June 2006 (UTC)

External link

 * Google Information for Webmasters - Webmaster Guidelines