MediaWiki talk:Spam-blacklist/archives/July 2013

=Proposed Additions=

footballnation.com


Someone seems to be spamming a non-notable American football blog across the articles of many American Football players. -- Jayron  32  04:55, 17 May 2013 (UTC)


 * Looking at this user's history, I believe this is nothing more than a well-intentioned reader of the aforementioned football web log deciding this year to take the rather bold initiative of adding the results of an annual, lengthy quarterback-ranking article to Wikipedia, by adding those rankings to the pages of each quarterback listed. In my opinion, what this fellow is doing (while perhaps a bit misguided) is not only admirable, but it is obviously quite a project, and it must have taken some time to put together - what with finding each quarterback's page, customizing each edit, and locating the appropriate section and spot in which to place each paragraph within each article.  Seeing that project so suddenly destroyed may well have been quite a harsh blow to this person's morale, and as such could be a major obstacle in the way of this person continuing to become a decent editor.    If the site in question is truly non-notable (which I'd buy since I'd never heard of it until I encountered this user's edits, though I can't definitively say whether it is or is not notable), then fine, the edits in question are better left reverted.  However, I believe this person believes that it is a notable news source, and they do not seem to intend to be a spammer.  Thus, kindly letting them know that "Football Nation" is not considered by Wikipedia's policy to be a notable news source should be all that's needed here.  Placing a threatening note on their talk page, on the other hand, seems generally counterproductive towards the goal of teaching them where they've specifically erred without scaring them away. - Smike (talk) 16:13, 17 May 2013 (UTC)

wikinewstime.com


If I remember correctly there were quite a few more IP addresses spamming this site, but these were the only three I could find. - SudoGhost 17:36, 28 February 2013 (UTC)




 * Included a more recent account. - SudoGhost 12:35, 13 May 2013 (UTC)

soccerdatabase.eu
This was recently discussed at ANI - Administrators' noticeboard/Incidents. soccerdatabase.eu/ is a mirror site of www.playerhistory.com/ - the latter is defunct and the owner is launching legal action against the former. GiantSnowman 18:26, 8 May 2013 (UTC)


 * If, basically, it is established that soccerdatabase.eu is infringing the copyright of playerhistory.com by running an (illegal) mirror, then I would suggest that this is blacklisted on meta and that we assist in the cross-wiki cleanup. Otherwise (if there is no copyright infringement), is there really merit in blacklisting this? It is used in good faith and, as far as I can find, not actually spammed (there are some accounts who have used this site A LOT, but that is also true for playerhistory.com). — Preceding unsigned comment added by Beetstra (talk • contribs) 07:50, 9 May 2013
 * Should we e-mail Playerhistory (I have an e-mail address for the owner) for evidence of the (supposed) copyright infringement of soccerdatabase? GiantSnowman 08:15, 9 May 2013 (UTC)
 * Might be an idea to ask the editor from AN/I to come here and comment on this - and we should be careful for libel statements before things really are established. --Dirk Beetstra T  C 08:17, 9 May 2013 (UTC)
 * All three discussions are already linked. GiantSnowman 08:41, 9 May 2013 (UTC)
 * Any movement on this? GiantSnowman 15:57, 13 May 2013 (UTC)
 * Did you email playerhistory as you apparently intended? Have you initiated a discussion on the meta blacklist? ~Amatulić (talk) 16:12, 13 May 2013 (UTC)
 * No to both. There was clear apathy here to my suggestion to e-mail playerhistory, and I never edit on meta. GiantSnowman 16:25, 13 May 2013 (UTC)
 * Well, you're the only one who knows how to contact the owner, so doing that is really up to you. I think it would be a good idea to establish some evidence of copyright violations, in which case blacklisting the soccerdatabase site would be an easy decision. As Beetstra pointed out, it would be best to establish evidence of infringement first before exploring a meta blacklist, so contacting the owner would be a logical first step. ~Amatulić (talk) 22:52, 13 May 2013 (UTC)
 * Have you even been to the soccerdatabase.eu website? It is a 100% copy - it even has the playerhistory logos etc. and it has the contacts for playerhistory listed. Is that not enough evidence of copyvio for you? GiantSnowman 08:27, 14 May 2013 (UTC)
 * Being a copy does not necessarily have to mean it is a violation of copyright. Many legitimate copies (mirrors) of Wikipedia exist.  --Dirk Beetstra T  C 10:56, 14 May 2013 (UTC)
 * ...but this is clear copyvio - it is a 100% copy of copyrighted material. GiantSnowman 16:41, 16 May 2013 (UTC)

So nobody seems to care that this website is a massive copyvio? GiantSnowman 16:38, 30 May 2013 (UTC)

telerik.com


 $ whois 82.103.64.57
 inetnum:       82.103.64.0 - 82.103.64.255
 netname:       TELERIK
 descr:         Telerik Corp.

This is only the recent abuse, for the full story. MER-C 11:44, 17 May 2013 (UTC)

tr.im URL shortener
This URL shortener has been used recently:
 * edit

And sometime in the past - I removed one use here. Deli nk (talk) 13:31, 19 May 2013 (UTC)
 * URL shorteners belong on the global spam blacklist; request filed. MER-C 13:04, 20 May 2013 (UTC)
 * I didn't know. Thanks for filing it there for me!  Deli nk (talk) 13:30, 20 May 2013 (UTC)

suitusa.com



 * Spammers

MER-C 13:00, 20 May 2013 (UTC)

secure.vivid.com
Affiliate site for Vivid Entertainment.


 * Spammers

Trivialist (talk) 21:57, 20 May 2013 (UTC)

likewap.in

 * Previous incidents
 * Wikipedia talk:WikiProject Spam/2013 Archive Feb 1


 * Sites spammed


 * Spammers

MER-C 10:00, 2 June 2013 (UTC)

prowresblog.blogspot.com
See the discussion initiated by the website's owner at External_links/Noticeboard. Although there is a clear COI in this editor pushing his/her own website, the real problem is the huge amount of copyvio content - screencaps taken directly from Sky Sports in the UK as well as other TV channels, plus videos and animated gifs. --Biker Biker (talk) 20:05, 4 June 2013 (UTC)
 * Spammer

parstimes.com
According to the owner of the website here, parstimes.com is a personal page with an economic purpose. At the moment there are more than 99 links to this website, mainly in EL sections. Farhikht (talk) 13:23, 6 June 2013 (UTC)

wizicam.com

 * Spammers

A webcam website that's been aggressively spammed by multiple IPs and users. For diffs of spam, see the contribs for the spammers listed.  Spencer T♦ C 17:44, 6 June 2013 (UTC)

Guitarmegastore


Spam is coming out of Hungary; spammer adds this link, or replaces valid company URL with their own, for instance here. Blacklist please; cleanup is a drag. Drmies (talk) 17:03, 1 May 2013 (UTC)


 * Support.  Yinta n   19:46, 1 May 2013 (UTC)
 * Without a COIbot report, it's hard to tell if this is an isolated incident or widespread. Is it more than just that one IP address? ~Amatulić (talk) 20:03, 1 May 2013 (UTC)
 * I removed almost a dozen instances yesterday, from 78.92.125.61, 145.236.5.234, 199.241.30.194, 31.46.227.155, 81.182.18.116, and 81.183.148.51. Drmies (talk) 15:25, 2 May 2013 (UTC)
 * . I just cleaned up a couple more. ~Amatulić (talk) 14:46, 16 June 2013 (UTC)

guitarmegastore.com



 * Spammers
 * Spammer replaced existing links
 * Spammer used misleading edit summaries
 * Open proxy

MER-C 12:25, 3 June 2013 (UTC)
 * Already from the other  entry above. ~Amatulić (talk) 15:47, 16 June 2013 (UTC)

checkmarx.com


Long-term problem with multiple accounts promoting the software company Checkmarx and its products. Behaviour includes persistent spamming of external links to checkmarx.com. Some diffs are provided in the list above; see Sockpuppet investigations/Grenoble jojo for further details. —Psychonaut (talk) 09:54, 12 June 2013 (UTC)
 * by WilliamH. MER-C 02:46, 14 June 2013 (UTC)

roichecker.com


Persistent reference spamming across city-related articles. After being blocked as the 207 IP address, the spammer came back immediately as the 223 IP address to continue spamming. They are adding these references to articles in alphabetical order, which seems to suggest that they fully intend to spam all 400 of these articles. - SudoGhost 19:27, 25 June 2013 (UTC)
 * Adding another IP. Editor has now been blocked twice; blocking does no good as they simply switch IPs and continue spamming articles. - SudoGhost 20:27, 25 June 2013 (UTC)
 * to blacklist by User:Soap. ~Amatulić (talk) 22:11, 25 June 2013 (UTC)

The edits mentioned here were obviously made without proper knowledge of the Wikipedia rules and departed from earlier contributions, where cities were mentioned because of being added to the top 400 list. Such mentions in the past had a different structure (i.e. "[City] was added to the Top 400 business investments destinations with high return potential."), which complies with citation standards. We are still researching how the additions cited here occurred but are in any case committed to adding content only when it is compliant. Please see that adding the above-mentioned references to articles on cities has had a positive impact on the localities, as they are always looking for more attention and opportunities to receive investors to create jobs. Blacklisting the site affects more than just the site; it also takes merit away from those localities. Could you please inform me how to request the delisting? Could you take care of it? Or should I proceed to request inclusion in the whitelist? Sincerely, --Mba lwall (talk) 16:30, 26 June 2013 (UTC)


 * That is hardly Wikipedia's problem, Mba lwall. The IPs had ample chance to stop pushing the link, and I presume that these IPs were pushing the link to promote the localities' businesses.  That is not what Wikipedia is for.  I guess one will need to find other outlets.  --Dirk Beetstra T  C 17:30, 26 June 2013 (UTC)

= Proposed Removals =

faizhaider.co.nr & smfaizhaider.co.nr
If one uses which should show the user page one gets a spam filter warning saying that one is trying to link to http://www.smfaizhaider.co.nr and http://www.faizhaider.co.nr.

This issue creates a problem for users when they try to include my username in any discussion. I searched the blacklist logs but was not able to find any entry for the two sites. -- Sayed Mohammad Faiz Haider t c s 16:21, 19 June 2013 (UTC)
 * . I can't see a reason to de-list blacklisted sites over a non-existent problem.
 * I am able to include User:Faizhaider and the template as, as you can see by this reply. The template  (with a colon instead of a vertical bar) would transclude your entire user page, and nobody should be doing that. ~Amatulić (talk) 22:15, 25 June 2013 (UTC)

freedom is sacred blog
This link on the Jesus Christians wikipage has been blacklisted for some reason. The Jesus Christians wikipage was recently modified in a biased way and when trying to revert we discovered that this link was "blacklisted", however when checking the blacklist for May 2013 the domain was not listed. We would like to restore the link because of historic significance. Thanks.


 * Funny, I think there is a broken regex on Wikipedia's blacklist. It seems to be caught by 'freedom.\ws'. I'll correct that.
 * However, I still don't think this should be linked, could you please have a look through e.g. the external links guideline. --Dirk Beetstra T  C 07:27, 20 May 2013 (UTC)

Liberty Reserve
Liberty Reserve is in the news as it has been shut down by authorities. Liberty Reserve dot com currently shows a seizure notice. A link to this would be of encyclopaedic interest for the Liberty Reserve article. LukeSurlt c 22:59, 29 May 2013 (UTC)
 * But it would last only as long as the domain name stays associated with that company, which may not be long. Why isn't it enough to say in the article that "Liberty Reserve's website libertyreserve.com currently shows a seizure notice"? You don't actually have to have the http prefix in it. ~Amatulić (talk) 17:55, 31 May 2013 (UTC)

guard-soft.com


Not sure why this is blocked. I checked en.wikipedia.org/wiki/MediaWiki:Titleblacklist and en.wikipedia.org/wiki/MediaWiki:Spam-blacklist, and this site is not listed. I tried sitecheck.sucuri.net/scanner/. All referenced security checking sites showed guard-soft.com as OK, except that McAfee seems to give a 'warning' but fails to explain why. I requested McAfee to recheck their assessment. Note: I discovered Wikipedia blocking this site when I attempted to edit: en.wikipedia.org/wiki/Guilloche


 * This is globally blacklisted, basically because it was spammed to, amongst other pages, Guilloche, and on multiple wikis (also on the Russian Wikipedia). The reason for blacklisting is plainly that it was linkspam: that link does not belong on those pages, yet editors found it necessary to add it at least 3 times to that page on en.wikipedia.  We are not a vehicle for promoting or advertising a product, we are writing an encyclopedia.  I hope this explains.  --Dirk Beetstra T  C 06:51, 2 June 2013 (UTC)

= Troubleshooting and problems =

=Discussion=

Possible malware
There's a question at RSN about a possible malware site. Could someone take a look at Reliable_sources/Noticeboard? WhatamIdoing (talk) 06:01, 12 February 2011 (UTC)
 * Ran the URL through a few malware/threat detectors; it seems OK.
 * Here are a few scanner tools that could be useful.
 * Norton Safe Web, from Symantec- http://safeweb.norton.com/
 * McAfee SiteAdvisor Software – Website Safety Rating- http://www.siteadvisor.com/
 * Google Safe Browsing diagnostics- http://www.google.com/safebrowsing/diagnostic?site=malwarehelp.org
 * AVG Online Web Page Scanner- http://www.avg.com.au/resources/web-page-scanner/
 * Exploit Prevention Labs Linkscanner- http://linkscanner.explabs.com/linkscanner/default.aspx
 * Unmask Parasites- http://www.unmaskparasites.com/
 * --Hu12 (talk) 19:53, 12 February 2011 (UTC)

Where are the guidelines?
Where are the guidelines for what is blacklisted and what is not? Is it based upon the content of the sites, i.e. unmanageable advertising, or upon the action of editors in adding dubious citations? Or both? Could the guidelines be linked in a header from this Interface page? --Bejnar (talk) 17:13, 3 June 2013 (UTC)

Article-only blacklist?
Do we have the technical capability of designating links as blocked in articles only? In an IFD or PUI discussion, occasionally I get blocked by the spam filter for trying to link to a site as the source for a copyvio image. For example, examiner.com should obviously never ever in a million years be linked to from an article, but if we had the technical capability of doing so, it would be nice to be able to link to it at IFD/PUI. --B (talk) 00:46, 15 April 2013 (UTC)
 * No, that is not possible (though I think that there have been several bug-requests filed about changes to the system, none have ever received significant attention - this is one of those type of things).
 * In any case, you could just remove the 'http://' from the front, and then you can add it. It is a bit more work for the person to actually follow the link (copy the data, paste it in the address bar), but that works.  That is also done for copyvio-tagging of articles when the link is on one of those domains.  (general note: any other use of this trick besides for discussing the merit of a link, or reporting copyvios, is generally considered 'disruptive', especially in mainspace, as it can be explained as an attempt to circumvent blacklisting).  --Dirk Beetstra T  C 08:48, 16 April 2013 (UTC)

What happens if not all links are removed
*.onion was recently added, but the article Tor (anonymity network) still has a link of .onion type as per a year or two old consensus. There is no whitelisting of that link, so what impact will the blacklisting of *.onion have on that article? Belorn (talk) 08:45, 18 April 2013 (UTC)
 * Because it was added to the article prior to blacklisting, it will have no effect on editing that article or saving future changes. However, if the link is removed, any attempts to re-add a .onion link will be blocked, and any changes or additions that contain that type of link cannot be saved. Existing blacklisted links in Wikipedia articles won't affect the ability to edit those articles. Any future additions of a blacklisted link will be blocked. --Hu12 (talk) 09:37, 18 April 2013 (UTC)
 * If the link is really relevant there (e.g. the official link of the site), then specific whitelisting could be requested to include the link (here, one could do it so that if it accidentally gets removed, it can be re-added without problem - I've run into a case like that a couple of days ago, where I had to 1) whitelist the link, 2) revert the edits to get it back to appropriate, 3) de-whitelist the link, and then 4) request whitelisting, which is quite a hassle). --Dirk Beetstra T  C 04:25, 30 April 2013 (UTC)
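The behaviour described in this thread (blacklisted links already in an article are left alone, and only newly added ones are rejected) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: the function name `blocked_additions`, the example links, and the pattern standing in for the *.onion entry are all hypothetical.

```python
import re

def blocked_additions(old_links, new_links, blacklist_patterns):
    """Return the links added by an edit that match a blacklist pattern.

    Links already present before the edit are ignored, so an article that
    contains a blacklisted link can still be edited and saved.
    """
    added = set(new_links) - set(old_links)
    compiled = [re.compile(p, re.IGNORECASE) for p in blacklist_patterns]
    return sorted(link for link in added if any(rx.search(link) for rx in compiled))

# Hypothetical pattern standing in for the *.onion entry.
patterns = [r"\.onion\b"]

# Hypothetical link that was in the article before the blacklisting.
existing = {"http://exampleaaaa.onion/"}

# An unrelated change: the old link is untouched, so nothing is blocked.
print(blocked_additions(existing, existing | {"https://example.org/"}, patterns))

# Adding a second .onion link: only the new addition is reported.
print(blocked_additions(existing, existing | {"http://examplebbbb.onion/"}, patterns))

# Re-adding the original link after it was removed: now it counts as new.
print(blocked_additions(set(), existing, patterns))
```

The third call is the case that specific whitelisting is meant to guard against: once the link has been removed, putting it back is treated like any other new addition.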

Collateral damage: which blacklist?


Over on the whitelist page there's a whitelist request for shoe-shop.com. I had offered to whitelist the 'about' page, as is standard practice for blacklisted sites that have their own Wikipedia article, until I noticed that it isn't blacklisted. Locally or globally. It isn't in any logfile. I can't find any record of prior discussion in the archives of this page or on Wikipedia talk:WikiProject Spam.

That tells me it's collateral damage. However, I can't find any wildcard entries in either blacklist that would trigger on it. And yet, a URL containing shoe-shop.com triggers the blacklist error message. I'm sure there's some pattern that I'm not seeing. How would I locate the problem blacklist entry? ~Amatulić (talk) 15:14, 8 May 2013 (UTC)
 * That is the rule that is catching your URL. Werieth (talk) 16:07, 8 May 2013 (UTC)
 * PS that's on meta. Werieth (talk) 16:09, 8 May 2013 (UTC)


 * Huh. Now that I see it, it's obvious. Thanks. How did you find that?
 * It looks as if the .com suffix wasn't intended to be caught in that.
 * I posed a question at meta, but I suspect it's easier just to whitelist the domain here. ~Amatulić (talk) 20:13, 8 May 2013 (UTC)
 * I've got a tool that checks things against both blacklists as a feature for another site. Werieth (talk) 20:44, 8 May 2013 (UTC)
 * For those with access to COIBot on IRC, you can ask COIBot as well ('wherelisted shoe-shop.com'). Another option: include the link-summary template here, wait for COIBot to refresh/generate the report on meta, and it should be listed there (see m:User:COIBot/LinkReports/shoe-shop.com) as well.  Hope this helps.  --Dirk Beetstra T  C 06:32, 9 May 2013 (UTC)
 * You could write a short program on your computer's command line to test each regular expression on a given string. For example, in bash:


 * —Psychonaut (talk) 09:33, 29 June 2013 (UTC)
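The idea of testing each blacklist entry against a given string can be sketched as below. This is a Python stand-in under stated assumptions, not the snippet from the comment above: the function name `matching_entries` is hypothetical, the sample entries are invented (none of them is the meta entry that caught shoe-shop.com), and Python's `re` dialect is assumed to be close enough to the one the blacklist uses for simple entries.

```python
import re

def matching_entries(blacklist_text, url):
    """Return (line_number, pattern) for every blacklist line that matches url."""
    hits = []
    for number, line in enumerate(blacklist_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        pattern = line.split("#", 1)[0].strip()
        if not pattern:
            continue
        try:
            if re.search(pattern, url, re.IGNORECASE):
                hits.append((number, pattern))
        except re.error:
            # Skip entries that Python's regex engine cannot compile.
            continue
    return hits

# Invented sample entries for illustration only; not real blacklist content.
sample = r"""
# comment line
\bexample-pills\.com\b
shop\.com\b
\bspam-site\.net\b
"""

print(matching_entries(sample, "http://www.shoe-shop.com/about"))
```

Reporting the line number alongside the pattern makes it easier to find the offending entry in a long list. Pasting the full text of the local and global blacklists into `sample` would be the natural next step, subject to the dialect caveat above.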

Use for unhelpful reference URLs that aren't actually spam?
Is the blacklist sometimes used for unhelpful reference URLs that aren't actually spam? Should it be?--Elvey (talk) 18:10, 11 June 2013 (UTC) example below:

apparently unusable EBSCOHOST links
Based on a small sample I looked at, it seems Wikipedia has many apparently dead links (like this) intended to be to PDFs, of the form ebscohost.com...pdfviewer...: all 7 of the 323 pages containing ebscohost and pdfviewer that I looked at had dead EBSCO links. These are NOT links that hit a paywall (like this). Rather, they bring up 404-like server error messages.

A second problematic type of EBSCO link are the three added by a user's (sole ever) edit that are of the form hxxp://0-web.ebscohost.com.sculib.scu.edu/ehost/pdfviewer/pdfviewer?sid=[hex string]@sessionmgr13&vid=4&hid=13. (Note the bold portion!) Presumably, these links work ONLY for subscribers that are ALSO at SCU. We shouldn't allow such links, and perhaps the blacklist (or a similarly functioning parallel system) would be a good solution? Or maybe there's a formula that can be used to fix all such ebscohost.com.[foo].edu and ebscohost.com...ca links?

PS Since I started writing this, I've noticed that EBSCO staff is heavily editing their own article. On the plus side, maybe that means they'd be available, willing, and able to help fix these links or suggest ways to deal with them systematically. note posted. --Elvey (talk) 18:10, 11 June 2013 (UTC)


 * The first problem is non-persistent URLs. Eg, Marcelo in the Real World has a link to http://web.ebscohost.com/ehost/pdfviewer/pdfviewer?sid=fa89dffb-c9cf-4099-b46f-b0be569e7258%40sessionmgr4&vid=14&hid=19. That is not a persistent link. The correct permalink (as EBSCOhost calls it) for that article is http://search.ebscohost.com/login.aspx?direct=true&db=ulh&AN=37698669&site=ehost-live&scope=site.


 * Could you have converted the former into the latter without the other info in the cite template? --Elvey (talk) 20:30, 13 June 2013 (UTC)
 * I don't see why not. But I don't intend to do it. That article has 7 such links - there's little to be gained by doing one and not the others and I'm not interested in the article. Nurg (talk) 21:14, 13 June 2013 (UTC)
 * I think you misunderstood my question. I'm asking if something is possible, not that you edit Marcelo in the Real World. I assume you figured out how to convert the former into the latter by using the other info in the cite template, but I wonder if my assumption is correct.--Elvey (talk) 23:19, 13 June 2013 (UTC)
 * Indeed I misunderstood. I doubt it's possible to convert the non-persistent URL to the persistent URL using the data in the template. I have access to EBSCOhost through a library and searched for the article and got the permalink. Nurg (talk) 06:44, 14 June 2013 (UTC)


 * The second problem is the use of a proxied URL, i.e., the link points to an institution's proxy server such as sculib.scu.edu. This is not specific to ebscohost - it happens with links to other subscription databases too. A search for "ezproxy", for example, will bring up hundreds of such links. They are a bad thing. Nurg (talk) 08:39, 12 June 2013 (UTC)

I am tempted to see these sites as redirects, and whether they work will be location-dependent. I would consider that these should typically be converted to direct links to the object (within educational institutions, one can generally use a web-proxy to get to literature - a direct link would either be the link on the server where the literature resides, or the DOI). If someone then has to change that to go through the proxy, that is then something that that person needs to do (we can't anticipate that by any definition).  Links through proxy servers have no place whatsoever.  I am somewhat tempted to say that these need blanket blacklisting on meta, as they could possibly be abused to circumvent other blacklistings (for a relatively open proxy), and serve no function whatsoever to most readers except for the (few) ones that have access through the proxy - I even doubt whether the URL can be understood well enough to be able to figure out a real link from it.  It is however going to be very obnoxious for the users that in good faith insert the proxy URL they copy from their web-browser and then can't save, and one could think of cases where it is appropriate (if information is only available to people who can pass the proxy and nowhere else in the world, it could still be a good reference for certain information - think of it as a book of which the single copy is in a nearly inaccessible library (the library in the Vatican); it is still verifiable by proxying through people who do have access to the library (ask the pope)).

Note, that with creative regex rule-writing, we could blacklist the two 'bad' examples of Nurg (the non-persistent link and the institution proxies), still enabling good ones (the permalinks). --Dirk Beetstra T C 09:30, 12 June 2013 (UTC)


 * I suppose that we already have examiner.com in here, and it's here, I assume, because of WP:RS, so I think it's appropriate that we add regexes for the impermanent URLs. (Arguably it would be better to have a similarly functioning parallel system with its own error messages handle sites like examiner.com and this ebsco problem, but in the meantime, let's move (cautiously!) toward putting in regexes to handle them.)   Let's continue discussion at https://meta.wikimedia.org/w/index.php?title=Talk:Spam_blacklist&action=edit&section=11 ? I see these links on other sites - e.g. 'fr.'. --Elvey (talk) 20:30, 13 June 2013 (UTC)

NOTE: Discussion continues at https://meta.wikimedia.org/w/index.php?title=Talk:Spam_blacklist&action=edit&section=11.

I'm getting no response there. Can we consider adding this here for now? :

ebscohost\.com(\.|.*(pdfviewer|EbscoContent))

--Elvey (talk) 22:38, 26 June 2013 (UTC)
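As a rough check of the pattern proposed above, the sketch below runs it against the three kinds of EBSCOhost URL mentioned in this thread. It assumes Python's `re` behaves like the blacklist's regex engine for this pattern, which has not been verified here. The proxied URL is shortened to its host and path, and the labels are only for illustration.

```python
import re

# Pattern proposed above, copied verbatim.
proposed = re.compile(r"ebscohost\.com(\.|.*(pdfviewer|EbscoContent))")

urls = {
    # Session-bound link quoted earlier in the thread.
    "non-persistent": "http://web.ebscohost.com/ehost/pdfviewer/pdfviewer"
                      "?sid=fa89dffb-c9cf-4099-b46f-b0be569e7258%40sessionmgr4&vid=14&hid=19",
    # Institution-proxy link, query string omitted.
    "proxied": "http://0-web.ebscohost.com.sculib.scu.edu/ehost/pdfviewer/pdfviewer",
    # Permalink quoted earlier in the thread.
    "permalink": "http://search.ebscohost.com/login.aspx"
                 "?direct=true&db=ulh&AN=37698669&site=ehost-live&scope=site",
}

results = {label: bool(proposed.search(url)) for label, url in urls.items()}
for label, blocked in results.items():
    print(f"{label}: {'blocked' if blocked else 'allowed'}")
```

Under those assumptions the non-persistent and proxied links match and the permalink does not. The `\.` branch catches any host that continues past `ebscohost.com`, and the second branch catches `pdfviewer` or `EbscoContent` anywhere later in the URL.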

Documentation error
blacklist starts with some comments and the last line of these comments is
 * " #  * Every non-blank line is a regex fragment which will only match hosts inside URLs"

This is plainly untrue. (Counterexample: \bbooks\.google\.com/books\?vid=ISBN0521009464\b) Let's replace it with a true statement, e.g.:
 * " #  * Every non-blank line is a regex fragment. Most only match hosts inside URLs. A few only match certain directories, parameters or filenames of a host. For example, references to certain books at books.google.com are blocked.",


 * " #  * Every non-blank line is a regex fragment. Most only match hosts inside URLs. A few must also match other parts of an URL."

or
 * " #  * Every non-blank line is a regex fragment. Most only match hosts inside URLs. For a few hosts, only certain directories, parameters or filenames are blocked."

--Elvey (talk) 19:48, 22 June 2013 (UTC)

Would you please fix the above documentation error? --Elvey (talk) 22:38, 26 June 2013 (UTC)
 * Done. Sorry for the delay in responding. — Mr. Stradivarius  ♪ talk ♪ 12:57, 8 July 2013 (UTC)
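The counterexample in this thread can be checked with a short sketch like the one below, which shows the fragment matching one specific URL on the host and not the host in general. It assumes Python's `re` treats the fragment the same way the blacklist does, and the second ISBN is a made-up placeholder.

```python
import re

# Counterexample fragment quoted above, copied verbatim.
fragment = re.compile(r"\bbooks\.google\.com/books\?vid=ISBN0521009464\b")

listed_book = "http://books.google.com/books?vid=ISBN0521009464"
other_book = "http://books.google.com/books?vid=ISBN0000000000"  # hypothetical ISBN
host_only = "http://books.google.com/"

for url in (listed_book, other_book, host_only):
    print(url, "->", "match" if fragment.search(url) else "no match")
```

Only the first URL matches, which is the point of the correction: this entry constrains the path and query string as well as the host.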