User talk:Franciscompnsc

Working On A University Project About Wikipedia - Technical Questions About Its Information Security Systems
Good evening, My name is Francisco Camões, I'm an 18-year-old student from Portugal. I'm currently studying at FEUP (the Engineering Faculty of the University of Porto). Four other students and I were given a project theme: to research Wikipedia's systems for controlling incoming information and its reliability. Sadly, we couldn't find enough detailed information about it (we looked through practically every corner of Wikipedia) and we fear our project won't meet our expectations. That being said, we would like to kindly ask you to provide us with some information about the following subjects, if possible.

1. We know Wikipedia is extremely efficient at deleting "information vandalism" in its databases, but we wanted to know how that is processed. Is the information filtered by a computer program (bot), or is it manually checked by Wikipedia's staff and/or other individuals? Or both?

1.1. Does Wikipedia use any program to detect the following, or is it deleted manually: - duplicated information - fake information

2. Do Wikipedia users have different credibility when writing an article, based on previous activity on Wikipedia? (That is: does an individual who has contributed correct and verifiable information in other posts have a better chance, when writing a new article, of getting it accepted and posted online faster? If so, is that process executed automatically or manually by the Wikipedia staff?)

We would be truly thankful to receive an answer from you, because this project counts a lot toward our final grade.

Kind Regards, Francisco Camões, FEUP, Oporto, Portugal --Franciscompnsc (talk) 08:09, 7 October 2014 (UTC) - question copied from Village Pump by Alsee (talk) 14:39, 8 October 2014 (UTC)


 * "Wikipedia" is a vague concept. The Wikimedia Foundation runs the servers and provides the infrastructure, but, as far as I know, there is no general automated gatekeeper system. Maintaining the quality of the information is done by the Wikipedia community (or communities, if you look at different projects). On the English Wikipedia, a number of bots try to detect and revert vandalism using heuristics - see e.g. User:ClueBot. But these bots run on external servers (which may or may not be provided by the Wikimedia Foundation) and use (mostly) the same API as normal users - in particular, on a technical level every user could run a bot on their own computer. However, bot operation is guided by community-enforced rules - see Bots. On the other hand, many editors like working as "recent change patrollers", improving or reverting incoming changes. Yes, reputation does matter, but only on a social level. There is no "reputation score" kept by the system, although users may well look at editing statistics when interacting with a user they don't know well. --Stephan Schulz (talk) 08:27, 7 October 2014 (UTC)
 * As a note, the vandal-fighting bot is User:ClueBot NG, from the family of ClueBots. As mentioned, it is the volunteers' duty to verify the content on Wikipedia. The Wikimedia Foundation is responsible for the stability of the technical infrastructure, not the content, though some staffers regularly help us with the work as well. Also, there are pages with the Pending Changes feature enabled, which means every edit by new or anonymous users is manually checked by pending changes reviewers. Zhaofeng Li [ talk... contribs... ] 08:47, 7 October 2014 (UTC)
 * Also WP:Abuse filter - unfortunately, I have seen this abused for outright censorship just recently, and it may still be used that way. The first and best "main line of defense" is patrolling of WP:Recent changes, originally done by hand, now more often perhaps with WP:Huggle or WP:Twinkle; these collaborate with the weaker setting of the abuse filter via Special:AbuseFilter/examine. Wikipedia also relies heavily on WP:Admins, who try to block the sources of vandalism by blocking accounts and applying WP:rangeblocks to IP addresses. Pending Changes is a rare and problematic form of WP:Page protection. Wnt (talk) 13:39, 7 October 2014 (UTC)
 * I also should have mentioned WP:watchlists. Though, not infrequently, these are used more to check whether your point of view differs from that of the watcher... Wnt (talk) 20:08, 7 October 2014 (UTC)
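As Stephan notes above, anti-vandalism bots talk to the same public MediaWiki action API as any other client. Here is a minimal sketch of how a patrol bot might poll the recent-changes feed; the endpoint and parameter names come from the public MediaWiki API, but the "suspicious" heuristic is a toy placeholder of my own invention, not ClueBot NG's actual model:

```python
# Sketch: build a MediaWiki API query for recent changes, the feed that
# recent-change patrollers and anti-vandalism bots watch.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def recent_changes_params(limit=25):
    """Query-string parameters for a recent-changes request."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|comment|sizes",  # fields to return per change
        "rclimit": str(limit),
        "format": "json",
    }

def looks_suspicious(change):
    """Toy heuristic (NOT ClueBot's real classifier): flag large
    unexplained removals of text by anonymous editors."""
    removed = change.get("oldlen", 0) - change.get("newlen", 0)
    return bool(change.get("anon")) and removed > 500 and not change.get("comment")
```

A real bot would send these parameters with an HTTP GET (e.g. `requests.get(API_ENDPOINT, params=recent_changes_params())`), iterate over the returned changes, and hand candidates to a much richer machine-learned classifier.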

I patched in these responses to you, as your question and its answers will be archived fairly soon. --Ancheta Wis   (talk | contribs) 15:41, 8 October 2014 (UTC)

Wikipedia is an encyclopedia built by its community. Thus the answer to your question is contained in the community's responses to change. As you can see, that is the reason the encyclopedia is bigger than its software, data, or hardware. The encyclopedia is alive. --Ancheta Wis   (talk | contribs) 16:03, 8 October 2014 (UTC)

We have a page on Reliability_of_Wikipedia you should look at. In particular:
 * Several studies have been done to assess the reliability of Wikipedia. An early study in the journal Nature said that in 2005, Wikipedia's scientific articles came close to the level of accuracy in Encyclopædia Britannica and had a similar rate of "serious errors". The study by Nature was disputed by Encyclopædia Britannica, and later Nature replied to this refutation with both a formal response and a point-by-point rebuttal of Britannica's main objections. Between 2008 and 2012, studies comparing Wikipedia articles in medical and scientific fields such as pathology, toxicology, oncology, pharmaceuticals, and psychiatry to professional and peer-reviewed sources found that Wikipedia's depth and coverage were of a high standard. According to a study published in the European Journal of Gastroenterology and Hepatology, however, Wikipedia articles about gastroenterology and hepatology were not sufficiently reliable for use by medical students. (citations omitted here)

Wikipedia works in a very unusual way, and some aspects are extremely surprising to outsiders. Donations pay for staff to manage the computers and the software, but there is *no* writing staff. All articles are written by volunteers. Wikipedia is The Encyclopedia That Anyone Can Edit. Anyone can immediately jump in and edit (almost) anything. We want new people to join in and start adding to or fixing our articles. Our initial assumption is that people are trying to help. One of our fundamental principles is to Assume Good Faith: unless there is clear evidence to the contrary, assume that people who work on the project are trying to help it, not hurt it. New editors are given almost unlimited power and freedom to make changes. That initial experience is one of the main things that attracts new people to keep working and become valuable experienced editors. Valuable experienced editors work hard to ensure the quality of our articles. So the short answer to your question is that the articles are good because a vast army of volunteers aggressively strips out any bad information. I'll get to the details later.

Our approach is that all articles are a work in progress. All work is volunteer work. Volunteers create new articles. Depending upon the experience of the editor, an article may start out at poor quality. The information in it may be unverified. We generally consider a poor-quality, unverified article to be better than no article at all. Readers might still find it useful, and having even a poor-quality article is an important foundation for us to go to work improving it. Important or popular articles get a lot of attention and reach extremely high quality quickly. Unimportant and obscure articles may get little attention; sometimes they stay at poor quality, with unverified information, for a long time. The assumption is that eventually we'll get around to improving them.

Our methods and policies are built around effectively anonymous volunteer work. Expertise on a subject is helpful, but anyone can claim to be an expert. Such claims mean little to nothing here. We have a number of policies that enable an anonymous mob of volunteers to build a quality encyclopedia. Perhaps the most surprising is that we firmly reject the idea of writing [[WP:Truth|Truth]] in an article. In fact, accusing an editor of inserting Truth into an article is a powerful argument for removing what they added, and for keeping it out. Our standard is that information needs to be Verifiable in Reliable Sources. Any content which is challenged, or is likely to be challenged, needs to cite a reliable source backing up that information. Anyone can remove unsourced information or mark it with a citation needed tag.

Verifiable means that if published sources say the earth is flat, then we will say the earth is flat. Arguing that the article should contain Truth, arguing that the earth actually is round, is considered disruptive. Disruptive behavior can lead to a ban. That sounds odd, but it is the only way to prohibit unending argument about The Truth of politics or religion or evolution or climate science or astrology or ghost voices.

''1. We know Wikipedia is extremely efficient at deleting "information vandalism" in its databases, but we wanted to know how that is processed. Is the information filtered by a computer program (bot), or is it manually checked by Wikipedia's staff and/or other individuals? Or both? 1.1. Does Wikipedia use any program to detect the following, or is it deleted manually: - duplicated information - fake information''

We have a series of "filter" layers. Note that any random person looking at a page has the ability to remove "information vandalism" in as little as three clicks. Click PageHistory, click the Undo link on the bad edit, and click save.
 * 1) We do have software (bots) that can detect and remove a lot of vandalism. Some of that software even uses machine learning methods. It saves us a lot of work cleaning up garbage, like when someone writes "penis penis penis" into an article. However, software is unable to evaluate the quality of information contained in ordinary text.
 * 2) Some people watch a stream of new edits. They catch more vandalism, and they can catch things that are obviously wrong. However they usually don't look at the full page, only the isolated change. They cannot spot non-obvious "information vandalism".
 * 3) We have Page Patrols, mainly for new pages. People look at the page to see if it's basically appropriate. They can mark bad new pages for deletion, and they mark the page for improvement if they see significant problems with it. For example if the page lacks references to verify the page information then they can add a large banner at the top of the page. The banner marks the page for improvement, and warns readers that the information on the page requires verification. Page Patrols rarely look closely to check for "information vandalism", but if it's a new page then there was no old information to damage... and it's almost certainly a rather obscure topic.
 * 4) Editors can add any page they are interested in onto a watchlist. They are notified any time the page is changed. They can view the exact difference between the old version and new version, seeing exactly what changed. This is a fairly strong layer of protection. People watchlist subjects they consider important, subjects they know a lot about, or which they have worked on recently.
 * 5) We have Project groups. Most pages are marked with category tags, and a group of people in a Project group will supervise all pages in certain categories.
 * 6) The highest level of protection is when there are multiple editors active on that article. On any important, popular, or controversial topic information often goes through extensive examination before it is allowed to stay in an article.
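The "three clicks" described above (PageHistory, Undo, save) correspond to a single edit request against the MediaWiki action API with its `undo` parameter. A sketch of building that request, assuming the parameter names from the public MediaWiki API (a real request must also supply a CSRF edit token fetched from the API first):

```python
# Sketch: parameters for an API-based revert, equivalent to the Undo button.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def undo_edit_params(page_title, bad_revid, token):
    """POST parameters that revert one revision, as the Undo link does."""
    return {
        "action": "edit",
        "title": page_title,
        "undo": str(bad_revid),          # revision ID of the bad edit
        "summary": "Reverting vandalism",
        "token": token,                  # CSRF token, fetched separately
        "format": "json",
    }
```

Tools like Huggle and Twinkle, and bots like ClueBot NG, ultimately issue edits of roughly this shape; the layers above differ mainly in who (or what) decides that an edit is bad.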

An important point in all of this is that someone who knows ZERO about a topic can remove or tag unsourced information. And if the information does have a source, then somebody with zero subject knowledge can generally verify that the source does back up the information that was added.

We have a number of mechanisms for dispute resolution. A main method is to start an RfC - Request For Comment. This draws in a significant number of uninvolved editors to ensure that all of our policies are being properly followed. If a group of people try to add bad content to an article then a single good editor can start an RfC to draw people in to fix it. Depending on the situation, the people who were trying to add the bad content may be sanctioned with a temporary or permanent block.

Everyone's entire edit history is stored and easily viewed. If someone is caught adding "information vandalism" then we can go back and review every edit they ever made. We can clean up any other damage they have caused.
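That stored history is also exposed through the MediaWiki action API, so reviewing every edit an account has ever made is one query away. A sketch using the parameter names from the public MediaWiki API:

```python
# Sketch: parameters to list a given account's complete edit history,
# as used when cleaning up after a caught vandal.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def user_contribs_params(username, limit=500):
    """Query-string parameters for a user-contributions request."""
    return {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "uclimit": str(limit),
        "ucprop": "title|timestamp|comment|ids",  # fields per edit
        "format": "json",
    }
```

Each returned edit carries revision IDs, so every suspect change can be diffed and, if necessary, undone.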

2. Do Wikipedia users have different credibility when writing an article, based on previous activity on Wikipedia? (That is: does an individual who has contributed correct and verifiable information in other posts have a better chance, when writing a new article, of getting it accepted and posted online faster? If so, is that process executed automatically or manually by the Wikipedia staff?)

Almost all pages are open to public editing. Any new person can show up and completely rewrite the page. That change is instantly visible online. This does mean that the next person to view the page may see garbage, but we generally get things cleaned up very quickly, so the chance of anyone seeing a damaged page is small. In general we accept this risk. The fact that our pages are open to public editing is a core aspect of how we attract people to come and build our articles.

We do have two levels of protection we can apply to restrict edits to a page, but first I'll answer the rest of your question.

In a technical sense these are substantially the "levels" that exist:
 * 1) Blocked accounts. Pretty simple, and pretty irrelevant (chuckle): they severely violated the rules, and they can't edit at all.
 * 2) IP addresses, no account. We give almost unrestricted editing capabilities without even making an account. This includes the ability to create new articles. Their IP address serves as their account name. The limitation is that an IP address can never advance to autoconfirmed.
 * 3) New account. An account with a history of less than 10 edits, or less than 4 days old, has the same capabilities as an IP address.
 * 4) Autoconfirmed. This is the average editor. This merely adds the ability to edit pages set to autoconfirmed protection level.
 * The autoconfirmed protection level is applied when a particular page is being repeatedly vandalized by outsiders. Anyone can trivially reach autoconfirmed status if they have any honest interest in joining us.
 * 5) Admin. People can request adminship. Approval is by community vote. Admin status grants some management capabilities, including the ability to edit fully protected pages. Note: admins are forbidden to use their powers on pages that they are personally editing. Adminship does not give them higher status to decide what goes onto a page. They can only provide service edits to a page after the other editors on that page have made an edit request.
 * Full protection is used on a page when editors get in an edit battle, an excessive flurry of conflicting changes. The page is protected to halt the edit war. Editors are forced to discuss the issue and come to agreement on what changes to make. Then they request that an uninvolved admin make that change for them. The page protection is removed once the controversies are sorted out and things calm down.

Reaching autoconfirmed status is trivial, and admins are forbidden to use their powers if they are involved in a page. So for all practical purposes all editors have equal technical status in what a page should say.
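These account levels are visible through the MediaWiki action API as user "groups". A sketch assuming the parameter names from the public MediaWiki API; the `is_autoconfirmed` helper is a toy restating of the rule described above (roughly 10 edits and 4 days), whereas the real check is done server-side by MediaWiki:

```python
# Sketch: parameters to fetch an account's groups and basic statistics.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def user_groups_params(username):
    """Query-string parameters to look up one account's groups."""
    return {
        "action": "query",
        "list": "users",
        "ususers": username,
        "usprop": "groups|implicitgroups|editcount|registration",
        "format": "json",
    }

def is_autoconfirmed(edit_count, account_age_days):
    """Toy mirror of the rule above: at least 10 edits AND 4 days old.
    (MediaWiki grants this implicitly; this is illustration only.)"""
    return edit_count >= 10 and account_age_days >= 4
```

Note that `autoconfirmed` is granted automatically rather than by anyone's decision, which is part of why it carries no social prestige.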

In a social sense, editing isn't too far off from equal status. The editing community is huge; in general, any name you see will be unfamiliar. Names carry no clear indication of history or status, although it is fairly easy to check someone's history. A long history may earn some respect. No-account IP addresses are uncommon and are viewed with skepticism. When an IP says things that conflict with our policies, they are firmly disregarded, but IP addresses can be extremely influential if they cite Reliable Sources or make solid policy-based arguments. A named account may also be treated as little better than an IP address if someone notices it has little history and is making arguments that conflict with our policies.

What it boils down to is that experienced editors know our methods and policies really well, and THAT carries an enormous level of credibility. In general, the experienced editor knows what should and shouldn't go into an article and can give a clear policy-based justification for it. Even a lowly non-account IP address can dominate a discussion and enforce our policies, if there's an experienced editor behind that IP address.

In general Wikipedia articles on any significant topic compare well with conventional encyclopedia articles, and Wikipedia articles on obscure topics don't even exist in conventional encyclopedias. In the final analysis there is no authoritative guarantee for the content you see on any particular page, other than the fact that any decent article has source links and you can verify the information yourself. We firmly state that if you are using Wikipedia for any serious purpose, if you are using Wikipedia for research, then you should NOT be using our article itself. You should be using our article to go to the sources that we cite. That's where the original information is. Alsee (talk) 23:16, 8 October 2014 (UTC)