Wikipedia:Why MEDRS?

Editors who are new to health-related content on Wikipedia are often surprised when their edits are reverted with the rationale of "Fails WP:MEDRS", a shorthand reference to Wikipedia's guideline about sources considered reliable for health-related content. This essay explains why these standards exist.

Summary of the long content below

 * The guideline, Identifying reliable sources (medicine) (MEDRS) is not different from other guidelines, because all Wikipedia content should be generated by summarizing high quality sources (independent, secondary sources written by experts in the field, published by respected publishers).  MEDRS (MEDical Reliable Source) explains how to find such sources for biomedical or health-related content.  It can be treated as an extension and help page based on the already existing policies. Wikipedia represents viewpoints in proportion to their prominence in the reliable sources, especially the secondary sources. Mainstream views, as determined by the sources, carry the most weight. Significant minority views are given less weight, and fringe views (often identified by omission from the secondary sources) carry little to none. It is very easy to engage in original research by cherry-picking primary sources. Editors should cite primary sources rarely, and then only with good reason and care!

Why is this especially important for biomedical content in Wikipedia?
 * What we understand about human health and medicine is based on the basic science of biology, and biology is complex.
 * Biological processes occur on a microscopic scale, with thousands of different kinds of molecules interacting all the time. At the same time, the organism interacts with the environment and other organisms. This becomes mind-bogglingly complex at the level of the actual molecules.
 * Most research is on model organisms (cultured cells, worms, flies, lab mice). (All models are wrong, but some are useful). Isolated cells in culture often behave differently than they do in whole organisms. Furthermore, there are significant differences in the genetics and physiology between animals and humans, so the results from animal studies often do not translate into humans. Finally, model systems are often genetically pure, and don't reflect the genetic diversity of the real world. Only a small portion of research is done on actual people, and even then it is done in artificial context of a clinical trial where the researchers try to change only one thing at a time. This is all part of the scientific process, and rarely produces results that transfer painlessly to the real world. (You might have heard the joke about searching for keys where the light is better. This is a good analogy to how experiments in biology are performed. Sometimes it's so dark that we can't even begin to look where we want to.) The artificiality of the studies presents a serious problem in translating results to any natural setting, much less translating them to human health or biology.
 * Biology is still a young science, and even our knowledge of basic things is fragmented; our big-picture ideas change all the time. Those from physical sciences or technology sometimes have an especially hard time understanding this — biological systems can't be mapped out and expected to act rationally.
 * Human biology is even harder because there are constraints as to what we can do. We can't cut up healthy living humans and examine them microscopically to see what is going on in real time (unless you were Josef Mengele). Nor is it even possible to collect data on certain processes such as surgery. To really assess quality of care it would be necessary to perform placebo surgery, which is widely considered unethical.
 * Translating insights from biology into medicine (in other words, applying the basic biological research to create technology) is another level of difficulty:
 * Even scientific breakthroughs in biology take many years, and a great deal of money, to turn into anything useful. Even then, many new drugs, medical devices or diagnostic tests based on the best of science fail in adequately powered clinical trials, when actually tested on humans (the "adequately powered" thing is important – that means it has been tested in enough people so that we have a good sense of whether the outcome is meaningful – if it is probably true or is a fluke – you can flip a coin 5 times and get heads every time) This uncertainty and difficulty also goes for basic research that suggests that X or Y might be toxic or bad for you.  It is hard to figure out what is true in the field of toxicology; toxicologists think very carefully about things like how people are exposed to a substance, at what dose, and over what amount of time, and try to come up with useful ways to model that in lab studies.  Dumping a ton of a chemical directly on cells and killing them, tells you nothing about whether skin contact will harm you, nor at what dose!
 * There is much we don't know even about existing therapeutics and other treatments. For example, there was a ton of great basic research (done in cells and model organisms) that showed a connection between oxidative damage and cancer; this research suggested that taking antioxidants might prevent cancer. So the NIH funded a huge (adequately powered!) clinical trial, the Selenium and Vitamin E Cancer Prevention Trial, so we could really learn if taking Vitamin E (a great antioxidant) could actually prevent cancer in actual humans.  Guess what?  People taking Vitamin E got more cancer, and the trend was so clear that they had to stop the clinical trial early.   Mind-blowing.  Welcome to biology.  Welcome to medicine.
 * The primary scientific literature is very exploratory, and not reliable. The use of WP:PRIMARY sources is really dangerous in the context of health.
 * The primary literature is written by scientists, for scientists. It is not intended to be taken as health advice by everyday people – it is not meant to be taken out of context and applied.
 * Because the work is exploratory, much of it turns out to be false leads or dead ends and is simply ignored by other scientists. The ignored articles are not marked in any way, for us to know which ones they are.   The replication crisis is being discussed and addressed throughout the health and biomedical sciences; it is especially acute in the field of psychology.
 * The mapping work — determining how the exploration is going and where we stand — is done in review articles. (Ignored articles aren't usually explicitly mentioned in reviews either — they are busy with building up accurate maps, not debunking dead ends.)
 * When clinical trials (experiments done on humans) are published, those papers are also primary sources. Some clinical trials are very well done (have enough subjects and are appropriately randomized, blinded, etc.) and some are very poorly done. It is very common to find publications of small clinical trials, the results of which cannot be generalized and which often contradict each other.  It is not easy for nonscientists to tell which is which. So we still look for review articles to help us understand even this kind of primary source.
 * A lot of people have strong opinions about health-related matters.
 * It is about us and our loved ones, after all.
 * People take things that happen to them very seriously, and try to generalize from them. But this is not valid scientifically (tiny sample and cognitive biases) and people too easily mistake correlation for causation.
 * There is a lot of money involved in health-related matters — everything from hype around basic research to drive donations to universities or sales of newspapers/attention to TV news, to companies trying to sell treatments of all kinds. With our 24-hour news cycle and the need to keep people coming back to websites, there is huge hype around basic science that is not ready for showtime in any responsible way.  This is why you see the phrase so often: "If you have questions, discuss them with your doctor."  Your doctor (if they are not a quack) will almost always tell you, "We don't know yet."

The result of all of this is that the world is awash with content about health. All kinds of media holler at us every day, about "new THIS" and "shocking THAT". Very often that content is dead wrong, or dramatically overstates what we can confidently say, based on the science. And many people have strong ideas that are not based on science at all.

But as an encyclopedia, Wikipedia is committed to providing reliable information to the public. We have nothing to do with hype or eyeballs or the 24-hour news cycle. We go slow, and say what is certain (which includes saying "we don't know" or "there is insufficient evidence to say X"). All of Wikipedia stands on, and is based on, the consensus of whatever field a given article falls within. We always have to think carefully about what sources we use to generate content, and this is especially true for health-related content. For health-related content, the field is evidence-based medicine. And per WP:MEDRS – which the community created after long and arduous discussion – we reach for review articles published in the biomedical literature, or statements by major medical or scientific bodies.

Secondary sources generally
Wikipedia is an encyclopedia. It is not a newspaper (we aren't in a hurry, and we don't have to report the latest and best). It is not a journal or a book, pulling together all the primary sources into a coherent picture — that is what scientists and other scholars do in review articles in journals, and what historians do in their books.

Our mission is to express the sum of human knowledge – "accepted knowledge", in the words of WP:NOT. We are all editors. Our role is to read and understand the reliable secondary and tertiary sources, in which experts have pulled the basic research together into a coherent picture, and summarize and compile what those sources say, in clear English that any reader with a decent education can understand.

In articles related to health, editors who want to cite primary sources and create extensive or strong content based on them generally fall into one of three groups.
 * Sometimes they are scientists, who treat Wikipedia articles like they themselves are literature reviews and these editors want to synthesize a story from primary sources. But articles here are encyclopedia articles, which is a different genre.  Each article is meant to be "a summary of accepted knowledge regarding its subject".  Secondary sources are where conclusions stated in primary sources are "accepted" or not.
 * Sometimes they are everyday people, who don't understand that the scientific literature is where science happens – it is where scientists talk to each other. The scientific literature is not really intended for the general public. The Internet has made it more available to the public, as has the open access movement.  Both are a mixed blessing.  The downside is that everyday people may take research papers out of context from the ongoing and always-developing discussion among scientists, and take individual results as some kind of gospel truth, when each paper is really just a stepping stone (sometimes a false one) as we (humanity) apply the scientific method to understanding the world around us.  Non-scientists may not know that many research articles in biology turn out to be dead ends, or unreplicable, or even withdrawn.  When a research paper is published, we cannot know if it will eventually turn out to be replicable and/or accepted and built on by the relevant field, and if it will not.  Reviews tell us that.
 * Sometimes editors wanting to use primary sources are agenda-driven — there is something in the real world that is very important to them, and they want that idea expressed in WP and given strong WP:WEIGHT. In the very act of doing that — in selecting a given primary source and giving it a lot of weight (or any weight at all, actually) — they are performing original research. It is sometimes hard to get people to see this.

Wikipedia is not about what you think is important, right now, nor even about what the media is hyping today. It is about what we know, as expressed in reliable, secondary sources. (Independent ones!) It is so hard for people to differentiate what they see and what they "know" from what humanity — as expressed by experts in a given field — knows.

It is hard for people to think like scholars, with discipline, and actually listen to and be taught by reliable, independent, secondary sources instead of acting like barroom philosophers who shoot from the hip or letting media hype drive them.

NPOV depends mightily upon editors' grasp of secondary sources. We have to find good ones – recent, independent ones – and absorb them, and see what the mainstream positions are in the field, what are "significant minority opinions", and what views are just plain WP:FRINGE. We have to let the best sources teach us. And yes, it takes commitment – both in time, and to the values of Wikipedia – to really try to find the best secondary sources, access them, absorb them, and learn from them how to distribute WEIGHT in a Wikipedia article.

What makes this even more challenging is that because this is a volunteer project, Wikipedia editors often come here and stay here due to some passion. This passion is a double-edged sword. It drives engagement and the creation of content, but too often brings with it advocacy for one position or another. This is a quandary. The discipline of studying secondary sources and editing content based on those sources, in putting egos aside and letting the secondary sources speak, is the key that saves Wikipedia from our personal, limited perspectives.
 * While WP:OR allows primary sources to be used, it is "only with care, because it is easy to misuse them."
 * WP:NPOV says "Neutrality assigns weight to viewpoints in proportion to their prominence. However, when reputable sources contradict one another and are relatively equal in prominence, describe both approaches and work for balance. This involves describing the opposing views clearly, drawing on secondary or tertiary sources that describe the disagreement from a disinterested viewpoint."
 * WP:VERIFY says, "Base articles largely on reliable secondary sources. While primary sources are appropriate in some cases, relying on them can be problematic. For more information, see the Primary, secondary, and tertiary sources section of the NOR policy, and the Misuse of primary sources section of the BLP policy."

The call to use independent secondary sources is deep in the guts of Wikipedia. This is a meta-issue — a question of what it means to be an editor on Wikipedia.

Biology is difficult
Biology is difficult. It is still a young science, and our knowledge of even basic things is fragmentary, and even our big-picture ideas are changing all the time. Human biology — our understanding of what is going on inside healthy people and inside sick people — is even harder, and there are serious barriers to furthering our understanding. People in the physical sciences or technology seem to have an especially hard time understanding this.

The physical sciences have given us deep insight into material reality, and because the science there has progressed so far, we can do amazing things. For example, Moore's law is a direct result of our advances in physics and materials science and our ability to apply science — to create technology to serve us, to the point where we now have amazing things like smartphones — computers we can hold in our hands and interact with in intuitive ways, capabilities that just a couple decades ago would have taken an entire room full of equipment to provide and that only cutting-edge scientists could operate.

Physics deals with dead matter. We can poke and prod without doing harm, and what we are looking at is what we are looking at. Life (made of physical matter, of course) is way more complicated. In comparison, "dead" is easy; life is hard.

Biology remains primarily an observational science. Don't get me wrong — biologists do experiments — they poke and prod living things in various ways, to help them try to flesh out the pictures we are still forming about what is going on in living things. But we are not in possession of a set of "laws of nature" such as those that govern physics. Even what we once called the central dogma of molecular biology — that DNA "makes" RNA which "makes" proteins — has turned out to be far more complicated than biologists originally thought. We still don't fully understand what something as basic as aspirin does in the human body, much less what it does in a particular person's body. We understand a lot, but our knowledge is far from perfect. Medicine like aspirin is technology — we are doing our best to apply the findings of biological science to solve problems. We understand what aspirin itself is, very well (the chemistry, not the biology), but what happens when you put it into an average human body, or a particular person's body, is another question altogether. The science is too weak in biology, especially human biology, to apply and evolve technology with anywhere near the speed of information technology.

These fields are different worlds, scientifically speaking. (I am not even getting into structural differences that make the markets so different — innovators in medicine have to deal with regulators and whether insurance companies will pay for things, with serious ethical issues involved in experimenting on animals and humans, and with the huge amounts of money and time and risk in bringing new products to market. All these make medicine a different universe from information technology.)

Going a little deeper into the science...

Most everybody has heard of "DNA", but what is it? It is a polymer — it is a chemical made up of many subunits all connected in a chain. Each of those subunits is a chemical called a nucleotide. In simple terms, there are four different nucleotides: adenine, guanine, thymine, and cytosine, and we often describe the chain they make when they link together by using the first letters of their names: A, G, T, and C. So DNA is a long chain of As, Gs, Ts, and Cs. We can describe a given instance of a DNA molecule as a chain of letters: AAGTCTTGACT, etc.

A "genome" is, basically, all the DNA in a cell. (A given species will have a pretty consistent genome on a high level, but every instance of that species will be slightly different — there will be many small variants — some of them a single nucleotide change, some of them being whole deletions or rearrangements of DNA segments. But genomes remain consistent enough from organism to organism within a species that we can indeed meaningfully talk about "species"). The simplest bacteria (which are some of the simplest living organisms) have DNA that is a chain 139,000 nucleotides long (ATCTG, etc., times ~139,000). Microscopic, mind you!

But ... who cares? Why does that matter? Well, DNA is kind of the "blueprint" of the cell.

(We need to be careful here — people use a lot of metaphors in biology, and they are starting to slip into thinking about "genomes" as pure information—as literally some kind of code, like software. But in the real world, any genome is DNA, which is not abstract information.  DNA is an actual, physical thing in every cell in every living thing. It interacts physically with other actual chemicals, which in turn interact with other chemicals, and so on and so on.  The sum of those actual interactions is what we call "life", and even "consciousness".)

Within the long, long chemical chains of DNA, certain segments function as a kind of code (we call these segments "genes"). The cell has machinery (yes! — actual mini-machinery that is amazing to behold and consider — see this youtube video for an animation) that creates a different polymer, RNA, by reading off the nucleotides on DNA. The RNA can in turn function as a kind of code that other machinery "reads" and builds proteins from, or it can go off and do things on its own—like become part of a ribosome, or directly interfere with other molecules, or do many other things. We used to think there was a lot of junk DNA in the regions of DNA between genes — we used to think the DNA in those regions was just inert. But we are learning more and more that all kinds of interesting things are happening there.

All those things (DNA, RNA, proteins, and many other things) all interact with each other. Zillions of interactions, all happening on a microscopic scale, and changing all the time.

Now, think about science. Newton could stand in his back yard, drop an apple, and measure how long it took the apple to hit the ground. He could change the experiment — climb a ladder and drop it from higher, and time that. But of course, all that is pretty... uncontrolled. What if it's windy one day, or really humid so the air is thicker? How high exactly was his hand above the ground? He could take it inside, where there is no wind. He could put the apple in a holder attached to a ruler, and release it from a precisely determined height. What he is doing there, is thinking about how to design a controlled experiment, so he is only testing one thing so that he will know exactly what is changing and can later make sense of the results of his experiment.

Turn back to a bacterium. Tens of thousands of nucleotides in even the simplest genome, dozens of genes encoding for many different proteins, etc., etc., and everything interacting with each other, and the whole bacterium interacting with whatever is floating around it, including...other bacteria. Alive, and constantly changing. And all microscopic and invisible to the naked eye. Think about trying to do an experiment and trying to change just one thing. Think about how easy it would be to contaminate the experiment—to have the tiniest jot of some chemical on one of the instruments you are trying to use to manipulate the bacterium. It is really, really hard, just physically, to perform an experiment in biology; it takes a lot of skill and training, and it is really hard to even design an experiment where you are reasonably certain you are only changing one thing.

And remember, this is just a bacterium. You can kill it, chop it, dump chemicals on it...pretty much whatever you want to try to create a controlled experiment and then see what happens. What if you want to understand an organism that has multiple cells, like a human? And remember, our cells also interact with each other (where each has their own DNA that is producing mRNA etc all the time), and we have organs and organ systems that interact with one another on a meta-level, and systems like hormones that act across our entire body on a meta-meta level. The complexity is absolutely mind-blowing. Add to that the fact that everyone is different, because each of us is the result of a unique blend of our parents' DNA, and each of us grew up and exist in different environments. So you can give one person a dose of the drug coumadin and it will perform just as you expect it to, but if you give the same dose to another person, and it can be wildly more potent or less potent. (Some drugs are more sensitive to individual differences than others, but all drugs act differently in different people.)

Human biology is harder still
So, trying to understand normal basic human biology is hard. Trying to figure out what is going on in a disease is really hard, too. For example, everybody knows that Alzheimer's is a terrible disease, and we have spent gobs of money trying to figure out what causes it. One of the bad actors is a piece of a protein. The piece is called "A beta" and the whole protein is called "APP". Well, with all the money we have spent, we still do not know what APP does in normal brains, and we still don't understand why the A beta piece gets cut out of it. We don't understand why neurons die in the brains of Alzheimer's patients, nor how to stop them from dying. That is crazy, right? It starts to make some sense when you realize that we have no way (really!) of looking inside a living human being's skull and seeing in detail — way down at the cellular level — what is going on. It's a serious problem! Anyway, we are scrabbling around in the dark. Humans are really, really complicated biological things. There's so much going on.

Since we can't chop up living human beings or do crazy experiments on them, how do we try to figure out human biology? We use models — mostly other organisms on which we can do experiments, and based on the results, we can then try to make guesses about human biology. You might have heard the joke about searching for keys where the light is better. This is what biology is like. People do research in mice, or in cells in petri dishes, or they cut up dead people. We do controlled experiments that make sense, and we can start to put stories together about what is going on. And while we are making progress, our answers are still pretty crappy, pretty fragmentary. (This is why we do experiments on animals. A lot of people, including scientists, struggle ethically with whether it is acceptable to do experiments on animals, and if so, how.  It is not an easy question.  How will we learn about biology if we cannot do experiments with living beings, especially ones that are similar to us?  How do we actually see what is going inside a living being if we do not cut it open and look?  We do not have any technology that allows us to non-invasively look deep inside a living thing on a microscopic level in real time.  That technology just doesn't exist in the real world—we have no tricorders.  These are real questions, and very hard ones.)

Another thing scientists do are "epidemiological studies". These are studies of a lot of living people where you measure a bunch of things and try to find correlations. But correlations are dangerous. For example, say a study found that college kids who sleep in their clothes tend to wake up with headaches—that's a correlation. But what does this really mean? Does wearing clothes while you sleep make you sleep poorly, or maybe cut off blood to your head or something? Well...the study didn't measure how much beer people drank the night before! Right? Now it all makes sense. In this case, the beer drinking is what we call a "confounder", and suddenly we can see that the correlation we saw before is really meaningless. Just because the two things happen to occur together does not mean one caused the other. So while it is tempting to say that the correlation implies causation, it is a very dangerous thing to assume (see correlation does not mean causation if you want to dig into that whole thing more).

Scientists also conduct clinical trials. These are also scientific experiments where the scientists are trying as hard as they can to change just one thing, again so they can actually make sense of the results. There are intense ethical issues involved in doing medical experiments with humans, and large bodies of international and national law and regulation concerning this. A principle that all these bodies of law and regulation share is that there must be some clear benefit to society from any experiment done on a human, and the research subject must be protected as much as possible from any risk of harm. For this reason, scientists don't test things like pesticides directly on humans. Instead, we rely on toxicology studies in animals and cells to try to understand the risks of substances that have beneficial uses outside of medicine. Scientists do test new drugs, medical devices, and diagnostic tests on humans to determine if they work well enough (that is, are "effective"), and are safe enough, to justify their release into the market and subsequent widespread use. Only after doing as much work as possible in cells and animals (and with many medical devices, in human cadavers) can testing on humans begin. Clinical trials of drugs start with small Phase I studies to explore how much of the drug can be used and to get an initial understanding of whether they are safe enough to continue testing. These tests are important and are dangerous. Terrible surprises happen (rarely, but they happen), such as what occurred in the Phase I trials of an immunomodulatory cancer drug called TGN1412; it unexpectedly caused a cytokine storm and multiple organ failure in the six patients to whom it was given. While all of them survived, they required treatment in intensive care and the long-term effects on them are unknown. ) Phase I disasters like that are rare, but the point is, they do happen, even with the most careful planning.  Putting a drug in a human for the first time is one of the scariest and most intellectually and ethically challenging experiments imaginable.

In any case, after Phase I trials establish dose levels and give a high-level insight into safety, Phase II trials start. These are conducted on larger (say 10–100) groups of patients who are actually sick, and the goal is to get an insight into safety and efficacy in actual patients. Companies will often conduct multiple Phase II trials (Phase IIa, Phase IIb, etc.) with various drug formulations and also in different patient populations or for different diseases, further exploring whether and how it makes sense to take on the much bigger expense and challenge of a Phase III trial. Phase III trials test the drug in large numbers of sick patients, with the goal of getting definitive data about safety and efficacy. These trials, which cost tens of millions of dollars to run, are carefully designed; the goal is to have a big enough "N" (see above!) and to follow patients long enough to get a reliable answer, but not too big an N such that patients are endangered unnecessarily and so as not to waste money and time.

All three phases are experiments that are limited in time and in the number of patients who are treated, and there is often more to learn about drugs after they are on the market, and are used by millions of people over years and years. Post-marketing surveillance of drugs is important, and is difficult—again because you are back to doing epidemiological studies that are not controlled, and it is hard to determine whether problems that arise in the population taking the drug (who are sick!) are caused by the drug or not (the correlation and causation problem).

Sometimes—rarely, but probably more commonly as we move into the 2010s and beyond—companies test new drugs against existing drugs. They do that because payors (insurance companies, national health payment systems like Medicare or NHS in the UK) are starting to demand this kind of information to justify drug pricing. This testing provides really valuable data. Outside of that, there are two ways we get insight into what available treatment ~might~ be best for a given patient with a given problem. One is that federal agencies like the US Agency for Healthcare Research and Quality or, less frequently, the National Institutes of Health, sometimes fund head-to-head trials comparing treatments. Another way is that doctors and scientists sometimes pull together all the published clinical trials for a given condition, with all the various treatments that were used, and try to compare how well the treatments worked, and how safe they are, using complex statistical methods. These articles are called systematic reviews, and they are some of the most reliable sources we have for medical information. The Cochrane Collaboration is an example of a group that does this.

So, it is hard to tell what is going on. We use models, we do big studies and make correlations...and all of these are experiments. Scientists also analyze published results and try to make sense of them, often with complex statistical modelling. All of these efforts show us stretching, reaching out, into the microscopic, churning darkness where life happens as well as into the mass of data we have built up about how groups of people respond to various treatments—to try to understand, bit by bit, what we are and how we are affected by diseases and by the drugs meant to treat them.

Primary scientific literature is exceptionally unreliable in biology
Biologists are working like crazy to understand "life" and are under all kinds of pressure to get grants and publish papers. They publish boatloads and boatloads of papers.

These articles are not written for the general public. Scientists do experiments using their model systems, as discussed above, and publish the results in order to talk to other scientists. This is the raw stuff of science. It is messy, and scientists know that they are groping their way toward the truth, together. These papers are very important to science, but they are of almost no value to the general public.

In addition to this, there are some problems with academic science and publishing even in the most reputable journals. Academic scientists are on a kind of hamster wheel. Their research is funded by grants that last for a few years at most. They need to string together grant after grant, in order to keep their labs going. Scientists who run labs spend a huge amount of time seeking out funding opportunities and writing research proposals to try to win them. Generally, in order to win the next grant, you publish high-profile, important papers using the grant you have now. So there is a huge force pushing academic scientists to move from one experiment to the next and to draw conclusions from their research that are important. "Publish or perish" is real — if a scientist cannot win grant funding to keep a lab going, the lab will be closed down and dispersed. It is as harsh as being in sales — you eat what you kill. You can see the potential for problems here.

But how exactly does this hamster wheel affect science?

When you do an experiment, you try very hard to execute it perfectly, so that you actually do what you intended to do and get a valid result. But how do you know if the result you got is true or is just some random answer? This comes down to statistics. If you flip a coin three times, you might get heads three times in a row. Should you stop there and decide that when you flip a coin, you always get heads? Is that "true"? (We all know it is not!) Maybe you should repeat that experiment, and again flip a coin three times. But you could still get heads (or tails) every time. However, if you flip a coin a hundred times, you will likely get about 50 heads and 50 tails. And if you flip a coin a thousand times, you will very likely get about 500 heads and a similar number of tails. The number of "flips" is called the "N" in experimental design. If the N is too small, it doesn't matter how many times you repeat the experiments — none of the experiments are valid. You need a big enough N to get a result you can trust.

Increasing N costs money and time. And repeating a high N experiment costs a LOT of money and time. So scientists often use the minimum N they can that will enable them to publish. And many journals allow scientists to publish results — and conclusions drawn from them — with small Ns. As a result, there are many, many papers published in the scientific literature that turn out to have conclusions that cannot be considered true because the N is too small.

This is starting to become a matter of concern in the scientific community. Drug discovery scientists at Bayer reported in 2011 that they were able to replicate results in only ~20–25% of the prominent studies they examined; scientists from Amgen followed with a publication in 2012 showing that they were only able to replicate six (11%) of fifty-three high-impact publications and called for higher standards in scientific publishing. The journal Nature announced in April 2013 that in response to these and other articles showing a widespread problem with reproducibility, it was taking measures to raise its standards.

So when you pick up any given published paper, we don't know what is going on well enough to judge whether the conclusions will "stick" or not. Even scientists don't know.

In Wikipedia, these research papers are "primary sources." Hopefully, you now have a good understanding of why these papers are not reliable descriptions of reality. They shouldn't be used by the general public for anything, much less creating encyclopedic content.

Now from time to time, scientists sit down and read a bunch of research papers. They think about them and write what we call "reviews", where they try to fit all the primary research together in a way that makes sense. The scientist doing the review will generally cite the primary studies that are part of this description. Generally, reviews do not say things like "that paper is bunk. We are going to ignore it." Instead, they just ignore papers that turn out to be false leads. This is really important. Only egregiously bad papers are actually retracted; there are loads and loads of papers that draw conclusions that turned out not to be true, but that remain in the literature. People who are not experts in the field have no way of knowing which research papers have been left in the dust by the scientific community. These papers are not retracted, nor are they labelled in any way. They just sit there, ignored.

The reviews are written primarily for other scientists in the field — reviews are one of the key tools that the scientific community uses to map itself, to step back and see where things stand. For Wikipedia, these reviews are "secondary sources", and they are dramatically more reliable than primary sources. They give us the consensus (or, if not consensus, the emerging consensus, or a clear picture of what the main contending theories are) in any given field about what is true and what is not, and what is still unknown or uncertain.

A lot of people have strong opinions about health-related matters
Part of this is pretty obvious, but other parts are more complex and deserve some discussion.

Humans are pattern making animals – our brains tend to try to make sense out of things. A common mistake that arises from this, is that people often treat two events that happen one after the other, as though the first caused the second, or treat two things that happened at the same time, as being connected somehow. And plenty of times this is useful and helps each of us thrive and survive. But for complex matters like health and medicine (see above, for how complex!) these simple associations don't work. Anecdotal evidence is very weak, and as discussed above, correlation is not causation.

And people have all kinds of strong ideas about health – it is a fundamental thing we all care about, both for ourselves and our loved ones. Something goes wrong with your child and your doctor tells you that they are autistic. How did this happen?? What happened to my child! It's easy to fall prey to people pushing theories like the now-completely-discredited vaccine theory. (a drastic "correlation is not causation" mistake). Worse, when people fall prey to baloney like that, they start to advocate for society to take action. This is really, really problematic—especially in the case of vaccines, where there are serious risks not only to your child but to other people and their children, in not vaccinating your child. Or say your child is prone to earaches and your doctor keeps prescribing antibiotics, yet your child still gets earaches and now has an upset stomach, and so you turn to alternative medicine to try to mitigate things... and then after you make the switch your child stops suffering. You might become convinced that it was the switch that made the difference and that "modern chemical medicine" is bad and "natural medicine" is good. (But again, see the "correlation is not causation" fallacy). This goes on and on. People have experiences, and want to generalize from them. When scientists hear about this, they say, "Oy".

The popular press and health news
Knowing that people are keenly interested in health-related matters, the media loves to grab science news and pump it up — this sells newspapers and pulls eyes to TV shows and websites. This is something that has been happening more and more over the past thirty years or so, and is driven in part by the 24-hour news cycle and its hunger for stories. But the popular press is really, really unreliable for health news. For example, the BBC—very respected! —reported in 2011 that some Swedish surgeons had "carried out the world's first synthetic organ transplant". They put that in bold print at the top of their article. The problem is that this was dead wrong. Another team published an article in 2006 on their work with artificial bladders—work they started in 2001.

Or let's take stories about food. Let's see, should I drink coffee or not? Maybe I will live longer and drive safer, and hey, if I am woman maybe I will be less depressed, but oh, no! it alters my estrogen levels and maybe it will screw up my baby. Every one of those links is from the New York Times, and is just from the past couple of years. I think it is terrible to jerk the public around like this. A newspaper has an excuse, but Wikipedia does not — we need to provide reliable information to the public.

Why does this happen? Newspapers want to tell stories that sell. And maybe these science/health topics are considered "soft" stories—public interest more than hard news, and are not subject to the same strict editorial and fact checking that real hard news stories are. They generally run with press releases.

Also, hospitals and medical colleges love to put out hype-y press releases when their scientists and doctors do things. Making a splash draws great attention, enhances reputation, and attracts donors, great faculty, and great students. There is a whole world of conflict of interest there. Likewise, individual scientists compete with each other for grant funding (it is really, really hard to win research grants today!) and for publication in high-profile journals. Splashy press releases help raise an investigator's profile. A lot of "science journalism" —too much of it—is unreliable.

Finally, there is a lot of money involved in health-related matters. Conflict of interest is a serious issue in publishing, and almost every journal requires its authors to report any possible conflicts of interest so that reviewers and readers can consider those conflicts when they judge the conclusions that authors draw.

About sources again
As mentioned above, scientists write reviews from time to time, which are dramatically more reliable than primary sources.

There are different kinds of reviews—some of these are kind of impressionistic, where a senior scientist in a field sits back and reflects. Others are more serious and more detailed, and actually do statistics and analyze and criticize the papers they are pulling together. That latter kind—systematic, critical reviews—are by far the most valuable, both for us at Wikipedia, and for anybody trying to understand what the hell is going on. Some of these systematic critical reviews are written especially for doctors to help them understand where things stand. The Cochrane Collaboration is a group of doctors and scientists who concentrate on doing this. These reviews are very very valuable to us.

Also, from time to time, major scientific or medical organizations come out with statements on important issues—the World Health Organization, US government agencies like the Agency for Healthcare Research and Quality and the Centers for Disease Control and Prevention, other governmental organizations like the UK's NICE or nonprofits like American Medical Association or Cancer Research UK... organizations like that. Likewise, professional medical associations often generate medical guidelines for diagnosing and treating diseases or conditions. When these organizations make statements, they are summarizing the evidence that exists and providing the mainstream view on it; you can trust those are true as well, and those statements also make great secondary sources for health content on Wikipedia.

This is what WP:MEDRS is about. We should base health content in Wikipedia articles on these kinds of reliable, independent, secondary sources. We should stay away from primary sources, and from reports in the popular media hyping research publications. Folks are always happy to talk about specific sources at WT:MED or WT:MEDRS.