User:Gjsis/sandbox

Comparison of meta-analysis to the scientific method
Francis Bacon described a method of procedure for advancing the physical sciences.

“Aphorism 106: In forming our axioms from induction, we must examine and try whether the axiom we derive be only fitted and calculated for the particular instances from which it is deduced, or whether it be more extensive and general. If it be the latter, we must observe, whether it confirms its own extent and generality by giving surety, as it were, in pointing out new particulars, so that we may neither stop at actual discoveries, nor with a careless grasp catch at shadows and abstract forms, instead of substances of a determinate nature: and as soon as we act thus, well authorized hope may with reason, be said to beam upon us.”

George Boole gave a similar description.

“The study of every department of physical science begins with observation; it advances by the collation of facts to a presumptive acquaintance with their connecting law, the validity of such presumption it tests by new experiments so devised as to augment, if the presumption be well founded, its probability indefinitely; and finally, the law of the phenomenon having been with sufficient confidence determined, the investigation of causes, conducted by the due mixture of hypothesis and deduction, crowns the inquiry.” (Boole, 1958, p. 402)

In the method described by Bacon and Boole, the test of a hypothesis must be prospective. A meta-analysis can be prospective or retrospective. Two ways in which it can be retrospective are (a) if the hypothesis is specified after the data are known, as discussed in Aphorism 106 above, or (b) if studies are selected after the data in them are known. The consequence of selecting studies after their data are known is the possibility of a biased selection, as discussed in Aphorisms 46 and 49.

Protocols for carrying out a prospective meta-analysis are given by The Cochrane Collaboration. Two key points in the protocols are: (a) a prospective meta-analysis is a meta-analysis of studies (usually randomized trials) that were identified, evaluated and determined to be eligible for the meta-analysis before the results of any of those studies became known; and (b) prospective meta-analyses enable hypotheses to be specified in advance of the results of individual trials, enable prospective application of study selection criteria, and enable a priori statements of intended analyses. As meta-analyses rather than multi-centre trials, they allow variation in the protocols of the included studies, while maximizing power in the pre-planned meta-analyses.

How does one choose
There is an incompleteness in the article on statistical significance. I’ll try to illustrate it with an example. A dealer and a player are playing one hand of a five-card game. The cards are dealt face up one at a time, first five cards to the dealer, then five cards to the player. At some point before, during, or after the game, the player forms two hypotheses: the null hypothesis, that the deck was fairly shuffled, and an alternate, that the deck was stacked. The null hypothesis is to be rejected if the dealer’s hand is a royal flush. Say the dealer’s cards were 10, Q, A, K, and J, all of clubs, in that order. The probability of the dealer’s hand being a royal flush can have as many as six different values, depending on the stage in the dealing at which it is calculated. If no cards have been dealt, it is P0 = 0.000002; after the 10 of clubs is dealt, it is P1 = 0.000004. Continuing, P2 = 0.000051, P3 = 0.000850, P4 = 0.020833, and P5 = 1.0. The statistical significance depends on when the hypothesis is specified. If it is specified between the third and fourth cards dealt, it is P3; if before the first card is dealt, it is P0; if after the fifth card, it is P5; and so on. The critical event (which causes the rejection of the null hypothesis) has several probabilities, and the definition of statistical significance has to tell which probability to choose.
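The six values P0 through P5 can be checked with a short calculation. The following is only a sketch, assuming a standard 52-card deck and that every card already face up belongs to one and the same royal flush; the function name royal_flush_probability is made up for this illustration.

```python
from math import comb


def royal_flush_probability(cards_dealt: int) -> float:
    """Probability that the dealer's hand ends up a royal flush, given that
    `cards_dealt` of its five cards are face up and all of them fit one
    particular royal flush (assumption of this sketch)."""
    if not 0 <= cards_dealt <= 5:
        raise ValueError("cards_dealt must be between 0 and 5")
    unseen_cards = 52 - cards_dealt   # cards still in the deck
    cards_needed = 5 - cards_dealt    # cards the dealer has yet to receive
    # Before the first card, any of the four suits can complete a royal flush;
    # once a card is showing, the suit is fixed and only one completion works.
    favourable = 4 if cards_dealt == 0 else 1
    return favourable / comb(unseen_cards, cards_needed)


probabilities = [royal_flush_probability(k) for k in range(6)]
for k, p in enumerate(probabilities):
    print(f"P{k} = {p:.6f}")
```

Rounded to six decimal places, the printed values agree with the figures quoted in the example.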

Does order of procedure affect statistical significance?
Order refers to which comes first: the test data or the specification of the hypotheses to be tested. When the hypotheses come first the test is prospective, and when the data come first the test is retrospective. Traditionally, prospective tests have been required. However, there is a well-known, generally accepted scientific report containing numerous hypothesis tests in which much or all of the data preceded the hypotheses. In that study the statistical significance was calculated the same way as it would have been had the hypotheses preceded the data. A related question in the use of statistics in the physical sciences is whether probability theory applies to the known past in the same way that it applies to the unknown future. Although these questions have been discussed, there are few references in this area of statistics. It hardly seems reasonable to accord the same status to a hypothesis that explains the results of an experiment after the results are known as to a hypothesis that predicts the results before they are known, because predicting an event before it occurs is more difficult than explaining it after it occurs.

Comment on Dubious
The citation (USEPA December 1992) contains numerous statistical tests, some presented as p-values and some as confidence intervals. Figures 5-1 through 5-4 show some of the test statistics used in the citation. In two of the figures the statistics are for individual studies and can be assumed to be prospective. In the other two, the statistics are for pooled studies and can be assumed to be retrospective. Table 5-9 includes the results of the test of the hypothesis RR = 1 versus RR > 1 for individual studies and for pooled studies. In the cited report, no distinction between prospective tests and retrospective tests was made. This is a departure from the traditional scientific method, which makes a strict distinction between predictions of the future and explanations of the past.

Idol of the market
The probabilities of events in the known past are restricted to zero and one. That is equivalent to saying that the occurrence of such events lacks uncertainty, or that such events are unsuitable for gambling. In a retrospective statistical study, wherein the hypotheses are specified after the data are known, the critical event (critical outcome or rejection region) is an event in the known past and so has probability zero or one. What is called the level of statistical significance in retrospective studies is not a probability. It is only a measure of relative goodness of fit of the data to the distributions determined by the null hypothesis and the alternate hypothesis. If the critical event occurred before the hypotheses were specified, then the null hypothesis never had a chance. If it didn’t occur, then the alternate hypothesis never had a chance. Calling that measure of goodness of fit a probability is a misuse of words. The misuse of words is an impediment to understanding, long recognized and named an “idol of the market”, words being the coin in the marketplace of ideas. The actual level of statistical significance in a retrospective study is 0% or 100%; that is, in retrospective statistical studies, including retrospective meta-analyses, no hypothesis is tested.

Restoration of the section "Does order of procedure affect statistical significance?"
The explanation provided by Manoguru for deleting the section on “order of procedure” is relevant to how the authors of the USEPA report make policy decisions, but is not relevant to how they calculated p-values to determine statistical significance, which they defined as p-value < 0.05. The meaning of the term “statistical significance” as it is currently understood is found by seeing how the term is used. The referenced report, Respiratory health effects of passive smoking: Lung cancer and other disorders, EPA/600/6-90/006F (1992), is from a prestigious organization and has numerous examples of the use of the term. Moreover, the report is well known, generally accepted as sound, and readily available. That makes it a useful reference. Also, dropping the requirement that the hypothesis precede the experiment that tests it is an innovation in the scientific method that deserves to be noted. For these reasons I suggest restoring the deleted section. Gjsis (talk) 16:58, 1 January 2014 (UTC)

The section, “Does order of procedure affect statistical significance?”, contained actual criticism supported by reliable sources and it was deleted. Gjsis (talk) 15:07, 8 January 2014 (UTC)

Two versions of scientific hypothesis testing
The article does not describe two versions of the scientific method that are currently accepted, but rather blurs the two methods into one. The two methods can be illustrated by an example.

Suppose a researcher sets out to toss a coin 100 times. The first fifty tosses are heads. The researcher then formulates the hypothesis that this coin is biased towards heads, against the null hypothesis that the coin is unbiased. The researcher then tosses the coin fifty more times, and those last fifty tosses result in 25 heads and 25 tails. Here the two versions of hypothesis testing diverge. The two methods don’t even see the same experimental outcome. One method sees the experimental outcome as 75 heads and 25 tails, calculates the probability of this many heads or more under the null hypothesis as less than 0.05, and rejects the null hypothesis at the 5% level of statistical significance. The other method sees the outcome as 25 heads and 25 tails, calculates the probability of 25 or more heads under the null hypothesis to be greater than 0.05, and accepts the null hypothesis at the 5% level.
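The two tail probabilities in the coin example can be worked out directly from the binomial distribution. This is a minimal sketch, assuming independent tosses of a fair coin under the null hypothesis; the helper name upper_tail is chosen here for illustration only.

```python
from math import comb


def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# First method: all 100 tosses count, so the outcome is 75 heads or more.
p_all_tosses = upper_tail(100, 75)

# Second method: only the 50 tosses made after the hypothesis was
# formulated count, so the outcome is 25 heads or more out of 50.
p_new_tosses = upper_tail(50, 25)

print(f"75 or more heads in 100 tosses: {p_all_tosses:.3g}")
print(f"25 or more heads in 50 tosses:  {p_new_tosses:.3g}")
print("first method rejects at 5%: ", p_all_tosses < 0.05)
print("second method rejects at 5%:", p_new_tosses < 0.05)
```

The first probability falls far below 0.05 and the second is a little above one half, which is why the two methods reach opposite conclusions from the same 100 tosses.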

Each of the methods has been endorsed by reputable authorities as a way of scientifically testing hypotheses. These two distinct understandings of the phrase “scientific method” are not presented to the reader of the article on the scientific method, although there are faint hints of each in the article, as is illustrated by two quotations from the article. For the first method: “Not all steps take place in every scientific inquiry (or to the same degree), and are not always in the same order.” For the second method: “It is essential that the outcome of testing such a prediction be currently unknown.”

The two methods use the same terminology, but they are not the same idea. An article on the scientific method shouldn’t fail to describe them both and cite their authoritative endorsements. Gjsis (talk) 12:31, 27 April 2014 (UTC)

First, to the authoritative endorsements. The first method, including old data, is a partially retrospective statistical analysis, and most meta-analyses employ data known before the hypotheses were specified. The most famous such analysis was published by the USEPA in 1992 on environmental tobacco smoke, and it has been accepted as sound. Also, prestigious journals such as the New England Journal of Medicine and the Journal of the American Medical Association have published such partially retrospective studies, as a Google search will show. The second method is recommended by Francis Bacon in Aphorism 106 of the Novum Organum, where he says that hypotheses must be tested by new data after the specification of the hypotheses.

Besides endorsements there are also negative views of the first method. Already mentioned is Bacon’s requirement that the test be based on new data. George Boole takes the position that probabilities, except for zero and one, require uncertainty. As the first 50 tosses were in the known past when the hypotheses were specified, the probability of those 50 heads is 1.0 and not the fiftieth power of 1/2. With Boole's correction the calculation in the first method gives the same result as was found in the second method. The Boole reference is from the Laws of Thought, chapter XVI.
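One way to read the correction is as conditioning on the tosses that were already known when the hypotheses were specified. The sketch below takes that reading as an assumption (it is not Boole's own notation) and uses exact fractions so that the factor of (1/2)^50 can be seen to cancel; the name upper_tail is hypothetical.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)


def upper_tail(n: int, k: int) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2**n)


# Probability of 50 heads in the first 50 tosses, as seen before any toss.
p_first_50_heads = HALF**50

# Joint probability of "first 50 tosses all heads" and "75 or more heads in
# 100 tosses"; given 50 heads so far, the latter needs 25 or more heads in
# the remaining 50 independent tosses.
p_joint = p_first_50_heads * upper_tail(50, 25)

# Treating the first 50 heads as known (probability 1) means dividing by
# their prior probability, which removes the fiftieth power of 1/2.
p_given_known_past = p_joint / p_first_50_heads

# Second method: 25 or more heads in the 50 new tosses alone.
p_second_method = upper_tail(50, 25)

print(float(p_given_known_past), float(p_second_method))
print("same result:", p_given_known_past == p_second_method)
```

Under this reading the corrected first-method probability is identical to the second-method probability, so both lead to the same decision at the 5% level.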

Francis Bacon put forth a case that predictions of something unknown are necessary for testing conjectured physical laws. "Aphorism 106: In forming our axioms from induction, we must examine and try whether the axiom we derive be only fitted and calculated for the particular instances from which it is deduced, or whether it be more extensive and general. If it be the latter, we must observe, whether it confirm its own extent and generality by giving surety, as it were, in pointing out new particulars, so that we may neither stop at actual discoveries, nor with careless grasp catch at shadows and abstract forms, instead of substances of a determinate nature: and as soon as we act thus, well authorized hope may with reason, be said to beam upon us."