Analyzing tables of statistical tests

WR Rice - Evolution, 1989 - JSTOR
Tables of statistical tests are commonly analyzed in evolutionary studies. These include analysis-of-variance and regression tables as well as tables of correlation coefficients, chi-square values, G values, Student's t values, etc. To see the prevalence of such tables, one need only refer to a recent issue of Evolution (e.g., Evolution 41(6), November 1987, where such tables appeared in 14 of 22 empirical articles). Here, I point out that testing for the statistical significance of component tests is routinely carried out in a biased fashion that liberally judges far too many tests to be significant. I then describe a nonparametric technique, originally proposed by Holm (1979), to eliminate this bias.

So as not to single out any one person unfairly and use his published results as a straw man, consider a hypothetical correlation table examining five variables. The procedure standardly used to evaluate such a table is to carry out an individual significance test on each of the ten correlation coefficients and then denote those significant at the 5% level with an asterisk, those significant at the 1% level with two asterisks, etc. Suppose that two of the ten correlation coefficients were found to be individually significant (P < 0.05). Using the "individual significance method," a researcher might spend several journal pages explaining the evolutionary ramifications of the two individually significant correlations observed in the table. Yet there may be insufficient evidence to be 95% confident that there are any nonzero correlations. Appropriate probability values must adjust for the number of simultaneous tests.

One can solve for the probability of observing at least one individually significant correlation (P value less than 0.05) in the above hypothetical correlation table under the composite null hypothesis (H0) that all the component correlations are zero. In computer simulations (Appendix), this probability is approximately 40%. Moreover, the probability of observing two or more individual P values less than or equal to 0.05 is about 7%. If a dozen variables were correlated, we would be more than 95% certain, on H0, that at least one correlation would be judged individually significant by chance alone. Even very small P values are expected in moderately large correlation tables. With a dozen variables, chance alone would produce a P value less than or equal to 0.001 about 7% of the time. The marking of component tests as statistically significant …
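Holm's (1979) sequentially rejective procedure is simple to state: order the k component P values from smallest to largest and compare the i-th smallest with alpha/(k - i + 1), stopping at the first comparison that fails. A minimal sketch follows; the function name and interface are illustrative, not from the article.

```python
def holm(p_values, alpha=0.05):
    """Holm's (1979) sequentially rejective Bonferroni procedure.
    Returns a list of booleans, aligned with p_values, marking which
    component tests remain significant at family-wise level alpha."""
    k = len(p_values)
    # rank the component tests from smallest to largest P value
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, i in enumerate(order):
        # smallest P is compared with alpha/k, the next with alpha/(k-1), ...
        if p_values[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break  # stop at the first non-significant comparison
    return reject
```

Applied to the hypothetical table above (ten tests, two individually significant at, say, P = 0.03 and P = 0.04), the procedure rejects neither, since the smallest P value must beat 0.05/10 = 0.005; unlike a plain Bonferroni correction, however, the threshold relaxes after each rejection, so less power is sacrificed when some null hypotheses are genuinely false.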