Study Reveals Increase In
Misleading P Values In Biomedical
Researchers believe that unreliable results in the scholarly literature, caused by
misuse of statistical tools, may be a cause for concern.
A team at
the Stanford University School of Medicine investigated the issue by analyzing P values
reported in biomedical journals.
They concluded that, rather than reporting isolated P values, articles should also
include effect sizes and uncertainty metrics.
The P value
is a statistical tool used to judge the significance of results in
research. It is a number between zero and one. The null hypothesis, which the
researcher is typically trying to disprove, states that there is no real effect
or relationship between the variables being studied. A small P value means the
observed data would be unlikely if the null hypothesis were true, so it is taken
as strong evidence against the null hypothesis, which is then rejected.
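
As an illustration of how a P value can be obtained, the following minimal sketch
runs a two-sided permutation test on the difference between two group means. The
function name, the measurements and the number of permutations are all
hypothetical, and this is only one of many ways to compute a P value, not the
method used in the Stanford study.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P value for a difference in means by permutation."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements, invented for this example.
treated = [5.1, 5.8, 6.2, 6.0, 5.9, 6.4]
control = [4.2, 4.9, 4.4, 5.0, 4.6, 4.8]

p = permutation_p_value(treated, control)
print(f"Estimated P value: {p:.4f}")
```

The P value here is the share of random relabelings that produce a difference at
least as large as the one observed. A small share means the observed difference
would be unusual if the null hypothesis were true.
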
The team analysed millions of biomedical journal articles from 1990 to 2015 using
automated text mining to scan through the MEDLINE and PubMed Central (PMC) databases.
A further 1,000 abstracts and 100 full papers were manually assessed.
They found that the share of abstracts reporting P values rose from 7.3 percent in
1990 to 15.6 percent in 2014, more than doubling. P values were reported in 33
percent of abstracts from core clinical journals, 35.7 percent of
meta-analyses, 38.9 percent of clinical trials, 54.8 percent of randomized controlled
trials and 2.4 percent of reviews.
In the manual assessment of the 1,000 abstracts and 100 full papers, Bayesian
statistical methods and false-discovery rate methods were almost entirely absent.
Confidence intervals were also rarely reported. According to the researchers, the
increased use of P values in abstracts does not demonstrate an improvement in the
way biomedical research is carried out or the way data are analysed, because P
values are often misused.
The study suggests that researchers tend to report nominally statistically significant
results in the abstract because of selective pressure to deliver significant
results in a competitive scientific environment, thereby creating the impression
that the study has generated useful results which can be used in the future. The
study also noted a recent increase in studies that test a very large number
of hypotheses, in which some low P values are attained by chance alone.
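
A short calculation shows why testing many hypotheses produces low P values by
chance. The sketch below assumes that every null hypothesis is true and that the
tests are independent, which real studies may not satisfy. The function name and
the 0.05 threshold are illustrative choices.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests of true
    null hypotheses yields a P value below alpha."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 10, 100):
    chance = chance_of_any_false_positive(n_tests)
    print(f"{n_tests:>3} tests: {chance:.3f}")
```

Under these assumptions, the chance of at least one nominally significant result
is 5 percent for a single test, roughly 40 percent for ten tests and above 99
percent for a hundred.
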
Of the manually assessed abstracts that reported empirical data, only 18 reported a
confidence interval, a measure of the uncertainty about the magnitude of the
effect, and only 111 reported at least one effect size. None reported the use
of Bayesian statistical methods or false-discovery rate methods.
Out of 99 full-text articles, 55 reported at least one P value and 4 reported
confidence intervals for all effect sizes, but none used Bayesian
statistical methods and only one used false-discovery rate methods.
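
For readers unfamiliar with false-discovery rate methods, the following sketch
implements the Benjamini-Hochberg procedure, one widely used example. It is
offered as an illustration only: the article does not say which method the single
paper used, and the P values below are invented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per P value: True if that hypothesis is rejected
    while controlling the false-discovery rate at level q."""
    m = len(p_values)
    # Indices of the P values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose P value is at most (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical P values from ten tests in a single study.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive_hits = sum(p < 0.05 for p in p_values)
fdr_hits = sum(benjamini_hochberg(p_values))
print(f"P < 0.05: {naive_hits} results; after FDR control: {fdr_hits} results")
```

With these invented numbers, fewer results survive the correction than pass the
plain 0.05 threshold, which is the kind of adjustment that is lost when P values
are reported in isolation.
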
The selective reporting of P values and the absence of equally important
information, such as effect sizes (for example, mean differences and relative risk
measures) and measures of uncertainty, can make it difficult to assess the
accuracy of the results. Reporting this quantitative information is
essential to improving the transparency and accuracy of biomedical publications,
as it helps readers estimate the probability that a finding is false.
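
To make the distinction concrete, the sketch below computes a relative risk, one
of the effect sizes mentioned above, together with an approximate 95 percent
confidence interval using the standard log-based normal approximation. The
function name and the trial counts are hypothetical and are not taken from the
study.

```python
import math

def relative_risk_with_ci(events_treated, n_treated,
                          events_control, n_control, z=1.96):
    """Relative risk and an approximate 95% confidence interval
    (normal approximation on the log scale)."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control

    # Standard error of log(relative risk).
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated
        + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical trial: 15 of 100 treated patients and 30 of 100 controls
# experience the outcome.
rr, lower, upper = relative_risk_with_ci(15, 100, 30, 100)
print(f"Relative risk: {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Unlike a lone P value, the relative risk says how large the effect is, and the
interval says how precisely it has been estimated.
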
The study concludes that the use of P values is very selective and is biased towards
more significant results, especially in the abstracts of publications. The
study aims to encourage investigators to report the main quantitative findings
of their analyses in their publication abstracts instead of only reporting the
statistically significant results. The study also advocates using other statistical
tools that churn out effect sizes and uncertainty which complement the P