The role of P-values for null hypothesis testing is under debate. We aim to explore the impact of the significance threshold on estimates for the strengths of associations ("effects") and the implications for different types of epidemiological research. We consider situations with normal distribution of a true effect, while varying the effect size. We confirm the occurrence of "testimation bias": estimating effect size only if the test was statistically significant leads to exaggerated results. The absolute bias is largest for true effects around 0.7 times the size of the standard error: +220% bias if effects are selected after testing with P < .05, and +335% if tested with P [...] P < .20 (+130%) and larger true effect sizes. We conclude that a lower P-value threshold for declaring statistical significance implies more exaggeration in an estimated effect. This implies that if a low threshold is used, effect size estimation should not be attempted, for example in the context of selecting promising discoveries that need further validation. Confirmatory studies, such as randomized controlled trials, might stick to the 0.05 threshold if adequately powered, while prediction modelling studies should use an even higher threshold, such as 0.2, to avoid strongly biased effect estimates.
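The testimation bias described above can be illustrated with a short calculation. This is only a sketch under stated assumptions, not the authors' own procedure: it assumes the estimate is normally distributed around the true effect with a standard error of 1, and that an estimate is kept only when a two-sided test is significant at the chosen threshold. The function name `relative_bias_after_selection` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def relative_bias_after_selection(true_effect_in_se, alpha):
    """Relative bias of an estimate that is reported only if significant.

    Assumption: estimate Z ~ N(true effect, 1), with the true effect
    expressed in units of the standard error, and two-sided selection
    |Z| > z_crit at significance level alpha.
    """
    mu = true_effect_in_se
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)

    # Probability of a significant result in the upper and lower tail
    p_upper = 1 - STD_NORMAL.cdf(z_crit - mu)
    p_lower = STD_NORMAL.cdf(-z_crit - mu)

    # Partial expectations E[Z; Z > z_crit] and E[Z; Z < -z_crit]
    e_upper = mu * p_upper + STD_NORMAL.pdf(z_crit - mu)
    e_lower = mu * p_lower - STD_NORMAL.pdf(-z_crit - mu)

    # Mean of the estimate among significant results only
    conditional_mean = (e_upper + e_lower) / (p_upper + p_lower)
    return (conditional_mean - mu) / mu


if __name__ == "__main__":
    # True effect of 0.7 standard errors, thresholds taken from the abstract
    for alpha in (0.20, 0.05):
        bias = relative_bias_after_selection(0.7, alpha)
        print(f"P < {alpha}: relative bias {bias:+.0%}")
```

Under these assumptions the relative bias grows as the threshold becomes stricter, which is the pattern the abstract reports; the exact percentages depend on the selection rule assumed here.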
Bollen, L.; Wibmer, C.; Linden, Y.M. van der; Pondaag, W.; Fiocco, M.; Peul, W.C.; ... ; Dijkstra, S.P.D. 2016
Study Design: A retrospective cohort study. Objective: The aim of this study was to assess and compare the predictive accuracy of six models designed to estimate survival of patients suffering from spinal bone metastases (SBMs). Summary of Background Data: On the basis of the estimated survival of patients with SBM, extent of treatment can be adjusted. To aid clinicians in the difficult task of assessing probability of survival, prognostic scoring systems have been developed by Tomita, Tokuhashi, Van der Linden, Bauer, Rades, and Bollen. Methods: All patients who were treated for SBM between 2000 and 2010 were included in this international, multicenter, retrospective study (n=1379). Medical records were reviewed for all items needed to use the scoring systems. Survival time was calculated as the difference between start of treatment for SBM and date of death. Survival curves were estimated using the Kaplan-Meier method and accuracy was assessed with the c-statistic. Survival rates of the worst prognostic groups were evaluated at 4 months. Results: Median follow-up was 6.7 years [95% confidence interval (95% CI) 5.6-7.7] with a minimum of 2.3 years and a maximum of 12.3 years. The overall median survival was 5.1 months (95% CI 4.6-5.6). The most common primary tumors were breast (n=388, 28%), lung (n=318, 23%), and prostate cancer (n=259, 19%). The Tokuhashi, Bauer, Tomita, and Van der Linden models performed similarly, with a c-statistic of 0.64 to 0.66 and a 4-month accuracy of 62% to 65%. The Rades model (c-statistic 0.44) and Bollen model (c-statistic 0.70) had a 4-month accuracy of 69% and 75%, respectively. Conclusion: The Bollen model performs better than the other models. However, improvements are still warranted to increase the accuracy. Level of Evidence: 3
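For readers unfamiliar with the c-statistic used above to compare the models, the following is a minimal sketch of one common definition for survival data, Harrell's concordance index. The abstract does not say which variant or software was used, so this choice is an assumption, and the function name `harrell_c` and the example data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher meaning worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            # Pair is comparable when patient i died before patient j's observed time
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # model ranked the earlier death as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical data: survival in months, event indicator, prognostic score
    months = [2, 5, 3, 9]
    died = [1, 0, 1, 1]
    score = [3, 1, 2, 1]
    print(harrell_c(months, died, score))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported values of 0.44 to 0.70 should be read.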