A common view in epidemiology is that automated confounder selection methods, such as backward elimination, should be avoided as they can lead to biased effect estimates and underestimation of their variance. Nevertheless, backward elimination remains regularly applied. We investigated if and under which conditions causal effect estimation in observational studies can improve by using backward elimination on a prespecified set of potential confounders. An expression was derived that quantifies how variable omission relates to bias and variance of effect estimators. Additionally, 3960 scenarios were defined and investigated by simulations comparing bias and mean squared error (MSE) of the conditional log odds ratio, log(cOR), and the marginal log risk ratio, log(mRR), between full models including all prespecified covariates and backward elimination of these covariates. Applying backward elimination resulted in a mean bias of 0.03 for log(cOR) and 0.02 for log(mRR), compared to 0.56 and 0.52 for log(cOR) and log(mRR), respectively, for a model without any covariate adjustment, and no bias for the full model. In less than 3% of the scenarios considered, the MSE of the log(cOR) or log(mRR) was slightly lower (max 3%) when backward elimination was used compared to the full model. When an initial set of potential confounders can be specified based on background knowledge, there is minimal added value of backward elimination. We advise not to use it and otherwise to provide ample arguments supporting its use.
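The bias and MSE comparison described in this abstract can be illustrated with a minimal sketch. It assumes a known true effect and a list of simulated estimates; the function name `bias_and_mse`, the normally distributed estimates, and all numbers are hypothetical and are not the authors' simulation code.

```python
import random
import statistics


def bias_and_mse(estimates, truth):
    """Return (bias, mse) of a list of estimates against a known true value."""
    errors = [e - truth for e in estimates]
    bias = statistics.fmean(errors)                      # mean signed error
    mse = statistics.fmean(err ** 2 for err in errors)   # mean squared error
    return bias, mse


# Hypothetical example: 1000 simulated log odds ratio estimates that are
# slightly biased (by 0.03) around an assumed true value of 0.5.
rng = random.Random(0)
true_log_or = 0.5
estimates = [rng.gauss(true_log_or + 0.03, 0.2) for _ in range(1000)]

bias, mse = bias_and_mse(estimates, true_log_or)
variance = statistics.pvariance(estimates)  # population variance of the estimates

# MSE decomposes into squared bias plus variance, so a selection strategy
# can only lower the MSE if its variance gain outweighs any bias it adds.
print(f"bias={bias:.4f}  mse={mse:.4f}  bias^2+var={bias ** 2 + variance:.4f}")
```

In a comparison like the one in the abstract, such a function would be applied once to the full-model estimates and once to the backward-elimination estimates from the same simulated data sets.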
Heinze, G.; Smeden, M. van; Wynants, L.; Steyerberg, E.; Calster, B. van 2022
Although regression models play a central role in the analysis of medical research projects, there still exist many misconceptions on various aspects of modeling leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical publications. This problem of knowledge transfer from statistical research to application was identified by some medical journals, which have published series of statistical tutorials and (shorter) papers mainly addressing medical researchers. The aim of this review was to assess the current level of knowledge with regard to regression modeling contained in such statistical papers. We searched for target series by a request to international statistical experts. We identified 23 series including 57 topic-relevant articles. Within each article, two independent raters analyzed the content by investigating 44 predefined aspects on regression modeling. We assessed to what extent the aspects were explained and if examples, software advice, and recommendations for or against specific methods were given. Most series (21/23) included at least one article on multivariable regression. Logistic regression was the most frequently described regression type (19/23), followed by linear regression (18/23), Cox regression and survival models (12/23) and Poisson regression (3/23). Most general aspects on regression modeling, e.g. model assumptions, reporting and interpretation of regression results, were covered.
We did not find many misconceptions or misleading recommendations, but we identified relevant gaps, in particular with respect to addressing nonlinear effects of continuous predictors, model specification and variable selection. Specific recommendations on software were rarely given. Statistical guidance should be developed for nonlinear effects, model specification and variable selection to better support medical researchers who perform or interpret regression analyses.
In the last decades, statistical methodology has developed rapidly, in particular in the field of regression modeling. Multivariable regression models are applied in almost all medical research projects. Therefore, the potential impact of statistical misconceptions within this field can be enormous. Indeed, the current theoretical statistical knowledge is not always adequately transferred to the current practice in medical statistics. Some medical journals have identified this problem and published isolated statistical articles and even whole series thereof. In this systematic review, we aim to assess the current level of education on regression modeling that is provided to medical researchers via series of statistical articles published in medical journals. The present manuscript is a protocol for a systematic review that aims to assess which aspects of regression modeling are covered by statistical series published in medical journals that intend to train and guide applied medical researchers with limited statistical knowledge. Statistical paper series cannot easily be summarized and identified by common keywords in an electronic search engine like Scopus. We therefore identified series by a systematic request to statistical experts who are part of or related to the STRATOS Initiative (STRengthening Analytical Thinking for Observational Studies). Within each identified article, two raters will independently check the content of the articles with respect to a predefined list of key aspects related to regression modeling. The content analysis of the topic-relevant articles will be performed using a predefined report form to assess the content as objectively as possible. Any disputes will be resolved by a third reviewer.
Summary analyses will identify potential methodological gaps and misconceptions that may have an important impact on the quality of analyses in medical research. This review will thus provide a basis for future guidance papers and tutorials in the field of regression modeling which will enable medical researchers 1) to interpret publications in a correct way, 2) to perform basic statistical analyses in a correct way and 3) to identify situations when the help of a statistical expert is required.
Wynants, L.; Calster, B. van; Bonten, M.M.J.; Collins, G.S.; Debray, T.P.A.; Vos, M. de; ... ; Smeden, M. van 2020
OBJECTIVE To review and critically appraise published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at risk of being admitted to hospital for covid-19 pneumonia. DESIGN Rapid systematic review and critical appraisal. DATA SOURCES PubMed and Embase through Ovid, Arxiv, medRxiv, and bioRxiv up to 24 March 2020. STUDY SELECTION Studies that developed or validated a multivariable covid-19 related prediction model. DATA EXTRACTION At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool). RESULTS 2696 titles were screened, and 27 studies describing 31 prediction models were included. Three models were identified for predicting hospital admission from pneumonia and other events (as proxy outcomes for covid-19 pneumonia) in the general population; 18 diagnostic models for detecting covid-19 infection (13 were machine learning based on computed tomography scans); and 10 prognostic models for predicting mortality risk, progression to severe disease, or length of hospital stay. Only one study used patient data from outside of China. The most reported predictors of presence of covid-19 in patients with suspected disease included age, body temperature, and signs and symptoms. The most reported predictors of severe prognosis in patients with covid-19 included age, sex, features derived from computed tomography scans, C reactive protein, lactic dehydrogenase, and lymphocyte count.
C index estimates ranged from 0.73 to 0.81 in prediction models for the general population (reported for all three models), from 0.81 to more than 0.99 in diagnostic models (reported for 13 of the 18 models), and from 0.85 to 0.98 in prognostic models (reported for six of the 10 models). All studies were rated at high risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, and high risk of model overfitting. Reporting quality varied substantially between studies. Most reports did not include a description of the study population or intended use of the models, and calibration of predictions was rarely assessed. CONCLUSION Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that proposed models are poorly reported, at high risk of bias, and their reported performance is probably optimistic. Immediate sharing of well documented individual participant data from covid-19 studies is needed for collaborative efforts to develop more rigorous prediction models and validate existing ones. The predictors identified in included studies could be considered as candidate predictors for new models. Methodological guidance should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, studies should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline.
Background The introduction of HLA matching of donors and recipients was a breakthrough in kidney transplantation. However, half of all transplanted kidneys still fail within 15 years after transplantation. Epidemiological data suggest a fundamental role of non-HLA alloimmunity. Methods We genotyped 477 pairs of deceased donors and first kidney transplant recipients with stable graft function at three months that were transplanted between Dec 1, 2005, and April 30, 2015. Genome-wide genetic mismatches in non-synonymous single nucleotide polymorphisms (nsSNPs) were calculated to identify incompatibilities in transmembrane and secreted proteins. We estimated the association between nsSNP mismatch and graft loss in a Cox proportional hazard model, adjusting for HLA mismatch and clinical covariates. Customised peptide arrays were generated to screen for antibodies against genotype-derived mismatched epitopes in 25 patients with biopsy-confirmed chronic antibody-mediated rejection. Findings 59 268 nsSNPs affecting a transmembrane or secreted protein were analysed. The median number of nsSNP mismatches in immune-accessible transmembrane and secreted proteins between donors and recipients was 1892 (IQR 1850-1936). The degree of nsSNP mismatch was independently associated with graft loss in a multivariable model adjusted for HLA eplet mismatch (HLA-A, HLA-B, HLA-C, HLA-DP, HLA-DQ, and HLA-DR). Each increase by a unit of one IQR had an HR of 1.68 (95% CI 1.17-2.41, p=0.005). 5-year death censored graft survival was 98% in the quartile with the lowest mismatch, 91% in the second quartile, 89% in the third quartile, and 82% in the highest quartile (p=0.003, log-rank test).
Customised peptide arrays verified a donor-specific alloimmune response to genetically predicted mismatched epitopes. Interpretation Genetic mismatch of non-HLA haplotypes coding for transmembrane or secreted proteins is associated with an increased risk of functional graft loss independently of HLA incompatibility. As in HLA alloimmunity, donor-specific alloantibodies can be identified against genotype derived non-HLA epitopes. Funding Austrian Science Fund, WWTF (Vienna Science and Technology Fund), and Ministry of Health of the Czech Republic. Copyright (c) 2019 Elsevier Ltd. All rights reserved.
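The hazard ratio "per one IQR" reported in this abstract can be rescaled to other mismatch differences with a short sketch. It rests on two assumptions that the abstract does not confirm: that the mismatch count entered the Cox model linearly on the log-hazard scale, and that the scaling unit equals the width of the reported IQR (1936 - 1850 = 86 mismatches). The function name `hr_for_difference` is hypothetical.

```python
import math


def hr_for_difference(hr_per_step, step, difference):
    """Rescale a hazard ratio reported per `step` units to `difference` units.

    Assumes a log-linear effect: log(HR) is proportional to the difference.
    """
    log_hr_per_unit = math.log(hr_per_step) / step
    return math.exp(log_hr_per_unit * difference)


reported_hr = 1.68        # HR per one IQR, from the abstract
iqr_width = 1936 - 1850   # assumed scaling unit (86 mismatches)

# Hazard ratio implied for a single additional mismatch, under the assumptions above.
print(hr_for_difference(reported_hr, iqr_width, 1))

# Hazard ratio implied for a difference of two IQR widths (equals 1.68 squared).
print(hr_for_difference(reported_hr, iqr_width, 2 * iqr_width))
```

The rescaling only moves along the fitted log-linear trend; it says nothing about whether that trend holds outside the observed range of mismatch counts.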