OBJECTIVES: Many machine learning (ML) models have been developed for application in the ICU, but few models have been subjected to external validation. The performance of these models in new settings therefore remains unknown. The objective of this study was to assess the performance of an existing decision support tool based on an ML model predicting readmission or death within 7 days after ICU discharge before, during, and after retraining and recalibration. DESIGN: A gradient boosted ML model was developed and validated on electronic health record data from 2004 to 2021. We performed an independent validation of this model on electronic health record data from 2011 to 2019 from a different tertiary care center. SETTING: Two ICUs in tertiary care centers in The Netherlands. PATIENTS: Adult patients who were admitted to the ICU and stayed for longer than 12 hours. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: We assessed discrimination by area under the receiver operating characteristic curve (AUC) and calibration (slope and intercept). We retrained and recalibrated the original model and assessed performance via a temporal validation design. The final retrained model was cross-validated on all data from the new site. Readmission or death within 7 days after ICU discharge occurred in 577 of 10,052 ICU admissions (5.7%) at the new site. External validation revealed moderate discrimination with an AUC of 0.72 (95% CI 0.67–0.76). Retrained models showed improved discrimination with AUC 0.79 (95% CI 0.75–0.82) for the final validation model. Calibration was poor initially and good after recalibration via isotonic regression. CONCLUSIONS: In this era of expanding availability of ML models, external validation and retraining are key steps to consider before applying ML models to new settings. Clinicians and decision-makers should take this into account when considering applying new ML models to their local settings.
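The two performance measures this abstract relies on, rank-based discrimination (AUC) and recalibration by isotonic regression, can be sketched in a few lines of plain Python. This is an illustrative sketch on hypothetical data, not the authors' pipeline:

```python
def auc(y_true, y_score):
    """AUC via the Mann-Whitney formulation: the probability that a random
    event case is ranked above a random non-event case (ties count half)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def isotonic_recalibrate(y_score, y_true):
    """Isotonic regression via pool-adjacent-violators: returns, per patient,
    a monotone recalibrated probability for the raw model score."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    blocks = []  # each block holds [sum of outcomes, count]
    for i in order:
        blocks.append([y_true[i], 1])
        # merge adjacent blocks while the monotonicity constraint is violated
        # (previous block mean > current block mean, compared cross-multiplied)
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = [s / n for s, n in blocks for _ in range(n)]
    # map fitted values (in score order) back to the original patient order
    out = [0.0] * len(y_score)
    for rank, i in enumerate(order):
        out[i] = fitted[rank]
    return out
```

In practice the isotonic map would be fitted on a recalibration set and applied to new patients; scikit-learn's `IsotonicRegression` implements the same pool-adjacent-violators fit.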
Ramspek, C.L.; Boekee, R.; Evans, M.; Heimburger, O.; Snead, C.M.; Caskey, F.J.; ... ; EQUAL Study Investigators 2022
Introduction: Predicting the timing and occurrence of kidney replacement therapy (KRT), cardiovascular events, and death among patients with advanced chronic kidney disease (CKD) is clinically useful and relevant. We aimed to externally validate a recently developed CKD G4+ risk calculator for these outcomes and to assess its potential clinical impact in guiding vascular access placement. Methods: We included 1517 patients from the European Quality (EQUAL) study, a European multicentre prospective cohort study of nephrology-referred advanced CKD patients aged ≥65 years. Model performance was assessed based on discrimination and calibration. Potential clinical utility for timing of referral for vascular access placement was studied with diagnostic measures and decision curve analysis (DCA). Results: The model showed good discrimination for KRT and "death after KRT," with 2-year concordance (C) statistics of 0.74 and 0.76, respectively. Discrimination for cardiovascular events (2-year C-statistic: 0.70) and overall death (2-year C-statistic: 0.61) was poorer. Calibration was fairly accurate. Decision curves illustrated that using the model to guide vascular access referral would generally lead to fewer unused arteriovenous fistulas (AVFs) than following estimated glomerular filtration rate (eGFR) thresholds. Conclusion: This study shows moderate to good predictive performance of the model in an older cohort of nephrology-referred patients with advanced CKD. Using the model to guide referral for vascular access placement has potential in reducing unnecessary vascular surgeries.
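Decision curve analysis, used above to weigh model-guided referral against eGFR thresholds, compares strategies by net benefit at a chosen risk threshold. A minimal sketch with hypothetical numbers (not the EQUAL data):

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of referring every patient whose predicted risk is at or
    above `threshold`: true positives per patient, minus false positives
    weighted by the odds implied by the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_refer_all(y_true, threshold):
    """Reference strategy: refer everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Plotting net benefit over a range of plausible thresholds yields the decision curve; a model is clinically useful wherever its curve lies above both "refer all" and "refer none" (net benefit 0).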
Boone, S.C.; Smeden, M. van; Rosendaal, F.R.; Cessie, S. le; Groenwold, R.H.H.; Jukema, J.W.; ... ; Mutsert, R. de 2022
Visceral adipose tissue (VAT) is a strong prognostic factor for cardiovascular disease and a potential target for cardiovascular risk stratification. Because VAT is difficult to measure in clinical practice, we estimated prediction models with predictors routinely measured in general practice and VAT as outcome, using ridge regression in 2,501 middle-aged participants from the Netherlands Epidemiology of Obesity study, 2008-2012. Adding waist circumference and other anthropometric measurements on top of the routinely measured variables improved the optimism-adjusted R² from 0.50 to 0.58, with a decrease in the root-mean-square error (RMSE) from 45.6 to 41.5 cm² and with overall good calibration. Further addition of predominantly lipoprotein-related metabolites from the Nightingale platform did not improve the optimism-corrected R² and RMSE. The models were externally validated in 370 participants from the Prospective Investigation of Vasculature in Uppsala Seniors (PIVUS, 2006-2009) and 1,901 participants from the Multi-Ethnic Study of Atherosclerosis (MESA, 2000-2007). Performance was comparable to the development setting in PIVUS (R² = 0.63, RMSE = 42.4 cm², calibration slope = 0.94) but lower in MESA (R² = 0.44, RMSE = 60.7 cm², calibration slope = 0.75). Our findings indicate that the estimation of VAT with routine clinical measurements can be substantially improved by incorporating waist circumference but not by metabolite measurements.
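Ridge regression, the method used above, shrinks coefficients toward zero through a penalty λ. For a single centered predictor the fit has the closed form slope = Σx̃ỹ / (Σx̃² + λ), which makes the shrinkage visible directly. A hypothetical one-predictor sketch (think waist circumference predicting VAT area), not the study's multivariable model:

```python
def ridge_1d(x, y, lam):
    """One-predictor ridge fit on centered data; returns (slope, intercept).
    lam = 0 recovers ordinary least squares; larger lam shrinks the slope."""
    xm, ym = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    slope = sxy / (sxx + lam)       # penalty added to the denominator
    return slope, ym - slope * xm

def rmse(y, y_hat):
    """Root-mean-square error, the accuracy measure reported above (in cm²)."""
    return (sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)) ** 0.5
```

In practice λ is chosen by cross-validation, and the optimism adjustment reported above corrects R² and RMSE for the fact that they were estimated on the development data.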
Ramspek, C.L.; Teece, L.; Snell, K.I.E.; Evans, M.; Riley, R.D.; Smeden, M. van; ... ; Diepen, M. van 2021
Background: External validation of prognostic models is necessary to assess the accuracy and generalizability of the model to new patients. If models are validated in a setting in which competing events occur, these competing risks should be accounted for when comparing predicted risks to observed outcomes. Methods: We discuss existing measures of calibration and discrimination that incorporate competing events for time-to-event models. These methods are illustrated using a clinical-data example concerning the prediction of kidney failure in a population with advanced chronic kidney disease (CKD), using the guideline-recommended Kidney Failure Risk Equation (KFRE). The KFRE was developed using Cox regression in a diverse population of CKD patients and has been proposed for use in patients with advanced CKD, in whom death is a frequent competing event. Results: When validating the 5-year KFRE with methods that account for competing events, it becomes apparent that the 5-year KFRE considerably overestimates the real-world risk of kidney failure. The absolute overestimation was 10 percentage points on average and 29 percentage points in older high-risk patients. Conclusions: It is crucial that competing events are accounted for during external validation to provide a more reliable assessment of the performance of a model in clinical settings in which competing risks occur.
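The mechanism behind the overestimation described above can be shown with a nonparametric sketch: the Aalen-Johansen cumulative incidence accounts for competing deaths, while 1 minus Kaplan-Meier (which censors them) does not. The data below are made up for illustration; this is not the paper's KFRE analysis:

```python
def cumulative_incidence(times, events, horizon, cause=1):
    """Aalen-Johansen estimate of the probability of `cause` (e.g. kidney
    failure) by `horizon`. Event codes: 0 = censored, 1 = cause of interest,
    2 = competing event (e.g. death)."""
    data = sorted(zip(times, events))
    at_risk, surv, cif, i = len(data), 1.0, 0.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t, d_cause, d_any, censored = data[i][0], 0, 0, 0
        while i < len(data) and data[i][0] == t:
            ev = data[i][1]
            d_cause += ev == cause
            d_any += ev != 0
            censored += ev == 0
            i += 1
        cif += surv * d_cause / at_risk   # cause-specific hazard weighted by
        surv *= 1 - d_any / at_risk       # overall event-free survival
        at_risk -= d_any + censored
    return cif

def naive_one_minus_km(times, events, horizon, cause=1):
    """1 - Kaplan-Meier that (wrongly) censors competing events; this is the
    quantity that overestimates real-world risk when competing risks occur."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t, d, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1] == cause
            removed += 1
            i += 1
        surv *= 1 - d / at_risk
        at_risk -= removed
    return 1 - surv
```

Whenever competing events are present, the naive estimate exceeds the Aalen-Johansen one, mirroring the KFRE overestimation reported above.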
Youssef, A.; Hoorn, M.L.P. van der; Dongen, M.; Visser, J.; Bloemenkamp, K.; Lith, J. van; ... ; Lashley, E.E.L.O. 2021
Study question: What is the predictive performance of a currently recommended prediction model in an external Dutch cohort of couples with unexplained recurrent pregnancy loss (RPL)? Summary answer: The model shows poor predictive performance on a new population; it overestimates, predicts too extremely, and has poor discriminative ability. What is known already: In 50-75% of couples with RPL, no risk factor or cause can be determined and RPL remains unexplained. Clinical management in RPL is primarily focused on providing supportive care, in which counselling on prognosis is a main pillar. A frequently used prediction model for unexplained RPL, developed by Brigham et al. in 1999, estimates the chance of a successful pregnancy based on the number of previous pregnancy losses and maternal age. This prediction model has never been externally validated. Study design, size, duration: This retrospective cohort study consisted of 739 couples with unexplained RPL who visited the RPL clinic of the Leiden University Medical Centre between 2004 and 2019. Participants/materials, setting, methods: Unexplained RPL was defined as the loss of two or more pregnancies before 24 weeks, without the presence of an identifiable cause for the pregnancy losses, according to the ESHRE guideline. Obstetrical history and maternal age were noted at intake at the RPL clinic. The outcome of the first pregnancy after intake was documented. The performance of Brigham's model was evaluated through calibration and discrimination, in which the predicted pregnancy rates were compared to the observed pregnancy rates. Main results and the role of chance: The cohort included 739 women with a mean age of 33.1 years (±4.7 years) and with a median of three pregnancy losses at intake (range 2-10). The mean predicted pregnancy success rate was 9.8 percentage points higher in the Brigham model than the observed pregnancy success rate in the dataset (73.9% vs 64.0%; 95% CI for the difference 6.3-13.3%). Calibration showed overestimation by the model and too-extreme predictions, with a calibration intercept of -0.46 (95% CI -0.62 to -0.31) and a calibration slope of 0.42 (95% CI 0.11-0.73). The discriminative ability of the model was very low, with a concordance statistic of 0.55 (95% CI 0.51-0.59). Recalibration of the Brigham model hardly improved the c-statistic (0.57; 95% CI 0.53-0.62). Limitations, reasons for caution: This is a retrospective study in which only the first pregnancy after intake was registered. There was no time frame as inclusion criterion, which is of importance in the counselling of couples with unexplained RPL. Only cases with a known pregnancy outcome were included. Wider implications of the findings: This is the first study externally validating the Brigham prognostic model that estimates the chance of a successful pregnancy in couples with unexplained RPL. The results show that the frequently used model overestimates the chances of a successful pregnancy, that predictions are too extreme on both the high and low ends, and that they are not much more discriminative than chance. There is a need to revise the prediction model to estimate the chance of a successful pregnancy in couples with unexplained RPL more accurately. Study funding/competing interest(s): No external funding was used and no competing interests were declared. Trial registration number: N/A. Keywords: external validation; miscarriage; prediction model; pregnancy success rate; recurrent pregnancy loss.
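The calibration intercept and slope reported above come from refitting a logistic model on the logit of the predicted probabilities: an intercept below 0 signals overall overestimation, a slope below 1 signals predictions that are too extreme. A plain-Python sketch with hypothetical data, using gradient descent as a stand-in for the usual maximum-likelihood fit:

```python
import math

def calibration_slope_intercept(y_true, y_prob, iters=5000, lr=0.1):
    """Refit y ~ a + b * logit(p) by gradient descent on the logistic
    log-likelihood; returns (intercept a, slope b). A perfectly calibrated
    model gives a = 0, b = 1."""
    lp = [math.log(p / (1 - p)) for p in y_prob]  # logit of predicted risk
    a, b = 0.0, 1.0
    n = len(y_true)
    for _ in range(iters):
        ga = gb = 0.0
        for y, x in zip(y_true, lp):
            mu = 1 / (1 + math.exp(-(a + b * x)))  # recalibrated probability
            ga += mu - y
            gb += (mu - y) * x
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b
```

On perfectly calibrated data (below, groups with predicted risks 0.25 and 0.75 whose observed event rates match), the fit returns an intercept near 0 and a slope near 1.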
Background: SARS-CoV-2 is straining health care systems globally. The burden on hospitals during the pandemic could be reduced by implementing prediction models that can discriminate patients who require hospitalization from those who do not. The COVID-19 vulnerability (C-19) index, a model that predicts which patients will be admitted to hospital for treatment of pneumonia or pneumonia proxies, has been developed and proposed as a valuable tool for decision-making during the pandemic. However, the model is at high risk of bias according to the "prediction model risk of bias assessment" criteria, and it has not been externally validated. Objective: The aim of this study was to externally validate the C-19 index across a range of health care settings to determine how well it broadly predicts hospitalization due to pneumonia in COVID-19 cases. Methods: We followed the Observational Health Data Sciences and Informatics (OHDSI) framework for external validation to assess the reliability of the C-19 index. We evaluated the model on two different target populations: 41,381 patients who presented with SARS-CoV-2 at an outpatient or emergency department visit and 9,429,285 patients who presented with influenza or related symptoms during an outpatient or emergency department visit, to predict their risk of hospitalization with pneumonia during the following 0-30 days. In total, we validated the model across a network of 14 databases spanning the United States, Europe, Australia, and Asia. Results: The internal validation performance of the C-19 index had a C statistic of 0.73, and the calibration was not reported by the authors. When we externally validated it by transporting it to SARS-CoV-2 data, the model obtained C statistics of 0.36, 0.53 (0.473-0.584), and 0.56 (0.488-0.636) on Spanish, US, and South Korean data sets, respectively. The calibration was poor, with the model underestimating risk. When validated on 12 data sets containing influenza patients across the OHDSI network, the C statistics ranged between 0.40 and 0.68. Conclusions: Our results show that the discriminative performance of the C-19 index model is low for influenza cohorts and even worse among patients with COVID-19 in the United States, Spain, and South Korea. These results suggest that the C-19 index should not be used to aid decision-making during the COVID-19 pandemic. Our findings highlight the importance of performing external validation across a range of settings, especially when a prediction model is being extrapolated to a different population. In the field of prediction, extensive validation is required to create appropriate trust in a model.
Ramspek, C.L.; Jager, K.J.; Dekker, F.W.; Zoccali, C.; Diepen, M. van 2021
Prognostic models that aim to improve the prediction of clinical events, individualized treatment, and decision-making are increasingly being developed and published. However, relatively few models are externally validated, and validation by independent researchers is rare. External validation is necessary to determine a prediction model's reproducibility and generalizability to new and different patients. Various methodological considerations are important when assessing or designing an external validation study. In this article, an overview is provided of these considerations, starting with what external validation is, what types of external validation can be distinguished, and why such studies are a crucial step towards the clinical implementation of accurate prediction models. Statistical analyses and interpretation of external validation results are reviewed in an intuitive manner, and considerations for selecting an appropriate existing prediction model and external validation population are discussed. This study enables clinicians and researchers to gain a deeper understanding of how to interpret model validation results and how to translate these results to their own patient population.
Rueten-Budde, A.J.; Praag, V.M. van; Sande, M.A.J. van de; Fiocco, M.; PERSARC Study Grp 2020
Background and Objectives: A dynamic prediction model for patients with soft tissue sarcoma of the extremities was previously developed to predict updated overall survival probabilities throughout patient follow-up. This study updates and externally validates the dynamic model. Methods: Data from 3826 patients with high-grade extremity soft tissue sarcoma, treated surgically with curative intent, were used to update the dynamic PERsonalised SARcoma Care (PERSARC) model. Patients were added to the model development cohort and grade was included in the model. External validation was performed with data from 1111 patients treated at a single tertiary center. Results: Calibration plots show good model calibration. Dynamic C-indices suggest that the model can discriminate between high- and low-risk patients. The dynamic C-indices at 0, 1, 2, 3, 4, and 5 years after surgery were 0.697, 0.790, 0.822, 0.818, 0.812, and 0.827, respectively. Conclusion: Results from the external validation show that the dynamic PERSARC model is reliable in predicting the probability of surviving an additional 5 years from a specific prediction time point during follow-up. The model combines patient-specific, treatment-specific, and time-dependent variables such as local recurrence and distant metastasis to provide accurate survival predictions throughout follow-up, and is available through the PERSARC app.
Dijkland, S.A.; Helmrich, I.R.A.R.; Nieboer, D.; Jagt, M. van der; Dippel, D.W.J.; Menon, D.K.; ... ; CENTER-TBI Participants Investig 2020
The International Mission on Prognosis and Analysis of Clinical Trials in Traumatic Brain Injury (IMPACT) and Corticoid Randomisation After Significant Head injury (CRASH) prognostic models predict functional outcome after moderate and severe traumatic brain injury (TBI). We aimed to assess their performance in a contemporary cohort of patients across Europe. The Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) core study is a prospective, observational cohort study in patients presenting with TBI and an indication for brain computed tomography. The CENTER-TBI core cohort consists of 4509 TBI patients available for analyses from 59 centers in 18 countries across Europe and Israel. The IMPACT validation cohort included 1173 patients with GCS ≤12, age ≥14, and 6-month Glasgow Outcome Scale-Extended (GOSE) available. The CRASH validation cohort contained 1742 patients with GCS ≤14, age ≥16, and 14-day mortality or 6-month GOSE available. Performance of the three IMPACT and two CRASH model variants was assessed with discrimination (area under the receiver operating characteristic curve; AUC) and calibration (comparison of observed vs. predicted outcome rates). For IMPACT, model discrimination was good, with AUCs ranging between 0.77 and 0.85 in 1173 patients and between 0.80 and 0.88 in the broader CRASH selection (n = 1742). For CRASH, AUCs ranged between 0.82 and 0.88 in 1742 patients and between 0.66 and 0.80 in the stricter IMPACT selection (n = 1173). Calibration of the IMPACT and CRASH models was generally moderate, with calibration-in-the-large and calibration slopes ranging between -2.02 and 0.61 and between 0.48 and 1.39, respectively. The IMPACT and CRASH models adequately identify patients at high risk for mortality or unfavorable outcome, which supports their use in research settings and for benchmarking in the context of quality-of-care assessment.
Mikolic, A.; Polinder, S.; Steyerberg, E.W.; Helmrich, I.R.A.R.; Giacino, J.T.; Maas, A.I.R.; ... ; CENTER TBI Study Pa 2020
The majority of traumatic brain injuries (TBIs) are categorized as mild, according to a baseline Glasgow Coma Scale (GCS) score of 13-15. Prognostic models that were developed to predict functional outcome and persistent post-concussive symptoms (PPCS) after mild TBI have rarely been externally validated. We aimed to externally validate models predicting 3-12-month Glasgow Outcome Scale Extended (GOSE) or PPCS in adults with mild TBI. We analyzed data from the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) project, which included 2862 adults with mild TBI, with 6-month GOSE available for 2374 participants and Rivermead Post-Concussion Symptoms Questionnaire (RPQ) results available for 1605 participants. Model performance was evaluated based on calibration (graphically, and characterized by slope and intercept) and discrimination (C-index). We validated five published models for 6-month GOSE and three for 6-month PPCS scores. The models used different cutoffs for outcome and some included symptoms measured 2 weeks post-injury. Discriminative ability varied substantially (C-index between 0.58 and 0.79). The models developed in the Corticosteroid Randomisation After Significant Head Injury (CRASH) trial for prediction of GOSE <5 discriminated best (C-index 0.78 and 0.79) but were poorly calibrated. The best-performing models for PPCS included 2-week symptoms (C-index 0.75 and 0.76). In conclusion, none of the prognostic models for early prediction of GOSE and PPCS has both good calibration and discrimination in persons with mild TBI. In future studies, prognostic models should be tailored to the population with mild TBI, predicting relevant endpoints based on readily available predictors.
Luijken, K.; Groenwold, R.H.H.; Calster, B. van; Steyerberg, E.W.; Smeden, M. van 2019