Objectives: To (1) explore trends of risk of bias (ROB) in prediction research over time following key methodological publications, using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and (2) assess the inter-rater agreement of the PROBAST. Study Design and Setting: PubMed and Web of Science were searched for reviews with extractable PROBAST scores on domain and signaling question (SQ) level. ROB trends were visually correlated with yearly citations of key publications. Inter-rater agreement was assessed [...] Results: One hundred and thirty-nine systematic reviews were included, of which 85 reviews (containing 2,477 single studies) on domain level and 54 reviews (containing 2,458 single studies) on SQ level. High ROB was prevalent, especially in the Analysis domain, and overall trends of ROB remained relatively stable over time. The inter-rater agreement was low, both on domain (Kappa 0.04-0.26) and SQ level (Kappa -0.14 to 0.49). Conclusion: Prediction model studies are at high ROB and time trends in ROB as assessed with the PROBAST remain relatively stable. These results might be explained by key publications having no influence on ROB or recency of key publications. Moreover, the trend may suffer from the low inter-rater agreement and ceiling effect of the PROBAST. The inter-rater agreement could potentially be improved by altering the PROBAST or providing training on how to apply the PROBAST. © 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
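The Kappa values reported above are a chance-corrected measure of agreement. As a minimal sketch, assuming the unweighted two-rater form (Cohen's kappa) and using hypothetical ROB judgements that are not taken from the study, the statistic can be computed as follows:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ROB judgements for six studies (illustrative data only).
review_1 = ["high", "high", "low", "high", "unclear", "high"]
review_2 = ["high", "low", "low", "high", "high", "high"]
kappa = cohens_kappa(review_1, review_2)
print(round(kappa, 3))
```

Kappa is 0 when agreement equals what the marginal frequencies alone would produce and 1 for perfect agreement, so values such as 0.04-0.26 indicate agreement only slightly better than chance.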
Treatment effects are often anticipated to vary across groups of patients with different baseline risk. The Predictive Approaches to Treatment Effect Heterogeneity (PATH) statement focused on baseline risk as a robust predictor of treatment effect and provided guidance on risk-based assessment of treatment effect heterogeneity in a randomized controlled trial. The aim of this study is to extend this approach to the observational setting using a standardized scalable framework. The proposed framework consists of five steps: (1) definition of the research aim, i.e., the population, the treatment, the comparator and the outcome(s) of interest; (2) identification of relevant databases; (3) development of a prediction model for the outcome(s) of interest; (4) estimation of relative and absolute treatment effect within strata of predicted risk, after adjusting for observed confounding; (5) presentation of the results. We demonstrate our framework by evaluating heterogeneity of the effect of thiazide or thiazide-like diuretics versus angiotensin-converting enzyme inhibitors on three efficacy and nine safety outcomes across three observational databases. We provide a publicly available R software package for applying this framework to any database mapped to the Observational Medical Outcomes Partnership Common Data Model. In our demonstration, patients at low risk of acute myocardial infarction receive negligible absolute benefits for all three efficacy outcomes, though they are more pronounced in the highest risk group, especially for acute myocardial infarction. Our framework allows for the evaluation of differential treatment effects across risk strata, which offers the opportunity to consider the benefit-harm trade-off between alternative treatments.
Rekkas, A.; Rijnbeek, P.R.; Kent, D.M.; Steyerberg, E.W.; Klaveren, D. van 2023
Background: Baseline outcome risk can be an important determinant of absolute treatment benefit and has been used in guidelines for "personalizing" medical decisions. We compared easily applicable risk-based methods for optimal prediction of individualized treatment effects. Methods: We simulated RCT data using diverse assumptions for the average treatment effect, a baseline prognostic index of risk, the shape of its interaction with treatment (none, linear, quadratic or non-monotonic), and the magnitude of treatment-related harms (none or constant independent of the prognostic index). We predicted absolute benefit using: models with a constant relative treatment effect; stratification in quarters of the prognostic index; models including a linear interaction of treatment with the prognostic index; models including an interaction of treatment with a restricted cubic spline transformation of the prognostic index; an adaptive approach using Akaike's Information Criterion. We evaluated predictive performance using root mean squared error and measures of discrimination and calibration for benefit. Results: The linear-interaction model displayed optimal or close-to-optimal performance across many simulation scenarios with moderate sample size (N = 4,250; approximately 785 events). The restricted cubic splines model was optimal for strong non-linear deviations from a constant treatment effect, particularly when sample size was larger (N = 17,000). The adaptive approach also required larger sample sizes. These findings were illustrated in the GUSTO-I trial. Conclusions: An interaction between baseline risk and treatment assignment should be considered to improve treatment effect predictions.
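To make the constant-relative-effect and linear-interaction models concrete, the following sketch predicts absolute benefit from a prognostic index on the logit scale. The parameterisation (`delta0` for the treatment log odds ratio, `delta1` for its linear interaction with the prognostic index) and all numeric values are assumptions chosen for illustration, not the coefficients or code used in the study.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def predicted_benefit(lp, delta0, delta1=0.0):
    """Absolute benefit = predicted risk untreated minus predicted risk treated.

    lp     : prognostic index (linear predictor of control-arm risk, logit scale)
    delta0 : treatment log odds ratio at lp = 0 (hypothetical parameter name)
    delta1 : linear interaction of treatment with lp; 0 gives a constant
             relative (odds ratio) effect
    """
    risk_control = expit(lp)
    risk_treated = expit(lp + delta0 + delta1 * lp)
    return risk_control - risk_treated


# Constant odds ratio of 0.8 applied to a low-risk and a high-risk patient.
log_or = math.log(0.8)
benefit_low = predicted_benefit(logit(0.05), log_or)
benefit_high = predicted_benefit(logit(0.30), log_or)
print(round(benefit_low, 4), round(benefit_high, 4))
```

Even with `delta1 = 0`, the predicted absolute benefit grows with baseline risk; a non-zero `delta1` additionally lets the relative effect itself change along the prognostic index.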
Background: While clinical prediction models (CPMs) are used increasingly commonly to guide patient care, the performance and clinical utility of these CPMs in new patient cohorts is poorly understood. Methods: We performed 158 external validations of 104 unique CPMs across 3 domains of cardiovascular disease (primary prevention, acute coronary syndrome, and heart failure). Validations were performed in publicly available clinical trial cohorts and model performance was assessed using measures of discrimination, calibration, and net benefit. To explore potential reasons for poor model performance, CPM-clinical trial cohort pairs were stratified based on relatedness, a domain-specific set of characteristics to qualitatively grade the similarity of derivation and validation patient populations. We also examined the model-based C-statistic to assess whether changes in discrimination were because of differences in case-mix between the derivation and validation samples. The impact of model updating on model performance was also assessed. Results: Discrimination decreased significantly between model derivation (0.76 [interquartile range 0.73-0.78]) and validation (0.64 [interquartile range 0.60-0.67], P<0.001), but approximately half of this decrease was because of narrower case-mix in the validation samples. CPMs had better discrimination when tested in related compared with distantly related trial cohorts. Calibration slope was also significantly higher in related trial cohorts (0.77 [interquartile range, 0.59-0.90]) than distantly related cohorts (0.59 [interquartile range 0.43-0.73], P=0.001).
When considering the full range of possible decision thresholds between half and twice the outcome incidence, 91% of models had a risk of harm (net benefit below default strategy) at some threshold; this risk could be reduced substantially via updating model intercept, calibration slope, or complete re-estimation. Conclusions: There are significant decreases in model performance when applying cardiovascular disease CPMs to new patient populations, resulting in substantial risk of harm. Model updating can mitigate these risks. Care should be taken when using CPMs to guide clinical decision-making.
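The "risk of harm" criterion can be illustrated with the standard decision-curve definition of net benefit. The sketch below assumes that the default strategy is the better of treat-all and treat-none at each threshold and scans an evenly spaced grid between half and twice the outcome incidence; the function names, the grid, and the toy data are hypothetical and not taken from the study.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight
    return true_pos / n - odds * false_pos / n


def default_net_benefit(y_true, threshold):
    """Best of the two default strategies: treat all, or treat none (0)."""
    incidence = sum(y_true) / len(y_true)
    treat_all = incidence - (1.0 - incidence) * threshold / (1.0 - threshold)
    return max(treat_all, 0.0)


def harmful_thresholds(y_true, risk, steps=16):
    """Thresholds in [0.5 * incidence, 2 * incidence] where the model does
    worse than the default strategy (assumed evenly spaced grid)."""
    incidence = sum(y_true) / len(y_true)
    low, high = 0.5 * incidence, 2.0 * incidence
    grid = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return [t for t in grid
            if net_benefit(y_true, risk, t) < default_net_benefit(y_true, t)]


# Toy validation cohort with 20% outcome incidence (illustrative data only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
good_risk = [0.50, 0.15, 0.45, 0.10, 0.05, 0.05, 0.10, 0.05, 0.05, 0.05]
bad_risk = [0.05, 0.05, 0.50, 0.50, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
print(len(harmful_thresholds(y_true, good_risk)),
      len(harmful_thresholds(y_true, bad_risk)))
```

A model is flagged as potentially harmful if the returned list is non-empty, mirroring the "at some threshold" wording in the abstract.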
Objective: To assess whether the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and a shorter version of this tool can identify clinical prediction models (CPMs) that perform poorly at external validation. Study Design and Setting: We evaluated risk of bias (ROB) on 102 CPMs from the Tufts CPM Registry, comparing PROBAST to a short form consisting of six PROBAST items anticipated to best identify high ROB. We then applied the short form to all CPMs in the Registry with at least 1 validation (n = 556) and assessed the change in discrimination (dAUC) in external validation cohorts (n = 1,147). Results: PROBAST classified 98/102 CPMs as high ROB. The short form identified 96 of these 98 as high ROB (98% sensitivity), with perfect specificity. In the full CPM registry, 527 of 556 CPMs (95%) were classified as high ROB, 20 (3.6%) low ROB, and 9 (1.6%) unclear ROB. Only one model with unclear ROB was reclassified to high ROB after full PROBAST assessment of all low and unclear ROB models. Median change in discrimination was significantly smaller in low ROB models (dAUC -0.9%, IQR -6.2-4.2%) compared to high ROB models (dAUC -11.7%, IQR -33.3-2.6%; P < 0.001). Conclusion: High ROB is pervasive among published CPMs. It is associated with poor discriminative performance at validation, supporting the application of PROBAST or a shorter version in CPM reviews. (c) 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
BACKGROUND The SYNTAX score II 2020 (SSII-2020) was derived from cross correlation and externally validated in randomized trials to predict death and major adverse cardiac and cerebrovascular events (MACE) following percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG) in patients with 3-vessel disease (3VD) and/or left main coronary artery disease (LMCAD). OBJECTIVES The authors aimed to investigate the SSII-2020's value in identifying the safest modality of revascularization in a non-randomized setting. METHODS Five-year mortality and MACE were assessed in 7,362 patients with 3VD and/or LMCAD enrolled in a Japanese PCI/CABG registry. The discriminative abilities of the SSII-2020 were assessed using Harrell's C statistic. Agreement between observed and predicted event rates following PCI or CABG and treatment benefit (absolute risk difference [ARD]) for these outcomes were assessed by calibration plots. RESULTS The SSII-2020 for 5-year mortality well predicted the prognosis after PCI and CABG (C-index = 0.72, intercept = -0.11, slope = 0.92). When patients were grouped according to the predicted 5-year mortality ARD, <4.5% (equipoise of PCI and CABG) and ≥4.5% (CABG better), the observed mortality rates after PCI and CABG were not significantly different in patients with lower predicted ARD (observed ARD: 2.1% [95% CI: -0.4% to 4.4%]), and the significant difference in survival in favor of CABG was observed in patients with higher predicted ARD (observed ARD: 9.7% [95% CI: 6.1%-13.3%]). For MACE, the SSII-2020 could not recommend a specific treatment with sufficient accuracy. CONCLUSIONS The SSII-2020 for predicting 5-year death has the potential to support decision making on revascularization in patients with 3VD and/or LMCAD.
(J Am Coll Cardiol 2021;78:1227-1238) (c) 2021 by the American College of Cardiology Foundation.
Background: There are many clinical prediction models (CPMs) available to inform treatment decisions for patients with cardiovascular disease. However, the extent to which they have been externally tested, and how well they generally perform has not been broadly evaluated. Methods: A SCOPUS citation search was run on March 22, 2017 to identify external validations of cardiovascular CPMs in the Tufts Predictive Analytics and Comparative Effectiveness CPM Registry. We assessed the extent of external validation, performance heterogeneity across databases, and explored factors associated with model performance, including a global assessment of the clinical relatedness between the derivation and validation data. Results: We identified 2030 external validations of 1382 CPMs. Eight hundred seven (58%) of the CPMs in the Registry have never been externally validated. On average, there were 1.5 validations per CPM (range, 0-94). The median external validation area under the receiver operating characteristic curve was 0.73 (25th-75th percentile [interquartile range (IQR)], 0.66-0.79), representing a median percent decrease in discrimination of -11.1% (IQR, -32.4% to +2.7%) compared with performance on derivation data. 81% (n=1333) of validations reporting area under the receiver operating characteristic curve showed discrimination below that reported in the derivation dataset. 53% (n=983) of the validations report some measure of CPM calibration. For CPMs evaluated more than once, there was typically a large range of performance.
Of 1702 validations classified by relatedness, the percent change in discrimination was -3.7% (IQR, -13.2 to 3.1) for closely related validations (n=123), -9.0% (IQR, -27.6 to 3.9) for related validations (n=862), and -17.2% (IQR, -42.3 to 0) for distantly related validations (n=717; P<0.001). Conclusions: Many published cardiovascular CPMs have never been externally validated, and for those that have, apparent performance during development is often overly optimistic. A single external validation appears insufficient to broadly understand the performance heterogeneity across different settings.
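The abstract does not spell out how the percent change in discrimination is computed. One plausible definition, assumed here, scales the change in AUC by the derivation AUC's distance from 0.5 (the value for a model with no discriminative ability), so that a result of -100% means all discrimination beyond chance was lost. The function name is hypothetical.

```python
def percent_change_in_discrimination(auc_derivation, auc_validation):
    """Percent change in discrimination relative to chance (AUC = 0.5).

    Assumed definition: 100 * (AUC_val - AUC_dev) / (AUC_dev - 0.5).
    """
    if auc_derivation <= 0.5:
        raise ValueError("derivation AUC must exceed 0.5 for this measure")
    return 100.0 * (auc_validation - auc_derivation) / (auc_derivation - 0.5)


# Illustrative values only: derivation AUC 0.76, validation AUC 0.64.
change = percent_change_in_discrimination(0.76, 0.64)
print(round(change, 1))
```

Under this definition the same absolute drop in AUC counts for more when the derivation AUC was already close to 0.5.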
Background: Randomised controlled trials are considered the gold standard for testing the efficacy of novel therapeutic interventions, and typically report the average treatment effect as a summary result. As the result of treatment can vary between patients, basing treatment decisions for individual patients on the overall average treatment effect could be suboptimal. We aimed to develop an individualised decision making tool to select an optimal revascularisation strategy in patients with complex coronary artery disease. Methods: The SYNTAX Extended Survival (SYNTAXES) study is an investigator-driven extension follow-up of a multicentre, randomised controlled trial done in 85 hospitals across 18 North American and European countries between March, 2005, and April, 2007. Patients with de-novo three-vessel and left main coronary artery disease were randomly assigned (1:1) to either the percutaneous coronary intervention (PCI) group or coronary artery bypass grafting (CABG) group. The SYNTAXES study ascertained 10-year all-cause deaths. We used Cox regression to develop a clinical prognostic index for predicting death over a 10-year period, which was combined, in a second stage, with assigned treatment (PCI or CABG) and two prespecified effect-modifiers, which were selected on the basis of previous evidence: disease type (three-vessel disease or left main coronary artery disease) and anatomical SYNTAX score. We used similar techniques to develop a model to predict the 5-year risk of major adverse cardiovascular events (defined as a composite of all-cause death, non-fatal stroke, or non-fatal myocardial infarction) in patients receiving PCI or CABG.
We then assessed the ability of these models to predict the risk of death or a major adverse cardiovascular event, and their differences (ie, the estimated benefit of CABG versus PCI by calculating the absolute risk difference between the two strategies) by cross-validation with the SYNTAX trial (n=1800 participants) and external validation in the pooled population (n=3380 participants) of the FREEDOM, BEST, and PRECOMBAT trials. The concordance (C)-index was used to measure discriminative ability, and calibration plots were used to assess the degree of agreement between predictions and observations. Findings: At cross-validation, the newly developed SYNTAX score II, termed SYNTAX score II 2020, showed a helpful discriminative ability in both treatment groups for predicting 10-year all-cause deaths (C-index=0.73 [95% CI 0.69-0.76] for PCI and 0.73 [0.69-0.76] for CABG) and 5-year major adverse cardiovascular events (C-index=0.65 [0.61-0.69] for PCI and C-index=0.71 [0.67-0.75] for CABG). At external validation, the SYNTAX score II 2020 showed helpful discrimination (C-index=0.67 [0.63-0.70] for PCI and C-index=0.62 [0.58-0.66] for CABG) and good calibration for predicting 5-year major adverse cardiovascular events. The estimated treatment benefit of CABG over PCI varied substantially among patients in the trial population, and the benefit predictions were well calibrated. Interpretation: The SYNTAX score II 2020 for predicting 10-year deaths and 5-year major adverse cardiovascular events can help to identify individuals who will benefit from either CABG or PCI, thereby supporting heart teams, patients, and their families to select optimal revascularisation strategies. Copyright (C) 2020 Elsevier Ltd. All rights reserved.
Rekkas, A.; Paulus, J.K.; Raman, G.; Wong, J.B.; Steyerberg, E.W.; Rijnbeek, P.R.; ... ; Klaveren, D. van 2020
Background: Recent evidence suggests that there is often substantial variation in the benefits and harms across a trial population. We aimed to identify regression modeling approaches that assess heterogeneity of treatment effect within a randomized clinical trial. Methods: We performed a literature review using a broad search strategy, complemented by suggestions of a technical expert panel. Results: The approaches are classified into 3 categories: 1) Risk-based methods (11 papers) use only prognostic factors to define patient subgroups, relying on the mathematical dependency of the absolute risk difference on baseline risk; 2) Treatment effect modeling methods (9 papers) use both prognostic factors and treatment effect modifiers to explore characteristics that interact with the effects of therapy on a relative scale. These methods couple data-driven subgroup identification with approaches to prevent overfitting, such as penalization or use of separate data sets for subgroup identification and effect estimation. 3) Optimal treatment regime methods (12 papers) focus primarily on treatment effect modifiers to classify the trial population into those who benefit from treatment and those who do not. Finally, we also identified papers which describe model evaluation methods (4 papers). Conclusions: Three classes of approaches were identified to assess heterogeneity of treatment effect. Methodological research, including both simulations and empirical evaluations, is required to compare the available methods in different settings and to derive well-informed guidance for their application in RCT analysis.
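The "mathematical dependency of the absolute risk difference on baseline risk" that risk-based methods rely on can be written out explicitly. As a worked sketch (the symbols $p_0$ for control-arm risk, $p_1$ for treated risk, $RR$ for a constant relative risk, and $OR$ for a constant odds ratio are notation introduced here, not taken from the review):

```latex
% Constant relative risk RR:
\mathrm{ARD} = p_0 - p_1 = p_0 - RR \cdot p_0 = p_0\,(1 - RR)

% Constant odds ratio OR:
p_1 = \frac{OR \cdot p_0}{1 - p_0 + OR \cdot p_0},
\qquad
\mathrm{ARD} = p_0 - p_1 = \frac{p_0\,(1 - p_0)\,(1 - OR)}{1 - p_0 + OR \cdot p_0}
```

In both cases the absolute risk difference shrinks toward zero as $p_0 \to 0$, so a treatment with the same relative effect in every patient still yields very different absolute benefit across baseline risk. For example, with $OR = 0.8$, a patient with $p_0 = 0.05$ has $\mathrm{ARD} \approx 0.010$, whereas a patient with $p_0 = 0.30$ has $\mathrm{ARD} \approx 0.045$.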
The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed to promote the conduct of, and provide guidance for, predictive analyses of heterogeneity of treatment effects (HTE) in clinical trials. The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risk with versus without the intervention, taking into account all relevant patient attributes simultaneously, to support more personalized clinical decision making than can be made on the basis of only an overall average treatment effect. The authors distinguished 2 categories of predictive HTE approaches (a "risk-modeling" and an "effect-modeling" approach) and developed 4 sets of guidance statements: criteria to determine when risk-modeling approaches are likely to identify clinically meaningful HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. They discuss limitations of these methods and enumerate research priorities for advancing methods designed to generate more personalized evidence. This explanation and elaboration document describes the intent and rationale of each recommendation and discusses related analytic considerations, caveats, and reservations.
Heterogeneity of treatment effect (HTE) refers to the nonrandom variation in the magnitude or direction of a treatment effect across levels of a covariate, as measured on a selected scale, against a clinical outcome. In randomized controlled trials (RCTs), HTE is typically examined through a subgroup analysis that contrasts effects in groups of patients defined "1 variable at a time" (for example, male vs. female or old vs. young). The authors of this statement present guidance on an alternative approach to HTE analysis, "predictive HTE analysis." The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risks with versus without the intervention, taking into account all relevant patient attributes simultaneously. The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed using a multidisciplinary technical expert panel, targeted literature reviews, simulations to characterize potential problems with predictive approaches, and a deliberative process engaging the expert panel. The authors distinguish 2 categories of predictive HTE approaches: a "risk-modeling" approach, wherein a multivariable model predicts the risk for an outcome and is applied to disaggregate patients within RCTs to define risk-based variation in benefit, and an "effect-modeling" approach, wherein a model is developed on RCT data by incorporating a term for treatment assignment and interactions between treatment and baseline covariates. Both approaches can be used to predict differential absolute treatment effects, the most relevant scale for clinical decision making.
The authors developed 4 sets of guidance: criteria to determine when risk-modeling approaches are likely to identify clinically important HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. The PATH Statement, together with its explanation and elaboration document, may guide future analyses and reporting of RCTs.
Klaveren, D. van; Balan, T.A.; Steyerberg, E.W.; Kent, D.M. 2019