Background: In studies of time-to-event data, it is common to collect information about events that occurred before inclusion in a prospective cohort. When the studied risk factors are independent of time, including both pre- and post-inclusion events in the analyses, generally referred to as relying on an ambispective design, increases statistical power but may lead to selection bias. In the field of venous thromboembolism (VT), ABO blood groups have been the subject of extensive research due to their substantial effect on VT risk. However, few studies have investigated their effect on the risk of VT recurrence. Motivated by the study of the association of genetically determined ABO blood groups with VT recurrence, we propose a methodology to include pre-inclusion events in the analysis of ambispective studies while avoiding the selection bias due to mortality.
Methods: This work relies on two independent cohorts of VT patients: the French MARTHA study, built on an ambispective design, and the Dutch MEGA study, built on a standard prospective design. For the analysis of the MARTHA study, a weighted Cox model was developed in which weights were defined as the inverse of the survival probability at the time of data collection about the events. Thanks to the collection of information on the vital status of patients, we could estimate the survival probabilities using a delayed-entry Cox model on the death risk. Results obtained in both studies were then meta-analysed.
Results: In the combined sample totalling 2,752 patients, including 993 recurrences, the A1 blood group showed an increased risk (hazard ratio (HR) 1.18, p = 4.2 × 10⁻³) compared with the O1 group, homogeneously in MARTHA and in MEGA. The same trend (HR = 1.19, p = 0.06) was observed for the less frequent A2 group.
Conclusion: The proposed methodology increases the power of studies relying on an ambispective design, which is frequent in epidemiologic studies of recurrent events. This approach allowed us to clarify the association of ABO blood groups with the risk of VT recurrence. Moreover, this methodology has an immediate field of application in the context of genome-wide association studies.
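The inverse-probability-of-survival weighting described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: a constant death hazard stands in for the delayed-entry Cox model, and all names are hypothetical.

```python
import math

def survival_weight(t_collection, death_hazard):
    """Weight for a patient whose pre-inclusion events were recorded
    at time t_collection: the inverse of the probability of having
    survived up to that time. Here S(t) = exp(-hazard * t), i.e. a
    constant-hazard stand-in for the delayed-entry Cox model."""
    survival_prob = math.exp(-death_hazard * t_collection)
    return 1.0 / survival_prob

# Patients who had to survive longer before their event history was
# collected are up-weighted, countering the selection due to mortality.
w_recent = survival_weight(t_collection=1.0, death_hazard=0.05)
w_remote = survival_weight(t_collection=10.0, death_hazard=0.05)
```

These weights would then multiply each patient's contribution to the weighted Cox partial likelihood.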
Kantidakis, G.; Putter, H.; Litière, S.; Fiocco, M. 2023
Background: In health research, several chronic diseases are susceptible to competing risks (CRs). Statistical models (SM) were initially developed to estimate the cumulative incidence of an event in the presence of CRs. As there is growing interest in applying machine learning (ML) for clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low-dimensional setting).
Methods: A dataset of 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. The ML models include the original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel architectural specifications (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years.
Results: Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (left-out samples) using the Brier score and the Area Under the Curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models reach a performance comparable to the SM at 2, 5, and 10 years for both Brier score and AUC (95% confidence intervals overlap). However, the SM are frequently better calibrated.
Conclusions: Overall, the ML techniques are less practical as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real-life survival data, these techniques should only be applied as exploratory tools complementary to SM for assessing model performance. More attention to model calibration is urgently needed.
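The cumulative incidence that both model families target can be estimated nonparametrically. Below is a minimal Aalen-Johansen-style sketch in pure Python with made-up data; it processes one record at a time (so ties are handled in arbitrary order) and is not the implementation compared in the study.

```python
def cumulative_incidence(times, events, cause, t):
    """Aalen-Johansen-type estimate of the cumulative incidence of
    `cause` by time t in the presence of competing events.
    events: 0 = censored, 1 = event of interest, 2 = competing event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0   # overall event-free survival just before the current time
    cif = 0.0
    for time, ev in data:
        if time > t:
            break
        if ev == cause:
            cif += surv * (1.0 / at_risk)     # hazard of this cause now
        if ev != 0:
            surv *= 1.0 - 1.0 / at_risk       # any event depletes survival
        at_risk -= 1
    return cif
```

Note that the cause-specific cumulative incidences plus the event-free survival always sum to one, which is the property a naive 1 - Kaplan-Meier estimate violates under competing risks.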
Background: Clinical prediction models are often not properly evaluated in specific settings, nor updated, for instance, with information from new markers. These key steps are needed so that models are fit for purpose and remain relevant in the long term. We aimed to present an overview of methodological guidance for the evaluation (i.e., validation and impact assessment) and updating of clinical prediction models.
Methods: We systematically searched nine databases from January 2000 to January 2022 for articles in English with methodological recommendations for the post-derivation stages of interest. Qualitative analysis was used to summarize the 70 selected guidance papers.
Results: Key aspects of validation are the assessment of statistical performance using measures of discrimination (e.g., C-statistic) and calibration (e.g., calibration-in-the-large and calibration slope). For assessing impact or usefulness in clinical decision-making, recent papers advise using decision-analytic measures (e.g., the Net Benefit) over simplistic classification measures that ignore clinical consequences (e.g., accuracy, overall Net Reclassification Index). Commonly recommended methods for model updating are recalibration (i.e., adjustment of the intercept or baseline hazard and/or slope), revision (i.e., re-estimation of individual predictor effects), and extension (i.e., addition of new markers). Additional methodological guidance is needed for newer types of updating (e.g., meta-model and dynamic updating) and for machine learning-based models.
Conclusion: Substantial guidance was found for model evaluation and for more conventional updating of regression-based models. An important development in model evaluation is the introduction of a decision-analytic framework for assessing clinical usefulness. Consensus is emerging on methods for model updating.
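Of the updating methods listed, recalibration-in-the-large is the simplest: shift the model's intercept so that the average predicted risk matches the observed event rate in the new setting. A pure-Python sketch with hypothetical data follows; the bisection solves the maximum-likelihood score equation for an intercept-only logistic update (mean prediction = observed rate). Slope revision would additionally rescale the linear predictor.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate_intercept(linear_predictors, outcomes, lo=-5.0, hi=5.0):
    """Recalibration-in-the-large: find the intercept update `a` such
    that the mean of sigmoid(lp + a) equals the observed event rate."""
    target = sum(outcomes) / len(outcomes)
    n = len(linear_predictors)
    for _ in range(60):  # bisection; mean prediction is monotone in a
        mid = (lo + hi) / 2.0
        mean_pred = sum(sigmoid(lp + mid) for lp in linear_predictors) / n
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, a model whose predictions are systematically too low in a higher-risk setting receives a positive intercept update.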
Ceyisakar, I.E.; Leeuwen, N. van; Steyerberg, E.W.; Lingsma, H.F. 2022
Background: Instrumental variable (IV) analysis holds the potential to estimate treatment effects from observational data. IV analysis can circumvent unmeasured confounding but makes a number of assumptions, such as that the IV shares no common cause with the outcome. When using treatment preference as an instrument, a common cause, such as a preference regarding related treatments, may exist. We aimed to explore the validity and precision of a variant of IV analysis in which we additionally adjust for the provider: adjusted IV analysis.
Methods: A treatment effect on an ordinal outcome was simulated (beta = 0.5 in logistic regression) for 15,000 patients, based on a large dataset (the IMPACT data, n = 8799), using different scenarios including measured and unmeasured confounders and a common cause of IV and outcome. We compared estimated treatment effects from patient-level adjustment for confounders, IV analysis with treatment preference as the instrument, and adjusted IV analysis, with hospital added as a fixed effect in the regression models.
Results: Patient-level adjustment resulted in biased estimates in all analyses that included unmeasured confounders; IV analysis was less confounded, but also less reliable. With correlation between treatment preference and hospital characteristics (a common cause), estimates were skewed for regular IV analysis, but not for adjusted IV analysis.
Conclusion: When using IV analysis for comparing hospitals, some limitations of regular IV analysis can be overcome by adjusting for a common cause.
Amini, M.; Leeuwen, N. van; Eijkenaar, F.; Graaf, R. van de; Samuels, N.; Oostenbrugge, R. van; ... ; MR CLEAN Registry Investigators 2022
Introduction: Various statistical approaches can be used to deal with unmeasured confounding when estimating treatment effects in observational studies, each with its own pros and cons. This study aimed to compare treatment effects as estimated by different statistical approaches for two interventions in observational stroke care data.
Patients and methods: We used prospectively collected data from the MR CLEAN registry, including all patients (n = 3279) with ischemic stroke who underwent endovascular treatment (EVT) from 2014 to 2017 in 17 Dutch hospitals. Treatment effects of two interventions, i.e., receiving an intravenous thrombolytic (IVT) and undergoing general anesthesia (GA) before EVT, on good functional outcome (modified Rankin Scale <= 2) were estimated. We used three statistical regression-based approaches that vary in their assumptions regarding the source of unmeasured confounding: individual-level (two subtypes), ecological, and instrumental variable analyses. In the latter, each hospital's preference for using the interventions was used as the instrument.
Results: Use of IVT (range 66-87%) and GA (range 0-93%) varied substantially between hospitals. For IVT, the individual-level analyses (OR ≈ 1.33) yielded significant positive effect estimates, whereas the instrumental variable analysis found no significant treatment effect (OR 1.11; 95% CI 0.58-1.56). The ecological analysis indicated no statistically significant difference in the likelihood (beta = -0.002%; P = 0.99) of good functional outcome at hospitals using IVT 1% more frequently. For GA, we found non-significant point estimates of the treatment effect in opposite directions in the individual-level (ORs ≈ 0.60) versus the instrumental variable approach (OR = 1.04). The ecological analysis also showed a non-significant negative association (0.03% lower probability).
Discussion and conclusion: Both the magnitude and direction of the estimated treatment effects for both interventions depend strongly on the statistical approach, and thus on the assumed source of (unmeasured) confounding. These issues should be understood in relation to the specific characteristics of the data before applying an approach and interpreting the results. Instrumental variable analysis might be considered when unobserved confounding and practice variation are expected in observational multicenter studies.
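The preference-based instrumental variable idea can be illustrated with the classical Wald estimator: scale the between-preference-group difference in outcomes by the difference in treatment rates. This is a linear-scale toy sketch with made-up data, not the (ordinal) logistic models used in the study.

```python
def mean(xs):
    return sum(xs) / len(xs)

def wald_iv_estimate(y_high, treated_high, y_low, treated_low):
    """Wald IV estimate using treatment preference as the instrument:
    (difference in mean outcome between high- and low-preference
    providers) / (difference in their treatment rates)."""
    return (mean(y_high) - mean(y_low)) / (mean(treated_high) - mean(treated_low))

# Toy data: binary outcomes and treatment indicators for patients in
# hospitals with a high vs low preference for the intervention.
effect = wald_iv_estimate(
    y_high=[1, 1, 0, 0], treated_high=[1, 1, 1, 0],
    y_low=[0, 1, 0, 0], treated_low=[0, 0, 0, 1],
)
```

Because only the hospital-level preference enters the estimator, patient-level confounders of the treatment choice are bypassed; the adjusted variant would additionally control for hospital characteristics that correlate with preference.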
Background: Prospective cohort studies are challenging to deliver, with one of the main difficulties lying in retention of participants. The need to socially distance during the COVID-19 pandemic has added to this challenge. The pre-COVID-19 adaptation of the European Quality (EQUAL) study in the UK to a remote form of follow-up, adopted for efficiency, provides lessons for those who are considering changing their study design.
Methods: The EQUAL study is an international prospective cohort study of patients >= 65 years of age with advanced chronic kidney disease. Initially, patients were invited to complete a questionnaire (SF-36, Dialysis Symptom Index and Renal Treatment Satisfaction Questionnaire) at research clinics every 3-6 months, known as "traditional follow-up" (TFU). In 2018, all living patients were invited to switch to "efficient follow-up" (EFU), which used an abbreviated questionnaire consisting of the SF-12 and Dialysis Symptom Index, administered centrally by post. Response rates were calculated as returned questionnaires as a proportion of surviving invitees, and error rates are presented as the average percentage of unanswered questions or unclear answers out of the total questions in returned questionnaires. Response and error rates were calculated 6-monthly in TFU to allow comparisons with EFU.
Results: Of the 504 patients initially recruited, 236 were still alive at the time of conversion to EFU; 111 of these (47%) consented to the change in follow-up. In those who consented, median TFU was 34 months, ranging from 0 to 42 months. Their response rates fell steadily from 88% (98/111) at month 0 of TFU to 20% (3/15) at month 42. The response rate for the first EFU questionnaire was 60% (59/99) of those alive from TFU. Alongside this improvement in response rates, the first EFU also lowered error rates to the baseline levels seen in early follow-up, after they had almost trebled over the course of traditional follow-up.
Conclusions: Overall, this study demonstrates that administration of shorter follow-up questionnaires by post rather than in person does not negatively impact patient response or error rates. These results may be reassuring for researchers who are trying to limit face-to-face contact with patients during the COVID-19 pandemic.
Background: We investigated whether we could use influenza data to develop prediction models for COVID-19, to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis, using historical data from patients with influenza or flu-like symptoms, and tested these in COVID-19 patients.
Methods: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries, containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO-regularized logistic regression; its covariates were used to develop aggregate covariates for the second step, in which the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis, across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after the index date.
Results: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which, combined with age and sex, discriminated which patients would experience any of our three outcomes. The models achieved good performance in both influenza and COVID-19 cohorts. For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations, with some of the COVID-19 validations being less well calibrated than the influenza validations.
Conclusions: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models with 9 predictors that were developed using influenza data perform well for COVID-19 patients in predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance, which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, potentially due to the difference in symptom severity between the two diseases. A possible solution is to recalibrate the models in each location before use.
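A parsimonious score of the kind described reduces to a logistic model over a handful of predictors. The sketch below is structural only: the coefficients and intercept are entirely hypothetical and are NOT the published COVER weights.

```python
import math

# Hypothetical coefficients for illustration only -- not the published
# COVER weights.
COEFS = {"cancer": 0.4, "copd": 0.6, "diabetes": 0.3, "heart_disease": 0.5,
         "hypertension": 0.2, "hyperlipidemia": 0.1, "kidney_disease": 0.4,
         "age_decades": 0.5, "male_sex": 0.3}
INTERCEPT = -5.0  # hypothetical baseline log-odds

def risk_score(patient):
    """30-day risk from a small logistic score: intercept plus the
    sum of coefficient-weighted predictor values, mapped through the
    logistic function."""
    lp = INTERCEPT + sum(COEFS[k] * v for k, v in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))

low = risk_score({})                                          # no risk factors
high = risk_score({"age_decades": 8, "copd": 1, "diabetes": 1})
```

Recalibrating such a score in a new location, as the conclusion suggests, amounts to refitting `INTERCEPT` (and possibly a slope on the linear predictor) against local outcomes.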
Background: Case-control designs are an important yet commonly misunderstood tool in the epidemiologist's arsenal for causal inference. We reconsider classical concepts, assumptions and principles and explore when the results of case-control studies can be endowed with a causal interpretation.
Results: We establish how, and under which conditions, various causal estimands relating to intention-to-treat or per-protocol effects can be identified based on the data that are collected under popular sampling schemes (case-base, survivor, and risk-set sampling, with or without matching). We present a concise summary of our identification results that links the estimands to the (distribution of the) available data and articulates under which conditions these links hold.
Conclusion: The modern epidemiologist's arsenal for causal inference is well suited to making transparent, for case-control designs, which assumptions are necessary or sufficient to endow the respective study results with a causal interpretation and, in turn, to help resolve or prevent misunderstanding. Our approach may inform future research on different estimands, other variations of the case-control design, or settings with additional complexities.
Neve, O.M.; Benthem, P.P.G. van; Stiggelbout, A.M.; Hensen, E.F. 2021
Background: Patient-Reported Outcomes (PROs) are subjective outcomes of disease and/or treatment in clinical research. For effective evaluation of PROs, high response rates are crucial. This study assessed the impact of the delivery method on patients' response rates.
Methods: A cohort of patients with a unilateral vestibular schwannoma (a condition with substantial impact on quality of life, requiring prolonged follow-up) was assigned to three delivery methods: email, regular mail, and hybrid. Patients were matched for age and time since the last visit to the outpatient clinic. The primary outcome was the response rate; determinants other than delivery mode were age, education and time since the last consultation. In addition, the effect of a second reminder by telephone was evaluated.
Results: In total, 602 patients participated in this study. The response rates for delivery by email, hybrid, and mail were 45%, 58% and 60%, respectively. The response rates increased after a reminder by telephone to 62%, 67% and 64%, respectively. Lower response rates were associated with a lower level of education and a longer time interval since the last outpatient clinic visit.
Conclusion: The response rate for PROs varies by delivery method. PRO surveys by regular mail yield the highest response rate, followed by hybrid and email delivery. Hybrid delivery combines good response rates with the ease of digitally returned questionnaires.
Jong, Y. de; Willik, E.M. van der; Milders, J.; Voorend, C.G.N.; Morton, R.L.; Dekker, F.W.; ... ; Diepen, M. van 2021
Background: Reviews of qualitative studies allow for deeper understanding of concepts and findings beyond the single qualitative studies. Concerns about study reporting quality led to the publication of the COREQ guidelines for qualitative studies in 2007, followed by the ENTREQ guidelines for qualitative reviews in 2012. The aims of this meta-review are: 1) to investigate the uptake of the COREQ and ENTREQ checklists in qualitative reviews; and 2) to compare the reporting quality of the primary qualitative studies included within these reviews before and after the COREQ publication.
Methods: Reviews were searched on 02-Sept-2020 and categorized as (1) COREQ-using, (2) ENTREQ-using, (3) using both, or (4) non-COREQ/ENTREQ. Proportions of usage were calculated over time. COREQ scores of the primary studies included in these reviews were compared before and after the COREQ publication using t-tests with Bonferroni correction.
Results: 1,695 qualitative reviews were included (222 COREQ, 369 ENTREQ, 62 both COREQ/ENTREQ and 1,042 non-COREQ/ENTREQ), spanning 12 years (2007-2019) and demonstrating an exponential publication rate. The uptake of the ENTREQ in reviews is higher than that of the COREQ (28% and 17%, respectively), and increases over time. COREQ scores could be extracted from 139 reviews (including 2,775 appraisals). Reporting quality improved following the COREQ publication, with 13 of the 32 signalling questions showing improvement; the average total score increased from 15.15 to 17.74 (p-value < 0.001).
Conclusion: The number of qualitative reviews increased exponentially, but the uptake of the COREQ and ENTREQ was modest overall. Primary qualitative studies show a positive trend in reporting quality, which may have been facilitated by the publication of the COREQ.
Penson, A.; Deuren, S. van; Bronkhorst, E.; Keizer, E.; Heskes, T.; Coenen, M.J.H.; ... ; Loonen, J. 2021
Background: A debilitating late effect for childhood cancer survivors (CCS) is cancer-related fatigue (CRF). Little is known about the prevalence and risk factors of fatigue in this population. Here we describe the methodology of the Dutch Childhood Cancer Survivor Late Effect Study on fatigue (DCCSS LATER fatigue study). The aim of the DCCSS LATER fatigue study is to examine the prevalence of, and factors associated with, CRF, proposing a model which discerns predisposing, triggering, maintaining and moderating factors. Triggering factors are related to the cancer diagnosis and treatment during childhood and are thought to trigger fatigue symptoms. Maintaining factors are daily-life and psychosocial factors which may perpetuate fatigue once triggered. Moderating factors might influence the way fatigue symptoms are expressed in individuals. Predisposing factors, such as genetic factors, already existed before the diagnosis and are thought to increase vulnerability to developing fatigue. The methodology of participant inclusion, data collection and planned analyses of the DCCSS LATER fatigue study is presented.
Results: Data on 1955 CCS and 455 siblings were collected. Analysis of the data is planned, and we aim to report the first results in 2022.
Conclusion: The DCCSS LATER fatigue study will provide information on the epidemiology of CRF and investigate the role of a broad range of associated factors in CCS. Insight into factors associated with severe and persistent fatigue in survivors may help identify individuals at risk of developing CRF and may aid in the development of interventions.
Rodriguez-Girondo, M.; Berg, N. van den; Hof, M.H.; Beekman, M.; Slagboom, E. 2021
Background: Although human longevity tends to cluster within families, genetic studies on longevity have had limited success in identifying longevity loci. One of the main causes of this limited success is the selection of participants. Studies generally include sporadically long-lived individuals, i.e. individuals with the longevity phenotype but without a genetic predisposition for longevity. The inclusion of these individuals causes phenotypic heterogeneity, which results in power reduction and bias. A way to avoid sporadically long-lived individuals and reduce sample heterogeneity is to include family history of longevity as a selection criterion using a longevity family score. A main challenge when developing family scores is the large differences in family size, due to real differences in sibship sizes or to missing data.
Methods: We discuss the statistical properties of two existing longevity family scores, the Family Longevity Selection Score (FLoSS) and the Longevity Relatives Count (LRC) score, and evaluate their performance in dealing with differential family size. We propose a new longevity family score, the mLRC score, an extension of the LRC based on random effects modeling, which is robust to family size and missing values. The performance of the new mLRC as a selection tool is evaluated in an intensive simulation study and illustrated in a large real dataset, the Historical Sample of the Netherlands (HSN).
Results: Empirical scores such as the FLoSS and LRC cannot properly deal with differential family size and missing data. Our simulation study showed that the mLRC is not affected by family size and provides more accurate selection of long-lived families. The analysis of 1105 sibships from the Historical Sample of the Netherlands showed that selection of long-lived individuals based on the mLRC score predicts excess survival in the validation set better than selection based on the LRC score.
Conclusions: Model-based score systems such as the mLRC score help to reduce heterogeneity in the selection of long-lived families. The power of future studies into the genetics of longevity can likely be improved, and their bias reduced, by selecting long-lived cases using the mLRC.
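An empirical count score of the LRC type can be sketched as the proportion of relatives whose lifespans fall in the extreme tail of their reference survival distribution. This is a simplified, hypothetical rendering: the published LRC involves additional details (e.g. which relatives are counted), and the mLRC replaces this raw proportion with a random-effects estimate that shrinks small or incomplete families toward the mean.

```python
def lrc_style_score(relative_percentiles, threshold=0.10):
    """Proportion of a person's relatives whose lifespan fell in the
    top `threshold` of their sex- and birth-cohort-specific survival
    distribution. Percentiles are survival percentiles, so 0.05 means
    'among the 5% longest-lived of the reference cohort'."""
    top = sum(1 for p in relative_percentiles if p <= threshold)
    return top / len(relative_percentiles)

# A family with 2 of 4 relatives in the top decile of survival:
score = lrc_style_score([0.05, 0.5, 0.08, 0.9])
```

The weakness the abstract points at is visible here: a family with one long-lived relative out of one scores 1.0, the same as a family with ten out of ten, which is what motivates the model-based mLRC.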
Background: There is growing interest in the assessment of the quality of hospital care based on outcome measures. Many quality-of-care comparisons rely on binary outcomes, for example mortality rates. Due to low numbers, observed differences in outcome are partly subject to chance. We aimed to quantify the gain in efficiency from ordinal instead of binary outcome analyses for hospital comparisons, analyzing patients with traumatic brain injury (TBI) and stroke as examples.
Methods: We sampled patients from two trials. We simulated ordinal and dichotomous outcomes based on the modified Rankin Scale (stroke) and the Glasgow Outcome Scale (TBI) in scenarios with and without true differences in outcome between hospitals. The potential efficiency gain of ordinal outcomes, analyzed with ordinal logistic regression, compared to dichotomous outcomes, analyzed with binary logistic regression, was expressed as the possible reduction in sample size while keeping the same statistical power to detect outliers.
Results: In the IMPACT study (9578 patients in 265 hospitals, mean number of patients per hospital = 36), analysis of the ordinal scale rather than the dichotomized scale ('unfavorable outcome') allowed for up to 32% fewer patients in the analysis without a loss of power. In the PRACTISE trial (1657 patients in 12 hospitals, mean number of patients per hospital = 138), ordinal analysis allowed for 13% fewer patients. Compared to mortality, ordinal outcome analyses allowed for 37 to 63% fewer patients.
Conclusions: Ordinal analyses provide the statistical power of substantially larger studies analyzed with dichotomized endpoints. We advise exploiting ordinal outcome measures for hospital comparisons, in order to increase efficiency in quality-of-care measurement.
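The efficiency gain comes from the information discarded when an ordinal scale is collapsed. A small illustration with made-up modified Rankin Scale scores: two hypothetical hospitals with identical binary "good outcome" rates but clearly different ordinal distributions, a difference a binary analysis cannot see.

```python
def dichotomize(mrs_scores, cutoff=2):
    """Collapse the ordinal modified Rankin Scale (0-6) into the
    usual binary 'good outcome' endpoint (mRS <= cutoff)."""
    return [1 if s <= cutoff else 0 for s in mrs_scores]

hospital_a = [0, 0, 1, 2, 5, 6]   # many fully recovered patients
hospital_b = [2, 2, 2, 2, 5, 6]   # 'good' outcomes all borderline

# After dichotomization both hospitals look identical (4/6 good),
# while the full ordinal data show hospital A doing better within the
# good-outcome range -- information ordinal logistic regression uses.
```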
Kantidakis, G.; Putter, H.; Lancia, C.; Boer, J. de; Braat, A.E.; Fiocco, M. 2020
Background: Predicting the survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML concerns unsuitable performance measures and a lack of interpretability, which is important for clinicians.
Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62,294 patients from the United States, with 97 predictors selected on clinical/statistical grounds from more than 600, to predict survival after transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For the PLANNs, novel extensions of their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques.
Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score), and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. The neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years.
Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, the PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as well calibrated as the Cox model with all variables.
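The C-index used to rank the models can be computed directly from its definition. Below is a quadratic-time sketch of Harrell's concordance index on toy data; it credits tied risk scores with 0.5 and, for brevity, ignores the handling of tied event times.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among usable pairs (the patient with the
    shorter follow-up experienced the event), the fraction in which
    the model assigned that patient the higher risk score."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores get half credit
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination; unlike the (Integrated) Brier Score, it says nothing about calibration, which is why the paper reports both.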
Rekkas, A.; Paulus, J.K.; Raman, G.; Wong, J.B.; Steyerberg, E.W.; Rijnbeek, P.R.; ... ; Klaveren, D. van 2020
Background: Recent evidence suggests that there is often substantial variation in the benefits and harms of treatment across a trial population. We aimed to identify regression modeling approaches that assess heterogeneity of treatment effect within a randomized clinical trial.
Methods: We performed a literature review using a broad search strategy, complemented by suggestions from a technical expert panel.
Results: The approaches fall into 3 categories: 1) risk-based methods (11 papers) use only prognostic factors to define patient subgroups, relying on the mathematical dependency of the absolute risk difference on baseline risk; 2) treatment effect modeling methods (9 papers) use both prognostic factors and treatment effect modifiers to explore characteristics that interact with the effects of therapy on a relative scale, coupling data-driven subgroup identification with approaches to prevent overfitting, such as penalization or the use of separate data sets for subgroup identification and effect estimation; and 3) optimal treatment regime methods (12 papers) focus primarily on treatment effect modifiers to classify the trial population into those who benefit from treatment and those who do not. We also identified papers describing model evaluation methods (4 papers).
Conclusions: Three classes of approaches were identified for assessing heterogeneity of treatment effect. Methodological research, including both simulations and empirical evaluations, is required to compare the available methods in different settings and to derive well-informed guidance for their application in RCT analysis.
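The first (risk-based) class of methods can be sketched concretely: rank patients by predicted baseline risk, split them into quantile groups, and report the absolute risk difference (treated minus control) within each group. A minimal pure-Python illustration with made-up data; all names are hypothetical.

```python
def risk_stratified_effect(baseline_risks, treated, outcomes, n_groups=4):
    """Risk-based heterogeneity-of-treatment-effect assessment:
    absolute risk difference (treated vs control event rate) within
    each baseline-risk quantile group. Assumes every group contains
    both treated and control patients."""
    order = sorted(range(len(baseline_risks)), key=lambda i: baseline_risks[i])
    size = len(order) // n_groups
    diffs = []
    for g in range(n_groups):
        idx = order[g * size:(g + 1) * size] if g < n_groups - 1 else order[g * size:]
        t = [outcomes[i] for i in idx if treated[i] == 1]
        c = [outcomes[i] for i in idx if treated[i] == 0]
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    return diffs

# Toy trial: a treatment that only helps high-risk patients shows a
# near-zero risk difference in the low-risk stratum.
ards = risk_stratified_effect(
    baseline_risks=[0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9],
    treated=[1, 0, 1, 0, 1, 0, 1, 0],
    outcomes=[0, 0, 0, 0, 0, 1, 0, 1],
    n_groups=2,
)
```

Even with a constant relative effect, the absolute risk difference grows with baseline risk, which is the mathematical dependency these methods exploit.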
Huebner, M.; Vach, W.; Cessie, S. le; Schmidt, C.O.; Lusa, L.; STRATOS Initiative (STRengthening Analytical Thinking for Observational Studies) 2020
Background: In the data pipeline from the data collection process to the planned statistical analyses, initial data analysis (IDA) typically takes place between the end of data collection and the start of the statistical analyses that address the research questions. A systematic process for IDA and clear reporting of its findings would help to understand the potential shortcomings of a dataset, such as missing values, subgroups with small sample sizes, or shortcomings in the collection process, and to evaluate the impact of these shortcomings on the research results. Clear reporting of findings is also relevant when making datasets available to other researchers. Initial data analyses can provide valuable insights into the suitability of a data set for a future research study. Our aim was to describe the practice of reporting initial data analyses in observational studies in five highly ranked medical journals, with a focus on data cleaning, screening, and the reporting of findings that led to a potential change in the analysis plan. Methods: This review was carried out using systematic search strategies with eligibility criteria for the articles to be reviewed. A total of 25 papers about observational studies were selected from five medical journals published in 2018. Each paper was reviewed by two reviewers, and IDA statements were further discussed by all authors. The consensus was reported. Results: IDA statements were reported in the methods, results, discussion, and supplement of papers. Ten out of 25 papers (40%) included a statement about data cleaning. Data screening statements were included in all articles, and 18 (72%) indicated the methods used to describe them. Item missingness was reported in 11 papers (44%), unit missingness in 15 papers (60%).
Eleven papers (44%) mentioned some changes in the analysis plan. Reported changes referred to missing data treatment, unexpected values, population heterogeneity, and aspects related to variable distributions or data properties. Conclusion: Reporting of initial data analyses was sparse, and statements on IDA were located throughout the research articles. There is a lack of systematic reporting of IDA. We conclude the article with recommendations on how to overcome shortcomings in the practice of IDA reporting in observational studies.