Background: In studies of time-to-event outcomes, it is common to collect information about events that occurred before inclusion in a prospective cohort. When the studied risk factors are independent of time, including both pre- and post-inclusion events in the analyses, generally referred to as relying on an ambispective design, increases the statistical power but may lead to a selection bias. In the field of venous thromboembolism (VT), ABO blood groups have been the subject of extensive research due to their substantial effect on VT risk. However, few studies have investigated their effect on the risk of VT recurrence. Motivated by the study of the association of genetically determined ABO blood groups with VT recurrence, we propose a methodology to include pre-inclusion events in the analysis of ambispective studies while avoiding the selection bias due to mortality.
Methods: This work relies on two independent cohorts of VT patients, the French MARTHA study built on an ambispective design and the Dutch MEGA study built on a standard prospective design. For the analysis of the MARTHA study, a weighted Cox model was developed where weights were defined by the inverse of the survival probability at the time of data collection about the events. Thanks to the collection of information on the vital status of patients, we could estimate the survival probabilities using a delayed-entry Cox model on the death risk. Finally, results obtained in both studies were meta-analysed.
Results: In the combined sample totalling 2,752 patients including 993 recurrences, the A1 blood group was associated with an increased risk (hazard ratio (HR) of 1.18, p = 4.2 × 10⁻³) compared with the O1 group, homogeneously in MARTHA and in MEGA. The same trend (HR = 1.19, p = 0.06) was observed for the less frequent A2 group.
Conclusion: The proposed methodology increases the power of studies relying on an ambispective design, which is frequent in epidemiologic studies of recurrent events. This approach made it possible to clarify the association of ABO blood groups with the risk of VT recurrence. In addition, this methodology has an immediate field of application in the context of genome-wide association studies.
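The weighting step described in the Methods of this abstract can be illustrated with a minimal Python sketch. It assumes that each patient's survival probability at the time of data collection has already been estimated (for instance from a delayed-entry model, which is not shown here). The function name and the example probabilities are hypothetical and are not taken from the MARTHA study.

```python
def inverse_survival_weights(survival_probs):
    """Return the weight 1 / S_i for each estimated survival probability S_i.

    A patient who had a low probability of surviving until data collection
    stands in for more patients who did not survive, so gets a larger weight.
    """
    weights = []
    for s in survival_probs:
        # A probability of 0 would give an infinite weight; reject it.
        if not 0.0 < s <= 1.0:
            raise ValueError("survival probability must be in (0, 1]")
        weights.append(1.0 / s)
    return weights


# Hypothetical survival probabilities for three patients (illustration only).
example_weights = inverse_survival_weights([1.0, 0.5, 0.25])
print(example_weights)
```

Such weights would then be passed as case weights to a weighted Cox regression; how the authors implemented that step is not stated in the abstract.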
Kantidakis, G.; Putter, H.; Litière, S.; Fiocco, M. 2023
Background: In health research, several chronic diseases are susceptible to competing risks (CRs). Initially, statistical models (SM) were developed to estimate the cumulative incidence of an event in the presence of CRs. With the recent growing interest in applying machine learning (ML) for clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low-dimensional setting).
Methods: A dataset with 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model-predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel specifications in terms of architecture (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years.
Results: Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (left-out samples) using the Brier score and the Area Under the Curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models reach performance comparable to the SM at 2, 5, and 10 years regarding both Brier score and AUC (95% confidence intervals overlapped). However, the SM are frequently better calibrated.
Conclusions: Overall, ML techniques are less practical as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real-life survival data, these techniques should only be applied as a complement to SM, as exploratory tools of model performance. More attention to model calibration is urgently needed.
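For readers unfamiliar with the Brier score used above as a performance measure, the following Python sketch shows its basic form at a single time point. It is a simplification under stated assumptions: every patient's event status at that time point is known, so censoring and the competing-risks adjustments used in the study are ignored. The function name and the input values are hypothetical.

```python
def brier_score(predicted_probs, observed_events):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Simplified form: assumes no censoring, so every outcome is observed.
    Lower is better; 0.0 means perfect predictions.
    """
    if not predicted_probs or len(predicted_probs) != len(observed_events):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (p - y) ** 2 for p, y in zip(predicted_probs, observed_events)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions for two patients (illustration only):
# an uninformative prediction of 0.5 for everyone yields a score of 0.25.
print(brier_score([0.5, 0.5], [0, 1]))
```

In a competing-risks analysis with censored follow-up, the score is usually computed with weights that account for censoring, which this sketch leaves out.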
Ismail, R.K.; Schramel, F.M.N.H.; Dartel, M. van; Pasmooij, A.M.G.; Welle, C.M. van der; Hilarius, D.L.; ... ; Garde, E.M.W. van de 2023
Background: Many studies have compared real-world clinical outcomes of immunotherapy in patients with metastatic non-small cell lung cancer (NSCLC) with reported outcomes data from pivotal trials. However, any differences observed could be explored for causation only to a limited extent because individual patient data (IPD) from trial participants were unavailable. The present study aims to explore the additional benefit of comparison with IPD.
Methods: This study compares progression-free survival (PFS) and overall survival (OS) of metastatic NSCLC patients treated with second-line nivolumab in real-world clinical practice (n = 141) with IPD from participants in the Checkmate-057 clinical trial (n = 292). Univariate and multivariate Cox proportional hazards models were used to construct HRs for real-world practice versus clinical trial.
Results: Real-world patients were older (64 vs. 61 years), more often had ECOG PS ≥ 2 (5 vs. 0%) and were less often treated with subsequent anti-cancer treatment (28.4 vs. 42.5%) compared to trial patients. The median PFS in real-world patients was longer (3.84 (95% CI: 3.19-5.49) vs. 2.30 (2.20-3.50) months) and the OS shorter than in trial participants (8.25 (6.93-13.2) vs. 12.2 (9.90-15.1) months). Adjustment for available patient characteristics led to a shift in the hazard ratio (HR) for OS, but not for PFS (HRs from 1.13 (0.88-1.44) to 1.07 (0.83-1.38), and from 0.82 (0.66-1.03) to 0.79 (0.63-1.00), respectively).
Conclusions: This study is an example of how IPD from both real-world and trial patients can be applied to search for factors that could explain an efficacy-effectiveness gap. Making IPD from clinical trials available to the international research community allows this.
Huebner, M.; Vach, W.; Cessie, S. le; Schmidt, C.O.; Lusa, L.; STRATOS Initiative STRengthening A 2020
Background: In the data pipeline from the data collection process to the planned statistical analyses, initial data analysis (IDA) typically takes place between the end of the data collection and the planned statistical analyses, and does not touch the research questions. A systematic process for IDA and clear reporting of the findings would help to understand the potential shortcomings of a dataset, such as missing values, subgroups with small sample sizes, or shortcomings in the collection process, and to evaluate the impact of these shortcomings on the research results. Clear reporting of findings is also relevant when making datasets available to other researchers. Initial data analyses can provide valuable insights into the suitability of a dataset for a future research study. Our aim was to describe the practice of reporting of initial data analyses in observational studies in five highly ranked medical journals, with a focus on data cleaning, screening, and reporting of findings which led to a potential change in the analysis plan.
Methods: This review was carried out using systematic search strategies with eligibility criteria for articles to be reviewed. A total of 25 papers about observational studies were selected from five medical journals published in 2018. Each paper was reviewed by two reviewers and IDA statements were further discussed by all authors. The consensus was reported.
Results: IDA statements were reported in the methods, results, discussion, and supplement of papers. Ten out of 25 papers (40%) included a statement about data cleaning. Data screening statements were included in all articles, and 18 (72%) indicated the methods used to describe them. Item missingness was reported in 11 papers (44%), unit missingness in 15 papers (60%). Eleven papers (44%) mentioned some changes in the analysis plan. Reported changes referred to missing data treatment, unexpected values, population heterogeneity and aspects related to variable distributions or data properties.
Conclusion: Reporting of initial data analyses was sparse, and statements on IDA were located throughout the research articles. There is a lack of systematic reporting of IDA. We conclude the article with recommendations on how to overcome shortcomings in the practice of IDA reporting in observational studies.