Objectives: Epidemiologic studies often suffer from incomplete data, measurement error (or misclassification), and confounding. Each of these can cause bias and imprecision in estimates of exposure-outcome relations. We describe and compare statistical approaches that aim to control all three sources of bias simultaneously. Study Design and Setting: We illustrate four statistical approaches that address all three sources of bias, namely, multiple imputation for missing data and measurement error, multiple imputation combined with regression calibration, full information maximum likelihood within a structural equation modeling framework, and a Bayesian model. In a simulation study, we assess the performance of the four approaches compared with more commonly used approaches that do not account for measurement error, missing values, or confounding. Results: The results demonstrate that the four approaches consistently outperform the alternative approaches on all performance metrics (bias, mean squared error, and confidence interval coverage). Even in simulated data of 100 subjects, these approaches perform well. Conclusion: There can be a large benefit of addressing measurement error, missing values, and confounding to improve the estimation of exposure-outcome relations, even when the available sample size is relatively small. (C) 2020 The Authors. Published by Elsevier Inc.
Vries, B.B.L.P. de; Smeden, M. van; Groenwold, R.H.H. 2021
Joint misclassification of exposure and outcome variables can lead to considerable bias in epidemiological studies of causal exposure-outcome effects. In this paper, we present a new maximum likelihood based estimator for marginal causal effects that simultaneously adjusts for confounding and several forms of joint misclassification of the exposure and outcome variables. The proposed method relies on validation data for the construction of weights that account for both sources of bias. The weighting estimator, which is an extension of the outcome misclassification weighting estimator proposed by Gravel and Platt (Weighted estimation for confounded binary outcomes subject to misclassification. Stat Med 2018; 37: 425-436), is applied to reinfarction data. Simulation studies were carried out to study its finite sample properties and compare it with methods that do not account for confounding or misclassification. The new estimator showed favourable large sample properties in the simulations. Further research is needed to study the sensitivity of the proposed method and that of alternatives to violations of their assumptions. The implementation of the estimator is facilitated by a new R function (ipwm) in an existing R package (mecor).
Faquih, T.; Smeden, M. van; Luo, J.; Cessie, S. le; Kastenmuller, G.; Krumsiek, J.; ... ; Mook-Kanamori, D.O. 2020
Metabolomics studies have seen steady growth due to the development and implementation of affordable, high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can cause biased study results. We provided a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script by simulations using measured metabolite data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures, in three sample sizes (599, 150, 50), with three missing percentages (15%, 30%, 60%), and using two missing-data mechanisms (completely at random and not at random). Based on the simulations, we found that for MICE, larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the metabolite's correlation with its predictor metabolites. MICE provided consistently higher performance measures, particularly for larger datasets (n > 50). In conclusion, we presented an imputation workflow in a publicly available R script to impute untargeted metabolomics data. Our simulations provided insight into the effects of sample size, percentage missing, and correlation structure on the accuracy of the two imputation methods.
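The MICE-versus-kNN comparison described in this abstract can be sketched in a few lines. The snippet below is an illustrative stand-in, not the authors' R script: it uses scikit-learn's IterativeImputer (a MICE-like chained-equations imputer) and KNNImputer on simulated correlated data with values removed completely at random, and compares recovery error on the masked cells.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

rng = np.random.default_rng(0)

# Simulate three correlated "metabolites" (shared latent factor + noise),
# then knock out ~15% of cells completely at random.
n = 300
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(scale=0.3, size=n) for _ in range(3)])
mask = rng.random(X.shape) < 0.15
X_miss = X.copy()
X_miss[mask] = np.nan

for name, imputer in [("MICE-like", IterativeImputer(random_state=0)),
                      ("kNN", KNNImputer(n_neighbors=5))]:
    X_imp = imputer.fit_transform(X_miss)
    rmse = np.sqrt(np.mean((X_imp[mask] - X[mask]) ** 2))
    print(f"{name}: RMSE on imputed cells = {rmse:.3f}")
```

Because the three variables are strongly correlated here, both imputers recover the masked values with error well below the marginal standard deviation; weakening the correlation (e.g., raising the noise scale) degrades kNN first, consistent with the abstract's finding.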
Linschoten, M.; Peters, S.; Smeden, M. van; Jewbali, L.S.; Schaap, J.; Siebelink, H.M.; ... ; CAPACITY-COVID Collaborative Conso 2020
Aims: To determine the frequency and pattern of cardiac complications in patients hospitalised with coronavirus disease (COVID-19). Methods and results: CAPACITY-COVID is an international patient registry established to determine the role of cardiovascular disease in the COVID-19 pandemic. In this registry, data generated during routine clinical practice are collected in a standardised manner for patients with a (highly suspected) severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection requiring hospitalisation. For the current analysis, consecutive patients with laboratory-confirmed COVID-19 registered between 28 March and 3 July 2020 were included. Patients were followed for the occurrence of cardiac complications and pulmonary embolism from admission to discharge. In total, 3011 patients were included, of whom 1890 (62.8%) were men. The median age was 67 years (interquartile range 56-76); 937 (31.0%) patients had a history of cardiac disease, with pre-existent coronary artery disease being most common (n=463, 15.4%). During hospitalisation, 595 (19.8%) patients died, including 16 patients (2.7%) from cardiac causes. Cardiac complications were diagnosed in 349 (11.6%) patients, with atrial fibrillation (n=142, 4.7%) being most common. The incidence of other cardiac complications was 1.8% for heart failure (n=55), 0.5% for acute coronary syndrome (n=15), 0.5% for ventricular arrhythmia (n=14), 0.1% for bacterial endocarditis (n=4) and myocarditis (n=3), and 0.03% for pericarditis (n=1). Pulmonary embolism was diagnosed in 198 (6.6%) patients. Conclusion: This large study among 3011 hospitalised patients with COVID-19 shows that the incidence of cardiac complications during hospital admission is low, despite a frequent history of cardiovascular disease. Long-term cardiac outcomes and the role of pre-existing cardiovascular disease in COVID-19 outcomes warrant further investigation.
Nab, L.; Groenwold, R.H.H.; Smeden, M. van; Keogh, R.H. 2020
Observational data are increasingly used with the aim of estimating causal effects of treatments through careful control for confounding. Marginal structural models estimated using inverse probability weighting (MSMs-IPW), like other methods to control for confounding, assume that confounding variables are measured without error. The average treatment effect in an MSM-IPW may, however, be biased when a confounding variable is error prone. Using the potential outcomes framework, we derive expressions for the bias due to confounder misclassification in analyses that aim to estimate the average treatment effect using an MSM-IPW. We compare this bias with the bias due to confounder misclassification in analyses based on a conditional regression model. Focus is on a point-treatment study with a continuous outcome. Compared with the bias in the average treatment effect in a conditional model, the bias in an MSM-IPW can differ in magnitude but is equal in sign. We also use a simulation study to investigate the finite sample performance of MSM-IPW and conditional models when a confounding variable is misclassified. Simulation results indicate that confidence intervals of the treatment effect obtained from MSM-IPW are generally wider, and coverage of the true treatment effect is higher compared with a conditional model, ranging from overcoverage if there is no confounder misclassification to undercoverage when there is. Further, in a study of blood pressure-lowering therapy, we illustrate how the bias expressions can be used to inform a quantitative bias analysis of the impact of confounder misclassification, supported by an online tool.
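The MSM-IPW estimator discussed in this abstract reduces, in the simplest point-treatment setting, to a weighted difference in outcome means, with weights from a propensity model. The sketch below is a minimal illustration of that idea (not the authors' derivations): a binary confounder biases the naive comparison, while inverse probability weighting recovers the true marginal effect when the confounder is measured without error.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000

# Binary confounder L affects both treatment A and continuous outcome Y.
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.5 * L)          # confounded treatment assignment
Y = 2.0 * A + 1.5 * L + rng.normal(size=n)  # true marginal effect = 2.0

# Inverse probability weights from a propensity model for A given L.
ps = LogisticRegression().fit(L.reshape(-1, 1), A).predict_proba(L.reshape(-1, 1))[:, 1]
w = np.where(A == 1, 1 / ps, 1 / (1 - ps))

# Weighted difference in means = MSM-IPW estimate of the average treatment effect.
ate = (np.sum(w * A * Y) / np.sum(w * A)) - (np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A)))
naive = Y[A == 1].mean() - Y[A == 0].mean()
print(f"naive: {naive:.2f}, IPW: {ate:.2f}")  # naive is biased upward; IPW is near 2.0
```

Replacing L in the propensity model with a misclassified copy of L would leave residual confounding in the weights, which is the bias the paper quantifies.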
Calster, B. van; Smeden, M. van; Cock, B. de; Steyerberg, E.W. 2020
When developing risk prediction models on datasets with limited sample size, shrinkage methods are recommended. Earlier studies showed that shrinkage results in better predictive performance on average. This simulation study aimed to investigate the variability of regression shrinkage on predictive performance for a binary outcome. We compared standard maximum likelihood with the following shrinkage methods: uniform shrinkage (likelihood-based and bootstrap-based), penalized maximum likelihood (ridge) methods, LASSO logistic regression, adaptive LASSO, and Firth's correction. In the simulation study, we varied the number of predictors and their strength, the correlation between predictors, the event rate of the outcome, and the events per variable. In terms of results, we focused on the calibration slope. The slope indicates whether risk predictions are too extreme (slope < 1) or not extreme enough (slope > 1). The results can be summarized into three main findings. First, shrinkage improved calibration slopes on average. Second, the between-sample variability of calibration slopes was often increased relative to maximum likelihood. In contrast to other shrinkage approaches, Firth's correction had a small shrinkage effect but showed low variability. Third, the correlation between the estimated shrinkage and the optimal shrinkage to remove overfitting was typically negative, with Firth's correction as the exception. We conclude that, despite improved performance on average, shrinkage often worked poorly in individual datasets, in particular when it was most needed. The results imply that shrinkage methods do not solve problems associated with small sample size or low number of events per variable.
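The calibration slope used as the main performance measure above has a simple operational definition: regress the observed binary outcomes on the logit of the predicted risks, and read off the slope. The sketch below (an illustration, not the study's simulation code) computes it with an effectively unpenalized scikit-learn logistic fit, and shows that doubling a model's linear predictor, i.e. making predictions too extreme, yields a slope near 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_slope(y, p):
    """Slope from regressing outcomes on the logit of predicted risks.
    Slope < 1: predictions too extreme; slope > 1: not extreme enough."""
    lp = np.log(p / (1 - p)).reshape(-1, 1)       # logit of predictions
    # Very large C makes the fit effectively unpenalized maximum likelihood.
    return LogisticRegression(C=1e12, max_iter=1000).fit(lp, y).coef_[0, 0]

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
p_true = 1 / (1 + np.exp(-x))
y = rng.binomial(1, p_true)

# Overconfident model: doubles the linear predictor, so risks are too extreme.
p_overfit = 1 / (1 + np.exp(-2 * x))
print(round(calibration_slope(y, p_true), 2))     # close to 1
print(round(calibration_slope(y, p_overfit), 2))  # close to 0.5
```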
Objective: Article full texts are often inaccessible via the standard search engines of biomedical literature, such as PubMed and Embase, which are commonly used for systematic reviews. Excluding the full-text bodies from a literature search may result in a small or selective subset of articles being included in the review because of the limited information that is available in only the title, abstract, and keywords. This article describes a comparison of search strategies based on a systematic literature review of all articles published in 5 top-ranked epidemiology journals between 2000 and 2017. Study Design and Setting: Based on a text-mining approach, we studied how nine different methodological topics were mentioned across text fields (title, abstract, keywords, and text body). The following methodological topics were studied: propensity score methods, inverse probability weighting, marginal structural modeling, multiple imputation, Kaplan-Meier estimation, number needed to treat, measurement error, randomized controlled trial, and latent class analysis. Results: In total, 31,641 Hypertext Markup Language (HTML) files were downloaded from the journals' websites. For all methodological topics and journals, at most 50% of articles with a mention of a topic in the text body also mentioned the topic in the title, abstract, or keywords. For several topics, a gradual decrease over calendar time was observed in reporting in the title, abstract, or keywords. Conclusion: Literature searches based on title, abstract, and keywords alone may not be sufficiently sensitive for studies of epidemiological research practice. This study also illustrates the potential value of full-text literature searches, provided the full-text bodies are accessible for literature searches. (C) 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Wynants, L.; Calster, B. van; Bonten, M.M.J.; Collins, G.S.; Debray, T.P.A.; Vos, M. de; ... ; Smeden, M. van 2020
OBJECTIVE To review and critically appraise published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at risk of being admitted to hospital for covid-19 pneumonia. DESIGN Rapid systematic review and critical appraisal. DATA SOURCES PubMed and Embase through Ovid, Arxiv, medRxiv, and bioRxiv up to 24 March 2020. STUDY SELECTION Studies that developed or validated a multivariable covid-19 related prediction model. DATA EXTRACTION At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool). RESULTS 2696 titles were screened, and 27 studies describing 31 prediction models were included. Three models were identified for predicting hospital admission from pneumonia and other events (as proxy outcomes for covid-19 pneumonia) in the general population; 18 diagnostic models for detecting covid-19 infection (13 were machine learning based on computed tomography scans); and 10 prognostic models for predicting mortality risk, progression to severe disease, or length of hospital stay. Only one study used patient data from outside of China. The most reported predictors of presence of covid-19 in patients with suspected disease included age, body temperature, and signs and symptoms. The most reported predictors of severe prognosis in patients with covid-19 included age, sex, features derived from computed tomography scans, C reactive protein, lactic dehydrogenase, and lymphocyte count. C index estimates ranged from 0.73 to 0.81 in prediction models for the general population (reported for all three models), from 0.81 to more than 0.99 in diagnostic models (reported for 13 of the 18 models), and from 0.85 to 0.98 in prognostic models (reported for six of the 10 models). All studies were rated at high risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, and high risk of model overfitting. Reporting quality varied substantially between studies. Most reports did not include a description of the study population or intended use of the models, and calibration of predictions was rarely assessed. CONCLUSION Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that proposed models are poorly reported, at high risk of bias, and their reported performance is probably optimistic. Immediate sharing of well documented individual participant data from covid-19 studies is needed for collaborative efforts to develop more rigorous prediction models and validate existing ones. The predictors identified in included studies could be considered as candidate predictors for new models. Methodological guidance should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, studies should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline.
Luijken, K.; Wynants, L.; Smeden, M. van; Calster, B. van; Steyerberg, E.W.; Groenwold, R.H.H. 2020
Objectives: The aim of this study was to quantify the impact of predictor measurement heterogeneity on prediction model performance. Predictor measurement heterogeneity refers to variation in the measurement of predictor(s) between the derivation of a prediction model and its validation or application. It arises, for instance, when predictors are measured using different measurement instruments or protocols. Study Design and Setting: We examined the effects of various scenarios of predictor measurement heterogeneity in real-world clinical examples, using previously developed prediction models for diagnosis of ovarian cancer, mutation carriers for Lynch syndrome, and intrauterine pregnancy. Results: Changing the measurement procedure of a predictor influenced the performance at validation of the prediction models in nine clinical examples. Notably, it induced model miscalibration. The calibration intercept at validation ranged from -0.70 to 1.43 (0 for good calibration), whereas the calibration slope ranged from 0.50 to 1.67 (1 for good calibration). The difference in C-statistic and scaled Brier score between derivation and validation ranged from -0.08 to +0.08 and from -0.40 to +0.16, respectively. Conclusion: This study illustrates that predictor measurement heterogeneity can substantially influence the performance of a prediction model, underlining that predictor measurements used in research settings should resemble those used in clinical practice. Specification of measurement heterogeneity can help researchers explain discrepancies in predictive performance between the derivation and validation settings. (C) 2019 The Authors. Published by Elsevier Inc.
Smeden, M. van; Lash, T.L.; Groenwold, R.H.H. 2020
Epidemiologists are often confronted with datasets that contain measurement error due to, for instance, mistaken data entries, inaccurate recordings, and measurement instrument or procedural errors. If the effect of measurement error is misjudged, the data analyses are hampered and the validity of the study's inferences may be affected. In this paper, we describe five myths that contribute to misjudgments about measurement error, regarding its expected structure, impact, and solutions to mitigate the problems resulting from mismeasurements. The aim is to clarify these measurement error misconceptions. We show that the influence of measurement error in an epidemiological data analysis can play out in ways that go beyond simple heuristics, such as heuristics about whether or not to expect attenuation of the effect estimates. Whereas we encourage epidemiologists to deliberate about the structure and potential impact of measurement error in their analyses, we also recommend exercising restraint when making claims about the magnitude or even direction of the effect of measurement error if those claims are not accompanied by statistical measurement error corrections or quantitative bias analysis. Suggestions for alleviating the problems or investigating the structure and magnitude of measurement error are given.
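The point that measurement error goes "beyond simple heuristics" can be made concrete with a toy simulation (an illustration of the general idea, not the paper's examples). Classical error on an exposure does attenuate a regression slope by the reliability ratio, but the same kind of error on a confounder biases the adjusted estimate away from the null instead:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x)

# 1) Classical error on the exposure attenuates the slope by the
#    reliability ratio lambda = var(X) / (var(X) + var(error)) = 0.5 here.
X = rng.normal(size=n)
Y = X + rng.normal(size=n)                 # true slope = 1.0
X_err = X + rng.normal(size=n)
print(round(slope(X_err, Y), 2))           # ~0.5, the familiar attenuation

# 2) Classical error on a *confounder* does the opposite: adjusting for a
#    noisy L removes confounding only partially, leaving a spurious effect.
L = rng.normal(size=n)
X2 = L + rng.normal(size=n)
Y2 = L + rng.normal(size=n)                # X2 has no effect on Y2; L confounds
L_err = L + rng.normal(size=n)
rX = X2 - slope(L_err, X2) * L_err         # residualize on mismeasured L
rY = Y2 - slope(L_err, Y2) * L_err
print(round(slope(rX, rY), 2))             # clearly > 0: residual confounding
```

So the same error mechanism attenuates in one role and inflates in another, which is why the abstract cautions against directional claims without a formal correction or quantitative bias analysis.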
Calster, B. van; McLernon, D.J.; Smeden, M. van; Wynants, L.; Steyerberg, E.W.; STRATOS Initiative 2019
Background: The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention. Main text: Herein, we argue that this needs to change immediately, because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing the balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice. Conclusion: Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.
Wynants, L.; Smeden, M. van; McLernon, D.J.; Timmerman, D.; Steyerberg, E.W.; Calster, B. van; Topic Grp Evaluating Diagnosti 2019
In randomised trials, continuous endpoints are often measured with some degree of error. This study explores the impact of ignoring measurement error and proposes methods to improve statistical inference in the presence of measurement error. Three main types of measurement error in continuous endpoints are considered: classical, systematic, and differential. For each measurement error type, a corrected effect estimator is proposed. The corrected estimators and several methods for confidence interval estimation are tested in a simulation study. These methods combine information about error-prone and error-free measurements of the endpoint in individuals not included in the trial (an external calibration sample). We show that, if measurement error in continuous endpoints is ignored, the treatment effect estimator is unbiased when measurement error is classical, although Type-II error is increased at a given sample size. Conversely, the estimator can be substantially biased when measurement error is systematic or differential. In those cases, bias can largely be prevented, and inferences improved, by using information from an external calibration sample, whose required size increases as the strength of the association between the error-prone and error-free endpoint decreases. Measurement error correction using even a small (external) calibration sample is shown to improve inferences and should be considered in trials with error-prone endpoints. Implementation of the proposed correction methods is facilitated by a new software package for R.
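The core correction idea for systematic error can be sketched in a few lines. The snippet below is a simplified illustration under assumed parameter values, not the paper's estimators: a trial endpoint Y is observed only through a systematically mismeasured Y*, an external calibration sample with both measurements estimates the error model Y* = a + bY, and dividing the naive effect by the estimated b recovers the true treatment effect.

```python
import numpy as np

rng = np.random.default_rng(4)

# --- Trial with an error-prone continuous endpoint (systematic error) ---
n = 5000
A = rng.binomial(1, 0.5, n)                  # randomised treatment
Y = 1.0 * A + rng.normal(size=n)             # true treatment effect = 1.0
Y_star = 0.5 + 1.4 * Y + rng.normal(scale=0.5, size=n)  # mismeasured endpoint

naive = Y_star[A == 1].mean() - Y_star[A == 0].mean()   # biased (scaled by 1.4)

# --- External calibration sample with both measurements of the endpoint ---
m = 200
Yc = rng.normal(size=m)
Yc_star = 0.5 + 1.4 * Yc + rng.normal(scale=0.5, size=m)

# Estimate the slope b of the error model and rescale the naive effect.
b = np.cov(Yc, Yc_star)[0, 1] / np.var(Yc, ddof=1)
corrected = naive / b
print(f"naive: {naive:.2f}, corrected: {corrected:.2f}")  # corrected near 1.0
```

Even with a calibration sample of only 200 individuals, the rescaled estimate sits close to the true effect, matching the abstract's observation that a small external sample can already improve inference.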
Luijken, K.; Groenwold, R.H.H.; Calster, B. van; Steyerberg, E.W.; Smeden, M. van 2019
Objectives: The objective of this study was to assess the impact of ignoring uncertainty by forcing dichotomous classification (presence or absence) of the target disease on estimates of diagnostic accuracy of an index test. Study Design and Setting: We evaluated the bias in estimated index test accuracy when forcing an expert panel to make a dichotomous target disease classification for each individual. Data for various scenarios with expert panels were simulated by varying the number and accuracy of "component reference tests" available to the expert panel, index test sensitivity and specificity, and target disease prevalence. Results: Index test accuracy estimates are likely to be biased when there is uncertainty surrounding the presence or absence of the target disease. The direction and amount of bias depend on the number and accuracy of component reference tests, target disease prevalence, and the true values of index test sensitivity and specificity. Conclusion: In this simulation, forcing expert panels to make a dichotomous decision on target disease classification in the presence of uncertainty leads to biased estimates of index test accuracy. Empirical studies are needed to demonstrate whether this bias can be reduced by assigning a probability of target disease presence to each individual, or by using advanced statistical methods to account for uncertainty in target disease classification. (C) 2019 Elsevier Inc. All rights reserved.
Curtin, D.; Dahly, D.L.; Smeden, M. van; O'Donnell, D.P.; Doyle, D.; Gallagher, P.; O'Mahony, D. 2019
OBJECTIVES Accurate prognostic information can enable patients and physicians to make better healthcare decisions. The Hospital-patient One-year Mortality Risk (HOMR) model accurately predicted mortality risk (concordance [C] statistic = .92) in adult hospitalized patients in a recent study in North America. We evaluated the performance of the HOMR model in a population of older inpatients in a large teaching hospital in Ireland. DESIGN Retrospective cohort study. SETTING Acute hospital. PARTICIPANTS Patients aged 65 years or older cared for by inpatient geriatric medicine services from January 1, 2013, to March 6, 2015 (n = 1654). After excluding those who died during the index hospitalization (n = 206) and those with missing data (n = 39), the analytical sample included 1409 patients. MEASUREMENTS Administrative data and information abstracted from hospital discharge reports were used to determine covariate values for each patient. One-year mortality was determined from the hospital information system, local registries, or by contacting the patient's general practitioner. The linear predictor for each patient was calculated, and performance of the model was evaluated in terms of its overall performance, discrimination, and calibration. Recalibrated and revised models were also estimated and evaluated. RESULTS The one-year mortality rate after hospital discharge in this patient cohort was 18.6%. The unadjusted HOMR model had good discrimination (C statistic = .78; 95% confidence interval = .76-.81) but was poorly calibrated and consistently overestimated mortality risk. The model's performance was modestly improved by recalibration and revision (optimism-corrected C statistic = .8). CONCLUSION The superior discriminative performance of the HOMR model reported previously was substantially attenuated in its application to our cohort of older hospitalized patients, who represent a specific subset of the original derivation cohort. Updating methods improved its performance in our cohort, but further validation, refinement, and clinical impact studies are required before use in routine clinical practice. J Am Geriatr Soc 1-6, 2019.
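The "recalibration" update mentioned above is commonly implemented as logistic recalibration: refit an intercept and slope on the imported model's linear predictor in the new cohort. The sketch below is a generic illustration with simulated data (not the HOMR model or the study's cohort, whose details are assumptions here): the imported model discriminates well but systematically overestimates risk, and the refit mainly shifts the intercept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Validation-cohort sketch: the imported model's linear predictor lacks the
# intercept shift needed in this lower-risk population (true intercept -1.0).
n = 3000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(x - 1.0))))
lp_old = x                                   # imported model: intercept 0, slope 1

# Logistic recalibration: refit intercept and slope on the old linear predictor.
recal = LogisticRegression(C=1e12, max_iter=1000).fit(lp_old.reshape(-1, 1), y)
print(round(recal.intercept_[0], 1), round(recal.coef_[0, 0], 1))
# intercept near -1.0, slope near 1.0: the update mainly corrects the intercept
```

Because only two parameters are estimated, this kind of update is feasible in modest validation samples, whereas full model revision (re-estimating all covariate coefficients) needs more data.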
Jong, V.M.T. de; Eijkemans, M.J.C.; Calster, B. van; Timmerman, D.; Moons, K.G.M.; Steyerberg, E.W.; Smeden, M. van 2019
Background Although the Nordic Hamstring Exercise (NHE) effectively prevents hamstring injury in soccer players, the annual incidence of these injuries is still increasing. This may be because of poor long-term compliance with the program. Furthermore, although the timing and amplitude of gluteal and core muscle activation seem to play an important role in hamstring injury prevention, the NHE program was not designed to improve activation of these muscles. Therefore, we propose plyometric training as an alternative to reduce hamstring injuries in soccer players. Purpose To determine the preventive effect of the Bounding Exercise Program (BEP) on hamstring injury incidence and severity in adult male amateur soccer players. Study design A cluster-randomized controlled trial. Methods Thirty-two soccer teams competing in the first-class amateur league were cluster-randomized into the intervention or control group. Both groups were instructed to perform their regular training program, and the intervention group additionally performed the BEP. Information about player characteristics was gathered at baseline, and exposure, hamstring injuries, and BEP compliance were registered weekly during one season (2016-2017). Results The data of 400 players were analyzed. In total, 57 players sustained 65 hamstring injuries. The injury incidence was 1.12/1000 hours in the intervention group and 1.39/1000 hours in the control group. There were no statistically significant differences between the groups in hamstring injury incidence (OR = 0.89, 95% CI 0.46-1.75) or severity (P > 0.48). Conclusion In this large cluster-randomized controlled trial, no evidence was found that plyometric training in its current form reduces hamstring injuries in amateur soccer players.