Objective: When correcting for the "class imbalance" problem in medical data, the effects of resampling on classifier algorithms remain unclear. We examined the effect on performance across several combinations of classifiers and resampling ratios. Materials and Methods: Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated using Area Under the Curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted Urology patients is presented. Results: For most algorithms, using resampled data produced a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69-0.79) to 0.93 (CI: 0.92-0.94) and from 0.35 (CI: 0.12-0.58) to 0.86 (CI: 0.81-0.92), respectively. All classification algorithms showed significant increases in recall and significant decreases in Brier score, with distorted calibration that overestimated positives. Discussion: Imbalance correction resulted in improved overall performance, yet poorly calibrated models. There can still be clinical utility owing to strong discriminating performance, specifically when predicting only low- and high-risk cases is clinically more relevant. Conclusion: Resampling data increased the performance of classification algorithms, yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.
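The resampling strategies named in the abstract above can be illustrated with a minimal, standard-library sketch. It is not the study's code: the function names (`random_undersample`, `smote_like`, `brier_score`), the toy data, and the parameters `ratio` and `k` are all assumptions made for illustration, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a reference implementation. ADASYN is not covered.

```python
import random

def random_undersample(X, y, ratio=1.0, seed=0):
    """Drop majority rows (label 0) until minority/majority is about `ratio`."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(majority), round(len(minority) / ratio))
    kept = rng.sample(majority, n_keep)
    idx = sorted(minority + kept)
    return [X[i] for i in idx], [y[i] for i in idx]

def smote_like(X, y, ratio=1.0, k=3, seed=0):
    """Add synthetic minority rows until minority/majority is about `ratio`.

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    n_majority = sum(1 for label in y if label == 0)
    n_new = max(0, round(ratio * n_majority) - len(minority))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [1] * len(synthetic)

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_pred)) / len(y_true)

# Toy data (hypothetical): 20 rows, 4 of them positive (20% prevalence).
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

X_under, y_under = random_undersample(X, y, ratio=1.0)
X_over, y_over = smote_like(X, y, ratio=0.5)
```

In this toy data, undersampling to a 1:1 ratio raises the share of positives from 4/20 to 4/8. A model trained on the resampled set sees a higher prevalence than the original population, which is one plausible way to read the overestimation of positives reported above.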
We consider the spatially inhomogeneous Moran model with seed-banks introduced in den Hollander and Nandan (2021). Populations comprising active and dormant individuals are structured in colonies labelled by ℤ^d, d ≥ 1. The population sizes are drawn from an ergodic, translation-invariant, uniformly elliptic field that forms a random environment. Individuals carry one of two types: ♡, ♠. Dormant individuals reside in what is called a seed-bank. Active individuals exchange type with the seed-bank of their own colony and resample type by choosing a parent from the active populations according to a symmetric migration kernel. In den Hollander and Nandan (2021), by using a dual (an interacting coalescing particle system), we showed that the spatial system exhibits a dichotomy between clustering (mono-type equilibrium) and coexistence (multi-type equilibrium). In this paper we identify the domain of attraction for each mono-type equilibrium in the clustering regime for a fixed environment. We also show that when the migration kernel is recurrent, for a.e. realization of the environment, the system with an initially consistent type distribution converges weakly to a mono-type equilibrium in which the fixation probability to the type-♡ configuration does not depend on the environment. A formula for the fixation probability is given in terms of an annealed average of type-♡ densities in the dormant and active populations, biased by the ratio of the two population sizes at the target colony. Primary techniques employed in the proofs include stochastic duality and the environment process viewed from the particle, introduced in Dolgopyat and Goldsheid (2019) for random walk in random environment on a strip.
A spectral analysis of the Markov operator yields quenched weak convergence of the environment process associated with the single-particle dual process to a reversible ergodic distribution, which we transfer to the spatial system of populations by using duality.