Logistic regression is one of the most commonly used approaches to develop clinical risk prediction models. Developers of such models often rely on approaches that aim to minimize the risk of overfitting and to improve the predictive performance of the logistic model, such as likelihood penalization and variance decomposition techniques. We present an extensive simulation study that compares the out-of-sample predictive performance of risk prediction models derived using the elastic net, with lasso and ridge as special cases, and variance decomposition techniques, namely incomplete principal component regression and incomplete partial least squares regression. We varied the expected events per variable, event fraction, number of candidate predictors, presence of noise predictors, and presence of sparse predictors in a full-factorial design. Predictive performance was compared on measures of discrimination, calibration, and prediction error. Simulation metamodels were derived to explain the performance differences between model derivation approaches. Our results indicate that, on average, prediction models developed using penalization and variance decomposition approaches outperform models developed using ordinary maximum likelihood estimation, with penalization approaches being consistently superior to the variance decomposition approaches. Differences in performance were most pronounced for the calibration of the model. Performance differences in prediction error and concordance statistic were often small between approaches. The use of likelihood penalization and variance decomposition techniques is illustrated in the context of peripheral arterial disease.
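As a rough illustration of the penalization approach compared in this study (a sketch, not the authors' pipeline), a logistic model with an elastic net penalty can be fitted with scikit-learn; the l1_ratio grid below interpolates between ridge (0) and lasso (1), and the synthetic data are only a stand-in for a clinical dataset with candidate predictors:

```python
# Sketch: elastic net penalized logistic regression for a binary outcome.
# Illustrative only; the simulation study itself is not reproduced here.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a clinical dataset: 30 candidate predictors,
# 8 informative, a 20% event fraction.
X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# l1_ratio = 0 corresponds to ridge, l1_ratio = 1 to the lasso;
# intermediate values give the general elastic net.
model = LogisticRegressionCV(penalty="elasticnet", solver="saga",
                             l1_ratios=[0.0, 0.5, 1.0], Cs=10,
                             cv=5, scoring="neg_log_loss", max_iter=5000)
model.fit(X_train, y_train)
print("out-of-sample predicted risks:", model.predict_proba(X_test)[:5, 1])
```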
Purpose: In studies of the effects of time-varying drug exposures, adequate adjustment for time-varying covariates is often necessary to properly control for confounding. However, the granularity of the available covariate data may not be sufficiently fine, for example when covariates are measured only when participants' exposure levels change. Methods: To illustrate the impact of choices regarding the frequency of measuring time-varying covariates, we simulated data for a large target trial and for large observational studies that varied in covariate measurement design. Covariates were measured never, on a fixed-interval basis, or each time the exposure level switched. For the analysis, covariates were assumed to remain constant in periods without measurement. Cumulative survival probabilities under continuous exposure and non-exposure were estimated using inverse probability weighting to adjust for time-varying confounding, with special emphasis on the difference in 5-year event risks. Results: With monthly covariate measurements, estimates based on observational data coincided with trial-based estimates, with 5-year risk differences of zero. Without measurement of baseline or post-baseline covariates, this risk difference was estimated at 49% based on the available observational data. With measurements on a fixed-interval basis only, 5-year risk differences deviated from the null, to 29% for 6-monthly measurements and increasing in magnitude up to 35% as the interval length increased. Risk difference estimates diverged from the null to as low as -18% when covariates were measured depending on exposure level switching. Conclusion: Our simulations highlight the need for careful consideration of time-varying covariates when designing studies of time-varying exposures. We caution against designs with long intervals between measurements. The maximum interval length allowable will depend on the rates at which treatments and covariates change, with higher rates requiring shorter measurement intervals.
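The weighting step named in the Methods can be sketched as follows, assuming a hypothetical long-format dataset with placeholder columns id, period, prev_exposed, exposed, and covariate; this is a minimal illustration of stabilized inverse probability weights, not the study's actual estimation code:

```python
# Sketch: stabilized inverse probability weights for a time-varying exposure.
# Column names are hypothetical placeholders, not the study's variables.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def stabilized_ipw(df):
    """Numerator model conditions on exposure history only; denominator
    model adds the time-varying covariate. Weights are cumulative products."""
    num_model = LogisticRegression().fit(df[["period", "prev_exposed"]], df["exposed"])
    den_model = LogisticRegression().fit(df[["period", "prev_exposed", "covariate"]], df["exposed"])
    p_num = num_model.predict_proba(df[["period", "prev_exposed"]])[:, 1]
    p_den = den_model.predict_proba(df[["period", "prev_exposed", "covariate"]])[:, 1]

    # Probability of the exposure level actually received in each period.
    obs = df["exposed"].to_numpy()
    num = obs * p_num + (1 - obs) * (1 - p_num)
    den = obs * p_den + (1 - obs) * (1 - p_den)

    df = df.assign(ratio=num / den)
    # Cumulative product over a participant's follow-up gives the weight.
    df["sw"] = df.groupby("id")["ratio"].cumprod()
    return df

# Toy long-format data: one row per participant-period.
df = pd.DataFrame({
    "id":           [1, 1, 1, 2, 2, 2],
    "period":       [0, 1, 2, 0, 1, 2],
    "prev_exposed": [0, 1, 1, 0, 0, 1],
    "exposed":      [1, 1, 0, 0, 1, 1],
    "covariate":    [0.2, 0.8, 0.5, 0.1, 0.4, 0.9],
})
print(stabilized_ipw(df)[["id", "period", "sw"]])
```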
While mass spectrometry still dominates proteomics research, alternative, and potentially disruptive, next-generation technologies are receiving increased investment and attention. Most of these technologies aim to sequence single peptide or protein molecules, typically by labeling or otherwise distinguishing a subset of the proteinogenic amino acids. This note considers some theoretical aspects of these future technologies from a bottom-up proteomics viewpoint, including the ability to uniquely identify human proteins as a function of which, and how many, amino acids can be read, enzymatic efficiency, and the maximum read length. This is done through simulations under ideal and non-ideal conditions to set benchmarks for what may be achievable with future single-molecule sequencing technology. The simulations reveal, among other observations, that the best choice of reading N amino acids performs similarly to the average choice of N+1 amino acids, and that the discrimination power of the amino acids scales with their frequency in the proteome. The simulations are agnostic with respect to the next-generation proteomics platform, and the results and conclusions should therefore be applicable to any single-molecule partial peptide sequencing technology.
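The kind of simulation described here can be sketched as follows; the toy sequences, the simplified tryptic digest rule, and the readable alphabet "KRED" are placeholder assumptions, not the note's actual choices:

```python
# Sketch: how uniquely a partial amino acid readout identifies tryptic peptides.
# Toy sequences stand in for the human proteome used in the actual simulations.
from collections import Counter

def tryptic_digest(protein):
    """Cleave after K or R (ignoring proline rules for simplicity)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def partial_readout(peptide, readable="KRED"):
    """Mask every residue outside the readable subset, mimicking a
    label-based single-molecule read of only some amino acids."""
    return "".join(aa if aa in readable else "x" for aa in peptide)

proteins = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSEQNNTEMTFQIQR"}
readouts = Counter()
for name, seq in proteins.items():
    for pep in tryptic_digest(seq):
        readouts[partial_readout(pep)] += 1

unique = sum(1 for c in readouts.values() if c == 1)
print(f"{unique}/{len(readouts)} partial readouts are unique")
```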
Faquih, T.; Smeden, M. van; Luo, J.; Cessie, S. le; Kastenmuller, G.; Krumsiek, J.; ... ; Mook-Kanamori, D.O. 2020
Metabolomics studies have seen steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can bias study results. We provide a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) analyses implemented in our script through simulations using measured metabolite data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures, in three sample sizes (599, 150, 50), with three missing percentages (15%, 30%, 60%), and under two missingness mechanisms (missing completely at random and missing not at random). Based on the simulations, we found that for MICE, larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the metabolite's correlation with its predictor metabolites. MICE provided consistently higher performance, particularly for larger datasets (n > 50). In conclusion, we present an imputation workflow in a publicly available R script to impute untargeted metabolomics data. Our simulations provide insight into the effects of sample size, missing percentage, and correlation structure on the accuracy of the two imputation methods.
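The published workflow is an R script; as a rough Python analogue, scikit-learn offers IterativeImputer (a chained-equations imputer in the spirit of MICE) and KNNImputer. The synthetic data below merely mimic a correlated metabolite panel and are not the NEO data:

```python
# Sketch: chained-equations (MICE-style) and kNN imputation in Python.
# The published workflow is an R script; this is only a rough analogue.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

rng = np.random.default_rng(0)
# Synthetic "metabolite" matrix with correlated columns and 30% missingness.
base = rng.normal(size=(599, 1))
X = base + rng.normal(scale=0.5, size=(599, 4))
mask = rng.random(X.shape) < 0.30
X_missing = np.where(mask, np.nan, X)

X_mice = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing)
X_knn = KNNImputer(n_neighbors=10).fit_transform(X_missing)

# Compare imputation error on the masked entries only.
for name, X_hat in [("MICE-style", X_mice), ("kNN", X_knn)]:
    rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))
    print(f"{name} RMSE on masked values: {rmse:.3f}")
```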
By training with virtual opponents known as computer generated forces (CGFs), trainee fighter pilots can build the experience necessary for air combat operations at a fraction of the cost of training with real aircraft. In practice, however, the variety of CGFs is not as wide as it could be, largely due to a lack of behaviour models for the CGFs. In this thesis we investigate to what extent behaviour models for the CGFs in air combat training simulations can be automatically generated by the use of machine learning. The domain of air combat is complex, and machine learning methods that operate within this domain must be suited to the challenges it poses. Our research shows that the dynamic scripting algorithm greatly facilitates the automatic generation of air combat behaviour models, while being sufficiently flexible to be moulded into answers to those challenges. However, ensuring the validity of the newly generated behaviour models remains a point of attention for future research.
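For readers unfamiliar with dynamic scripting, a minimal sketch of its core loop follows; the rule contents, reward signal, and constants are placeholders, not the behaviour models developed in the thesis. Rules in a rulebase carry weights, scripts are sampled in proportion to those weights, and weights are redistributed after each encounter:

```python
# Sketch: the core weight-update loop of dynamic scripting.
# Rules and the fitness signal are placeholders, not the thesis's models.
import random

class DynamicScripter:
    def __init__(self, rulebase, script_size, w_min=1.0, w_max=100.0):
        self.weights = {rule: 10.0 for rule in rulebase}
        self.script_size = script_size
        self.w_min, self.w_max = w_min, w_max

    def generate_script(self):
        """Sample rules in proportion to their weights, without replacement."""
        rules, script = list(self.weights), []
        for _ in range(self.script_size):
            pick = random.choices(rules, [self.weights[r] for r in rules])[0]
            rules.remove(pick)
            script.append(pick)
        return script

    def update(self, script, fitness):
        """Reward rules in a successful script, penalize them otherwise;
        the adjustment is compensated on the remaining rules."""
        delta = 5.0 * (fitness - 0.5)  # fitness in [0, 1]; 0.5 is break-even
        others = [r for r in self.weights if r not in script]
        for r in script:
            self.weights[r] = min(self.w_max, max(self.w_min, self.weights[r] + delta))
        comp = -delta * len(script) / max(1, len(others))
        for r in others:
            self.weights[r] = min(self.w_max, max(self.w_min, self.weights[r] + comp))

ds = DynamicScripter(rulebase=[f"rule{i}" for i in range(20)], script_size=5)
script = ds.generate_script()
ds.update(script, fitness=0.8)  # e.g., the CGF won the engagement
```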
Smet, M.D. de; Jonge, N. de; Iannetta, D.; Faridpooya, K.; Oosterhout, E. van; Naus, G.; ... ; Beelen, M.J. 2019
With the proliferation of online learning, the future of classroom teaching has been called into question. However, the unfaltering popularity of brick-and-mortar courses indicates that direct access to expert knowledge and face-to-face engagements remain key considerations for students. Here we showcase a combination of these two worlds in a Small Private Online Course (SPOC). Compared to Massive Open Online Courses (MOOCs), SPOCs are developed for smaller and more dedicated target groups and depend on close engagement between teachers and students. This format enables educational providers to involve internal and external students and teachers alike and to make ample use of online resources. This paper is based upon our experiences of running a SPOC on ‘Modelling and Simulation in Archaeology’ at Leiden University. We review the process of developing and running the course aimed at teaching archaeology students computer programming skills, while supporting their development as professional archaeologists and responsible academics.
Dam, M.; Ottenhof, K.W.; Boxtel, C.A.M. van; Janssen, F.J.J.M. 2019
Out of all the complex systems in science education curricula, cellular respiration is considered one of the most complex and abstract processes. Students are known to have low interest in and difficulties with conceptual understanding of cellular respiration, which poses a challenge for teaching and learning. In this study, we took literature about modelling and about teaching and learning cellular respiration as a starting point for the design of a concrete dynamic model in which students (n = 126) use Lego® to simulate the process of cellular respiration. Students used the simulation embedded in the context of determining the efficiency of a sediment battery as a future source of green energy, and we tested the effects on conceptual learning and situational interest in an experimental study. Results on conceptual learning show that the experimental and control groups had comparable results on the test. The questions that students in the experimental group asked during enactment, however, indicated a focus on both isolated component parts and modes of organization at higher organizational levels, which is linked to how biologists mechanistically understand complex systems. Both groups reported a similarly high degree to which the topic is meaningful in real life (situational interest value), whereas enjoyment (situational interest feeling) was significantly higher in the experimental group. Furthermore, students reported specific advantages (e.g., "I now understand that one acid chemically changes into another; they do not just transfer atoms") and disadvantages (e.g., time issues).
This article explores the underlying culture of war sustaining the setting of scenarios within cyberwar games. In particular, it engages with the question of how to simulate a phenomenon that, due to its remoteness, possesses an ambivalent relation to reality. Taking the scenarios of major national and international cyberwar games as illustrations, the article first engages with the modes of existence of simulated cyberwar through the conceptual prisms of copy/original and simulation/simulacra. It then explores how these scenarios frame cyberwar in relation to cyberwarfare scholarship and legislation. Finally, it sheds light on how, through operational exercises, these scenarios ultimately reproduce cyberwar as an imagined cultural artifact.
The main focus of this thesis is the behaviour of two-dimensional materials: (anti-)ferromagnetic materials, which show topological phases, in the first two chapters, and energetic square ice in the third and fourth chapters. The magnetic materials are of interest in part due to foreseen practical applications in which skyrmions can act as data carriers, and for which we have shown that skyrmions can exist in the ground state. Energetic square ice is of theoretical interest due to its anomalous behaviour at the infinite-order phase transition and as a purely mathematical, analytically solvable model. We used this model to test the order parameter we constructed, which, by definition, can be used to detect these infinite-order phase transitions. We also show agreement between conjectured and known properties for energetic square ice with special boundaries and show the existence of oscillations that go beyond current theories.