Introduction: The rVSVΔG-ZEBOV-GP (Ervebo®) vaccine is both immunogenic and protective against Ebola. However, the vaccine can cause a broad range of transient adverse reactions, from headache to arthritis. Identifying baseline reactogenicity signatures can advance personalized vaccinology and increase our understanding of the molecular factors associated with such adverse events. Methods: In this study, we developed a machine learning approach to integrate prevaccination gene expression data with adverse events that occurred within 14 days post-vaccination. Results and Discussion: We analyzed the expression of 144 genes across 343 blood samples collected from participants of 4 phase I clinical trial cohorts: Switzerland, USA, Gabon, and Kenya. Our machine learning approach revealed 22 key genes associated with adverse events such as local reactions, fatigue, headache, myalgia, fever, chills, arthralgia, nausea, and arthritis, providing insights into potential biological mechanisms linked to vaccine reactogenicity.
The Banff Digital Pathology Working Group (DPWG) was formed to establish a digital pathology repository; develop, validate, and share models for image analysis; and foster collaborations using regular videoconferencing. During the calls, a variety of artificial intelligence (AI)-based support systems for transplantation pathology were presented. Potential collaborations in a competition/trial on AI applied to kidney transplant specimens, including the DIAGGRAFT challenge (staining of biopsies at multiple institutions, pathologists' visual assessment, and development and validation of new and pre-existing Banff scoring algorithms), were also discussed. To determine the next steps, a survey was conducted, primarily focusing on the feasibility of establishing a digital pathology repository and identifying potential hosts. Sixteen of the 35 respondents (46%) had access to a server hosting a digital pathology repository, and 2 respondents could serve as a potential host at no cost to the DPWG. The 16 digital pathology repositories collected specimens from various organs, with the largest constituent being kidney (n = 12,870 specimens). A DPWG pilot digital pathology repository was established, and there are plans for a competition/trial with the DIAGGRAFT project. Utilizing existing resources and previously established models, the Banff DPWG is establishing new resources for the Banff community.
Pullen, L.C.E.; Noortman, W.A.; Triemstra, L.; Jongh, C. de; Rademaker, F.J.; Spijkerman, R.; ... ; PLASTIC Study Grp 2023
Aim: To improve identification of peritoneal and distant metastases in locally advanced gastric cancer using [18F]FDG-PET radiomics. Methods: [18F]FDG-PET scans of 206 patients acquired in 16 different Dutch hospitals in the prospective multicentre PLASTIC-study were analysed. Tumours were delineated and 105 radiomic features were extracted. Three classification models were developed to identify peritoneal and distant metastases (incidence: 21%): a model with clinical variables, a model with radiomic features, and a clinicoradiomic model, combining clinical variables and radiomic features. A least absolute shrinkage and selection operator (LASSO) regression classifier was trained and evaluated in a 100-times repeated random split, stratified for the presence of peritoneal and distant metastases. To exclude features with high mutual correlations, redundancy filtering of the Pearson correlation matrix was performed (r = 0.9). Model performances were expressed by the area under the receiver operating characteristic curve (AUC). In addition, subgroup analyses based on Lauren classification were performed. Results: None of the models could identify metastases, with low AUCs of 0.59, 0.51, and 0.56 for the clinical, radiomic, and clinicoradiomic model, respectively. Subgroup analysis of intestinal and mixed-type tumours resulted in low AUCs of 0.67 and 0.60 for the clinical and radiomic models, and a moderate AUC of 0.71 in the clinicoradiomic model. Subgroup analysis of diffuse-type tumours did not improve the classification performance. Conclusion: Overall, [18F]FDG-PET-based radiomics did not contribute to the preoperative identification of peritoneal and distant metastases in patients with locally advanced gastric carcinoma. 
In intestinal and mixed-type tumours, the classification performance of the clinical model slightly improved with the addition of radiomic features, but this slight improvement does not outweigh the laborious radiomic analysis.
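The redundancy filtering step mentioned in this abstract (dropping features whose pairwise Pearson correlation exceeds 0.9) could look roughly like the sketch below. The greedy keep-first order, the function names, and the toy feature names and values are illustrative assumptions, not the authors' implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def filter_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| <= threshold
    against every feature that has already been kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy features (names and values invented for illustration).
features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "volume_scaled": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],  # perfectly correlated with "volume"
    "texture": [2.0, -1.0, 3.0, 0.5, -2.0, 1.0],
}
kept = filter_redundant(features)
print(kept)  # ['volume', 'texture']
```

In this toy input, "volume_scaled" is dropped because it is a linear rescaling of "volume", while "texture" is only weakly correlated with it and is kept. Which member of a correlated pair survives depends on the iteration order, which is a design choice the abstract does not specify.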
Noortman, W.A.; Aide, N.; Vriens, D.; Arkes, L.S.; Slump, C.H.; Boellaard, R.; ... ; Geus-Oei, L.F. de 2023
Aim: To build and externally validate an [F-18]FDG PET radiomic model to predict overall survival in patients with head and neck squamous cell carcinoma (HNSCC). Methods: Two multicentre datasets of patients with operable HNSCC treated with preoperative afatinib who underwent a baseline and evaluation [F-18]FDG PET/CT scan were included (EORTC: n = 20, Unicancer: n = 34). Tumours were delineated, and radiomic features were extracted. Each cohort served once as a training and once as an external validation set for the prediction of overall survival. Supervised feature selection was performed using variable hunting with variable importance, selecting the top two features. A Cox proportional hazards regression model using selected radiomic features and clinical characteristics was fitted on the training dataset and validated in the external validation set. Model performances are expressed by the concordance index (C-index). Results: In both models, the radiomic model surpassed the clinical model with validation C-indices of 0.69 and 0.79 vs. 0.60 and 0.67, respectively. The model that combined the radiomic features and clinical variables performed best, with validation C-indices of 0.71 and 0.82. Conclusion: Although assessed in two small but independent cohorts, an [F-18]FDG-PET radiomic signature based on the evaluation scan seems promising for the prediction of overall survival for HNSCC treated with preoperative afatinib. The robustness and clinical applicability of this radiomic signature should be assessed in a larger cohort.
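For readers unfamiliar with the C-index reported here, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It assumes that higher risk scores should go with shorter survival and it ignores tied event times; the data are invented for illustration and are unrelated to the study cohorts.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs in which the subject
    with the earlier observed event has the higher predicted risk.
    Ties in risk count as half; ties in time are ignored (simplification)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: follow-up times, event flags (1 = death observed), model risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)  # 0.8
```

Five pairs are comparable in this example and four are ordered correctly, giving 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.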
Bournez, C.; Riool, M.; Boer, L. de; Cordfunke, R.A.; Best, L. de; Leeuwen, R. van; ... ; Westen, G.J.P. van 2023
To combat infection by microorganisms, host organisms possess a primary arsenal via the innate immune system. Among them are defense peptides with the ability to target a wide range of pathogenic organisms, including bacteria, viruses, parasites, and fungi. Here, we present the development of a novel machine learning model capable of predicting the activity of antimicrobial peptides (AMPs), CalcAMP. AMPs, in particular short ones (<35 amino acids), can become an effective solution to face the multi-drug resistance issue arising worldwide. Whereas finding potent AMPs through classical wet-lab techniques is still a long and expensive process, a machine learning model can help researchers rapidly identify whether peptides present potential or not. Our prediction model is based on a new data set constructed from the available public data on AMPs and experimental antimicrobial activities. CalcAMP can predict activity against both Gram-positive and Gram-negative bacteria. Different features either concerning general physicochemical properties or sequence composition have been assessed to retrieve higher prediction accuracy. CalcAMP can be used as a promising prediction asset to identify short AMPs among given peptide sequences.
Garofoli, R.; Resche-Rigon, M.; Roux, C.; Heijde, D. van der; Dougados, M.; Molto, A. 2023
OBJECTIVES: To compare machine learning (ML) to traditional models to predict radiographic progression in patients with early axial spondyloarthritis (axSpA). METHODS: We carried out a prospective French multicentric DESIR cohort study with 5 years of follow-up that included patients with chronic back pain for <3 years, suggestive of axSpA. Radiographic progression was defined as progression at the spine (increase of at least 1 point of mSASSS scores/2 years) or at the sacroiliac joint (worsening of at least one grade of the mNY score between 2 visits). Statistical analyses were based on patients without any missing data regarding the outcome and variables of interest (295 patients). Traditional modelling: we performed a multivariate logistic regression model (M1); then variable selection with stepwise selection based on Akaike Information Criterion (stepAIC) method (M2), and Least Absolute Shrinkage and Selection Operator (LASSO) method (M3). ML modelling: using the “SuperLearner” package in R, we modelled radiographic progression with stepAIC, LASSO, random forest, Discrete Bayesian Additive Regression Trees Samplers (DBARTS), Generalized Additive Models (GAM), multivariate adaptive polynomial spline regression (polymars), Recursive Partitioning And Regression Trees (RPART) and Super Learner. Accuracy of these models was compared based on their 10-fold cross-validated AUC (cv-AUC). RESULTS: 10-fold cv-AUC for traditional models were 0.79 and 0.78 for M2 and M3, respectively. The three best models in the ML algorithms were the GAM, the DBARTS and the Super Learner models, with 10-fold cv-AUC of: 0.77, 0.76 and 0.74, respectively. CONCLUSIONS: Two traditional models predicted radiographic progression as well as the eight ML models tested in this population.
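The AUC used to compare these models can be read as the probability that a randomly chosen progressor receives a higher predicted score than a randomly chosen non-progressor. The sketch below computes it by direct pairwise comparison; a cross-validated AUC is commonly obtained by averaging this quantity over held-out folds, which is an assumption about the procedure rather than something the abstract states. Labels and scores are invented.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives
    and negatives; tied scores count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical held-out fold: 1 = radiographic progression, 0 = none.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.6]
fold_auc = auc(labels, scores)
print(round(fold_auc, 3))  # 0.833
```

Of the nine positive-negative pairs, seven are ordered correctly and one is tied, so the AUC is 7.5/9.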
In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.
Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
Park, H.B.; Lee, J.; Hong, Y.; Byungchang, S.; Kim, W.; Lee, B.K.; ... ; Chang, H.J. 2023
Background and Hypothesis: The recently introduced Bayesian quantile regression (BQR) machine-learning method enables comprehensive analysis of the relationships among complex clinical variables. We analyzed the relationship between multiple cardiovascular (CV) risk factors and different stages of coronary artery disease (CAD) using the BQR model in a vessel-specific manner. Methods: From the data of 1,463 patients obtained from the PARADIGM (NCT02803411) registry, we analyzed the lumen diameter stenosis (DS) of the three vessels: left anterior descending (LAD), left circumflex (LCx), and right coronary artery (RCA). Two models for predicting DS and DS changes were developed. Baseline CV risk factors, symptoms, and laboratory test results were used as the inputs. The conditional 10%, 25%, 50%, 75%, and 90% quantile functions of the maximum DS and DS change of the three vessels were estimated using the BQR model. Results: The 90th percentiles of the DS of the three vessels and their maximum DS change were 41%–50% and 5.6%–7.3%, respectively. Typical anginal symptoms were associated with the highest quantile (90%) of DS in the LAD; diabetes with higher quantiles (75% and 90%) of DS in the LCx; dyslipidemia with the highest quantile (90%) of DS in the RCA; and shortness of breath showed some association with the LCx and RCA. Interestingly, high-density lipoprotein cholesterol showed a dynamic association along DS change in the per-patient analysis. Conclusions: This study demonstrates the clinical utility of the BQR model for evaluating the comprehensive relationship between risk factors and baseline-grade CAD and its progression.
Background: Facioscapulohumeral muscular dystrophy (FSHD) is a progressive neuromuscular disease. Its slow and variable progression makes the development of new treatments highly dependent on validated biomarkers that can quantify disease progression and response to drug interventions. Objective: We aimed to build a tool that estimates FSHD clinical severity based on behavioral features captured using smartphone and remote sensor data. The adoption of remote monitoring tools, such as smartphones and wearables, would provide a novel opportunity for continuous, passive, and objective monitoring of FSHD symptom severity outside the clinic. Methods: In total, 38 genetically confirmed patients with FSHD were enrolled. The FSHD Clinical Score and the Timed Up and Go (TUG) test were used to assess FSHD symptom severity at days 0 and 42. Remote sensor data were collected using an Android smartphone, Withings Steel HR+, Body+, and BPM Connect+ for 6 continuous weeks. We created 2 single-task regression models that estimated the FSHD Clinical Score and TUG separately. Further, we built 1 multitask regression model that estimated the 2 clinical assessments simultaneously. Finally, we assessed how an incrementally increasing time window affected the model performance. To do so, we trained the models on an incrementally increasing time window (from day 1 until day 14) and evaluated the predictions of the clinical severity on the remaining 4 weeks of data. Results: The single-task regression models achieved an R² of 0.57 and 0.59 and a root-mean-square error (RMSE) of 2.09 and 1.66 when estimating FSHD Clinical Score and TUG, respectively. 
Time spent at a health-related location (such as a gym or hospital) and call duration were features that were predictive of both clinical assessments. The multitask model achieved an R² of 0.66 and 0.81 and an RMSE of 1.97 and 1.61 for the FSHD Clinical Score and TUG, respectively, and therefore outperformed the single-task models in estimating clinical severity. The 3 most important features selected by the multitask model were light sleep duration, total steps per day, and mean steps per minute. Using an increasing time window (starting from day 1 to day 14) for the FSHD Clinical Score, TUG, and multitask estimation yielded an average R² of 0.65, 0.79, and 0.76 and an average RMSE of 3.37, 2.05, and 4.37, respectively. Conclusions: We demonstrated that smartphone and remote sensor data could be used to estimate FSHD clinical severity and therefore complement the assessment of FSHD outside the clinic. In addition, our results illustrated that training the models on the first week of data allows for consistent and stable prediction of FSHD symptom severity. Longitudinal follow-up studies should be conducted to further validate the reliability and validity of the multitask model as a tool to monitor disease progression over a longer period.
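The two regression metrics quoted in this abstract, the coefficient of determination and the root-mean-square error, can be computed as in the sketch below. The values are invented and have no relation to the study data; the function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical clinical scores and model estimates.
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [3.0, 4.0, 5.0, 8.0]
print(round(r_squared(observed, estimated), 3))  # 0.9
print(round(rmse(observed, estimated), 3))       # 0.707
```

The RMSE is expressed in the units of the outcome, so RMSE values for the FSHD Clinical Score and for TUG are not directly comparable with each other, whereas the coefficient of determination is unitless.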
Wang, Q.K.; Runhaar, J.; Kloppenburg, M.; Boers, M.; Bijlsma, J.W.J.; Bacardit, J.; ... ; CREDO Experts Grp 2022
Objectives: To identify highly ranked features related to clinicians' diagnosis of clinically relevant knee OA. Methods: General practitioners (GPs) and secondary care physicians (SPs) were recruited to evaluate 5-10 years follow-up clinical and radiographic data of knees from the CHECK cohort for the presence of clinically relevant OA. GPs and SPs were gathered in pairs; each pair consisted of one GP and one SP, and the paired clinicians independently evaluated the same subset of knees. A diagnosis was made for each knee by the GP and SP before and after viewing radiographic data. Nested 5-fold cross-validation enhanced random forest models were built to identify the top 10 features related to the diagnosis. Results: Seventeen clinician pairs evaluated 1106 knees with 139 clinical and 36 radiographic features. GPs diagnosed clinically relevant OA in 42% and 43% of knees, before and after viewing radiographic data, respectively. SPs diagnosed it in 43% and 51% of knees, respectively. Models containing the top 10 features had good performance for explaining clinicians' diagnosis, with area under the curve ranging from 0.76-0.83. Before viewing radiographic data, quantitative symptomatic features (i.e. WOMAC scores) were the most important ones related to the diagnosis of both GPs and SPs; after viewing radiographic data, radiographic features appeared in the top lists for both, but seemed to be more important for SPs than GPs. Conclusions: Random forest models presented good performance in explaining clinicians' diagnosis, which helped to reveal typical features of patients recognized as clinically relevant knee OA by clinicians from two different care settings.
Objectives: Around 30% of patients with RA have an inadequate response to MTX. We aimed to use routine clinical and biological data to build machine learning models predicting EULAR inadequate response to MTX and to identify simple predictive biomarkers. Methods: Models were trained on RA patients fulfilling the 2010 ACR/EULAR criteria from the ESPOIR and Leiden EAC cohorts to predict the EULAR response at 9 months (+/- 6 months). Several models were compared on the training set using the AUROC. The best model was evaluated on an external validation cohort (tREACH). The model's predictions were explained using Shapley values to extract a biomarker of inadequate response. Results: We included 493 therapeutic sequences from ESPOIR, 239 from EAC and 138 from tREACH. The model selected DAS28, Lymphocytes, Creatininemia, Leucocytes, AST, ALT, swollen joint count and corticosteroid co-treatment as predictors. The model reached an AUROC of 0.72 [95% CI (0.63, 0.80)] on the external validation set, where 70% of patients were responders to MTX. Patients predicted as inadequate responders had only a 38% [95% CI (20%, 58%)] chance of responding, and using the algorithm to decide whether to initiate MTX would decrease the inadequate-response rate from 30% to 23% [95% CI: (17%, 29%)]. A biomarker was identified in patients with moderate or high activity (DAS28 > 3.2): patients with a lymphocyte count above 2000 cells/mm³ are significantly less likely to respond. Conclusion: Our study highlights the usefulness of machine learning in unveiling subgroups of inadequate responders to MTX to guide new therapeutic strategies. Further work is needed to validate this approach.
Bogaards, F.A.; Gehrmann, T.; Beekman, M.; Akker, E. ben van den; Rest, O. van de; Hangelbroek, R.W.J.; ... ; Slagboom, P.E. 2022
The response to lifestyle intervention studies is often heterogeneous, especially in older adults. Subtle responses that may represent a health gain for individuals are not always detected by classical health variables, stressing the need for novel biomarkers that detect intermediate changes in metabolic, inflammatory, and immunity-related health. Here, our aim was to develop and validate a molecular multivariate biomarker maximally sensitive to the individual effect of a lifestyle intervention; the Personalized Lifestyle Intervention Status (PLIS). We used ¹H-NMR fasting blood metabolite measurements from before and after the 13-week combined physical and nutritional Growing Old TOgether (GOTO) lifestyle intervention study in combination with a fivefold cross-validation and a bootstrapping method to train a separate PLIS score for men and women. The PLIS scores consisted of 14 and four metabolites for females and males, respectively. Performance of the PLIS score in tracking health gain was illustrated by association of the sex-specific PLIS scores with several classical metabolic health markers, such as BMI, trunk fat%, fasting HDL cholesterol, and fasting insulin, the primary outcome of the GOTO study. We also showed that the baseline PLIS score indicated which participants respond positively to the intervention. Finally, we explored PLIS in an independent physical activity lifestyle intervention study, showing similar, albeit remarkably weaker, associations of PLIS with classical metabolic health markers. To conclude, we found that the sex-specific PLIS score was able to track the individual short-term metabolic health gain of the GOTO lifestyle intervention study. 
The methodology used to train the PLIS score potentially provides a useful instrument to track personal responses and predict the participant's health benefit in lifestyle interventions similar to the GOTO study.
Maleki, G.; Zhuparris, A.; Koopmans, I.; Doll, R.J.; Voet, N.; Cohen, A.; ... ; Maeyer, J. de 2022
Background: Facioscapulohumeral dystrophy (FSHD) is a progressive muscle dystrophy disorder leading to significant disability. Currently, FSHD symptom severity is assessed by clinical assessments such as the FSHD clinical score and the Timed Up-and-Go test. These assessments are limited in their ability to capture changes continuously and the full impact of the disease on patients' quality of life. Real-world data related to physical activity, sleep, and social behavior could potentially provide additional insight into the impact of the disease and might be useful in assessing treatment effects on aspects that are important contributors to the functioning and well-being of patients with FSHD. Objective: This study investigated the feasibility of using smartphones and wearables to capture symptoms related to FSHD based on a continuous collection of multiple features, such as the number of steps, sleep, and app use. We also identified features that can be used to differentiate between patients with FSHD and non-FSHD controls. Methods: In this exploratory noninterventional study, 58 participants (n=38, 66%, patients with FSHD and n=20, 34%, non-FSHD controls) were monitored using a smartphone monitoring app for 6 weeks. On the first and last day of the study period, clinicians assessed the participants' FSHD clinical score and Timed Up-and-Go test time. Participants installed the app on their Android smartphones, were given a smartwatch, and were instructed to measure their weight and blood pressure on a weekly basis using a scale and blood pressure monitor. The user experience and perceived burden of the app on participants' smartphones were assessed at 6 weeks using a questionnaire. 
With the data collected, we sought to identify the behavioral features that were most salient in distinguishing the 2 groups (patients with FSHD and non-FSHD controls) and the optimal time window to perform the classification. Results: Overall, the participants stated that the app was well tolerated, but 67% (39/58) noticed a difference in battery life. Using all 6 weeks of data, we classified patients with FSHD and non-FSHD controls with 93% accuracy, 100% sensitivity, and 80% specificity. We found that the optimal time window for the classification is the first day of data collection and the first week of data collection, which yielded an accuracy, sensitivity, and specificity of 95.8%, 100%, and 94.4%, respectively. Features relating to smartphone acceleration, app use, location, physical activity, sleep, and call behavior were the most salient features for the classification. Conclusions: Remotely monitored data collection allowed for the collection of daily activity data in patients with FSHD and non-FSHD controls for 6 weeks. We demonstrated the initial ability to detect differences in features in patients with FSHD and non-FSHD controls using smartphones and wearables, mainly based on data related to physical and social activity.
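As a worked check of how the three classification metrics relate, the sketch below derives them from confusion-matrix counts. The counts are an assumption: they suppose that all 58 participants (38 patients, 20 controls) were classified in the 6-week analysis, which the abstract does not state explicitly. Under that assumption, 100% sensitivity and 80% specificity imply 38 true positives and 16 true negatives.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Assumed counts: 38 patients all detected, 16 of 20 controls correctly rejected.
metrics = classification_metrics(tp=38, fn=0, tn=16, fp=4)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.931, 'sensitivity': 1.0, 'specificity': 0.8}
```

The resulting accuracy of 54/58, about 93%, agrees with the figure reported for the 6-week analysis.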
Assadi, H.; Alabed, S.; Maiter, A.; Salehi, M.; Li, R.; Ripley, D.P.; ... ; Garg, P. 2022
Background and Objectives: Interest in artificial intelligence (AI) for outcome prediction has grown substantially in recent years. However, the prognostic role of AI using advanced cardiac magnetic resonance imaging (CMR) remains unclear. This systematic review assesses the existing literature on AI in CMR to predict outcomes in patients with cardiovascular disease. Materials and Methods: Medline and Embase were searched for studies published up to November 2021. Any study assessing outcome prediction using AI in CMR in patients with cardiovascular disease was eligible for inclusion. All studies were assessed for compliance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Results: A total of 5 studies were included, with a total of 3679 patients, with 225 deaths and 265 major adverse cardiovascular events. Three methods demonstrated high prognostic accuracy: (1) three-dimensional motion assessment model in pulmonary hypertension (hazard ratio (HR) 2.74, 95%CI 1.73-4.34, p < 0.001), (2) automated perfusion quantification in patients with coronary artery disease (HR 2.14, 95%CI 1.58-2.90, p < 0.001), and (3) automated volumetric, functional, and area assessment in patients with myocardial infarction (HR 0.94, 95%CI 0.92-0.96, p < 0.001). Conclusion: There is emerging evidence of the prognostic role of AI in predicting outcomes for three-dimensional motion assessment in pulmonary hypertension, ischaemia assessment by automated perfusion quantification, and automated functional assessment in myocardial infarction.
Background: There has been a rapid increase in the number of Artificial Intelligence (AI) studies of cardiac MRI (CMR) segmentation aiming to automate image analysis. However, advancement and clinical translation in this field depend on researchers presenting their work in a transparent and reproducible manner. This systematic review aimed to evaluate the quality of reporting in AI studies involving CMR segmentation. Methods: MEDLINE and EMBASE were searched for AI CMR segmentation studies in April 2022. Any fully automated AI method for segmentation of cardiac chambers, myocardium or scar on CMR was considered for inclusion. For each study, compliance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) was assessed. The CLAIM criteria were grouped into study, dataset, model and performance description domains. Results: 209 studies published between 2012 and 2022 were included in the analysis. Studies were mainly published in technical journals (58%), with the majority (57%) published since 2019. Studies were from 37 different countries, with most from China (26%), the United States (18%) and the United Kingdom (11%). Short axis CMR images were most frequently used (70%), with the left ventricle the most commonly segmented cardiac structure (49%). Median compliance of studies with CLAIM was 67% (IQR 59-73%). Median compliance was highest for the model description domain (100%, IQR 80-100%) and lower for the study (71%, IQR 63-86%), dataset (63%, IQR 50-67%) and performance (60%, IQR 50-70%) description domains. Conclusion: This systematic review highlights important gaps in the literature of CMR studies using AI. 
We identified key missing items (most strikingly, poor description of patients included in the training and validation of AI models and inadequate model failure analysis) that limit the transparency, reproducibility and hence validity of published AI studies. This review may support closer adherence to established frameworks for reporting standards and presents recommendations for improving the quality of reporting in this field.
Schultes, E.; Roos, M.; Santos, L.O.B.D.; Guizzardi, G.; Bouwman, J.; Hankemeier, T.; ... ; Mons, B. 2022
Although all the technical components supporting fully orchestrated Digital Twins (DT) currently exist, what remains missing is a conceptual clarification and analysis of a more generalized concept of a DT that is made FAIR, that is, universally machine actionable. This methodological overview is a first step toward this clarification. We present a review of previously developed semantic artifacts and how they may be used to compose a higher-order data model referred to here as a FAIR Digital Twin (FDT). We propose an architectural design to compose, store and reuse FDTs supporting data intensive research, with emphasis on privacy by design and their use in GDPR compliant open science.
Background and Objectives: With the current advanced data-driven approach to health care, machine learning is gaining more interest. The current study investigates the added value of machine learning to linear regression in predicting anastomotic leakage and pulmonary complications after upper gastrointestinal cancer surgery. Methods: All patients in the Dutch Upper Gastrointestinal Cancer Audit undergoing curatively intended esophageal or gastric cancer surgeries from 2011 to 2017 were included. Anastomotic leakage was defined as any clinically or radiologically proven anastomotic leakage. Pulmonary complications comprised pneumonia, pleural effusion, respiratory failure, pneumothorax, and/or acute respiratory distress syndrome. Different machine learning models were tested. Nomograms were constructed using the Least Absolute Shrinkage and Selection Operator. Results: Between 2011 and 2017, 4228 patients underwent surgical resection for esophageal cancer, of whom 18% developed anastomotic leakage and 30% a pulmonary complication. Of the 2199 patients with surgical resection for gastric cancer, 7% developed anastomotic leakage and 15% a pulmonary complication. In all cases, linear regression had the highest predictive value, with the area under the curve varying between 61.9 and 68.0, but the difference with machine learning models did not reach statistical significance. Conclusion: Machine learning models can predict postoperative complications in upper gastrointestinal cancer surgery, but they do not outperform the current gold standard, linear regression.
Background: There is increasing attention on machine learning (ML)-based clinical decision support systems (CDSS), but their added value and pitfalls are very rarely evaluated in clinical practice. We implemented a CDSS to aid general practitioners (GPs) in treating patients with urinary tract infections (UTIs), which are a significant health burden worldwide. Objective: This study aims to prospectively assess the impact of this CDSS on treatment success and change in antibiotic prescription behavior of the physician. In doing so, we hope to identify drivers and obstacles that positively impact the quality of health care practice with ML. Methods: The CDSS was developed by Pacmed, Nivel, and Leiden University Medical Center (LUMC). The CDSS presents the expected outcomes of treatments, using interpretable decision trees as ML classifiers. Treatment success was defined as a subsequent period of 28 days during which no new antibiotic treatment for UTI was needed. In this prospective observational study, 36 primary care practices used the software for 4 months. Furthermore, 29 control practices were identified using propensity score-matching. All analyses were performed using electronic health records from the Nivel Primary Care Database. Patients for whom the software was used were identified in the Nivel database by sequential matching using CDSS use data. We compared the proportion of successful treatments before and during the study within the treatment arm. The same analysis was performed for the control practices and the patient subgroup the software was definitely used for. All analyses, including that of physicians' prescription behavior, were statistically tested using 2-sided z tests with an alpha level of .05.
Results: In the treatment practices, 4998 observations were included before and 3422 observations (of 2423 unique patients) were included during the implementation period. In the control practices, 5044 observations were included before and 3360 observations were included during the implementation period. The proportion of successful treatments increased significantly from 75% to 80% in treatment practices (z=5.47, P<.001). No significant difference was detected in control practices (76% before and 76% during the pilot, z=0.02; P=.98). Of the 2423 patients, we identified 734 (30.29%) in the Nivel database using the CDSS use data. For these patients, the proportion of successful treatments during the study was 83%, a statistically significant difference compared with the 75% of successful treatments before the study in the treatment practices (z=4.95; P<.001). Conclusions: The introduction of the CDSS as an intervention in the 36 treatment practices was associated with a statistically significant improvement in treatment success. We excluded temporal effects and validated the results with the subgroup analysis in patients for whom we were certain that the software was used. This study shows important strengths and points of attention for the development and implementation of an ML-based CDSS in clinical practice. Trial Registration: ClinicalTrials.gov NCT04408976; https://clinicaltrials.gov/ct2/show/NCT04408976
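The 2-sided z test for comparing two proportions mentioned in this abstract can be illustrated with a minimal sketch. It assumes the pooled-variance form of the test (the abstract does not say which variant was used) and reconstructs success counts from the rounded percentages 75% and 80%, so the resulting z is close to, but not identical with, the reported z=5.47. The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Pooled two-sided z test for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of equal success rates
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Observation counts are from the abstract; success counts are an ASSUMPTION,
# reconstructed from the rounded percentages (75% before, 80% during).
before_n, during_n = 4998, 3422
before_success = round(0.75 * before_n)
during_success = round(0.80 * during_n)

z, p = two_proportion_z(before_success, before_n, during_success, during_n)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

Because the inputs are rounded, the sketch only shows that a difference of this size at these sample sizes is far beyond the .05 threshold; it does not reproduce the study's exact statistic.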
Fairness and bias are crucial concepts in artificial intelligence, yet they are relatively ignored in machine learning applications in clinical psychiatry. We computed fairness metrics and present bias mitigation strategies using a model trained on clinical mental health data. We collected structured data related to the admission, diagnosis, and treatment of patients in the psychiatry department of the University Medical Center Utrecht. We trained a machine learning model to predict future administrations of benzodiazepines on the basis of past data. We found that gender plays an unexpected role in the predictions, which constitutes bias. Using the AI Fairness 360 package, we implemented reweighing and discrimination-aware regularization as bias mitigation strategies, and we explored their implications for model performance. This is the first application of bias exploration and mitigation in a machine learning model trained on real clinical psychiatry data.
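The idea behind reweighing can be sketched in a few lines. This is a from-scratch illustration of the standard reweighing scheme (each sample gets the weight P(group) x P(label) / P(group, label)), not the AI Fairness 360 API and not the authors' code; the toy data, group codes, and function names are all hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Per-sample weights that make group and label independent when weighted."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # w(g, y) = P(g) * P(y) / P(g, y), expressed with raw counts
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total

# Hypothetical toy data: group "A" has 3 of 4 positives, group "B" has 1 of 4.
groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
print(rate_a, rate_b)
```

After reweighing, both groups have the same weighted positive rate, so a classifier trained with these sample weights no longer sees an association between group membership and the label in the training data.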