In this thesis, deep learning is studied from a statistical perspective. Convergence rates for worst-case risk bounds of neural network estimators are obtained in the classification, density estimation, and linear regression models. Special attention is given to the role of the input dimension, since in practice neural networks have shown promising results in high-dimensional input settings. First, the estimation of conditional class probabilities under the cross-entropy loss is studied. A challenge with this loss is that it becomes unbounded as the predicted probability approaches zero. To deal with this, the loss is truncated. Convergence rates are obtained for a neural network estimator under this truncated loss. The second problem considered is density estimation. A two-step procedure is proposed. The first step transforms the density estimation problem into a regression problem by constructing response variables using a kernel density estimator on half of the data. In the second step, a neural network is fitted to this constructed data. Convergence rates for this method are obtained using existing approximation results for compositional functions. Finally, forward gradient descent is studied. This is a biologically motivated alternative to gradient descent. Convergence rates are derived for this method in the linear regression model with random design.
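As an illustration of the last topic in the abstract above, the following is a minimal sketch of forward gradient descent in a linear regression model with random design. It assumes the common formulation in which the gradient is replaced by its projection onto a random standard normal direction; the function name, step size, dimension, and noise level are hypothetical choices for this example and are not taken from the thesis.

```python
import random

def forward_gradient_descent(xs, ys, step_size, rng):
    """One pass over the data; each step uses one sample and one random direction."""
    d = len(xs[0])
    theta = [0.0] * d
    for x, y in zip(xs, ys):
        # Random direction xi with independent standard normal entries (assumption).
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        residual = y - sum(xj * tj for xj, tj in zip(x, theta))
        # Directional derivative of the loss 0.5 * (y - x.theta)^2 along xi.
        directional = -residual * sum(xj * vj for xj, vj in zip(x, xi))
        # Move along xi, scaled by the directional derivative.
        theta = [tj - step_size * directional * vj for tj, vj in zip(theta, xi)]
    return theta

rng = random.Random(0)
d, n = 5, 20000
theta_star = [1.0] * d  # hypothetical true parameter
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
ys = [sum(xj * tj for xj, tj in zip(x, theta_star)) + 0.1 * rng.gauss(0.0, 1.0) for x in xs]

theta_hat = forward_gradient_descent(xs, ys, step_size=0.01, rng=rng)
error = sum((a - b) ** 2 for a, b in zip(theta_hat, theta_star)) ** 0.5
print("estimation error:", error)
```

Only a scalar directional derivative is needed per step, which is what makes the scheme computable in a single forward pass. The price is extra variance that grows with the dimension, so the step size here is chosen small.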
The invention of neural networks marks a critical milestone in the pursuit of true artificial intelligence. Despite their impressive performance on various tasks, these networks face limitations in learning efficiently, as they are often trained from scratch. Deep meta-learning is one approach to improving learning efficiency by leveraging prior knowledge and experience. Whilst many successful deep meta-learning techniques have been proposed, our understanding of the performance of these methods remains limited. In this dissertation, we delve deeper into the underlying principles of these algorithms, and aim to gain a comprehensive understanding of why certain algorithms succeed while others fall short. This allows us to design enhanced deep meta-learning algorithms and reason about the impact of specific design choices on the performance of different algorithms. Moreover, we investigate the integration of theoretical principles into meta-learning algorithms to improve their performance. Overall, we make a small step toward a better understanding of deep meta-learning algorithms, paving the way for more robust and principled meta-learning techniques with broader applicability and superior performance.
Inpatient violence is a common and severe problem within psychiatry. Knowing who might become violent can influence staffing levels and mitigate severity. Predictive machine learning models can assess each patient's likelihood of becoming violent based on clinical notes. Yet, while machine learning models benefit from having more data, data availability is limited, as hospitals typically do not share their data in order to preserve privacy. Federated Learning (FL) can overcome the problem of data limitation by training models in a decentralised manner, without disclosing data between collaborators. However, although several FL approaches exist, none of these train Natural Language Processing models on clinical notes. In this work, we investigate the application of Federated Learning to clinical Natural Language Processing, applied to the task of Violence Risk Assessment by simulating a cross-institutional psychiatric setting. We train and compare four models: two local models, a federated model and a data-centralised model. Our results indicate that the federated model outperforms the local models and performs similarly to the data-centralised model. These findings suggest that Federated Learning can be used successfully in a cross-institutional setting and is a step towards new applications of Federated Learning based on clinical notes.
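The abstract above does not state which aggregation scheme was used. As a rough sketch of how decentralised training can work without sharing data, the following toy example uses federated averaging, a common choice and an assumption here: each site updates a shared model on its own data, and only the resulting weights are sent back and averaged. The two "sites", the linear model, and all names and parameters are hypothetical stand-ins, not the clinical NLP models of the study.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one site's private data; return new weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(updates, sizes):
    """Average site weights, weighted by the number of local samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

rng = random.Random(0)

def make_site(n, lo, hi):
    """Synthetic data y = 2x + 1 + noise, with a site-specific range of x."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

sites = [make_site(50, -1.0, 0.0), make_site(80, 0.0, 1.0)]

global_weights = (0.0, 0.0)
for _ in range(20):  # communication rounds
    # Raw data never leaves a site; only updated weights are collected.
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates, [len(s) for s in sites])

print("federated weights:", global_weights)
```

In this sketch the server only ever sees weight pairs, which mirrors the privacy motivation in the abstract; real deployments typically add secure communication and further safeguards that are out of scope here.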
Li, R.; Napolitano, N.R.; Roy, N.; Tortora, C.; La Barbera, F.; Sonnenfeld, A.; ... ; Liu, S. 2022
Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. There is currently a strong debate in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62294 patients from the United States with 97 predictors, selected on clinical/statistical grounds from more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. Neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as well calibrated as the Cox model with all variables.
Background: Serial electrocardiography aims to contribute to electrocardiogram (ECG) diagnosis by comparing the ECG under consideration with a previously made ECG in the same individual. Here, we present a novel algorithm to construct dedicated deep-learning neural networks (NNs) that are specialized in detecting newly emerging or aggravating existing cardiac pathology in serial ECGs. Methods: We developed a novel deep-learning method for serial ECG analysis and tested its performance in the detection of heart failure in post-infarction patients, and in the detection of ischemia in patients who underwent elective percutaneous coronary intervention. The core of the method is the repeated structuring and learning procedure that, when fed with 13 serial ECG difference features (intra-individual differences in: QRS duration; QT interval; QRS maximum; T-wave maximum; QRS integral; T-wave integral; QRS complexity; T-wave complexity; ventricular gradient; QRS-T spatial angle; heart rate; J-point amplitude; and T-wave symmetry), dynamically creates an NN of at most three hidden layers. An optimization process reduces the possibility of obtaining an inefficient NN due to adverse initialization. Results: Application of our method to the two clinical ECG databases yielded 3-layer NN architectures, both showing high testing performance (areas under the receiver operating characteristic curves were 84% and 83%, respectively). Conclusions: Our method was successful in two different clinical serial ECG applications. Further studies will investigate whether other problem-specific NNs can successfully be constructed, and even whether it will be possible to construct a universal NN to detect any pathologic ECG change.
This thesis studies the asymptotic behavior and stability of deterministic and stochastic delay differential equations. The approach used in this thesis is based on fixed point theory and does not resort to any Liapunov function or Liapunov functional. The main contribution of this thesis is to study the fixed point approach in a systematic way and to unify recent results in the literature by considering some general classes of equations. The equations we consider combine time-dependent delays, distributed delays, impulses and stochastic perturbations. In addition, an application to stochastic delayed neural networks is investigated. The results in this thesis extend and improve some existing results in the literature. Examples are discussed in each chapter to illustrate our main results.
Stochastic differential equations with delay are the inspiration for this thesis. Examples of such equations arise in population models, control systems with delay and noise, lasers, economic models, neural networks, environmental pollution and in many other situations. In such models we are often interested in the evolution over time of a particular quantity, for example the size of a population, or the amount of pollution in a particular area. A differential equation with delay, or delay equation, is a differential equation in which the change in time of such a quantity is expressed as a function of the value of that quantity at different points in time, in the past as well as in the present. This is in contrast with an ordinary differential equation, in which the change in time of the quantity at a specific time is expressed as a function of that quantity at that specific time only.
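To make the distinction above concrete, the following is a minimal sketch that simulates a simple delay equation with optional noise, dX(t) = -a X(t - tau) dt + sigma dW(t), using an Euler-Maruyama discretisation. The specific equation, the constant initial history, and all parameter values are illustrative assumptions and are not taken from the thesis.

```python
import math
import random

def simulate_delay_equation(a, tau, sigma, t_end, dt=0.01, seed=0):
    """Euler-Maruyama scheme for dX(t) = -a * X(t - tau) dt + sigma dW(t)."""
    rng = random.Random(seed)
    lag = round(tau / dt)          # delay expressed in time steps
    path = [1.0] * (lag + 1)       # assumed constant history X = 1 on [-tau, 0]
    for _ in range(round(t_end / dt)):
        delayed = path[-1 - lag]   # value of X one delay in the past
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # The increment depends on the past value, not only on path[-1].
        path.append(path[-1] - a * delayed * dt + noise)
    return path

# Deterministic case (sigma = 0): with a * tau < pi / 2 the solution decays.
deterministic_path = simulate_delay_equation(a=1.0, tau=1.0, sigma=0.0, t_end=30.0)
# Stochastic case: the same dynamics perturbed by noise.
noisy_path = simulate_delay_equation(a=1.0, tau=1.0, sigma=0.2, t_end=30.0)

print("deterministic final value:", deterministic_path[-1])
print("noisy final value:", noisy_path[-1])
```

Replacing `delayed` with `path[-1]` turns the scheme into one for an ordinary (stochastic) differential equation, which is exactly the contrast the abstract draws: the whole stored history is needed only in the delay case.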