Learning from small datasets in machine learning is a crucial challenge, especially when dealing with data imbalances and anomaly detection. This thesis delves into the challenges and methodologies of learning from small datasets in machine learning, with a particular focus on addressing data imbalances and anomaly detection. It thoroughly explores various strategies for effective small dataset learning in ML, examining both existing approaches and introducing novel techniques. The research pivots around two key questions: firstly, it investigates current methods employed for learning from small datasets in machine learning, and secondly, it assesses the efficacy of batch normalization in enhancing model performance and of using salient image segmentation as an augmentation policy in self-supervised learning. The thesis comprehensively reviews techniques for managing small datasets, including data selection and preprocessing, ensemble methods, transfer learning, regularization techniques, and synthetic data generation. A critical examination of batch normalization reveals its significant role in reducing training time and test error for minority classes in highly imbalanced datasets. The study also demonstrates that utilizing salient image segmentation as an augmentation policy in self-supervised learning substantially improves representation learning. This improvement is particularly evident in the context of downstream tasks such as image segmentation, highlighting the effectiveness of this technique in enhancing model performance. In summary, this study contributes to the field of machine learning by exploring strategies for learning from small datasets. It offers a detailed analysis of batch normalization, highlighting its potential in improving performance for minority classes in imbalanced datasets. Additionally, the study introduces salient image segmentation as an augmentation policy in self-supervised learning, showing its effectiveness in tasks like image segmentation. These findings provide a solid foundation for further research in small sample learning and present practical insights for machine learning practitioners working with limited data.
The research in this dissertation aims to optimise blood donation processes in the framework of the Dutch national blood bank Sanquin. The primary health risk for blood donors is iron deficiency, which is evaluated based on donors' hemoglobin and ferritin levels. If either of these levels is inadequate, donors are deferred from donation. Deferral due to low hemoglobin levels occurs on-site, meaning that donors have already traveled to the blood bank and then have to return home without donating, which is demotivating for the donor and inefficient for the blood bank. A large part of this dissertation therefore has the objective of developing a prediction model for donors' hemoglobin levels, based on historical measurements and donor characteristics. The prediction model that was developed reduces the deferral rate by approximately 60% (from 3% to 1% for women, and from 1% to 0.4% for men), showing the potential of using data to enhance blood bank policy efficiency. Additionally, the model predictions were made explainable, providing the blood bank with insights into why specific predictions are made. These insights increase our understanding of the relationships between donor characteristics and hemoglobin levels. If this prediction model were implemented in practice, the explanations could also be shared with donors to help them understand why they are (not) invited to donate, which could also contribute to donor satisfaction and retention. In a collaborative effort with blood banks in Australia, Belgium, Finland and South Africa, the same prediction model was applied to data from each blood bank. Despite differences in blood bank policies and donor demographics, the models found similar associations with the predictor variables in all countries. Differences in performance could mostly be attributed to differences in deferral rates, with blood banks with higher deferral rates obtaining higher model accuracy. Beyond hemoglobin prediction models, additional research questions are explored. One study aims to identify determinants of ferritin levels in donors through repeated measurements and to link these to environmental variables. Another study involves modeling the pharmacokinetics of antibodies in COVID-19 recovered donors and finding relationships between patient characteristics, symptoms, and antibody levels over time. In summary, the research in this dissertation shows the potential within the wealth of data collected by blood banks. The proposed data-driven donation strategies not only decrease deferral rates but also increase donor retention and understanding. This comprehensive approach allows Sanquin to provide more personalised feedback to donors regarding their iron status, ultimately optimising the blood donation process and contributing to the overall efficacy of blood banking systems.