This thesis is concerned with statistical methodology for jointly analyzing multiple types of omics data. These datasets provide information on several biological levels, and an integrated analysis can lead to a better understanding of the whole biological system. Due to the strong correlations within and between datasets, high dimensionality, and systematic differences between datasets, novel methods are needed. We consider latent variable modeling where strong correlations are incorporated, dimension reduction is performed, and heterogeneity between omics data is modeled. The first part of the thesis studies current data integration methods applied to population cohorts and their software implementations. In the second part, we propose a novel probabilistic data integration framework to model the relation between omics data: PO2PLS. This framework allows for statistical inference and helps reduce overfitting. The PO2PLS framework can be used to integrate multiple omics datasets with various study designs.
Bouhaddani, S. el; Uh, H.W.; Jongbloed, G.; Hayward, C.; Klaric, L.; Kielbasa, S.M.; Houwing-Duistermaat, J. 2018