Motivation: Volcano plots are used to select the most interesting discoveries when too many discoveries remain after application of Benjamini-Hochberg's procedure (BH). The volcano plot suggests a double filtering procedure that selects features with both small adjusted p-value and large estimated effect size. Despite its popularity, this type of selection overlooks the fact that BH does not guarantee error control over filtered subsets of discoveries. Therefore the selected subset of features may include an inflated number of false discoveries. Results: In this paper, we illustrate the substantially inflated type I error rate of volcano plot selection with simulation experiments and RNA-seq data. In particular, we show that the feature with the largest estimated effect is a very likely false positive result. Next, we investigate two alternative approaches for multiple testing with double filtering that do not inflate the false discovery rate. Our procedure is implemented in an interactive web application and is publicly available.
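The double filtering that the abstract above criticizes can be sketched in a few lines. This is only an illustration of the naive volcano plot selection (BH adjustment followed by an effect-size cut-off), not the alternative procedures proposed in the paper. The function names `bh_adjust` and `volcano_select`, and the thresholds `alpha` and `min_effect`, are hypothetical and chosen for this example.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_select(pvalues, effects, alpha=0.05, min_effect=1.0):
    """Naive double filter: small adjusted p-value AND large |effect|.

    Assumption for illustration only: both thresholds are fixed in advance.
    As the abstract notes, BH does not guarantee FDR control on the
    subset returned here.
    """
    adjusted = bh_adjust(pvalues)
    return [
        i
        for i in range(len(pvalues))
        if adjusted[i] <= alpha and abs(effects[i]) >= min_effect
    ]
```

For example, `volcano_select([0.01, 0.04, 0.03, 0.005], [2.0, 0.5, -1.5, 0.2])` keeps only the features that pass both filters. The second filter is applied after BH, which is exactly why the error guarantee of BH no longer applies to the reduced set.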
Ebrahimpoor, M.; Spitali, P.; Goeman, J.J.; Tsonaka, R. 2021
We propose a top-down approach for pathway analysis of longitudinal metabolite data. We apply a score test based on a shared latent process mixed model which can identify pathways with differentially progressing metabolites. The strength of our approach is that it can handle unbalanced designs, deals with potential missing values in the longitudinal markers, and gives valid results even with small sample sizes. Contrary to bottom-up approaches, correlations between metabolites are explicitly modeled, leveraging power gains. For large pathway sizes, a computationally efficient solution is proposed based on pseudo-likelihood methodology. We demonstrate the advantages of the proposed method in identification of differentially expressed pathways through simulation studies. Finally, longitudinal metabolite data from a mouse experiment are analyzed to demonstrate our methodology.
Signorelli, M.; Ebrahimpoor, M.; Veth, O.; Hettne, K.; Verwey, N.; Garcia-Rodriguez, R.; ... ; Spitali, P. 2021
DMD is a rare disorder characterized by progressive muscle degeneration and premature death. Therapy development is delayed by the difficulty of monitoring efficacy non-invasively in clinical trials. In this study, we used RNA-sequencing to describe the pathophysiological changes in skeletal muscle of 3 dystrophic mouse models. We show how dystrophic changes in muscle are reflected in blood by analyzing paired muscle and blood samples. Analysis of repeated blood measurements followed the dystrophic signature at five equally spaced time points over a period of seven months. Treatment with two antisense drugs harboring different levels of dystrophin recovery identified genes associated with safety and efficacy. Evaluation of the blood gene expression in a cohort of DMD patients enabled the comparison between preclinical models and patients, and the identification of genes associated with physical performance, treatment with corticosteroids and body measures. The presented results provide evidence that blood RNA-sequencing can serve as a tool to evaluate disease progression in dystrophic mice and patients, as well as to monitor response to (dystrophin-restoring) therapies in preclinical drug development and in clinical trials.
Ebrahimpoor, M.; Spitali, P.; Hettne, K.; Tsonaka, R.; Goeman, J. 2020
Studying sets of genomic features is increasingly popular in genomics, proteomics and metabolomics since analyzing at set level not only creates a natural connection to biological knowledge but also offers more statistical power. Currently, there are two gene-set testing approaches, self-contained and competitive, both of which have their advantages and disadvantages, but neither offers the final solution. We introduce simultaneous enrichment analysis (SEA), a new approach for analysis of feature sets in genomics and other omics based on a new unified null hypothesis, which includes the self-contained and competitive null hypotheses as special cases. We employ closed testing using Simes tests to test this new hypothesis. For every feature set, the proportion of active features is estimated, and a confidence bound is provided. Also, for every unified null hypothesis, a P-value is calculated, which is adjusted for family-wise error rate. SEA does not need to assume that the features are independent. Moreover, users are allowed to choose the feature set(s) of interest after observing the data. We develop a novel pipeline and apply it to RNA-seq data of dystrophin-deficient mdx mice, showcasing the flexibility of the method. Finally, the power properties of the method are evaluated through simulation studies.
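The Simes test mentioned in the abstract above is the local test used inside the closed testing procedure. As a minimal sketch, the Simes P-value for one intersection hypothesis (all features in a set are null) can be computed as below. This shows only the local test, not the full closed testing procedure or the SEA pipeline; the function name `simes_pvalue` is hypothetical.

```python
def simes_pvalue(pvalues):
    """Simes P-value for the intersection of the given hypotheses.

    With sorted p-values p_(1) <= ... <= p_(m), the Simes P-value is
    the minimum over i of m * p_(i) / i.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    m = len(pvalues)
    return min(
        m * p / rank
        for rank, p in enumerate(sorted(pvalues), start=1)
    )
```

Closed testing would apply a local test like this one to every intersection of hypotheses, which is what allows feature sets to be chosen after observing the data.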