The Globaltest is a powerful test for the global null hypothesis that there is no association between a group of features and a response of interest, which is popular in pathway testing in metabolomics. Evaluating multiple feature sets, however, requires multiple testing correction. In this paper, we propose a multiple testing method, based on closed testing, specifically designed for the Globaltest. The proposed method controls the familywise error rate simultaneously over all possible feature sets, and therefore allows post hoc inference, that is, the researcher may choose feature sets of interest after seeing the data without jeopardizing error control. To circumvent the exponential computation time of closed testing, we derive a novel shortcut that allows exact closed testing to be performed on the scale of metabolomics data. An R package ctgt is available on the Comprehensive R Archive Network for the implementation of the shortcut procedure, with applications to several real metabolomics data examples.
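The closed testing principle behind the paper can be illustrated with a brute-force sketch: a feature set is rejected, with familywise error control over all sets, only if every superset is rejected by a local test. Here a Bonferroni local test stands in for the Globaltest, and all names are illustrative; the paper's contribution is precisely a shortcut that avoids this exponential enumeration.

```python
from itertools import combinations

def closed_testing_rejections(pvals, alpha=0.05):
    """Brute-force closed testing (exponential; for tiny m only).

    A feature set S is rejected iff every superset I of S is rejected by
    the local test; Bonferroni here: min_{i in I} p_i <= alpha / |I|.
    """
    m = len(pvals)
    features = set(range(m))

    def local_reject(I):
        return min(pvals[i] for i in I) <= alpha / len(I)

    rejected = []
    for r in range(1, m + 1):
        for S in combinations(sorted(features), r):
            # enumerate all supersets of S by adding subsets of its complement
            complement = features - set(S)
            all_supersets_reject = all(
                local_reject(set(S) | set(extra))
                for k in range(len(complement) + 1)
                for extra in combinations(sorted(complement), k)
            )
            if all_supersets_reject:
                rejected.append(S)
    return rejected
```

With p-values (0.001, 0.2, 0.8), only sets containing the first feature survive every superset test, illustrating the simultaneous guarantee over all feature sets.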
For multiple comparisons in analysis of variance, the practitioners' handbooks generally advocate standard methods such as Bonferroni, or an F-test followed by Tukey's honest significant difference method. These methods are known to be suboptimal compared to closed testing procedures, but improved methods can be complex in the general multigroup set-up. In this note, we argue that the case of three groups is special: with three groups, closed testing procedures are powerful and easy to use. We describe four different closed testing procedures specifically for the three-group set-up. The choice of method should be determined by assessing which of the comparisons are considered primary and which are secondary, as dictated by subject-matter considerations. We describe how all four methods can be used with any standard software.
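One reason the three-group case is simple: the intersection of any two pairwise nulls already implies the global null mu1 = mu2 = mu3, so the closure contains only the three pairwise hypotheses and the global one. The sketch below shows one such procedure (a global test followed by unadjusted pairwise tests); it is an illustration under these assumptions, not necessarily one of the paper's four procedures verbatim.

```python
def three_group_closed(p12, p13, p23, p_global, alpha=0.05):
    """Closed testing for the three pairwise comparisons among three groups.

    Because any two pairwise nulls jointly imply the global null, closed
    testing reduces to: reject H_ij iff the global test (e.g. the ANOVA
    F-test, supplying p_global) rejects AND the unadjusted pairwise test
    rejects.  Returns the set of rejected pairwise hypotheses.
    """
    if p_global > alpha:
        return set()  # global intersection not rejected: nothing can be rejected
    pairwise = {"H12": p12, "H13": p13, "H23": p23}
    return {name for name, p in pairwise.items() if p <= alpha}
```

Note that the pairwise tests need no multiplicity adjustment beyond the single gatekeeping global test, which is why such procedures are easy to run with any standard software.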
We construct confidence regions in high dimensions by inverting the globaltest statistics, and use them to choose the tuning parameter for penalized regression. The selected model corresponds to the point in the confidence region of the parameters that minimizes the penalty, making it the least complex model that still has acceptable fit according to the test that defines the confidence region. As the globaltest is particularly powerful in the presence of many weak predictors, it connects well to ridge regression, and we thus focus on ridge penalties in this paper. The confidence region method is quick to calculate, intuitive, and gives decent predictive potential. As a tuning parameter selection method it may even outperform classical methods such as cross-validation in terms of mean squared error of prediction, especially when the signal is weak. We illustrate the method for linear models in a simulation study and for Cox models in real gene expression data of breast cancer samples.
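The selection rule can be sketched in a toy low-dimensional setting, where an ordinary F-test confidence region stands in for the globaltest-based region (an assumption for illustration; the paper's region is built for high dimensions): among the ridge fits on a penalty path, keep the most penalized one that still lies inside the region.

```python
import numpy as np
from scipy import stats

def select_by_confidence_region(X, y, lambdas, alpha=0.05):
    """Return (lambda, beta) for the largest ridge penalty whose fit still
    lies in an F-test confidence region for beta (toy stand-in for the
    globaltest region; requires n > p).  lambdas must be ascending."""
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    rss_ols = np.sum((y - X @ beta_ols) ** 2)
    fcrit = stats.f.ppf(1 - alpha, p, n - p)
    best = None
    for lam in lambdas:
        beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        rss = np.sum((y - X @ beta) ** 2)
        fstat = ((rss - rss_ols) / p) / (rss_ols / (n - p))
        if fstat <= fcrit:
            # ridge RSS grows with lambda, so acceptance is a prefix of the
            # path; this ends at the largest accepted (least complex) fit
            best = (lam, beta)
    return best
```

The accepted fits form a prefix of the ascending path because the ridge residual sum of squares is nondecreasing in the penalty, so the last accepted fit is the least complex model with acceptable fit.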
We consider the class of all multiple testing methods controlling tail probabilities of the false discovery proportion, either for one random set or simultaneously for many such sets. This class encompasses methods controlling familywise error rate, generalized familywise error rate, false discovery exceedance, joint error rate, simultaneous control of all false discovery proportions, and others, as well as gene set testing in genomics and cluster inference in neuroimaging. We show that all such methods are either equivalent to a closed testing procedure, or are uniformly improved by one. Moreover, we show that a closed testing method is admissible if and only if all its local tests are admissible. This implies that, when designing methods, it is sufficient to restrict attention to closed testing. We demonstrate the practical usefulness of this design principle by obtaining more informative inferences from the method of higher criticism, and by constructing a uniform improvement of a recently proposed method.
Mohamad, D. al; Zwet, E. van; Solari, A.; Goeman, J. 2021
We consider the problem of constructing simultaneous confidence intervals (CIs) for the ranks of n means based on their estimates together with the (known) standard errors of those estimates. We present a generic method based on the partitioning principle in which the parameter space is partitioned into disjoint subsets and then each one of them is tested at level α. The resulting CIs then have a simultaneous coverage of 1 − α. We show that any procedure which produces simultaneous CIs for ranks can be written as a partitioning procedure. We present a first example where we test the partitions using the likelihood ratio (LR) test. Then, in a second example we show that a recently proposed method for simultaneous CIs for ranks using Tukey's honest significant difference test has an equivalent procedure based on the partitioning principle. By embedding these two methods inside our generic partitioning procedure, we obtain improved variants. We illustrate the performance of these methods through simulations and real data analysis on hotel ratings. While the novel method that uses the LR test and its variant produce shorter CIs when the number of means is small, the Tukey-based method and its variant produce shorter CIs when the number of means is high.
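A simple construction in the spirit of the Tukey-based method: given simultaneous CIs for the means themselves, the rank of mean i is bounded by counting intervals lying entirely below or entirely above its own. The sketch assumes a common critical value crit is supplied externally (e.g. from Tukey's honest significant difference); names are illustrative.

```python
def rank_cis(est, se, crit):
    """Simultaneous CIs for ranks (1 = smallest mean) derived from
    simultaneous mean CIs [est_i - crit*se_i, est_i + crit*se_i].

    Rank of mean i is at least 1 + #{j : U_j < L_i} (means provably below)
    and at most n - #{j : L_j > U_i} (means provably above)."""
    n = len(est)
    L = [e - crit * s for e, s in zip(est, se)]
    U = [e + crit * s for e, s in zip(est, se)]
    bounds = []
    for i in range(n):
        lo = 1 + sum(1 for j in range(n) if j != i and U[j] < L[i])
        hi = n - sum(1 for j in range(n) if j != i and L[j] > U[i])
        bounds.append((lo, hi))
    return bounds
```

With well-separated means the rank CIs collapse to single ranks; overlapping mean CIs widen them, reflecting genuine uncertainty about the ordering.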
The most prevalent approach to activation localization in neuroimaging is to identify brain regions as contiguous supra-threshold clusters, check their significance using random field theory, and correct for the multiple clusters being tested. Besides recent criticism of the validity of the random field assumption, a spatial specificity paradox remains: the larger the detected cluster, the less we know about the location of activation within that cluster. This is because cluster inference implies “there exists at least one voxel with an evoked response in the cluster”, and not that “all the voxels in the cluster have an evoked response”. Inference on voxels within selected clusters is considered bad practice, due to the voxel-wise false positive rate inflation associated with this circular inference. Here, we propose a remedy to the spatial specificity paradox. By applying recent results from the multiple testing statistical literature, we are able to quantify the proportion of truly active voxels within selected clusters, an approach we call All-Resolutions Inference (ARI). If this proportion is high, the paradox vanishes. If it is low, we can further “drill down” from the cluster level to sub-regions, and even to individual voxels, in order to pinpoint the origin of the activation. In fact, ARI allows inference on the proportion of activation in all voxel sets, no matter how large or small, however these have been selected, all from the same data. We use two fMRI datasets to demonstrate the non-triviality of the spatial specificity paradox, and its resolution using ARI. We verify that the endless circularity permitted by ARI does not render its estimates overly conservative using both simulation and a data split.
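The "drill down" guarantee rests on a closed-testing lower bound for the number of truly active hypotheses in any selected set, valid however the set was chosen. A brute-force sketch with Bonferroni local tests (ARI itself uses a Simes-based shortcut to scale to whole-brain data; names are illustrative):

```python
from itertools import combinations

def true_discovery_bound(pvals, S, alpha=0.05):
    """Simultaneous lower bound on the number of truly active hypotheses
    in the selected set S, via brute-force closed testing (exponential in
    len(pvals); illustration only).

    The bound is |S| minus the largest overlap of S with any index set K
    whose local (Bonferroni) test fails to reject."""
    m = len(pvals)
    worst_overlap = 0
    for r in range(1, m + 1):
        for K in combinations(range(m), r):
            locally_accepted = min(pvals[i] for i in K) > alpha / len(K)
            if locally_accepted:
                worst_overlap = max(worst_overlap, len(set(K) & set(S)))
    return len(S) - worst_overlap
```

Dividing the bound by len(S) gives a simultaneous lower confidence bound on the proportion of truly active voxels in the cluster, which is exactly the quantity ARI reports.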
We propose three-sided testing, a testing framework for simultaneous testing of inferiority, equivalence and superiority in clinical trials, controlling for multiple testing using the partitioning principle. Like the usual two-sided testing approach, this approach is completely symmetric in the two treatments compared. Still, because the hypotheses of inferiority and superiority are tested with one-sided tests, the proposed approach has more power than the two-sided approach to infer non-inferiority or non-superiority. Applied to the classical point null hypothesis of equivalence, the three-sided testing approach shows that it is sometimes possible to make an inference on the sign of the parameter of interest, even when the null hypothesis itself could not be rejected. Relationships with confidence intervals are explored, and the effectiveness of the three-sided testing approach is demonstrated in a number of recent clinical trials. Copyright (C) 2010 John Wiley & Sons, Ltd.
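The partitioning idea can be sketched with normal z-tests: the parameter space is split into inferiority (theta <= -delta), equivalence (-delta < theta < delta) and superiority (theta >= delta), and since the true value lies in exactly one part, each part is tested at level alpha without further correction. A minimal sketch, assuming a normally distributed estimate with known standard error (names illustrative):

```python
from scipy import stats

def three_sided(est, se, delta, alpha=0.05):
    """Three-sided testing via the partitioning principle, with one-sided
    z-tests of each part of the parameter space.  Returns the set of
    rejected partition hypotheses; e.g. rejecting both 'inferiority' and
    'equivalence' supports a claim of superiority."""
    z = stats.norm.ppf(1 - alpha)
    rejected = set()
    if (est + delta) / se > z:            # evidence against theta <= -delta
        rejected.add("inferiority")
    if (est - delta) / se < -z:           # evidence against theta >= delta
        rejected.add("superiority")
    if (est - delta) / se > z or (est + delta) / se < -z:
        rejected.add("equivalence")       # estimate clearly outside (-delta, delta)
    return rejected
```

Because each partition test is one-sided at full level alpha, rejecting inferiority alone (a non-inferiority claim) needs less evidence than a two-sided rejection, which is the power advantage the abstract describes.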