Aging is a multifaceted and intricate physiological process characterized by a gradual decline in functional capacity, leading to increased susceptibility to diseases and mortality. While chronological age serves as a strong risk factor for age-related health conditions, considerable heterogeneity exists in the aging trajectories of individuals, suggesting that biological age may provide a more nuanced understanding of the aging process. However, the concept of biological age lacks a clear operationalization, leading to the development of various biological age predictors without a solid statistical foundation. This paper addresses these limitations by proposing a comprehensive operationalization of biological age, introducing the “AccelerAge” framework for predicting biological age, and introducing previously underutilized evaluation measures for assessing the performance of biological age predictors. The AccelerAge framework, based on Accelerated Failure Time (AFT) models, directly models the effect of candidate predictors of aging on an individual’s survival time, aligning with the prevalent metaphor of aging as a clock. We compare predictors based on the AccelerAge framework to a predictor based on the GrimAge predictor, which is considered one of the best-performing biological age predictors, using simulated data as well as data from the UK Biobank and the Leiden Longevity Study. Our approach seeks to establish a robust statistical foundation for biological age clocks, enabling a more accurate and interpretable assessment of an individual’s aging status.
We propose a permutation-based method for testing a large collection of hypotheses simultaneously. Our method provides lower bounds for the number of true discoveries in any selected subset of hypotheses. These bounds are simultaneously valid with high confidence. The methodology is particularly useful in functional Magnetic Resonance Imaging cluster analysis, where it provides a confidence statement on the percentage of truly activated voxels within clusters of voxels, avoiding the well-known spatial specificity paradox. We offer a user-friendly tool to estimate the percentage of true discoveries for each cluster while controlling the family-wise error rate for multiple testing and taking into account that the cluster was chosen in a data-driven way. The method adapts to the spatial correlation structure that characterizes functional Magnetic Resonance Imaging data, gaining power over parametric approaches.
The additive hazards model specifies the effect of covariates on the hazard in an additive way, in contrast to the popular Cox model, in which it is multiplicative. As a non-parametric model, additive hazards offer a very flexible way of modeling time-varying covariate effects. It is most commonly estimated by ordinary least squares. In this paper, we consider the case where covariates are bounded, and derive the maximum likelihood estimator under the constraint that the hazard is non-negative for all covariate values in their domain. We show that the maximum likelihood estimator may be obtained by separately maximizing the log-likelihood contribution of each event time point, and we show that the maximization problem is equivalent to fitting a series of Poisson regression models with an identity link under non-negativity constraints. We derive an analytic solution to the maximum likelihood estimator. We contrast the maximum likelihood estimator with the ordinary least-squares estimator in a simulation study and show that the maximum likelihood estimator has smaller mean squared error than the ordinary least-squares estimator. An illustration with data on patients with carcinoma of the oropharynx is provided.
In a seminal paper, Sejdinovic et al. (Ann. Statist. 41 (2013) 2263-2291) showed the equivalence of the Hilbert-Schmidt Independence Criterion (HSIC) and a generalization of distance covariance. In this paper, the two notions of dependence are unified with a third prominent concept for independence testing, the "global test" introduced in (J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 (2006) 477-493). The new viewpoint provides novel insights into all three test traditions, as well as a unified overall view of the way all three tests contrast with classical association tests. As our main result, a regression perspective on HSIC and generalized distance covariance is obtained, allowing such tests to be used with nuisance covariates or for survival data. Several more examples of cross-fertilization of the three traditions are provided, involving theoretical results and novel methodology. To illustrate the difference between classical statistical tests and the unified HSIC/distance covariance/global tests we investigate the case of association between two categorical variables in depth.
Simultaneous inference allows for the exploration of data while deciding on criteria for proclaiming discoveries. It was recently proved that all admissible post hoc inference methods for the true discoveries must employ closed testing. In this paper, we investigate efficient closed testing with local tests of a special form: thresholding a function of sums of test scores for the individual hypotheses. Under this special design, we propose a new statistic that quantifies the cost of multiplicity adjustments, and we develop fast (mostly linear-time) algorithms for post hoc inference. Paired with recent advances in global null tests based on generalized means, our work instantiates a series of simultaneous inference methods that can handle many dependence structures and signal compositions. We provide guidance on the method choices via theoretical investigation of the conservativeness and sensitivity for different local tests, as well as simulations that find analogous behavior for local tests and full closed testing.
Ipenburg, N.A.; Sharouni, M.A. el; Doorn, R. van; Diest, P.J. van; Leerdam, M.E. van; Rhee, J.I. van der; ... ; Netherlands Fdn Detection 2022
We construct confidence regions in high dimensions by inverting the globaltest statistics, and use them to choose the tuning parameter for penalized regression. The selected model corresponds to the point in the confidence region of the parameters that minimizes the penalty, making it the least complex model that still has acceptable fit according to the test that defines the confidence region. As the globaltest is particularly powerful in the presence of many weak predictors, it connects well to ridge regression, and we thus focus on ridge penalties in this paper. The confidence region method is quick to calculate, intuitive, and gives decent predictive potential. As a tuning parameter selection method it may even outperform classical methods such as cross-validation in terms of mean squared error of prediction, especially when the signal is weak. We illustrate the method for linear models in a simulation study and for Cox models in real gene expression data of breast cancer samples.
Mohamad, D. al; Zwet, E. van; Solari, A.; Goeman, J. 2021
We consider the problem of constructing simultaneous confidence intervals (CIs) for the ranks of n means based on their estimates together with the (known) standard errors of those estimates. We present a generic method based on the partitioning principle in which the parameter space is partitioned into disjoint subsets and then each one of them is tested at level alpha. The resulting CIs then have a simultaneous coverage of 1 - alpha. We show that any procedure which produces simultaneous CIs for ranks can be written as a partitioning procedure. We present a first example where we test the partitions using the likelihood ratio (LR) test. Then, in a second example we show that a recently proposed method for simultaneous CIs for ranks using Tukey's honest significant difference test has an equivalent procedure based on the partitioning principle. By embedding these two methods inside our generic partitioning procedure, we obtain improved variants. We illustrate the performance of these methods through simulations and real data analysis on hotel ratings. While the novel method that uses the LR test and its variant produce shorter CIs when the number of means is small, the Tukey-based method and its variant produce shorter CIs when the number of means is high.
Bulk, M.; Harten, T. van; Kenkhuis, B.; Inglese, F.; Hegeman, I.; Duinen, S. van; ... ; Ronen, I. 2021
Systemic lupus erythematosus (SLE) is an auto-immune disease characterized by multi-organ involvement. Although uncommon, central nervous system involvement in SLE, termed neuropsychiatric SLE (NPSLE), is not an exception. Current knowledge on underlying pathogenic mechanisms is incomplete; however, neuroinflammation is thought to play a critical role. Evidence from neurodegenerative diseases and multiple sclerosis suggests that neuroinflammation is correlated with brain iron accumulation, making quantitative susceptibility mapping (QSM) a potential hallmark for neuroinflammation in vivo. This study assessed susceptibility values of the thalamus and basal ganglia in (NP)SLE patients and further investigated the in vivo findings with histological analyses of postmortem brain tissue derived from SLE patients. We used a 3T MRI scanner to acquire single-echo T2*-weighted images of 44 SLE patients and 20 age-matched healthy controls. Of the 44 patients with SLE, all had neuropsychiatric complaints, of whom 29 were classified as non-NPSLE and 15 as NPSLE (seven as inflammatory NPSLE and eight as ischemic NPSLE). Mean susceptibility values of the thalamus, caudate nucleus, putamen, and globus pallidus were calculated. Formalin-fixed paraffin-embedded postmortem brain tissue including the putamen and globus pallidus of three additional SLE patients was obtained and stained for iron, microglia and astrocytes. Susceptibility values of SLE patients and age-matched controls showed that iron levels in the thalamus and basal ganglia were not changed due to the disease. No subgroup of SLE showed higher susceptibility values. No correlation was found with disease activity or damage due to SLE.
Histological examination of the postmortem brain showed no increased iron accumulation. Our results suggest that neuroinflammation in NPSLE does not necessarily go hand in hand with iron accumulation, and that the inflammatory pathomechanism in SLE may differ from the one observed in neurodegenerative diseases and in multiple sclerosis.
Edelmann, D.; Saadati, M.; Putter, H.; Goeman, J. 2020
Standard tests for the Cox model, such as the likelihood ratio test or the Wald test, do not perform well in situations where the number of covariates is substantially higher than the number of observed events. This issue is perpetuated in competing risks settings, where the number of observed occurrences for each event type is usually rather small. Yet, appropriate testing methodology for competing risks survival analysis with few events per variable is missing. In this article, we show how to extend the global test for survival by Goeman et al. to competing risks and multistate models. Conducting detailed simulation studies, we show that both for type I error control and for power, the novel test outperforms the likelihood ratio test and the Wald test based on the cause-specific hazards model in settings where the number of events is small compared to the number of covariates. The benefit of the global tests for competing risks survival analysis and multistate models is further demonstrated in real data examples of cancer patients from the European Society for Blood and Marrow Transplantation.
Ebrahimpoor, M.; Spitali, P.; Hettne, K.; Tsonaka, R.; Goeman, J. 2020
Studying sets of genomic features is increasingly popular in genomics, proteomics and metabolomics since analyzing at set level not only creates a natural connection to biological knowledge but also offers more statistical power. Currently, there are two gene-set testing approaches, self-contained and competitive, both of which have their advantages and disadvantages, but neither offers the final solution. We introduce simultaneous enrichment analysis (SEA), a new approach for analysis of feature sets in genomics and other omics based on a new unified null hypothesis, which includes the self-contained and competitive null hypotheses as special cases. We employ closed testing using Simes tests to test this new hypothesis. For every feature set, the proportion of active features is estimated, and a confidence bound is provided. Also, for every unified null hypothesis, a P-value is calculated, which is adjusted for family-wise error rate. SEA does not need to assume that the features are independent. Moreover, users are allowed to choose the feature set(s) of interest after observing the data. We develop a novel pipeline and apply it to RNA-seq data of dystrophin-deficient mdx mice, showcasing the flexibility of the method. Finally, the power properties of the method are evaluated through simulation studies.
Dizier, B.; Callegaro, A.; Debois, M.; Dreno, B.; Hersey, P.; Gogas, H.J.; ... ; Ulloa-Montoya, F. 2020
Purpose: Immune components of the tumor microenvironment (TME) have been associated with disease outcome. We prospectively evaluated the association of an immune-related gene signature (GS) with clinical outcome in melanoma and non-small cell lung cancer (NSCLC) tumor samples from two phase III studies. Experimental Design: The GS was prospectively validated using an adaptive signature design to optimize it for the sample type and technology used in phase III studies. One-third of the samples were used as "training set"; the remaining two thirds, constituting the "test set," were used for the prospective validation of the GS. Results: In the melanoma training set, the expression level of eight Th1/IFN gamma-related genes in tumor-positive lymph node tissue predicted the duration of disease-free survival (DFS) and overall survival (OS) in the placebo arm. This GS was prospectively and independently validated as prognostic in the test set. Building a multivariate Cox model in the test set placebo patients from clinical covariates and the GS score, an increased number of melanoma-involved lymph nodes and the GS were associated with DFS and OS. This GS was not associated with DFS in NSCLC, although expression of the Th1/IFN gamma-related genes was associated with the presence of lymphocytes in tumor samples in both indications. Conclusions: These findings provide evidence that expression of Th1/IFN gamma genes in the TME, as measured by this GS, is associated with clinical outcome in melanoma. This suggests that, using this GS, patients with stage IIIB/C melanoma can be classified into different risk groups.
Raz, Y.; Akker, E.B. van den; Roest, T.; Riaz, M.; Rest, O. van de; Suchiman, H.E.D.; ... ; Slagboom, P.E. 2020
Skeletal muscles control posture, mobility and strength, and influence whole-body metabolism. Muscles are built of different types of myofibers, each having specific metabolic, molecular, and contractile properties. Fiber classification is, therefore, regarded as key to understanding muscle biology and (patho)physiology. The expression of three myosin heavy chain (MyHC) isoforms, MyHC-1, MyHC-2A, and MyHC-2X, marks myofibers in humans. Typically, myofiber classification is performed by an eye-based histological analysis. This classical approach is insufficient to capture complex fiber classes expressing more than one MyHC isoform. We, therefore, developed a methodological procedure for high-throughput characterization of myofibers on the basis of multiple isoforms. The mean fluorescence intensity of the three most abundant MyHC isoforms was measured per myofiber in muscle biopsies of 56 healthy elderly adults, and myofiber classes were identified using computational biology tools. Unsupervised clustering revealed the existence of six distinct myofiber clusters. A comparison with the visual assessment of myofibers using the same images showed that some of these myofiber clusters could not be detected or were frequently misclassified. The presence of these six clusters was reinforced by RNA expression levels of sarcomeric genes. In addition, one of the clusters, expressing all three MyHC isoforms, correlated with histological measures of muscle health. To conclude, this methodological procedure enables deep characterization of the complex muscle heterogeneity. This study opens opportunities to further investigate myofiber composition in comparative studies.
Background: Huntington's disease (HD) is a devastating brain disorder with no effective treatment or cure available. The scarcity of brain tissue makes it hard to study changes in the brain and impossible to perform longitudinal studies. However, peripheral pathology in HD suggests that it is possible to study the disease using peripheral tissue as a monitoring tool for disease progression and/or efficacy of novel therapies. In this study, we investigated if blood can be used to monitor disease severity and progression in brain. Since previous attempts using only gene expression proved unsuccessful, we compared blood and brain Huntington's disease signatures in a functional context. Methods: Microarray HD gene expression profiles from three brain regions were compared to the transcriptome of HD blood generated by next-generation sequencing. The comparison was performed with a combination of weighted gene co-expression network analysis and literature-based functional analysis (Concept Profile Analysis). Uniquely, our comparison of blood and brain datasets was not based on (the very limited) gene overlap but on the similarity between the gene annotations in four different semantic categories: "biological process", "cellular component", "molecular function" and "disease or syndrome". Results: We identified signatures in HD blood reflecting a broad pathophysiological spectrum, including alterations in the immune response, sphingolipid biosynthetic processes, lipid transport, cell signaling, protein modification, spliceosome, RNA splicing, vesicle transport, cell signaling and synaptic transmission. Part of this spectrum was reminiscent of the brain pathology.
The HD signatures in caudate nucleus and BA4 exhibited the highest similarity with blood, irrespective of the category of semantic annotations used. BA9 exhibited an intermediate similarity, while cerebellum had the least similarity. We present two signatures that were shared between blood and brain: immune response and spinocerebellar ataxias. Conclusions: Our results demonstrate that HD blood exhibits dysregulation that is similar to brain at a functional level, but not necessarily at the level of individual genes. We report two common signatures that can be used to monitor the pathology in brain of HD patients in a non-invasive manner. Our results are an exemplar of how signals in blood data can be used to represent brain disorders. Our methodology can be used to study disease-specific signatures in diseases where heterogeneous tissues are involved in the pathology.
Mina, E.; Roon-Mom, W. van; Hettne, K.; Zwet, E. van; Goeman, J.; Neri, C.; ... ; Roos, M. 2016