Background: fMRI resting state networks (RSNs) are used to characterize brain disorders. They also show extensive heterogeneity across patients. Identifying systematic differences between RSNs in patients, i.e. discovering neurofunctional subtypes, may further increase our understanding of disease heterogeneity. Currently, no methodology is available to estimate neurofunctional subtypes and their associated RSNs simultaneously. New method: We present an unsupervised learning method for fMRI data, called Clusterwise Independent Component Analysis (C-ICA). This enables the clustering of patients into neurofunctional subtypes based on differences in shared ICA-derived RSNs. The parameters are estimated simultaneously, which leads to an improved estimation of subtypes and their associated RSNs. Results: In five simulation studies, the C-ICA model is successfully validated using both artificially and realistically simulated data (N = 30-40). The successful performance of the C-ICA model is also illustrated on an empirical data set consisting of Alzheimer's disease patients and elderly control subjects (N = 250). C-ICA is able to uncover a meaningful clustering that partially matches (balanced accuracy = .72) the diagnostic labels and identifies differences in RSNs between the Alzheimer and control cluster. Comparison with other methods: Both in the simulation study and the empirical application, C-ICA yields better results compared to competing clustering methods (i.e., a two-step clustering procedure based on single-subject ICAs and a Group ICA plus dual regression variant thereof) that do not simultaneously estimate a clustering and associated RSNs.
Indeed, the overall mean adjusted Rand Index, a measure for cluster recovery, equals 0.65 for C-ICA and ranges from 0.27 to 0.46 for competing methods. Conclusions: The successful performance of C-ICA indicates that it is a promising method to extract neurofunctional subtypes from multi-subject resting-state fMRI data. This method can be applied to fMRI scans of patient groups to study (neurofunctional) subtypes, which may eventually further increase understanding of disease heterogeneity.
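The adjusted Rand Index reported above compares an estimated clustering with a reference partition, correcting for agreement expected by chance. As an illustration only, the following is a minimal sketch of the usual pair-counting form of the index in plain Python. The function name and the toy labelings are assumptions made for this example and are not taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index for two labelings of the same items.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items to count pairs")

    # Contingency counts: items per (reference cluster, estimated cluster) pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of item pairs that end up together, per cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement of a perfect match
    if max_index == expected:
        # Degenerate case, e.g. both labelings place all items in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labelings (assumed): the same partition under different label names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

A value of 1 means the two partitions are identical up to relabeling, and values near 0 mean agreement no better than chance, which is the scale on which the 0.65 and 0.27 to 0.46 figures above are read.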
In this project we examine whether homicide ‘clusters together’ with other adverse health outcomes in the Netherlands, focusing on child mortality, suicide, sexual risk behavior, and substance abuse. We expect moderate-to-strong correlations between homicide and the other adverse health phenomena (hypothesis 1). Further, we expect that these correlations will be reduced when social disorganization is controlled for (hypothesis 2). The study used population-level data between the years 1996 and 2019, for each of the 40 local regions of the Netherlands. We applied a multilevel correlation procedure to evaluate correlations between homicide and the other adverse health outcomes. Correlations between homicide and the other adverse health outcomes were modest. That is, we found only limited evidence for clustering between homicide and the other adverse health outcomes. The patterns of clustering that did occur suggested that social disorganization in the region promotes risk-taking behaviors in the population, which ultimately increases rates of homicide, abuse of illegal drugs and births to adolescent parents. Project materials, syntax and supplementary information can be found on the Open Science Framework at https://osf.io/jd5yu/.
Large and complex data sets are increasingly available for research in critical care. To analyze these data, researchers use techniques commonly referred to as statistical learning or machine learning (ML). The latter is known for large successes in the field of diagnostics, for example, by identification of radiological anomalies. In other research areas, such as clustering and prediction studies, there is more discussion regarding the benefit and efficiency of ML techniques compared with statistical learning. In this viewpoint, we aim to explain commonly used statistical learning and ML techniques and provide guidance for responsible use in the case of clustering and prediction questions in critical care. Clustering studies have been increasingly popular in critical care research, aiming to inform how patients can be characterized, classified, or treated differently. An important challenge for clustering studies is to ensure and assess generalizability. This limits the application of findings in these studies toward individual patients. In the case of predictive questions, there is much discussion as to what algorithm should be used to most accurately predict outcome. Aspects that determine usefulness of ML, compared with statistical techniques, include the volume of the data, the dimensionality of the preferred model, and the extent of missing data. There are areas in which modern ML methods may be preferred. However, efforts should be made to implement statistical frameworks (e.g., for dealing with missing data or measurement error, both omnipresent in clinical data) in ML methods. To conclude, there are important opportunities but also pitfalls to consider when performing clustering or predictive studies with ML techniques.
We advocate careful valuation of new data-driven findings. More interaction is needed between the engineer mindset of experts in ML methods, the insight into bias of epidemiologists, and the probabilistic thinking of statisticians to extract as much information and knowledge from data as possible, while avoiding harm.
Objectives: To identify patterns of spatial clustering of leprosy. Design: We performed a baseline survey for a trial on post-exposure prophylaxis for leprosy in Comoros and Madagascar. We screened 64 villages, door-to-door, and recorded results of screening, demographic data and geographic coordinates. To identify clusters, we fitted a purely spatial Poisson model using Kulldorff's spatial scan statistic. We used a regular Poisson model to assess the risk of contracting leprosy at the individual level as a function of distance to the nearest known leprosy patient. Results: We identified 455 leprosy patients; 200 (44.0%) belonged to 2735 households included in a cluster. Thirty-eight percent of leprosy patients versus 10% of the total population live <25 m from another leprosy patient. Risk ratios for being diagnosed with leprosy were 7.3, 2.4, 1.8, 1.4 and 1.7, for those in the same household as, or at 1-<25 m, 25-<50 m, 50-<75 m and 75-<100 m from, a leprosy patient, respectively, compared to those living at >100 m. Conclusions: We documented significant clustering of leprosy beyond household level, although 56% of cases were not part of a cluster. Control measures need to be extended beyond the household, and social networks should be further explored. (c) 2021 The Author(s). Published by Elsevier Ltd on behalf of International Society for Infectious Diseases. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-ncnd/4.0/).
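For readers less familiar with the quantities in the results above, the sketch below shows the arithmetic behind the reported cluster share and how a crude risk ratio is formed. It is illustrative only. The study estimated its risk ratios with a Poisson model, whereas this example uses a plain ratio of two proportions. The helper name and the counts passed to it are hypothetical and are not data from the study.

```python
def risk_ratio(cases_exposed, total_exposed, cases_reference, total_reference):
    """Crude risk ratio: risk in the exposed group over risk in the reference group.

    Hypothetical helper for illustration; the study itself used a Poisson model.
    """
    if total_exposed <= 0 or total_reference <= 0:
        raise ValueError("group sizes must be positive")
    if cases_reference == 0:
        raise ValueError("reference group has no cases; the ratio is undefined")
    risk_exposed = cases_exposed / total_exposed
    risk_reference = cases_reference / total_reference
    return risk_exposed / risk_reference


# Reported in the abstract: 200 of 455 patients belonged to a cluster.
share_in_cluster = 200 / 455
print(f"{share_in_cluster:.1%} in a cluster")          # 44.0% in a cluster
print(f"{1 - share_in_cluster:.0%} not in a cluster")  # 56% not in a cluster

# Hypothetical counts, NOT from the study: 30 cases among 1000 people living
# close to a patient versus 10 cases among 1000 people living farther away.
example_rr = risk_ratio(30, 1000, 10, 1000)
print(f"illustrative risk ratio: {example_rr:.1f}")
```

A risk ratio above 1 for a distance band means that people living at that distance were diagnosed more often than those in the reference group living more than 100 m away.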
Distance association models constitute a useful tool for the analysis and graphical representation of cross-classified data in which distances between points inversely describe the association between two categorical variables. When the number of cells is large and the data counts result in sparse tables, the combination of clustering and representation reduces the number of parameters to be estimated and facilitates interpretation. In this article, a latent block distance-association model is proposed to apply block clustering to the outcomes of two categorical variables while the cluster centers are represented in a low dimensional space in terms of a distance-association model. This model is particularly useful for contingency tables in which both the rows and the columns are characterized as profiles of sets of response variables. The parameters are estimated under a Poisson sampling scheme using a generalized EM algorithm. The performance of the model is tested in a Monte Carlo experiment, and an empirical data set is analyzed to illustrate the model.
Background: Psychiatric disorders are highly heterogeneous, defined based on symptoms with little connection to potential underlying biological mechanisms. A possible approach to dissect biological heterogeneity is to look for biologically meaningful subtypes. A recent study (Drysdale et al., 2017) showed promising results along this line by simultaneously using resting state fMRI and clinical data and identified four distinct subtypes of depression with different clinical profiles and abnormal resting state fMRI connectivity. These subtypes were predictive of treatment response to transcranial magnetic stimulation therapy. Objective: Here, we attempted to replicate the procedure followed in the Drysdale et al. study and their findings in a different clinical population and a more heterogeneous sample of 187 participants with depression and anxiety. We aimed to answer the following questions: 1) Using the same procedure, can we find a statistically significant and reliable relationship between brain connectivity and clinical symptoms? 2) Is the observed relationship similar to the one found in the original study? 3) Can we identify distinct and reliable subtypes? 4) Do they have similar clinical profiles as the subtypes identified in the original study? Methods: We followed the original procedure as closely as possible, including a canonical correlation analysis to find a low dimensional representation of clinically relevant resting state fMRI features, followed by hierarchical clustering to identify subtypes. We extended the original procedure using additional statistical tests, to test the statistical significance of the relationship between resting state fMRI and clinical data, and the existence of distinct subtypes.
Furthermore, we examined the stability of the whole procedure using resampling. Results and conclusion: As in the original study, we found extremely high canonical correlations between functional connectivity and clinical symptoms, and an optimal three-cluster solution. However, neither canonical correlations nor clusters were statistically significant. On the basis of our extensive evaluations of the analysis methodology used and within the limits of comparison of our sample relative to the sample used in Drysdale et al., we argue that the evidence for the existence of the distinct resting state connectivity-based subtypes of depression should be interpreted with caution.
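The finding above, that a very high in-sample correlation can still fail a significance test, can be illustrated with a permutation test. The sketch below is not the authors' procedure and does not compute a canonical correlation. As a deliberately simplified stand-in, it uses the largest absolute Pearson correlation between any one of many features and a single score, which is inflated by selection in a similar way. All names, sample sizes and the random data are assumptions made for this example.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_abs_correlation(features, score):
    """Largest absolute correlation between any feature and the score."""
    return max(abs(pearson(feature, score)) for feature in features)


def permutation_p_value(features, score, n_permutations=200, seed=0):
    """Compare the observed statistic with its distribution under shuffled scores."""
    rng = random.Random(seed)
    observed = max_abs_correlation(features, score)
    shuffled = list(score)
    n_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between features and score
        if max_abs_correlation(features, shuffled) >= observed:
            n_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (n_as_extreme + 1) / (n_permutations + 1)


# Pure-noise data (assumed sizes): 30 subjects, 100 unrelated features.
rng = random.Random(1)
n_subjects, n_features = 30, 100
features = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_features)]
score = [rng.gauss(0, 1) for _ in range(n_subjects)]

observed, p_value = permutation_p_value(features, score)
print(f"max |r| = {observed:.2f}, permutation p = {p_value:.3f}")
```

Because the maximum is taken over many features, the observed statistic can look large even for unrelated data. The permutation distribution applies the same selection step to every shuffle, so the resulting p-value accounts for it.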