Rapidly mutating Y-chromosomal short tandem repeats (RM Y-STRs) were suggested for differentiating patrilineally related men as relevant in forensic genetics, anthropological genetics, and genetic genealogy. Empirical data are available for closely related males, while data on differentiation rates for more distant relatives are scarce. Available RM Y-STR mutation rate estimates are typically based on father-son pair data, while pedigree-based studies, which are more efficient because they require fewer samples, are rare. Here, we present a large-scale pedigree analysis in 9379 pairs of men separated by 1-34 meioses on 30 Y-STRs with increased mutation rates including all known RM Y-STRs (RMplex). For comparison, part of the samples were genotyped at 25 standard Y-STRs mostly with moderate mutation rates (Yfiler Plus). For 43 of the 49 Y-STRs analyzed, pedigree-based mutation rates were similar to previous father-son based estimates, while for six markers significant differences were observed. Male relative differentiation rates from the 30 RMplex Y-STRs were 43%, 84%, 96%, 99%, and 100% for relatives separated by one, four, six, nine, and twelve meioses, respectively, which largely exceeded rates obtained by 25 standard Y-STRs. Machine learning based models for predicting the degree of patrilineal consanguinity yielded accurate and reasonably precise predictions when using RM Y-STRs. Fully matching haplotypes resulted in a 95% confidence interval of 1-6 meioses with RMplex compared to 1-25 with Yfiler Plus. Our comprehensive pedigree study demonstrates the value of RM Y-STRs for differentiating male relatives of various types, in many cases achieving individual identification, thereby overcoming the largest limitation of forensic Y-chromosome analysis.
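The reported differentiation rates can be compared against a back-of-envelope model. The sketch below is not the authors' method: it assumes that every meiosis independently has the same chance of producing at least one mutation across the marker set, and that mutations never cancel each other out. The function name `differentiation_rate` is illustrative, and the only inputs are the rounded percentages quoted in the abstract.

```python
def differentiation_rate(p_one_meiosis: float, meioses: int) -> float:
    """Probability that two men separated by `meioses` differ at one or more markers.

    Assumption (not from the abstract): meioses are independent, each has the
    same probability `p_one_meiosis` of at least one mutation, and back or
    parallel mutations that would restore a match are ignored.
    """
    if not 0.0 <= p_one_meiosis <= 1.0:
        raise ValueError("p_one_meiosis must lie in [0, 1]")
    if meioses < 1:
        raise ValueError("meioses must be at least 1")
    return 1.0 - (1.0 - p_one_meiosis) ** meioses


# Rounded RMplex rates quoted in the abstract, keyed by number of meioses.
observed = {1: 0.43, 4: 0.84, 6: 0.96, 9: 0.99, 12: 1.00}

if __name__ == "__main__":
    for n, obs in observed.items():
        model = differentiation_rate(observed[1], n)
        print(f"{n:>2} meioses: observed {obs:.2f}, independence model {model:.3f}")
```

Under these assumptions the model gives roughly 89% at four meioses against the observed 84%, which is the direction expected if some mutations revert or coincide in the real pedigrees.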
Maas, S.C.E.; Vidaki, A.; Teumer, A.; Costeira, R.; Wilson, R.; Dongen, J. van; ... ; Kayser, M. 2021
Background: Information on long-term alcohol consumption is relevant for medical and public health research, disease therapy, and other areas. Recently, DNA methylation-based inference of alcohol consumption from blood was reported with high accuracy, but these results were based on employing the same dataset for model training and testing, which can lead to accuracy overestimation. Moreover, only subsets of alcohol consumption categories were used, which makes it impossible to extrapolate such models to the general population. By using data from eight population-based European cohorts (N = 4677), we internally and externally validated the previously reported biomarkers and models for epigenetic inference of alcohol consumption from blood and developed new models comprising all data from all categories. Results: By employing data from six European cohorts (N = 2883), we empirically tested the reproducibility of the previously suggested biomarkers and prediction models via ten-fold internal cross-validation. In contrast to previous findings, all seven models based on 144 CpGs yielded lower mean AUCs compared to the models with fewer CpGs. For instance, the 144-CpG heavy versus non-drinkers model gave an AUC of 0.78 +/- 0.06, while the 5-CpG and 23-CpG models both achieved 0.83 +/- 0.05. The transportability of the models was empirically tested via external validation in three independent European cohorts (N = 1794), revealing high AUC variance between datasets within models. For instance, the 144-CpG heavy versus non-drinkers model yielded AUCs ranging from 0.60 to 0.84 between datasets. The newly developed models that considered data from all categories showed low AUCs but gave low AUC variation in the external validation. For instance, the 144-CpG heavy and at-risk versus light and non-drinkers model achieved AUCs of 0.67 +/- 0.02 in the internal cross-validation and 0.61-0.66 in the external validation datasets. Conclusions: The outcomes of our internal and external validation demonstrate that the previously reported prediction models suffer from both overfitting and accuracy overestimation. Our results show that the previously proposed biomarkers are not yet sufficient for accurate and robust inference of alcohol consumption from blood. Overall, our findings imply that DNA methylation prediction biomarkers and models need to be improved considerably before epigenetic inference of alcohol consumption from blood can be considered for practical applications.
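The AUC values and their "+/-" spreads can be made concrete with a short sketch. It computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney formulation) and then summarizes per-fold AUCs as a mean and standard deviation. Two things are assumptions here rather than facts from the abstract: that "+/-" denotes the standard deviation across folds, and every score, label, and fold value in the usage lines, which are invented toy numbers.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    `labels` uses 1 for cases (e.g. heavy drinkers) and 0 for controls.
    Ties between a case and a control count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs):
    """Mean and sample standard deviation of per-fold AUCs."""
    return mean(fold_aucs), stdev(fold_aucs)


if __name__ == "__main__":
    # Invented toy predictions, not data from the study.
    print(auc([0.9, 0.4, 0.7, 0.2], [1, 1, 0, 0]))
    # Invented per-fold AUCs standing in for one cross-validation run.
    print(summarize_folds([0.80, 0.75, 0.85, 0.78, 0.82]))
```

A model that looks strong under this kind of internal summary can still show a wide AUC range across independent cohorts, which is the pattern the external validation describes.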
Altena, E.; Smeding, R.; Gaag, K.J. van der; Larmuseau, M.H.D.; Decorte, R.; Lao, O.; ... ; Knijff, P. de 2020
Previous studies indicated existing, albeit limited, genetic-geographic population substructure in the Dutch population based on genome-wide data and a lack of this for mitochondrial SNP based data. Despite the aforementioned studies, Y-chromosomal SNP data from the Netherlands remain scarce and do not cover the territory of the Netherlands well enough to allow a reliable investigation of genetic-geographic population substructure. Here we provide the first substantial dataset of detailed spatial Y-chromosomal haplogroup information in 2085 males collected across the Netherlands and supplemented with previously published data from northern Belgium. We found Y-chromosomal evidence for genetic-geographic population substructure, and several Y-haplogroups demonstrating significant clinal frequency distributions in different directions. By means of prediction surface maps we could visualize (complex) distribution patterns of individual Y-haplogroups in detail. These results highlight the value of a micro-geographic approach and are of great use for forensic and epidemiological investigations and our understanding of the Dutch population history. Moreover, the previously noted absence of genetic-geographic population substructure in the Netherlands based on mitochondrial DNA in contrast to our Y-chromosome results, hints at different population histories for women and men in the Netherlands.
Telomere length (TL) regulation is an important factor in ageing, reproduction and cancer development. Genetic, hereditary and environmental factors regulating TL are currently widely investigated; however, their relative contribution to TL variability is still understudied. We have used whole genome sequencing data of 250 family trios from the Genome of the Netherlands project to perform computational measurement of TL and a series of regression and genome-wide association analyses to reveal TL inheritance patterns and associated genetic factors. Our results confirm that TL is a largely heritable trait, primarily with mother's, and, to a lesser extent, with father's TL having the strongest influence on the offspring. In this cohort, mother's, but not father's age at conception was positively linked to offspring TL. Age-related TL attrition of 40 bp/year had relatively small influence on TL variability. Finally, we have identified TL-associated variations in ribonucleotide reductase catalytic subunit M1 (RRM1 gene), which is known to regulate telomere maintenance in yeast. We also highlight the importance of a multivariate approach and the limitations of existing tools for the analysis of TL as a polygenic heritable quantitative trait.
Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood with a cumulative Area Under the Curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 +/- 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (average AUC(crossvalidation) 0.925 +/- 0.021, AUC(externalvalidation) 0.914), former (0.766 +/- 0.023, 0.699) and never smoking (0.830 +/- 0.019, 0.781) status, allowed inferring pack-years in current smokers (10 pack-years 0.800 +/- 0.068, 0.796; 15 pack-years 0.767 +/- 0.102, 0.752) and inferring smoking cessation time in former smokers (5 years 0.774 +/- 0.024, 0.760; 10 years 0.766 +/- 0.033, 0.764; 15 years 0.767 +/- 0.020, 0.754). Model application to children revealed highly accurate inference of the true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure have no impact on model applications in adults. 
The finite set of DNA methylation markers allows accurate inference of smoking habit, with accuracy comparable to plasma cotinine use, and smoking history from blood, which we envision becoming useful in epidemiology and public health research, and in medical and forensic applications.
Ralf, A.; Oven, M. van; Gonzalez, D.M.; Knijff, P. de; Beek, K. van der; Wootton, S.; ... ; Kayser, M. 2019
Y-chromosomal haplogroups assigned from male-specific Y-chromosomal single nucleotide polymorphisms (Y-SNPs) allow paternal lineage identification and paternal bio-geographic ancestry inference, both being relevant in forensic genetics. However, most previously developed forensic Y-SNP tools did not provide Y haplogroup resolution at the high level needed in forensic applications, because the limited multiplex capacity of the DNA technologies used only allowed the inclusion of a relatively small number of Y-SNPs. In a proof-of-principle study, we recently demonstrated that high-resolution Y haplogrouping is feasible via two AmpliSeq PCR analyses and simultaneous massively parallel sequencing (MPS) of 530 Y-SNPs allowing the inference of 432 Y-haplogroups. With the current study, we present a largely improved Y-SNP MPS lab tool that we specifically designed for the analysis of the low-quality and low-quantity DNA often encountered in forensic DNA analysis. Improvements include i) Y-SNP marker selection based on the "minimal reference phylogeny for the human Y chromosome" (PhyloTree Y), ii) strong increase of the number of targeted Y-SNPs allowing many more Y haplogroups to be inferred, iii) focus on short amplicon length enabling successful analysis of degraded DNA, and iv) combination of all amplicons in a single AmpliSeq PCR and simultaneous sequencing allowing single DNA aliquot use. This new MPS tool simultaneously analyses 859 Y-SNPs and allows inferring 640 Y haplogroups. Preliminary forensic developmental validation testing revealed that this tool is highly accurate, sensitive and robust. We also provide a revised software tool for analysing the sequencing data produced by the new MPS lab tool including final Y haplogroup assignment. 
We envision the tools introduced here for high-resolution Y-chromosomal haplogrouping to determine a man's paternal lineage and/or paternal bio-geographic ancestry to become widely used in forensic Y-chromosome DNA analysis and other applications where Y haplogroup information from low quality / quantity DNA samples is required.
Gul, A.; Jong, M.A. de; Gijt, J.P. de; Wolvius, E.B.; Kayser, M.; Bohringer, S.; Koudstaal, M.J. 2019
X-inactivation is a well-established dosage compensation mechanism ensuring that X-chromosomal genes are expressed at comparable levels in males and females. Skewed X-inactivation is often explained by negative selection of one of the alleles. We demonstrate that imbalanced expression of the paternal and maternal X-chromosomes is common in the general population and that the random nature of the X-inactivation mechanism can be sufficient to explain the imbalance. To this end, we analyzed blood-derived RNA and whole-genome sequencing data from 79 female children and their parents from the Genome of the Netherlands project. We calculated the median ratio of the paternal over total counts at all X-chromosomal heterozygous single-nucleotide variants with coverage ≥10. We identified two individuals where the same X-chromosome was inactivated in all cells. Imbalanced expression of the two X-chromosomes (ratios ≤0.35 or ≥0.65) was observed in nearly 50% of the population. The empirically observed skewing is explained by a theoretical model where X-inactivation takes place in an embryonic stage in which eight cells give rise to the hematopoietic compartment. Genes escaping X-inactivation are expressed from both alleles and therefore demonstrate less skewing than inactivated genes. Using this characteristic, we identified three novel escapee genes (SSR4, REPS2, and SEPT6), but did not find support for many previously reported escapee genes in blood. Our collective data suggest that skewed X-inactivation is common in the general population. This may contribute to manifestation of symptoms in carriers of recessive X-linked disorders. We recommend that X-inactivation results should not be used lightly in the interpretation of X-linked variants.
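The per-individual skewing measure described above can be sketched in a few lines. This is an illustration, not the authors' pipeline: it takes allele-specific read counts at heterozygous sites, keeps sites with coverage of at least 10, and returns the median of paternal over total counts, then applies the ≤0.35 / ≥0.65 thresholds. Treating "coverage" as the sum of paternal and maternal reads at a site is an assumption, the function names are hypothetical, and the counts in the usage lines are invented.

```python
from statistics import median


def paternal_ratio(sites, min_coverage=10):
    """Median of paternal/(paternal + maternal) reads over covered sites.

    `sites` is an iterable of (paternal_reads, maternal_reads) pairs, one per
    heterozygous X-chromosomal variant. Coverage is assumed here to mean the
    sum of the two allele counts.
    """
    ratios = [
        pat / (pat + mat)
        for pat, mat in sites
        if pat + mat >= min_coverage
    ]
    if not ratios:
        raise ValueError("no site reaches the minimum coverage")
    return median(ratios)


def is_imbalanced(ratio, low=0.35, high=0.65):
    """Apply the imbalance thresholds quoted in the abstract."""
    return ratio <= low or ratio >= high


if __name__ == "__main__":
    # Invented read counts; the second site is dropped for low coverage.
    toy_sites = [(8, 12), (3, 2), (15, 15), (4, 16)]
    ratio = paternal_ratio(toy_sites)
    print(ratio, is_imbalanced(ratio))
```

Using the median rather than the mean keeps a few sites in escapee genes, which are expressed from both alleles, from pulling the individual's ratio toward 0.5.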