Multi-ancestry genome-wide association analyses identify 124 risk loci for rheumatoid arthritis, of which 34 are novel. A polygenic risk score based on multi-ancestry data showed comparable performance between populations of European and East Asian ancestries. Rheumatoid arthritis (RA) is a highly heritable complex disease with unknown etiology. Multi-ancestry genetic research of RA promises to improve power to detect genetic signals, fine-mapping resolution and performance of polygenic risk scores (PRS). Here, we present a large-scale genome-wide association study (GWAS) of RA, which includes 276,020 samples from five ancestral groups. We conducted a multi-ancestry meta-analysis and identified 124 loci (P < 5 × 10^-8), of which 34 are novel. Candidate genes at the novel loci suggest essential roles of the immune system (for example, TNIP2 and TNFRSF11A) and joint tissues (for example, WISP1) in RA etiology. Multi-ancestry fine-mapping identified putatively causal variants with biological insights (for example, LEF1). Moreover, PRS based on multi-ancestry GWAS outperformed PRS based on single-ancestry GWAS and had comparable performance between populations of European and East Asian ancestries. Our study provides several insights into the etiology of RA and improves the genetic predictability of RA.
Objective To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. Results We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of >= 0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. Discussion Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data.
Conclusion We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
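The replication criterion reported above (correlation of at least 0.75 between a cluster's phenotypic profile and profiles found in other cohorts) can be illustrated with a minimal sketch. The abstract does not say which correlation measure was used, so Pearson correlation is an assumption here, and the function names and the shape of the input (one numeric profile per cohort, all of equal length) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_replicating_cohorts(reference_profile, cohort_profiles, threshold=0.75):
    """Count cohorts whose profile correlates with the reference at or above the threshold.

    reference_profile: per-code values (e.g. code frequencies) for one cluster.
    cohort_profiles: hypothetical mapping of cohort name -> best-matching profile.
    """
    return sum(
        1
        for profile in cohort_profiles.values()
        if pearson(reference_profile, profile) >= threshold
    )
```

Under these assumptions, a cluster would count as replicated in a cohort whenever its best-matching profile there reaches the 0.75 threshold.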
Khan, A.; Shang, N.; Petukhova, L.; Zhang, J.; Shen, Y.F.; Hebbring, S.J.; ... ; Kiryluk, K. 2021
Background Genetic variants in complement genes have been associated with a wide range of human disease states, but well-powered genetic association studies of complement activation have not been performed in large multiethnic cohorts. Methods We performed medical records-based genome-wide and phenome-wide association studies for plasma C3 and C4 levels among participants of the Electronic Medical Records and Genomics (eMERGE) network. Results In a GWAS for C3 levels in 3949 individuals, we detected two genome-wide significant loci: chr.1q31.3 (CFH locus; rs3753396-A; β=0.20; 95% CI, 0.14 to 0.25; P=1.52 × 10^-11) and chr.19p13.3 (C3 locus; rs11569470-G; β=0.19; 95% CI, 0.13 to 0.24; P=1.29 × 10^-8). These two loci explained approximately 2% of variance in C3 levels. GWAS for C4 levels involved 3998 individuals and revealed a genome-wide significant locus at chr.6p21.32 (C4 locus; rs3135353-C; β=0.40; 95% CI, 0.34 to 0.45; P=4.58 × 10^-35). This locus explained approximately 13% of variance in C4 levels. The multiallelic copy number variant analysis defined two structural genomic C4 variants with large effect on blood C4 levels: C4-BS (β=-0.36; 95% CI, -0.42 to -0.30; P=2.98 × 10^-22) and C4-AL-BS (β=0.25; 95% CI, 0.21 to 0.29; P=8.11 × 10^-23). Overall, C4 levels were strongly correlated with copy numbers of C4A and C4B genes. In comprehensive phenome-wide association studies involving 102,138 eMERGE participants, we cataloged a full spectrum of autoimmune, cardiometabolic, and kidney diseases genetically related to systemic complement activation. Conclusions We discovered genetic determinants of plasma C3 and C4 levels using eMERGE genomic data linked to electronic medical records.
Genetic variants regulating C3 and C4 levels have large effects and multiple clinical correlations across the spectrum of complement-related diseases in humans. Significance Statement The complement pathway represents one of the critical arms of the innate immune system. We combined genome-wide and phenome-wide association studies using medical records data for C3 and C4 levels to discover common genetic variants controlling systemic complement activation. Three genome-wide significant loci had large effects on complement levels. These loci encode three critical complement genes: CFH, C3, and C4. We performed detailed functional annotations of the significant loci, including multiallelic copy number variant analysis of the C4 locus to define two structural genomic variants with large effects on C4 levels. Blood C4 levels were strongly correlated with the copy number of C4A and C4B genes. Lastly, using genome-wide genetic correlations and electronic health records-based phenome-wide association studies in 102,138 participants, we catalogued a spectrum of human diseases genetically related to systemic complement activation, including inflammatory, autoimmune, cardiometabolic, and kidney diseases.
Objective To identify interactions between genetic factors and current or recent smoking in relation to risk of developing systemic lupus erythematosus (SLE). Methods For the study, 673 patients with SLE (diagnosed according to the American College of Rheumatology 1997 updated classification criteria) were matched by age, sex, and race (first 3 genetic principal components) to 3,272 control subjects without a history of connective tissue disease. Smoking status was classified as current smoking/having recently quit smoking within 4 years before diagnosis (or matched index date for controls) versus distant past/never smoking. In total, 86 single-nucleotide polymorphisms and 10 classic HLA alleles previously associated with SLE were included in a weighted genetic risk score (wGRS), with scores dichotomized as either low or high based on the median value in control subjects (low wGRS being defined as less than or equal to the control median; high wGRS being defined as greater than the control median). Conditional logistic regression models were used to estimate both the risk of SLE and risk of anti-double-stranded DNA autoantibody-positive (dsDNA+) SLE. Additive interactions were assessed using the attributable proportion (AP) due to interaction, and multiplicative interactions were assessed using a chi-square test (with 1 degree of freedom) for the wGRS and for individual risk alleles. Separate repeated analyses were carried out among subjects of European ancestry only. Results The mean +/- SD age of the SLE patients at the time of diagnosis was 36.4 +/- 15.3 years. Among the 673 SLE patients included, 92.3% were female and 59.3% were dsDNA+.
Ethnic distributions were as follows: 75.6% of European ancestry, 4.5% of Asian ancestry, 11.7% of African ancestry, and 8.2% classified as other ancestry. A high wGRS (odds ratio [OR] 2.0, P = 1.0 × 10^-51 versus low wGRS) and a status of current/recent smoking (OR 1.5, P = 0.0003 versus distant past/never smoking) were strongly associated with SLE risk, with significant additive interaction (AP 0.33, P = 0.0012), and associations with the risk of anti-dsDNA+ SLE were even stronger. No significant multiplicative interactions with the total wGRS (P = 0.58) or with the HLA-only wGRS (P = 0.06) were found. Findings were similar in analyses restricted to only subjects of European ancestry. Conclusion The strong additive interaction between an updated SLE genetic risk score and current/recent smoking suggests that smoking may influence specific genes in the pathogenesis of SLE.
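The scoring and interaction measures named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the weights are assumed to be per-variant log odds ratios (the usual convention for a weighted genetic risk score, but not stated in the abstract), the AP uses the standard definition AP = (OR11 - OR10 - OR01 + 1) / OR11, and all function names are hypothetical.

```python
from statistics import median


def weighted_grs(risk_allele_counts, log_odds_weights):
    """Weighted genetic risk score: risk-allele counts weighted per variant.

    Assumption: weights are log odds ratios from prior association studies.
    """
    return sum(count * weight for count, weight in zip(risk_allele_counts, log_odds_weights))


def dichotomize_by_control_median(scores, control_scores):
    """Label each score 'low' (<= control median) or 'high' (> control median)."""
    cutoff = median(control_scores)
    return ["high" if score > cutoff else "low" for score in scores]


def attributable_proportion(or_both, or_genetic_only, or_smoking_only):
    """Attributable proportion due to additive interaction.

    or_both: odds ratio for high wGRS plus current/recent smoking,
    relative to the doubly unexposed reference group.
    or_genetic_only / or_smoking_only: odds ratios for each exposure alone.
    """
    excess_risk_due_to_interaction = or_both - or_genetic_only - or_smoking_only + 1.0
    return excess_risk_due_to_interaction / or_both
```

An AP of 0 indicates no additive interaction; a positive AP is read as the share of risk in the doubly exposed group that is attributable to the interaction itself.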
Knevel, R.; Cessie, S. le; Terao, C.C.; Slowikowski, K.; Cui, J.; Huizinga, T.W.J.; ... ; Raychaudhuri, S. 2020
It is challenging to quickly diagnose slowly progressing diseases. To prioritize multiple related diagnoses, we developed G-PROB (Genetic Probability tool) to calculate the probability of different diseases for a patient using genetic risk scores. We tested G-PROB for inflammatory arthritis-causing diseases (rheumatoid arthritis, systemic lupus erythematosus, spondyloarthropathy, psoriatic arthritis, and gout). After validating on simulated data, we tested G-PROB in three cohorts: 1211 patients identified by International Classification of Diseases (ICD) codes within the eMERGE database, 245 patients identified through ICD codes and medical record review within the Partners Biobank, and 243 patients first presenting with unexplained inflammatory arthritis and with final diagnoses by record review within the Partners Biobank. Calibration of G-probabilities with disease status was high, with regression coefficients from 0.90 to 1.08 (1.00 is ideal). G-probabilities discriminated true diagnoses across the three cohorts with pooled areas under the curve (95% CI) of 0.69 (0.67 to 0.71), 0.81 (0.76 to 0.84), and 0.84 (0.81 to 0.86), respectively. For all patients, at least one disease could be ruled out, and in 45% of patients, a likely diagnosis was identified with a 64% positive predictive value. In 35% of cases, the clinician's initial diagnosis was incorrect. Initial clinical diagnosis explained 39% of the variance in final disease, which improved to 51% (P < 0.0001) after adding G-probabilities. Converting genotype information before a clinical visit into an interpretable probability value for five different inflammatory arthritides could potentially be used to improve the diagnostic efficiency of rheumatic diseases in clinical practice.
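The abstract does not describe how G-PROB turns genetic risk scores into probabilities, so the following is only a sketch of one generic way to do it, not the published method. It assumes that each patient has exactly one of the candidate diseases, that each score is a log odds value, and that a prior weight per disease is available; the function name and inputs are hypothetical.

```python
import math


def normalized_disease_probabilities(log_odds_scores, prior_weights):
    """Convert per-disease log-odds scores into probabilities that sum to one.

    log_odds_scores: hypothetical mapping of disease -> genetic risk score (log odds scale).
    prior_weights: hypothetical mapping of disease -> prior weight (e.g. relative prevalence).
    Assumption: the patient has exactly one of the listed diseases.
    """
    unnormalized = {
        disease: prior_weights[disease] * math.exp(score)
        for disease, score in log_odds_scores.items()
    }
    total = sum(unnormalized.values())
    return {disease: value / total for disease, value in unnormalized.items()}
```

With such normalized values, a disease with a very low probability could be deprioritized and one with a high probability flagged as a likely diagnosis, which matches the kind of use the abstract describes.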
Objective. Anti-tumor necrosis factor alpha (anti-TNF) therapy is a mainstay of treatment in rheumatoid arthritis (RA). The aim of the present study was to test established RA genetic risk factors to determine whether the same alleles also influence the response to anti-TNF therapy. Methods. A total of 1,283 RA patients receiving etanercept, infliximab, or adalimumab therapy were studied from among an international collaborative consortium of 9 different RA cohorts. The primary end point compared RA patients with a good treatment response according to the European League Against Rheumatism (EULAR) response criteria (n = 505) with RA patients considered to be nonresponders (n = 316). The secondary end point was the change from baseline in the level of disease activity according to the Disease Activity Score in 28 joints (Delta DAS28). Clinical factors such as age, sex, and concomitant medications were tested as possible correlates of treatment response. Thirty-one single-nucleotide polymorphisms (SNPs) associated with the risk of RA were genotyped and tested for any association with treatment response, using univariate and multivariate logistic regression models. Results. Of the 31 RA-associated risk alleles, a SNP at the PTPRC (also known as CD45) gene locus (rs10919563) was associated with the primary end point, a EULAR good response versus no response (odds ratio [OR] 0.55, P = 0.0001 in the multivariate model). Similar results were obtained using the secondary end point, the Delta DAS28 (P = 0.0002). There was suggestive evidence of a stronger association in autoantibody-positive patients with RA (OR 0.55, 95% confidence interval [95% CI] 0.39-0.76) as compared with autoantibody-negative patients (OR 0.90, 95% CI 0.41-1.99). Conclusion.
Statistically significant associations were observed between the response to anti-TNF therapy and an RA risk allele at the PTPRC gene locus. Additional studies will be required to replicate this finding in additional patient collections.
To identify new genetic risk factors for rheumatoid arthritis, we conducted a genome-wide association study meta-analysis of 5,539 autoantibody-positive individuals with rheumatoid arthritis (cases) and 20,169 controls of European descent, followed by replication in an independent set of 6,768 rheumatoid arthritis cases and 8,806 controls. Of 34 SNPs selected for replication, 7 new rheumatoid arthritis risk alleles were identified at genome-wide significance (P < 5 × 10^-8) in an analysis of all 41,282 samples. The associated SNPs are near genes of known immune function, including IL6ST, SPRED2, RBPJ, CCR6, IRF5 and PXK. We also refined associations at two established rheumatoid arthritis risk loci (IL2RA and CCL21) and confirmed the association at AFF3. These new associations bring the total number of confirmed rheumatoid arthritis risk loci to 31 among individuals of European ancestry. An additional 11 SNPs replicated at P < 0.05, many of which are validated autoimmune risk alleles, suggesting that most represent genuine rheumatoid arthritis risk alleles.