Objective. To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods. We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. Results. We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥ 0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. Discussion. Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. 
Conclusion. We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
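As a rough illustration of the replication check mentioned in the abstract above (correlations of at least 0.75 between cohorts), the following minimal sketch counts how many other cohorts reproduce a cluster's phenotypic profile. It assumes each profile is a plain vector of per-code frequencies and uses Pearson correlation; the function names, the cohort labels, and the numbers are hypothetical and are not taken from the publication.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def count_replicating_cohorts(profile, cohort_profiles, threshold=0.75):
    """Count cohorts whose profile correlates with `profile` at or above `threshold`."""
    return sum(
        1 for other in cohort_profiles.values()
        if pearson(profile, other) >= threshold
    )

# Hypothetical per-code frequencies for one cluster and two other cohorts.
reference = [0.9, 0.1, 0.5, 0.0]
others = {
    "cohort_A": [1.8, 0.2, 1.0, 0.0],  # same shape, different scale
    "cohort_B": [0.0, 0.5, 0.1, 0.9],  # roughly inverted pattern
}
n_replicated = count_replicating_cohorts(reference, others)
print(n_replicated)
```

The threshold is kept as a parameter so that the same check could be rerun with a stricter or looser cut-off.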
Knevel, R.; Cessie, S. le; Terao, C.C.; Slowikowski, K.; Cui, J.; Huizinga, T.W.J.; ... ; Raychaudhuri, S. 2020
It is challenging to quickly diagnose slowly progressing diseases. To prioritize multiple related diagnoses, we developed G-PROB (Genetic Probability tool) to calculate the probability of different diseases for a patient using genetic risk scores. We tested G-PROB for inflammatory arthritis-causing diseases (rheumatoid arthritis, systemic lupus erythematosus, spondyloarthropathy, psoriatic arthritis, and gout). After validating on simulated data, we tested G-PROB in three cohorts: 1211 patients identified by International Classification of Diseases (ICD) codes within the eMERGE database, 245 patients identified through ICD codes and medical record review within the Partners Biobank, and 243 patients first presenting with unexplained inflammatory arthritis and with final diagnoses by record review within the Partners Biobank. Calibration of G-probabilities with disease status was high, with regression coefficients from 0.90 to 1.08 (1.00 is ideal). G-probabilities discriminated true diagnoses across the three cohorts with pooled areas under the curve (95% CI) of 0.69 (0.67 to 0.71), 0.81 (0.76 to 0.84), and 0.84 (0.81 to 0.86), respectively. For all patients, at least one disease could be ruled out, and in 45% of patients, a likely diagnosis was identified with a 64% positive predictive value. In 35% of cases, the clinician's initial diagnosis was incorrect. Initial clinical diagnosis explained 39% of the variance in final disease, which improved to 51% (P < 0.0001) after adding G-probabilities. Converting genotype information before a clinical visit into an interpretable probability value for five different inflammatory arthritides could potentially be used to improve the diagnostic efficiency of rheumatic diseases in clinical practice.
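The abstract above does not spell out how genetic risk scores become per-disease probabilities. One plausible reading, sketched here purely as an assumption, is to combine each disease's genetic log-odds with a prior, convert to a probability, and then normalize across the candidate diseases on the premise that the patient has exactly one of them. The function name, the priors, and the log-odds values are hypothetical and do not come from the publication.

```python
import math

def g_probabilities(log_odds_by_disease, prior_by_disease):
    """Turn per-disease genetic log-odds into probabilities that sum to 1.

    Assumption: the patient has exactly one of the listed diseases, so the
    per-disease probabilities are rescaled to sum to 1.
    """
    unnormalized = {}
    for disease, log_odds in log_odds_by_disease.items():
        prior = prior_by_disease[disease]
        prior_odds = prior / (1.0 - prior)
        posterior_odds = prior_odds * math.exp(log_odds)
        unnormalized[disease] = posterior_odds / (1.0 + posterior_odds)
    total = sum(unnormalized.values())
    return {disease: p / total for disease, p in unnormalized.items()}

# Hypothetical inputs: genetic log-odds relative to the population average
# and equal priors for two of the five diseases named in the abstract.
probs = g_probabilities(
    {"rheumatoid arthritis": 1.0, "gout": 0.0},
    {"rheumatoid arthritis": 0.2, "gout": 0.2},
)
print(probs)
```

Normalizing across diseases is what would make the output usable for ranking competing diagnoses, although it is only meaningful if the list of candidate diseases is complete for the patient in question.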
Epidemiology and candidate gene studies indicate a shared genetic basis for celiac disease (CD) and rheumatoid arthritis (RA), but the extent of this sharing has not been systematically explored. Previous studies demonstrate that 6 of the established non-HLA CD and RA risk loci (out of 26 loci for each disease) are shared between both diseases. We hypothesized that there are additional shared risk alleles and that combining genome-wide association study (GWAS) data from each disease would increase power to identify these shared risk alleles. We performed a meta-analysis of two published GWAS on CD (4,533 cases and 10,750 controls) and RA (5,539 cases and 17,231 controls). After genotyping the top associated SNPs in 2,169 CD cases and 2,255 controls, and 2,845 RA cases and 4,944 controls, 8 additional SNPs demonstrated P < 5 × 10⁻⁸ in a combined analysis of all 50,266 samples, including four SNPs that have not been previously confirmed in either disease: rs10892279 near the DDX6 gene (P-combined = 1.2 × 10⁻¹²), rs864537 near CD247 (P-combined = 2.2 × 10⁻¹¹), rs2298428 near UBE2L3 (P-combined = 2.5 × 10⁻¹⁰), and rs11203203 near UBASH3A (P-combined = 1.1 × 10⁻⁸). We also confirmed that 4 gene loci previously established in either CD or RA are associated with the other autoimmune disease at combined P < 5 × 10⁻⁸ (SH2B3, 8q24, STAT4, and TRAF1-C5). From the 14 shared gene loci, 7 SNPs showed a genome-wide significant effect on expression of one or more transcripts in the linkage disequilibrium (LD) block around the SNP. 
These associations implicate antigen presentation and T-cell activation as a shared mechanism of disease pathogenesis and underscore the utility of cross-disease meta-analysis for identification of genetic risk factors with pleiotropic effects between two clinically distinct diseases.
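To show why pooling GWAS data can raise power, as the abstract above argues, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis of one SNP. This is a standard textbook approach and is assumed here only for illustration; the abstract does not state which meta-analysis method was used, and the effect sizes and standard errors below are hypothetical.

```python
import math

def fixed_effect_meta(estimates):
    """Combine per-study (beta, standard_error) pairs by inverse-variance weighting.

    Returns the pooled beta, pooled standard error, z-score, and a two-sided
    p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, z, p_value

# Hypothetical log odds ratios and standard errors for one SNP in two studies.
single = fixed_effect_meta([(0.2, 0.1)])
pooled = fixed_effect_meta([(0.2, 0.1), (0.2, 0.1)])
print(single[3], pooled[3])
```

With identical effects in both studies the pooled estimate is unchanged, but its standard error shrinks, so the p-value falls; a SNP that misses a threshold such as P < 5 × 10⁻⁸ in either study alone may pass it in the combined analysis.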
Cui, J.; Saevarsdottir, S.; Thomson, B.; Padyukov, L.; Helm-van Mil, A.H.M. van der; Nititham, J.; ... ; Ge 2010
Objective. Anti-tumor necrosis factor alpha (anti-TNF) therapy is a mainstay of treatment in rheumatoid arthritis (RA). The aim of the present study was to test established RA genetic risk factors to determine whether the same alleles also influence the response to anti-TNF therapy. Methods. A total of 1,283 RA patients receiving etanercept, infliximab, or adalimumab therapy were studied from among an international collaborative consortium of 9 different RA cohorts. The primary end point compared RA patients with a good treatment response according to the European League Against Rheumatism (EULAR) response criteria (n = 505) with RA patients considered to be nonresponders (n = 316). The secondary end point was the change from baseline in the level of disease activity according to the Disease Activity Score in 28 joints (ΔDAS28). Clinical factors such as age, sex, and concomitant medications were tested as possible correlates of treatment response. Thirty-one single-nucleotide polymorphisms (SNPs) associated with the risk of RA were genotyped and tested for any association with treatment response, using univariate and multivariate logistic regression models. Results. Of the 31 RA-associated risk alleles, a SNP at the PTPRC (also known as CD45) gene locus (rs10919563) was associated with the primary end point, a EULAR good response versus no response (odds ratio [OR] 0.55, P = 0.0001 in the multivariate model). Similar results were obtained using the secondary end point, the ΔDAS28 (P = 0.0002). There was suggestive evidence of a stronger association in autoantibody-positive patients with RA (OR 0.55, 95% confidence interval [95% CI] 0.39-0.76) as compared with autoantibody-negative patients (OR 0.90, 95% CI 0.41-1.99). Conclusion. 
Statistically significant associations were observed between the response to anti-TNF therapy and an RA risk allele at the PTPRC gene locus. Additional studies will be required to replicate this finding in additional patient collections.
To identify new genetic risk factors for rheumatoid arthritis, we conducted a genome-wide association study meta-analysis of 5,539 autoantibody-positive individuals with rheumatoid arthritis (cases) and 20,169 controls of European descent, followed by replication in an independent set of 6,768 rheumatoid arthritis cases and 8,806 controls. Of 34 SNPs selected for replication, 7 new rheumatoid arthritis risk alleles were identified at genome-wide significance (P < 5 × 10⁻⁸) in an analysis of all 41,282 samples. The associated SNPs are near genes of known immune function, including IL6ST, SPRED2, RBPJ, CCR6, IRF5 and PXK. We also refined associations at two established rheumatoid arthritis risk loci (IL2RA and CCL21) and confirmed the association at AFF3. These new associations bring the total number of confirmed rheumatoid arthritis risk loci to 31 among individuals of European ancestry. An additional 11 SNPs replicated at P < 0.05, many of which are validated autoimmune risk alleles, suggesting that most represent genuine rheumatoid arthritis risk alleles.