Background Contemporary pulmonary embolism (PE) research, in many cases, relies on data from electronic health records (EHRs) and administrative databases that use International Classification of Diseases (ICD) codes. Natural language processing (NLP) tools can be used for automated chart review and patient identification. However, there remains uncertainty about the validity of ICD-10 codes or NLP algorithms for patient identification. Methods The PE-EHR+ study has been designed to validate ICD-10 codes as Principal Discharge Diagnosis or Secondary Discharge Diagnoses, as well as NLP tools set out in prior studies, to identify patients with PE within EHRs. Manual chart review by two independent abstractors using predefined criteria will be the reference standard. Sensitivity, specificity, and positive and negative predictive values will be determined. We will assess the discriminatory function of code subgroups for intermediate- and high-risk PE. In addition, accuracy of NLP algorithms to identify PE from radiology reports will be assessed. Results A total of 1,734 patients from the Mass General Brigham health system have been identified. These include 578 with ICD-10 Principal Discharge Diagnosis codes for PE, 578 with codes in the secondary position, and 578 without PE codes during the index hospitalization. Patients within each group were selected randomly from the entire pool of patients at the Mass General Brigham health system. A smaller subset of patients will also be identified from the Yale-New Haven Health System. Data validation and analyses will be forthcoming. Conclusions The PE-EHR+ study will help validate efficient tools for identification of patients with PE in EHRs, improving the reliability of efficient observational studies or randomized trials of patients with PE using electronic databases.
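The four accuracy measures named in the Methods above follow directly from a 2×2 table that compares a classifier (an ICD-10 code or an NLP flag) with the chart-review reference standard. The following is a minimal sketch of that arithmetic only, not the study's analysis code; the names `ConfusionCounts` and `accuracy_measures` and all counts are hypothetical illustrations, not study data.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of a classifier against the reference standard (hypothetical)."""
    tp: int  # flagged as PE, PE confirmed on chart review
    fp: int  # flagged as PE, PE not confirmed
    fn: int  # not flagged, PE confirmed
    tn: int  # not flagged, PE not confirmed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a margin of the table is empty.
    return numerator / denominator if denominator else nan


def accuracy_measures(c: ConfusionCounts) -> dict:
    """Sensitivity, specificity, PPV and NPV from the 2x2 counts."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    example = ConfusionCounts(tp=90, fp=10, fn=30, tn=170)
    for name, value in accuracy_measures(example).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and specificity condition on the reference-standard result, whereas the predictive values condition on the classifier result, so the same counts feed all four ratios with different denominators.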
The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.
Objective To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. Results We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of >= 0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. Discussion Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. Conclusion We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
Khan, A.; Shang, N.; Petukhova, L.; Zhang, J.; Shen, Y.F.; Hebbring, S.J.; ... ; Kiryluk, K. 2021
Background Genetic variants in complement genes have been associated with a wide range of human disease states, but well-powered genetic association studies of complement activation have not been performed in large multiethnic cohorts. Methods We performed medical records–based genome-wide and phenome-wide association studies for plasma C3 and C4 levels among participants of the Electronic Medical Records and Genomics (eMERGE) network. Results In a GWAS for C3 levels in 3949 individuals, we detected two genome-wide significant loci: chr.1q31.3 (CFH locus; rs3753396-A; β=0.20; 95% CI, 0.14 to 0.25; P=1.52×10⁻¹¹) and chr.19p13.3 (C3 locus; rs11569470-G; β=0.19; 95% CI, 0.13 to 0.24; P=1.29×10⁻⁸). These two loci explained approximately 2% of variance in C3 levels. GWAS for C4 levels involved 3998 individuals and revealed a genome-wide significant locus at chr.6p21.32 (C4 locus; rs3135353-C; β=0.40; 95% CI, 0.34 to 0.45; P=4.58×10⁻³⁵). This locus explained approximately 13% of variance in C4 levels. The multiallelic copy number variant analysis defined two structural genomic C4 variants with large effect on blood C4 levels: C4-BS (β=−0.36; 95% CI, −0.42 to −0.30; P=2.98×10⁻²²) and C4-AL-BS (β=0.25; 95% CI, 0.21 to 0.29; P=8.11×10⁻²³). Overall, C4 levels were strongly correlated with copy numbers of C4A and C4B genes. In comprehensive phenome-wide association studies involving 102,138 eMERGE participants, we cataloged a full spectrum of autoimmune, cardiometabolic, and kidney diseases genetically related to systemic complement activation. Conclusions We discovered genetic determinants of plasma C3 and C4 levels using eMERGE genomic data linked to electronic medical records. Genetic variants regulating C3 and C4 levels have large effects and multiple clinical correlations across the spectrum of complement-related diseases in humans. Significance Statement The complement pathway represents one of the critical arms of the innate immune system. We combined genome-wide and phenome-wide association studies using medical records data for C3 and C4 levels to discover common genetic variants controlling systemic complement activation. Three genome-wide significant loci had large effects on complement levels. These loci encode three critical complement genes: CFH, C3, and C4. We performed detailed functional annotations of the significant loci, including multiallelic copy number variant analysis of the C4 locus to define two structural genomic variants with large effects on C4 levels. Blood C4 levels were strongly correlated with the copy number of C4A and C4B genes. Lastly, using genome-wide genetic correlations and electronic health records–based phenome-wide association studies in 102,138 participants, we catalogued a spectrum of human diseases genetically related to systemic complement activation, including inflammatory, autoimmune, cardiometabolic, and kidney diseases.
Car, L.T.; Kyaw, B.M.; Panday, R.S.N.; Kleij, R. van der; Chavannes, N.; Majeed, A.; Car, J. 2021
Background: Medical schools worldwide are accelerating the introduction of digital health courses into their curricula. The COVID-19 pandemic has contributed to this swift and widespread transition to digital health and education. However, the need for digital health competencies goes beyond the COVID-19 pandemic because they are becoming essential for the delivery of effective, efficient, and safe care. Objective: This review aims to collate and analyze studies evaluating digital health education for medical students to inform the development of future courses and identify areas where curricula may need to be strengthened. Methods: We carried out a scoping review by following the guidance of the Joanna Briggs Institute, and the results were reported in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We searched 6 major bibliographic databases and gray literature sources for articles published between January 2000 and November 2019. Two authors independently screened the retrieved citations and extracted the data from the included studies. Discrepancies were resolved by consensus discussions between the authors. The findings were analyzed using thematic analysis and presented narratively. Results: A total of 34 studies focusing on different digital courses were included in this review. Most of the studies (22/34, 65%) were published between 2010 and 2019 and originated in the United States (20/34, 59%). The reported digital health courses were mostly elective (20/34, 59%), were integrated into the existing curriculum (24/34, 71%), and focused mainly on medical informatics (17/34, 50%). Most of the courses targeted medical students from the first to third year (17/34, 50%), and the duration of the courses ranged from 1 hour to 3 academic years. Most of the studies (22/34, 65%) reported the use of blended education. A few of the studies (6/34, 18%) delivered courses entirely digitally by using online modules, offline learning, massive open online courses, and virtual patient simulations. The reported courses used various assessment approaches such as paper-based assessments, in-person observations, and online assessments. Most of the studies (30/34, 88%) evaluated courses by using an uncontrolled before-and-after design and generally reported improvements in students' learning outcomes. Conclusions: Digital health courses reported in the literature are mostly elective, focus on a single area of digital health, and lack robust evaluation. They have diverse delivery, development, and assessment approaches. There is an urgent need for high-quality studies that evaluate digital health education.
Introduction Early recognition of individuals with increased risk of sudden cardiac arrest (SCA) remains challenging. SCA research so far has used data from cardiologist care, but missed most SCA victims, since they were only in general practitioner (GP) care prior to SCA. Studying individuals with type 2 diabetes (T2D) in GP care may help solve this problem, as they have increased risk for SCA, and rich clinical datasets, since they regularly visit their GP for check-up measurements. This information can be further enriched with extensive genetic and metabolic information. Aim To describe the study protocol of the REcognition of Sudden Cardiac arrest vUlnErability in Diabetes (RESCUED) project, which aims to identify clinical, genetic and metabolic factors contributing to SCA risk in individuals with T2D, and to develop a prognostic model for the risk of SCA. Methods The RESCUED project combines data from dedicated SCA and T2D cohorts, and GP data, from the same region in the Netherlands. Clinical data, genetic data (common and rare variant analysis) and metabolic data (metabolomics) will be analysed (using classical analysis techniques and machine learning methods) and combined into a prognostic model for risk of SCA. Conclusion The RESCUED project is designed to increase our ability to recognise elevated SCA risk early through an innovative strategy of focusing on GP data and a multidimensional methodology including clinical, genetic and metabolic analyses.
Background Electronic health records (EHRs) are increasingly used for research; however, multicomponent outcome measures such as daily functioning cannot yet be readily extracted. Aim To evaluate whether an electronic frailty index based on routine primary care data can be used as a measure for daily functioning in research with community-dwelling older persons [aged ≥( ) years]. Design and setting Cohort study among participants of the Integrated Systemic Care for Older People (ISCOPE) trial (11 476 eligible; 7285 in observational cohort; 3141 in trial; over-representation of frail people). Method At baseline (T0) and after 12 months (T12), daily functioning was measured with the Groningen Activities Restriction Scale (GARS, range 18–72). Electronic frailty index scores (range 0–1) at T0 and T12 were computed from the EHRs. The electronic frailty index (electronic Frailty Index - Utrecht) was tested for responsiveness and compared with the GARS as a gold standard for daily functioning. Results In total, 1390 participants with complete EHR and follow-up data were selected (31.4% male; median age = 81 years, interquartile range = 78–85). The electronic frailty index increased with age, was higher for females, and lower for participants living with a partner. It was responsive after an acute major medical event; however, the correlation between the electronic frailty index and GARS at T0 and over time was limited. Conclusion Because the electronic frailty index does not reflect daily functioning, further research on new methods to measure daily functioning with routine care data (for example, other proxies) is needed before EHRs can be a useful data source for research with older persons.
Background: The lack of interoperable IT systems between residential aged care facilities (RACF) and general practitioners (GP) in primary care settings in Australia introduces the potential for medication discrepancies and other medication errors. The aim of the GRACEMED study is to determine the extent and potential severity of medication discrepancies between general practice and RACFs, and identify factors associated with medication discrepancies. Methods: A cross-sectional study of medication discrepancies between RACF medication orders and GP medication lists was conducted in the Sydney North Health Network, Australia. A random sample of RACF residents was included from practice lists provided by the general practices. RACF medication orders and GP medication lists for the included residents were compared, and medication discrepancies between the two sources were identified and characterised in terms of discrepancy type, potential for harm and associated factors. Results: 31 GPs and 203 residents were included in the study. A total of 1777 discrepancies were identified, giving an overall discrepancy rate of 72.6 discrepancies for every 100 medications. Omissions were the most common discrepancy type (35.2%), followed by dose discrepancies (34.4%) and additions (30.4%). 48.5% of residents had a discrepancy with the potential to result in moderate harm and 9.8% had a discrepancy with the potential for severe harm. Number of medications prescribed was the only factor associated with medication discrepancies. Conclusion: Increased use of systems that allow information sharing and improved interoperability of clinical information is urgently needed to address medication safety issues experienced by RACF residents.