Background In biomedicine, machine learning (ML) has proven beneficial for the prognosis and diagnosis of different diseases, including cancer and neurodegenerative disorders. For rare diseases, however, the requirement for large datasets often prevents this approach. Huntington's disease (HD) is a rare neurodegenerative disorder caused by a CAG repeat expansion in the coding region of the huntingtin gene. The world's largest observational study for HD, Enroll-HD, describes over 21,000 participants. As such, Enroll-HD is amenable to ML methods. In this study, we pre-processed and imputed Enroll-HD with ML methods to maximise the inclusion of participants and variables. With this dataset we developed models to improve the prediction of the age at onset (AAO) and compared it to the well-established Langbehn formula. In addition, we used recurrent neural networks (RNNs) to demonstrate the utility of ML methods for longitudinal datasets, assessing driving capabilities by learning from previous participant assessments. Results Simple pre-processing imputed around 42% of missing values in Enroll-HD. Also, 167 variables were retained as a result of imputing with ML. We found that multiple ML models were able to outperform the Langbehn formula. The best ML model (light gradient boosting machine) improved the prognosis of AAO compared to the Langbehn formula by 9.2%, based on root mean squared error in the test set. In addition, our ML model provides more accurate prognosis for a wider CAG repeat range compared to the Langbehn formula. Driving capability was predicted with an accuracy of 85.2%. The resulting pre-processing workflow and code to train the ML models are available to be used for related HD predictions at: https://github.com/JasperO98/hdml/tree/main. Conclusions Our pre-processing workflow made it possible to resolve the missing values and include most participants and variables in Enroll-HD. We show the added value of a ML approach, which improved AAO predictions and allowed for the development of an advisory model that can assist clinicians and participants in estimating future driving capability.
Ongoing health challenges, such as the increased global burden of chronic disease, are increasingly answered by calls for personalized approaches to healthcare. Genomic medicine, a vital component of these personalization strategies, is applied in risk assessment, prevention, prognostication, and therapeutic targeting. However, several practical, ethical, and technological challenges remain. Across Europe, Personal Health Data Space (PHDS) projects are under development, aiming to establish patient-centered, interoperable data ecosystems that balance data access, control, and use for individual citizens and complement the research and commercial focus of the European Health Data Space provisions. The current study explores healthcare users' and healthcare professionals' perspectives on personalized genomic medicine and PHDS solutions, in casu the Personal Genetic Locker (PGL). A mixed-methods design was used, including surveys, interviews, and focus groups. Several meta-themes were generated from the data: (i) participants were interested in genomic information; (ii) participants valued data control, robust infrastructure, and sharing data with non-commercial stakeholders; (iii) autonomy was a central concern for all participants; (iv) institutional and interpersonal trust were highly significant for genomic medicine; and (v) participants encouraged the implementation of PHDSs since PHDSs were thought to promote the use of genomic data and enhance patients' control over their data. To conclude, we formulated several facilitators to implement genomic medicine in healthcare based on the perspectives of a diverse set of stakeholders.
Schultes, E.; Roos, M.; Santos, L.O.B.D.; Guizzardi, G.; Bouwman, J.; Hankemeier, T.; ... ; Mons, B. 2022
Although all the technical components supporting fully orchestrated Digital Twins (DT) currently exist, what remains missing is a conceptual clarification and analysis of a more generalized concept of a DT that is made FAIR, that is, universally machine actionable. This methodological overview is a first step toward this clarification. We present a review of previously developed semantic artifacts and how they may be used to compose a higher-order data model referred to here as a FAIR Digital Twin (FDT). We propose an architectural design to compose, store and reuse FDTs supporting data intensive research, with emphasis on privacy by design and their use in GDPR compliant open science.
Background The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data is collected all over the world and needs to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems that are used in hospitals can result in fragmentation of health data over multiple data 'silos' that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not prepared to be reused efficiently and in a timely manner. There is a need to adapt the research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR. Results In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine actionable digital objects to answer medical doctors' research questions. With this objective, we conducted a coordinated FAIRification among stakeholders based on ontological models for data and metadata, and a FAIR based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated query of patient data along open existing knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital.
Conclusions Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, Open Science, Semantic Web technologies, and FAIR Data Points provides data infrastructure in the hospital for machine actionable FAIR Digital Objects. This FAIR data is prepared to be reused for federated analysis, linkable to other FAIR data such as Linked Open Data, and reusable to develop software applications on top of them for hypothesis generation and knowledge discovery.
Background The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. Results Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. Conclusions Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics.
This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
Kuijper, E.C.; Toonen, L.J.A.; Overzier, M.; Tsonaka, R.; Hettne, K.; Roos, M.; ... ; Mina, E. 2022
While the genetic cause of Huntington disease (HD) has been known since 1993, no cure exists yet. Therapeutic development would benefit from a method to monitor disease progression and treatment efficacy, ideally using blood biomarkers. Previously, HD-specific signatures were identified in human blood representing signatures in human brain, showing biomarker potential. Since drug candidates are generally first screened in rodent models, we aimed to identify HD signatures in blood and brain of YAC128 HD mice and compare these with previously identified human signatures. RNA sequencing was performed on blood withdrawn at two time points and four brain regions from YAC128 and control mice. Weighted gene co-expression network analysis was used to identify clusters of co-expressed genes (modules) associated with the HD genotype. These HD-associated modules were annotated via text-mining to determine the biological processes they represented. Subsequently, the processes from mouse blood were compared with mouse brain, showing substantial overlap, including protein modification, cell cycle, RNA splicing, nuclear transport, and vesicle-mediated transport. Moreover, the disease-associated processes shared between mouse blood and brain were highly comparable to those previously identified in human blood and brain. In addition, we identified HD blood-specific pathology, confirming previous findings for peripheral pathology in blood. Finally, we identified hub genes for HD-associated blood modules and proposed a strategy for gene selection for development of a disease progression monitoring panel.
Introduction: Existing methods to make data Findable, Accessible, Interoperable, and Reusable (FAIR) are usually carried out in a post hoc manner: after the research project is conducted and data are collected. De novo FAIRification, on the other hand, incorporates the FAIRification steps in the process of a research project. In medical research, data is often collected and stored via electronic Case Report Forms (eCRFs) in Electronic Data Capture (EDC) systems. By implementing a de novo FAIRification process in such a system, the reusability and, thus, scalability of FAIRification across research projects can be greatly improved. In this study, we developed and implemented a novel method for de novo FAIRification via an EDC system. We evaluated our method by applying it to the Registry of Vascular Anomalies (VASCA). Methods: Our EDC and research project independent method ensures that eCRF data entered into an EDC system can be transformed into machine-readable, FAIR data using a semantic data model (a canonical representation of the data, based on ontology concepts and semantic web standards) and mappings from the model to questions on the eCRF. The FAIRified data are stored in a triple store and can, together with associated metadata, be accessed and queried through a FAIR Data Point. The method was implemented in Castor EDC, an EDC system, through a data transformation application. The FAIRness of the output of the method, the FAIRified data and metadata, was evaluated using the FAIR Evaluation Services. Results: We successfully applied our FAIRification method to the VASCA registry. Data entered on eCRFs is automatically transformed into machine-readable data and can be accessed and queried using SPARQL queries in the FAIR Data Point.
Twenty-one FAIR Evaluator tests pass and one test regarding the metadata persistence policy fails, since this policy is not in place yet. Conclusion: In this study, we developed a novel method for de novo FAIRification via an EDC system. Its application in the VASCA registry and the automated FAIR evaluation show that the method can be used to make clinical research data FAIR when they are entered in an eCRF without any intervention from data management.
Background Patient data registries that are FAIR (Findable, Accessible, Interoperable, and Reusable for humans and computers) facilitate research across multiple resources. This is particularly relevant to rare diseases, where data often are scarce and scattered. Specific research questions can be asked across FAIR rare disease registries and other FAIR resources without physically combining the data. Further, FAIR implies well-defined, transparent access conditions, which supports making sensitive data as open as possible and as closed as necessary. Results We successfully developed and implemented a process of making a rare disease registry for vascular anomalies FAIR from its conception (de novo). Here, we describe the five phases of this process in detail: (i) pre-FAIRification, (ii) facilitating FAIRification, (iii) data collection, (iv) generating FAIR data in real-time, and (v) using FAIR data. This includes the creation of an electronic case report form and a semantic data model of the elements to be collected (in this case: the "Set of Common Data Elements for Rare Disease Registration" released by the European Commission), and the technical implementation of automatic, real-time data FAIRification in an Electronic Data Capture system. Further, we describe how we contribute to the four facets of FAIR, and how our FAIRification process can be reused by other registries. Conclusions In conclusion, a detailed de novo FAIRification process of a registry for vascular anomalies is described. To a large extent, the process may be reused by other rare disease registries, and we envision this work to be a substantial contribution to an ecosystem of FAIR rare disease resources.
Duistermaat, H.; Kruijer, J. D.; Roos, M.; Roskam, H. C. 2021
Rett syndrome (RTT) is a rare neurological disorder mostly caused by a genetic variation in MECP2. Making new MECP2 variants and the related phenotypes available provides data for better understanding of disease mechanisms and faster identification of variants for diagnosis. This is, however, currently hampered by the lack of interoperability between genotype-phenotype databases. Here, we demonstrate, using the example of MECP2 in RTT, that by making the genotype-phenotype data more Findable, Accessible, Interoperable, and Reusable (FAIR), we can facilitate prioritization and analysis of variants. In total, 10,968 MECP2 variants were successfully integrated. Among these variants, 863 unique confirmed RTT causing and 209 unique confirmed benign variants were found. This dataset was used for comparison of pathogenicity predicting tools, protein consequences, and identification of ambiguous variants. Prediction tools generally recognised the RTT causing and benign variants; however, there was a broad range of overlap. Nineteen variants were identified that were annotated as both disease-causing and benign, suggesting that there are additional factors in these cases contributing to disease development.
Lin, N. van; Paliouras, G.; Vroom, E.; Hoen, P.A.C. 't; Roos, M. 2021
Background: For patients with rare diseases such as Duchenne and Becker muscular dystrophy (DMD/BMD), access to their health data is key to being able to advocate for themselves and be in control of their care. Since 2018, the DMD/BMD patient community has been committed to making DMD/BMD-related data FAIR, i.e., Findable, Accessible, Interoperable, and Reusable. On March 3, 2021, the second international meeting on FAIR data sharing for DMD/BMD was held virtually. Objective: The aim of this meeting report is to summarize the presentations and discussions of the meeting. Methods: During this meeting, the progress of FAIRification efforts since the first international meeting in 2019, new developments, stakeholder perspectives, and experiences from implementing FAIR data principles in practice were presented and discussed. Results: Over 120 attendees representing various stakeholder groups (i.e., patient organizations, clinicians, clinical and academic researchers, pharmaceutical companies, regulators, and EU organizations) from 22 countries participated in the meeting. This meeting report summarizes the presentations and discussions from the meeting, provides an overview of the key lessons learned since the first meeting, and outlines the next steps. Conclusions: Patient organizations are key drivers of the FAIRification process in practice and dialogue with stakeholders is critical to success.
One application of personalized medicine is the tailoring of medication to the individual, so that the medication will have the highest chance of success. In order to individualize medication, one must have a complete inventory of all current pharmaceutical compounds (a detailed formulary) combined with pharmacogenetic datasets, the genetic makeup of the patient, their (medical) family history and other health-related data. For healthcare professionals to make the best use of this information, it must be visualized in a way that makes the most medically relevant data accessible for their decision-making. Similarly, to enable bioinformatics analysis of these data, they must be prepared and provided through an interface for controlled computational analysis. Due to the high degree of personal information gathered for such initiatives, privacy-sensitive implementation choices and ethical standards are paramount. The Personal Genetic Locker project provides an approach to enable the use of personal genomic data in primary care. In this paper, we provide a description of the Personal Genetic Locker project and show its utility through a use case based on open standards, which is illustrated by the 4MedBox system.
Jacobsen, A.; Miranda Azevedo, R. de; Juty, N.; Batista, D.; Coles, S.; Cornet, R.; ... ; Schultes, E. 2020
The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations, or when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.