NR2F1 database: 112 variants and 84 patients support refining the clinical synopsis of Bosch–Boonstra–Schaaf optic atrophy syndrome

Pathogenic variants of the nuclear receptor subfamily 2 group F member 1 gene (NR2F1) are responsible for Bosch–Boonstra–Schaaf optic atrophy syndrome (BBSOAS), an autosomal dominant disorder characterized by optic atrophy associated with developmental delay and intellectual disability, whose clinical presentation appears to be multifaceted. We created the first public locus-specific database dedicated to NR2F1. All variants and clinical cases reported in the literature, as well as new unpublished cases, were integrated into the database using standard nomenclature to describe both molecular and phenotypic anomalies. We then pursued a comprehensive approach based on the computational representation and analysis of these data, which suggests refining the BBSOAS clinical description with respect to neurological features and including the additional signs of hypotonia and feeding difficulties. This database is fully accessible to both clinicians and molecular biologists and should prove useful in further refining the clinical synopsis of NR2F1 as new data are recorded.


| BACKGROUND
The nuclear receptor subfamily 2 group F member 1 gene (NR2F1; MIM# 132890), consisting of three exons, encodes the 423 amino acids (aa) of the COUP transcription factor 1 protein (COUP-TF1; Swiss-Prot:COT1_HUMAN). It belongs to the steroid/thyroid hormone receptor superfamily and, as has been shown in mice, is involved in the development of several brain structures, including the neocortex, hippocampus, and ganglionic eminences (Alfano et al., 2011; Armentano et al., 2006; Bertacchi et al., 2019).
Most pathogenic variants are de novo and dominant (Al-Kateb et al., 2013; Balciuniene et al., 2019; Bertacchi et al., 2020; Bojanek et al., 2020; Bosch et al., 2014; Bosch et al., 2016; Brown et al., 2009; …). The phenotype of BBSOAS is heterogeneous: some patients have no visual impairment, and others show no neurodevelopmental delay.
In this article, we describe the construction of the first NR2F1 locus-specific database (LSDB), listing all patients and genetic variants referenced in the literature as well as unpublished cases from our laboratory, and we present computed data suggesting a refinement of the BBSOAS clinical synopsis.

| Nomenclature
All names, symbols, and Online Mendelian Inheritance in Man (OMIM) database numbers were checked against the current official names of the Human Genome Organization (HUGO) Gene Nomenclature Committee (Gray et al., 2013) and the OMIM database (Hamosh et al., 2000). Phenotypes are described with the Human Phenotype Ontology (HPO), giving both the HPO term name and its identifier (Köhler et al., 2019). Although the current official HGVS recommendations do not prescribe descriptions extending beyond the cDNA, we have indicated such variants for clarity by following a guideline proposed for this open issue: the notation "c.−1687_*240{0}" indicates the absence of the entire NR2F1 gene, from the first nucleotide of the first exon to the last nucleotide of the last exon, with breakpoints that lie beyond these positions and are not precisely defined.
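In HGVS c. notation, a position prefixed with "-" lies 5′ of the A of the ATG start codon, a position prefixed with "*" lies 3′ of the stop codon, and an unprefixed number lies within the coding sequence. The minimal Python sketch below labels both ends of a range such as the whole-gene deletion above. It assumes simple positions only (no intronic offsets such as c.123+5 and no uncertainty parentheses); the trailing "{0}" suffix is stripped and not interpreted, and the function names are hypothetical.

```python
import re

# One simple c. position: optional "-" (5' of ATG) or "*" (3' of stop), then digits.
# Intronic offsets (c.123+5) and uncertain positions such as (?_-1687) are not handled.
_POSITION = re.compile(r"^([-*]?)(\d+)$")


def classify_position(position: str) -> str:
    """Say where a simple c. position lies relative to the coding sequence."""
    match = _POSITION.match(position)
    if match is None:
        raise ValueError(f"unsupported c. position: {position!r}")
    prefix = match.group(1)
    if prefix == "-":
        return "5' of the ATG start codon"
    if prefix == "*":
        return "3' of the stop codon"
    return "coding sequence"


def classify_range(description: str) -> tuple[str, str]:
    """Classify both ends of a c. range such as 'c.−1687_*240{0}'."""
    body = description.replace("\u2212", "-")  # typographic minus -> ASCII hyphen
    body = body.removeprefix("c.").split("{", 1)[0]  # suffix is dropped, not interpreted
    start, end = body.split("_", 1)
    return classify_position(start), classify_position(end)


print(classify_range("c.\u22121687_*240{0}"))
```

The typographic minus sign is normalized first because published variant names often use U+2212 rather than an ASCII hyphen.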
Information on changes at the RNA and protein levels was taken from the original papers or, when not studied experimentally, predicted from the DNA variant. Following the HGVS guidelines, such deduced changes are shown in parentheses. Protein domains were predicted with InterPro version 85.0 (April 8, 2021; Blum et al., 2021).
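The parenthesis convention for deduced changes can be illustrated with a small helper. This is a hypothetical sketch that handles bare r. or p. descriptions without a reference-sequence prefix; the variant in the example is made up for illustration and is not taken from the database.

```python
def mark_predicted(description: str) -> str:
    """Wrap the change in an r. or p. description in parentheses to flag it
    as deduced from the DNA variant rather than observed experimentally."""
    level, dot, change = description.partition(".")
    if not dot or level not in ("r", "p") or not change:
        raise ValueError(f"expected a bare r. or p. description, got {description!r}")
    if change.startswith("(") and change.endswith(")"):
        return description  # already flagged as predicted
    return f"{level}.({change})"


# Hypothetical missense change, used only to show the formatting.
print(mark_predicted("p.Arg100Trp"))  # p.(Arg100Trp)
```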

| Implementation of the database
The NR2F1 database is part of the Global Variome shared Leiden Open-source Variation Database (LOVD), currently running LOVD v.3.0 Build 26c (I. F. Fokkema et al., 2011). It follows the guidelines for LSDBs (Vihinen et al., 2012) and is hosted under the responsibility of the Global Variome/Human Variome Project (Cotton et al., 2008). The database compiles clinical and molecular data from patients carrying NR2F1 variants, drawn both from the peer-reviewed literature and from unpublished contributions submitted directly.
When a published variant description contained inaccuracies or followed an obsolete convention, the "DNA published" field on the page dedicated to each variant indicates whether the published name has been modified by the curator. To provide uniform and comparable data, the NR2F1 LSDB website requires full compliance with the rules set out above for describing sequence variants.

| Data collection and analysis
The causative variants were collected from the literature published up to May 2021 using the NCBI PubMed search tool (Sayers et al., 2010). Their positions in the reference transcripts were determined and updated according to HGVS nomenclature version 2.0 (den Dunnen et al., 2016). Correct naming at the nucleotide and amino acid levels was verified, and corrected when necessary, using the Mutalyzer 2.0.34 Syntax Checker (Wildeman et al., 2008).
Information on the number of patients carrying each causative variant, their geographical origins, and their homozygosity or heterozygosity was taken from the original or review publications, as well as from data collected in our clinical laboratory. If the same patient or variant is reported in more than one article, it is recorded only once in the database, with reference to the first publication. Further information on the genetic origin of the allele, segregation with the disease phenotype, and frequency in the control population was also recorded. The pathogenicity criteria, which depend on the clinical context and molecular findings, are stated under the heading "Clinical classification". This classification is based on the clinical consequences as published or submitted and uses an enriched system that includes the mode of inheritance (for example, pathogenic (dominant), likely pathogenic, VUS (variant of uncertain significance), likely benign, and benign), derived from the American College of Medical Genetics and Genomics (ACMG) standards and guidelines (Richards et al., 2015).
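The rule of recording each case once with reference to the first publication can be sketched as follows. The sketch assumes that each report already carries a curator-assigned patient identifier that is stable across articles (in practice, recognizing the same patient in two articles is a manual curation step); the Report type, its fields, and the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    patient_id: str  # hypothetical curator-assigned ID, stable across articles
    variant: str     # HGVS c. description
    year: int        # publication year
    reference: str   # citation of the article


def deduplicate(reports: list[Report]) -> list[Report]:
    """Keep one record per (patient, variant) pair, citing the earliest article.

    sorted() is stable, so reports from the same year keep their input order
    and the first one encountered is kept.
    """
    first: dict[tuple[str, str], Report] = {}
    for report in sorted(reports, key=lambda r: r.year):
        first.setdefault((report.patient_id, report.variant), report)
    return list(first.values())


reports = [
    Report("P1", "c.1A>G", 2016, "Article B"),
    Report("P1", "c.1A>G", 2014, "Article A"),
    Report("P2", "c.1A>G", 2019, "Article C"),
]
for r in deduplicate(reports):
    print(r.patient_id, r.reference)  # P1 Article A, then P2 Article C
```

Keying on the (patient, variant) pair rather than on the patient alone keeps separate records when one individual carries more than one variant.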
We produced a data set from NR2F1 LSDB version NR2F1:211007 (last updated on October 7, 2021) for the statistical analysis. HPO terms were checked and prepared with the R packages ontologyIndex version 2.5 and ontologyPlot version 1.4 (Greene et al., 2017), within R version 4.0.5 (R Core Team, 2020), reading the OBO file version hp/releases/2021-04-13 (Köhler et al., 2019). Hierarchical clustering was performed with the hclust function from the R stats package (R Core Team, 2020). Our data were cross-referenced with the following external sources: gnomAD v.2.1.1 (Karczewski et al., 2020).
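By default, R's hclust uses complete linkage, usually applied to a distance matrix computed with dist (Euclidean by default). To show what this step does on a binary patient × term matrix, here is a naive, dependency-free Python sketch of complete-linkage agglomerative clustering. It is not the hclust implementation, tie-breaking may differ, and it assumes the default distance and linkage settings. It clusters rows; transposing the matrix clusters the other axis.

```python
import math
from itertools import combinations


def euclidean(a: list[int], b: list[int]) -> float:
    """Euclidean distance, the default of R's dist()."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(rows: list[list[int]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest members are closest,
    and returns the merges as (cluster_a, cluster_b, height) with clusters
    given as frozensets of row indices.
    """
    clusters = [frozenset([i]) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: cluster distance = largest pairwise distance.
            d = max(euclidean(rows[i], rows[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((a, b, d))
    return merges


# Toy patient x term presence/absence matrix (1 = term annotated).
rows = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
for a, b, height in complete_linkage(rows):
    print(sorted(a), sorted(b), round(height, 3))
```

On 0/1 data, the Euclidean distance is the square root of the number of terms on which two patients disagree, so identical profiles merge at height 0.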

| Data access and submission
The NR2F1 database is an open database: any researcher or clinician can consult its contents freely without prior registration, and can contribute new data after registration, which ensures traceability. The database can be accessed on the World Wide Web at https://www.lovd.nl/NR2F1 (through the Global Variome shared LOVD server) or through the MITOchondrial DYNamics variation portal at http://nr2f1.mitodyn.org/. The data can also be retrieved via an application programming interface (API), a web service allowing simple queries and retrieval of basic gene and variant information (documentation available on the web page of the database); the database also serves as a public beacon in the Global Alliance for Genomics and Health Beacon Project (Global Alliance for Genomics and Health, 2016). General information is available on the database home page, and the process for submitting data begins by clicking the "Submit" tab.
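A minimal sketch of querying the API from Python with the standard library only. The base URL and the LOVD 3 REST path (api/rest.php/variants/<gene>, with a format parameter) are assumptions to be checked against the API documentation on the database page; the function names are hypothetical, and fetching requires network access.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL of the Global Variome shared LOVD installation.
LOVD_SHARED_BASE = "https://databases.lovd.nl/shared"


def variants_url(gene: str, fmt: str = "application/json") -> str:
    """Build the (assumed) LOVD 3 REST URL listing the variants of a gene."""
    path = f"/api/rest.php/variants/{quote(gene)}"
    return f"{LOVD_SHARED_BASE}{path}?{urlencode({'format': fmt})}"


def fetch_variants(gene: str, timeout: float = 30.0):
    """Download and decode the variant list; requires network access."""
    with urlopen(variants_url(gene), timeout=timeout) as response:
        return json.load(response)


print(variants_url("NR2F1"))
```

Building the URL separately from fetching it makes the request easy to inspect, or to paste into a browser, before any network call is made.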
Data concerning patients and variants may be retrieved using the standard LOVD tabs, named "Individuals" and "Variants," respectively.

| RESULTS AND DISCUSSION
The NR2F1 database contains three main interconnected tables: the "Individuals" table contains details of the patient examined, including gender, geographic origin, and patient identification, if applicable; the

| Genotypic data
To date, the database contains 112 sequence variant records.
Among the most frequently observed pathogenic effects on the protein encoded by NR2F1, 61% are missense variants, 14% are nonsense variants, and 12% are frameshift variants leading to premature protein truncation; that is, about a quarter of the variants would result in truncated proteins or in their absence due to the nonsense-mediated mRNA decay surveillance pathway. A further 10% lead to haploinsufficiency, either through complete deletion of the whole gene or through a variant in the translation initiation codon on one allele, and two variants are deletions of a single aa (Figure 2b). Although only a few variants are recurrent, large deletions of NR2F1 encompassing the cDNA boundaries have been reported markedly more often, with 11 records (Table 1) (Al-Kateb et al., 2013; Bosch et al., 2014; Brown et al., 2009; Chen et al., 2016; Rech et al., 2020).
FIGURE 1 Distribution of the 83 unique genomic variants in the NR2F1 database (compact view). Complete deletions of the gene that extend beyond the exons are shown as an extended bar with rafters, substitutions as black bars, deletions as blue bars, and the duplication as an orange bar. From the top are shown: an ideogram of the cytogenetic localization (5q15); the genomic coordinates on human chromosome 5 (the region shown extends over 10,846 bp, between positions 92,918,993 and 92,929,838 according to assembly GRCh37/hg19); and the NR2F1 gene structure, including exon numbering and domains. The full view detailing the names of each variant is available in Figure S1. NR C4-type domain: DNA-binding domain composed of two C4-type zinc fingers; NR LBD domain: nuclear hormone receptor ligand-binding domain. Adapted from the UCSC Genome Browser (http://genome.ucsc.edu) with the NR2F1 database custom track; data as of October 7, 2021.
TABLE 1 notes: … (Mio et al., 2020), PA19 (Park et al., 2019), RE20 (Rech et al., 2020), ST20 (Starosta et al., 2020), VI17 (Vissers et al., 2017), WA20 (Walsh et al., 2020), ZO20 (Zou et al., 2020); variants referenced FO19 are classification records from genomic diagnostic laboratories, with no associated patient information; the character "-" indicates a submission to the database without publication; h, 3ʹ-untranslated region.
Data from the Genome Aggregation Database (gnomAD), which aggregates high-quality exome and genome sequence data from about 140,000 individuals (Karczewski et al., 2020), have been integrated into the Global Variome shared LOVD server. However, to avoid flooding the LSDBs with data unrelated to a phenotype, only the gnomAD frequency of each variant already present on the server is indicated.
This information is particularly useful during curation for assessing variant classification. In total, only five of the unique variants in our database (6%) have a frequency reported in gnomAD: all four variants recorded in the database as benign, with low frequencies ranging from one allele in 248,414 to less than 0.02%, and a single variant classified as likely pathogenic, with a frequency of 0.04%. This additional information underlines the importance of the LSDB approach and of data sharing for improving variant classification in genetic diagnostics.
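The frequencies quoted above can be put on a common scale with simple arithmetic. A gnomAD allele frequency is the allele count divided by the allele number (the number of alleles genotyped at that site). The short Python sketch below uses only figures given in the text, including the 83 unique variants shown in Figure 1; the helper names are hypothetical.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency as allele count / allele number."""
    if allele_number <= 0:
        raise ValueError("allele number must be positive")
    return allele_count / allele_number


def one_in(frequency: float) -> int:
    """Express a frequency as 'one allele in N'."""
    return round(1 / frequency)


# One allele in 248,414, expressed as a percentage.
print(f"{100 * allele_frequency(1, 248_414):.4f}%")  # 0.0004%
# 0.04% corresponds to roughly one allele in 2,500.
print(one_in(0.04 / 100))  # 2500
# Five of the 83 unique variants have a gnomAD frequency.
print(f"{100 * 5 / 83:.0f}%")  # 6%
```

Seen on this scale, the lowest benign frequency (about 0.0004%) is two orders of magnitude below the 0.04% observed for the likely pathogenic variant.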

| Phenotypic data
To date, the database includes 84 patient records (…). This led to the analysis of a total of 5251 HPO terms (820 unique), including the parent terms inferred from the ontological relationships (Figures S2 and S3). More than half of the unique terms (413) are annotated in only one patient and were considered insufficiently informative. We therefore focused on the HPO terms represented in at least 25% of patients, that is, terms annotated in 21 or more patients. Figure 3 provides an overview of these most frequent HPO annotations as a grid, highlighting clusters that suggest that NR2F1 patients share common phenotypic features.
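The two steps described above can be sketched in Python: propagating each annotation to its ancestor terms, which is handled in R by ontologyIndex, and applying the 25% cut-off. The toy term identifiers are hypothetical stand-ins for HPO IDs. The threshold is computed with integer percentages, a choice made here to avoid floating-point rounding at the cut-off.

```python
from collections import Counter


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return the term and all its ancestors in an ontology DAG (child -> parents)."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations: dict[str, set[str]],
              parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Add every inferred ancestor term to each patient's annotation."""
    return {patient: set().union(*(ancestors(t, parents) for t in terms))
            for patient, terms in annotations.items()}


def min_count(n_patients: int, min_percent: int) -> int:
    """Smallest patient count reaching min_percent (ceiling, integer arithmetic)."""
    return -(-min_percent * n_patients // 100)


def frequent_terms(propagated: dict[str, set[str]], min_percent: int = 25) -> dict[str, int]:
    """Keep terms annotated in at least min_percent of patients."""
    threshold = min_count(len(propagated), min_percent)
    counts = Counter(t for terms in propagated.values() for t in terms)
    return {t: c for t, c in counts.items() if c >= threshold}


# Hypothetical toy ontology: two leaves under one intermediate term under a root.
parents = {"T_leaf1": ["T_mid"], "T_leaf2": ["T_mid"], "T_mid": ["T_root"]}
annotations = {"p1": {"T_leaf1"}, "p2": {"T_leaf2"}, "p3": {"T_leaf1"}, "p4": set()}
print(frequent_terms(propagate(annotations, parents), min_percent=50))
print(min_count(84, 25))  # 21
```

Propagation explains why broad parent terms reach high frequencies: a patient annotated with any leaf term also counts toward every ancestor of that term.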
We carried out a study of these same most-represented data as a … and #00311065 (son) (Chen et al., 2016), as well as #00310033 and #00310047 (twins) (Mio et al., 2020). In contrast, two cases in which the variant was inherited without transmission of the phenotype are recorded, involving four individuals from our center: #00312024 (mother) and #00312023 (daughter), and #00312026 (mother) and #00312025 (daughter). This stresses the variable expressivity of BBSOAS and shows that the mode of transmission is not necessarily obvious at the time of diagnosis.

| Clinical synopsis
The comparison of our data set, which reflects the state of knowledge in the literature as well as recent unpublished patients, with the reference knowledge on BBSOAS in disease databases (MIM# 615722 and ORPHA:401777; Figure S4) confirms some clinical signs associated with NR2F1 and suggests that others should be added or removed (Table 2). Compared with the OMIM synopsis, we first corroborate that ocular involvement predominates (95% of patients), mainly optic atrophy (HP:0000648) and visual impairment (HP:0000505), found in 81% and 72% of patients, respectively.
The absence of optic disc pallor (HP:0000543) in both our data set and Orphanet is explained by the fact that this subjective clinical sign (discoloration of the optic nerve head, as interpreted by the clinician) is de facto a form of optic atrophy; some authors mention it explicitly (Bosch et al., 2014; Chen et al., 2016; Rech et al., 2020; Zou et al., 2020), whereas most do not. In effect, optic disc pallor is a milder form of optic atrophy.
We also confirm that nervous system physiology (HP:0012638) is primarily affected (90% of patients), with emphasis on delayed speech and language development (HP:0000750), motor delay (HP:0001270), and seizures (HP:0001250), each present in about half of the patients registered in the database. Interestingly, autistic behavior (HP:0000729), which is tagged "in some patients" in OMIM, appears as frequently as these phenotypes in the NR2F1 database (42%). Finally, we confirm that the dysmorphic features are variable and nonspecific.
Finally, in addition to what is already well established, we suggest considering two further phenotypes: hypotonia (HP:0001252), found in 55% of patients in the database, and feeding difficulties (HP:0011968), found in 34%. This impaired ability to eat, related to problems gathering food and preparing to suck, chew, and swallow it, may be secondary to hypotonia (Rech et al., 2020).
Overall, our meta-analysis of the state of molecular and clinical knowledge of BBSOAS, representing over 5000 ontological terms in 83 patients, is inspired by big data methodologies, retaining only the most frequent phenotypes annotated in at least 25% of patients.
FIGURE 3 Visualization of the Human Phenotype Ontology (HPO) annotation describing the reports of the 82 symptomatic patients with an extended set of full clinical descriptions in the NR2F1 data set. Only the HPO terms represented in more than 25% of the patients are displayed. A red box indicates the presence of the phenotype; columns and rows are clustered using hclust; human-readable shortened ontological term names are used (where possible). The term Mode of inheritance (HP:0000005) indicates that the mode of transmission of the patient's phenotypic profile to relatives is known; it appears in the data set as an ancestor of the HPO term Sporadic (HP:0003745; in 79% of patients) or Autosomal dominant inheritance (HP:0000006; in two patients). In columns, the identifiers of the patients (eight digits) are prefixed by an arbitrary letter. The visualization of the full HPO annotation in the NR2F1 data set is available in Figure S2. Data as of October 7, 2021.
Only strong discordance with the OMIM reference guided the suggestion of signs to be added to or removed from the clinical synopsis (…).
Descriptions of human diseases using HPO annotations are thus key elements of our novel NR2F1 clinicobiological database. Meanwhile, specialized databases reporting pathogenic variants, the so-called LSDBs, have proven to be the most complete (Brookes & Robinson, 2015), as they benefit from the participation of a curator who is a reference specialist for the gene or disease in question. Our systematic and rational approach, based on the computational representation and analysis of data, has led us to propose a refinement of the clinical description of BBSOAS, particularly regarding neurological features, and the addition of hypotonia, which may result in impaired feeding ability. This analysis can be repeated in the future to refine the synopsis further as new data are recorded in the database.

WEB RESOURCES
The following web resources were used: gnomAD (https://gnomad.…
FIGURE 4 Visualization of the frequency and relationships of the Human Phenotype Ontology (HPO) terms in the NR2F1 data set. Only the HPO terms represented in more than 25% of the patients are displayed. The mode of inheritance (HP:0000005) and phenotypic abnormality (HP:0000118) subgraphs descend from the root of all terms (All; HP:0000001) in the HPO. Arrows indicate relations between terms in the ontology. Colors correspond to the frequency of the phenotypes, from 25% in yellow to 100% in blue, with light green corresponding to a term present in half of the patients. Human-readable shortened ontological term names are used (where possible). The visualization of the full frequency and relationships of the HPO terms in the NR2F1 data set is available in Figure S3. Data as of October 7, 2021.

ACKNOWLEDGMENTS
We gratefully acknowledge grants from the following foundations and patients' associations: Association contre les Maladies mitochondriales, Fondation Visio, Kjer France Ouvrir les Yeux.

CONFLICT OF INTERESTS
The authors declare no conflicts of interest.