Multi-omics profiling, in vitro and in vivo enhancer assays dissect the cis-regulatory mechanisms underlying North Carolina macular dystrophy, a retinal enhanceropathy

North Carolina macular dystrophy (NCMD) is a rare autosomal dominant disease affecting macular development. The disease is caused by non-coding single nucleotide variants (SNVs) in two hotspot regions near PRDM13 and by duplications in two distinct chromosomal loci, overlapping DNase I hypersensitive sites near either PRDM13 or IRX1. To unravel the mechanisms by which these variants cause disease, we first established a genome-wide multi-omics retinal database, RegRet. Integration of UMI-4C profiles we generated on adult human retina then allowed fine-mapping of the interactions of the PRDM13 and IRX1 gene promoters, and the identification of eighteen candidate cis-regulatory elements (cCREs), the activity of which was investigated by luciferase and Xenopus enhancer assays. Next, luciferase assays showed that the non-coding SNVs located in the two hotspot regions of PRDM13 affect cCRE activity, including two novel NCMD-associated non-coding SNVs that we identified. Interestingly, the cCRE containing one of these SNVs was shown to interact with the PRDM13 promoter, demonstrated in vivo activity in Xenopus, and is active at the developmental stage when progenitor cells of the central retina exit mitosis, putting forward this region as a PRDM13 enhancer. Finally, mining of single-cell transcriptional data of embryonic and adult retina revealed the highest expression of PRDM13 and IRX1 when amacrine cells start to synapse with retinal ganglion cells, supporting the hypothesis that altered PRDM13 or IRX1 expression impairs interactions between these cells during retinogenesis. Overall, this study gained insight into the cis-regulatory mechanisms of NCMD and supports that this condition is a retinal enhanceropathy. Graphical abstract


Introduction
The term "missing heritability", introduced by Brendan Maher in 2008, refers to the fact that the underlying genetic cause of many complex traits and diseases remains unexplained. 1,2 Although many technological advances in the field of genomics took place during the past decade, large-scale exome sequencing still has a limited diagnostic yield ranging from ~30-60%, depending on the condition being investigated. [3][4][5] It was estimated that 85% of mutations with large effects on disease-related traits can be identified in the protein-coding regions. Nevertheless, this number may result from the historical observational bias towards the approximately 1% coding part of the human genome, partly due to the fact that a significant portion of the non-coding genome still lacks proper functional annotation. 6 In the last couple of years, however, with the continuously reducing cost of next-generation sequencing and the improvements of complementing analysis tools and algorithms, large-scale initiatives have contributed to the better understanding of the human genome and its different functional regions.
According to the analysis of the Encyclopedia of DNA Elements (ENCODE) project, based on chromatin accessibility, transcriptional activity, and DNA methylation data, about 80% of the human genome may have a functional significance. 7 The majority of the identified functional elements, including gene promoters, enhancers, silencers, and insulators, appeared to be involved in spatiotemporal gene expression. Control through these cis-regulatory elements (CREs) results from the presence of transcription factor binding sites (TFBS) for tissue-and time-specific transcription factors, which mediate the activation or repression of transcription. 8 It was also discovered that the human genome contains non-coding elements with an extremely high degree of sequence conservation across multiple vertebrate species. These are called ultra-conserved non-coding elements (UCNEs) and cluster in genomic regions associated with genes known to have a role during development. 9,10 Although studies in transgenic animals have shown that some of these elements function as tissue-specific enhancers during developmental processes, the functional significance of others remains to be ascertained. 11,12 To facilitate communication between CREs and target gene promoters, the genome is organized into well-defined three-dimensional domains, called topologically associating domains (TADs). This form of genome architecture, which is conserved across cell types, allows long-range interactions between CREs and their target genes within the same TAD, while insulating regulatory activity from neighboring TADs. 13 Whole genome sequencing (WGS) has revealed structural variants (SVs) that alter the copy number of CREs or disrupt TAD boundaries, resulting in modified three-dimensional chromatin architecture. 14 Moreover, it has also been shown that single nucleotide variants (SNVs) residing in noncoding CREs could explain a certain fraction of the missing heritability, with the phenotypic consequences usually resulting from alterations in gene expression. [15][16][17][18] Thus far, an increasing number of non-coding variants have been linked to inherited retinal diseases (IRDs), the majority of them affecting cis-acting splicing. 19-21 A well-known example is a deep-intronic mutation in CEP290, accounting for ~20% of congenital blindness due to Leber congenital amaurosis in Northwestern Europe. 22 In 2016, it was shown that North Carolina macular dystrophy (NCMD, MIM: 136550) is caused by non-coding SNVs near PRDM13 and duplications overlapping with DNase I hypersensitive sites near either PRDM13 or IRX1, likely dysregulating these transcription factor-encoding genes. 23 NCMD is a rare autosomal dominant disorder that affects the development of the macula as first noted by  This completely penetrant maculopathy is present at birth, shows no progression, is characterized by variable expressivity (grade 1 to 3), even among family members, and usually affects both eyes symmetrically. 24,25 Grade 1 is characterized by a few small yellow drusen-like lesions within the fovea, which form larger confluent lesions in grade 2. In both grades, patients typically have no or mild impairment of central vision. In grade 3, the fundus has a striking appearance characterized by severe central colobomatous-like chorioretinal atrophy, resulting in mild to moderately impaired vision. 24,25 Although NCMD is generally considered as non-progressive, visual deterioration can be observed as a result of complications associated with choroidal neovascularization. 26 The disease was first described in 1971 upon the clinical investigation of a family from North Carolina, 27 and twenty years later, NCMD was mapped to chromosome 6q14-q16 (MCDR1 locus, MIM 136550). 28 During the following decades, linkage analysis was performed in many additional families giving a cumulative logarithm of the odds (LOD) score greater than 40. 29 Genetic heterogeneity was subsequently found with linkage to 5p15-p13 (MCDR3 locus, MIM 608850) in a single Danish family. 30 Since no mutations were found in the protein coding regions in both loci, Small et al. used WGS which eventually uncovered the first pathogenic non-coding variants underlying NCMD in 2016. 23 To date, fourteen NCMD-associated genetic variants have been reported, of which eleven are located in the MCDR1 locus and three in the MCDR3 locus. In particular, six pathogenic heterozygous SNVs have been identified in the MCDR1 locus, which cluster together in two distinct hotspots upstream of the PRDM13 gene. 23,31,32 In addition, five heterozygous tandem duplications have been reported in the same locus, encompassing the PRDM13 gene and a shared non-coding region (~44 kb) upstream of the gene, containing several putative CREs. 23,[33][34][35][36] The three other NCMD-associated genetic variants are heterozygous tandem duplications located in the MCDR3 locus. These share a ~39 kb non-coding region in a gene desert downstream of the IRX1 gene, while only one out of three duplications encompasses the coding region of IRX1. 23,26 Interestingly, the shared duplicated region in MCDR3 contains putative CREs active during a definite period of retinal development, as well as a UCNE of which the function is yet unexplored. 9,26 Both the PRDM13 and IRX1 gene have been demonstrated to have a role during retinal development.
More specifically, PRDM13 encodes a transcription factor characterized by an N-terminal PR (PRDI-BF1 and RIZ1 homology) domain with histone methyltransferase activity, followed by four zinc-finger domains, responsible for protein-protein and protein-DNA binding. 37 In the retina, PRDM13 acts as a cell type specification factor downstream of PTF1A, playing an essential role in the development and diversification of specific subsets of amacrine cells. Based on their morphological, neurochemical, and physiological features, more than 30 subtypes of amacrine cells can be defined, each having a distinct set of functions. Most PRDM13-positive amacrine cells use GABA or glycine as a neurotransmitter and adapt visual sensitivity to changing contrast and spatiotemporal frequency. As a consequence, Prdm13 knockout in the mouse significantly reduces this amacrine cell subtype. 38 On the other hand, most vertebrate species possess six Iroquois (IRX) genes, located in two paralog clusters. These genes encode important developmental homeobox transcription factors, with key roles in regulating gene expression during pattern formation and nervous system development. 39,40 In mice, all six Irx genes are expressed in the inner layers of the neuroretina during early retinogenesis and Irx1 is required for the neurogenesis of the inner and outer nuclear layer of the retina. Also in zebrafish, irx1a knockdown negatively affects the proper differentiation of amacrine, bipolar, photoreceptor, and Müller glial cells. 41,42 In this study, we aimed to provide insight into the cis-regulatory mechanisms of NCMD by integrated multi-omics profiling of the human retina, followed by in vitro and in vivo enhancer assays of candidate cis-regulatory elements (cCREs) and NCMD-associated (likely) pathogenic SNVs. Additionally, we expanded the mutational spectrum of NCMD by the identification of novel (likely) pathogenic SNVs in the PRDM13 region.

Generation of an integrated retinal multi-omics database RegRet
In order to gain more insight into the regulatory landscapes of the two NCMD-associated disease loci, multiple publicly available tissue-specific datasets were collected and integrated into a genome-wide database. This database contains multiple chromatin immunoprecipitation sequencing (ChIP-seq) datasets of histone modifications and retinal transcription factors, 43 assay for transposase-accessible chromatin sequencing (ATAC-seq) data, 43,44 and RNA-seq data, 43,45 all generated in adult human retina.
Moreover, DNase-seq data of embryonic retina at five different stages, 46 and high-throughput chromosome conformation capture (Hi-C) data from human retinal organoids were included. 47 Following dissection, retinas were processed according to the protocol of Matelot et al. (2016). 48 Briefly, retinas were treated with a 12.5% collagenase solution and forced through a 40 µm cell strainer, followed by crosslinking using a 2% formaldehyde solution. The crosslinking reaction was quenched by adding a cold 1 M glycine solution on ice. Cell suspensions were then centrifuged and washed with PBS. Cell lysis was performed using a 50 mM Tris-HCl, 150 mM NaCl, 5 mM EDTA, 0.5% NP-40, 1% Triton X-100 lysis buffer containing a complete protease inhibitor cocktail. Next, the lysates were centrifuged to obtain a pellet of nuclei. After a washing step with PBS, the nuclei were aliquoted per 10 million, snap frozen as a pellet in liquid nitrogen, and stored at -80°C. The maximum processing time between enucleation and preservation of nuclei was 24 h.

RNA isolation, cDNA conversion, qPCR
After retina dissection and before continuing with nuclei isolation, a small fraction of each of the collected retinas was preserved in RNAlater (Qiagen). The tissues were thoroughly homogenized in the presence of TRIzol (Invitrogen). Next, total RNA was obtained through phenol-chloroform extraction.
A DNase treatment was performed on-column using the RNeasy mini kit (Qiagen). The iScript cDNA synthesis kit (Bio-Rad) was used for RNA to cDNA conversion and subsequent qPCR analysis was performed using the SYBR Green Master Mix (Bio-Rad) and primers for retina-specific genes and controls. Expression values were analyzed in qbase+ (Biogazelle). Primer sequences can be found in Table S1.

Generation of UMI-4C sequencing libraries
The dissected retinas were then subjected to unique molecular identifier chromosome conformation capture (UMI-4C) sequencing to identify the chromatin interactions within the two NCMD-associated disease loci. As a first step, libraries were generated according to the protocol of Schwartzman et al.
(2016). 49 For every library, the crosslinked DNA from 10 million nuclei was first digested overnight with 400 U DpnII (NEB) at 37°C. Next, samples were diluted one in two before adding 4,000 U T4 DNA ligase (NEB) for overnight incubation at 16°C. Digestion and ligation efficiency were evaluated via agarose gel electrophoresis. DNA was then de-crosslinked with proteinase K (BIOzymTC). After overnight incubation at 65°C, the resulting 3C templates were purified using AMPure XP beads (Agencourt) and 4 µg of each template was sheared on a Covaris M220 focused-ultrasonicator (Covaris) to achieve 300 bp DNA fragments. The obtained fragments were library prepped using NEBNext Ultra II DNA Library Prep Kit (NEB) for Illumina. DNA shearing and library prep efficiency were evaluated on a Fragment Analyzer (Agilent Technologies). Next, a ligation-mediated nested PCR was conducted with an upstream (US) forward primer hybridizing to the selected genomic region of interest (i.e. the viewpoint), a universal reverse primer, 100 ng of library as input, and the KAPA2G Robust HotStart ReadyMix (Roche). After purification using AMPure XP beads (Agencourt), the resulting PCR product was used for the second PCR step, using a downstream (DS) forward primer and the same universal reverse primer. Primer sequences for the different viewpoints can be found in Table S2. After purification, final library composition was evaluated on a Fragment Analyzer (Agilent Technologies).

UMI-4C sequencing and data-analysis
UMI-4C libraries were multiplexed in equimolar ratios and sequenced on the Illumina NovaSeq 6000 platform at 150 bp paired-end. Sequenced reads were demultiplexed based on their barcodes and their DS primer. UMI-4C data were then analyzed as described in Schwartzman et al. (2016) using the umi4cpackage in R (https://bitbucket.org/tanaylab/umi4cpackage). 49 Briefly, reads containing the primer sequence were first tested for a match with the pad sequence. Reads lacking this match were filtered out. After quality trimming, the restriction fragment ends, representing a potentially informative ligation partner, were mapped to the human genome using Bowtie2. 50 PCR duplicates were removed based on the UMI sequences, being the last ten bases of the 3′ end of the read pair.
Next, restriction fragment ends mapped to genomic coordinates within less than 1,500 bp of the viewpoint were considered as nondigested products and were excluded from the analysis. Contact intensity profiles around the viewpoint were constructed by extracting the number of ligations for all restriction fragment ends within a genomic window and pooling of this data from both samples. Finally, UMI-4C profiles were normalized and domainograms created.

Generation of in vitro reporter constructs
In order to evaluate the activity of cCREs and the effect of NCMD-associated genetic variants, in vitro luciferase assays were set up. Therefore, fourteen non-coding regions of interest were PCR amplified from human genomic DNA (Roche) using the Phusion High-Fidelity PCR Kit (NEB). Resulting products were cloned into the pGL4.23 luciferase reporter vector (Promega) by restriction-ligation cloning. The recombinant vectors were amplified in One Shot TOP10 Chemically Competent E. coli cells (Invitrogen) and purified using a QIAprep Spin Miniprep Kit (Qiagen). For the generation of the three reporter constructs containing duplicated regions of interest, a second round of restriction-ligation cloning was performed using different restriction enzymes. SNVs were inserted into the regions of interest using the Q5 Site-Directed Mutagenesis Kit (NEB) and mutation-specific primers were designed with the NEBaseChanger tool. The sequence of all inserts was confirmed by Sanger sequencing using the BigDye Terminator v3.1 kit (Life Technologies). All regions of interest, their respective primer sequences, and the mutagenesis primers can be found in Table S3 & S4.

Luciferase enhancer assays
The recombinant pGL4.23 luciferase reporter vectors were transfected in equimolar amounts into ARPE-19 cells (ATCC, CRL-2302™) using the Lipofectamine 3000 Reagent Protocol, according to the manufacturer's instructions. The vectors were co-transfected with the pRL-TK Renilla luciferase control reporter vector (Promega) for normalization purposes. After 48 h, cells were lysed and luciferase activity was detected using the Dual&Glo Luciferase Assay System (Promega) in a Glomax 96 Microplate Luminometer (Promega). Each transfection was done in triplicate and each experiment was repeated at least three times to ensure reproducibility. For each well, the ratio of firefly luciferase activity was normalized to Renilla luciferase activity. Next, these ratios were normalized to the average ratio of the corresponding control vector. For each SNV, the respective wild-type vector was used as a 10% fetal bovine serum, 1% penicillin-streptomycin, 1% non-essential amino acid solution, and 0.1% amphotericin B. Cells were cultured at 37°C and 5% CO2 and tested for mycoplasma contamination prior to use.

Generation of in vivo reporter constructs
To functionally assess the potential in vivo activities of five cCREs in the PRDM13 and IRX1 loci, enhancer detection assays using SED vectors in frog (Xenopus) animal models were set up (Table S5).
This Xenopus transgenesis compatible version of the fluorescent zebrafish enhancer detection (ZED) 51 vector utilizes a I-SceI based detection system for assessing enhancer activity in Xenopus embryos. 52 A detailed description of the vector can be found in Figure S1. To generate the in vivo reporter constructs, the regions of interest were PCR amplified from human genomic DNA (Roche) using Taq polymerase (Thermo Scientific) and cloned into the pCR-GW-TOPO vector (Invitrogen). Using LR Clonase II (Invitrogen), the DNA fragments were then subcloned into the destination SED vector, as previously described. 53

Functional characterization of cCREs using enhancer assays in Xenopus embryos
All experiments on Xenopus tropicalis and albino Xenopus laevis were executed in accordance with the guidelines and regulations of Ghent University, Faculty of Sciences, Belgium. Approval (EC2014-089 and EC2017-104) was obtained by the ethical committee of Ghent University, Faculty of Sciences. For both Xenopus tropicalis and albino Xenopus laevis, females and males were primed with 20 U and 10 U human chorionic gonadotropin (hCG, Pregnyl, Merck), respectively. Natural mating was set-up the next day for Xenopus tropicalis and 5 days after priming for albino Xenopus laevis, after boosting the females and males with 150 and 100 U hCG, respectively. All the injections took place at the day of mating. Therefore, the I-SceI meganuclease-mediated transgenesis protocol was followed, as previously described for Xenopus. 52 Briefly, a total of 100 ng of SED vector construct was digested with 5 U of I-SceI meganuclease (NEB) at 37°C for 10 minutes in a 10 μL reaction mixture. Fertilized Xenopus eggs were obtained and dejellied using established protocols. 54 Embryos were then injected either unilaterally in the two-cell stage, or in two of the dorsal blastomeres at the four-cell stage, using 1 nL of the reaction mixture. Next, embryos were raised overnight at room temperature and replaced at 25.5°C for the rest of their development. Embryos were screened for enhanced green fluorescent protein (EGFP) reporter expression at different developmental stages using fluorescent microscopy.

Targeted sequencing and qPCR
To expand the mutational spectrum of NCMD, targeted genetic testing was performed in a cohort of twenty-three unsolved and unrelated index cases with a clinical diagnosis of NCMD. This study was approved by the ethics committee for Ghent University Hospital (2018/1566, B670201938572). The known NCMD-associated SNVs were analyzed on genomic DNA by PCR amplification of the two mutational hotspots, respectively containing the variants referenced as V1-V3, V12 and the V10-V11 variants, followed by Sanger sequencing using the BigDye Terminator v3.1 kit (Life Technologies). For detection of the previously reported tandem duplications, primers specific for the duplication products were designed. Fragment analysis was performed on a Fragment Analyzer (Agilent Technologies) using the PROSize 2.0 software. All primer sequences can be found in Table S6. In the next step, copy number variant analysis was performed for a selection of regions implicated in the known tandem duplications.
In particular, the copy number of the coding regions of PRDM13 and IRX1, as well as the shared duplicated region downstream of IRX1 were evaluated by qPCR using the SYBR Green Master Mix (Bio-Rad). Copy number values were analyzed in qbase+ (Biogazelle). All primer sequences can be found in Table S6.

Whole genome sequencing (WGS)
Following targeted sequencing and copy number variant analysis, a subset of nine molecularly undiagnosed NCMD index cases was subjected to WGS. Briefly, samples were prepared according to the Illumina TruSeq DNA PCR-free library preparation guide. The library of family F1 was sequenced on the Illumina HiSeq X-Ten platform at 350 bp paired-end, while all other libraries were sequenced on the Illumina NovaSeq 6000 platform at 150 bp paired-end. Reads were mapped to the human genome (hg38) using Isaac aligner (iSAAC-04.18.11.09) 55 and Strelka (2.9.10) 56 was used to identify SNVs and short indels. Variants were annotated by SnpEff (v4.3t) 57 in combination with additional databases, including ESP6500, 58 ClinVar, 59 and dbNSFP3.5. 60 To identify SVs and large indels, the Manta (1.5.0) 61 software was utilized. At first, the two NCMD-associated disease loci were analyzed for candidate variants. Therefore, SNVs were filtered based on zygosity (heterozygous), Genome Aggregation Database (gnomAD v3.1.2) 62 population frequency (<0.005%), and genomic location (2 Mb and 4 Mb region around the PRDM13 and IRX1 gene, respectively corresponding to the TAD the gene is located in). SVs located in these genomic regions were evaluated using the Database of Genomic Variants (DGV, http://dgv.tcag.ca/dgv/app/home). Next, the coding regions of 290 known retinal disease genes (RetNet panel V5, https://sph.uth.edu/retnet/disease.htm) were assessed for (likely) pathogenic variants, both SNVs and SVs, that could explain a macular phenotype. Data analysis was performed using our in-house developed analysis platform Seqplorer, which incorporates gnomAD population frequencies, in silico missense predictions (REVEL, PolyPhen-2, SIFT, CADD, etc.), and splice predictions (MaxEntScan, SpliceRegion). Variants were confirmed by Sanger sequencing using the BigDye Terminator v3.1 kit (Life Technologies) and underwent segregation analysis where possible. All primer sequences can be found in Table S6.

Clinical evaluation
Ophthalmic examination of the cohort of twenty-three unsolved NCMD families included fundus photography, optical coherence tomography (OCT), and measurement of best-corrected visual acuity (BCVA). In family F1, color vision was tested by means of a Lanthony D-15 saturated and desaturated assay.

In silico assessment of the non-coding SNVs in the mutational hotspots
For the previously reported and novel pathogenic NCMD-associated SNVs in the two mutational hotspots upstream of PRDM13, potential effects of the nucleotide changes on TFBS motifs were analyzed using the TRANSFAC software. 63 Furthermore, an in silico assessment was performed using several functional prediction tools for human non-coding regulatory variants, including an integrated non-coding regulatory prediction score, regBase, 64 as well as several other tools (GenoCanyon, 65 ncER, 66 DVAR, 67 FIRE, 68 PAFA, 69 CDTS 70 ).

In silico assessment of the NCMD-associated duplication breakpoints
A bioinformatics analysis was performed for the breakpoints of previously reported pathogenic NCMDassociated tandem duplications in the PRDM13 and IRX1 loci, as previously described. 71,72 In particular, the degree of microhomology (multiple sequence alignment, ClustalW), the presence of repetitive elements (RepeatMasker track UCSC Genome Browser), and the presence of sequence motifs, based on 40 previously described motifs (Fuzznuc) were analyzed on the breakpoint regions of the tandem duplications. 73,74

Data mining in single-cell retinal dataset
Publicly available single-nucleus RNA-seq data of embryonic (53,59,74,78,113, and 132 days of development) and adult (25, 50, and 54 years old) human retinal cells, generated by Thomas et al.
(2022), were processed for evaluating PRDM13 and IRX1 expression at single-cell level. 75 Expression matrices derived from nine post-mortem donor neural retinal samples (GSE183684) were imported into R (v4.0.5) and processed using the Seurat single-cell analysis package (v4.0). 76 Pre-processing and quality control was conducted to remove outlier cells. Briefly, we only considered genes with counts in at least three cells and filtered out cells that had unique feature (gene) counts <200 or >9,000 and that expressed >5% mitochondrial counts. The data were then total-count normalized, logarithmized, filtered for highly variable features, and scaled to unit variance. We used the Harmony package to merge all the single-cell data. 77 After quality control, pre-processing, and merging of all the data, a total of 60,014 retinal cells were kept for subsequent dimensionality reduction, embedding, and clustering. Markers associated with major neural retina cell populations were used to assess IRX1 and PRDM13 expression at single-cell level.

Establishment of RegRet, a genome-wide multi-omics retinal database
Multiple publicly available retinal datasets from different sources were integrated in a genome-wide multi-omics database. In particular, the database contains ChIP-seq datasets of histone modification H3K27ac from adult human retina, macula, and retinal pigment epithelium (RPE), and H3K4me2 from adult human retinal tissue, both indicative of enhancer activity. 43 Additional ChIP-seq datasets of retinal (CRX, NRL, OTX2, MEF2D, RORB) and other (CTCF, CREB) transcription factors, generated in adult human retina were also included. 43 Next, we integrated available ATAC-seq data from both adult whole retina and macula, 43 as well as ATAC-seq data from adult retina and RPE, where a distinction was made between macular and peripheral regions. 44 In the context of retinal transcriptomics, quantitative RNA-seq data and newly identified transcripts observed in adult human retina were added, 45 next to single-nucleus RNA-seq data generated in adult human retina, macula, and RPE. 43 With regard to retina-specific chromatin interactions, Hi-C data from human retinal organoids was added. 47 To provide insight into retinal development, DNase-seq data from embryonic retinal tissue at five stages (74,85,89,103, and 125 days) were included. 46 Apart from the retina-specific datasets,

UMI-4C profiling on human adult retina reveals cCREs in the PRDM13 and IRX1 loci
To identify the non-coding regions in the two known disease loci that interact with the PRDM13 and IRX1 promoters, we performed chromosome conformation capture, in particular UMI-4C, on retinas of adult human donor eyes. After dissection of the eyes, the purity of the retinal tissue was first demonstrated by expression analysis of several retina-and RPE-specific genes ( Figure S2). Next, UMI-4C was performed on the cross-linked retinal nuclei, using the PRDM13 and IRX1 promoter regions as bait sequences. The numbers of raw, mapped, and unique reads for each viewpoint can be found in Table S7. For both genes, UMI-4C profiles indicated that promoter interactions were spread across but limited to their respective TAD ( Figure 1). These profiles were then integrated into the retina-specific multi-omics RegRet database, which enabled the identification of cCREs in the two NCMD-associated disease loci. In particular, cCREs were defined as non-coding regions interacting with the PRDM13 or IRX1 gene promoter while demonstrating overlap with peaks of epigenomics datasets or containing a UCNE. Although, based on our UMI-4C experiments, the first mutational hotspot upstream of PRDM13 and the shared duplicated region downstream of IRX1 do not exhibit interactions with the PRDM13 or IRX1 gene promoter in the adult retina, these regions were nevertheless considered as cCREs in the course of this study (PRDM13_cCRE3, IRX1_cCRE10). Reason for this is that both regions are important for NCMD disease pathogenesis, while overlapping with DNase I hypersensitive sites during retinal development and with ChIP-seq peaks of retinal transcription factors or a UCNE, respectively. This brings the total number of identified cCREs to eight and ten for the PRDM13 and IRX1 locus, respectively (Table 1 & Figure S3).
One particularly interesting cCRE was identified upstream of the PRDM13 promoter (PRDM13_cCRE1).
There, the UMI-4C data demonstrated a distinct interaction with the PRDM13 promoter, in addition to an overlap with DNase-seq and ATAC-seq peaks, as well as with ChIP-seq profiles of specific histone marks indicative of enhancer activity (H3K27ac, H3K4me2) and ChIP-seq profiles of retinal transcription factors (CRX, NRL, OTX2) ( Figure 2). To confirm this interaction, a reverse UMI-4C experiment was conducted using this cCRE as a bait sequence. The resulting UMI-4C profile confirmed the interaction between the CRE and the PRDM13 promoter, altogether making this region a very likely enhancer of PRDM13 ( Figure 2).
A second reverse UMI-4C experiment was performed for the shared duplicated region located in a gene desert ~800 kb downstream of the IRX1 gene (IRX1_cCRE10). Although we were not able to identify an interaction using the IRX1 promoter as a viewpoint, the reverse UMI-4C experiment with IRX1_cCRE10 as a viewpoint suggested an interaction between the two regions ( Figure S4).

In vitro enhancer assays show a regulatory effect for cCREs in the PRDM13 and IRX1 loci
From the eighteen identified cCREs, a set of fourteen cCREs was selected to determine their in vitro enhancer activity. These regions were cloned into separate luciferase reporter vectors (pGL4.23) (Table S3) Table S8). Remarkably, five regions, some of which had strong in silico predictions, did not show a significant difference in luciferase reporter levels compared to the negative control vector (P>0.05).

PRDM13 and IRX1 loci
In order to address the activity of cCREs in vivo, the five most prominent cCREs were cloned into the SED vector, upstream of a minimal gata2 promoter element and an EGFP reporter gene. In particular, the cCRE with the highest in vitro activity from the IRX1 locus (IRX1_cCRE7) and the shared duplicated region (IRX1_cCRE10), as well as the cCRE with the strongest epigenomic signatures from the PRDM13 locus (PRDM13_cCRE1) and the two mutational hotspots (PRDM13_cCRE3, PRDM13_cCRE5) were & 4D). In these transgenic tadpoles, DsRed expression, which in the SED vector is driven by the Xenopus cardiac actin promoter, was visible in muscles, thereby demonstrating successful transgene integration ( Figure 4E & 4F). When the tadpoles were grown and screened for EGFP reporter expression before metamorphosis at NF stage 55, 13 out of 30 transgenic embryos were found to express EGFP in the eye of the injected side, which was visible through the lens of the dark-pigmented eye ( Figure 4G).
Meanwhile, no fluorescent signal was observed in the eye on the non-injected side of these tadpoles ( Figure 4H). The findings from the other Xenopus tropicalis transgenesis experiments are summarized in Table S9.
As EGFP reporter expression driven by the SED vector reporter constructs becomes obscured by the pigmentation of the eyes in the tadpole stages, albino Xenopus laevis animals were used to obtain embryos with non-or poorly-pigmented eyes. Transgenic albino embryos that were unilaterally  Table S9.

Novel non-coding SNVs found in the mutational hotspots of PRDM13
In order to expand the mutational spectrum of NCMD, twenty-three unrelated index cases with a clinical diagnosis of NCMD underwent targeted sequencing and copy number profiling of the known hotspots of the PRDM13 and IRX1 regions, followed by WGS for a subset of nine undiagnosed cases after targeted testing. In the PRDM13 locus, this revealed two novel non-coding SNVs, called V15 and V16, as well as a previously reported NCMD-associated SNV (V1). Copy number analysis via qPCR did not reveal any additional variants in the two disease loci. A summary of the genetic findings can be found below and in Table S10. An overview of the pedigrees and clinical details of the families with (likely) pathogenic variants can be found in Figures S5 (family F1), S6 (family F2), and S7 (family F3), and in Table S10.

Variable in silico predictions of NCMD-associated SNVs
Using TRANSFAC, the effect of the eight previously reported and novel NCMD-associated non-coding SNVs on consensus TFBS was analyzed. This revealed that the novel SNV V15 is predicted to lead to the gain of an HSF1 TFBS, when compared to the wild-type sequence; while in case of the previously reported V10 variant, the A to C nucleotide change results in a predicted loss of an POU2F1 (OCT1) TFBS ( Figure S8). For the six other known or novel SNVs, no effect on consensus TFBS was observed.
The regBase_REG model, predicting the regulatory potential of a variant regardless of its functional direction and pathogenicity, generated relatively high scores for all variants in both hotspots. The regBase_PAT model on the other hand, which predicts the pathogenic capacity of a variant, scores the variants in first mutational hotspot (PRDM13_cCRE3) as more likely to be pathogenic, compared to those located in the second mutational hotspot (PRDM13_cCRE5). According to GenoCanyon, which uses a cut-off of 0.5 to define functionality, the five variants located in first mutational hotspot are all functional, having a score of 1.0. This is in contrast with the three variants in the second mutational hotspot, which have low GenoCanyon scores. A similar result is obtained from the ncER prediction tool, which generates a score ranging from 0 (non-essential) to 100 (putative essential). There, the variants in the two hotspots have a score of ~70-80 and ~10-30, respectively. The DVAR scores, reflecting the probability of a variant being functional, are ~0.9 for variants in first and ~0.7 for variants in the second mutational hotspot. The PHRED-scaled FIRE scores of all variants were comparable with those from DVAR, with higher scores suggesting greater capability to regulate nearby gene expression levels.
Based on the PHRED-scaled PAFA scores, only variants V1 and V16, located in the first mutational hotspot (PRDM13_cCRE3), are predicted to have a functional effect. Finally, the CDTS scores demonstrate moderate intolerance to variation for all variants (Table 2).

PRDM13 and IRX1 region
Next, the potential effect of the eight non-coding NCMD-associated SNVs on downstream reporter activity was assessed. Therefore, the two mutational hotspot-containing luciferase reporter vectors were subjected to site-directed mutagenesis to create eight individual variant vectors (Table S4). The luciferase assays revealed that the four known variants (V1, V2, V3, V12) in the first mutational hotspot (PRDM13_cCRE3) respectively resulted in a 2.6-, 2.8-, 1.6-, and 2.0-fold increase of reporter expression (P<0.001), relative to their wild-type vector. Similar effects were obtained for the novel variant (V16) in the same region. There, the relative luminescence was 3.2-fold higher (P<0.001) compared to the wild-type vector ( Figure 6A & Table S8). For the variants located in second mutational hotspot (PRDM13_cCRE5), an opposite trend was observed. The two known (V10, V11) and one novel variant (V15) in this region respectively demonstrated a 0.6-, 0.7-, and 0.6-fold decrease of luciferase expression (P<0.001), relative to their wild-type vector ( Figure 6B & Table S8).
To evaluate whether the cCREs located in previously reported NCMD-associated tandem duplications could have altered effects on reporter gene expression upon duplication, we analyzed luciferase activity of three regions located in the shared duplicated region of either the PRDM13 or IRX1 locus (PRDM13_cCRE1, PRDM13_cCRE3, IRX1_cCRE10). Each of these regions was cloned into a separate luciferase reporter vector (pGL4.23) as a tandem duplication, to be compared against its reciprocal vector containing the same region as single insert. The results from the luciferase assays show that in case of all three cCREs, a 0.5-fold decrease in expression levels of the luciferase reporter (P<0.001) is exhibited by the duplicated region in comparison with its single region counterpart ( Figure 6C & Table S8).

Tandem duplications in the PRDM13 and IRX1 loci are caused by nonhomologous endjoining or replicative-based mechanisms
To assess the underlying mechanisms that gave rise to the eight previously reported NCMD-associated tandem duplications in the PRDM13 and IRX1 loci, bioinformatic analyses were performed on the breakpoint regions of these duplications. We found that three out of eight tandem duplication showed overlap with repetitive elements at both breakpoints. These pairs of elements, however, had little mutual sequence identity, since they were part of different classes. Apart from repetitive elements, other sequence motifs have also been shown to predispose to DNA breakage. Therefore, we investigated the 70 bp sequence surrounding the exact breakpoints of the eight tandem duplications for the presence of 40 sequence motifs previously associated with DNA breakage. Interestingly, one particular motif (deletion hotspot consensus, TGRRKM) was present in at least one breakpoint region of seven out of eight tandem duplications, while for four of these, this motif was present in both breakpoint regions. Moreover, the inclusion of several random nucleotides at the joint point (i.e. information scar) was observed at three tandem duplications, whereas a microhomology of two or three bps was present at six tandem duplications. Based on these results, we conclude that the tandem duplications are caused either by nonhomologous end-joining (NHEJ) or by a replicative-based mechanism, such as fork stalling and template switching (FoSTeS) or microhomology-mediated breakinduced replication (MMBIR). In contrast to replicative-based mechanisms, NHEJ does not require microhomology, but is generally associated with the presence of an information scar (V4, V5, V6).
Microhomology in the absence of an information scar in all other tandem duplications favors the hypothesis of a replicative-based mechanism. The sites of microhomology serve as priming location to invade the second replication fork after stalling or collapse of the first replication fork. For FoSTeS and MMBIR, a single DNA break and the presence of microhomology at the breakpoints are sufficient to cause this template switching in a backward direction, which could have given rise to five out of eight tandem duplications (V7, V8, V9, V13, V14). A summary of these findings can be found in Table S11, while the presence of microhomology is visualized in Figure S9.

During retinal development PRDM13 is predominantly expressed in amacrine cells and IRX1 in retinal ganglion cells
Using the snRNA-seq data from six embryonic (53,59,74,78,113, and 132 days) and three adult (25, 50, and 54 years old) human retinal samples generated by Thomas et al. (2022) 75 Figure S10B). For both genes, the highest expression is observed at e70.

Discussion
In the last decade, it has become evident that non-coding variants contribute significantly to the molecular pathogenesis of IRDs. Although an increasing number of pathogenic variants affecting cisacting splicing have been identified, 19-21 causal variants located in non-coding cis-acting regulatory elements are more scarce. [82][83][84] The identification and interpretation of the latter, however, remains challenging as the exact mechanisms of the elements they are located in, the genes they act on, and the effect of the variation remains largely unknown. Here, we have focused on NCMD, a developmental macular disorder caused by non-coding SNVs and tandem duplications assumed to affect PRDM13 (MCDR1 locus) and IRX1 (MCDR3 locus) expression.
The establishment of RegRet, a genome-wide multi-omics retinal database, advances genome annotation for the human retina. It not only laid the foundation for the locus-specific research of the two genomic regions implicated in NCMD, but also represents a universal framework for studies of other loci implicated in rare IRDs as well as complex retinal diseases.

By integrating chromosome conformation capture (UMI-4C) profiles for the PRDM13 and IRX1
promoter, generated on adult human retinal tissue, with multiple genome-wide retinal datasets on DNA accessibility, epigenetic histone marks, transcription factor binding, and gene expression, we were able to both characterize candidate cis-regulatory elements (cCREs) and determine which of these physically interact with the two promoters of interest, revealing eight and ten cCRE for the PRDM13 and IRX1 gene, respectively. Although in vitro luciferase assays did not demonstrate regulatory activity for all of these regions, in vivo experiments in Xenopus embryos could confirm eyeand brain-specific activity for two cCREs in both the PRDM13 (PRDM13_cCRE1, PRDM13_cCRE5) and the IRX1 locus (IRX1_cCRE7, IRX1_cCRE10). Moreover, the interaction of PRDM13_cCRE1 with the  85 Interestingly, 70% of the macula-specific CREs that were identified in this study correspond with the eight candidate CREs that we identified in the PRDM13 locus ( Figure S11). It should also be noted that the quality of UMI-4C profiles, reflected by the number of UMIs, is dependent on both the genomic region the viewpoint is located in and the efficiency of the nested PCR reactions. The same holds true for the reverse UMI-4C experiment on adult retinal tissue using the shared duplicated region, located ~800kb downstream from the IRX1 promoter, as a bait. The obtained profile hinted at an interaction between this region and the IRX1 promoter, while the promoter-based UMI-4C experiment did not. Although IRDs in general have a diagnostic success rate of approximately 60%, causative variants were detected in only 13% of the NCMD cohort tested here, which could be explained by different factors.
First of all, most IRD studies commonly rely on exome sequencing to identify coding variants with a high impact. Here, the search was oriented toward non-coding pathogenic variants, which is complicated by incomplete knowledge of the location and function of all regulatory elements.
Furthermore, (non-coding) variants affecting other known or novel disease genes involved in retinal and/or macular development, may also be causative for NCMD. Next, NCMD has great phenotypic variability and its clinical presentation may be mimicked by phenocopies. Examples thereof are toxoplasmosis or subtypes of inherited macular disease (e.g. BEST1-, PRPH2-associated maculopathy). 36, 89 Finally, more complex structural variants in the PRDM13 or IRX1 locus, affecting the regulation of target gene expression, may be missed due to technical limitations of the qPCR-based screening and short-read WGS.
When comparing both the novel and previously reported SNVs against their wild-type sequences using luciferase assays in ARPE-19 cells, we showed that all SNVs exert an effect on reporter gene expression.
In case of the first hotspot (PRDM13_cCRE3), the five variants had an increased activity, with the novel to 80), corresponding to the moment when amacrine cells start to emerge and begin to synapse with retinal ganglion cells. 93 We therefore hypothesize that altered PRDM13 or IRX1 expression impairs macula-specific synaptic interactions between amacrine and ganglion cells during retinogenesis. In future studies, transcriptome and chromatin interaction studies on patient-derived retinal organoids may be useful to evaluate the effect of SNVs and tandem duplications on gene expression, genome organization, and cellular differentiation and function in vivo.
In conclusion, we have provided an integrated retinal multi-omics database, which advances the annotation of the genome in human retina and represents a universal framework for the investigation of disease-associated loci implicated in rare as well as complex retinal diseases. By integrated multiomics profiling of human retina, and by an in-depth in silico, in vitro and in vivo assessment of cCREs and variants therein, we have gained insight into the cis-regulatory mechanisms and genetic architecture of NCMD. Overall, this supports the hypothesis that NCMD is a retinal enhanceropathy, which is unique in the wider group of IRDs. With a broader implementation of WGS in rare disease research, an emphasis shift to non-coding regions such as CREs as targets of mutations can be expected. Finally, our study is exemplar for how expanding knowledge of disease-causing mechanisms and phenotype-driven genomic and functional data profiling of non-coding regions, advance our ability to fully interpret variants in non-coding regions in rare diseases. Providing a definitive genetic diagnosis in more individuals with rare diseases is imperative for the design of efficient disease-specific genetic testing, genetic counseling, and ultimately for therapeutic decisions. Figure 1 -Integration of the generated UMI-4C data with publicly available Hi-C data. The UMI-4C interaction frequency profiles and domainograms (bottom) for the PRDM13 promoter (left) and IRX1 promoter (right) viewpoints were integrated with Hi-C data from control human retinal organoids (top), demonstrating promoter interactions within the respective TADs. 47 Topologically associated domains (TADs) are indicated by blue triangles. Chromosome coordinates are in hg38 annotation. PRDM13 and IRX1 loci. The cCREs are obtained after integration of the UMI-4C profiles in the retina-specific multi-omics database ( Figure S3). For each cCRE, it is indicated which characteristics contributed to their identification. The two right columns indicate whether cCRE activity was assessed using in vitro luciferase assays or in vivo enhancer detection assays, as well as the outcome of these experiments. cCRE: candidate CRE, DHS: DNase I hypersensitive site, ChIP-seq: chromatin immunoprecipitation sequencing, , UCNE: ultraconserved non-coding element.      frequency, and TRANSFAC output are given, as well as the prediction scores and the PHRED-scaled scores (-10 log 10 ) of seven in silico tools. The regBase_REG and regBase_PAT scores range from 0 to 1, with higher values indicating that a variant has a higher regulatory or pathogenic potential, respectively. GenoCanyon scores range from 0 to 1. A cutoff of 0.5 was used to define functionality. The ncER score ranges from 0 to 100, with higher scores suggesting that a region is more vital in terms of regulation. DVAR scores, ranging from 0 to 1, can be interpreted as the probability of a variant being functional. The FIRE score also ranges from 0 to 1, with higher scores providing stronger evidence that a variant regulates nearby gene expression levels. PAFA scores range from -1 to 1, reflecting the association of a variant with complex diseases or trait-associated SNPs. The CDTS score represents the absolute difference of the observed variation from the expected variation, with lower scores suggesting more intolerance to variation. The normalized, PHRED-scaled scores are color-coded for easy comparison (green = high, red = low).

Data availability
All unique data and materials that support the findings of this study are readily available from the corresponding author upon request.

Supplementary data
Gene name F primer R primer Reference genes

Variant F primer R primer
Known SNVs Copy number analysis   with the IRX1 promoter, as indicated by the diagonal arrow. However, in the UMI-4C experiment using the IRX1 promoter as viewpoint, the interaction is less evident. The three tandem duplications are displayed as blue bars. The UCNE located in this region is indicated by the horizontal arrow. UCNE: ultra-conserved non-coding element.