Expression and protein sequence analyses of zebrafish impg2a and impg2b, two proteoglycans of the interphotoreceptor matrix

Photoreceptor outer segments projecting from the surface of the neural retina toward the retinal pigment epithelium (RPE) are surrounded by a carbohydrate-rich matrix, the interphotoreceptor matrix (IPM) [1,2]. This extracellular compartment is necessary for physiological retinal function. However, specific roles for the molecules characterizing the IPM have not been clearly defined [3]. Recent studies have identified nonsense mutations in the interphotoreceptor matrix proteoglycan 2 (IMPG2) gene in patients affected by autosomal recessive retinitis pigmentosa (arRP) [4,5] and autosomal dominant and recessive vitelliform macular dystrophy (VMD) [6,7]. The gene encodes a proteoglycan synthesized by photoreceptors and secreted into the IPM. However, little is known about the function and structure of this protein. We used the teleost zebrafish (Danio rerio) as a model to study IMPG2 expression both during development and in adulthood, because its retina is very similar to that of humans [8]. In zebrafish, there are two IMPG2 proteins, IMPG2a and IMPG2b. We generated a phylogenetic tree based on IMPG2 protein sequence similarity among different vertebrate species, showing a significant similarity despite the evolutionary distance between humans and teleosts. In fact, human IMPG2 and D. rerio IMPG2a and IMPG2b share conserved SEA and EGF-like domains. Homology models of these domains were obtained using the I-TASSER server. Finally, expression analyses of impg2a and impg2b during development and in the adult fish showed expression of both mRNAs in the outer nuclear layer of the zebrafish retina, starting from 3 days post fertilization (dpf) and continuing throughout adulthood. These data lay the groundwork for the generation of novel and much-needed animal models for the study of IMPG2-related inherited retinal dystrophies.


The IPM is the extracellular matrix, mainly composed of proteoglycans and glycosaminoglycans, surrounding retinal photoreceptor outer segments and ellipsoids [9]. The role of the IPM in retinal function has only recently begun to be investigated, as has its involvement in retinal disorders [10]. In recent years, new roles for the IPM have been identified, which include intercellular communication, membrane and matrix turnover, [...] duplication. For this reason, many genes are found in two copies, named paralogues [32].

IMPG2 is present as impg2a and impg2b. We obtained a phylogenetic tree of IMPG2 in different vertebrate species to investigate the extent of protein conservation during evolution. Moreover, since IMPG2 protein structure has largely been unstudied, we performed [...]

Human IMPG2 is a 1241-residue protein with four topologically distinct regions: a signal peptide of 22 amino acids at the N-terminus, an extracellular topological domain (residues 23 to 1099), a helical transmembrane domain (residues 1100 to 1120) and a cytoplasmic topological domain (residues 1121 to 1241). It also contains two SEA domains and two EGF-like tandem repeats, together with 5 hyaluronan-binding motifs. The protein is also a target for post-translational modifications, such as glycosylation and phosphorylation, at different sites (UniProt database).

We first investigated IMPG2 conservation during evolution by aligning the protein sequences of the chosen species and subsequently generating a phylogenetic tree. IMPG2 protein sequences of different species were retrieved from the NCBI database, which indicated the presence of IMPG2 only in jawed vertebrates. We selected some of the most common species of different vertebrate groups to include in our analysis. We then used the Clustal Omega sequence alignment program to perform a multiple sequence alignment and generate a phylogenetic tree (Fig 1a), which reflects the distance in terms of sequence alignment between the different vertebrate species. The length of the branches is directly correlated with the difference between the sequences. For example, we observed that Danio rerio IMPG2a and IMPG2b and Homo sapiens IMPG2 protein sequences cluster separately.

Such a sequence difference reflects the evolutionary distance between the two groups. Moreover, as reported in Section 1, the genome of teleost fishes underwent duplication [32], explaining the presence of two paralogues in Danio rerio, which cluster together in the phylogenetic tree (Fig 1a). Interestingly, the other teleost fishes included in our analysis (Nothobranchius furzeri and Oryzias latipes) do not have paralogues. One explanation could be that the genomes of these two species underwent duplication, but the second copy of the gene lost its function during evolution and was no longer subjected to selective pressure.

To understand in more detail the sequence conservation between the two zebrafish proteins and human IMPG2, we used the UniProt database and found several domains (SEA, EGF-like and transmembrane) that are conserved in all three proteins (Fig 1b). Using the BLAST Alignment Tool, we found that both fish proteins share 65% identity with the region of the human protein where the conserved domains are located (residues 879-1238).
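As an illustration of how a percent-identity figure such as the one above is derived from a pairwise alignment, the following minimal Python sketch counts identical columns in two already-aligned sequences. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the IMPG2 alignments. The denominator is assumed to be the total number of alignment columns (gaps included); other tools may use a different convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: identity = identical non-gap columns / total alignment
    columns (gaps included in the denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap  # a gap paired with a gap is not a match
    )
    return 100.0 * matches / len(aligned_a)


# Toy aligned fragments (hypothetical, for illustration only).
row_a = "MKT-LLA"
row_b = "MKTALIA"
identity = percent_identity(row_a, row_b)  # 5 identical columns out of 7
```

Because the value depends on which region is aligned and on how gaps are counted, an identity percentage is best reported together with the residue range it refers to, as done in the text.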

These conserved domains were then analysed in more depth by homology modelling, as described in the subsequent section.

[...] organs of adult fish. Results revealed very low expression of impg2b at 2.5 dpf and low impg2a mRNA levels at 3 dpf. However, impg2b and impg2a mRNAs start being significantly expressed at 3 dpf and 4 dpf, respectively (p<0.001, Tukey's test following one-way ANOVA). In the analysis we compared the expression of impg2a and impg2b with that of rhodopsin, a strongly expressed photoreceptor-specific gene (Fig 3a). In the adult fish, RT- [...] with the antibodies reported in S2 Table.

Design of RNA probes

The genome browser Ensembl was used to find the exon sequences of the genes of interest.

NCBI Primer-Blast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) was used to design the primers, following two criteria: ideal length between 20 and 24 bp and GC content between 42% and 52%. For the amplicon, a length between 400 and 1200 bp was chosen, with an ideal value of 600 bp. Finally, the T7 polymerase promoter sequence (GCGTAATACGACTCACTATAGGG) was added to the 5' end of the designed primers (S3 Table).
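The primer criteria described above can be expressed as a short Python check. This is only a sketch under stated assumptions, not the procedure used in the study (primers were designed with NCBI Primer-Blast). The function names and the example primer are hypothetical; only the numeric thresholds and the T7 promoter sequence come from the text.

```python
# T7 polymerase promoter sequence as quoted in the text.
T7_PROMOTER = "GCGTAATACGACTCACTATAGGG"


def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def primer_ok(primer: str) -> bool:
    """Check the two primer criteria: length 20-24 and GC content 42-52%."""
    return 20 <= len(primer) <= 24 and 42.0 <= gc_content(primer) <= 52.0


def amplicon_ok(length_bp: int) -> bool:
    """Check that the amplicon length lies between 400 and 1200 bp."""
    return 400 <= length_bp <= 1200


def add_t7(primer: str) -> str:
    """Prepend the T7 promoter to the 5' end of a primer (written 5'->3')."""
    return T7_PROMOTER + primer.upper()


# Hypothetical primer, for illustration only (not from S3 Table).
example_primer = "ATGCGTACCTGAAGCTTGCA"
if primer_ok(example_primer) and amplicon_ok(600):
    tagged_primer = add_t7(example_primer)
```

The checks are applied to the gene-specific primer before the promoter is prepended, since the T7 tail would otherwise change both the length and the GC content being evaluated.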