Differential Expression and Interaction network construction of Noncoding RNA and mRNA in Heart Tissue associated with Tetralogy of Fallot

Tetralogy of Fallot (TOF) is still the most common and complicated cyanotic congenital heart defect of all congenital heart diseases with a 10% incidence. Surgery repair is often necessary in infancy. The etiology of TOF is complex and genetic and epigenetic mechanisms such as chromosomal abnormalities, gene mutations, nucleic acid modifications, non-coding RNA, and circular RNA(circRNA) play an important role in its occurrence. RNA not only plays an auxiliary role of genetic information carrier, but also plays a more important role in various regulatory functions. There are few studies on the action mechanism of non-coding RNA. Aim to gain more in-depth knowledge of TOF, we collected tissue samples of the right ventricular outflow tract of 5 TOF children with no other intracardiac and extracardiac malformations and 5 normal fetuses. We systematically analyzed the specific long non-coding RNA (lncRNA), microRNA(miRNA), circRNA and messenger RNA(mRNA) profiles of TOF. To our knowledge, there are no reports of genome-wide study of transcriptome in TOF and we first obtained meaningful differentially expressed lncRNAs, miRNAs, circle RNA and mRNAs.

The total incidence of neonatal defects in China is 5.6%, and 0.9 million neonates are born with 37 defects every year. The incidence of neonatal defects is increasing annually. The heart is the first 38 major internal organ in the process of embryogenesis and is important for the survival of an embryo. 39 Congenital heart disease (CHD) has ranked first for five consecutive years in neonates. The overall 40 incidence of CHD in the population is 1‰-13‰. In China, the incidence of CHD is 4‰-8‰. 41 Children with severe CHD die in childhood, thereby causing psychological trauma and economic 42 burden to families and society. Tetralogy of Fallot (TOF) remains the most common complex 43 cyanotic congenital heart defect among all congenital heart diseases given its 10% incidence 44 (BEDARD et al. 2009). Surgery repair is often necessary in infancy, and the perioperative mortality is 45 less than 5%. However, long-term follow-up has shown that patients who have undergone total repair 46 of TOF are often at high risk for restenosis of the right ventricular outflow tract (RVOT), pulmonary 47 regurgitation, and ventricular dysfunction, and late sudden cardiac death may occur in 0.5% to 6% of  However, ncRNAs may play a significant regulatory part in heart formation. Little is known 99 regarding the changes in the transcriptome and the possible associations with TOF. Thus, we 100 systematically analyzed the specific lncRNA, miRNA, circRNA, and mRNA profiles of TOF to gain 101 more in-depth knowledge. As we know, no genome-wide study has concentrated on the 102 transcriptome in TOF. This is the first work to obtain meaningful differentially expressed lncRNAs, 103 miRNAs, circle RNA, and mRNAs. 104 105

Materials and methods 106
Subjects 107 The case group included five children with TOF (two boys and three girls, aged 6-22 months). 108 Cyanosis was found in all children, and rough systolic murmur was found between the second and 109 fourth intercostals at the left border of the sternum by auscultation. Echocardiography showed 110 ventricular septal defect, aortic straddle, RVOT obstruction, pulmonary artery stenosis, right 111 ventricular enlargement, and right ventricular anterior wall thickening. The children and their parents 112 had no other intracardiac and extracardiac malformations. In the control group, five fetuses, including 113 three males and two females with gestational age of 25-28 weeks, had normal hearts. Given fetal 114 distress, cord twist, and placental abruption, odinopoeia was conducted. 115

Extraction of RNA 116
The Total RNA including small RNA was collected from 10-20 mg of frozen tissue of the right 117 ventricle by using a TRIzol reagent (Invitrogen, Gaithersburg, MD, USA), which was purified with a 118 mirVana miRNA Isolation Kit (Ambion, Austin, TX, USA) according to the instructions of the 119 manufacturer. RNA's purity and concentration were defined by OD260/280 by using a 120 spectrophotometer (NanoDrop ND-1000). RNA integrity was defined by 1% formaldehyde 121 denaturing gel electrophoresis. RNA integrity was defined by capillary electrophoresis with an RNA 122 6000 Nano Lab-on-a-Chip kit and Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). 123 Only if the integrity number values of RNA were more than 6, it was analyzed later. 124

Microarray 125
The specific lncRNA, mRNA, miRNA, and circle RNA profiles of TOF were analyzed by using 126 Agilent InvDB, LncRNAs-a (Enhancer-like), Antisense ncRNA pipeline, etc.. Repeated probes were used to 134 detect each RNA for two times. The control probes were contained in the array. 135 The Agilent miRNA array (8 × 60K) was used to detect miRNA profiling, which included 2549 136 probes of human mature miRNAs and the miRNA target sequences were from the miRBase R21.0. 137 Repeated probes were used to detect each miRNA for times. The control probes were contained in 138 the array. 139 The Agilent human circRNA Array V2.0 (4 × 180K) was used to detect circRNA profiling, 140 which included approximately 170,340 probes of human circRNAs and the circRNA target 141 sequences were from circBase and DeepBase. Long and short probes were used to detect each 142 circRNA at the same time. The control probes were contained in the array. 143

RNA amplification, labeling, and hybridization 144
Agilent human lncRNA+mRNA Array v4.0 145 Cy5 and Cy3-dCTP was used to label the cDNA) by using the Eberwine's linear RNA amplification 146 method and subsequent enzymatic reaction (PATTERSON et al. 2006 The experiments were carried out in accordance with the protocol of the manufacturer. In short, the 170 miRNA labeling reagent was used to label the miRNAs. pCp-Cy3 was used to dephosphorylate and 171 ligate 200 ng total RNA. Next, we purify and hybridize the labeled RNA. Last, the Agilent 172 microarray scanner was used to scan the images and Agilent feature extraction software version 173 10.10 was used to grid and analyze them. 174 Agilent human circRNA Array v2.0 175 Total RNAs were first digested by using Ribonuclease R to ensure the specific and efficient labeling 176 of cirRNA. Then, Cy3-dCTP was used to label cDNA in accordance with Eberwine's linear RNA 177 amplification method. The succeeding enzymatic reaction was as described above. The next steps of 178 RNA amplification, labeling, and hybridization are described in the "Agilent human lncRNA+mRNA 179 Array v4.0" section. 180

Microarray imaging and sample analysis 181
The Agilent GeneSpring software V13.0 was used to summary, normalize and control in quality the 182 data of the lncRNA+mRNA array, miRNA array, and circRNA array. 183 The genes which were differentially expressed were selected by using the Threshold values of ≥2 184 and ≤−2-fold change and a corrected P value of Benjamini-Hochberg, less than 0.05, which had 185 significant differences. Log2 was used to process the data and the CLUSTER 3.0 software was used 186 to adjust the data. The hierarchical clustering was used to analyze the data and cluster heat maps were 187 generated. At last, the Java TreeView (Stanford University School of Medicine, Stanford, CA, USA) 188 was used to conduct the tree visualization. On the basis of the normalized data, the Pearson 189 correlation coefficient matrix of the correlation between samples was made. The box plot was drawn, 190 showing the gene expression of different samples before and after data normalization. Principal 191 component analysis (PCA) was used to reflect the similarity of samples, and the expression of the 192 samples was shown in the 3D space through the dimension reduction of data. 193

Comparative analysis of case and control groups 194
T-test was used to analyze the difference between groups. The default screening criteria for 195 significant difference in this study were as follows: when the gene was upregulated, the number of 196 detected samples required to account for more than 60% of the total number of this group. When the 197 gene was downregulated, the requirement was the same. The difference multiple FC(ABS)≥2.0 and 198 P≤0.05 were also required. The larger the difference between the multiple of FC (ABS), the greater 199 the difference between the two groups. The lower the P value, the higher the reliability of differential 200 genes. Cluster analysis was performed, and a scatter map and volcano map were drawn to illustrate 201 the differences between groups. 202

Construction of the coding-noncoding gene co-expression (CNC) network 203
Construction of the lncRNA-mRNA network 204 The correlation analysis between the lncRNA and mRNA differentially expressed was used for 205 constructing the network of CNC. For construction of the network, , Pearson correlation was 206 calculated, and the critical correlation lncRNA and mRNA pairs were chosen. When Pearson 207 correlation coefficients, which are not less than 0.99, the lncRNAs and mRNAs pairswere selected to 208 draw the network by using software Cytoscape. In the network construction analysis, a degree was 209 the simplest and most important measure of a gene centrality which was defined as the number of 210 connections between one node to the other(BARABASI AND OLTVAI 2004). 211 The lncRNA prediction for Cis-acting and trans-acting was performed in this study. The histogram drawn on the basis of the P value could directly reflect significantly enriched terms. A 236 bubble chart was also provided to show the enrichment analysis results. Finally, candidate signal 237 pathways related to TOF were used to further determine candidate lncRNAs, mRNAs, miRNAs, and 238 circRNAs. 239

Quantitative real-time polymerase chain reaction (qRT-PCR) 240
The RNA was extracted by using the TRIzol method, which was reverse, transcribed to cDNA by 241 using a TaKaRa reverse transcription kit. Reaction amplification was performed by using ABI 242 StepOne Real- 1 and 2). The clustering analysis result (Fig.1) suggests that in-group samples have a good correlation 257 and high consistency, whereas the between-group samples are significantly different and can be 258 clearly distinguished. Thus, the selection feasibility of tissue samples is high. Co-expression analysis 259 was conducted on the selected lncRNAs that are differentially expressed and mRNAs. A co-260 expression network map was constructed (Fig. 2), and 44 lncRNAs and 497 mRNAs co-expressed 261 were identified. The target genes of the co-expressed lncRNA and mRNA were predicted, and the co-262 expression network map was constructed (Fig. 3). Finally, 66 lncRNAs and 62 mRNAs that may 263 target each other were identified. 264 GO, KEGG pathway enrichment, and disease analyses were conducted for the candidate genes 265 that target each other. In GO analysis, 20 significantly enriched terms were selected from BP, MF, 266 and CC. A histogram was drawn on the basis of the P value, as shown in Fig. 4. In KEGG pathway 267 enrichment and disease analyses, a histogram was drawn for the top 20 terms, as shown in Fig. 5. A 268 total of five lncRNAs and mRNAs that interacted with one another were selected finally (Table 1). 269

miRNA 270
A total of 118 differentially expressed miRNAs, including 94 upregulated and 26 downregulated 271 miRNAs, were screened (Attached List 3). A Circos plot was drawn on the basis of the differences 272 (Fig. 6). This plot shows the degree of difference among genes on the basis of their location. Figure 6  273 shows 37 upregulated miRNAs and 10 downregulated miRNAs with a high degree of difference. Target gene prediction was conducted for differential miRNA. miRNAs that interacted with another 279 and conformed to the expression trend of the target mRNA were selected (Table 2). 280 circRNA 281 A total of 18,016 circRNAs which are differentially expressed, including 9404 upregulated circRNAs 282 and 8612 downregulated circRNAs, were screened. On the basis of differences, a Circos plot was 283 drawn (Fig. 9). This plot is based on the location of genes. It can show the degree of difference 284 between differential genes. The plot was distributed in all chromosomes. Clustering analysis and 285 PCA were conducted for cases and control samples. The results show that in-group samples have a 286 good correlation and high consistency; between-group samples are significantly different and can be 287 clearly grouped, and data have strong reliability. GO, KEGG pathway enrichment, and disease 288 analyses were conducted for differential circRNAs (Figs. 10 and 11). The most prominent function of 289 circRNAs is its role as miRNA sponge to regulate target gene expression by inhibiting miRNA 290 activity. A circRNA-miRNA network was constructed (Fig. 12). The figure shows the interaction 291 network between the first eight circRNAs and multiple target miRNAs. In these eight circRNAs, 292 except for hsa_circ_0056861_CBC1, the expression of the remaining circRNAs was downregulated 293 compared with that in the control group. Three differentially expressed circRNAs, namely, hsa-294 circRNA301, hsa_circ_0113626, and hsa_circ_0129079, may be related with the target mRNA and 295 were selected. The target mRNA of hsa-circRNA301 and hsa_circ_0113626 is CPT2, and that of 296 hsa_circ_0129079 is GHR. 297

qRT-PCR 298
qRT-PCR was conducted for the five pairs of candidate lncRNA/mRNAs in the case and control 299 groups. The test result shows that the relative quantitative analysis results of four lncRNAs and four 300 mRNAs are consistent with the chip results (Fig. 13)  stages of embryonic heart development. A very small amount of lncRNAs may be directly related to 333 congenital heart disease. However, whole genome-based studies on lncRNAs, especially on intron or 334 intergenic lncRNAs, are lacking. Given the diversity in the categories and functions of lncRNAs, the 335 reference significance of study results on different lncRNAs is not high. In addition, given the lack of 336 conservation among species, the expression level is generally low. There is no uniform name for 337 lncRNA characteristics in the world, so there is difficulty in understanding its real meaning and 338 function from the name. Given the absence of an lncRNA database, studying lncRNAs is difficult. 339 Moreover, the role of lncRNAs in heart development should be elucidated. With progress in this 340 field, many circRNAs have been identified to be distributed in many organisms; an evolutionarily 341 conserved endogenous ncRNA was found to play a crucial part in the regulation of gene expression 342 and correlation and consistency; between-group samples are significantly different and can be clearly 354 grouped with strong data reliability. Next, target gene prediction, GO, and KEGG pathway 355 enrichment analyses were conducted for differential locus.