Genome-wide association study of open field behavior in outbred heterogeneous stock rats identifies multiple loci implicated in psychiatric disorders

Many personality traits are influenced by genetic factors. Rodents models provide an efficient system for analyzing genetic contribution to these traits. Using 1,246 adolescent heterogeneous stock (HS) male and female rats, we conducted a genome-wide association study (GWAS) of behaviors measured in an open field, including locomotion, novel object interaction, and social interaction. We identified 30 genome-wide significant quantitative trait loci (QTL). Using multiple criteria, including the presence of high impact genomic variants and co-localization of cis-eQTL, we identified 13 candidate genes (Adarb2, Ankrd26, Cacna1c, Clock, Crhr1, Ctu2, Cyp26b1, Eva1a, Fam114a1, Kcnj9, Mlf2, Rab27b, Sec11a) for these traits. Most of these genes have been implicated by human GWAS of various psychiatric traits. For example, Cacna1c, a gene known to be critical for social behavior in rodents and implicated in human schizophrenia and bipolar disorder, is a candidate gene for distance to the social zone. In addition, the QTL region for total distance to the novel object zone, on Chr1 at 144 Mb, is syntenic to a hotspot on human Chr15 (82.5-90.8 Mb) that contains 14 genes associated with psychiatric or substance abuse traits. Although some of the genes identified by this study appear to replicate findings from prior human GWAS, others likely represent novel findings that can be the catalyst for future molecular and genetic insights into human psychiatric diseases. Together, these findings provide strong support for the use of the HS population to study psychiatric disorders.

time spent and distance traveled around the object zone are used as indicators of preference for novelty. Novel object interaction has been considered as an important predictor in addiction-like traits [15,16] and high novelty preference increases the propensity for addictive drug-seeking behavior [17,18,9]. There are multiple different methods for conducting social interaction test (SIT) in rats [19,20,21]. In general, an unfamiliar stimulus rat and the rats to be tested are placed in the same arena. While manual scoring of social interaction often allows both rats to be freely moving, experiments using automated video analysis often limit the movement of the stimulus rat and calculate the time spend and distance traveled by the test rat around the stimulus rat.
The heterogeneous stock (HS) rats were originally derived from interbreeding eight inbred strains [22] and have been maintained as outbred for more than 90 generations. HS rats have been successfully used in several high-resolution genome-wide association studies (GWAS) [23,24,25,26]. Here we report the results on associations of genomic loci with measures obtained from OFT, NOIT and SIT. These analyses were based on an expanded data set that contained about twice the sample size of that reported previously [27]. These data were collected as part of a larger GWAS on socially acquired nicotine intravenous self-administration, which will be the subject of a separate publication.

Animals
The N/NIH heterogeneous stock (HS) rat (RRID:RGD2314009), was created at the NIH in 1984 by interbreeding the following eight inbred founder strains: that were used as breeders. After a two-week quarantine period, rats were transferred to a reversed 12h light-dark cycle (lights off at 9:00 AM) housing room. Breeding pairs were assigned according to an algorithm that maximized the genetic diversity of the offspring. Litters were culled to a maximum of 8 pups to ensure a consistent nutritional environment. Rats were weaned on 3 PND 21. An RFID was inserted subcutaneously when rats were weaned. Two male and two female rats per litter were used for behavioral studies. Teklad Irradiated LM-485 Mouse/Rat Diet and water were provided ad libitum. All rats were group-housed with 2-4 same-sex peers throughout the experiments to avoid social isolation. All procedures were conducted in accordance with the NIH Guidelines concerning the Care and Use of Laboratory Animals, as approved by the Institutional Animal Care and Use Committee of the University of Tennessee Health Science Center.

Study Design
All HS rats (626 males and 620 females in total from 16 batches) were adolescents when tests began. Their age was 31.8 ± 2.6 (mean ± STD) on the day of the OFT. Each HS rat was tested in all three behavioral tests, one test per day, in the following sequence: OFT, NOIT, and SIT. All tests were conducted in the dark phase of the light cycle (9 AM -4 PM) and were conducted in the same open field and recorded using the same video capture system.

Open field test
Two OFT arenas were constructed using black acrylic glass, measuring 100cm(L)× 100cm(W )×50cm(H), which were placed side by side. The floors were covered by wood boards painted with either black or white acrylic paint (ART-Alternatives, ASTM D-4236, Emeryville, CA, USA) to contrast the coat of the animals (i.e. a black board was used for rats with white fur). The test chambers were illuminated by a long-range, 850-nm infrared light (LIR850-70, LDP LLC, Carlstadt, NJ) located 160 cm above the center of the two test chambers. No source of visible light was present during behavioral testing, with the exception of a flat panel monitor (Dell 1908FP). A digital camera (Panasonic WV-BP334) fitted with an 830 nm infrared filter (X-Nite830-M37, LTP LLC, Carlstadt, NJ) and located next to the infrared light source was used to record the behavior of the rats. All rats were released at the same corner of the test chamber, and data were collected for 1 h. 4

Novel object interaction test
This test was conducted the day after the OFT in the same arena. A cylindrical rat cage constructed using 24 aluminum rods (30 cm in length) spaced 1.7 cm apart was used as the novel object. The bottom and top of the cage (15 cm in diameter) were manufactured using a 3D printer from polylactic acid. The design can be downloaded from https://github.com/chen42/RatSocialInteractionTest.
The novel object was placed in the center of the arena before testing. The test duration was 20 min and was recorded using the same camera as that used in the OFT.

Social interaction test
This test was conducted the day after the NOIT. This test compares the preference of a subject rat for a stimulus rat restricted in a cylindrical cage (i.e. the novel object used in the NOIT) against an empty cylindrical cage. The test arena was reduced to 100cm(L) × 60cm(W ) × 50cm(H) by using a black board placed vertically in the arena. Two cylindrical cages described above were placed˜30 cm away from the walls on opposite sides (i.e., similar to the arrangement commonly used in the three-chamber test). A randomly selected stimulus Sprague-Dawley rat of the same sex and similar weight as the HS test rat was placed into one of the cylindrical cages (kept the same throughout the experiment) 5 min before the HS subject rat was placed into the arena. The stimulus and subject rats were never housed together and thus were unfamiliar to each other. No social isolation was conducted on either rat. Each stimulus rat was used no more than once per day. The test duration was 20 min and was recorded using the same camera as that used in the OFT.

Analysis of video data
Ethovision XT video tracking system (Version 4.0, Noldus Information Technology, The Netherlands) was used to analyze the videos recorded in all behavioral tests. After identifying the arena and calibrating the size of the arena, specific zones in the arena were outlined. For OFT and NOIT, one center zone, which was a circular region with a diameter of 20 cm, was used. For the SIT, one object zone and one social zone, both were circular regions with diameters of 20 cm, corresponding to the two cylindrical cages respectively, were specified. The extracted data included the total distance traveled in the arena, the duration 5 and the frequency the test rat was present in specific zones, the distance of the subject to the zones, and the latency of the test rat entering the zones. The center of the subject rat was used for all calculations. Phenotypic correlations were determined using the Pearson test.

Pre-processing of phenotype data
For genetic analysis, each trait was quantile-normalized separately for males and females; this approach is similar to using sex as a covariate. Other relevant covariates (including age, batch number, and dissector) were identified for each trait, and covariate effects were regressed out if they were significant and if they explained more than 2% of the variance. Residuals were then quantile-normalized again, after which the data for each sex were pooled prior to further analysis.
This approach removed mean differences due to sex; further, it did not attempt to model gene-by-sex interactions.

Genotyping and estimates of heritability
Genotypes were determined using genotyping-by-sequencing (GBS), as described previously [28]. This produced 3,513,494 SNPs with an estimated error rate <1%. Variants for X-and Y-chromosomes were not called. We used this set of SNPs for GWAS, genetic correlations, and heritability estimates. We used GCTA-GREML [29] analysis to estimate proportion of variance attributable to SNPs.

Genetic Mapping
GWAS analysis employed a linear mixed model, as implemented in the software GCTA [30], using a genetic relatedness matrix (GRM) to account for the complex family relationships within the HS population and the Leave One Chromosome Out (LOCO) method to avoid proximal contamination [31,32]. Significance thresholds were calculated using permutation. Because all traits were quantile normalized, we used the same threshold for all traits [33]. To identify QTLs, we scanned each chromosome to determine if there was at least one SNP that exceeded the permutation-derived threshold of −log 10 (p) > 5.6, which was supported by a second SNP within 0.5 Mb that had a p-value that was within 2 − log 10 (p) units of the index SNP.
Other QTLs on the same chromosome were tested to ensure that they were 6 independent of the first. To establish independence, we used the top SNP from the first QTL as a covariate and performed a second GWAS of the chromsome in question. If the resulting GWAS had an additional SNP with a p-value that exceeded our permutation-derived threshold, it was considered to be a second, independent locus. This process was repeated (including all previously significant SNPs as covariates), until no more QTLs were detected on a given chromosome.
Linkage disequilibrium (LD) intervals for the identified QTL were determined by identifying those markers that had a high correlation coefficient with the peak marker (r 2 = 0.6).

Sex differences
We found that many of the traits measured in OFT, NOIT, and SIT are different between males and females (Table S1). In OFT, with the exception of latency of entering the center zone, all traits have statistically significant sex differences.
In addition, four out of six traits in NOIT and seven out of eleven traits in SIT are different between males and females. The range of effect size (Cohen's d) for statistically significant differences is (0.14, 0.31). Our genetic analysis quantile-normalized each trait separately for males and females. This approach removed mean differences due to sex and allowed us to combine males and females in the same analysis to increase the power of GWAS,

Phenotypic correlations
We calculated Pearson correlation between the 23 traits (

Heritability
SNP heritability estimates (h 2 ) for traits are provided in Table 1. In all the three behavioral tests, total travel distance has the highest heritability. In OFT, all heritability estimates are between 0.28 -0.38, with the exception of that for latency of entering the center zone (h 2 = 0.08). Heritability estimates for variables from the NOIT are slightly lower than that of the OFT; most of them are in the range of 0.21 -0.29, with the exception of that for the latency of entering the center zone (h 2 = 0.10). Heritability estimates for various measures of the SIT are in the range of 0.10 -0.28. Interestingly, heritability estimates for measures on the social zone are consistently greater than those for the object zone.

Identification of multiple GWAS hits
In Table 2, we present single nucleotide polymorphisms (SNPs) that are significantly associated with the phenotypes. The genome-wide statistical significance of the association is determined by −log 10 P values which ranges from 5.609 to 8.268. The p-values correspond to these are 2.46 × 10 −6 and 0.5 × 10 −8 , respectively. For OFT, there are 9 significant loci for 5 traits. We did not find a significant QTL for Duration in center zone (h 2 = 0.284 ± 0.045). We identified two loci for Frequency of entering center zone and Total travel distance, 3 loci for Total distance to center zone. We found 4 NOIT traits have significant loci.
Among them, Total distance to center zone has 3 loci and Mean distance to center zone has 2 loci. We did not find any significant loci for Frequency of entering center zone (h 2 = 0.209 ± 0.041) and Latency of entering center zone (h 2 = 0.100 ± 0.034). For SIT, we identified significant loci for all traits except Latency of entering object zone which has heritability of h 2 = 0.082 ± 0.032. We found 2 loci for the traits Latency of entering social zone, Mean distance to social zone Total distance to social zone and Total travel distance. All genome-wide significant loci are shown in Figure

Pleiotropic loci
To determine if traits that mapped to the same location are pleiotropic, we considered the minor allele frequency (MAF), and the SDP of the index SNP among the 8 founder strains that were used to create the HS. Using these criteria, we did not observe any pleiotropic loci between the traits analyzed in different tests. However, we did identify pleiotropic loci between the traits of the same behavior test. Most of these traits are highly correlated, as shown in Figure 1.
With the exception of three sets of QTL (Table S3), all others share the same top SNP (Table S2).

Candidate gene identification
The number of genes within the identified QTL ranges from 1 to 127 (mean: 30.1, median: 19. Table 2). There is only one region that contains a single gene: Adarb2 within chr17:58Mb for latency of entering social zone in SIT. However, it is also possible that the causal allele is a regulatory variant that is located in this interval but regulates a gene outside of the identified interval.
All other loci contained more than one gene. To identify candidate genes, we combined several criteria: 1) the presence of moderate or high impact variants located within the gene, as predicted by SnpEff [34]. We also require these variants are in high LD with the top SNP. We identified 149 coding variants within 30 QTL, 8 of which were predicted to have a high impact (Supplementary   Table S4). 2) the presence of a significant cis-eQTL in one or more of the five brain regions in a dataset containing 88 navie adult HS rats [35], 3) has a human ortholog that has been reported to be associated with psychiatric diseases (including drug abuse). When multiple candidates are present using the above criteria, we remove the gene with very low expression levels across all five regions in the RNA-seq data set (e.g., FPKM < 0.5) and select the candidate with the strongest support for the literature. Combining these criteria with a literature search conducted using GeneCup [36], we identified plausible candidate genes within 13 loci (Table 3).
In addition, for total distance to the novel object zone, the QTL region on chr1 (144 Mb, size: 4.1 Mb, Figure S34) contains 69 gene with human orthologs.
We found 14 of these genes have been reported in human GWAS to be asso-  Table S5). Three additional genes with sub-threshold significance in human GWAS are also included. These genes are all located in a syntenic region on human chromosome 15 . Although based on the criteria described above, Sec11a is the best candidate gene (Table 3), it is possible that this region contains multiple genes that are associated with the trait.

Discussion
As part of a GWAS on intravenous nicotine self-administration in adolescent HS rats that we are conducting [27,37], we collected several behavioral phenotypes related to anxiety, novelty exploration, and social interaction. We have previously reported that these behavioral traits contribute to the variation in nicotine intake [27]. We report here GWAS results of three behavioral traits: OFT, NOIT, and SIT, which were all conducted in the same open field. We identified 24 QTL for 30 traits. Using a set criteria outlined above, we identified 13 candidate genes.
OFT, NOIT, and SIT are widely used behavioral assays in rodents. With over 1,200 rats, ours represent some of the largest data collected using these assays.
Similar to our interim report on this data set [27], we found a large number of correlations with relatively low coefficients (e.g., r < 0.4) but with high statistical significance. It is likely that these correlated traits are controlled by the same behavioral processes and thus are influenced by the same genetic factors. In fact, our genetic analysis did find several pleiotropic sites (Table S3). Almost all pleiotropic loci are reported for traits measured in the same behavior assay. It is likely that further increasing sample size will provide greater statistical power to detect pleiotropic effect across different behavioral assays.
Many of the candidate genes in this study have been associated with psychiatric or drug abuse traits in humans. For example, we identified Cyp26b1, a retinoic acid degrading enzyme, as a candidate gene for the frequency of entering the center The Crhr1 gene, which encodes corticotrophin release hormone receptor 1, is a candidate gene for total travel distance in the OFT. Crhr1 is involved in anxietylike behavior in OFT in rats [44,45]. In mice, conditional knockout approach showed that Crhr1 in the forebrain underlies the effect of early life stress on total travel distance in the OFT [46], which provides a direct confirmation for the association we report here.
Among the candidate genes for NOIT, Eva1a, a candidate gene for the duration in the center zone, is supported by strong cis-eQTL and a missense variant.
Eva1a has no literature support and thus could lead to the discovery of new mechanisms for novelty seeking-like behavior. Sec11a, a candidate gene for total distance in the center zone, is associated with depression and schizophrenia [47,39,48]. Mlf2, a candidate gene for total distance to center zone in NOIT, is associated with smoking in humans [49] and has very high expression levels in the accumbens (Table 3).
For the SIT, we identified Cacna1c, encoding the Ca v 1.2 subunit of the L-type Ca 2+ channel, as a candidate gene for distance to the social zone. Cacna1c has been associated with schizophrenia [50] and bipolar disorder [51] in human GWAS. Both schizophrenia and bipolar disorders are associated with impairments in a range of social deficits [52,53]. In animal studies, Sprague-Dawley rats with heterozygotic deletion of the Cacna1c gene (homozygotic mutation is lethal) showed many deficits in social behavior. These included reduced levels of ultrasonic vocalizations during rough-and-tumble play, as well as social approach behavior elicited by playback of ultrasonic vocalizations [54,55] . Ctu2 is involved in post-translational modification of tRNAs [63]. Adarb2 has been associated with home cage activity [64] and unipolar depression [65]. The Clock gene is involved in the maintenance of locomotor rhythms [66]. Mutations of the CLOCK gene have been implicated in many psychiatric disorders [67]. Although these candidates are well supported by multiple lines of evidence, additional work is needed to confirm their causal relationship to the corresponding behavioral traits.
This SNP is also associated with the duration rats stayed in the novel object zone, although the p value did not reach genome-wide significance (-logP=4.63). This  (Table S5). Using the criteria described above, we identified Sec11a as the best candidate gene (Table 3). However, given the large number of genetic variants reported in human GWAS that are associated with psychiatric conditions within this syntenic region, it is very likely that this region contains multiple genes that are associated with novelty seeking-like behavior.
We include overlapping with human psychiatric GWAS results as part of the criteria in prioritizing candidate genes. It is possible that this approach could introduce bias and prevent us from making novel discoveries. For example, two (Cyp26b1 and Crhr1 ) of the three candidate genes for OFT have been associated with schizophrenia, rather than anxiety. However, many genetic variants are pleiotropic for multiple psychiatric diseases [68]. For example, polygenic risk scores for schizophrenia have been associated with many other psychiatric diseases, such as anxiety disorder [69] or major depressive disorder [70], or cognitive performance [71]. Together with other evidence, we believe considering human psychiatric GWAS findings when identifying candidate genes in our study, even when the behavior trait in rats does not map directly to the psychiatric disease, is still valid and will likely increase the transnational value of our findings.
The presence of cis-eQTL in the brain is one of the strongest pieces of evidence that we use to prioritizes candidate genes. Nine of the 13 candidate genes we identified have cis-eQTL. Two of the strongest candidate genes in our results, Cacna1c for social behavior and Crhr1 for anxiety-like behavior, are both supported by prior studies on similar traits using knockout mice. However, we did not find significant cis-eQTL of these two genes in our dataset. This could imply either our cis-eQTL dataset lack sufficient power or that genetic regulation of the traits does not directly involve gene expression in the brain regions that we have eQTL data. In addition, several QTL regions contain multiple cis-eQTL.
It is possible this is due to strong LD within the region.
The HS rat population has already been successfully used in genetic mapping studies of physiological or behavioral traits [72,73,26]. Prior study mapped several anxiety-like traits using zero maze [23]. Several GWAS using HS to study behavioral regulation [74], response to cocaine cues [75], cocaine selfadministration [76], nicotine self-administration [37,27], or oxycodone selfadministration are underway. Our study add to the literature 30 QTLs and 13 candidate genes for psychiatric related behavioral traits. While some of the candidate genes are well supported by knockout studies in mice and human GWAS, others likely represent novel findings that can be the catalyst for future molecular and genetic insights on psychiatric diseases.

Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.