Genome-wide association analysis of cotton salt stress response-related sites

Soil salinization is the main abiotic stress factor affecting agricultural production worldwide, and salt stress has a significant impact on plant growth and development. Cotton is one of the most salt-tolerant crops, but its salt tolerance varies greatly with variety, growth stage, organ, and soil salt type. Therefore, the selection and utilization of elite salt-tolerant germplasm resources and the mining of elite salt tolerance genes play important roles in improving cotton production in saline-alkali soils. In this study, we analysed the population structure and genetic diversity of 144 elite Gossypium hirsutum cultivar accessions collected from around the world, especially from China. The Illumina Cotton SNP 70K chip was used to obtain genome-wide single-nucleotide polymorphism (SNP) data for 149 experimental materials, and 18,432 highly consistent SNP loci were obtained by filtering. Principal component analysis (PCA) indicated that the 149 upland cotton materials could be divided into 2 subgroups, with 78 materials in subgroup 1 and 71 materials in subgroup 2. Using the genotyping results for the SNP and other markers, genome-wide association analysis was performed under salt stress for the salt tolerance traits 3d_Germination_potential, 3d_Bud_length_drop_rate, 7d_Germination_rate, 7d_Bud_length_drop_rate, 7d_Germination_weight, 3d_Bud_length, 7d_Bud_length, Relative_germination_potential, Relative_germination_rate, and 7d_Bud_weight_drop_rate, and for the salt tolerance indices 3d_Germination_potential_index, 3d_Bud_length_index, 7d_Bud_length_index, 7d_Bud_weight_index, and 7d_Germination_rate_index. A total of 27 SNP markers closely related to the salt tolerance traits and 15 SNP markers closely related to the salt tolerance indices were detected. At the SNP loci associated with the 7-day bud length decline rate, the genes Gh_A01G0034 and Gh_D01G0028, which are related to plant salt tolerance, were detected; they are related to intracellular transport, membrane microtubule formation and the actin network. This study provides a theoretical basis for the selection and breeding of salt-tolerant upland cotton varieties.

High-throughput genotyping platforms play an important role in plant genome research. Cai et al. constructed a high-density 80K SNP cotton chip containing 77,774 SNP sites; when 352 cotton materials were analysed with it, 76.51% of the sites were polymorphic. The chip was used to perform a GWAS of 288 upland cotton materials, and a total of 54,588 SNPs related to 10 salt tolerance traits were identified, of which 8 SNPs were significantly associated with 3 salt tolerance traits (Cai et al., 2017). Huang et al. used the US 63K chip to perform a GWAS of 503 upland cotton materials and identified 324 SNPs and 160 QTLs related to 16 agronomic traits, of which 38 regions control 2 or more traits (Huang et al., 2017). Paterson et al. used a population of upland cotton (440 materials) and a population of sea island cotton (219 materials) with the genotyping-by-sequencing (GBS) method to develop 10,129 SNP markers and obtained haplotype blocks across the whole genome through analysis. These results indicate the important role of population genetic methods in identifying genomic regions selected during cotton domestication (Paterson et al., 2012).
Breeding salt-tolerant crop varieties is the only way to achieve sustainable agricultural development in the future. However, the salt tolerance of plants is a very complicated process. In this study, 124 upland cotton varieties (lines) were used as materials, and 70K SNP chips were used to screen SNP loci and perform genome-wide association analysis on the traits related to salt tolerance at the seedling stage to find significant association sites related to salt tolerance. This study provides a reference and basis for further theoretical studies, such as the isolation of related genes and molecular marker-assisted selection of cotton salt tolerance.
1 Materials and methods

Test materials
We sampled 144 modern G. hirsutum cultivars collected from the Chinese national medium-term cotton gene bank at the Institute of Cotton Research (ICR) of the Chinese Academy of Agricultural Sciences (CAAS) (Table 1). (Pritchard et al. 2000). SPAGeDi software was used to estimate the relative kinship between pairs of individuals in the natural population. Kinship is a relative value: it expresses the genetic similarity between two specific materials relative to the genetic similarity between any two materials. Therefore, when the kinship value between two materials was less than 0, it was set directly to 0.
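The rule of setting negative kinship values to 0 can be illustrated with a minimal sketch. The function name and the small matrix below are hypothetical and are not output from SPAGeDi; they only show the truncation step.

```python
def clip_negative_kinship(kinship):
    """Return a copy of a square kinship matrix with negative values set to 0."""
    return [[max(0.0, value) for value in row] for row in kinship]


# Hypothetical pairwise kinship estimates for three materials (illustrative only).
raw_kinship = [
    [1.00, -0.05, 0.12],
    [-0.05, 1.00, -0.01],
    [0.12, -0.01, 1.00],
]
clipped_kinship = clip_negative_kinship(raw_kinship)
```

The original matrix is left unchanged, so the raw estimates remain available for inspection.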

Linkage disequilibrium analysis
Linkage disequilibrium (LD) can be calculated between two SNPs on the same chromosome within a certain distance (such as 1,000 kb), and its strength is represented by r². The closer r² is to 1, the stronger the linkage disequilibrium. Fitting r² against SNP spacing gives a curve that shows how r² changes with distance. Generally, the closer two SNPs are, the larger r² is, and the farther apart they are, the smaller r² is. The distance at which r² drops to half of its maximum value is used as the LD decay distance (LDD). The longer the LDD is, the smaller the probability of recombination within the same physical distance; the shorter the LDD is, the greater the probability of recombination within the same physical distance. Plink2 (Purcell et al., 2007) software was used for the LD analysis.
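The two quantities described above can be sketched as follows. This is a minimal illustration and not the Plink2 computation: it assumes genotypes coded as 0/1/2 allele dosages, uses the squared Pearson correlation of dosages as r², and assumes r² has already been averaged into distance bins. All names and numbers are hypothetical.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 dosages."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def ld_decay_distance(binned_r2):
    """Return the first distance at which mean r2 is at or below half its maximum.

    binned_r2 is a list of (distance_kb, mean_r2) pairs sorted by distance.
    Returns None if r2 never falls that far within the observed range.
    """
    half_max = max(r2 for _, r2 in binned_r2) / 2
    for distance_kb, r2 in binned_r2:
        if r2 <= half_max:
            return distance_kb
    return None


# Hypothetical binned LD curve (illustrative values only).
curve = [(10, 0.60), (50, 0.45), (100, 0.35), (200, 0.28), (500, 0.15)]
ldd_kb = ld_decay_distance(curve)
```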

Association analysis of salt tolerance traits
Association analysis based on linkage disequilibrium in natural populations was used to evaluate the traits.
By combining SNP marker data from a sufficiently large population with population structure and phenotype data for the target trait, the regions or sites associated with the target trait can be located.
Salt stress conditions and salt tolerance trait collection: The salt tolerance test at the germination stage used the upright double-layer filter-paper roll method. Two pieces of filter paper, each 20 cm in length and width, were cut; one piece was spread on the test bench and soaked with NaCl solution from a sprayer, and 15 seeds were placed 2 cm below the top edge of the filter paper. The filter paper was then rolled and placed vertically in the culture box.
Approximately 30 rolled filter papers were placed in each culture box. The culture boxes were kept in a constant-temperature light incubator at 28 °C with a photoperiod of 10 h/14 h (L/D). The germination potential and the bud length of each seed were measured on the 3rd day, and the germination rate, bud length and stem fresh weight were measured on the 7th day. The experiment was repeated 3 times. The NaCl treatment concentrations were 0 mmol/L (CK) and 150.00 mmol/L. The relative values of the salt stress treatment against the control were calculated with the following formulas.
Relative germination potential (%) = germination potential of treated seeds / germination potential of control seeds × 100%
Relative germination rate (%) = germination rate of treated seeds / germination rate of control seeds × 100%
Decrease rate (%) = (treatment trait value - control trait value) / control trait value × 100%
Salt tolerance index:
Note: Xd and Xw are the measured values of a given index for each material under salt stress conditions and control conditions, respectively, and the mean term is the average value of this index under salt stress conditions.
SAS software was used to perform the best linear unbiased prediction (BLUP) for the salt tolerance traits. TASSEL v5.0 software was used to perform association analysis for each trait based on four models (glm, mlm, cmlm, and fastlmm), with the population structure result used as a fixed effect. For the mixed linear model of TASSEL, SPAGeDi software was used to calculate the genetic relationship K between samples.
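The relative-value and decrease-rate formulas above can be sketched as two small functions. The function names and the example measurements are hypothetical. The salt tolerance index is not included because its formula is not given in the text.

```python
def relative_value(treated, control):
    """Treated value as a percentage of the control value."""
    return treated / control * 100.0


def decrease_rate(treated, control):
    """(treated - control) / control as a percentage.

    The result is negative when salt stress reduces the trait.
    """
    return (treated - control) / control * 100.0


# Hypothetical measurements for one material (illustrative only).
relative_germination_potential = relative_value(treated=63.0, control=90.0)
bud_length_decrease_rate = decrease_rate(treated=3.0, control=5.0)
```

With the sign convention of the formula as written, a bud that is shorter under salt stress gives a negative decrease rate.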
The general linear model uses Q population structure information, while the mixed linear model uses Q+K, which is the population structure and genetic relationship information. X is the genotype and Y is the phenotype. In the end, an association result can be obtained for each SNP site.
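For orientation, a commonly written form of the Q+K mixed linear model is sketched below using the symbols named above (Y for phenotype, X for genotype, Q for population structure, K for kinship). The coefficient names α, β and μ and the residual term e are assumptions introduced here for illustration.

```latex
Y = X\alpha + Q\beta + K\mu + e
```

Under this notation, the general linear model corresponds to dropping the kinship term, leaving Y = Xα + Qβ + e.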
Salt stress cotton transcriptome data download: Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fibre improvement.
Population structure was analysed with Structure software. First, the number of subgroups (K) was set to 2-10, with 3 repetitions for each K value. Assuming that each site is independent, the length of the burn-in period (the initial, uncounted iterations) of the Markov chain Monte Carlo (MCMC) was set to 10,000, and the number of MCMC iterations after burn-in was set to 1,000,000. The optimal K value was selected according to the maximum likelihood principle to determine the number of subgroups and the population structure. In this experiment, based on the Q values calculated with Structure software, the population was divided into 2 subgroups.
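Selecting K by the maximum likelihood value can be sketched as follows, assuming the ln-likelihood of each replicate run has already been collected per K. The function name and all numbers are hypothetical and are not results from this study.

```python
from statistics import mean


def select_optimal_k(log_likelihoods):
    """Return the K whose replicate runs have the highest mean ln likelihood.

    log_likelihoods maps each K to a list of ln-likelihood values,
    one per replicate run.
    """
    return max(log_likelihoods, key=lambda k: mean(log_likelihoods[k]))


# Hypothetical ln-likelihood values, 3 replicate runs per K (illustrative only).
runs = {
    2: [-51200.4, -51198.9, -51201.7],
    3: [-51950.2, -51947.5, -51952.8],
    4: [-52610.0, -52598.3, -52605.1],
}
optimal_k = select_optimal_k(runs)
```

Averaging over replicates before comparing K values reduces the influence of a single poorly converged run.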

Material heterozygosity
Analysis of individual heterozygosity showed that 95% of the materials had a heterozygosity below 30%, and 80% of the materials had a heterozygosity below 5% (Figure 2). Among the 149 varieties, the genetic relationships between most varieties were weak (the yellow parts in the figure), and the genetic relationships between a few materials were very close (the dark red parts in the figure).
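Individual heterozygosity of this kind can be computed as the share of heterozygous calls among the non-missing SNP genotypes of a material. The sketch below assumes genotypes stored as two-letter strings with "NN" for missing calls; this encoding and the function names are hypothetical.

```python
def heterozygosity(genotypes, missing="NN"):
    """Fraction of heterozygous calls among non-missing genotypes of one material."""
    called = [g for g in genotypes if g != missing]
    if not called:
        return 0.0
    heterozygous = sum(1 for g in called if g[0] != g[1])
    return heterozygous / len(called)


def share_below(values, threshold):
    """Fraction of materials whose heterozygosity is below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical genotype calls for one material (illustrative only).
example_rate = heterozygosity(["AA", "AG", "GG", "NN", "CC"])
```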

Phenotypic statistical analysis
Best linear unbiased prediction (BLUP) was performed for the 5 salt tolerance index traits: the 3d germination potential index under salt stress, the 3d bud length index, the 7d bud length index, the 7d bud weight index, and the 7d germination rate index. Figure 5 shows that the phenotypic distributions of the 5 traits were all normal, indicating that these traits are typical quantitative traits controlled by minor-effect polygenes. Pearson correlation coefficients between traits were calculated in R, and the correlations between different traits were low (Table 3). This may be because the 5 traits are controlled by independently inherited genetic sites in response to salt stress, which is consistent with the complexity of salt tolerance.
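The pairwise Pearson correlations between traits can be sketched as follows. The analysis described above was done in R; this Python version is only an illustration, and the trait values are hypothetical BLUP values rather than data from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; assumes both inputs vary."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_table(traits):
    """Return {(trait_a, trait_b): r} for every unordered pair of traits."""
    names = list(traits)
    return {
        (a, b): pearson(traits[a], traits[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical BLUP values for five materials (illustrative only).
traits = {
    "3d_Bud_length_index": [0.82, 0.91, 0.77, 1.05, 0.96],
    "7d_Bud_length_index": [1.10, 0.88, 0.95, 0.90, 1.02],
    "7d_Germination_rate_index": [0.93, 1.01, 0.89, 0.97, 0.85],
}
table = correlation_table(traits)
```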

Association analysis of salt tolerance index traits
The GWAS results for the BLUP values of the salt tolerance index traits under the optimal model were summarized, and the results are shown in Table 2. A total of 15 significant SNP-trait associations were detected (Table 5). Among the 5 traits, only 4 had significant SNPs, whereas 7d_Bud_weight_index had no significant locus, possibly because this trait is more complex and is controlled by multiple minor-effect QTLs (Figure 6). Homologous sequence comparison showed that the homologous gene in Arabidopsis thaliana is AT4G34940, with 73% homology, which encodes ARO1 (armadillo repeat only 1) (Figure 8). blocks were excavated from sea island cotton (Reddy et al., 2017).
In this study, the Illumina Cotton SNP 70K chip was used to develop 18,432 SNP markers across the whole genome. On this basis, genome-wide association analysis was used to identify elite sites associated with the salt tolerance traits and the salt tolerance index.

Functional analysis of candidate genes
Candidate genes are a class of genes whose expression on the chromosome is not clear. They are involved in the phenotypic expression of organisms, and association analysis suggests that they are related to a certain part of the genome. Such genes may be structural genes, regulatory genes, or genes that affect the expression of traits through biochemical metabolic pathways. The functional insufficiency of the candidate gene is known, and whether it is related to drought resistance has been verified. After screening, functional annotations can be assigned, or Arabidopsis homologous genes can be found from the gene information. This approach has previously been reported to identify genes that are clearly related to salt tolerance. GWAS is a fast and powerful method for mining regulatory genes from crop trait data. In this study, two candidate genes related to salt tolerance, Gh_A01G0034 and Gh_D01G0028, and their Arabidopsis homologous genes AT4G34660 and AT4G34940 were identified at the significant SNP sites. Clathrin-mediated endocytosis of plasma membrane proteins is an essential regulatory process that controls plasma membrane protein abundance and is therefore important for many signalling pathways, such as hormone signalling and biotic and abiotic stress responses (Li, 2020).
ARO1 (armadillo repeat only 1) belongs to a family of four genes in Arabidopsis. The protein is localized in the nucleus and cytoplasm of pollen vegetative cells and in the cytoplasm of egg cells, and it is involved in the signalling network that controls tip growth and actin organization in the pollen tube. The signal-mediated and spatially controlled assembly and dynamics of actin are crucial for maintaining the shape, motility, and tip growth of eukaryotic cells. ARO1 is specifically expressed in the vegetative cells of pollen as well as in egg cells (Gebert et al., 2008).

Conclusion
A total of 18,432 polymorphic SNP markers were developed and screened from natural populations using gene chip technology. These SNP markers were used to analyse the population structure and obtain the Q matrix, which was then combined with the salt tolerance trait and salt tolerance index data to conduct a genome-wide association analysis. The natural population can be divided into two subgroups. The genetic relationships between the materials were weak, indicating that the genetic diversity of the varieties is decreasing. The salt tolerance traits were associated with 27 significant SNP sites, and the salt tolerance index was associated with 15 significant SNP sites. Further analysis of the significant SNP sites associated with the 7 d bud length decline rate detected the salt tolerance-related genes Gh_A01G0034 and Gh_D01G0028. The transcriptome results showed that the expression of these two genes peaked at 12 hours of salt stress. Comparison of the homologous sequences with Arabidopsis thaliana identified the homologous genes AT4G34660 and AT4G34940, which encode an SH3 domain protein and ARO1, respectively. The membrane lipid peroxidation scavenging system has high activity in the salt tolerance response of cotton, so protecting the stability of membrane structure and function is key to the salt tolerance of cotton. This study analysed the functions and expression patterns of cotton salt tolerance genes and provides a reference for analysing the mechanism of cotton salt tolerance.

Data availability statement
The raw sequence data reported in this paper have been deposited in the Genome