Construction of a High-Density Genetic Map Based on Large-Scale Marker Development in Coix lacryma-jobi L. Using Specific-Locus Amplified Fragment Sequencing (SLAF-seq)

Coix lacryma-jobi L. is an important economic and medicinal cereal crop. In this study, a high-density genetic linkage map was constructed for Coix lacryma-jobi L. from an F2 population of the cross “Qianyi NO.2” × “Wenyi NO.2” and its parents through specific-locus amplified fragment (SLAF) library construction and high-throughput sequencing. After pre-processing, 325.49 Gb of raw data containing 1,628 M reads were obtained. A total of 22,944 high-quality SLAFs were detected, among which 3,952 SLAFs and 3,646 polymorphic markers met the requirements for construction of a genetic linkage map. The integrated map contains 3,605 high-quality SLAFs grouped into 10 linkage groups. The total length of the linkage map is 1,620.39 cM, with an average distance of 0.45 cM between markers and an average of 360.5 markers per linkage group. This work provides an important molecular basis for trait characterization, gene cloning, marker-assisted breeding, and functional genomics in Job’s tears.


Introduction
Coix lacryma-jobi L. (Job's tears), also known as medicine corn, myotonin, and six grains, is an annual or perennial C4 herb belonging to the genus Coix L. (Maydeae, Gramineae). It is widely grown in East and Southeast Asia [1], and southwestern China is one of its centers of origin, evolution and migration [2,3]. Job's tears is a traditional crop with high nutritional value and one of the most important components of traditional Chinese herbal medicine [4,5,6]. Kanglaite injection (KLT), which is derived from its seed oil, has been widely used in cancer therapy [7,8]. With growing recognition of its nutritional value and of its anti-tumor, immunomodulatory, and blood calcium-lowering activities, demand for Job's tears as a medicinal and health product has increased rapidly in almost all tropical and subtropical countries [2].
To date, new varieties of Job's tears have been bred mainly by traditional methods, including domestication of wild resources, hybridization, and mutation breeding. However, breeding progress has been limited by some inherent characteristics of the crop and by the limitations of these techniques. Many important economic traits of Job's tears are controlled by multiple loci. To improve important agronomic traits and breeding efficiency, marker-assisted selection (MAS) and quantitative trait locus (QTL) mapping can be used to locate and clone the genes underlying these traits. Genetic maps, especially high-density genetic maps, are important tools for QTL mapping and marker-assisted selection research.
Li et al. used RAPD markers for the genetic evaluation of Job's tears germplasm resources [9]. Ma et al. analyzed the genetic relationships of 79 varieties and found that the genetic diversity of varieties from Guangxi, China was higher than that of varieties from South Korea [10]. Fu et al. evaluated the genetic relationships among 139 Job's tears varieties using AFLP and demonstrated that southwestern China is a secondary center of origin [3]. Qin et al. built a genetic map from an F2 population of 131 individuals derived from parents from Beijing and Wuhan; the map comprised 10 linkage groups, 80 AFLP markers and 10 RFLP markers, with a total length of 1,339.5 cM and an average marker spacing of 14.88 cM [11]. Guo et al. produced a draft genome of the variety "Great Montenegro", with a total genome size of 1.6 Gb, and constructed a genetic linkage map from 551 individuals of an F2 population of "Great Montenegro" (male parent) × "small white shell Xingren" (female parent) and a BC4 population backcrossed to "small white shell Xingren". This map contained 230 InDel markers, with a total length of 1,570.12 cM and an average marker spacing of 6.83 cM. They also identified the gene Ccph1, which controls seed shell thickness, and Ccph2, which affects seed shell color [12].
Genetic maps have been constructed with molecular marker techniques such as AFLP, RFLP, RAPD, ISSR, SRAP, and SSR. However, because these maps were built from limited numbers of individuals and markers, their marker density is not saturated, which makes follow-up studies such as QTL mapping difficult. Specific-locus amplified fragment sequencing (SLAF-seq) is an efficient method for large-scale single nucleotide polymorphism (SNP) discovery and genotyping based on reduced representation libraries (RRLs) and high-throughput sequencing [13]. SLAF-seq has since been used to develop SLAF markers in Lophopyrum elongatum [14] and maize [15], and for high-density genetic map construction and QTL mapping in species such as sesame [16], soybean [16] and mango [17]. SLAF-seq is a strong choice for large-scale molecular marker development and high-density linkage mapping, especially in organisms that lack a reference genome.
After several years of investigation and observation, two parents differing markedly in their traits were chosen for hybridization and genetic linkage mapping. A large number of SNP markers were identified using the SLAF-seq method, and these new markers were used to construct a high-density genetic linkage map, which provides a basis for trait characterization, gene cloning, marker-assisted breeding, and functional genomics.

Plant Materials and DNA Extraction
A total of 200 individuals were randomly selected as the mapping population from an F2 segregating population of 426 individuals derived from "Wenyi NO.2" (male parent) × "Qianyi NO.2" (female parent). The two parents and the F2 generation were planted at the Institute of Subtropical Crops of the Guizhou Academy of Agricultural Sciences. Healthy leaves from the parents and the F2 individuals were collected and stored in liquid nitrogen. DNA was extracted with a CTAB buffer (8.18 g NaCl and 2 g CTAB in a total volume of 100 mL with 20 mM EDTA and 100 mM Tris, pH 8.0) modified from the traditional CTAB method [18]. Total genomic DNA was extracted from each plant separately, checked by electrophoresis on a 1% agarose gel, and quantified with a spectrophotometer [19] (NanoDrop 2000, Thermo, USA).

SLAF Library Construction and High-throughput Sequencing
The SLAF-seq method was used [20]. First, genomic DNA of the parents and the F2 individuals was digested with the restriction enzyme Hpy166II (New England Biolabs (NEB), USA). A single nucleotide (A) overhang was then added to the ends of the digested fragments with Klenow fragment (3'→5' exo-) (NEB) and dATP at 37 °C. Next, duplex tag-labeled sequencing adapters (PAGE-purified, Life Technologies, USA) were ligated to the A-tailed fragments with T4 DNA ligase. PCR amplification was performed with the diluted DNA samples, the primers 5'-AATGATACCGACCACCGA-3' (forward) and 5'-CAAGCAGAAGACGGCATA-3' (reverse), Q5® High-Fidelity DNA Polymerase (NEB), and dNTPs. The PCR products were purified with Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, UK) and separated by electrophoresis on a 2% agarose gel. DNA fragments of 264 bp to 464 bp (including indices and adaptors) were excised and purified from the gel with a QIAquick gel extraction kit (Qiagen, Hilden, Germany), and paired-end sequencing (125 bp from each end) was performed on an Illumina HiSeq 2500 system (Illumina, Inc., San Diego, CA, USA).
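The digestion and size-selection steps can be previewed in silico. The following is a minimal sketch, not the pipeline used in this study. It assumes that Hpy166II recognizes GTNNAC and cuts bluntly between the third and fourth bases (this should be confirmed against the supplier's documentation). The function names, the toy sequence, and the size window in the usage lines are hypothetical.

```python
import re

# Assumption: Hpy166II recognises GTNNAC and cuts bluntly as GTN^NAC.
# A lookahead is used so that overlapping recognition sites are all found.
SITE = re.compile(r"(?=GT[ACGT]{2}AC)")
CUT_OFFSET = 3  # cut position relative to the start of the site


def digest(sequence):
    """Return the fragments of a complete digest of a linear sequence."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy sequence with two recognition sites (GTCGAC and GTTAAC).
toy = "AAAA" + "GTCGAC" + "T" * 10 + "GTTAAC" + "CC"
fragments = digest(toy)
selected = size_select(fragments, 6, 20)  # toy window, not the 264-464 bp gel window
print([len(f) for f in fragments], [len(f) for f in selected])
```

Note that the 264 bp to 464 bp window reported above includes indices and adaptors, so a realistic simulation would subtract their combined length before selecting insert sizes.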

SLAF-Seq Data and Genotyping Analyses
SLAF-seq data analysis and genotyping followed the method of Sun [20]. Raw reads were assigned to samples by dual indexing [21], and the reads of each sample were used to evaluate the quality and quantity of the sequencing data. SLAF tags were developed in the parents and the F2 population by read clustering, and polymorphism analysis was performed based on differences in allele number and sequence. SLAF tags with polymorphic sites (SNPs and InDels) were selected for subsequent analysis. The SLAF markers were evaluated and filtered several times to obtain high-quality, effective molecular markers. Because Job's tears is a diploid plant, SLAFs showing more than 4 different tags at one locus were considered suspicious and filtered out. In this study, SLAFs with a sequence length of less than 200 bp were defined as low-length SLAFs and were also filtered out. SLAFs with 2, 3, or 4 tags were identified as polymorphic SLAFs and considered potential markers. Polymorphic markers were divided into 8 segregation patterns: ab×cd, ef×eg, hk×hk, lm×ll, nn×np, aa×bb, ab×cc and cc×ab. The F2 population was derived by selfing the F1 of two fully homozygous parents with genotypes aa and bb. Therefore, only SLAF markers with the segregation pattern aa×bb were used for genetic map construction.
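The eight segregation patterns can be derived from the two parental genotypes alone. The sketch below is an illustration under stated assumptions rather than the genotyping code used here: each parent is represented as a pair of allele labels, the first argument is the parent written first in the pattern code, and the function name `segregation_pattern` is hypothetical.

```python
def segregation_pattern(parent1, parent2):
    """Classify a marker from two diploid parental genotypes.

    Each genotype is a pair of allele labels, e.g. ("A", "G").
    Returns one of the eight pattern codes, or None when both parents
    are homozygous for the same allele (non-polymorphic marker).
    """
    het1 = parent1[0] != parent1[1]
    het2 = parent2[0] != parent2[1]
    n_alleles = len(set(parent1) | set(parent2))
    shared = bool(set(parent1) & set(parent2))

    if het1 and het2:
        # 4 alleles: ab×cd, 3 alleles: ef×eg, 2 alleles: hk×hk
        return {4: "ab×cd", 3: "ef×eg", 2: "hk×hk"}[n_alleles]
    if het1:
        return "lm×ll" if shared else "ab×cc"
    if het2:
        return "nn×np" if shared else "cc×ab"
    return "aa×bb" if n_alleles == 2 else None


# Only markers homozygous and different in the two parents are kept
# for an F2 map (hypothetical example genotypes).
parents = {
    "marker1": (("A", "A"), ("G", "G")),
    "marker2": (("A", "G"), ("A", "A")),
    "marker3": (("C", "C"), ("C", "C")),
}
usable = [m for m, (p1, p2) in parents.items()
          if segregation_pattern(p1, p2) == "aa×bb"]
print(usable)
```

Filtering on the returned code keeps exactly the markers for which both parents are homozygous for different alleles, which is the only class expected to segregate 1:2:1 in an F2 population.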

Genetic Map Construction
Markers were first partitioned into linkage groups (LGs) using the modified logarithm of odds (MLOD) score between markers, with a threshold of 5. To build the map more efficiently, the HighMap strategy was used to order the SLAF markers and correct genotyping errors within LGs [22]. The genetic map was constructed by the maximum likelihood method [23], genotyping errors were corrected with the SMOOTH algorithm [24], and missing genotypes were imputed with a k-nearest neighbor algorithm [25]. Recombination frequencies (r) were converted to genetic map distances in centimorgans (cM) with the Kosambi mapping function [26], and a high-density genetic linkage map of Job's tears was drawn. Regions with 3 or more adjacent markers showing segregation distortion were defined as segregation distortion regions (SDRs) [27].
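The Kosambi mapping function converts a recombination frequency r into a map distance d (in cM) as d = 25 ln[(1 + 2r) / (1 - 2r)]. The sketch below shows this conversion and its inverse; the function names are illustrative and are not taken from any mapping software.

```python
import math


def kosambi_cm(r):
    """Convert a recombination frequency (0 <= r < 0.5) to map distance in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse mapping: convert a map distance in cM to a recombination frequency."""
    return 0.5 * math.tanh(d_cm / 50.0)


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance is close to 100r cM, while at larger r the function increasingly exceeds 100r because it accounts for double crossovers under partial interference.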

Analysis of SLAF-seq Data and SLAF Markers
Paired-end sequencing of the constructed SLAF library yielded a total of 1,628,398,591 reads, or approximately 325.49 Gb of raw data (Table S1). After removal of the tag sequences at the ends of the DNA fragments, the effective length of each read was approximately 384 bp to 464 bp. The average Q30 of the sequencing results was about 93.92%, and the GC content in the 200 F2 individuals ranged from 44.43% to 49.75%, with an average of 46.96%. To improve the efficiency of marker development, the parents were sequenced much more deeply than the F2 individuals: 66,927,932 reads originated from the male parent and 52,144,104 reads from the female parent. Among the 200 F2 individuals, the number of reads per sample ranged from 2,878,250 to 14,176,926, with an average of 7,546,632 (Table S1).

SLAF Marker Detection and Genotype Definition
All reads were grouped into SLAF clusters by sequence similarity. After removal of low-length, repetitive, and suspicious SLAFs, 262,222 and 219,948 valid SLAFs were obtained in the male and female parents, respectively. These SLAFs were supported by 18,747,434 and 13,716,859 reads in the male and female parents, corresponding to average depths of 71.50-fold and 62.37-fold, respectively. In the F2 population, the number of SLAFs per individual ranged from 110,940 to 197,542, with an average of 153,652. The number of supporting reads ranged from 639,392 to 4,736,985, with an average of 1,969,377, corresponding to depths of 5.19× to 30.27× and an average depth of 12.82× per individual (Figure 1). Based on differences in allele number and sequence, the 302,295 high-quality SLAF markers were divided into 3 types: polymorphic, non-polymorphic, and repetitive markers (Table 1). Of these, 79,364 (26.25%) were polymorphic and were used for subsequent genotyping; the remainder were non-polymorphic (73.08%) or repetitive (0.67%). After removal of low-quality polymorphic markers (those with missing parental information, located in repetitive regions, or with low integrity), 53,023 high-quality, effective SLAF markers were classified into 8 segregation patterns: ab×cd, ef×eg, hk×hk, lm×ll, nn×np, aa×bb, ab×cc, and cc×ab (Figure 2). The F2 population was produced by selfing the F1 of a cross between homozygous parents with genotypes aa and bb. Therefore, only markers of the aa×bb pattern were used for genetic map construction. A total of 22,944 SLAF markers fell into this class, of which 3,952 were retained, with an average sequencing depth of 66.925× in the parents and 12.82× in the F2 generation. These SLAF markers were used for genetic map construction.
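The reported depths can be approximately reproduced from the counts above. This is a small arithmetic check, assuming that average depth is the number of supporting reads divided by the number of SLAFs (an interpretation of the text, not a definition stated by the authors). The helper name `mean_depth` is hypothetical.

```python
def mean_depth(total_reads, n_slafs):
    """Average depth, assumed here to be supporting reads per SLAF."""
    return total_reads / n_slafs


# (supporting reads, number of SLAFs) as reported in the text.
counts = {
    "male parent": (18_747_434, 262_222),
    "female parent": (13_716_859, 219_948),
    "F2 average": (1_969_377, 153_652),
}
for name, (reads, slafs) in counts.items():
    print(f"{name}: {mean_depth(reads, slafs):.2f}x")

# Share of polymorphic markers among all high-quality SLAF markers.
polymorphic_pct = 100 * 79_364 / 302_295
print(f"polymorphic: {polymorphic_pct:.2f}%")
```

Under this assumption the F2 average and the polymorphic share match the reported 12.82× and 26.25%, and the parental values agree with the reported 71.50× and 62.37× to within about 0.01, which may reflect rounding or a slightly different averaging method.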

Basic Characteristics of the Genetic Map
After a series of screening steps, 3,605 of the 3,646 effective SLAF markers were retained and used for the final linkage analysis, while the remaining 41 SLAF markers could not be placed in any linkage group. The mapped markers had 165.30× and 98.59× coverage in the male and female parents, respectively, and an average coverage of 20.14× in the F2 individuals. The integrity of each marker across the 200 F2 individuals is a key parameter controlling the quality of the genetic map. The average integrity of all markers positioned on the linkage map reached 98.95%.
A total of 3,952 SLAF markers from 202 individuals were used for genetic mapping. Linkage analysis based on Mendelian segregation ratios was performed in JoinMap 4.1 with LOD ≥ 4.0 and recombination rate (r) ≤ 0.30; 3,605 of the 3,646 SLAF markers were assigned to 10 linkage groups, a mapping rate of 98.88%. The total genetic distance across the 10 linkage groups was 1,620.39 cM and the average distance between two SLAF markers was 0.45 cM, making this the densest genetic linkage map reported for Job's tears to date. Marker number and length differed among the 10 linkage groups (Table 2 and Figure 3). The largest LG was LG9, with 774 markers, a total length of 266.78 cM, and an average distance between adjacent markers of only 0.35 cM; the smallest was LG6, with 97 markers, a length of 66.73 cM, and an average distance between adjacent markers of 0.70 cM. Across the 10 linkage groups, the average number of SLAF markers was 360.5, LG length ranged from 66.73 cM (LG6) to 266.78 cM (LG9), and the average distance between adjacent markers ranged from 0.35 cM (LG9) to 0.84 cM (LG10). The proportion of gaps of 5 cM or less ("Gap ≤ 5") ranged from 98.06% to 100.00%, with an average of 99.42%, and the largest gap was 9.72 cM (LG5) (Table S2). Ten segregation distortion regions (SDRs), accounting for 0.28% of the total SLAF markers, were found; 8 of them were skewed toward the male parent and 2 toward the female parent. Eight of the 10 SDRs were located in LG7, and the remaining 2 in LG9.
Note: "Gap ≤ 5" indicates the percentage of gaps in which the distance between adjacent markers was smaller than 5 cM.
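The per-group statistics described above (length, average marker interval, largest gap, and "Gap ≤ 5") can all be computed from the ordered marker positions of a linkage group. The sketch below uses a hypothetical helper, `lg_summary`, and made-up toy positions; it is not the software used to produce Table 2 or Table S2.

```python
def lg_summary(positions_cm, gap_threshold=5.0):
    """Summarise one linkage group from its marker positions in cM."""
    pos = sorted(positions_cm)
    if len(pos) < 2:
        raise ValueError("at least two markers are needed to compute gaps")
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return {
        "markers": len(pos),
        "length_cm": pos[-1] - pos[0],
        "mean_interval_cm": sum(gaps) / len(gaps),
        "max_gap_cm": max(gaps),
        "pct_gaps_le_threshold": 100.0 * sum(g <= gap_threshold for g in gaps) / len(gaps),
    }


# Toy positions (hypothetical, not data from this study).
summary = lg_summary([0.0, 0.5, 1.0, 7.0, 8.0])
print(summary)

# Whole-map figures recomputed from the totals reported in the text.
mapping_rate = 100 * 3605 / 3646          # share of markers placed on the map
mean_distance = 1620.39 / 3605            # cM per marker over the whole map
print(round(mapping_rate, 2), round(mean_distance, 2))
```

The mean interval here divides the group length by the number of intervals (markers minus one), which is consistent with the per-group values reported for LG6 and LG9; the whole-map average of 0.45 cM is obtained with either denominator.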

Linkage group    SNP number    Transition/Transversion number
LG1              541           388/153
LG2              268           194/74
LG3              440           320/120
LG4              623           452/171
LG5              613           418/195
LG6              147           93/54
LG7              755           545/210
LG8              428           308/120
LG9              1,382         1,010/372
LG10             181           131/50

LGs are evenly distributed, which showed the genetic map has very high quality (Table 5).
The genetic map is a basic multi-point recombination analysis, of which the closer distance between markers, the smaller recombination rate, the potential layout problems
The first high-density genetic map of Job's tears has been constructed in our study,
We report here the first high-density genetic map for Job's tears. The map was constructed using an F2 population and the SLAF-seq approach, which allowed the