Abstract
Background: Sugarcane (Saccharum spp.) is highly polyploid and aneuploid. Modern cultivars are derived from hybridization between S. officinarum and S. spontaneum. This combination results in a genome with variable ploidy among loci, a very large size (approximately 10 Gb) and a high content of repetitive regions. Gene expression mechanisms are poorly understood in these cultivars. An approach combining genomics, transcriptomics and genetic mapping can improve our understanding of genetic behavior in sugarcane.
Results: The hypothetical HP600 and centromere protein C (CENP-C) genes from sugarcane were used to elucidate the allelic expression and the genomic and genetic behavior of this complex polyploid. The physically adjacent genes HP600 and CENP-C were found in two different homeologous chromosome groups with ploidies of eight and ten. The first region (Region01) was a Sorghum bicolor ortholog with all haplotypes of HP600 and CENP-C expressed, although HP600 exhibited unbalanced haplotype expression. The second region (Region02) was a scrambled sugarcane sequence formed from different noncollinear genes and containing duplications of HP600 and CENP-C (paralogs). This duplication occurred after the separation of sorghum and sugarcane and before the formation of the Saccharum genus, resulting in a nonexpressed HP600 pseudogene and a recombined fusion of CENP-C with the orthologous gene Sobic.003G299500, with at least two chimeric gene haplotypes expressed. Genetic map construction highlighted the difficulty of mapping markers located in duplicated regions of complex polyploid genomes.
Conclusion: These findings describe a low-synteny region in sugarcane, formed by events that occurred in all members of the Saccharum genus. Additionally, evidence was found of the expression of duplicated and truncated genes and of the behavior of genetic markers in a duplicated region. Thus, we describe the complexity of sugarcane genetics, genomics and allelic dynamics, which can be useful for understanding complex polyploid genomes.
List of abbreviations
- °C: Degrees Celsius
- BACs: Bacterial Artificial Chromosomes
- BES: BAC-end sequencing
- Bp: Base pairs
- BRIX: Soluble solid content
- CDS: Coding DNA sequence
- CenH3: Centromere-specific histone H3
- CENP-C: Centromere protein C
- CMA: Chromomycin A3
- DAPI: 4’,6-diamidino-2-phenylindole
- DNA: Deoxyribonucleic acid
- EST: Expressed Sequence Tag
- FISH: Fluorescent in situ hybridization
- Gb: Gigabase pairs
- GBS: Genotyping by sequencing
- HMW: High-molecular-weight
- Kb: Kilobase pairs
- QTL: Quantitative Trait Loci
- LTR: Long Terminal Repeat
- Min: Minutes
- Mya: Million years ago
- PCR: Polymerase chain reaction
- PFGE: Pulsed Field Gel Electrophoresis
- RNA: Ribonucleic acid
- RNAseq: RNA sequencing
- Sec: Seconds
- SNP: Single Nucleotide Polymorphism
- TEs: Transposable Elements