Whole genome sequencing of Red Chittagong Cattle (RCC) and insight into genetic variants in candidate genes for disease resistance

Detection of genome-wide genetic variation is one of the primary goals of bovine genomics. The genomes of several cattle breeds have been sequenced to understand the genetic variation associated with important phenotypes. Red Chittagong Cattle (RCC) is a locally adapted, disease-resistant indicine cattle breed of Bangladesh. In this study, we describe the first genome sequence of the RCC breed and in silico analyses of the identified functional variants. Deep sequencing of an RCC bull genome on the DNA nanoball (DNBseq) sequencing platform generated approximately 110 Gb of paired-end data, corresponding to 31X genome coverage. Quality filtering retained 360,711,803 paired-end reads, of which 99.8% mapped to the bovine reference genome (ARS-UCD1.2). A total of 17.8 million single nucleotide variants (SNVs) and 2.1 million insertions and deletions (INDELs) were identified in the RCC genome. The transition/transversion (Ts/Tv) ratio was 2.21. In total, 3,324,621 variants were novel compared with dbSNP (NCBI dbSNP bovine build 150). Functional annotation identified 54,961 SNVs in exonic regions, of which 63.75% were synonymous and 30.42% were non-synonymous. The percentage of coding INDELs was 0.25% (frameshift deletions 0.19% and frameshift insertions 0.06%). We identified 120 variants in 26 candidate genes for five disease-related traits: foot-and-mouth disease (FMD), mastitis, parasite infestation, paratuberculosis, and tick infestation. Of the 120 variants, 50 were non-synonymous/frameshift (NS/FS) and 70 were synonymous/non-frameshift (SS/NFS). The catalog of genomic variants identified in RCC may serve as a foundation for cattle genomics research in Bangladesh, filling a gap and providing a database of genome-wide variation for future functional studies in RCC.

Introduction

High-throughput sequencing technologies are evolving alongside modern computational biology tools, and decreasing sequencing costs have enabled cost-effective variant discovery in cattle (1, 2). Furthermore, with annotated reference genome assemblies and gene annotations available for both taurine (Bos taurus) and indicine (Bos indicus) cattle (3, 4), comparative genomics has become a preferred approach for systematic genetic improvement. Over recent decades, a substantial number of genetic variants in the form of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) have been identified through whole-genome sequencing studies of different cattle breeds (5). However, the search for genome-wide variants continues, since many genetic variants in diverse cattle breeds remain to be discovered.

[...] better than other indigenous genotypes in Bangladesh (6, 7). Economic and genetic evaluation studies have shown that RCC farming is more profitable than other local or crossbred cattle farming under rural conditions (8-10). This genotype is more resistant to parasites than other indigenous cattle in Bangladesh (6, 11-13). RCC performs well at both farm and field levels (14, 15) and has a positive economic value for body weight gain (10). RCC could therefore be considered a genotype of choice when breeding for disease resistance and beef production in Bangladesh.

Most research on RCC to date has been phenotype-based. A few molecular studies have investigated the origin and genetic diversity of RCC (16-20). However, despite decades of research, the scientific community still lacks a comprehensive database of genetic variation in RCC, because most of this work was descriptive. Therefore, considering this research gap and the usefulness of high-throughput sequencing, we sequenced the whole genome to detect genome-wide differences between taurine cattle and an indicine breed and to reveal breed-specific genetic variants for subsequent use in breeding and conservation programs. In addition, we investigated genetic variants in candidate genes associated with disease resistance to identify putative candidate mutations for a key phenotype that defines the RCC genotype.

Whole genome sequencing was performed on a DNBseq platform. First, adaptor sequences, contamination, and low-quality reads were removed from the raw reads using SOAPnuke (21).

Briefly, reads containing adapter or low-quality sequences were first filtered out, and the data were then processed to remove contamination and obtain valid data. The SOAPnuke filter parameters were "-n 0.001 -l 10 --adaMR 0.25". The filtering steps were:
1) Filter adapters: if a sequencing read matches 25.0% or more of the adapter sequence (with at most two mismatched bases), remove the entire read;
2) Filter low-quality data: if bases with a quality value below 10 account for 50.0% or more of a read, remove the entire read;
3) Remove N: if N bases account for 0.1% or more of a read, remove the entire read;
4) Obtain clean reads: the quality values of the output reads are encoded in Phred+33.

Raw reads were checked again for quality using FastQC (22), and low-quality paired reads were further filtered out using Trim Galore (https://github.com/FelixKrueger/TrimGalore). High-quality reads were mapped against the Bos taurus reference genome GCF_002263795.1 (4) using the Burrows-Wheeler Aligner (BWA) (23) with default BWA-MEM settings. Mean coverage and breadth of reference genome coverage were estimated using samtools version 1.7 and bedtools v2.26.0. Variant calling was performed using SAMtools (24). Identified SNPs were further screened using bcftools (25) and vcftools to obtain high-quality SNPs. Identified [...]

To evaluate the quality of the detected SNPs, the Ts/Tv ratio was computed and found to be 2.27 [...] were 68337 and 58191. In addition, 15397123 (79%) variants were located in intergenic regions.
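The four SOAPnuke filtering rules listed above can be illustrated with a short Python sketch. It is a simplified stand-in, not SOAPnuke's own algorithm: it assumes ungapped matching of an adapter prefix along the read, and it assumes that a read pair is discarded when either mate fails. The adapter sequence, the example reads, and the function names (`matches_adapter`, `passes_filters`, `filter_pairs`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Thresholds taken from the filtering rules in the text. The matching logic
# below is a simplified illustration, NOT SOAPnuke's actual implementation.
ADAPTER_MIN_RATIO = 0.25      # read must match >= 25% of the adapter
ADAPTER_MAX_MISMATCH = 2      # at most two mismatched bases allowed
LOW_QUAL_CUTOFF = 10          # bases with Q < 10 count as low quality
LOW_QUAL_MAX_FRACTION = 0.5   # drop reads with >= 50% low-quality bases
N_MAX_FRACTION = 0.001        # drop reads with >= 0.1% N bases
PHRED_OFFSET = 33             # Phred+33 quality encoding


@dataclass
class Read:
    seq: str
    qual: str  # Phred+33-encoded quality string, same length as seq


def matches_adapter(seq: str, adapter: str) -> bool:
    """Ungapped scan: True if a prefix of the adapter covering at least
    25% of its length aligns anywhere in the read with <= 2 mismatches."""
    min_len = math.ceil(ADAPTER_MIN_RATIO * len(adapter))
    for start in range(len(seq)):
        overlap = min(len(adapter), len(seq) - start)
        if overlap < min_len:
            break  # remaining overlaps only get shorter
        window = seq[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= ADAPTER_MAX_MISMATCH:
            return True
    return False


def low_quality_fraction(qual: str) -> float:
    low = sum(1 for c in qual if ord(c) - PHRED_OFFSET < LOW_QUAL_CUTOFF)
    return low / len(qual)


def n_fraction(seq: str) -> float:
    return seq.upper().count("N") / len(seq)


def passes_filters(read: Read, adapter: str) -> bool:
    """Apply the adapter, low-quality and N filters in turn."""
    if not read.seq:
        return False
    if matches_adapter(read.seq, adapter):
        return False
    if low_quality_fraction(read.qual) >= LOW_QUAL_MAX_FRACTION:
        return False
    if n_fraction(read.seq) >= N_MAX_FRACTION:
        return False
    return True


def filter_pairs(pairs, adapter):
    """Keep a read pair only if both mates pass every filter (assumption)."""
    return [(r1, r2) for r1, r2 in pairs
            if passes_filters(r1, adapter) and passes_filters(r2, adapter)]


# Illustrative usage; the adapter and reads are made-up examples.
ADAPTER = "AGATCGGAAGAGCACA"
good = Read("C" * 20, "I" * 20)                    # 'I' = Q40
with_adapter = Read("CCCC" + ADAPTER, "I" * 20)
low_qual = Read("C" * 20, "#" * 12 + "I" * 8)      # '#' = Q2, 60% low quality
with_n = Read("C" * 19 + "N", "I" * 20)

kept = filter_pairs([(good, good), (good, with_n)], ADAPTER)
print(len(kept))  # 1
```

With a 0.1% threshold, a single N in any read of up to 1,000 bases already reaches the cutoff, so in practice rule 3 removes every read that contains an N.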
A catalog of loss-of-function (LoF) variants with functional annotation would greatly aid gene prioritization by providing a reference for disease studies (31, 39-41). [...]

By manual curation of the annotated exonic variants, we identified 120 variants in 26 candidate genes for five disease-related traits: foot-and-mouth disease (FMD), mastitis, parasite infestation, paratuberculosis, and tick infestation.
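The manual curation step itself cannot be automated, but the final bookkeeping, assigning each curated candidate-gene variant to the NS/FS or SS/NFS class per trait, can be sketched in Python. The consequence labels are assumed to follow ANNOVAR-style naming, which the text does not confirm, and the hypothetical gene panel contains only genes named in this section rather than the full set of 26. The function names `classify` and `tally` are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical candidate-gene panel built only from genes named in this
# section; the study itself used 26 genes across five traits.
CANDIDATE_GENES = {
    "FMD": {"BOLA-A", "CCDC36"},
    "Mastitis": {"IL4"},
}

# Assumed ANNOVAR-style exonic consequence labels (not confirmed by the text).
NS_FS = {"nonsynonymous SNV", "frameshift deletion", "frameshift insertion"}
SS_NFS = {"synonymous SNV", "nonframeshift deletion", "nonframeshift insertion"}


def classify(consequence: str) -> str:
    """Map a consequence label to the NS/FS or SS/NFS class."""
    if consequence in NS_FS:
        return "NS/FS"
    if consequence in SS_NFS:
        return "SS/NFS"
    return "other"  # e.g. stop-gain/stop-loss, left unclassified here


def tally(variants):
    """Count NS/FS vs SS/NFS variants per trait for candidate-gene hits.
    `variants` is an iterable of (gene, consequence) pairs."""
    counts = defaultdict(Counter)
    for gene, consequence in variants:
        for trait, genes in CANDIDATE_GENES.items():
            if gene in genes:
                counts[trait][classify(consequence)] += 1
    return counts


# Illustrative records: the first three loosely follow variants mentioned in
# the text; the IL4 synonymous record and GENE_X (a non-candidate) are invented.
example = [
    ("BOLA-A", "frameshift deletion"),
    ("CCDC36", "nonsynonymous SNV"),
    ("IL4", "nonsynonymous SNV"),
    ("IL4", "synonymous SNV"),
    ("GENE_X", "synonymous SNV"),
]
result = tally(example)
print(dict(result["FMD"]))       # {'NS/FS': 2}
print(dict(result["Mastitis"]))  # {'NS/FS': 1, 'SS/NFS': 1}
```

Keeping the trait-to-gene mapping as a dictionary of sets lets a gene contribute to more than one trait, which matters if a candidate gene is shared between, for example, mastitis and paratuberculosis.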

Foot and mouth disease (FMD)
We observed nine polymorphisms in four genes associated with FMD. A frameshift deletion at nucleotide position 988 was observed in the bovine leukocyte antigen (BOLA-A) gene (Table 5). [...] genotype were associated with resistance to FMD, whereas the AA genotype was associated with susceptibility to FMD in Wanbei cattle.

In the present study, a non-synonymous SNV was observed in exon 8 of the CCDC36 gene (Table 5). [...]

Seven non-synonymous SNVs were detected in four mastitis-associated genes (Table 6).

Vitenberga showed that immunoreactivity for IL-4 was more pronounced in mastitis cases. We identified a non-synonymous mutation in the IL4 gene, which encodes IL-4.