Narrow transmission bottlenecks and limited within-host viral diversity during a SARS-CoV-2 outbreak on a fishing boat

The long-term evolution of viruses is ultimately due to viral mutants that arise within infected individuals and transmit to other individuals. Here we use deep sequencing to investigate the transmission of viral genetic variation among individuals during a SARS-CoV-2 outbreak that infected the vast majority of crew members on a fishing boat. We deep-sequenced nasal swabs to characterize the within-host viral population of infected crew members, using experimental duplicates and strict computational filters to ensure accurate variant calling. We find that within-host viral diversity is low in infected crew members. The mutations that did fix in some crew members during the outbreak are not observed at detectable frequencies in any of the sampled crew members in which they are not fixed, suggesting viral evolution involves occasional fixation of low-frequency mutations during transmission rather than persistent maintenance of within-host viral diversity. Overall, our results show that strong transmission bottlenecks dominate viral evolution even during a superspreading event with a very high attack rate.


Supplementary Figure 2. Comparison between different variant calling methods.
An UpSet plot shows the overlap in the sets of SNPs called by four variant calling methods: varscan2, lofreq, ivar, and our custom Python script using pysam (Citations). To be included in the set for a given variant caller, a variant had to be covered by more than 100X reads and present at greater than 2% frequency. The majority of variants are called by all four methods. No variants are called by our custom script that are not identified by at least one other method.
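The comparison described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the record layout (`pos`, `ref`, `alt`, `depth`, `freq`), the helper names, and the toy calls are all hypothetical, and only the two thresholds come from the caption.

```python
MIN_DEPTH = 100   # caption: covered by more than 100X reads
MIN_FREQ = 0.02   # caption: present at greater than 2% frequency


def passing_variants(calls):
    """Return the set of (pos, ref, alt) keys that pass both filters."""
    return {
        (c["pos"], c["ref"], c["alt"])
        for c in calls
        if c["depth"] > MIN_DEPTH and c["freq"] > MIN_FREQ
    }


def exclusive_intersections(sets_by_caller):
    """Count variants by the exact combination of callers reporting them.

    These counts are what the bars of an UpSet plot display.
    """
    counts = {}
    all_variants = set().union(*sets_by_caller.values())
    for variant in all_variants:
        combo = tuple(sorted(
            name for name, called in sets_by_caller.items() if variant in called
        ))
        counts[combo] = counts.get(combo, 0) + 1
    return counts


def call(pos, ref, alt, depth, freq):
    """Build one hypothetical variant record."""
    return {"pos": pos, "ref": ref, "alt": alt, "depth": depth, "freq": freq}


# Made-up calls for illustration only; these are not data from the study.
a = call(100, "C", "T", 500, 0.10)
b = call(200, "G", "A", 300, 0.05)
c = call(300, "A", "G", 150, 0.03)
d = call(400, "T", "C", 80, 0.20)   # fails the depth filter

raw_calls = {
    "varscan2": [a, b],
    "lofreq": [a, b, c],
    "ivar": [a],
    "custom": [a, b, d],
}
sets_by_caller = {name: passing_variants(v) for name, v in raw_calls.items()}
counts = exclusive_intersections(sets_by_caller)

# Variants reported by the custom script and by no other caller.
custom_only = counts.get(("custom",), 0)
```

In this toy input `custom_only` is zero, which is the kind of check that supports the caption's last sentence.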
bioRxiv preprint doi: https://doi.org/10.1101/2022.02.09.479546; this version posted February 9, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Supplementary Figure 4. There is no discernible pattern of minor variants in the genome.
Plot showing every minor variant (below 50% allele frequency) identified across the crew members that passed our quality filters. We included variants present at more than 2% frequency at sites covered by more than 100 reads.
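The frequency and coverage cutoffs quoted in the supplementary captions can be written as a small classifier. The sketch below is hypothetical (the function name and the read-count inputs are assumptions). It treats minor variants as those between 2% and 50% allele frequency at more than 100 reads, and fixed variants as those at 98% or more of at least 100 reads.

```python
def classify_variant(alt_reads, depth):
    """Classify one site as 'fixed', 'minor', or None.

    alt_reads: number of reads supporting the alternate allele
    depth:     total number of reads covering the site
    """
    if depth <= 0:
        return None
    freq = alt_reads / depth
    # Fixed: present in 98% or more of at least 100 reads.
    if depth >= 100 and freq >= 0.98:
        return "fixed"
    # Minor: above 2% and below 50% frequency, with more than 100 reads.
    if depth > 100 and 0.02 < freq < 0.50:
        return "minor"
    # Anything else (low coverage, below 2%, or between 50% and 98%)
    # falls outside the categories defined in the captions.
    return None


examples = {
    "5% at 200x": classify_variant(10, 200),
    "99% at 200x": classify_variant(198, 200),
    "1.5% at 200x": classify_variant(3, 200),
    "62.5% at 80x": classify_variant(50, 80),
}
```

For example, 10 alternate reads out of 200 gives a frequency of 5%, which counts as a minor variant, while 3 out of 200 (1.5%) falls below the 2% cutoff.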

Supplementary Figure 5. Distribution of fixed mutations in the genome.
Plot showing fixed variants identified across the crew members that passed our quality controls. We included variants present at 98% or greater frequency at sites covered by at least 100 reads. Mutations in the 5' and 3' UTRs are excluded from this plot.

Supplementary Figure 6. Low-frequency shared variants are present in the non-boat control specimen.
Four variants shared at low frequency between crew members are also detected in a specimen not collected from the boat but included as a control in both sequencing runs (Specimen 10136). This observation suggests that these are not de novo low-frequency variants that arose on the boat and spread between the crew, but rather sequencing contamination or variant calling errors common to samples from the two sequencing runs.
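The control check described in this caption amounts to intersecting the variants shared between crew members with those seen in the control specimen. The sketch below is a hypothetical illustration: the function names, the crew sample labels, the variant keys, and all frequencies are invented, and only the control specimen label 10136 comes from the caption.

```python
def shared_low_frequency(variants_by_sample, min_samples=2, max_freq=0.5):
    """Return variants seen below max_freq in at least min_samples samples.

    variants_by_sample maps sample name -> {variant_key: allele_frequency}.
    """
    counts = {}
    for variants in variants_by_sample.values():
        for key, freq in variants.items():
            if freq < max_freq:
                counts[key] = counts.get(key, 0) + 1
    return {key for key, n in counts.items() if n >= min_samples}


def suspect_variants(crew_variants, control_variants):
    """Shared low-frequency crew variants that also appear in the control."""
    return shared_low_frequency(crew_variants) & set(control_variants)


# Made-up inputs for illustration only; these are not data from the study.
crew = {
    "crew_A": {("C", 1000, "T"): 0.03, ("G", 2000, "A"): 0.99},
    "crew_B": {("C", 1000, "T"): 0.04, ("A", 3000, "G"): 0.05},
    "crew_C": {("A", 3000, "G"): 0.06},
}
control_10136 = {("C", 1000, "T"): 0.03}

suspects = suspect_variants(crew, control_10136)
```

A shared variant that also turns up in a specimen from outside the outbreak is more plausibly a run-level artifact than a transmitted variant, which is the reasoning the caption applies.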