High-throughput DNA extraction and cost-effective miniaturized metagenome and amplicon library preparation of soil samples for DNA sequencing

Reductions in sequencing costs have enabled widespread use of shotgun metagenomics and amplicon sequencing, which have drastically improved our understanding of the microbial world. However, large sequencing projects are now hampered by the cost of library preparation and low sample throughput. Here, we benchmarked three high-throughput DNA extraction methods: ZymoBIOMICS™ 96 MagBead DNA Kit, MP Biomedicals™ FastDNA™-96 Soil Microbe DNA Kit, and DNeasy® 96 PowerSoil® Pro QIAcube® HT Kit. The DNA extractions were evaluated based on length, quality, quantity, and the observed microbial community across five diverse soil types. DNA extraction of all soil types was successful for all kits; however, the DNeasy® 96 PowerSoil® Pro QIAcube® HT Kit excelled across all performance parameters. We further used the nanoliter dispensing robot I.DOT One to miniaturize Illumina amplicon and metagenomic library preparation volumes by a factor of 5 and 10, respectively, with no significant impact on the observed microbial communities. With these protocols, DNA extraction, metagenomic library preparation, and amplicon library preparation for one 96-well plate take approximately 3, 5, and 6 hours, respectively. Furthermore, the miniaturization of amplicon and metagenome library preparation reduces the chemical and plastic costs from 5.0 to 3.6 USD and from 59 to 7.3 USD per sample, respectively.


Introduction
The drastic reduction in sequencing costs has enabled more researchers to utilize next-generation sequencing in their field of study (1). In the field of microbial ecology especially, the reduced costs have enabled an increase in the scope of the projects, as thousands of samples are required to understand the diversity of microbial habitats (2-5). The ongoing reductions in sequencing costs mean that large sequencing projects are now limited by the costs associated with hands-on time and sample preparation. However, new automated or semi-automated workflows utilizing liquid handlers and drop dispensing technology seem promising for reducing both labor time and reaction volumes, ultimately reducing costs (1,6-8).

Soil samples are especially problematic for high-throughput (HT) DNA extraction workflows due to the diversity of soil properties (1,9,10). Furthermore, the majority of the proposed protocols are not easy to convert to an HT format due to steps that are either difficult to automate, time-consuming, or include hazardous substances. Although several commercial soil-specific HT DNA extraction kits are available (Table 1), these have not been independently tested on a diverse range of soil types.

Table 1. Commonly used and commercially available kits for DNA extraction from soil. Extraction kits in bold were compared. High-throughput equipment refers to available automated solutions for the HT solution. LT: Low-throughput, HT: High-throughput.

… which allowed for multiplexing of a large number of samples, was the Nextera XT DNA library preparation kit (43-45). The Nextera XT library preparation protocols for small genomes, PCR amplicons, plasmids, or cDNA have undergone several transformations since the first release of the Nextera XT protocol in 2012. The first Nextera XT protocols were easy to use; however, the expensive reagents were limiting for large sequencing projects (46). To reduce the library preparation costs, earlier work focused on diluting the expensive reagents or replacing them with cheaper alternatives (47). However, in 2017 the Nextera Flex (renamed to Illumina DNA Prep) protocol was introduced, which utilizes bead-linked transposases, rendering the previous cost-effective protocols less useful. Previous work has shown that it is also possible to dilute the reagents in the Nextera Flex kit (46); however, another strategy for reducing the overall costs without tampering with the reagents is miniaturization.

Here we present and benchmark a complete HT workflow from DNA extraction to miniaturized Illumina amplicon or metagenome library (Fig 1). The three HT DNA extraction kits were benchmarked on five different soil types (S1 Table).

Generally, the kits were able to extract DNA from all soil types; however, low amounts were extracted from Beach Sand, likely due to its low biomass relative to the other soil types (S2 Table).

The microbial community profiles were similar across all kits (Fig 2A). PCA revealed that the communities clustered according to soil type, not DNA extraction kit. Based on PERMANOVA, 2.8 % of the variance was explained by DNA extraction kit (p<0.001) and 90.6 % by soil type (p<0.001) (Fig 2B). When stratifying by soil type, PCA revealed that samples clustered by DNA extraction kit (Fig 2C, S2 Fig).

Based on the DNA extraction characteristics, the high amplicon library success rate for all soil types, and the consistent community profile, the PowerSoil Pro HT DNA extraction kit was selected for further optimization. Specifically, the effect of bead-beating time and intensity on the observed community structure and on DNA quantity and length was investigated. Both bead-beating time and intensity affected the DNA yield and observed microbial community; however, little difference was observed between six minutes of bead-beating at 1600 or 1800 RPM. Increasing the bead-beating intensity to 1800 RPM did, however, increase fragmentation; therefore, a total of six minutes of bead-beating at 1600 RPM was chosen (S2 File). Reducing the input amount from 125 mg to 50 mg had no effect on the observed microbial community (S2 File). The PowerSoil Pro HT kit can furthermore be semi-automated with the QIAcube HT system to reduce the hands-on time. For our projects, the DNA extraction cost per sample based on chemicals and disposables for PowerSoil Pro HT was 7.7 USD.

2. Miniaturized Illumina Amplicon Library Protocol

Amplicon libraries were successfully prepared for all soil types (S3 Table). Shannon diversity index was not significantly affected by the library volume (0.3 % variance explained, p=0.33, ANOVA, n=30) when blocking the contribution from soil type (p<0.001) and the interaction between soil type and library volume (p=0.04). The data violate the assumption of normal distribution but not the assumption of homoscedasticity for both grouping factors. ANOVA based on ranks yielded very similar results but changed the p-value of the interaction to p=0.14.
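As an illustration of the Shannon diversity index used in this comparison, the following minimal sketch computes it from a vector of per-taxon read counts. The natural logarithm, the function name `shannon_index`, and the example counts are assumptions made for illustration; they are not taken from the study's analysis code or data.

```python
import math


def shannon_index(counts):
    """Return Shannon diversity H' = -sum(p_i * ln(p_i)).

    Taxa with zero counts are skipped, since p * ln(p) tends to 0 as p -> 0.
    The natural logarithm is an assumption; other bases only rescale H'.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical ASV counts for two samples (not data from this study).
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(shannon_index(even_sample))    # evenly distributed community
print(shannon_index(skewed_sample))  # community dominated by one taxon
```

An evenly distributed community gives the maximum value for a given number of taxa (ln of the taxon count), whereas a community dominated by a single taxon gives a value close to zero.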

Community profiles at the genus level were similar between standard and miniaturized amplicon libraries (Fig 3A). When conducting a differential abundance analysis of all ASVs between the standard and miniaturized protocol, a total of 19 ASVs were found to be differentially abundant (Fig 3B). In total, … Table) except one library for the Clay soil using the miniaturized protocol.

All reads identified as 16S fragments were aggregated to the genus level (see methods). Based on ANOVA on ranks, Shannon diversity index was not significantly affected by the library volume (0 % variance explained, p=1). Soil type explained 80.2 % of the variance (p<0.001). Based on ANOVA on ranks, Bray-Curtis dissimilarity between replicates was affected by the library volume (6.8 % variance explained, p=0.02), being slightly lower in the miniaturized protocol (TukeyHSD, p=0.02). Soil type again accounted for the largest proportion of the variance (65 % variance explained, p<0.001). Bray-Curtis dissimilarity for replicates within protocols was not different from the dissimilarity of replicates between protocols for any soil type (S4 Table).
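The replicate comparison above relies on Bray-Curtis dissimilarity computed on Hellinger-transformed abundances. A minimal sketch of both steps is given below, assuming the common definitions (Hellinger: square root of relative abundance; Bray-Curtis: summed absolute differences divided by summed abundances). The function names and example counts are hypothetical and are not taken from the study's analysis code.

```python
import math


def hellinger(counts):
    """Hellinger transformation: square root of each taxon's relative abundance."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same taxa in the same order")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical genus-level counts for two replicates (not data from this study).
replicate_a = [120, 30, 50, 0]
replicate_b = [100, 45, 40, 15]

dissimilarity = bray_curtis(hellinger(replicate_a), hellinger(replicate_b))
print(dissimilarity)  # 0 means identical profiles, 1 means no shared taxa
```

The transformation is applied per sample before the pairwise comparison, so differences in sequencing depth between replicates do not affect the result.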

Comparison of the relative abundance at the genus level revealed very similar profiles between the standard and miniaturized protocol, regardless of soil type (Fig 4A). None of the identified genera were found to be differentially abundant between the protocols (

Hence, to facilitate this it was paramount that sample preparation was converted to a HT setting and …

The microbial community could successfully be analyzed with miniaturized reaction volumes for both amplicons and metagenomes. Metagenomes could be prepared in a 1:10 reaction volume, whereas amplicon reaction volumes could be miniaturized by a factor of five. A downside to the miniaturized reaction volumes was the entry cost of the nanoliter drop dispensing platforms, ranging from 100,000 to 300,000 USD, as well as expensive servicing fees and highly priced plastic consumables. However, in large projects, the entry cost was small compared to the reduction in library preparation cost and hands-on time. In our case, the cost savings of miniaturizing the metagenome library protocol exceeded the price of the I.DOT One after ~2000 samples.

Table. General statistics from DNA extraction, library preparation, and community profiles based on 16S rRNA amplicon data. All samples were rarefied to 11,468 reads (the lowest read count in any sample with more than 10,000 total reads). ASVs not exceeding 0.1 % relative abundance in at least one sample were removed prior to Hellinger transformation and calculation of Bray-Curtis dissimilarity. Numbers represent mean and numbers in parentheses represent standard deviation. *DNA concentration could not be determined due to interference from humic substances.

Table. General library characteristics of the miniaturized and standard amplicon protocol. Microbial community characteristics of the standard and miniaturized amplicon library protocol for five different soil types. All samples were rarefied to 3,511 reads (the lowest read count in any sample with more than 3,000 total reads). ASVs not exceeding 0.1 % relative abundance in at least one sample were removed prior to Hellinger transformation and calculation of Bray-Curtis dissimilarity. Numbers represent mean (n="Libraries") and numbers in parentheses represent standard deviation. The number of comparisons for "Bray-Curtis dissimilarity between protocols" was 9.

Table. General library characteristics of the miniaturized and standard metagenome protocol. Microbial community characteristics of the standard and miniaturized metagenome library protocol for five different soil types. All samples were rarefied to 3,665 reads (the lowest read count in any sample with more than 3,000 total reads). Genera not exceeding 0.1 % relative abundance in at least one sample were removed prior to Hellinger transformation and calculation of Bray-Curtis dissimilarity. Numbers represent mean (n="Libraries") and numbers in parentheses represent standard deviation. The number of comparisons for "Bray-Curtis dissimilarity between protocols" was 9, except for Clay (n=6).
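The break-even estimate stated in the discussion (cost savings exceeding the instrument price after ~2000 samples) can be checked with simple arithmetic using the per-sample metagenome library costs reported in the abstract (59 USD standard, 7.3 USD miniaturized). The sketch below is illustrative only: the instrument prices passed in are the lower and upper bounds of the platform price range quoted in the text, not the actual price of the I.DOT One, and the function name is hypothetical. Servicing fees and consumables beyond the per-sample costs are ignored.

```python
import math


def break_even_samples(instrument_price_usd, standard_cost_usd, miniaturized_cost_usd):
    """Return the smallest number of samples whose savings cover the instrument price."""
    saving_per_sample = standard_cost_usd - miniaturized_cost_usd
    if saving_per_sample <= 0:
        raise ValueError("miniaturized cost must be lower than standard cost")
    return math.ceil(instrument_price_usd / saving_per_sample)


# Per-sample metagenome library costs reported in the text (USD).
STANDARD_COST = 59.0
MINIATURIZED_COST = 7.3

# Assumed instrument prices: bounds of the quoted 100,000 to 300,000 USD range.
for assumed_price in (100_000, 300_000):
    n = break_even_samples(assumed_price, STANDARD_COST, MINIATURIZED_COST)
    print(f"Assumed price {assumed_price} USD -> break-even after {n} samples")
```

With a saving of 51.7 USD per sample, 2000 samples correspond to roughly 103,400 USD saved, which is consistent with the reported break-even point if the instrument price lies near the lower end of the quoted range.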