Hackflex: low cost Illumina sequencing library construction for high sample counts

We developed Hackflex, a low-cost method for producing Illumina-compatible sequencing libraries that allows up to 11 times more libraries for high-throughput Illumina sequencing to be generated at a fixed cost. Quality of library preparation was tested by constructing libraries from E. coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex, or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. We demonstrated that Hackflex can produce high-quality libraries and yields highly uniform coverage, equivalent to the standard Nextera Flex kit. Using Hackflex, we achieved a per-sample reagent cost for library preparation of A$8.66, which is 8.23-fold lower than that of the standard Nextera Flex protocol at advertised retail price. An additional simple modification to the protocol enables a further price reduction, to up to 11-fold lower than the standard protocol, or about A$6.50/sample. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programs where sequencing large numbers of libraries is beneficial.


INTRODUCTION
The original Nextera protocol provided an easy-to-use and flexible means of generating Illumina-compatible shotgun libraries. When applied at scale, however, the Nextera reagents at list price could become prohibitively expensive for projects with large sample counts and low sequencing requirements per sample. Previous work demonstrated that, by diluting the Nextera reagents and using custom buffers, the per-sample library cost could be greatly reduced 1,2, thereby facilitating the processing of large sample batches. In 2017 Illumina introduced a new type of Nextera kit, called Nextera Flex, and subsequently discontinued the original Nextera kits for which dilution strategies had been developed. The Nextera Flex kits use bead-linked transposases to fragment and tag DNA with adapter sequences. The tagmentation technique allows the incorporation of defined adapter sequences, enabling barcoded primers to anneal and be extended through tagmented DNA fragments in subsequent PCR amplification and sequencing reactions 3. The new Nextera Flex kits have been shown to yield greatly improved data quality relative to the original Nextera and Nextera XT kits 4. Unfortunately, due to several chemistry differences, the existing dilution-based protocols cannot be directly applied to the new Nextera Flex kits. In this work, we introduce an ultra-low-cost variant of the Nextera Flex protocol that we call "Hackflex". In addition to diluting the bead-linked transposases, our protocol replaces all other reagents with components readily available from third-party sources, driving the cost per sample down to A$8.66 (Fig. 1; S1). In this study we present our Hackflex protocol and compare the quality of the resulting data to that of the standard Nextera Flex kit protocol in terms of uniformity of coverage, sequence accuracy, GC coverage bias, and uniformity of barcode counts.

Genomic DNA preparation
Genomic DNA from three different bacteria was used in this study: Escherichia coli strain MG1655, Staphylococcus aureus strain ATCC25923 and Pseudomonas aeruginosa strain PAO1. For E. coli MG1655, the reference genome used in this study differs from that of the original E. coli MG1655 strain sequenced by Blattner et al 4, most notably in that it contains a pBAD plasmid. For this reason, we generated an independent reference assembly of our strain using both Illumina and Oxford Nanopore sequencing data (see below). High molecular weight gDNA was extracted from freshly cultivated cells of this strain using the Qiagen DNeasy UltraClean Microbial Kit according to the manufacturer's instructions. Briefly, twenty milliliters of overnight culture was centrifuged at 3200 g for 5 min to obtain a cell pellet. The pellet was washed with 5 mL sterile 0.9% sodium chloride solution and then resuspended in 300 μL PowerBead solution before continuing with the manufacturer's protocol. The final gDNA was eluted with 50 μL elution buffer pre-warmed to 42°C. The concentration of isolated DNA was measured using Qubit 2.0 (Thermo Fisher Scientific, USA), and the DNA was diluted in water.

Barcode design
Sample index (barcode) design using a previously introduced method 5 yielded a set of 96 sequences of 8 bp each (S2). Each i5 barcode was the reverse complement of the corresponding i7 barcode (tandem complement design). Barcode sequences were designed such that no barcode contained 3 or more identical bases in a row; the mean GC content was 0.499 (min 0.125, max 0.875) (S2). Note that tandem complement barcode combinations cannot be used on the Illumina NovaSeq system; therefore, only 9120 of the 9216 possible barcode combinations are viable when creating libraries intended for sequencing on that system. See http://sapac.support.illumina.com/bulletins/2017/08/recommended-strategies-for-unique-dual-index-designs.html for further details on this limitation of the NovaSeq system.
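The design rules above can be expressed as a few simple checks. The following Python sketch is illustrative only: the example barcode is hypothetical (it is not taken from S2), and the function names are our own rather than part of the barcode design method cited above. It shows the reverse complement pairing, the homopolymer rule, the GC fraction, and the arithmetic behind the 9120 viable combinations (96 x 96 pairings minus the 96 tandem complement pairs).

```python
# Illustrative checks for the barcode design rules described in the text.
# The example sequence is hypothetical, not one of the 96 barcodes in S2.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def has_homopolymer(seq: str, run_length: int = 3) -> bool:
    """True if any base occurs run_length or more times in a row."""
    return any(base * run_length in seq for base in "ACGT")


def viable_combinations(n_barcodes: int, allow_tandem_complement: bool) -> int:
    """Count i7/i5 pairings, optionally excluding the tandem complement pairs."""
    total = n_barcodes * n_barcodes
    return total if allow_tandem_complement else total - n_barcodes


i7 = "ACGTCAGT"              # hypothetical 8 bp i7 barcode
i5 = reverse_complement(i7)  # its tandem complement i5 partner
print(i5, gc_fraction(i7), has_homopolymer(i7))
print(viable_combinations(96, True), viable_combinations(96, False))
```

With 8 bp barcodes, the reported GC extremes of 0.125 and 0.875 correspond to barcodes containing one and seven G/C bases, respectively.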

Nextera Flex sequencing libraries preparation
We first created a library using the standard Nextera Flex protocol (referred to as "Standard Flex"). The Standard Flex library was constructed using all standard kit reagents from the Nextera DNA Flex Library Prep kit (Illumina, USA), following the manufacturer's protocol. Briefly, 200 ng input DNA in 10 μL nuclease-free water was tagmented by adding 10 μL of Bead-Linked Transposase (BLT) and 10 μL of TB1 solution. The sample was incubated in the thermocycler at 55°C for 15 min, then held at 10°C. After the incubation, 10 μL of TSB solution was added to the tagmentation reaction, and the sample was incubated at 37°C for 15 min, then held at 10°C. The sample was then transferred to the magnet to isolate the DNA-BLT complex, which was washed three times with 100 μL of TWB solution. The PCR reaction for library amplification was prepared by mixing 20 μL of Enhanced PCR Mix (EPM) with 20 μL of nuclease-free water. The mixture was added to the DNA-BLT complex, followed by 5 μL each of the i5 and i7 adapters, for a final PCR reaction volume of 50 μL. The PCR conditions were 68°C for 3 min, 98°C for 3 min, followed by 5 cycles of [98°C for 30 sec, 62°C for 30 sec, 68°C for 2 min], then 68°C for 1 min and a hold at 10°C. After library amplification, the sample tube was placed on the magnet. Forty-five μL of the PCR supernatant was mixed with 85 μL of diluted SPB (45 μL of SPB solution diluted in 40 μL of RSB solution) and incubated at room temperature for 5 min. The sample tube was then placed on the magnet, and 125 μL of supernatant was transferred into a new sample tube containing 15 μL of undiluted SPB. The sample was mixed and incubated at room temperature for 5 min. Then, the tube was placed on the magnet. The supernatant was discarded, and the beads were washed twice with 200 μL of fresh 80% ethanol. The beads were left to air-dry at room temperature and were then resuspended in 32 μL of RSB solution.
The beads were incubated at room temperature for 2 min. The sample tube was placed on the magnet, and finally 30 μL of eluted library was transferred into a new sample tube. The concentration and fragment size of the eluted library were measured using the Qubit High Sensitivity dsDNA kit (Thermo Fisher Scientific, USA) and the High Sensitivity Bioanalyzer chip (Agilent, USA), respectively. We also created a library using 1:50 diluted BLT beads (referred to as "1:50 Flex"). The 1:50 Flex library was obtained by following the standard Nextera Flex protocol with the standard reagents, except that the BLT beads were diluted 1:50 with nuclease-free water prior to use. Only 10 ng of input DNA was used, and the number of PCR cycles for library amplification was increased to 12. Both Standard Flex and 1:50 Flex libraries were purified, pooled in equal volumes, diluted to 4 nM and quality-checked on the Bioanalyzer (Agilent Technologies, USA). The pool was sequenced on the Illumina MiSeq platform (2 x 300 bp) using a MiSeq Reagent Kit V3 (600 cycles PE) cartridge (Illumina, USA).

Tagmentation and Hackflex sequencing library preparation
For Hackflex libraries, ninety-six libraries were prepared using laboratory-made reagents and adapted reagents from the Nextera DNA Flex Library Prep kit (Illumina; S1). All incubation temperatures and times used in the Hackflex protocol were the same as in the Standard Flex protocol, except for the PCR amplification step. Briefly, BLT beads were diluted 1:50 with nuclease-free water (Invitrogen). 10 ng of input gDNA in 10 μL ultrapure water (Invitrogen) was mixed with 10 μL of 1:50 diluted BLT and 25 μL of 2x laboratory-made tagmentation buffer (20 mM Tris (pH 7.6) (Chem-Supply), 20 mM MgCl2 (Sigma), and 50% (v/v) dimethylformamide (DMF) (Sigma)). The final volume of the tagmentation reaction was 45 μL. After tagmentation, 10 μL of 0.2% sodium dodecyl sulphate (SDS; Sigma) was added to the sample to stop tagmentation, instead of TSB. Then, instead of TWB, the beads were washed three times with 100 μL of washing solution (10% polyethylene glycol (PEG) 8000 (Sigma) and 0.25 M NaCl (Chem-Supply) in Tris-EDTA buffer (TE) (Sigma), filtered through a 0.22 μm MF-Millipore™ membrane). For library amplification, EPM master mix was replaced with the PrimeSTAR GXL DNA Polymerase kit (Takara), following the manufacturer's protocol. Each PCR reaction contained 10 μL of 5x GXL buffer, 4 μL of 25 mM dNTPs, 2 μL of PrimeSTAR GXL polymerase and 19 μL of nuclease-free water. The PCR mix was added to the washed BLT beads. Then, 5 μL each of the custom-synthesized i5 and i7 Illumina adapter oligos, supplied in 96-well plates (i7: IDT plate#: 11680765; i5: IDT plate#: 11680754) (S2), were added to each reaction to a final concentration of 0.555 μM. The final volume of the PCR reaction was 45 μL. Library amplification was performed under conditions that differed from the manufacturer's recommended protocol, as follows: 3 min at 68°C, 3 min at 98°C, 12 cycles of [45 sec at 98°C, 30 sec at 62°C, 2 min at 68°C], 1 min at 68°C and hold at 10°C.
Size selection and purification of the library followed, with reagents SPB and RSB replaced by equal volumes of SPRIselect beads (Beckman Coulter) and ultrapure water (Invitrogen), respectively. Reactions were then pooled in equal volumes. The concentration of the pooled library was measured with the Qubit HS dsDNA kit (Thermo Fisher Scientific). Fragment size distribution was assessed using the High Sensitivity DNA kit on the Bioanalyzer (Agilent Technologies). The final library was diluted and denatured following the manufacturer's instructions, then 4 pM of the pooled library with 5% PhiX v3 control (Illumina) was loaded onto an Illumina MiSeq instrument and sequenced using MiSeq V2 chemistry, generating 2 x 150 bp paired-end reads with a cluster density of 471 K/mm² (92% of clusters passing filter).
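As a consistency check on the Hackflex reaction volumes given above, the following Python sketch sums the listed components and applies the dilution relation C1 x V1 = C2 x V2 to the adapter oligos. The oligo stock concentration is not stated in the text; the value computed here is inferred from the reported final concentration and should be treated as an assumption.

```python
# Volumes (in microlitres) as listed in the Hackflex protocol text.
tagmentation_ul = {"gDNA": 10, "1:50 BLT": 10, "2x tagmentation buffer": 25}
pcr_ul = {
    "5x GXL buffer": 10,
    "dNTPs": 4,
    "PrimeSTAR GXL polymerase": 2,
    "nuclease-free water": 19,
    "i5 oligo": 5,
    "i7 oligo": 5,
}


def final_concentration(stock_conc: float, stock_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2."""
    return stock_conc * stock_ul / total_ul


def implied_stock(final_conc: float, stock_ul: float, total_ul: float) -> float:
    """C1 = C2 * V2 / V1 (inferred, not reported in the text)."""
    return final_conc * total_ul / stock_ul


tagmentation_total = sum(tagmentation_ul.values())
pcr_total = sum(pcr_ul.values())

# Stock concentration (uM) implied by 5 uL of oligo giving 0.555 uM in the PCR.
oligo_stock_um = implied_stock(0.555, pcr_ul["i5 oligo"], pcr_total)

print(tagmentation_total, pcr_total, round(oligo_stock_um, 2))
```

Both reaction volumes sum to the 45 μL stated in the protocol, and the reported final oligo concentration is consistent with a stock of approximately 5 μM.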

Preparation of additional Illumina libraries
During development of the Hackflex protocol (described above), we measured the effect on library yield and GC coverage bias of different polymerases used in the library amplification step, in particular the standard EPM master mix included in the Nextera DNA Flex kit and KAPA Master Mix (2xKAPA HiFi HotStart ReadyMix #KK2602; KAPA Biosystems, USA). We measured the effect using genomic DNA from S. aureus strain ATCC 25923 (Sa) and P. aeruginosa strain PAO1 (Pa). To do this, four different types of libraries were prepared for each gDNA sample: 1) Standard Flex with EPM master mix for library amplification (SF_1), 2) 1:50 Flex with EPM master mix (SF_1:50), 3) Hackflex with KAPA master mix (KAPA_1:50), 4) Hackflex with 1:20 diluted BLT beads and KAPA master mix (KAPA_1:20). There were eight libraries in total: Sa_SF_1, Sa_SF_1:50, Sa_KAPA_1:50, Sa_KAPA_1:20, Pa_SF_1, Pa_SF_1:50, Pa_KAPA_1:50 and Pa_KAPA_1:20. The name of each library indicates the source of gDNA and the library preparation method. For example, Sa_SF_1 was the library generated from S. aureus ATCC25923 using Standard Flex with EPM master mix, and Pa_SF_1:50 was the library generated from P. aeruginosa PAO1 using 1:50 Flex with EPM master mix. Each library preparation condition is shown schematically in Supplementary Table 3 (S3). All additional libraries, except Sa_SF_1 and Pa_SF_1, were prepared using 10 ng input gDNA and 12 PCR cycles for library amplification. For Sa_SF_1 and Pa_SF_1, the libraries were prepared using 200 ng of input DNA and 5 PCR cycles for library amplification. After library preparation, the concentration of each library was measured using the Qubit HS dsDNA kit (Thermo Fisher Scientific) and the fragment size was analyzed with the High Sensitivity DNA kit on the Bioanalyzer (Agilent Technologies). Libraries were sequenced on an Illumina MiSeq instrument using MiSeq V3 chemistry, generating 2 x 300 bp paired-end reads.

Nanopore library preparation and sequencing
For long-read MinION sequencing, libraries were prepared using the 1D ligation sequencing kit (SQK-LSK108) from Oxford Nanopore Technologies (ONT) with modifications to the standard ONT protocol as described previously 6. The sample was barcoded using the Native Barcoding Expansion kit (EXP-NBD103), and barcoded templates were then pooled with two other samples from an unrelated project. The final library was loaded onto an ONT MinION instrument with a FLO-MIN106 (R9.4) flow cell and run for 48 h as per the manufacturer's instructions. Live base-calling was not performed during the run. Nanopore sequence data were combined with Illumina sequence data into a hybrid genome assembly using the Unicycler software, version 0.46. Unicycler was run with default parameters in "normal" mode.

Data analysis
All the data analysis methods are described below and represented schematically in Supplementary Figure S4 ( S4 ).

Barcode demultiplexing
Hackflex reads were demultiplexed with Bcl2fastq (Bcl2Fastq 2.18.0.12, Illumina, Inc.) software with default settings, allowing one mismatch per index. Barcode cross-contamination was quantified with the PhyloSift command demux . Barcode counts were retrieved from the demultiplexing statistics output of Bcl2fastq and histograms representing barcode distribution were generated with R Studio, version 1.1.463 (RStudio: Integrated Development Environment for R, Boston, Massachusetts).
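To illustrate what "allowing one mismatch per index" means in practice, the following Python sketch assigns an observed index read to a sample when exactly one known barcode lies within one mismatch (Hamming distance of at most 1). This is a simplified illustration, not the Bcl2fastq implementation; the barcode sequences and sample names are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_index(observed: str, known: Dict[str, str],
                 max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches within max_mismatches.

    Returns None when no barcode, or more than one barcode, is within range.
    """
    hits = [
        sample for sample, barcode in known.items()
        if len(barcode) == len(observed)
        and hamming(observed, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcodes, for illustration only.
known_barcodes = {"sample_1": "ACGTCAGT", "sample_2": "TGCATGCA"}
print(assign_index("ACGTCAGA", known_barcodes))  # one mismatch to sample_1
print(assign_index("AAGTCAGA", known_barcodes))  # two mismatches, unassigned
```

Tolerating one mismatch is only unambiguous when all barcodes in a set differ from each other at three or more positions, which is one reason barcode sets are designed with a minimum pairwise distance.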

Processing of libraries before mapping
Raw reads were assessed for quality with FastQC version 0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and with the Bioconductor package Rsubread 7. PHRED scores were plotted in RStudio. Trimming and normalization of the libraries were performed using the bbtools package (http://jgi.doe.gov/data-and-tools/bbtools). Due to operational constraints, Standard Flex and 1:50 Flex libraries were sequenced on a different flow cell than the Hackflex library, producing 300 bp reads instead of 150 bp reads (as with Hackflex).

Short read mapping and coverage analysis
Short read mapping of Standard Flex, 1:50 Flex and Hackflex was performed against all 10 contigs of the fragmented reference assembly with samtools 8 version 1.7 (http://samtools.sourceforge.net/) and the bwa-mem 9 (http://bio-bwa.sourceforge.net/) alignment software (S4). After mapping, PCR duplicates were removed using the samtools command markdup. For simplicity, only coverage data from the largest hybrid assembly contig (4.46 Mbp, 96.1% of the total assembly) were analyzed and are taken to be representative of the genomic coverage as a whole. To avoid potential problems associated with mapping reads near contig boundaries, coverage data were trimmed to remove 200 nt from each end of the contig. Reads that failed mapping were analysed with Kraken 10, using the command krakenhll and default parameters (S4). Histograms representing the coverage distribution were generated from the mapped depth of coverage computed by samtools mpileup for each position on the E. coli MG1655 reference genome, plotting the frequency of each level of coverage with RStudio, version 1.1.463 (RStudio: Integrated Development Environment for R, Boston, Massachusetts). Low coverage regions were visualized with the Integrative Genomics Viewer (IGV) 11 (S4). Short read mapping and coverage analysis were performed for 4 additional libraries in the same manner as described above. Libraries Sa_SF_1:50 and Sa_SF_1 were mapped against the S. aureus ATCC 25923 reference genome (CP009361.1). Libraries Pa_SF_1:50 and Pa_SF_1 were mapped against the P. aeruginosa PAO1 reference genome, which was obtained by co-assembling libraries 3 and 4 with A5-miseq 12.
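The coverage histogram with end trimming can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the input is taken to be three-column, tab-separated per-position depth (contig, 1-based position, depth), as produced for example by samtools depth, rather than raw mpileup output, and the plotting itself (done in R in this study) is omitted. The function name and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def coverage_histogram(depth_lines: Iterable[str], contig_length: int,
                       trim: int = 200) -> Counter:
    """Count how many positions have each depth, ignoring contig ends.

    Positions are 1-based; the first and last `trim` positions are skipped.
    """
    histogram = Counter()
    for line in depth_lines:
        _contig, position, depth = line.rstrip("\n").split("\t")
        position = int(position)
        if trim < position <= contig_length - trim:
            histogram[int(depth)] += 1
    return histogram


# Toy example: a 10 nt contig, trimming 2 nt from each end.
toy_depths = [1, 1, 5, 5, 6, 6, 5, 5, 1, 1]
toy_lines = [f"contig_1\t{pos}\t{d}" for pos, d in enumerate(toy_depths, start=1)]
print(dict(coverage_histogram(toy_lines, contig_length=10, trim=2)))
```

In the toy example the low-depth positions at both ends are excluded, which mirrors the rationale given above for trimming 200 nt from each contig end.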

Data availability
All sequence data has been deposited in NCBI under accession number PRJNA549801.

Hackflex library preparation and sequencing
Our customised library preparation protocol, Hackflex, involves several modifications to the Nextera Flex method, including the use of a 1:50 dilution of the bead-linked transposase and the replacement of several kit components with alternative reagents, to greatly expand the total number of libraries that can be produced from a single kit. To evaluate the performance of the Hackflex protocol, we prepared sequencing libraries in parallel with the standard Nextera Flex protocol (which we refer to as "Standard Flex") and with an adapted version of the Nextera Flex protocol that uses the diluted transposase beads but otherwise standard kit components (which we refer to as "1:50 Flex"). The Standard Flex, 1:50 Flex and Hackflex libraries were prepared from genomic DNA of E. coli strain MG1655. Libraries were sequenced on an Illumina MiSeq. Read counts and read and library metrics are shown in Table 1.

Barcode distribution and quality
Ninety-six i5 and i7 barcodes (8 bp) were designed for this study to provide a resource for high-throughput multiplexing of Hackflex libraries. To this end, the performance of the 96 designed barcodes was evaluated by subjecting E. coli MG1655 DNA to 96 independent library constructions with Hackflex reagents, each library with a different barcode combination (S2). To assess whether individual barcodes performed uniformly, barcode calling analysis was performed. Barcode analysis with the phylosift demux command showed a high rate of correctly paired barcoded reads (99.84%), with the remaining 0.16% attributable either to oligonucleotide cross-contamination or to errors during sequencing, such as the presence of multiple overlapping clusters on the flowcell surface, base miscalls, and image processing errors during cluster calling. With the exception of two failed libraries (with 6 and 9 reads), the relative abundance of barcodes across the 96 libraries was homogeneous. The average barcode count was 5517 per sample, and 50% of the samples contained between 4292 and 7128 barcodes (Fig. 2). Ninety-eight percent (94 out of 96) of the libraries fell within a 6.8-fold range of relative abundance (S5). The coefficient of variation was 0.38 with, and 0.34 without, the two outliers.
The GC content of the oligos designed for this study was measured and plotted against the read count obtained from each oligo pair. A lower yield was obtained from libraries constructed with higher GC content barcodes ( S6 ), possibly due to the tandem complement design of the barcode pairs. The two failed libraries have a normal GC content (38% and 50%), falling within the GC range of the Hackflex barcodes where 79.2% of the barcodes (76/96) have between a 30% and 70% GC content. The estimated deltaG of primer dimers for the two outliers did not differ from that estimated for the other oligo pairs.
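The dispersion statistics reported above for barcode counts (coefficient of variation and fold range) can be computed as in the following Python sketch. The counts used here are hypothetical placeholders, not the data from this study, and the choice of the population rather than the sample standard deviation is an assumption, since the text does not specify which was used.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(counts: Sequence[float]) -> float:
    """Standard deviation divided by the mean (population SD assumed)."""
    return statistics.pstdev(counts) / statistics.mean(counts)


def fold_range(counts: Sequence[float]) -> float:
    """Ratio between the largest and smallest count."""
    return max(counts) / min(counts)


# Hypothetical per-library read counts, for illustration only.
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(coefficient_of_variation(example_counts), fold_range(example_counts))
```

Because failed libraries with near-zero counts inflate both statistics, it is informative to report them with and without such outliers, as done above.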

Quality of raw reads
Quality scores from the three libraries were obtained using the Bioconductor package Rsubread 7 .
Independent of the library preparation method, PHRED scores increased from between <30 and 34 to 36-38 within the first 25 nt of each read (Fig. 3). Reads from the Hackflex library started from a lower score (29-32) but reached the same PHRED score as reads from Standard Flex and 1:50 Flex (36-38) within the first 25 nt. After reaching this maximum, scores decreased slightly across the first 150 nt of each read: from 36-38 to 34-36 for Hackflex, and from 36-38 to 32-34 for Standard Flex and 1:50 Flex (Fig. 3).

Cleaning
Quality filtering, PhiX DNA removal and adapter removal with BBDuk removed 58.59%, 62.75%, and 66.95% of the reads, leaving a total of 776728, 690930 and 818478 reads for Standard Flex, 1:50 Flex and Hackflex, respectively. Median read length was 151 nt for all libraries (Table 2). As 1:50 Flex was the library with the fewest reads, Standard Flex and Hackflex were randomly subsampled to the same number of reads as 1:50 Flex.
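The subsampling step can be sketched in Python as follows. This is only an illustration of random subsampling to a common depth: the tool actually used for subsampling is not specified in the text, and the function name and the seed value are our own assumptions. The read counts are those reported above.

```python
import random
from typing import List

# Read counts remaining after cleaning, as reported in the text.
cleaned_reads = {"Standard Flex": 776728, "1:50 Flex": 690930, "Hackflex": 818478}

# Every library is subsampled to the size of the smallest one.
target = min(cleaned_reads.values())


def subsample_indices(n_reads: int, n_keep: int, seed: int = 42) -> List[int]:
    """Pick n_keep distinct read indices at random, in sorted order.

    A fixed seed (an arbitrary choice here) makes the selection reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_reads), n_keep))


# Small demonstration: keep 5 of 20 reads.
print(target, subsample_indices(20, 5))
```

Sampling without replacement keeps each read at most once, so the subsampled libraries remain comparable in read count without introducing artificial duplicates.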

Fragment size
Quality-filtered and trimmed reads were mapped against the reference sequence with samtools and bwa mem. The .bam output was converted to text with samtools to report fragment size. The text file was analyzed with RStudio, and fragment size versus read density was plotted for the Standard Flex, 1:50 Flex and Hackflex libraries (Fig. 4). The observed fragment size distribution appeared uniform for all three libraries, with 1:50 Flex reads being more skewed to the left, representing a higher density of <250 bp fragments, and Standard Flex reads being slightly skewed to the right, representing a higher density of larger fragments compared to the other two libraries. Hackflex showed a centered distribution, with the highest density of fragments between 250 and 300 bp (Fig. 4).
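Fragment sizes can be read from the text form of the alignments, where the ninth SAM column (TLEN) holds the observed template length. The following Python sketch is a minimal illustration under stated assumptions: it assumes SAM-formatted text such as that produced by samtools view, counts each properly paired fragment once by keeping only the mate with a positive TLEN, and uses hypothetical example records. The exact filtering used in this study is not described in the text.

```python
from typing import Iterable, List

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


def fragment_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect template lengths, one per properly paired fragment."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        # The leftmost mate carries the positive TLEN; its partner is negative.
        if flag & PROPER_PAIR and tlen > 0:
            sizes.append(tlen)
    return sizes


# Hypothetical SAM records, for illustration only.
example_sam = [
    "@SQ\tSN:contig_1\tLN:5000",
    "read_1\t99\tcontig_1\t100\t60\t150M\t=\t250\t300\t*\t*",
    "read_1\t147\tcontig_1\t250\t60\t150M\t=\t100\t-300\t*\t*",
    "read_2\t65\tcontig_1\t900\t60\t150M\t=\t900\t0\t*\t*",
]
print(fragment_sizes(example_sam))
```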

Coverage
To assess the performance of the library preparation methods, reads were aligned with BWA-MEM 9 and samtools 8 to the E. coli MG1655 genome. The mapping files of the libraries were converted to .tsv and analyzed with RStudio for coverage. A mapped fraction of 0.999, 0.999 and 0.998 was measured for Standard Flex, 1:50 Flex and Hackflex, respectively (Table 2; S7). Unmapped reads from the Hackflex library appear to derive from an unknown source of contamination, as suggested by Kraken 10 analysis (S8). The coverage of the libraries showed a normal distribution (Fig. 5).

GC content
To assess the relationship between GC content and sequence coverage, the GC content of the E. coli MG1655 reference genome and of reads from the Standard Flex, 1:50 Flex and Hackflex libraries was assessed. To this end, the mapping (.bam) output of each library was converted to .tsv with the samtools command mpileup 8 as above and analyzed with ALFRED. The output was loaded into RStudio, and reference genome GC content (102 bins) was plotted against coverage for the three libraries (Fig. 6). A significant negative correlation between GC content and coverage was seen for the Standard Flex, 1:50 Flex and Hackflex libraries where the GC content ranged between 30% and 70%. All three libraries showed, to a certain extent, a bias in regions of extreme GC content, as would be expected 12,13. The extent of bias was highest for 1:50 Flex (ρ = -0.959; p-value: 1.04e-112), lower for Standard Flex (ρ = -0.950; p-value: 6.62e-100) and lowest for Hackflex (ρ = -0.770; p-value: 1.847e-42). All tests of correlation were carried out using the weighted Pearson's correlation coefficient on the observed/expected read count ratios, using the read counts from the 102 GC bins as weights.
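For reference, a weighted Pearson correlation coefficient of the kind described above can be computed as in the following Python sketch, where x would be the GC content of each bin, y the observed/expected read count ratio, and w the read count of the bin. This is a generic implementation of the standard formula, not the code used in this study, and the example values are hypothetical.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float],
                     w: Sequence[float]) -> float:
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov_xy = sum(wi * (xi - mean_x) * (yi - mean_y)
                 for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov_xy / math.sqrt(var_x * var_y)


# Hypothetical bins: GC fraction, observed/expected ratio, read count (weight).
gc = [0.30, 0.40, 0.50, 0.60, 0.70]
ratio = [1.20, 1.10, 1.00, 0.95, 0.80]
reads = [50, 400, 900, 350, 40]
print(round(weighted_pearson(gc, ratio, reads), 3))
```

Weighting by read count means that sparsely populated bins at the GC extremes, where the observed/expected ratio is noisiest, contribute less to the coefficient than the well-populated central bins.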

Low coverage regions
Standard Flex, 1:50 Flex and Hackflex produced 5, 10 and 11 regions, respectively, with low coverage (<3 reads per site), and no sites with zero coverage (S10-S11). The low coverage regions overlapped to a certain extent among the three libraries, possibly indicating a common feature of these regions that biases against their sequencing with Illumina chemistry. The size of the low coverage regions and their position on the reference genome are displayed in Supplementary Figure S10, which shows to what extent each low coverage region was exclusive to one library or common to two or all three of the libraries. The low coverage in those areas does not appear to be associated with GC content extremes (S10-S11).
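Low coverage regions of the kind counted above can be identified by scanning per-position depth for runs of consecutive sites below the threshold. The following Python sketch illustrates this with the <3 reads per site criterion; how adjacent runs were merged into regions in this study is not described, so the simple run-based definition used here is an assumption, and the depth values are hypothetical.

```python
from typing import List, Sequence, Tuple


def low_coverage_regions(depths: Sequence[int], threshold: int = 3,
                         first_position: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) of each run of sites with depth below threshold.

    Coordinates are inclusive and start at first_position.
    """
    regions = []
    run_start = None
    for offset, depth in enumerate(depths):
        position = first_position + offset
        if depth < threshold:
            if run_start is None:
                run_start = position          # a new low-coverage run begins
        elif run_start is not None:
            regions.append((run_start, position - 1))
            run_start = None
    if run_start is not None:                 # run extends to the last site
        regions.append((run_start, first_position + len(depths) - 1))
    return regions


# Hypothetical per-site depths, for illustration only.
example_depths = [5, 2, 1, 4, 4, 2, 6, 1, 1]
print(low_coverage_regions(example_depths))
print(sum(depth == 0 for depth in example_depths))  # sites with zero coverage
```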

Additional libraries: yield and GC coverage bias
Additional libraries were prepared in order to test the performance of 2x KAPA HiFi HotStart ReadyMix #KK2602 (KAPA) polymerase with the Hackflex reagents in terms of yield and GC coverage bias. The yield using 2x KAPA HiFi HotStart ReadyMix #KK2602 (KAPA) with Hackflex reagents ranged from 23.3 to 34.6 nM compared to that of EPM with Standard Flex reagents of 18.5 to 23.1 nM ( S3 ) and to that of PrimeSTAR with Hackflex reagents of 43.1 to 52.4 nM. As expected, a higher GC coverage bias was seen from libraries produced with 12 rather than 5 PCR cycles ( S9 ).

DISCUSSION
The Hackflex library prep workflow we have introduced is as time effective as the Standard Nextera Flex method and yields significant savings in terms of reagent costs (from 1.66-fold for 96x2 samples to 8.23-fold for 96x50 samples). This study demonstrates that data of comparably high quality can be obtained with Hackflex as could be generated by the existing Nextera Flex method. Two out of ninety-six libraries made with Hackflex yielded a low read count, which could be attributed to human error during the error prone step of sample indexing prior to amplification. We suggest as a further improvement to the Hackflex protocol, the automation of the sample indexing step using liquid handling robots so as to eliminate human error. It is worth noting that the libraries Standard Flex, 1:50 Flex and Hackflex in this study have been constructed with different polymerases. Standard Flex and 1:50 Flex were constructed with EPM from the Nextera Flex kit (Illumina), while PrimeSTAR GXL (Takara) was used for Hackflex. Therefore, the small differences in coverage observed in this study may be attributable to the different polymerases used. Before opting for PrimeSTAR GXL (Takara), we tested the performance of KAPA HiFi HotStart ReadyMix #KK2602 (KAPA) as used by Lamble et al 14 , but we observed half the yield from these PCR reactions ( S3 ) as with PrimeSTAR GXL. This stands in contrast to the behavior of KAPA HiFi when coupled with the original Nextera and Nextera XT protocols, where it produces high library yields. Additionally, PrimeSTAR GXL can be decreased to 1.25 units per reaction (50% of the amount used in this study) as per manufacturer's instructions, without compromising the quality of the library (data not shown), further reducing the costs of Hackflex from 1.75-fold for 96x2 to 11-fold for 96x50 samples ( S1 ).
Although there is no indication from the present study of Hackflex performing worse than Nextera Flex with genomes having extreme GC content and with lower DNA inputs, this remains to be tested more comprehensively in future work.
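The relationship between the fold reductions quoted in this work and the per-sample costs can be checked with simple arithmetic, sketched below in Python. The standard per-sample cost is not stated directly in the text; the value computed here is implied by the reported Hackflex cost (A$8.66) and fold reduction (8.23) and should be read as a derived estimate. The budget figure is a hypothetical example.

```python
HACKFLEX_COST = 8.66    # A$ per sample, as reported
FOLD_REDUCTION = 8.23   # relative to Standard Nextera Flex, as reported
MAX_FOLD = 11           # with the reduced polymerase amount, as reported

# Implied Standard Flex cost per sample (derived, not reported in the text).
implied_standard_cost = HACKFLEX_COST * FOLD_REDUCTION

# Implied cost at the 11-fold reduction; the text reports "about A$6.50".
implied_min_cost = implied_standard_cost / MAX_FOLD


def libraries_per_budget(budget: float, cost_per_sample: float) -> int:
    """Whole number of libraries affordable for a given reagent budget."""
    return int(budget // cost_per_sample)


budget = 1000.0  # hypothetical budget in A$
print(round(implied_standard_cost, 2), round(implied_min_cost, 2))
print(libraries_per_budget(budget, implied_standard_cost),
      libraries_per_budget(budget, HACKFLEX_COST))
```

The implied cost at the 11-fold reduction is close to the "about A$6.50" figure quoted in the abstract, so the reported fold changes and per-sample costs are mutually consistent.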

Caveats and limitations
The i5 and i7 barcodes we describe in this work have a tandem complement design, in which each well of the i7 barcode oligo plate contains the reverse complement of the barcode in the corresponding well of the i5 plate. It has been noted by Illumina that the use of tandem complement barcodes on the current generation NovaSeq instruments can lead to significantly reduced quality scores for the i5 index read (http://sapac.support.illumina.com/bulletins/2017/08/recommended-strategies-for-unique-dual-index-designs.html). In this study our samples were sequenced on an Illumina MiSeq instrument, which does not appear to be prone to the tandem complement barcode limitation. Although the index read 2 quality of our data was not noticeably impacted when compared to Standard Flex, we did measure a trend towards lower read counts for libraries made from higher GC content oligos (S6). This effect is possibly due to the tandem complement design of the oligo pairs. On a positive note, the self-designed barcode oligos produced by IDT using their standard oligo plate manufacturing process appeared to yield a relatively low cross-contamination rate.

CONCLUSION
Here we have developed and characterised an alternative method of library construction for Illumina sequencing which by reducing the library prep expenses, allows users to process from 1.75-fold to 11-fold more samples at the same reagent cost. Comparison with the existing Nextera Flex method demonstrates that Hackflex is a valid and cost-effective alternative to construct libraries at a large scale.

Conflicts of interest
A.D. and L.M. have a commercial interest in Longas Technologies Pty Ltd, which is developing synthetic long read sequencing technologies for short read sequencing platforms.

Funding information
This work was funded in part by ARC Linkage Project LP15100912, Lead CI: Steven P. Djordjevic.

Figure 4 (S4). Schematic overview of data analysis methods used in this study.