PT - JOURNAL ARTICLE
AU - Jacob R. Heldenbrand
AU - Saurabh Baheti
AU - Matthew A. Bockol
AU - Travis M. Drucker
AU - Steven N. Hart
AU - Matthew E. Hudson
AU - Ravishankar K. Iyer
AU - Michael T. Kalmbach
AU - Eric W. Klee
AU - Eric D. Wieben
AU - Mathieu Wiepert
AU - Derek E. Wildman
AU - Liudmila S. Mainzer
TI - Performance benchmarking of GATK3.8 and GATK4
AID - 10.1101/348565
DP - 2018 Jan 01
TA - bioRxiv
PG - 348565
4099 - http://biorxiv.org/content/early/2018/06/18/348565.short
4100 - http://biorxiv.org/content/early/2018/06/18/348565.full
AB - Use of the Genome Analysis Toolkit (GATK) continues to be standard practice for genomic variant calling in both research and the clinic. The toolkit has recently been evolving rapidly. Significant computational performance improvements were introduced in GATK3.8 through a collaboration with Intel in 2017. The first release of GATK4, in early 2018, brought substantial rewrites of the code base as a stepping stone toward a Spark implementation. Because the software remains a moving target for optimal deployment in high-throughput production environments, we present a detailed analysis of these improvements to help the community stay abreast of changes in performance. We re-evaluated the options previously identified as advantageous, such as threading, parallel garbage collection, I/O options and data-level parallelization. Based on our results, we consider the performance and cost trade-offs of using GATK3.8 and GATK4 for different types of analyses.