RT Journal Article
SR Electronic
T1 Predictive Computational Phenotyping and Biomarker Discovery Using Reference-Free Genome Comparisons
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 045153
DO 10.1101/045153
A1 Alexandre Drouin
A1 Sébastien Giguère
A1 Maxime Déraspe
A1 Mario Marchand
A1 Michael Tyers
A1 Vivian G. Loo
A1 Anne-Marie Bourgault
A1 François Laviolette
A1 Jacques Corbeil
YR 2016
UL http://biorxiv.org/content/early/2016/03/27/045153.abstract
AB The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a new reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa and S. pneumoniae. We show that the obtained models are accurate and that they highlight biologically relevant biomarkers, while providing insight into the process of antibiotic resistance acquisition.