RT Journal Article
SR Electronic
T1 Data-Driven Strain Design Using Aggregated Adaptive Laboratory Evolution Mutational Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.07.19.452699
DO 10.1101/2021.07.19.452699
A1 Patrick V. Phaneuf
A1 Daniel C. Zielinski
A1 James T. Yurkovich
A1 Josefin Johnsen
A1 Richard Szubin
A1 Lei Yang
A1 Se Hyeuk Kim
A1 Sebastian Schulz
A1 Muyao Wu
A1 Christopher Dalldorf
A1 Emre Ozdemir
A1 Bernhard O. Palsson
A1 Adam M. Feist
YR 2021
UL http://biorxiv.org/content/early/2021/07/20/2021.07.19.452699.abstract
AB Microbes are being engineered for an increasingly large and diverse set of applications. However, designing microbial genomes remains challenging due to the general complexity of biological systems. Adaptive Laboratory Evolution (ALE) leverages nature’s problem-solving processes to generate optimized genotypes currently inaccessible to rational methods. The large amount of public ALE data now represents a new opportunity for data-driven strain design. This study presents a novel, first-of-its-kind meta-analysis workflow to derive data-driven strain designs from aggregate ALE mutational data using rich mutation annotations and statistical and structural biology methods. The mutational dataset consolidated and utilized in this study contained 63 Escherichia coli K-12 MG1655 based ALE experiments, described by 93 unique environmental conditions, 357 independent evolutions, and 13,957 observed mutations. High-level trends across the entire dataset were established and revealed that ALE-derived strain designs will largely be gene-centric, as opposed to non-coding, and that a relatively small number of variants (approx. 4) can significantly alter cellular states and provide benefits that range from an increase in fitness to a complete necessity for survival. Three novel, experimentally validated designs relevant to metabolic engineering applications are presented as use cases for the workflow.
Specifically, these designs increased growth rates with glycerol as a carbon source through a point mutation in glpK and a truncation of cyaA, or increased tolerance to toxic levels of isobutyric acid through a pykF truncation. These results demonstrate how strain designs can be extracted from aggregated ALE data to enhance strain design efforts.
Competing Interest Statement: The authors have declared no competing interest.
Abbreviations: ALE, adaptive laboratory evolution; SNP, single nucleotide polymorphism; DEL, deletion; MOB, mobile insertion elements; INS, insertion; SUB, substitution; AMP, amplification; TFBS, transcription factor binding site; RBS, ribosomal binding site; SV, structural variant; SIFT, Sorting Intolerant from Tolerant; PykF, pyruvate kinase I; GlpK, glycerol kinase; CyaA, adenylate cyclase; Crr, PTS system glucose-specific EIIA component; PTS, phosphotransferase system; EIIA, Enzyme II A; CCR, carbon catabolite repression; cAMP-CRP, activated CRP complex; CRP, cAMP receptor protein; cAMP, cyclic AMP; ΔΔG, the predicted difference between the free energy of unfolding the protein structure before and after the variant.