Novel antimicrobial peptide discovery using machine learning and biophysical selection of minimal bacteriocin domains

Bacteriocins are ribosomally produced antimicrobial peptides that represent an untapped source of promising antibiotic alternatives. However, inherent challenges in isolation and identification of natural bacteriocins in substantial yield have limited their potential use as viable antimicrobial compounds. In this study, we have developed an overall pipeline for bacteriocin-derived compound design and testing that combines sequence-free prediction of bacteriocins using a machine-learning algorithm and a simple biophysical trait filter to generate minimal 20 amino acid peptide candidates that can be readily synthesized and evaluated for activity. We generated 28,895 total 20-mer peptides and scored them for charge, α-helicity, and hydrophobic moment, allowing us to identify putative peptide sequences with the highest potential for interaction and activity against bacterial membranes. Of those, we selected sixteen sequences for synthesis and further study, and evaluated their antimicrobial, cytotoxicity, and hemolytic activities. We show that bacteriocin-based peptides with the overall highest scores for our biophysical parameters exhibited significant antimicrobial activity against E. coli and P. aeruginosa. Our combined method incorporates machine learning and biophysical-based minimal region determination, to create an original approach to rapidly discover novel bacteriocin candidates amenable to rapid synthesis and evaluation for therapeutic use.


Introduction
Many bacteria have become resistant to conventional antibiotics, necessitating the discovery of novel antimicrobial compounds 1. However, pharmaceutical antibiotic development has declined, chiefly due to the brief usability window of existing antibiotic scaffolds 2. To combat the lack of novel antimicrobial discovery, many bioinformatic approaches have been developed to mine the genomes of bacteria for natural products 3. One promising class of natural products is the bacteriocins, the ribosomally produced antimicrobial peptides of bacteria 4,5. These

Antibiofilm Formation Assays
Antibiofilm activity of the peptides was assessed using USA300 and PAO1. For USA300 biofilms, overnight cultures grown in TSB (Sigma-Aldrich) were diluted 1:100 in TSB with 0.1% glucose and 1% NaCl, with or without peptide 34. For PAO1 biofilms, overnight cultures grown in LB were diluted 1:100 in M63 with 1 mM MgSO4 and 0.4% arginine, with or without peptide 35. Samples were incubated for 24 hours in a microplate. Planktonic cells were removed from the wells and the biofilms were washed three times with ddH2O. Biofilms were then stained with 0.1% crystal violet, washed three times with ddH2O, and resuspended in 30% acetic acid 36

Design and biophysical selection of 20-mer minimal bacteriocins
From the initial set of 676 putative novel bacteriocins identified using the word embedding algorithm Word2vec, 28,895 total 20-mer bacteriocin peptide candidates were generated (Figure 1). Each peptide was then assigned a low, middle, or high ranking for each of the biophysical parameters based on the range of scores within that parameter (Figure 2A). For example, a peptide with a net charge of 5, a helical score of 17, and a µH of 900 would rank middle for charge, low for helicity, and high for µH (Figure 2A). Eighty percent of the 20-mers received a low ranking for charge (a net positive charge between +1 and +3), while only 1% ranked high (Figure 2B). For the hydrophobic moment values, a majority of the peptides also ranked low (any hydrophobic moment value below 333), with only 5% receiving a high score (Figure 2B). However, for the helicity score, a majority of the peptides, 65%, fell into the middle range of scores between 19 and 22, with only 2% scoring high for helicity (Figure 2B). It is important to note that the hydrophobic moment and helicity scores was not taken into consideration when calculating these values.
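The generation and ranking steps described above can be illustrated with a minimal sketch. It rests on two assumptions that the text does not confirm: that the 20-mers are produced by sliding a window one residue at a time along each putative bacteriocin, and that the low, middle, and high bins are equal thirds of the observed score range for a parameter. The function names `sliding_windows` and `rank_by_range` are hypothetical and do not come from the authors' script.

```python
from typing import List

WINDOW = 20  # peptide length used in the text


def sliding_windows(sequence: str, size: int = WINDOW) -> List[str]:
    """Return every contiguous window of `size` residues.

    Assumption: a step of one residue; the authors' tiling scheme may differ.
    Sequences shorter than `size` yield an empty list.
    """
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def rank_by_range(scores: List[float]) -> List[str]:
    """Label each score low, middle, or high.

    Assumption: the bins are equal thirds of the observed range of the scores.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 3
    labels = []
    for s in scores:
        if width == 0 or s < lo + width:
            labels.append("low")
        elif s < lo + 2 * width:
            labels.append("middle")
        else:
            labels.append("high")
    return labels
```

Applied to one parameter at a time, `rank_by_range` gives each peptide a label per parameter, which is the form of ranking the selection step relies on.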

Peptide Selection for Chemical Synthesis
Many cationic antimicrobial peptides adopt an amphipathic alpha-helical conformation. Therefore, we reasoned that, of the peptides generated by our script, those ranking high in all three biophysical categories would yield the most antimicrobial activity. Of the sixteen peptides selected for synthesis, peptides 1 and 2 ranked low for all three biophysical parameters, while peptides 3 and 4 ranked high for all three parameters (Table 1). The remaining 12 peptides were randomly selected from all 20-mers ranking middle in at least one category and high for the remaining parameters (Table 1).
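The selection logic in this paragraph can be sketched as a filter over per-peptide rankings. This is an illustration only: the input layout (a dictionary mapping a peptide identifier to its three labels), the parameter keys, the fixed random seed, and the function name `select_candidates` are all assumptions introduced here, not the authors' implementation.

```python
import random
from typing import Dict, List

PARAMS = ("charge", "helicity", "moment")  # hypothetical key names


def select_candidates(
    ranked: Dict[str, Dict[str, str]], n_mixed: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Group peptides into all-low, all-high, and a random mixed sample.

    "Mixed" means no parameter ranks low and at least one ranks middle,
    i.e. middle in at least one category and high for the rest.
    """
    all_low = sorted(
        p for p, r in ranked.items() if all(r[k] == "low" for k in PARAMS)
    )
    all_high = sorted(
        p for p, r in ranked.items() if all(r[k] == "high" for k in PARAMS)
    )
    mixed = sorted(
        p
        for p, r in ranked.items()
        if all(r[k] in ("middle", "high") for k in PARAMS)
        and any(r[k] == "middle" for k in PARAMS)
    )
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    picked = rng.sample(mixed, min(n_mixed, len(mixed)))
    return {"all_low": all_low, "all_high": all_high, "mixed": picked}
```

Under this reading, a call with `n_mixed=12` would correspond to the twelve randomly chosen peptides, with the all-low and all-high groups supplying the remaining four.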

PEP-FOLD prediction of secondary structure
To determine if our biophysical selection criteria were able to accurately predict an amphipathic alpha-helical structure for the peptides selected for synthesis, we modeled their secondary structure using the PEP-FOLD online tool. Peptides 1 and 2, which received low scores for helicity and hydrophobic moment, are predicted to exist mostly as random coil (Figure 3A). In contrast, peptides 3 and 4, which have high scores for helicity and hydrophobic moment, are predicted to exist as fully extended alpha helices with clear clustering of the polar and charged amino acids on one side of the helix and the hydrophobic residues on the other, indicative of a strong hydrophobic moment (Figure 3B). Peptides 5 through 10 have a high helicity score; however, the modeling predicts unstructured regions owing to the helix-breaking residues glycine and proline that occur within their sequences (Figure 3 and Table 1). All of these peptides also received middle scores for their hydrophobic moment which is visible as hydrophobic residues within the polar face of the helix, such as peptide 6, and predicted to exist as a beta sheet (Figure 3E). The biophysical calculator only takes into account the Chou-Fasman residue helical propensity score and does not calculate the individual likelihood of forming a beta sheet; therefore, peptides with a higher sheet propensity were not excluded from the list of peptides for synthesis. Finally, the rest of the peptides are predicted to adopt various helical structures with differing amphipathic characteristics (Figure 3).
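The link between residue placement and hydrophobic moment can be made concrete with a short sketch of the standard helical hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing outward from the helix axis, with successive residues rotated by 100°, and the moment is the magnitude of the vector sum. The Eisenberg consensus scale used below is an assumption; the text does not state which hydrophobicity scale or scaling the authors' calculator uses, so the values will not match the scores reported here.

```python
import math

# Eisenberg consensus hydrophobicity scale.
# Assumption: the authors' calculator may use a different scale or scaling.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(sequence: str, angle_deg: float = 100.0) -> float:
    """Magnitude of the summed hydrophobicity vectors around an ideal helix.

    Residue i is placed at an angle of i * angle_deg around the helix axis;
    100 degrees per residue corresponds to an ideal alpha helix.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(sequence))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(sequence))
    return math.hypot(sin_sum, cos_sum)
```

A stretch of 18 identical residues spans exactly five helical turns, so its vectors cancel and the moment is essentially zero. A large moment therefore requires hydrophobic and polar residues to be segregated onto opposite faces, as described for peptides 3 and 4.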

Antimicrobial Properties of Synthetic 20-mers
The peptides were assessed for their minimal inhibitory concentration (MIC) and minimal bactericidal concentration (MBC) on Escherichia coli, Staphylococcus aureus, and Pseudomonas aeruginosa (Table 2). As expected, peptides 1 and 2, which scored low in all three biophysical parameters, did not have activity against any of the organisms tested. Peptides 3 and 4, which scored high in all three biophysical parameters, exhibited antimicrobial activity against both E. coli and P. aeruginosa (Table 2). Peptides 5, 6, and 7 scored high in charge and helicity and middle in hydrophobic moment (Table 1). Interestingly, these peptides showed a range of antimicrobial activities (Table 2). Peptide 6 was more effective at inhibiting the growth of P. aeruginosa (MIC = 32 µM) than E. coli (MIC = 128 µM). Peptides 5 and 7 were much less active than peptide 6 despite having similar values for their biophysical scores (Tables 1 and 2). Peptides 8, 9, and 10 scored high for helicity with middle scores for charge and hydrophobic moment. These peptides did not have any antimicrobial activity against the organisms tested. This overall trend continued for the rest of the peptides tested. Indeed, peptides scoring high in any one of the biophysical parameters with only middle scores for the others (peptides 8-16) did not have any antimicrobial activity. We did not test any of the peptide candidates at out.

Inhibition of Biofilm Formation by the Synthetic 20-mers
Although peptides 11 and 16 did not have a true MIC, we observed that they significantly reduced the overnight growth of S. aureus cultures (Supplementary Figure 1A-B). To investigate whether these peptides were exerting antibiofilm effects, we employed the biofilm formation assay. Upon incubation with peptide 11 for 24 hours in biofilm-inducing media, we observed a significant decrease in USA300 biofilm formation down to a concentration of 4 µM (Supplementary Figure 1C). This trend was also observed for peptide 16; however, it inhibited biofilm formation only down to 16 µM (Supplementary Figure 1D).

Peptide mammalian cell cytotoxicity
To determine if our biophysical parameters were able to select for peptides with affinity for bacterial membranes instead of mammalian membranes, we assessed their ability to compromise the membranes of erythrocytes and keratinocytes. Fourteen of the peptides exhibited no hemolytic activity even at high concentrations (Supplementary Table 1). However, peptides 2 and 10 exhibited increased levels of hemolysis only at the highest concentration (128 µM). Cytotoxicity to keratinocytes was interrogated using the ethidium homodimer assay. We observed that all of the peptides were unable to cause cell death when incubated with peptides generally do not target mammalian membranes.

Discussion
Bacteriocins are a barely tapped source of highly diverse antimicrobials. However, verifying the antimicrobial activity of putative bacteriocins can be difficult due to their potentially narrow activity spectra and highly diverse mechanisms 4,37. Additionally, traditional methods of bacteriocins 25. We observed that peptides with the highest scores for the biophysical parameters of charge, helicity, and hydrophobic moment were the most active against the bacteria tested (Table 2). Interestingly, the only two peptides to meet these criteria were from the same putative bacteriocin. It is therefore highly likely that this putative bacteriocin works in a membrane-active manner 29,42,43. The interpretation of these data becomes confounded for the peptides whose biophysical parameters begin to receive middle scores. For example, peptide 6, with a middle score for hydrophobic moment, is a more effective antimicrobial against P. aeruginosa (MIC = 32 µM) than E. coli (MIC = 128 µM). This observation is in contrast to the activities of the high-scoring peptides, 3 and 4, whose antimicrobial activities were higher against E. coli. Therefore, it may be possible to tune antimicrobial specificity by modifying the biophysical scores 46,47. While most research has focused on modification of these parameters and their effects on eukaryotic cytotoxicity and overall antimicrobial activity, few studies have examined how these parameters tune the specificity of these compounds to specific bacteria 48,49.

There are some drawbacks to this approach. While it seems that our approach has selected for antimicrobial regions of putative bacteriocins, it is also possible that using a minimal synthetic peptide strategy has decoupled the function of the synthetic bacteriocin from the function of the full sequence.
Enterocin AS-48 undergoes dimer formation and then subsequent tertiary structural changes before inserting itself into the membrane of target bacteria 50. However, synthetic AS-48 peptides lose this ability to dimerize and work by a mechanism more akin to the carpet or pore models of synthetic antimicrobial peptide activity 25. Therefore, some of the antimicrobial function and specificity inherent in bacteriocins will be lost by utilizing synthetic target the bacterial membrane or whose biophysical characteristics change upon post-translational modification 4,5,11.

Despite these drawbacks, the techniques described herein have potential for linking de novo computational bacteriocin discovery with immediate therapeutic development. With the increasing amount of computational work being done to predict novel antimicrobial compounds, there is a mounting need to verify their antimicrobial activity in vitro 8-10. Our method validates the use of machine learning algorithms to further mine genomic information for potential bacteriocin candidates that can be refined using biophysical scripting parameters and size optimization for rapid synthesis and testing. The lack of mammalian cell cytotoxicity in our synthesized peptide set indicates that selecting minimal bacteriocin candidates based on the specific set of biophysical parameters that we have established will select for candidates that specifically target bacterial membranes, a highly valuable outcome from our studies (Table 2 and Supplementary Table 1). Many current synthetic antimicrobial peptides used to treat human disease have been built around an existing scaffold from eukaryotes 51. Pexiganan, derived from magainin of the African clawed frog, is currently being developed as a topical antimicrobial for the treatment of diabetic foot ulcers 52.
In contrast, relatively few bacteriocins have been developed for the treatment of disease 51,53,54. Our strategy to combine machine learning algorithms for de novo bacteriocin discovery with biophysical refinement and minimal design represents a particularly robust workflow for the development of new antibiotic compounds. These synthetic bacteriocin scaffolds could be further refined via iterative testing and data collection for efficacy and selectivity.
Supplemental Figure 2: Antibiofilm activities of peptides 11 and 16 on P. aeruginosa. A. Peptide 11 and B. peptide 16 exhibit no bacteriostatic activity. C. Peptide 11 exhibits mild antibiofilm activity. D. Peptide 16 exhibits mild antibiofilm activity. Data are representative of 3 biological replicates. A * represents a p-value < 0.05 as determined via one-way ANOVA (A,B). A * represents a significant difference as determined via Tukey HSD compared to the vehicle control (C,D).

Table 1: Cytotoxicity of 20-mer bacteriocins at 128 µM. Y indicates an increase in hemolysis or cytotoxicity at 128 µM. N indicates no increase in hemolysis or cytotoxicity at 128 µM.