Antimicrobial peptides are part of the innate immune system; they help defend the host against pathogens and regulate the microbiome. They occur in all forms of life, are incredibly diverse, are mostly quite small (< 200 amino acids), and make up only a small proportion of a genome (~1%). This makes them very difficult to find. We created a classification model, implemented in the R package ampir, to predict antimicrobial peptides from protein sequences on a genome-wide scale. ampir was tested on multiple test sets (including complete proteomes) and performed with high accuracy. ampir can be used to narrow down the search space for novel antimicrobial peptides in genomes.
ampir was recently published in Bioinformatics and is available on CRAN and GitHub. Legana has also created a companion repository that accompanies the paper and documents the thinking behind ampir’s model-building process.