More than 25,000 complete phage genomes are now available, and prophage hunting has identified millions more putative phages. Long-read sequencing now enables the rapid and accurate sequencing and assembly of phage genomes. The widespread adoption of AI has revolutionized our ability to predict the structures of individual proteins, compare those models to databases of known and predicted structures, and assign functions to proteins for which we know only the DNA sequence. AI methods have also revealed hidden genomic structure that supports synteny-based prediction of gene function, and they have opened the door to new embeddings and encodings that can be used to predict function. Even traditional bioinformatics approaches are being upended by the wealth of genomic data and the near-ubiquity of AI-based methods.
These advances in phage genomics and bioinformatics mean that we need to adopt new tools and ideas to understand phages. Doing so will let us make richer and more accurate predictions about which bacteria phages infect, what they do during their infection cycle, and how they affect their hosts, whether by killing them or while sitting quietly in the bacterial genome as prophages. Only when we understand phages can we truly begin to use them for phage therapy in human medicine, agriculture, and veterinary science.