Klebsiella pneumoniae is a leading cause of multidrug-resistant nosocomial outbreaks and causes high mortality in vulnerable populations such as neonates. The K and O polysaccharide antigens are attractive targets for a polyvalent glycoconjugate vaccine. These antigens have distinct serotypes, which correspond to unique gene clusters known as the K and O loci, respectively. We previously presented Kaptive and Kaptive 2, programs that identify K and O loci directly from K. pneumoniae genome assemblies (also adapted for Acinetobacter baumannii), enabling sero-epidemiological analyses to guide vaccine development. However, for some genome collections, Kaptive (v1 and v2) was consistently unable to identify these loci because the genome assemblies were highly fragmented, resulting in a high proportion of missing values. We therefore sought to update Kaptive to improve its sensitivity and accuracy for typing fragmented loci.
Kaptive v3 uses a new approach to identify the best-matching K or O locus from a genome assembly: rather than using a full-length BLASTn search followed by tBLASTn to confirm the presence of individual protein-coding genes, Kaptive v3 uses minimap2 to search for genes and selects the best-matching K/O locus based on a cumulative weighted alignment score. It then performs pairwise alignment of the gene translations to confirm that each predicted protein exceeds the minimum identity threshold relative to the reference.
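The locus-selection step described above can be illustrated with a minimal sketch. This is not Kaptive's actual code: the `GeneHit` record, the `best_locus` function, the example scores and the per-gene weights are all hypothetical, and the weighting scheme itself is not specified here, so it is passed in as plain data. The sketch only shows the general idea of summing weighted per-gene alignment scores for each reference locus and taking the maximum.

```python
from collections import defaultdict
from typing import NamedTuple


class GeneHit(NamedTuple):
    """One gene-level alignment (hypothetical record, not Kaptive's data model)."""
    locus: str     # reference locus the gene belongs to, e.g. "KL1"
    gene: str      # reference gene name
    score: float   # alignment score for this gene (illustrative values only)
    weight: float  # per-gene weight (assumed; the real weighting is not given here)


def best_locus(hits):
    """Return (locus, total) with the highest cumulative weighted score, or None."""
    best_per_gene = {}
    for hit in hits:
        key = (hit.locus, hit.gene)
        weighted = hit.score * hit.weight
        # Keep only the best weighted alignment for each reference gene.
        if key not in best_per_gene or weighted > best_per_gene[key]:
            best_per_gene[key] = weighted

    # Sum the per-gene values within each reference locus.
    totals = defaultdict(float)
    for (locus, _gene), weighted in best_per_gene.items():
        totals[locus] += weighted

    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])


# Illustrative input: two genes found for KL1 (one with a weaker duplicate hit)
# and a single, slightly stronger gene hit for KL2.
example_hits = [
    GeneHit("KL1", "wzy", 900.0, 1.0),
    GeneHit("KL1", "wzx", 800.0, 1.0),
    GeneHit("KL1", "wzy", 500.0, 1.0),
    GeneHit("KL2", "wzy", 950.0, 1.0),
]
example_result = best_locus(example_hits)
```

In this toy input, KL2 has the single highest-scoring gene, but KL1 is selected because its genes together accumulate a larger total. That is the property that makes a cumulative score tolerant of loci split across several contigs.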
We compared the sensitivity and accuracy of Kaptive K locus calls on genome assemblies generated from subsampled Illumina read sets (decrements of 10x depth) for which a corresponding high-quality completed genome was also available to determine the ‘true’ loci via manual inspection (n=550 K. pneumoniae, n=200 A. baumannii).
Kaptive v3 showed a higher proportion of typable calls than Kaptive v2: 0.82-0.98 vs 0.05-0.96 for K. pneumoniae and 0.96-0.99 vs 0.51-0.98 for A. baumannii. Kaptive v3 also showed a higher proportion of correct calls: 0.8-0.97 vs 0.57-0.97 for K. pneumoniae and 0.92-1 vs 0.72-1 for A. baumannii.
Finally, Kaptive v3 ran two orders of magnitude faster than Kaptive v2 on an 8-core laptop, allowing faster in silico serotyping that is also more accurate on poorer-quality genome assemblies.