Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance prediction. This study presents a comprehensive benchmarking of SNP and indel variant calling accuracy across 14 bacterial species using Oxford Nanopore Technologies (ONT) and Illumina sequencing. We generated gold standard reference genomes and projected variations from different strains onto them, creating biologically realistic distributions of SNPs and indels. Our analysis of ONT data, basecalled with three different accuracy models, revealed simplex and duplex read accuracies above 99.0% and 99.9%, respectively.
Our results demonstrate that ONT-based variant calls from deep learning-based tools, specifically Clair3 and DeepVariant, outperformed traditional variant calling methods, delivering SNP and indel accuracy higher than that of Illumina. We investigate the causes of missed and false calls, highlighting the limitations inherent in short reads, and find that ONT’s traditional limitations with homopolymer-induced indel errors are absent with high-accuracy basecalling models and deep learning-based variant calls. Furthermore, our findings on the impact of read depth on variant calling offer valuable insights for sequencing projects with limited resources, showing that 10x depth is sufficient to achieve variant calls that match or exceed those from Illumina.
In conclusion, our research highlights the superior accuracy of deep learning tools in SNP and indel detection with ONT sequencing, challenging the primacy of short-read sequencing. The reduction of systematic errors and the ability to attain high accuracy at lower read depths enhance the viability of ONT for widespread use in clinical and public health genomics.
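Read accuracies such as the 99.0% (simplex) and 99.9% (duplex) thresholds quoted above are often expressed on the Phred scale, Q = -10 log10(1 - accuracy). The sketch below shows that conversion only. It is not part of the study's pipeline, and the function name `accuracy_to_phred` is a hypothetical helper introduced here for illustration.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a read accuracy (fraction in [0, 1)) to a Phred-scaled Q-score.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


# The two accuracy thresholds reported for simplex and duplex reads.
for label, acc in (("simplex", 0.99), ("duplex", 0.999)):
    print(f"{label}: accuracy {acc:.1%} is roughly Q{accuracy_to_phred(acc):.0f}")
```

Under this convention, each additional 10 Q-score points corresponds to a tenfold reduction in the per-base error rate.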