Aim: This study applies machine learning (ML) techniques to analyse antimicrobial resistance (AMR) in Neisseria gonorrhoeae (NG), with the goal of refining gene panel selection for more effective AMR monitoring. Basing gene selection on data analytics allows experimental designs to focus on the gene variants most highly correlated with AMR.
Background: NG, identified by the WHO as a 'high priority' pathogen due to escalating AMR, presents an increasing threat (1). The expansion of NG whole-genome sequencing (WGS) and AMR datasets provides an opportunity to employ two ML tools developed by CSIRO: VariantSpark and BitEpi. VariantSpark uses random forest (RF) algorithms to identify gene variants contributing to phenotypes, and BitEpi identifies epistatic interactions; together they offer insights into the genetic factors driving AMR (2-4).
Methods: Publicly available NG WGS datasets, paired with ciprofloxacin resistance information, were downloaded. A total of 3,297 samples were processed to generate Variant Call Format (VCF) files using bwa mem, with NZ_AP023069.1 as the reference sequence. The VCFs and resistance data were analysed on Amazon Web Services with a VariantSpark RF model of 1,000 trees. The 250 most important variants were then analysed for epistatic interactions using BitEpi. The RF model's accuracy was assessed using out-of-bag (OOB) error estimates from self-validation datasets.
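As an illustration of how an OOB error estimate and a variant ranking arise from a bagged ensemble, the following minimal sketch may help. It is not the VariantSpark pipeline: single-split decision stumps stand in for full RF trees, the genotype matrix is synthetic, the "importance" is simply how often a variant is chosen as the split, and all names (`make_toy_data`, `fit_stump`, `bagged_oob`) and parameter values are hypothetical.

```python
import random


def make_toy_data(n=300, p=10, seed=1):
    """Synthetic 0/1 genotype matrix (assumption: not real NG data)."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
    # Hypothetical rule: variant 0 drives resistance, with 5% label noise.
    y = [x[0] if rng.random() > 0.05 else 1 - x[0] for x in X]
    return X, y


def fit_stump(X, y, rows, features):
    """Among candidate features, pick the one whose per-genotype majority
    label classifies the bootstrap rows best."""
    best = None
    for f in features:
        counts = {0: [0, 0], 1: [0, 0]}  # genotype -> [n_susceptible, n_resistant]
        for i in rows:
            counts[X[i][f]][y[i]] += 1
        pred = {g: int(c[1] > c[0]) for g, c in counts.items()}
        correct = sum(max(c) for c in counts.values())
        if best is None or correct > best[0]:
            best = (correct, f, pred)
    return best[1], best[2]


def bagged_oob(X, y, n_trees=200, n_try=5, seed=0):
    """Bag decision stumps and estimate error on out-of-bag samples only."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [[0, 0] for _ in range(n)]  # per-sample OOB votes per class
    importance = [0] * p                # crude proxy: split selection count
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(rows)
        features = rng.sample(range(p), n_try)       # random feature subset
        f, pred = fit_stump(X, y, rows, features)
        importance[f] += 1
        for i in range(n):
            if i not in in_bag:                      # sample unseen by this stump
                votes[i][pred[X[i][f]]] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    errors = sum(1 for i in scored if int(votes[i][1] > votes[i][0]) != y[i])
    return errors / max(len(scored), 1), importance


X, y = make_toy_data()
oob_error, importance = bagged_oob(X, y)
top = sorted(range(len(importance)), key=lambda f: -importance[f])[:3]
print(f"OOB error: {oob_error:.3f}; top variants by selection count: {top}")
```

Each sample is scored only by the stumps whose bootstrap sample excluded it, so the error estimate needs no separate hold-out set. In this toy setting the top-ranked variants would be the ones passed on to an interaction analysis.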
Results: The RF model accurately identified known ciprofloxacin resistance-related variants in the gyrA and parC genes (5). The OOB error rate was ~6.5%. VariantSpark also highlighted significant AMR associations with tRNA synthesis pathway genes, primarily ileS, tgt, and miaB. BitEpi analysis showed that gyrA variants had minimal epistatic interactions, while variants in tRNA pathway genes participated in extensive epistatic interactions (6).
Conclusions: The integration of VariantSpark and BitEpi analyses reaffirmed the involvement of known resistance genes and showed complex genetic interactions within the tRNA synthesis pathway that could influence AMR in NG. Although not currently known to contribute to AMR in NG, these genes are known to contribute to bacterial antimicrobial stress responses in general, and ileS mutations specifically are known to affect ciprofloxacin resistance in Vibrio cholerae (6-7).
This ML-driven analysis of public data identifies potential genetic targets for NG AMR, which could improve experimental design and gene target selection and accelerate significant discoveries in combating AMR.