Symposium Presentation Australian Society for Microbiology Annual Scientific Meeting 2024

SingleM and Sandpiper: Robust microbial community profiles of Earth’s metagenomes (104563)

Ben J Woodcroft 1, Samuel Aroney 1, Raphael Eisenhofer 2, Rossen Zhao 1, Mitch Cunningham 3, Joshua Mitchell 1, Rizky Nurdiansyah 1, Linda Blackall 3, Antton Alberdi 2, Gene Tyson 1
  1. School of Biomedical Science, Centre for Microbiome Research, Queensland University of Technology (QUT), Translational Research Institute, Brisbane City, QLD, Australia
  2. Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  3. School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia

Determining the taxonomy and relative abundance of microorganisms in metagenomic data is a foundational problem in microbial ecology. To address the limitations of existing approaches, we developed ‘SingleM’, which estimates community composition using conserved regions within universal marker genes. SingleM accurately profiles complex communities of known microbial species and is the only tool that detects species without genomic representation, even those representing novel phyla. Given SingleM’s computational efficiency, we applied it to 248,559 publicly available metagenomes; the resulting community profiles are available in the online database ‘Sandpiper’ (https://sandpiper.qut.edu.au/). The vast majority of samples from marine, freshwater, sediment and soil environments are dominated by novel species lacking genomic representation (median relative abundance 75.0%). Quantifying the full diversity of Bacteria and Archaea in metagenomic data shows that microbial genome databases are far from saturated.

SingleM has several further applications. It can identify metagenomes containing lineages of interest, enabling the targeted recovery of novel metagenome-assembled genomes from underrepresented phyla. Accurate quantification of novel lineages also allows us to estimate the number of reads in a metagenome that are microbial. Soil metagenomes contain mostly microbial reads, but many animal metagenomes are dominated by eukaryotic reads.
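
As an illustration of how a community profile could be turned into an estimate of the microbial read fraction, the sketch below sums, over all profiled taxa, the estimated coverage multiplied by an assumed genome size, then divides by the total number of sequenced bases. This is a minimal sketch under stated assumptions and not necessarily how SingleM computes it: the `TaxonEstimate` type, the `microbial_fraction` function and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaxonEstimate:
    """One row of a hypothetical community profile."""
    name: str
    coverage: float        # estimated mean read coverage of the taxon's genome
    genome_size_bp: float  # assumed average genome size for the taxon, in bases


def microbial_fraction(taxa: list[TaxonEstimate], total_metagenome_bp: float) -> float:
    """Estimate the fraction of sequenced bases that are bacterial or archaeal.

    Assumption: coverage * genome size approximates the number of bases
    each taxon contributes to the metagenome.
    """
    if total_metagenome_bp <= 0:
        raise ValueError("total_metagenome_bp must be positive")
    microbial_bp = sum(t.coverage * t.genome_size_bp for t in taxa)
    # Cap at 1.0 because noisy coverage or genome-size estimates can overshoot.
    return min(microbial_bp / total_metagenome_bp, 1.0)


# Made-up example: two taxa in a metagenome of 100 Mbp.
profile = [
    TaxonEstimate("known species A", coverage=10.0, genome_size_bp=4_000_000),
    TaxonEstimate("novel lineage B", coverage=5.0, genome_size_bp=2_000_000),
]
fraction = microbial_fraction(profile, total_metagenome_bp=100_000_000)
print(f"Estimated microbial fraction: {fraction:.2f}")
```

Because novel lineages are included in the profile, they contribute to the numerator; a profiler limited to species with reference genomes would leave them out and underestimate the microbial fraction.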

Natural selection is a massively parallel set of experiments. Community profiles from across Earth’s ecosystems are the results, showing us each species’ physicochemical growth range. We show that optimal growth temperature can be predicted from biogeographical observations. Large-scale estimation of microbial growth conditions may help predict how microbial species will react to, and exacerbate, climate change.
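
To make the idea of predicting a growth optimum from biogeography concrete, one simple estimator is the abundance-weighted mean temperature of the samples in which a species is observed. This is an illustrative assumption only, not the method presented here; the function name `abundance_weighted_temperature` and the observations are hypothetical.

```python
def abundance_weighted_temperature(observations: list[tuple[float, float]]) -> float:
    """Abundance-weighted mean sample temperature for one species.

    observations: (sample_temperature_celsius, relative_abundance_percent)
    pairs, one per metagenome in which the species was detected.
    """
    total_abundance = sum(abundance for _, abundance in observations)
    if total_abundance <= 0:
        raise ValueError("species must have positive total abundance")
    weighted_sum = sum(temp * abundance for temp, abundance in observations)
    return weighted_sum / total_abundance


# Made-up observations of one species across three samples.
observations = [(10.0, 10.0), (30.0, 60.0), (40.0, 30.0)]
estimate = abundance_weighted_temperature(observations)
print(f"Estimated optimal growth temperature: {estimate:.1f} °C")
```

An estimator of this kind only works if samples carry temperature metadata and the species is observed across a wide enough range of conditions; a real analysis would also need to handle sampling bias between environments.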