Poster Presentation Australian Society for Microbiology Annual Scientific Meeting 2024

A catalogue of small proteins from the global microbiome (#17)

Yiqian Duan 1, Célio Dias Santos Júnior 1 2, Thomas S.B. Schmidt 3, Anthony Fullam 3, Breno L. S. de Almeida 1, Chengkai Zhu 1, Michael Kuhn 3, Xing-Ming Zhao 1, Peer Bork 3 4 5, Luis Pedro Coelho 1 6
  1. Fudan University, Shanghai, China
  2. Laboratory of Microbial Processes & Biodiversity - LMPB; Department of Hydrobiology, Universidade Federal de São Carlos – UFSCar, São Carlos, São Paulo, Brazil
  3. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  4. Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
  5. Max Delbrück Centre for Molecular Medicine, Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
  6. Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
Small open reading frames (smORFs, operationally defined as those with fewer than 100 codons) are often neglected because of limitations in computational and experimental methods. Recent systematic studies have shown that smORFs are widespread both ecologically and evolutionarily and perform diverse functions.
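
The operational definition above can be illustrated with a minimal Python sketch. This is only a naive length filter, not the method used to build the catalogue; the function names, the example sequences, and the choice to exclude a trailing stop codon from the codon count are all assumptions made for illustration.

```python
# Illustrative only: a naive length filter, not the GMSC pipeline.
MAX_CODONS = 100  # threshold from the operational definition (fewer than 100 codons)
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons


def count_codons(orf: str) -> int:
    """Count codons in an ORF, excluding a trailing stop codon (an assumption)."""
    seq = orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    n = len(seq) // 3
    if n > 0 and seq[-3:] in STOP_CODONS:
        n -= 1
    return n


def is_smorf(orf: str) -> bool:
    """Return True if the ORF has fewer than MAX_CODONS codons."""
    return count_codons(orf) < MAX_CODONS


# Hypothetical example sequences
short_orf = "ATG" + "GCT" * 30 + "TAA"    # 31 codons before the stop
long_orf = "ATG" + "GCT" * 150 + "TAA"    # 151 codons before the stop
print(is_smorf(short_orf), is_smorf(long_orf))
```

Whether the stop codon counts towards the threshold is a convention that varies between tools, so the boundary case should be checked against whichever definition is in use.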

However, little is known about the distribution and role of smORFs in the global microbiome. We therefore constructed the Global Microbial smORFs Catalogue (GMSC) from 63,410 publicly available metagenomes across 72 distinct habitats and 87,920 high-quality genomes from microbial isolates. GMSC (available at https://gmsc.big-data-biology.org/) contains 964,970,496 non-redundant smORFs, the majority of which are novel. We found that, as a fraction of their genomes, archaea harbor more small proteins than bacteria.

To enable use of this resource, we provide a tool, GMSC-mapper, that identifies and annotates reliable smORFs in microbial genomes and metagenomes. The resource reveals an immense and underexplored diversity of smORFs across habitats and taxa, and can be used to study the presence, distribution, and role of smORFs at a global scale.