David Ross, Jocelyn Lauzon, Vladimir Makarenkov, Steven Kembel
Developing an efficient and easy-to-use pipeline to identify genomes from metagenomic data
Metagenomics is a powerful approach that studies the assemblage of genes and genomes in complex microbial communities and environmental DNA. It is particularly valuable for understanding rare, novel, and culture-resistant species, as well as uncommonly studied environments. Bioinformatics tools for metagenome analysis enable the assembly, identification, and characterization of bacterial genomes, known as metagenome-assembled genomes (MAGs), from metagenomic shotgun sequencing data. As part of an ongoing study of bacterial community dynamics in the phyllosphere, we applied two widely used bioinformatic pipelines to identify MAGs from leaf samples of five temperate tree species growing in Quebec. Both pipelines produced similar results; however, they differed markedly in analytical approach, computational requirements, and usability. Furthermore, new technologies and algorithms are being developed faster than current pipelines can be updated, which highlights the need for a new, more comprehensive pipeline for MAG assembly and analysis. We are currently developing a pipeline that will notably include a) the capacity to accept short reads, long reads, or both as a hybrid input; b) assembly algorithms designed to handle increasingly large samples with reduced computational demand; and c) a combination of the most commonly used binning algorithms and state-of-the-art AI-based binning algorithms. This pipeline will enable researchers in multiple fields to perform metagenomic analyses, to take advantage of novel technologies, and to adapt the pipeline easily to their needs, regardless of their experience in bioinformatics.