BacPipe: A rapid, user-friendly whole genome sequencing pipeline for clinical diagnostic bacteriology

Research output: Contribution to journal, Article

Authors

  • Basil B. Xavier
  • Mohamed Ahmed
  • Mattia Bolzan
  • Bruno Ribeiro-Gonçalves
  • Blaise T.F. Alako
  • Peter Harrison
  • Christine Lammens
  • Samir Kumar-Singh
  • Herman Goossens
  • João André Carriço
  • Guy R. Cochrane
  • Surbhi Malhotra-Kumar

Institutes & Expert groups

  • VAXINFECTIO - Vaccine & Infectious Disease Institute, University of Antwerp
  • UA - Universiteit Antwerpen
  • iMM - Instituto de Medicina Molecular, João Lobo Antunes
  • EMBL-EBI - The European Bioinformatics Institute - Cambridge Outstation
  • IST - Instituto Superior Técnico, Universidade de Lisboa
  • EMBL-EBI - European Molecular Biology Laboratory, European Bioinformatics Institute

Abstract

Objectives: Despite rapid advances in whole genome sequencing (WGS) technologies, their integration into routine microbiological diagnostics has been hampered by the need for standardised downstream bioinformatics analysis. Here we developed a comprehensive and computationally low-resource bioinformatics pipeline (BacPipe) enabling direct analysis of bacterial whole-genome sequences (raw reads or contigs) obtained from second- or third-generation sequencing technologies.
Methods: Open-access tools for quality verification, de novo assembly (SPAdes), annotation (Prokka), bacterial typing (MLST, emm typing), and identification of resistance genes (Resfams), plasmids, virulence genes, single nucleotide polymorphisms (SNPs) and core genome phylogeny were integrated into a single Python script. A graphical user interface (GUI) was developed to show the progress of the analysis in real time. The scalability and speed of BacPipe in handling large datasets were further demonstrated using 4139 Illumina paired-end sequence files of publicly available bacterial genomes (2.9–5.4 Mb) from the European Nucleotide Archive (ENA).
Results: Computational time on BacPipe, demonstrated on a personal computer with 8 GB of RAM, was 21, 25, 28 and 30 minutes for sequencing coverage of 50-, 70-, 100- and 120-fold of a 5.1 Mb bacterial genome, respectively. Compiled results for every individual genome/strain are saved as an Excel file. Up to 56% reduction in analysis time was achieved by a unique parallelization of post-assembly and post-annotation tools in BacPipe compared to running these tools in succession. On the 4139 Illumina paired-end sequence files, running time was on average 50 minutes/strain. BacPipe is integrated in EBI-SELECTA, a project-specific portal (H2020 COMPARE), and is also available as an independent Docker image that can be used across Windows- and Unix-based systems.
Conclusion: BacPipe offers a fully automated ‘one-stop’ bacterial WGS analysis pipeline with a user-friendly GUI, which can help overcome the major hurdle of WGS data analysis in hospitals and public-health settings and for infection-control monitoring.
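
The idea of running independent post-assembly tools concurrently rather than in succession, mentioned in the Results, can be illustrated with a minimal Python sketch. This is not BacPipe's code: the step functions (`run_typing`, `run_resistance`, `run_plasmids`) and the `POST_ASSEMBLY_STEPS` table are hypothetical placeholders standing in for external tools, and a thread pool is only one of several ways such steps could be scheduled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders for post-assembly tools. A real pipeline
# would typically launch external programs here; these stubs only
# summarise the input so the sketch stays self-contained.
def run_typing(contigs):
    return f"typing on {len(contigs)} contigs"

def run_resistance(contigs):
    return f"resistance screen on {sum(len(c) for c in contigs)} bases"

def run_plasmids(contigs):
    return f"plasmid screen on {len(contigs)} contigs"

# Steps that depend only on the assembled contigs, not on each other.
POST_ASSEMBLY_STEPS = {
    "typing": run_typing,
    "resistance": run_resistance,
    "plasmids": run_plasmids,
}

def run_in_succession(contigs):
    """Run every step one after another."""
    return {name: step(contigs) for name, step in POST_ASSEMBLY_STEPS.items()}

def run_in_parallel(contigs, max_workers=3):
    """Submit all independent steps at once and collect their results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(step, contigs)
            for name, step in POST_ASSEMBLY_STEPS.items()
        }
        # result() blocks until the corresponding step has finished.
        return {name: future.result() for name, future in futures.items()}

if __name__ == "__main__":
    example_contigs = ["ACGTACGT", "GGCCTTAA"]
    print(run_in_parallel(example_contigs))
```

Both functions return the same results; any time saved by the parallel variant depends on how long each step takes and on how many cores and how much memory are available.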

Details

Original language: English
Article number: 100769
Pages (from-to): 1-28
Number of pages: 28
Journal: iScience
Volume: 23
Issue number: 100769
Publication status: Published - 9 Dec 2019

Keywords

  • Biological Sciences Research Methodologies
  • Microbiology
  • Sequence Analysis
  • BacPipe

ID: 5890914