From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data. OCToPUS 16s Metagenomic pipeline

Research output: Contribution to journal > Article > peer-review

Standard

From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data. OCToPUS 16s Metagenomic pipeline. / Ahmed, Mohamed Mysara; Njima, Mercy; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter.

In: GigaScience, Vol. 6, No. 1-10, 01.02.2017, p. 1-10.

BibTeX

@article{48ba9350f0a445c08fd33098d3f57bdf,
title = "From reads to operational taxonomic units: an ensemble processing pipeline for {MiSeq} amplicon sequencing data. {OCToPUS} 16s Metagenomic pipeline",
abstract = "The development of high-throughput sequencing technologies has provided microbial ecologists with an efficient approach to assess bacterial diversity at an unseen depth, particularly with the recent advances in the Illumina MiSeq sequencing platform. However, analyzing such high-throughput data is posing important computational challenges, requiring specialized bioinformatics solutions at different stages during the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and clustering of those sequences into Operational Taxonomic Units (OTUs). Individual algorithms grappling with each of those challenges have been combined into various bioinformatics pipelines, such as mothur, QIIME, LotuS, and USEARCH. Using a set of well-described bacterial mock communities, state-of-the-art pipelines for Illumina MiSeq amplicon sequencing data are benchmarked at the level of the amount of sequences retained, computational cost, error rate, and quality of the OTUs. In addition, a new pipeline called OCToPUS is introduced, which is making an optimal combination of different algorithms. Huge variability is observed between the different pipelines in respect to the monitored performance parameters, where in general the amount of retained reads is found to be inversely proportional to the quality of the reads. By contrast, OCToPUS achieves the lowest error rate, minimum number of spurious OTUs, and the closest correspondence to the existing community, while retaining the uppermost amount of reads when compared to other pipelines. The newly introduced pipeline translates Illumina MiSeq amplicon sequencing data into high-quality and reliable OTUs, with improved performance and accuracy compared to the currently existing pipelines.",
keywords = "16S rRNA metagenomics, OTU clustering, amplicon sequencing, chimera, denoising, operational taxonomic units",
author = "Ahmed, Mohamed Mysara and Njima, Mercy and Leys, Natalie and Raes, Jeroen and Monsieurs, Pieter",
year = "2017",
month = feb,
day = "1",
doi = "10.1093/gigascience/giw017",
language = "English",
volume = "6",
pages = "1--10",
journal = "GigaScience",
issn = "2047-217X",
publisher = "Oxford University Press",
number = "1-10",

}

RIS

TY - JOUR

T1 - From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data.

T2 - OCToPUS 16s Metagenomic pipeline

AU - Ahmed, Mohamed Mysara

AU - Njima, Mercy

AU - Leys, Natalie

AU - Raes, Jeroen

AU - Monsieurs, Pieter

PY - 2017/2/1

Y1 - 2017/2/1

AB - The development of high-throughput sequencing technologies has provided microbial ecologists with an efficient approach to assess bacterial diversity at an unseen depth, particularly with the recent advances in the Illumina MiSeq sequencing platform. However, analyzing such high-throughput data is posing important computational challenges, requiring specialized bioinformatics solutions at different stages during the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and clustering of those sequences into Operational Taxonomic Units (OTUs). Individual algorithms grappling with each of those challenges have been combined into various bioinformatics pipelines, such as mothur, QIIME, LotuS, and USEARCH. Using a set of well-described bacterial mock communities, state-of-the-art pipelines for Illumina MiSeq amplicon sequencing data are benchmarked at the level of the amount of sequences retained, computational cost, error rate, and quality of the OTUs. In addition, a new pipeline called OCToPUS is introduced, which is making an optimal combination of different algorithms. Huge variability is observed between the different pipelines in respect to the monitored performance parameters, where in general the amount of retained reads is found to be inversely proportional to the quality of the reads. By contrast, OCToPUS achieves the lowest error rate, minimum number of spurious OTUs, and the closest correspondence to the existing community, while retaining the uppermost amount of reads when compared to other pipelines. The newly introduced pipeline translates Illumina MiSeq amplicon sequencing data into high-quality and reliable OTUs, with improved performance and accuracy compared to the currently existing pipelines.

KW - 16S rRNA metagenomics

KW - OTU clustering

KW - amplicon sequencing

KW - chimera

KW - denoising

KW - operational taxonomic units

UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/27558741

U2 - 10.1093/gigascience/giw017

DO - 10.1093/gigascience/giw017

M3 - Article

VL - 6

SP - 1

EP - 10

JO - GigaScience

JF - GigaScience

SN - 2047-217X

IS - 1-10

ER -

ID: 3098998