IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data

Research output: Contribution to journal › Article › peer-review

Standard

IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data. / Ahmed, Mohamed; Leys, Natalie; Raes, Jeroen; Monsieurs, Pieter.

In: BMC Bioinformatics, Vol. 17, 29.04.2016, p. 192:1-192:11.

BibTeX

@article{b745134cd834441c84a8c00babd67b5e,
title = "IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data",
abstract = "Background: The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology via the sequencing of phylogenetic marker genes (e.g. 16S rRNA gene amplicon sequencing). Denoising, the removal of sequencing errors, is an important step in preprocessing amplicon sequencing data. The increasing popularity of the Illumina MiSeq platform for these applications requires the development of appropriate denoising methods. Results: The newly proposed denoising algorithm IPED includes a machine learning method which predicts potentially erroneous positions in sequencing reads based on a combination of quality metrics. Subsequently, this information is used to group those error-containing reads with correct reads, resulting in error-free consensus reads. This is achieved by masking potentially erroneous positions during this clustering step. Compared to the second best algorithm available, IPED detects double the amount of errors. Reducing the error rate had a positive effect on the clustering of reads in operational taxonomic units, with an almost perfect correspondence between the number of clusters and the theoretical number of species present in the mock communities. Conclusion: Our algorithm IPED is a powerful denoising tool for correcting sequencing errors in Illumina MiSeq 16S rRNA gene amplicon sequencing data. Apart from significantly reducing the error rate of the sequencing reads, it has also a beneficial effect on their clustering into operational taxonomic units. IPED is freely available at http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/.",
keywords = "bioinformatics, 16S rRNA amplicon sequencing, metagenomics, IPED",
author = "Mohamed Ahmed and Natalie Leys and Jeroen Raes and Pieter Monsieurs",
note = "Score=10",
year = "2016",
month = apr,
day = "29",
doi = "10.1186/s12859-016-1061-2",
language = "English",
volume = "17",
pages = "192:1--192:11",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "Springer",

}

RIS

TY - JOUR

T1 - IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data

AU - Ahmed, Mohamed

AU - Leys, Natalie

AU - Raes, Jeroen

AU - Monsieurs, Pieter

N1 - Score=10

PY - 2016/4/29

Y1 - 2016/4/29

N2 - Background: The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology via the sequencing of phylogenetic marker genes (e.g. 16S rRNA gene amplicon sequencing). Denoising, the removal of sequencing errors, is an important step in preprocessing amplicon sequencing data. The increasing popularity of the Illumina MiSeq platform for these applications requires the development of appropriate denoising methods. Results: The newly proposed denoising algorithm IPED includes a machine learning method which predicts potentially erroneous positions in sequencing reads based on a combination of quality metrics. Subsequently, this information is used to group those error-containing reads with correct reads, resulting in error-free consensus reads. This is achieved by masking potentially erroneous positions during this clustering step. Compared to the second best algorithm available, IPED detects double the amount of errors. Reducing the error rate had a positive effect on the clustering of reads in operational taxonomic units, with an almost perfect correspondence between the number of clusters and the theoretical number of species present in the mock communities. Conclusion: Our algorithm IPED is a powerful denoising tool for correcting sequencing errors in Illumina MiSeq 16S rRNA gene amplicon sequencing data. Apart from significantly reducing the error rate of the sequencing reads, it has also a beneficial effect on their clustering into operational taxonomic units. IPED is freely available at http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/.

AB - Background: The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology via the sequencing of phylogenetic marker genes (e.g. 16S rRNA gene amplicon sequencing). Denoising, the removal of sequencing errors, is an important step in preprocessing amplicon sequencing data. The increasing popularity of the Illumina MiSeq platform for these applications requires the development of appropriate denoising methods. Results: The newly proposed denoising algorithm IPED includes a machine learning method which predicts potentially erroneous positions in sequencing reads based on a combination of quality metrics. Subsequently, this information is used to group those error-containing reads with correct reads, resulting in error-free consensus reads. This is achieved by masking potentially erroneous positions during this clustering step. Compared to the second best algorithm available, IPED detects double the amount of errors. Reducing the error rate had a positive effect on the clustering of reads in operational taxonomic units, with an almost perfect correspondence between the number of clusters and the theoretical number of species present in the mock communities. Conclusion: Our algorithm IPED is a powerful denoising tool for correcting sequencing errors in Illumina MiSeq 16S rRNA gene amplicon sequencing data. Apart from significantly reducing the error rate of the sequencing reads, it has also a beneficial effect on their clustering into operational taxonomic units. IPED is freely available at http://science.sckcen.be/en/Institutes/EHS/MCB/MIC/Bioinformatics/.

KW - bioinformatics

KW - 16S rRNA amplicon sequencing

KW - metagenomics

KW - IPED

UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/12356407

U2 - 10.1186/s12859-016-1061-2

DO - 10.1186/s12859-016-1061-2

M3 - Article

VL - 17

SP - 192:1-192:11

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

ER -

ID: 989460