From Sequencing Reads to Microbial Diversity: Bioinformatics Algorithms for Processing Amplicon Sequencing Data

Research output: Thesis, Doctoral thesis

BibTeX

@phdthesis{58ce159c281d46c0baa0de4f1d811ab3,
title = "From Sequencing Reads to Microbial Diversity: Bioinformatics Algorithms for Processing Amplicon Sequencing Data",
abstract = "The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology by offering a cost-efficient method to assess microbial diversity at an unprecedented depth using 16S rRNA amplicon sequencing approaches. Several preprocessing algorithms must be applied to obtain a collection of highly reliable sequencing reads, ending with a clustering step that groups them into Operational Taxonomic Units (OTUs). However, this approach poses various challenges: the removal of PCR artefacts (called chimeras), the correction of sequencing errors introduced by the sequencing technologies, and the clustering of those sequences into OTUs. In this work, various bioinformatics tools were developed to tackle those challenges. First, an ensemble classifier for chimera detection, named CATCh, was developed; it achieved higher performance than existing tools on different types of sequencing data. Secondly, two artificial intelligence-based algorithms, NoDe and IPED, were introduced to correct sequencing errors in 454 pyrosequencing and Illumina MiSeq data, respectively. A benchmarking study showed that NoDe and IPED reduce the error rate more strongly than other state-of-the-art tools. Thirdly, a new method was developed that introduces an adaptive cut-off score in the OTU clustering step, making the results of the OTU clustering less sensitive to variations in evolutionary rates between taxonomic lineages and to the region of the 16S rRNA gene targeted for amplification. Implementing such a dynamic cut-off value resulted in a closer correspondence between the number of OTUs and the actual diversity of the samples. Finally, a benchmark analysis comparing existing pipelines for 16S rRNA metagenomics data processing was performed, showing that an integration of our in-house developed algorithms achieved the highest accuracy.
In conclusion, the pipeline developed within this PhD translates amplicon sequencing data into high-quality OTUs, yielding robust diversity estimates.",
keywords = "bioinformatics, microbiology, 16S rRNA amplicon sequencing, high-throughput sequencing",
author = "Mohamed Ahmed",
note = "Score=30",
year = "2016",
month = "9",
day = "27",
language = "English",
isbn = "9789076971230",
school = "VUB - Vrije Universiteit Brussel",

}

RIS

TY - THES

T1 - From Sequencing Reads to Microbial Diversity

T2 - Bioinformatics Algorithms for Processing Amplicon Sequencing Data

AU - Ahmed, Mohamed

N1 - Score=30

PY - 2016/9/27

Y1 - 2016/9/27

N2 - The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology by offering a cost-efficient method to assess microbial diversity at an unprecedented depth using 16S rRNA amplicon sequencing approaches. Several preprocessing algorithms must be applied to obtain a collection of highly reliable sequencing reads, ending with a clustering step that groups them into Operational Taxonomic Units (OTUs). However, this approach poses various challenges: the removal of PCR artefacts (called chimeras), the correction of sequencing errors introduced by the sequencing technologies, and the clustering of those sequences into OTUs. In this work, various bioinformatics tools were developed to tackle those challenges. First, an ensemble classifier for chimera detection, named CATCh, was developed; it achieved higher performance than existing tools on different types of sequencing data. Secondly, two artificial intelligence-based algorithms, NoDe and IPED, were introduced to correct sequencing errors in 454 pyrosequencing and Illumina MiSeq data, respectively. A benchmarking study showed that NoDe and IPED reduce the error rate more strongly than other state-of-the-art tools. Thirdly, a new method was developed that introduces an adaptive cut-off score in the OTU clustering step, making the results of the OTU clustering less sensitive to variations in evolutionary rates between taxonomic lineages and to the region of the 16S rRNA gene targeted for amplification. Implementing such a dynamic cut-off value resulted in a closer correspondence between the number of OTUs and the actual diversity of the samples. Finally, a benchmark analysis comparing existing pipelines for 16S rRNA metagenomics data processing was performed, showing that an integration of our in-house developed algorithms achieved the highest accuracy.
In conclusion, the pipeline developed within this PhD translates amplicon sequencing data into high-quality OTUs, yielding robust diversity estimates.

AB - The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology by offering a cost-efficient method to assess microbial diversity at an unprecedented depth using 16S rRNA amplicon sequencing approaches. Several preprocessing algorithms must be applied to obtain a collection of highly reliable sequencing reads, ending with a clustering step that groups them into Operational Taxonomic Units (OTUs). However, this approach poses various challenges: the removal of PCR artefacts (called chimeras), the correction of sequencing errors introduced by the sequencing technologies, and the clustering of those sequences into OTUs. In this work, various bioinformatics tools were developed to tackle those challenges. First, an ensemble classifier for chimera detection, named CATCh, was developed; it achieved higher performance than existing tools on different types of sequencing data. Secondly, two artificial intelligence-based algorithms, NoDe and IPED, were introduced to correct sequencing errors in 454 pyrosequencing and Illumina MiSeq data, respectively. A benchmarking study showed that NoDe and IPED reduce the error rate more strongly than other state-of-the-art tools. Thirdly, a new method was developed that introduces an adaptive cut-off score in the OTU clustering step, making the results of the OTU clustering less sensitive to variations in evolutionary rates between taxonomic lineages and to the region of the 16S rRNA gene targeted for amplification. Implementing such a dynamic cut-off value resulted in a closer correspondence between the number of OTUs and the actual diversity of the samples. Finally, a benchmark analysis comparing existing pipelines for 16S rRNA metagenomics data processing was performed, showing that an integration of our in-house developed algorithms achieved the highest accuracy.
In conclusion, the pipeline developed within this PhD translates amplicon sequencing data into high-quality OTUs, yielding robust diversity estimates.

KW - bioinformatics

KW - microbiology

KW - 16S rRNA amplicon sequencing

KW - high-throughput sequencing

UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/19767688

M3 - Doctoral thesis

SN - 9789076971230

ER -

ID: 1492362